ScalarR/Some Ideas
I’m not going into full detail for now; instead, here are some crucial ideas that drive ScalarR and have proven to work well. Rather than encouraging people to follow the ScalarR way, I believe that sharing the initial motivations could inspire even more innovation and push the bar for top bots even higher.
Targeting
Anti-General/Random
For anti-general movement / anti-random movement, the basic assumption is that the opponent's movement comes from a finite, fixed set. Once it is known which movement the opponent is using, data points are iid (independent and identically distributed). Then offline learning, online learning, or a mix of both works pretty well. The only trick is to keep targeting space (e.g. guess factors & the corresponding features) and physical space (Robocode physics) consistent; more specifically, maintain an accurate 1-to-1 correspondence between the two spaces.
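As a minimal sketch of keeping the two spaces consistent (not ScalarR's actual code), the mapping below converts a guess factor to an absolute firing angle and back using exactly the same escape angle and lateral direction, so the two conversions are inverses of each other. It assumes the classic maximum escape angle asin(8 / bulletSpeed) and Robocode's bullet speed of 20 - 3 × power; `GuessFactorMapping` and its methods are hypothetical names. A precise, wall-aware escape angle would tighten the correspondence further, and the clamp in `gfFromAngle` means angles outside the escape range do not round-trip.

```java
/**
 * Hypothetical helper: maps a guess factor in [-1, 1] to an absolute firing
 * angle and back, using the classic maximum-escape-angle (MEA) definition.
 */
final class GuessFactorMapping {
    static final double MAX_VELOCITY = 8.0; // Robocode robot speed limit, px/tick

    /** Robocode bullet speed: 20 - 3 * power. */
    static double bulletSpeed(double power) {
        return 20.0 - 3.0 * power;
    }

    /** Classic (non-precise) maximum escape angle: asin(8 / bulletSpeed). */
    static double maxEscapeAngle(double power) {
        return Math.asin(MAX_VELOCITY / bulletSpeed(power));
    }

    /**
     * GF -> absolute angle. lateralDirection is +1 or -1 depending on which
     * way the target was orbiting (clockwise or counter-clockwise) at fire time.
     */
    static double angleFromGf(double gf, double headOnBearing, double power, int lateralDirection) {
        return headOnBearing + gf * lateralDirection * maxEscapeAngle(power);
    }

    /** Absolute angle -> GF; the exact inverse of angleFromGf inside the MEA. */
    static double gfFromAngle(double angle, double headOnBearing, double power, int lateralDirection) {
        double offset = normalRelativeAngle(angle - headOnBearing);
        double gf = offset * lateralDirection / maxEscapeAngle(power);
        return Math.max(-1.0, Math.min(1.0, gf)); // angles beyond the MEA are clamped
    }

    /** Normalizes an angle to [-pi, pi). */
    static double normalRelativeAngle(double a) {
        return a - 2.0 * Math.PI * Math.floor((a + Math.PI) / (2.0 * Math.PI));
    }
}
```

For example, at power 2 the bullet travels at 14 px/tick, so GF 1.0 maps to an offset of asin(8/14) ≈ 0.608 rad from the head-on bearing.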
Anti-Surfing
The problem with traditional targeting against surfers is that all observations are biased, since the data points aren't iid at all: a surfer keeps changing its movement in response to the shots it has faced. Using hit/collide waves? Biased. Using tick waves? Biased. Using historical waves only? Biased. Using recent waves only? Biased. Anyway, being biased isn't a bad thing; a lot of guns work by adding bias manually, and it works. ScalarR isn't doing anything truly special right now.
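ScalarR isn't described as doing this, but as one illustration of adding bias manually, a gun can bias its guess-factor statistics towards recency by decaying every bin before each new observation. This is a minimal sketch; `DecayingGfHistogram`, the bin count and the decay values are hypothetical.

```java
/**
 * Hypothetical recency-biased guess-factor histogram: each new observation
 * first decays every bin, so recent waves outweigh older ones.
 */
final class DecayingGfHistogram {
    private final double[] bins;
    private final double decay; // in (0, 1]; smaller = stronger recency bias

    DecayingGfHistogram(int binCount, double decay) {
        if (binCount < 3 || binCount % 2 == 0) {
            throw new IllegalArgumentException("binCount must be odd and >= 3 so GF 0 has its own bin");
        }
        if (decay <= 0.0 || decay > 1.0) {
            throw new IllegalArgumentException("decay must be in (0, 1]");
        }
        this.bins = new double[binCount];
        this.decay = decay;
    }

    /** Maps a GF in [-1, 1] to the nearest bin index. */
    int binOf(double gf) {
        double clamped = Math.max(-1.0, Math.min(1.0, gf));
        return (int) Math.round((clamped + 1.0) / 2.0 * (bins.length - 1));
    }

    /** Center GF of a bin; the inverse of binOf for bin centers. */
    double gfOf(int bin) {
        return bin * 2.0 / (bins.length - 1) - 1.0;
    }

    /** Records the GF a wave hit at, after decaying all older data. */
    void record(double gf) {
        for (int i = 0; i < bins.length; i++) {
            bins[i] *= decay;
        }
        bins[binOf(gf)] += 1.0;
    }

    /** GF of the heaviest bin, defaulting to head-on (GF 0) when empty. */
    double bestGf() {
        int best = bins.length / 2;
        for (int i = 0; i < bins.length; i++) {
            if (bins[i] > bins[best]) {
                best = i;
            }
        }
        return gfOf(best);
    }
}
```

With `decay = 0.9`, three hits at GF 0.8 still outweigh one newer hit at GF -0.4; with `decay = 0.5`, the newer hit wins. Decaying on every record costs O(bins) per observation, which is fine for a few dozen bins.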
Anti-Flattening
Anti-Flattening is a different story from Anti-Surfing. To hit surfers, you need recency. To hit flatteners, you need something that exploits their weaknesses (e.g. repeated patterns); otherwise random targeting works sufficiently well. Since flattening is generally used together with traditional surfing, the anti-surfing techniques generally mix in as well. ScalarR isn't doing anything truly special right now.
Surfing
Surfing is another story, a completely different one. But if I were asked to give Wave Surfing another name, I would simply choose Minimum Risk Movement, redefined. Back to the simple story: an agent (robot) faces an environment (mainly walls and other bots). Which action (mainly movement) should it choose to maximize an objective, e.g. survival? This is not a simple question, but many non-simple questions have a simple solution: why not simply simulate what's going to happen next, and let the simulation result decide? Many techniques build on this, e.g. Precise Intersection and Bullet Shadow/Correct; each of them brings the simulation closer to what's truly happening by making it more precise and more exact. But one thing is still unknown, in melee and in 1v1 alike, and that thing is what separates a top bot from the others: danger estimation.
Unlike targeting, where inventing and experimenting with novel approaches pays off, the approach that worked best here is testing against real opponents, say the entire rumble, since danger estimation is essentially fitting the targeting of the existing robots in the rumble. Inventing something fancy doesn't help much, but watching a lot of battles and reading a lot of targeting code works well. Once most of the problem bots are solved, you get a fairly good result, and you're done.
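The "simulate what's going to happen next, and let the simulation result decide" idea can be sketched as below. Everything in it is an assumption for illustration, not ScalarR's implementation: a single wave, straight-line movement at a fixed heading, no walls, a simplified velocity rule (Robocode's real rule differs slightly when reversing through zero), and a wave-arrival check at whole ticks rather than Precise Intersection. The danger function is passed in as a hypothetical `dangerAtGf`; estimating it well is exactly the open problem of danger estimation.

```java
import java.util.function.DoubleUnaryOperator;

/**
 * Hypothetical "simulate, then decide" surfing sketch: for each candidate
 * action, predict where we would be when the wave reaches us, and keep the
 * action whose predicted guess factor has the lowest estimated danger.
 * Assumptions: one wave, straight-line movement at a fixed heading, no walls,
 * a simplified velocity rule, and arrival checked at whole ticks.
 */
final class SimulatedSurfing {
    static final double ACCEL = 1.0, DECEL = 2.0, MAX_VELOCITY = 8.0;

    /** Hypothetical wave: fire position, speed, radius so far, and its GF frame. */
    static final class Wave {
        final double x, y, speed, radius, headOnBearing, maxEscapeAngle;
        final int lateralDirection; // +1 or -1: our orbit direction at fire time

        Wave(double x, double y, double speed, double radius,
             double headOnBearing, double maxEscapeAngle, int lateralDirection) {
            this.x = x; this.y = y; this.speed = speed; this.radius = radius;
            this.headOnBearing = headOnBearing; this.maxEscapeAngle = maxEscapeAngle;
            this.lateralDirection = lateralDirection;
        }
    }

    /** Simplified velocity update: speed up by 1, slow down by 2, never overshoot the target. */
    static double nextVelocity(double v, double target) {
        if (target > v) return Math.min(target, v + (v < 0 ? DECEL : ACCEL));
        if (target < v) return Math.max(target, v - (v > 0 ? DECEL : ACCEL));
        return v;
    }

    /** Steps tick by tick until the wave front passes our predicted position, then returns that GF. */
    static double predictGf(double x, double y, double heading, double velocity,
                            double targetVelocity, Wave w) {
        double radius = w.radius;
        for (int tick = 0; tick < 500; tick++) {
            velocity = nextVelocity(velocity, targetVelocity);
            x += Math.sin(heading) * velocity; // Robocode angles: 0 = north, clockwise
            y += Math.cos(heading) * velocity;
            radius += w.speed;
            if (Math.hypot(x - w.x, y - w.y) <= radius) break; // wave has reached us
        }
        double diff = Math.atan2(x - w.x, y - w.y) - w.headOnBearing;
        double offset = Math.atan2(Math.sin(diff), Math.cos(diff)); // normalize to [-pi, pi]
        double gf = offset * w.lateralDirection / w.maxEscapeAngle;
        return Math.max(-1.0, Math.min(1.0, gf));
    }

    /** Simulates forward / stop / reverse and returns the target velocity with the lowest danger. */
    static double chooseTargetVelocity(double x, double y, double heading, double velocity,
                                       Wave w, DoubleUnaryOperator dangerAtGf) {
        double best = MAX_VELOCITY;
        double bestDanger = Double.POSITIVE_INFINITY;
        for (double candidate : new double[] {MAX_VELOCITY, 0.0, -MAX_VELOCITY}) {
            double danger = dangerAtGf.applyAsDouble(predictGf(x, y, heading, velocity, candidate, w));
            if (danger < bestDanger) {
                bestDanger = danger;
                best = candidate;
            }
        }
        return best;
    }
}
```

A fuller version would simulate more candidates (e.g. orbit directions or goto points, with wall smoothing) and sum danger over several waves, but the structure stays the same: predict, score, pick the minimum.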
Basic Movement
Most robots call setAhead & setTurn and leave the rest of the movement to Robocode. ScalarR instead controls velocity & turn rate directly, and encapsulates the details into movement drivers (e.g. a goto driver). This lets ScalarR use faster methods to derive velocity & turn rate, making movement prediction easier, faster and more precise. Wall smoothing is also encapsulated into the drivers, so that I can directly control every detail near walls and avoid occasionally losing score.
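To illustrate what controlling velocity & turn rate directly can look like, here is a minimal prediction step, assuming Robocode's per-tick rules as we understand them: acceleration 1, deceleration 2, max velocity 8, a turn limit of 10 - 0.75 × |velocity| degrees per tick, the body turning before it moves, and a tick that crosses zero velocity combining deceleration and acceleration. `VelocityDriver` and its methods are hypothetical, not ScalarR's drivers; a goto driver or a wall-smoothing driver would sit on top of it and only decide `targetVelocity` and `desiredTurn` each tick.

```java
/**
 * Hypothetical movement-driver core: instead of setAhead/setTurn, the driver
 * picks a target velocity and a turn each tick, and prediction applies the
 * per-tick physics directly.
 */
final class VelocityDriver {
    static final double ACCELERATION = 1.0; // px/tick^2
    static final double DECELERATION = 2.0; // px/tick^2
    static final double MAX_VELOCITY = 8.0; // px/tick

    /** Max body turn per tick in radians: 10 - 0.75 * |v| degrees. */
    static double maxTurnRate(double velocity) {
        return Math.toRadians(10.0 - 0.75 * Math.abs(velocity));
    }

    /** Speed change available when slowing from 'speed', possibly crossing zero within the tick. */
    static double maxDecel(double speed) {
        double decelTime = speed / DECELERATION;
        double accelTime = 1.0 - decelTime;
        return Math.min(1.0, decelTime) * DECELERATION + Math.max(0.0, accelTime) * ACCELERATION;
    }

    /** Next velocity when the driver asks for a signed target velocity. */
    static double nextVelocity(double v, double target) {
        target = Math.max(-MAX_VELOCITY, Math.min(MAX_VELOCITY, target));
        if (target < 0 || (target == 0 && v < 0)) {
            return -nextVelocity(-v, -target); // mirror the backwards case
        }
        if (v >= 0) {
            return Math.max(v - DECELERATION, Math.min(target, v + ACCELERATION));
        }
        return Math.min(target, v + maxDecel(-v)); // moving backwards, target is forwards
    }

    /** One predicted tick: turn first (limited by current speed), then move. Returns {x, y, heading, velocity}. */
    static double[] step(double x, double y, double heading, double velocity,
                         double targetVelocity, double desiredTurn) {
        double limit = maxTurnRate(velocity);
        heading += Math.max(-limit, Math.min(limit, desiredTurn));
        velocity = nextVelocity(velocity, targetVelocity);
        x += Math.sin(heading) * velocity; // Robocode angles: 0 = north, clockwise
        y += Math.cos(heading) * velocity;
        return new double[] {x, y, heading, velocity};
    }
}
```

For example, a robot moving at 8 that asks for -8 slows to 6 on the next tick, and one moving at -1 that asks for +8 ends the tick at 0.5.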
Energy Management
Energy Management is not simply Power Selection. With energy management, you are solving the same problem as in surfing: what's the best action (now the bullet power) that maximizes an objective, e.g. survival? This sometimes involves precise simulation, but simulation is much harder here given the random nature of the process. Most observations are biased, and so are most simulations. Some very simple strategy, possibly a one-liner, can give astonishing results, but coming up with it may take days of watching battles and even more days of failed attempts. Also, past tuning of energy management becomes outdated soon after changing surfing & targeting. So this is generally the last thing I do, and keeping it as simple as possible until the very end saves a lot of time.
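To make "the best action (now the bullet power) that maximizes an objective" concrete, here is one hypothetical baseline, not ScalarR's strategy: pick the power that maximizes the expected swing of (my energy - enemy energy) per tick. It uses Robocode's bullet rules (damage 4p, plus 2(p - 1) above power 1; 3p returned to the shooter on a hit; gun heat 1 + p/5) and assumes the default gun cooling rate of 0.1. `PowerSelection`, the candidate powers and the hit-probability function are hypothetical inputs, and as warned above, the observations behind any hit-probability estimate are biased.

```java
import java.util.function.DoubleUnaryOperator;

/** Hypothetical power selection by expected energy swing per tick. */
final class PowerSelection {
    static final double GUN_COOLING_RATE = 0.1; // assumption: default battle setting

    /** Robocode bullet damage: 4p, plus 2(p - 1) above power 1. */
    static double bulletDamage(double power) {
        return 4.0 * power + Math.max(0.0, 2.0 * (power - 1.0));
    }

    /** Approximate ticks between shots: gun heat 1 + p/5, cooled by the cooling rate each tick. */
    static double ticksPerShot(double power) {
        return (1.0 + power / 5.0) / GUN_COOLING_RATE;
    }

    /**
     * Expected change of (my energy - enemy energy) per tick: a hit returns 3p
     * to us and deals bulletDamage(p) to the enemy, while every shot costs p.
     */
    static double expectedSwingPerTick(double power, double hitProbability) {
        double perShot = hitProbability * (bulletDamage(power) + 3.0 * power) - power;
        return perShot / ticksPerShot(power);
    }

    /** Returns the candidate power with the best expected swing per tick. */
    static double choosePower(double myEnergy, DoubleUnaryOperator hitProbabilityAt) {
        double[] candidates = {0.1, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0};
        double best = 0.1;
        double bestValue = Double.NEGATIVE_INFINITY;
        for (double p : candidates) {
            if (p >= myEnergy) continue; // never fire our last energy: a robot at 0 energy is disabled
            double value = expectedSwingPerTick(p, hitProbabilityAt.applyAsDouble(p));
            if (value > bestValue) {
                bestValue = value;
                best = p;
            }
        }
        return best;
    }
}
```

Because a miss always costs p, high power only pays off once the hit rate is high enough: in this model a power-3 shot breaks even at a hit rate of 3/25 = 0.12, so a constant hit rate of 0.2 picks power 3 while 0.05 picks power 0.1.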