Difference between revisions of "ScalarR/Some Ideas"

From Robowiki
Jump to navigation Jump to search
m (minor fix)
m (rewrite ==targeting==)
Line 1: Line 1:
 
{{Stub}}
 
{{Stub}}
  
I’m not going into full details for now, but rather some crucial ideas that drives [[ScalarR]] and have been proved to work well. Instead of encouraging people to follow the [[ScalarR]] way, I believe that sharing initial motivations could inspire even more innovations, and push the bar of top bots even further.  
+
I’m not going into full details for now, but rather some crucial ideas that drives [[ScalarR]] and have been proved to work well. Instead of encouraging people to follow the [[ScalarR]] way, I believe that sharing initial motivations could inspire even more innovations, and push the bar of top bots even further.
  
 
== Targeting ==
 
== Targeting ==
Targeting is essentially modeling the physical thing in some other space, e.g. [[GuessFactor]] space. Everything happened in physical space have some projection in “targeting space”, and vice versa. By designing features, [[wikipedia:Kernel Density Estimation|Kernel Density Estimation]], lateral direction, etc., you are essentially building some relationships between the two spaces. So it’s best that each point in physical space that matters for targeting have some unique representation in targeting space, and points in targeting space can be mapped to physical space precisely, or exactly. In simple words, everything blends in harmony. This is the most important thing in targeting, even more importantly than things like “KNN weights” etc. By keeping this in mind, you easily filter out targeting models that are essentially bad, leaving only the most possible ones.  
+
 
 +
== Anti-General/Random ==
 +
By anti-general movement / anti-random movement, a basic assumption is that the opponent has a finite & fixed set of movements. Once the specific movement is determined, data points are iid. Then either some offline learning, online learning or a mix of both works pretty well. The only trick is to make targeting space (e.g. guess factor & corresponding features) and physical space (robocode physics) consistent. To be more specific, maintain accurate 1-to-1 correspondence between both spaces.
 +
 
 +
=== Anti-Surfing ===
 +
The problem of traditional targeting against surfers is that all observations are biased (since data points aren't iid at all). Using hit/collide waves? Biased. Using tick waves? Biased. Using historical waves only? Biased. Using recent waves only? Biased. Anyway being biased isn't a bad thing, a lot of guns work by adding bias manually, and it works. ScalarR isn't doing anything truly special right now.
 +
 
 +
=== Anti-Flattening ==
 +
Anti-Flattening is a different story than Anti-Surfing. To hit surfers, you need recency. To hit flatteners, you need something to exploit weakness (e.g. repeated patterns), otherwise random targeting works sufficiently well. Since flattening is generally used with traditional surfing, things in anti-surfing generally mix-in. ScalarR isn't doing anything truly special right now.  
  
 
== Surfing ==
 
== Surfing ==
Line 10: Line 18:
  
 
Instead of inventing and experimenting some novel approach like in targeting, the best way that worked is to test against real opponents, say the entire rumble, since danger estimation is essentially fitting the targeting of existing robots in the rumble. Inventing something fancy doesn't help much, but waching a lot of battles, reading a lot of targeting code does work well. And once most problem bots are solved, you got some fairly good result, done.
 
Instead of inventing and experimenting some novel approach like in targeting, the best way that worked is to test against real opponents, say the entire rumble, since danger estimation is essentially fitting the targeting of existing robots in the rumble. Inventing something fancy doesn't help much, but waching a lot of battles, reading a lot of targeting code does work well. And once most problem bots are solved, you got some fairly good result, done.
 +
 +
=== Basic Movement ===
 +
Most robots are calling setAhead & setTurn, and leave the reset of movement to robocode. ScalarR is directly controlling velocity & turn rate instead, and is encapsulating details into movement drivers (e.g. goto driver). By doing this, ScalarR can use faster methods to derivate velocity & turn rate, making movement prediction easier, faster and preciser. Wall smoothing is also encapsulated into drivers, so that I can directly control full details near wall, avoid losing score occasionally.
  
 
== Energy Management ==
 
== Energy Management ==
 
Energy Management is not simply Power Selection. By doing energy management, you are solving the same problem as in surfing — what's the best action (now bullet power) that maximizes objective, e.g. survival. This involves precise simulation sometimes but simulations are much harder here given the nature of random process.  And most observations are biased, so do most simulations. Some very simple strategy possibly in one line can give you astonishing result, but coming up with it may require days of watching battles, and even more days of failing attempts. And past tuning of energy management gets outdated soon after mutating surf & targeting. So this is generally the last thing I do. And keeping it as simple as possible until the very last time does save you a lot of time.
 
Energy Management is not simply Power Selection. By doing energy management, you are solving the same problem as in surfing — what's the best action (now bullet power) that maximizes objective, e.g. survival. This involves precise simulation sometimes but simulations are much harder here given the nature of random process.  And most observations are biased, so do most simulations. Some very simple strategy possibly in one line can give you astonishing result, but coming up with it may require days of watching battles, and even more days of failing attempts. And past tuning of energy management gets outdated soon after mutating surf & targeting. So this is generally the last thing I do. And keeping it as simple as possible until the very last time does save you a lot of time.

Revision as of 11:59, 28 June 2021

This article is a stub. You can help RoboWiki by expanding it.

I’m not going into full details for now, but rather some crucial ideas that drives ScalarR and have been proved to work well. Instead of encouraging people to follow the ScalarR way, I believe that sharing initial motivations could inspire even more innovations, and push the bar of top bots even further.

Targeting

Anti-General/Random

By anti-general movement / anti-random movement, a basic assumption is that the opponent has a finite & fixed set of movements. Once the specific movement is determined, data points are iid. Then either some offline learning, online learning or a mix of both works pretty well. The only trick is to make targeting space (e.g. guess factor & corresponding features) and physical space (robocode physics) consistent. To be more specific, maintain accurate 1-to-1 correspondence between both spaces.

Anti-Surfing

The problem of traditional targeting against surfers is that all observations are biased (since data points aren't iid at all). Using hit/collide waves? Biased. Using tick waves? Biased. Using historical waves only? Biased. Using recent waves only? Biased. Anyway being biased isn't a bad thing, a lot of guns work by adding bias manually, and it works. ScalarR isn't doing anything truly special right now.

= Anti-Flattening

Anti-Flattening is a different story than Anti-Surfing. To hit surfers, you need recency. To hit flatteners, you need something to exploit weakness (e.g. repeated patterns), otherwise random targeting works sufficiently well. Since flattening is generally used with traditional surfing, things in anti-surfing generally mix-in. ScalarR isn't doing anything truly special right now.

Surfing

Surfing is another story, a complete different story. But if I was asked to give Wave Surfing another name, I would simply choose Minimum Risk Movement, redefined. Back to the simple story where agent (robot) was facing an environment (mainly walls and other bots), which action (mainly movement) will you choose to maximize objective, e.g. survival? This is not a simple question, but many non-simple questions receive some simple solution — why not simply simulate what's going next, and let the simulation result decide? Then many techniques are used, e.g. Precise Intersection, Bullet Shadow/Correct, everything is making the simulation closer to what's truly happening, by making it more precise, and more exact. But there's one thing left, something is unknown either in melee or in 1v1, and that thing is what separates a top bot from the others — danger estimation.

Instead of inventing and experimenting some novel approach like in targeting, the best way that worked is to test against real opponents, say the entire rumble, since danger estimation is essentially fitting the targeting of existing robots in the rumble. Inventing something fancy doesn't help much, but waching a lot of battles, reading a lot of targeting code does work well. And once most problem bots are solved, you got some fairly good result, done.

Basic Movement

Most robots are calling setAhead & setTurn, and leave the reset of movement to robocode. ScalarR is directly controlling velocity & turn rate instead, and is encapsulating details into movement drivers (e.g. goto driver). By doing this, ScalarR can use faster methods to derivate velocity & turn rate, making movement prediction easier, faster and preciser. Wall smoothing is also encapsulated into drivers, so that I can directly control full details near wall, avoid losing score occasionally.

Energy Management

Energy Management is not simply Power Selection. By doing energy management, you are solving the same problem as in surfing — what's the best action (now bullet power) that maximizes objective, e.g. survival. This involves precise simulation sometimes but simulations are much harder here given the nature of random process. And most observations are biased, so do most simulations. Some very simple strategy possibly in one line can give you astonishing result, but coming up with it may require days of watching battles, and even more days of failing attempts. And past tuning of energy management gets outdated soon after mutating surf & targeting. So this is generally the last thing I do. And keeping it as simple as possible until the very last time does save you a lot of time.