Difference between revisions of "ScalarR/Some Ideas"
Latest revision as of 02:15, 29 June 2021
I'm not going into full detail for now, but rather sharing some crucial ideas that drive ScalarR and have proved to work well. Rather than encouraging people to follow the ScalarR way, I believe that sharing the initial motivations could inspire even more innovation and push the bar for top bots even further.
Targeting
Anti-General/Random
For targeting against general / random movement, the basic assumption is that the opponent has a finite, fixed set of movements. Once the specific movement is determined, data points are i.i.d. Then offline learning, online learning or a mix of both works pretty well. The only trick is to keep the targeting space (e.g. guess factors and their corresponding features) and the physical space (Robocode physics) consistent. To be more specific, maintain an accurate 1-to-1 correspondence between the two spaces.
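As a rough illustration of the 1-to-1 correspondence, here is a minimal sketch that maps a firing angle to a guess factor and back. It is not ScalarR's code: the class and method names are hypothetical, and it assumes the common simplified maximum escape angle asin(8 / bulletSpeed) instead of a precise one.

```java
// Hypothetical sketch: round trip between physical space (a firing angle)
// and targeting space (a guess factor). Not ScalarR's actual implementation.
final class GuessFactorMapping {
    static final double MAX_ROBOT_VELOCITY = 8.0; // Robocode maximum robot velocity

    // Robocode bullet speed rule: 20 - 3 * power.
    static double bulletSpeed(double power) {
        return 20.0 - 3.0 * power;
    }

    // Assumption: the simplified (non-precise) maximum escape angle.
    static double maxEscapeAngle(double bulletSpeed) {
        return Math.asin(MAX_ROBOT_VELOCITY / bulletSpeed);
    }

    // Wraps an angle in radians into [-PI, PI).
    static double normalizeRelative(double angle) {
        return angle - 2.0 * Math.PI * Math.floor((angle + Math.PI) / (2.0 * Math.PI));
    }

    // lateralDirection is +1 or -1, the sign of the target's lateral velocity at fire time.
    static double toGuessFactor(double absoluteBearing, double firingAngle,
                                int lateralDirection, double bulletSpeed) {
        double offset = normalizeRelative(firingAngle - absoluteBearing);
        return lateralDirection * offset / maxEscapeAngle(bulletSpeed);
    }

    // Inverse of toGuessFactor for the same bearing, direction and bullet speed.
    static double toFiringAngle(double absoluteBearing, double guessFactor,
                                int lateralDirection, double bulletSpeed) {
        return absoluteBearing + lateralDirection * guessFactor * maxEscapeAngle(bulletSpeed);
    }
}
```

The point of keeping both directions in one place is that whatever convention is used when recording a data point is exactly the one used when firing.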
Anti-Surfing
The problem with traditional targeting against surfers is that all observations are biased (since data points aren't i.i.d. at all). Using hit/collide waves? Biased. Using tick waves? Biased. Using historical waves only? Biased. Using recent waves only? Biased. Anyway, being biased isn't a bad thing: a lot of guns work by adding bias manually, and it works. ScalarR isn't doing anything truly special here right now.
Anti-Flattening
Anti-Flattening is a different story from Anti-Surfing. To hit surfers, you need recency. To hit flatteners, you need something that exploits a weakness (e.g. repeated patterns); otherwise random targeting works sufficiently well. Since flattening is generally used together with traditional surfing, anti-surfing techniques generally get mixed in. ScalarR isn't doing anything truly special here right now.
Virtual Guns
Virtual Gun scores are unbiased against non-adaptive movement (ignoring bullet collisions), but are biased either when facing adaptive movement or when bullet collisions are considered (since different guns may have different bullet collision rates). A hard-coded hit-rate threshold works, but are there better solutions?
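One possible reading of a hard-coded hit-rate threshold is sketched below: the active gun is kept unless another gun's virtual hit rate beats it by a fixed margin. This is only an assumption about what such a threshold could look like; the class name, the `switchMargin` parameter and the selection rule are all hypothetical.

```java
// Hypothetical sketch of virtual gun selection with a hard-coded margin.
// It does not correct for the biases discussed above.
final class VirtualGunSelector {
    private final int[] hits;
    private final int[] shots;
    private final double switchMargin; // assumed hard-coded threshold, e.g. 0.05
    private int active = 0;            // gun 0 is treated as the default gun

    VirtualGunSelector(int gunCount, double switchMargin) {
        this.hits = new int[gunCount];
        this.shots = new int[gunCount];
        this.switchMargin = switchMargin;
    }

    // Records the outcome of one virtual bullet for the given gun.
    void record(int gun, boolean hit) {
        shots[gun]++;
        if (hit) {
            hits[gun]++;
        }
    }

    double hitRate(int gun) {
        return shots[gun] == 0 ? 0.0 : (double) hits[gun] / shots[gun];
    }

    // Switches only when the best gun beats the active gun by more than the margin.
    int select() {
        int best = active;
        for (int gun = 0; gun < hits.length; gun++) {
            if (hitRate(gun) > hitRate(best)) {
                best = gun;
            }
        }
        if (hitRate(best) > hitRate(active) + switchMargin) {
            active = best;
        }
        return active;
    }
}
```

The margin adds hysteresis, so small and possibly biased differences in virtual hit rate do not cause constant switching.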
Surfing
Surfing is another story, a completely different story. But if I were asked to give Wave Surfing another name, I would simply choose Minimum Risk Movement, redefined. Back to the simple story where an agent (the robot) faces an environment (mainly walls and other bots): which action (mainly movement) do you choose to maximize the objective, e.g. survival? This is not a simple question, but many non-simple questions have simple solutions. Why not simply simulate what happens next, and let the simulation result decide? Many techniques are then used, e.g. Precise Intersection and Bullet Shadow/Correct; all of them bring the simulation closer to what's truly happening, by making it more precise and more exact.
Movement Basis
Most robots call setAhead & setTurn and leave the rest of the movement to Robocode. ScalarR directly controls velocity & turn rate instead, and encapsulates the details in movement drivers (e.g. a goto driver). By doing this, ScalarR can use faster methods to derive velocity & turn rate, making movement prediction easier, faster and more precise. Wall smoothing is also encapsulated in the drivers, so that I can directly control every detail near walls and avoid occasionally losing score.
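To make "directly controlling velocity & turn rate" concrete, the sketch below steps a velocity toward a desired velocity using Robocode's published constants (acceleration 1, deceleration 2, maximum velocity 8 pixels per tick, maximum turn rate 10 - 0.75 * |velocity| degrees per tick). It is a simplified approximation written for illustration, not ScalarR's driver and not Robocode's exact internal update; the names are hypothetical.

```java
// Hypothetical sketch of a velocity-based movement step.
final class VelocityStep {
    static final double ACCELERATION = 1.0;
    static final double DECELERATION = 2.0;
    static final double MAX_VELOCITY = 8.0;

    // Maximum body turn rate in degrees per tick at the given velocity.
    static double maxTurnRateDegrees(double velocity) {
        return 10.0 - 0.75 * Math.abs(velocity);
    }

    // Next velocity when steering toward targetVelocity (both signed, pixels per tick).
    static double nextVelocity(double velocity, double targetVelocity) {
        if (velocity < 0) {
            // Mirror the problem so the rest only handles forward motion.
            return -nextVelocity(-velocity, -targetVelocity);
        }
        double target = Math.max(-MAX_VELOCITY, Math.min(MAX_VELOCITY, targetVelocity));
        if (target >= velocity) {
            return Math.min(target, velocity + ACCELERATION);
        }
        if (velocity >= DECELERATION) {
            return Math.max(target, velocity - DECELERATION);
        }
        // Crossing zero within one tick: brake for part of the tick,
        // then accelerate backwards for the remainder.
        double brakeTime = velocity / DECELERATION;
        double reversed = -(1.0 - brakeTime) * ACCELERATION;
        return Math.max(target, reversed);
    }
}
```

Because the step is a pure function of the current and desired velocity, the same function can drive both the real robot and the movement prediction, which keeps the two consistent.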
Anti-Ram
A robot ramming you can be treated as a bullet moving at 8 pixels per tick (the maximum robot velocity), so the existing danger estimation & wave surfing code can be reused. Simple & effective, done.
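A minimal sketch of this idea: the rammer is modelled as a wave that expands from its current position at 8 pixels per tick, so the usual "time until the wave reaches me" query applies. The `RamWave` class and its fields are hypothetical, and the assumption that such a wave is rebuilt from the rammer's latest scanned position each tick is mine, not stated above.

```java
// Hypothetical sketch: a ramming robot treated as a slow pseudo-bullet wave.
final class RamWave {
    static final double RAM_SPEED = 8.0; // maximum robot velocity, pixels per tick

    final double originX;
    final double originY;
    final long startTime;

    RamWave(double originX, double originY, long startTime) {
        this.originX = originX;
        this.originY = originY;
        this.startTime = startTime;
    }

    // Distance the pseudo-bullet has covered at the given game time.
    double radiusAt(long time) {
        return RAM_SPEED * (time - startTime);
    }

    // Whole ticks until the wave front reaches the point (x, y); 0 if already reached.
    long ticksUntilReached(double x, double y, long time) {
        double distance = Math.hypot(x - originX, y - originY);
        double remaining = distance - radiusAt(time);
        return remaining <= 0 ? 0 : (long) Math.ceil(remaining / RAM_SPEED);
    }
}
```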
True Surfing
Melee Surfing
I can't see any real difference between melee and 1v1 if wave surfing is redefined as a form of minimum risk. So I'm sharing most code between melee and 1v1, with just more paths to evaluate in melee. Things like second-wave surfing can be reconsidered as a fast approximation of searching all possible branches, by assuming movement in a straight line between waves. Actually, you generally don't have enough ticks for more complex movement, so this approximation is just fine. And to handle edge cases, I'm not really distinguishing between first and second waves, but just summing the risks along the path, weighted by time to arrive.
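The "sum of risks along the path, weighted by time to arrive" can be sketched as follows. The weighting function 1 / (1 + ticks) is purely an assumption chosen for illustration (the text above does not say which weighting ScalarR uses), and the names are hypothetical.

```java
// Hypothetical sketch: score candidate paths by time-weighted summed danger.
final class PathRisk {
    // Assumed weighting: dangers that arrive sooner count more.
    static double timeWeight(int ticksToArrive) {
        return 1.0 / (1.0 + ticksToArrive);
    }

    // dangers[i] is the estimated danger of the i-th encounter on the path,
    // ticksToArrive[i] is how many ticks from now it happens.
    static double risk(double[] dangers, int[] ticksToArrive) {
        double total = 0.0;
        for (int i = 0; i < dangers.length; i++) {
            total += dangers[i] * timeWeight(ticksToArrive[i]);
        }
        return total;
    }

    // Index of the path with the lowest risk; no first / second wave special cases.
    static int safestPath(double[][] dangersPerPath, int[][] ticksPerPath) {
        int best = 0;
        for (int p = 1; p < dangersPerPath.length; p++) {
            if (risk(dangersPerPath[p], ticksPerPath[p])
                    < risk(dangersPerPath[best], ticksPerPath[best])) {
                best = p;
            }
        }
        return best;
    }
}
```

With this shape, melee and 1v1 differ only in which paths and encounters are fed in, which matches the idea of sharing most of the code.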
1v1 Surfing
Once you have the melee stuff working, adding 1v1 surfing is merely a matter of changing the paths generated. Then danger estimation gets more diverse, and a lot of details are retuned. But they aren't truly different, right?
Danger Estimation
Instead of inventing and experimenting with novel approaches as in targeting, the approach that worked best is to test against real opponents, say the entire rumble, since danger estimation is essentially fitting the targeting of existing robots in the rumble. Inventing something fancy doesn't help much, but watching a lot of battles and reading a lot of targeting code does work well. And once most problem bots are solved, you get a fairly good result, done.
KNN Stats
There seems to be a tendency to use many trees, from simple to complex, to serve as a kind of "crowd" surfing. However, summing stats together is linear and results in simple stats rather than complex stats, which makes it weak against Anti-Surfer guns & Pattern Matchers. I'm using only big trees, simply reusing features & weights from my guns, and it has worked pretty well against top bots even without a flattener.
Bandwidth
Bandwidth is almost as important as KNN stats, as it serves as the "air" of danger estimation, and the actual surfing end-points are determined by the "air", not the actual danger points. Bandwidth models two things together: the probability itself, and a hint about where to go when dangers are far apart (the "air"). And the model of the probability itself depends on whether the opponent is using simple guns, learning guns, guess factors or not, etc.
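To show how bandwidth shapes the "air" between danger points, here is a minimal kernel density sketch. A Gaussian kernel is assumed only for illustration (the text above does not say which kernel ScalarR uses), and the names are hypothetical.

```java
// Hypothetical sketch: danger at a guess factor as a weighted kernel sum
// over past danger points (e.g. the neighbours returned by a KNN search).
final class KernelDanger {
    // pastGfs[i] is a past danger point, weights[i] its weight,
    // bandwidth the kernel width in guess-factor units.
    static double danger(double gf, double[] pastGfs, double[] weights, double bandwidth) {
        double sum = 0.0;
        double totalWeight = 0.0;
        for (int i = 0; i < pastGfs.length; i++) {
            double u = (gf - pastGfs[i]) / bandwidth;
            sum += weights[i] * Math.exp(-0.5 * u * u); // assumed Gaussian kernel
            totalWeight += weights[i];
        }
        return totalWeight == 0.0 ? 0.0 : sum / totalWeight;
    }
}
```

With danger points at -0.5 and 0.5, a narrow bandwidth such as 0.1 makes guess factor 0 look almost perfectly safe, while a wide one such as 0.5 leaves substantial danger there, so the chosen end-point depends on the bandwidth as much as on the points themselves.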
Flattening
Flattening can be considered a special form of danger estimation, for when the opponent is either firing with only recent waves or trying to spot repeated patterns. Firing with only recent waves already doesn't work well against surfers with complex danger estimation (and bullet shadows), and when the opponent is trying to spot weaknesses, it's merely a race over who can spot weaknesses better. At some point, every targeting method performs either the same as random or worse than random, then done.
Bullet Shadow
Implementing bullet shadow correctly is important. Getting every detail correct is hard, but worth it. Getting bullet shadow to work with Precise Intersection is even harder, because you need to define wave danger as a pdf (probability density function) for unbiased shadowing. But it does pay off in APS when all of the above are done right.
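One way to picture "wave danger as a pdf" is sketched below: the danger of a position is the integral of the density over the guess-factor interval the bot would cover (from precise intersection), skipping the parts covered by bullet shadows. This is a simple midpoint-rule illustration under my own assumptions, not ScalarR's method; the `Density` interface and all names are hypothetical.

```java
// Hypothetical sketch: integrate a danger density over the bot's
// precise-intersection interval, excluding shadowed guess-factor ranges.
final class ShadowedDanger {
    // A probability density over guess factors.
    interface Density {
        double at(double gf);
    }

    // Each shadow is {lowGf, highGf}; a bullet fired into it cannot exist.
    static boolean isShadowed(double gf, double[][] shadows) {
        for (double[] shadow : shadows) {
            if (gf >= shadow[0] && gf <= shadow[1]) {
                return true;
            }
        }
        return false;
    }

    // Midpoint-rule integral of the density over [lowGf, highGf] minus the shadows.
    static double danger(Density pdf, double lowGf, double highGf,
                         double[][] shadows, int steps) {
        double width = (highGf - lowGf) / steps;
        double sum = 0.0;
        for (int i = 0; i < steps; i++) {
            double gf = lowGf + (i + 0.5) * width;
            if (!isShadowed(gf, shadows)) {
                sum += pdf.at(gf) * width;
            }
        }
        return sum;
    }
}
```

Because the result is a probability mass rather than a peak height, a shadow that covers a quarter of the interval removes a quarter of a uniform danger, which is the kind of unbiased behaviour the pdf formulation is meant to give.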
Energy Management
Energy Management is not simply Power Selection. With energy management, you are solving the same problem as in surfing: what's the best action (now bullet power) that maximizes the objective, e.g. survival? This sometimes involves precise simulation, but simulations are much harder here given the random nature of the process. And most observations are biased, and so are most simulations. Some very simple strategy, possibly one line, can give you astonishing results, but coming up with it may require days of watching battles, and even more days of failed attempts. And past tuning of energy management gets outdated soon after changing surfing & targeting. So this is generally the last thing I do. And keeping it as simple as possible until the very end does save you a lot of time.