Poisoning Enemy Learning Systems
I just ran across an interesting article:
That's pretty interesting stuff, and not just in relation to Robocode.
As for Robocode applications, poisoning the enemy's guns with data also carries the risk of not dodging bullets, since the data gathering and the classification are so intertwined. But it's the type of technique you'd only use against high level opponents, like we do with flatteners, so it's already a situation where you're not able to dodge very accurately.
But I wonder... One thing it mentions is that this is possible if you have access to the same data as the enemy. In Robocode, of course we do, technically. But if that were really true, we'd be able to emulate the enemy's gun stats and do perfect curve flattening and never get hit. So I think it's probably closer to true that we don't have access to the same data as the enemy.
Actually, it is possible to emulate opponent guns unless they use some pseudo-random technique. But we don't emulate them perfectly, because there are many different guns from many different opponents, and few bots try to classify and specialize against the specific bot they are battling (i.e. ScannedRobotEvent.getName()). Generalist bots are more fun.
An interesting fact is that this concept has already been in use in RoboRumble for years.
PatternMatching (learning)... Anti-pattern matching (counter-learning)... GuessFactors (learning)... FlatMovement (counter-learning)... Dynamic Clustering... data decay...
In some sense, we are at the bleeding edge of AI advancement. There are very few AI competitions with imperfect information around the world other than Robocode. I can only think of one or two.
Basically, what the article explains is how to use the minimum of 'false' information (in Robocode terms, that means intentionally skewing your movement profile, at the cost of an increased chance of getting hit) in order to maximize the chance that the enemy will incorrectly classify future data (i.e., aim in the wrong place next time).
I agree anti-pattern matching and anti-GF have been effective at dodging their specific type of targeting; however, this is a different concept entirely. This is about intentionally behaving in a certain way so that the enemy will think you will do the same next time, not behaving that way because you know exactly where the enemy will shoot.
I would love to apply this somehow, because I think our learning guns are very susceptible to this. Sure, they all shoot differently at surfers meaning you can't take one gun and dodge another, but when you consider a movement profile with obvious peaks they all tend to shoot in the same way. Our wavesurfing flattens the profile, but all that does is bring us to the edge cases where every gun will shoot differently. If instead we have peaks that are obvious, all of the guns will shoot in the same way, making it possible to better predict their bullets and thus dodge them more reliably.
Hmm. I disagree with all guns shooting the same way at the obvious spikes, because I think there is really a lot of variety in gun configurations, especially once you start getting into Anti-Surfer guns. But most guns, even AS guns, are learning a lot from virtual waves. So the ideal situation is that you're creating really strong spikes in those virtual waves but still dodging the dangerous spots of real waves. That seems pretty difficult, but maybe possible.
On a semi-related note, I definitely think there's lots of room for improvement in surf stats with proactive stuff like this. I've put in a bunch of effort on "light" flatteners to flatten movement even against weaker learning guns, or switching to a different random movement profile each time I'm hit instead of actually dodging past bullet hits. So far I haven't had any success, but I still think there's something to those ideas. It bothers me that I have to get hit to change my movement profile.
I cited the wrong strategies then.
Wave Surfing is a movement strategy which keeps dodging waves the same way until it sees a bullet. The bullet usually means a spike in the opponent profile. Then it exploits the spike by changing the movement.
Screwing up virtual wave data seems like a good idea. Put some virtual-wave spike evaluation in the WS danger function? Minimize (avoid) real-wave spikes and maximize virtual-wave spikes... Some kind of anti-"anti-surfer gun" movement...
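A rough sketch of what that danger function could look like, just to make the idea concrete. Everything here is hypothetical: the class name, the `poisonWeight` constant, and the assumption that we already have per-candidate estimates of real-wave danger and of how much a spot would reinforce a spike in the enemy's virtual-wave data. Getting those estimates is the hard part and is not shown.

```java
/** Hypothetical surfing danger that rewards feeding spikes to the enemy's virtual waves. */
final class PoisonDanger {
    private PoisonDanger() {}

    /**
     * Combined danger for one candidate spot (lower is better).
     *
     * @param realDanger   estimated danger (0..1) from the real bullet wave at this spot
     * @param virtualSpike assumed estimate (0..1) of how strongly being here reinforces
     *                     a spike in the enemy's virtual-wave profile
     * @param poisonWeight assumed tuning constant; 0 gives ordinary surfing
     */
    static double danger(double realDanger, double virtualSpike, double poisonWeight) {
        // Avoid real-wave spikes, but subtract a bonus for reinforcing virtual-wave spikes.
        return realDanger - poisonWeight * virtualSpike;
    }

    /** Returns the index of the candidate with the lowest combined danger (first wins ties). */
    static int safestIndex(double[] realDanger, double[] virtualSpike, double poisonWeight) {
        int best = 0;
        for (int i = 1; i < realDanger.length; i++) {
            double candidate = danger(realDanger[i], virtualSpike[i], poisonWeight);
            double current = danger(realDanger[best], virtualSpike[best], poisonWeight);
            if (candidate < current) {
                best = i;
            }
        }
        return best;
    }
}
```

With `poisonWeight` at 0 this degenerates to plain surfing, so the weight would presumably have to stay small enough that the bonus never outweighs a genuinely dangerous spot.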
I think Skilgannon is right that this concept is different than current Robocode strategies, which are not proactive about painting an incorrect picture to the enemy. The only thing I can think of along these lines is oldwiki:AntiMirrorSystem, but even that is not really poisoning enemy data, just an exploit for a specific movement strategy.
Also, from what little I remember of SVMs, I think they may be much more susceptible to poisoning than KNN. They try to find the best plane to split the data space, so this method probably tries to plant points that alter the plane as much as possible. I think KNN would be much more resilient to this type of manipulation.
Hmm, interesting.
Regarding, "proactive about painting an incorrect picture to the enemy", two things come to mind that fit that category:
- Chase bullets (but that's taking advantage of the 'bookkeeping' not the learning)
- Robots which 'intentionally' look different to tick waves than bullet waves
A pure flattener also kind of fits this description in my mind, because it's all about going somewhere that it won't go in the future, it just looks at it from the opposite direction in time. It's just that it can't do a hugely accurate job of it due to the variety of configurations.
I think that one of the advantages of the usual "statistical" methods is that they are more difficult to poison, because they don't "speculate" beyond what they've directly observed. In general, I suspect that the same types of things that make a learning method robust to data from both multi-mode bots and random bots also tend to make it more resistant to poisoning.
Maybe a flattener which instead of going where it was hit the least (most bots), goes where it was hit the most, but still below a certain threshold. Artificially create a spike in the opponent profile, then avoid it when it goes over the threshold. The data is poisoned until another higher spike appears, which might take a while without data decay. If the bot starts flattening the profile afterwards, it might take a looong time for another spike to form.
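As a minimal sketch of that selection rule, assuming the bot keeps a simple array of hit counts per guess-factor bin (the `SpikeBaiter` name, the bin layout, and the threshold value are all made up for illustration):

```java
/** Hypothetical "inverse flattener": reinforce the biggest spike that is still under a threshold. */
final class SpikeBaiter {
    private SpikeBaiter() {}

    /**
     * Picks the bin to move toward.
     *
     * @param hitsPerBin number of times we were hit in each guess-factor bin
     * @param threshold  assumed cutoff; bins with this many hits or more are avoided
     * @return the most-hit bin still below the threshold, or -1 if every bin is at or above it
     */
    static int pickBin(int[] hitsPerBin, int threshold) {
        int best = -1;
        for (int i = 0; i < hitsPerBin.length; i++) {
            boolean underThreshold = hitsPerBin[i] < threshold;
            if (underThreshold && (best == -1 || hitsPerBin[i] > hitsPerBin[best])) {
                best = i;
            }
        }
        return best;
    }
}
```

For example, with hit counts `{0, 3, 5, 1}` and a threshold of 4, this picks bin 1: it keeps feeding the 3-hit spike while avoiding the bin that has already gone over the threshold. A return of -1 would be the cue to fall back to normal flattening.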
It can work in theory, but the threshold is different for each opponent. And data decay can quickly get rid of poisoned data.
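To illustrate the data decay point, here is a toy profile where every new observation scales down all older weights. This is only an assumed model of how an enemy gun might decay its stats (the `DecayingProfile` class and the `retention` parameter are hypothetical); real guns vary a lot.

```java
/** Toy model of an enemy gun's visit profile with exponential data decay. */
final class DecayingProfile {
    private final double[] bins;
    private final double retention; // fraction of old weight kept per observation; 1.0 = no decay

    DecayingProfile(int binCount, double retention) {
        this.bins = new double[binCount];
        this.retention = retention;
    }

    /** Records a visit: old data fades by the retention factor, the visited bin gains weight 1. */
    void observe(int bin) {
        for (int i = 0; i < bins.length; i++) {
            bins[i] *= retention;
        }
        bins[bin] += 1.0;
    }

    /** Returns the bin the gun would currently aim at (highest weight, first wins ties). */
    int peak() {
        int best = 0;
        for (int i = 1; i < bins.length; i++) {
            if (bins[i] > bins[best]) {
                best = i;
            }
        }
        return best;
    }
}
```

With `retention` at 0.5, five "poisoned" visits to bin 0 followed by a single visit to bin 2 already moves the peak to bin 2 (about 0.97 against 1.0). With `retention` at 1.0 (no decay), the peak stays at bin 0 (5.0 against 1.0), which matches the point that the poison only lasts against guns that decay slowly or not at all.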