fading weights
(Maybe I should see how this fares before babbling about it, but anyway...)
How this works is: for each attribute, I have an initial weight, a final weight, and a final time (say, 20,000 ticks). The weight shifts linearly from initial to final value until final time, when it stays at final value. Tuning against my 500-bot, 4 season general rumble test bed, I increased my hit percentage from 23.21% to 23.90%. Not sure how much I could've gotten just re-tuning without the shifting, but testing just the initial weights or just the final weights came in around 23.1%. And the previous weights were genetically tuned in a similar fashion, so I think they were close to optimal.
How this came about is I was looking into a custom gun just for the first round of a battle, the way I've been using RetroGirl/Gun for the first few shots. I tried just tuning a KNN gun on 1-round battle data, and the result quickly outperformed my current settings, and the weights looked a lot different. After starting to implement it in Diamond and seeing promising (if incomplete) results, I thought of the more general solution of gradually shifting the weights. This too outperformed my current settings after very few generations. One thing that stands out is that my gun heat dimension starts at a very low weight and ends with a pretty high weight late in the match. This seems to make intuitive sense: as I have more data, it's better to favor firing waves more.
Now to cross my fingers that it produces in the rumble. =)
Neat concept! Now this has got me thinking about trying to associate KNN weightings with all sorts of things besides time... distance maybe?
What do you use for your test bed? RoboResearch?
I've been wanting to systematize my testing more so I can shake things out more thoroughly in an automated way before I throw it up on the Rumble server. I've got RoboResearch ready to go.. I just need to assemble a set of bots to test against.
Yeah, I use RoboResearch for real testing. For some gun-only testing (like this) I use WaveSim, which is a tool I wrote to test just classification against raw battle data - so it doesn't work for testing bullet power or some other nitty gritty things, but for the things it does work for, it's pretty sweet (and fast).
For test beds, I use User:Voidious/BedMaker, which is a little script I wrote to select random bots from the rumble within certain parameters. But you'd need to get a rumble server API key from Darkcanuck first. I've been thinking maybe I should make a web-based version of that for others to use freely, but I wasn't sure if anyone was interested...
Currently hunting down which of a bunch of changes between 1.7.29 and 1.7.35 caused a decrease in performance. I swear I tested rolling back each one individually yesterday and none of them fixed my score. So now the other way: start with 1.7.29 and add each change individually. Once that's done I can see if this fading KNN stuff actually helps.
I've been pretty fearless with continuous refactoring and bug fixes throughout 1.7.x, and this is the price I pay for it. =) But overall I think it's been worth it, both in terms of code quality and performance.
Orrrrr I've been chasing ghosts... In my 2,000 battle benchmark (250 bots x 8 seasons), 1.7.29 came in 0.18 above 1.7.37 and a dev version of 1.7.35 got the exact same score as 1.7.37. Rolling back individual changes gave me anywhere from 0.14 to 0.22 below 1.7.29's score. Then I recompiled 1.7.29, checked the class files came out the same (ie, I had the right source), and reran 10 seasons... and came in 0.14 below 1.7.29.
Another good reason to focus on big improvements: anything else is too small / painful to reliably benchmark. =)