Test bed with stable results
Hi, all. Which test beds do you use to test your robots? I can't find a test bed that gives stable results (+/- 0.05 APS). Right now I use every 5th bot from the RoboRumble with 5 seasons against each, and the results vary within +/- 0.3 APS.
I run 30 seasons of 35 rounds against 40 bots (taking approx. 18 hours). No idea if it is stable though; it is my first testbed. I do know my testbed does not reflect the rumble correctly, because 0.3.2 and 0.3.5 score on par, while 0.3.7 scores approx. 1.5 APS lower.
GrubbmGait, you're a hero :) For me, 4 hours per test is the limit, and I want a test that gives me results in 2 hours. Maybe you could try Distributed_Robocode - I can share my home netbook (i3 1.3 GHz x 2) with you, and this week I plan to set up an old Duron 1.6 as a dedicated Robocode server. So I guess your test would take at most 6 hours (though that strongly depends on which robots are in your test bed).
One quick thought: theoretically, it should be possible to use PCA to find the most significant axes of the RoboRumble and rank robots by how well they correlate with each axis. You would also rank robots by their standard deviation, then pick robots that simultaneously have a low standard deviation and a high correlation with the axes the PCA determined. Finally, you could use linear regression to determine the weight to give each of the selected robots.
That, I think, would probably be a good way to find a testbed that both represents the rumble well and has low noise... Hmm... maybe I should make a patch to Voidious' testbed maker that uses the algorithm described above...
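To make the idea concrete, here is a rough numpy sketch of that selection scheme. The names (`pick_testbed`, `scores`, `overall_aps`, `noise`, `n_pick`) are made up for illustration, and you would still have to pull a bot-vs-opponent score matrix and per-opponent noise estimates out of real rumble data:

<pre>
import numpy as np

def pick_testbed(scores, overall_aps, noise, n_components=3, n_pick=10):
    """scores: (n_bots, n_opponents) APS matrix from the rumble,
       overall_aps: (n_bots,) full-rumble APS of those bots,
       noise: (n_opponents,) battle-to-battle std-dev of each opponent."""
    # PCA via SVD on the centered score matrix.
    centered = scores - scores.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    axes = vt[:n_components]            # principal axes over the opponents

    # Loading: how strongly each opponent contributes to the main axes.
    loading = np.abs(axes).max(axis=0)

    # Favour opponents with high loading and low battle-to-battle noise.
    rank = loading / (noise + 1e-9)
    chosen = np.argsort(rank)[::-1][:n_pick]

    # Linear regression: weight the chosen opponents so their scores
    # best predict the full-rumble APS.
    X = np.column_stack([scores[:, chosen], np.ones(len(scores))])
    weights, *_ = np.linalg.lstsq(X, overall_aps, rcond=None)
    return chosen, weights
</pre>

The regression step is what would let a small, quiet testbed still track the rumble: instead of a plain average over the chosen opponents, each one gets a weight fitted against known full-rumble APS values.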
I found a good test bed - 7 seasons against every second bot from the RoboRumble (~2900 battles). It produces repeatable results within +/- 0.05 APS, and its error against the real RR is within +/- 0.1 APS.
I'd think 1 battle against each bot from the rumble would probably give better results... maybe not as reproducible, but more closely correlated with the rumble.
No Skilgannon, even 3 battles against every bot is worse in terms of both stability and correlation with the RR.
I totally agree with using very large test beds (lately ~100 bots for me), but I think you're wasting some CPU cycles testing against bots that are unlikely to be affected by your changes. Bots you get 98+ against probably won't be affected unless you're testing changes to your core surfing algorithm or something. Most changes to surf stats are not going to affect HOT bots at all, and flattener tweaks won't affect any bots that have no chance of triggering your flattener.
My last release showed that you never know what will be affected :) So it's better if a test takes 7 hours instead of 6.5, but gives confidence in the results :)