Test bed with stable results

Jump to navigation Jump to search
Revision as of 27 September 2011 at 03:59.
The highlighted comment was created in this revision.

Test bed with stable results

Hi to all. Who which test beds use to test robots? I cannot find out test bed which will give stable results (+/- 0.05 APS). Now i use every 5th bot roborumble with 5 season vs each and results is in interval -/+ 0.3 APS.

    Jdev05:43, 20 September 2011

    I speak about results stable in reasonable time and meaningful in APS terms

      Jdev06:05, 20 September 2011
       

      Well, I suspect it's total number of battles that matters most. A recent version of Diamond dropped 0.12 APS after 2000 battles in the rumble, but perhaps that was a fluke due to someone's client skipping turns. My APS test beds are 100 bots and I run 5 seasons (500 battles) to get an idea, 10 seasons I consider accurate enough even for pretty minor tweaks, or 20 seasons if I want to feel very confident. I'm not sure what the statistical accuracy is, but +/- 0.05 for 20 seasons (2000 battles) would be about my guess. That's pretty much my experience with the accuracy of RoboRumble results after 2k battles, too.

      I've been thinking about adding to RoboResearch the feature to calculate the confidence interval of the overall result, assuming normal distributions. It should be pretty easy.

        Voidious13:32, 20 September 2011
         

        I run 30 seasons of 35 rounds against 40 bots (taking approx 18 hours). No idea if it is stable though, it is my first testbed. I do know my testbed does not reflect the rumble correctly, because 0.3.2 and 0.3.5 score on par, and 0.3.7 approx 1.5 APS lower.

          GrubbmGait17:16, 20 September 2011
           

          GrubbmGait, you're a hero:) for me 4 hours for test it is limit and i want to get test which give me results in 2 hours. And may be you will try to use Distributed_Robocode - now i can share with you my home netbook (i3 1.3 x 2) and on this week i plan to setup old duron 1.6 as dedicated robocode server. So, i guess, your's test will take at maximum 6 hours (but it's strong depends from which robots is in yours test bed)

            Jdev18:35, 20 September 2011
             

            One quick little thought, is theoretically, it should be possible to use PCA to come up with the most significant axes of the roborumble, and rank robots by how well they correlate with each axis. Then, you also rank robots by their standard deviation. You then pick robots which simultaneously have a low standard deviation, and highest correlation with the axes that the PCA determined. Then you can use some linear regression to determine the weight to give each of the robots selected.

            That I think, would probably be a good way to find a testbed which simultaneously represents the rumble well and has low noise... Hmm... maybe I should make a patch to Voidious' testbed maker that uses the algorithm I describe in the paragraph above...

              Rednaxela19:33, 20 September 2011
               

              I found out good test bed - 7 seasons against every second bot from roborumble (~2900 battles). It produces repeating results within +/- 0.05 APS and error against real RR within +/- 0.1 APS

                Jdev04:59, 27 September 2011