How to build a good test bed?

I think your problem is that you are already in the top 10 :) while testing against bots that are relatively simple by modern standards. You probably already score above 90% against these bots. If I were you, I would choose a test bed from the top 30 or even the top 10. But in the end the only real test is the rumble; maybe there is a group of bots against which you underperform and none of them are in the test bed.

Otherwise I do something similar, but my bot is not ranked that high, so my test bed shows relevant scores, though sometimes it is somewhat off. I have also noticed that the score in the rumble always slides down until it settles. I am not sure why; maybe some bots that save stats keep improving with each round for a while.

But lately I have noticed that in the melee rumble the slide is somewhat catastrophic. When I introduced EvBot v9.2, it was in the top 20 for the first 300 pairings or so, and then it plunged about 20 more places. I have seen this with several recent releases and still cannot understand why.

Beaming (talk)‎

IIRC, in past versions the improvement over the previous version was a fairly good indicator of the final result, e.g. a 0.5 increase in APS over common pairings (e.g. 300 common opponents) indicated a 0.5 increase in final APS.

IMO the APS before full pairing is meaningless, but the difference in common APS is useful.

However, this version breaks the previous pattern: the difference in common APS is no longer an indicator, and neither is the full-pairing APS.

The reason I test against relatively “weak” bots is that the majority of the rumble is there, and so is what affects your score the most. More than half of the bots in the rumble have an APS in [40, 70); there are only 160 bots above 70 and 324 bots below 40. Bots below APS 40 can be ignored IMO, as the improvement against them can only be marginal.
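The band split above can be checked against any ranking export with a few lines. This is only a rough sketch: the function name `aps_band_counts` and the sample values are made up for illustration, and the 40 and 70 thresholds are simply the ones quoted in this post.

```python
def aps_band_counts(aps_values, low=40.0, high=70.0):
    """Count bots below `low`, in [low, high), and at or above `high`."""
    below = sum(1 for aps in aps_values if aps < low)
    middle = sum(1 for aps in aps_values if low <= aps < high)
    above = sum(1 for aps in aps_values if aps >= high)
    return below, middle, above


# Made-up APS values, not real rumble data.
sample = [35.0, 40.0, 55.5, 69.9, 70.0, 82.0]
print(aps_band_counts(sample))  # (1, 3, 2)
```

A test bed that mirrors these proportions would take most of its bots from the middle band.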

Xor (talk)‎

Until the pairing is complete, APS is not a good indicator. I always go to the details of my bot and select an older version to compare with. In that case only the bots that both versions have fought are taken into account. It does seem that the last 10% of the pairings involve the best opponents: GrubbmThree held around 58 APS until approx. 1000 pairings, then fell to 57.2. Note that even with 3000-5000 battles, there are still a lot of bots you have fought only once, so a few bad battles do have an influence.
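For anyone who wants to do the same comparison offline, here is a minimal sketch of the common-pairing idea. It assumes you have somehow collected per-opponent scores for each version into plain dicts; the function name `common_aps_diff` and the bot names are hypothetical, and this is not how the rumble server itself computes anything.

```python
from statistics import mean


def common_aps_diff(old_scores, new_scores):
    """Compare two versions using only opponents both have fought.

    Each argument maps opponent name -> average percentage score.
    Returns (number of common opponents, old APS, new APS, difference).
    """
    common = sorted(set(old_scores) & set(new_scores))
    if not common:
        raise ValueError("the two versions share no opponents yet")
    old_aps = mean(old_scores[name] for name in common)
    new_aps = mean(new_scores[name] for name in common)
    return len(common), old_aps, new_aps, new_aps - old_aps


# Hypothetical scores; BotC and BotD are ignored because only one version met them.
old = {"BotA": 60.0, "BotB": 50.0, "BotC": 40.0}
new = {"BotA": 62.0, "BotB": 51.0, "BotD": 30.0}
print(common_aps_diff(old, new))  # (2, 55.0, 56.5, 1.5)
```

Keeping the count of common opponents next to the difference makes it easier to judge how much to trust the number while pairings are still coming in.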

As for the test bed, I used to have around 20 bots in mine (50 seasons): 5 top-50 bots, 5 'white whales', 5 ranked between 100 and 300, and a few specific ones to check whether something was broken (e.g. bbo.RamboT must score less than 0.5%).
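The rank-band part of such a test bed can be drawn automatically. The sketch below is only an illustration under assumptions: `pick_test_bed`, the band tuples, and the placeholder bot names are invented here, and the 'white whales' and sanity-check bots would still be added by hand since they depend on your own results.

```python
import random


def pick_test_bed(ranking, bands, seed=0):
    """Pick bots at random from rank bands.

    ranking: bot names in rank order, best first.
    bands: list of (first_rank, last_rank, count), 1-based and inclusive.
    """
    rng = random.Random(seed)  # fixed seed keeps the test bed reproducible
    chosen = []
    for first, last, count in bands:
        # Skip bots already chosen so overlapping bands never duplicate a pick.
        pool = [bot for bot in ranking[first - 1:last] if bot not in chosen]
        chosen.extend(rng.sample(pool, min(count, len(pool))))
    return chosen


# Placeholder ranking of 400 bots; replace with real names in rank order.
ranking = [f"bot{i}" for i in range(1, 401)]
test_bed = pick_test_bed(ranking, [(1, 50, 5), (100, 300, 5)])
print(test_bed)
```

Changing the seed gives a different draw from the same bands, which is one way to check whether a result depends on the particular bots picked.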

GrubbmGait (talk)‎

Thanks for figuring that out! I thought the rumble stabilized very fast (with the common APS diff over ~300 pairings already being useful), but it turned out not to.

Maybe I should build a test bed with more variety, e.g. bots from all over the rumble with different kinds of strategies.

Xor (talk)‎
