how to build a good test bed?

Revision as of 27 September 2017 at 03:36.

Recently I spent a lot of time tuning the movement, and the result looked promising: it performs very well in my test bed (which consists of bots I have performed badly against in the past, including some guess factor targeting bots, DC bots, and a simple targeter with VG). However, the rumble result shows a huge performance regression ;/

Then I tried another version. When published to the rumble, it showed a huge performance increase (and a small increase at full pairing), but after ~5000 battles its performance had actually dropped below that of the baseline version.

My test bed runs 35-round battles for 30 seasons against 10 bots, 300 battles in total, but its results do not correlate with the rumble score. Is it the bots I chose that make it a bad test bed, or do I simply have too few battles?
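One way to approach the "too few battles" question is to estimate how noisy the test-bed average is. The sketch below is not from the thread: it assumes you record one average score per season for each version, and the function names and the threshold `z=2.0` are illustrative choices. It compares the difference between two versions with the combined standard error of their means.

```python
import math
import statistics


def mean_and_stderr(scores):
    """Return (mean, standard error of the mean) for per-season scores."""
    if len(scores) < 2:
        raise ValueError("need at least two seasons to estimate the spread")
    mean = statistics.fmean(scores)
    stderr = statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, stderr


def compare_versions(scores_old, scores_new, z=2.0):
    """Return (difference, standard error of the difference, significant?).

    The difference counts as significant only if it exceeds z combined
    standard errors; z=2.0 is an assumed rule of thumb, not a rumble rule.
    """
    mean_old, se_old = mean_and_stderr(scores_old)
    mean_new, se_new = mean_and_stderr(scores_new)
    diff = mean_new - mean_old
    se_diff = math.sqrt(se_old ** 2 + se_new ** 2)
    return diff, se_diff, abs(diff) > z * se_diff


if __name__ == "__main__":
    # Made-up per-season averages, for illustration only.
    old = [70.0, 72.0, 71.0, 69.0]
    new = [70.5, 71.5, 70.0, 72.0]
    print(compare_versions(old, new))
```

If the difference is smaller than about two standard errors, more seasons are probably needed before the test bed can separate the two versions.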

The bots I use in my test bed are FloodHT, SandboxDT, RaikoMicro (GF targeting bots), Tron, Aleph (DC bots), Che, Fermet, WeeklongObsession (pattern matchers), and GrubbmGrb (“simple” targeting).

Again, it seems that even after ~3000 battles, the rumble score is still not reliable enough to be used to compare two versions.

So here are my questions: How do you evaluate your bot? How many bots are in your test bed, and how many battles do you run against each of them?

    Xor (talk) 02:03, 27 September 2017

    I think your problem is that you are already in the top 10 :) while you are testing against relatively simple bots (by modern standards). You probably already score above 90% against these bots. If I were you, I would choose a test bed from the top 30 or even the top 10. But after all, the only real test is the rumble; maybe there is a bunch of bots against which you are underperforming and none of them are in the test bed.

    Otherwise I do something similar, but my bot is not ranked that high, so my test bed shows relevant scores, though sometimes it is somewhat off. I have also noticed that the rumble score always slides down until it settles. I am not sure why; maybe some bots that save stats keep improving with each round for a while.

    But lately I have noticed that in the melee rumble the slide is somewhat catastrophic. When I introduced EvBot v9.2, it was in the top 20 for the first 300 pairings or so, and then plunged about 20 more places. I have seen this with several recent releases and still cannot understand why.

      Beaming (talk) 03:35, 27 September 2017

      IIRC, for past versions, the improvement over the previous version was a fairly good indicator of the final result, e.g. a 0.5 increase in APS over common pairings (e.g. 300 common opponents) indicated a 0.5 increase in final APS.

      IMO the APS before full pairing is meaningless, but the difference in common-pairing APS is useful.
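The common-pairing comparison can be sketched as follows. This is an illustration under assumptions, not the rumble's own code: it assumes each version's results are available as a mapping from opponent name to average percentage score, and `common_aps_delta` is a hypothetical helper name.

```python
def common_aps_delta(old_scores, new_scores):
    """Compare two versions using only opponents that both have faced.

    old_scores / new_scores: dict mapping opponent name -> average
    percentage score against that opponent.
    Returns (new APS - old APS over common opponents, number of common opponents).
    """
    common = old_scores.keys() & new_scores.keys()
    if not common:
        raise ValueError("the two versions share no opponents yet")
    old_aps = sum(old_scores[name] for name in common) / len(common)
    new_aps = sum(new_scores[name] for name in common) / len(common)
    return new_aps - old_aps, len(common)


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    old = {"BotA": 60.0, "BotB": 70.0, "BotC": 80.0}
    new = {"BotA": 62.0, "BotB": 71.0}
    print(common_aps_delta(old, new))
```

Restricting the average to common opponents avoids comparing a partially paired version against a fully paired one, which is why the raw APS before full pairing says little.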

      However, this version breaks the previous pattern. The difference in common-pairing APS is no longer an indicator, nor is the full-pairing APS.

      The reason I test against relatively “weak” bots is that the majority of the rumble is there, and so is what affects your score the most. More than half of the bots in the rumble have an APS in [40, 70); there are only 160 bots above 70 and 324 bots below 40. Bots below APS 40 can be ignored IMO, as the improvement against them can only be marginal.
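To check how the rumble population splits across these APS bands, one could bucket a rankings export as in the sketch below. It assumes a mapping from bot name to that bot's rumble APS; the band edges 40 and 70 come from the comment above, while the function names and band labels are illustrative.

```python
from collections import Counter


def band_of(aps, low=40.0, high=70.0):
    """Classify an APS value as 'below' (< low), 'middle' ([low, high)) or 'above'."""
    if aps < low:
        return "below"
    if aps < high:
        return "middle"
    return "above"


def band_shares(aps_by_bot, low=40.0, high=70.0):
    """Return the fraction of bots falling in each APS band."""
    if not aps_by_bot:
        raise ValueError("no bots given")
    counts = Counter(band_of(aps, low, high) for aps in aps_by_bot.values())
    total = sum(counts.values())
    return {band: counts[band] / total for band in ("below", "middle", "above")}


if __name__ == "__main__":
    # Made-up APS values, for illustration only.
    sample = {"a": 35.0, "b": 40.0, "c": 55.0, "d": 69.9, "e": 70.0}
    print(band_shares(sample))
```

The resulting shares could guide how many test-bed slots to give each band, so that the test bed roughly mirrors the part of the rumble that drives the score.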

        Xor (talk) 04:34, 27 September 2017