how to build a good test bed?
The highlighted comment was created in this revision.
Recently I tried a lot to tune the movement, and the result is promising — it performs very well in my test bed (which consisits of some bots I’m performing bad in the past, including some guess factor targeting bots, dc bots and a simple targeter with VG. However, the rumble result shows a huge performance regression ;/
Then, I tried another one, when published to the rumble, it shows huge performance increase (and a little increase when full pairing) — but after ~5000 battles, the performance is even decreased, comparing to the baseline version.
My test bed is running at 35 battles for 30 seasons with 10 bots — 300 battles in total, but it shows irrelevant with rumble score. Is that the bots I choose make it a bad test bed, or just because I have too little battles?
The bots I use in my test bed are FloodHT, SandboxDT, RaikoMicro (gf targeting bots), Tron, Aleph (dc bots), Che, Fermet, WeeklongObsession (pattern matchers), GrubbmGrb (“simple” targeting)
Again, it seems that even after ~3000 battles, the rumble score is still not reliable enough to be used to compare two versions.
Then come my questions: How do you evaluate your bot? How much bots are there in your test bed and how many battles do you run for each of them?
I think your problem that you are already in top 10 :) while you are testing against relatively simple bots (by modern standards). You probably already have score pushing above 90% for this bots. If I were you I would chose test bed from the top 30 or even top 10. But after all the only real test is the rumble, may be there is a bunch of bots against which you are under performing and none of them are in the test bed.
Otherwise I do something similar but my bot is not that high, so my test bad shows relevant scores. Though sometimes it is somewhat off. I also notice that the score in rumble always slide down until it settles. I am not sure why, may be some bot which save stats keep improving with each round for a while.
But lately I notice that in melee rumble slide down is somewhat catastrophic. When I introduced EvBot v9.2 it was in the top 20 for the first 300 pairing or so, and then just plunge about extra 20 places down. I see it with several latest releases and still cannot understand why.