Thread:Talk:ScalarBot/Version History/how to build a good test bed?

From Robowiki

Latest revision as of 04:11, 27 September 2017

Recently I spent a lot of effort tuning the movement, and the result looked promising: it performs very well in my test bed, which consists of bots I performed badly against in the past, including some guess factor targeting bots, DC bots and a simple targeter with VG. However, the rumble results show a huge performance regression ;/

Then I tried another version. When I published it to the rumble, it showed a huge performance increase (and a small increase once it reached full pairing), but after ~5000 battles its score had actually dropped below the baseline version.

My test bed runs 30 seasons against 10 bots with 35 rounds per battle, which is 300 battles in total, but its results don't correlate with the rumble score. Is it my choice of bots that makes it a bad test bed, or am I simply running too few battles?
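One rough way to check whether 300 battles are enough is to estimate how noisy the average score is. The sketch below is a minimal example under these assumptions: per-battle scores (for example APS in percent) are independent, and the normal approximation holds. It computes an approximate 95% confidence interval for the mean and the number of battles needed to reach a chosen margin. The function names `mean_ci` and `battles_needed` are hypothetical. Apply the sketch to the battles against one opponent, or to the whole pool if you want a more conservative estimate.

```python
import math
import statistics

def mean_ci(scores, z=1.96):
    """Return (mean, half_width) of an approximate 95% confidence interval
    for the mean per-battle score (normal approximation, needs >= 2 scores)."""
    n = len(scores)
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)  # sample standard deviation (n - 1 denominator)
    half_width = z * sd / math.sqrt(n)
    return mean, half_width

def battles_needed(sd, margin, z=1.96):
    """Battles needed so the CI half-width is at most `margin` score points,
    given an (assumed) per-battle standard deviation `sd`."""
    return math.ceil((z * sd / margin) ** 2)
```

As a hypothetical example, suppose scores against one opponent spread with a standard deviation of about 5 points. Then `battles_needed(5.0, 0.5)` returns 385, so pinning that opponent's mean down to ±0.5 points would take roughly 385 battles. That is far more than the 30 per bot in the setup above.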

The bots I use in my test bed are FloodHT, SandboxDT and RaikoMicro (GF targeting bots), Tron and Aleph (DC bots), Che, Fermet and WeeklongObsession (pattern matchers), and GrubbmGrb (“simple” targeting).

Again, it seems that even after ~3000 battles, the rumble score is still not reliable enough to be used to compare two versions.
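When comparing two versions, a simple check is whether the gap in mean score is large relative to its standard error. The sketch below does this with Welch's unequal-variance formula. It assumes independent per-battle scores and uses a normal approximation. It ignores how the rumble actually weights pairings, and the name `welch_z` is hypothetical.

```python
import math
import statistics

def welch_z(old_scores, new_scores):
    """Difference in mean score (new - old) divided by its standard error,
    using Welch's unequal-variance formula. |z| below about 2 suggests the
    difference is within battle-to-battle noise (roughly 95%, normal approx.)."""
    m_old = statistics.fmean(old_scores)
    m_new = statistics.fmean(new_scores)
    v_old = statistics.variance(old_scores)  # sample variance (n - 1 denominator)
    v_new = statistics.variance(new_scores)
    se = math.sqrt(v_old / len(old_scores) + v_new / len(new_scores))
    return (m_new - m_old) / se
```

A positive result means the new version scored higher. A value like 0.9 would mean the apparent improvement is well within the noise, so neither version can be declared better yet.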

Then come my questions: How do you evaluate your bot? How many bots are in your test bed, and how many battles do you run against each of them?