Difference between revisions of "Thread:Talk:ScalarBot/Version History/how to build a good test bed?"

From Robowiki

Revision as of 03:08, 27 September 2017

Recently I spent a lot of time tuning the movement, and the result looked promising: it performs very well in my test bed (which consists of bots I have performed badly against in the past, including some GuessFactor targeting bots, DC bots, and a simple targeter with VG). However, the rumble result shows a huge performance regression ;/

Then I tried another version. When published to the rumble, it showed a huge performance increase at first (and a small increase at full pairing), but after ~5000 battles its performance had actually decreased compared to the baseline version.

My test bed runs 35-round battles for 30 seasons against 10 bots, 300 battles in total, but its results do not correlate with the rumble score. Is it that the bots I chose make it a bad test bed, or is it just that I run too few battles?
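
One rough way to look at the "too few battles" half of that question is to compare the score difference between two versions with its standard error. Below is a minimal sketch in Python, assuming the per-battle percent scores of each version are available as plain lists and can be treated as independent samples. The function names and the numbers in the note afterwards are made up for illustration; nothing here comes from the rumble or from any Robocode tool.

```python
import math
import statistics


def mean_and_stderr(scores):
    """Return (mean, standard error of the mean) for per-battle scores."""
    mean = statistics.fmean(scores)
    stderr = statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, stderr


def compare_versions(scores_a, scores_b, z=1.96):
    """Return (difference, its standard error, whether it exceeds z standard errors).

    Assumes both lists hold independent per-battle scores against the same
    opponent mix (an assumption, not something the test bed guarantees).
    """
    mean_a, se_a = mean_and_stderr(scores_a)
    mean_b, se_b = mean_and_stderr(scores_b)
    diff = mean_b - mean_a
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return diff, se_diff, abs(diff) > z * se_diff


def battles_needed(stdev, target_diff, z=1.96):
    """Rough battles per version needed to resolve target_diff.

    Derived from se_diff = stdev * sqrt(2 / n) and requiring
    z * se_diff <= target_diff.
    """
    return math.ceil(2 * (z * stdev / target_diff) ** 2)
```

With a hypothetical per-battle standard deviation of 10 score points, resolving a 1-point difference would take roughly `battles_needed(10, 1)` = 769 battles per version, well above 300. This only addresses noise; it says nothing about whether the 10 chosen bots are representative of the rumble population.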

Again, it seems that even after ~3000 battles, the rumble score is still not reliable enough to be used to compare two versions.

So here are my questions: How do you evaluate your bot? How many bots are there in your test bed, and how many battles do you run against each of them?