smart battles
So I'm planning to implement smart battle selection this weekend. Every bot (or bot set) will get at least two battles; after that I'll choose which battles to run (in batches, since I don't want idle threads) based on which ones reduce the standard error fastest, maybe with some random battles sprinkled in as well.
I'm thinking I will choose bots with the highest value for: <math>\frac{\frac{stDev}{\sqrt{numBattles}} - \frac{stDev}{\sqrt{numBattles + 1}}}{avgBattleTime}</math>
I think this will lead to an overall result with the highest confidence in the least amount of time.
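Roughly, in Java, something like the sketch below (just a sketch; the BotStats class, its fields, and the selection helper are hypothetical, not RoboRunner's actual data model):

<syntaxhighlight lang="java">
import java.util.List;

// Sketch of the battle-selection priority described above.
class BotStats {
  String name;
  double stDev;          // standard deviation of scores against this bot
  int numBattles;        // battles run so far (assume at least 2)
  double avgBattleTime;  // average wall-clock time per battle, in seconds

  // Expected reduction in standard error from one more battle,
  // divided by the time that battle is expected to cost.
  double priority() {
    double currentError = stDev / Math.sqrt(numBattles);
    double nextError = stDev / Math.sqrt(numBattles + 1);
    return (currentError - nextError) / avgBattleTime;
  }
}

class BattleSelector {
  // Pick the bot whose next battle buys the most error reduction per second.
  static BotStats nextBattle(List<BotStats> bots) {
    BotStats best = null;
    for (BotStats bot : bots) {
      if (best == null || bot.priority() > best.priority()) {
        best = bot;
      }
    }
    return best;
  }
}
</syntaxhighlight>

To fill a batch without idle threads, you'd just take the top N priorities instead of the single best, perhaps swapping a couple of slots for random battles.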
I like testing against a test bed with an average score about the same as my RoboRumble APS. The problem with this is that it includes a lot of bots with super low variance (e.g., 99.9% scores), so running lots of battles against them is a waste of time. But ignoring them and using a stronger test bed risks specializing against stronger bots.
That looks like a good metric for stabilizing scores quickly. Now I'm wishing I'd included variance in the LiteRumble scores...
Yeah, do you just store a running tally of average score? I'll need to update RoboRunner to keep scores from every individual battle, too, along with battle times.
Yeah, I do an online mean calculation, so newMean = oldMean*(n/(n+1)) + newScore/(n+1), then n++.
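In Java that update is just something like this (a minimal sketch; the class and names are illustrative):

<syntaxhighlight lang="java">
// Running (online) mean as described above: no per-battle scores stored,
// just the current mean and the battle count.
class RunningMean {
  private double mean = 0.0;
  private int n = 0;

  void add(double newScore) {
    mean = mean * ((double) n / (n + 1)) + newScore / (n + 1);
    n++;
  }

  double mean() { return mean; }
  int count() { return n; }
}
</syntaxhighlight>

Tracking variance the same way would only need one more running quantity (e.g. Welford's online algorithm keeps a running sum of squared deviations), which is roughly what adding variance to the LiteRumble scores would amount to.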
I've actually thought quite a bit about this, and it all depends on what score you're trying to stabilise. If you're trying to stabilise the PL, for instance, you need to run lots of battles for pairings at or near the 50/50 mark. If you're doing Schultz, then lots of battles need to go to pairings where a weak bot beat a strong bot. It's all about which battle has the most potential influence.
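For illustration, a PL-oriented priority could look something like the sketch below: pairings whose mean score sits close to 50% relative to their standard error are the ones one more battle could still flip (the class and names here are hypothetical, not LiteRumble or RoboRunner code):

<syntaxhighlight lang="java">
// Illustrative only: prioritize pairings where the win/loss outcome (the PL point)
// is still in doubt, i.e. the mean score is within a few standard errors of 50%.
class PairingStats {
  double meanScore;   // average percentage score in this pairing
  double stDev;       // standard deviation of the scores
  int numBattles;

  // Smaller distance from 50 in standard-error units => higher priority.
  double plPriority() {
    double stdErr = stDev / Math.sqrt(numBattles);
    double z = Math.abs(meanScore - 50.0) / Math.max(stdErr, 1e-9);
    return 1.0 / (1.0 + z);  // near 1.0 when dead even, approaching 0 when decided
  }
}
</syntaxhighlight>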