calculating confidence of an APS score
Fragment of a discussion from Talk:RoboRunner
Those ± values: are they the standard error or the standard deviation?
The only thing I can think of testing is whether you are calculating the right number of random battles for each in the Monte Carlo method. If you were only doing one battle for each, the numbers you get would be the same for the standard battles as for the smart battles. The prioritisation looks like it is working well, though: Sparrow and Yngwie both have a low number of battles as well as a low error/stddev.
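To show why the distinction matters, here is a minimal sketch of the two quantities, assuming the ± is derived from the per-battle scores against a single bot. The function name and the scores are made up for illustration and are not taken from RoboRunner.

```python
import math
import statistics


def stddev_and_stderr(scores):
    """Return (sample standard deviation, standard error of the mean)."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two battles to estimate spread")
    sd = statistics.stdev(scores)  # spread of individual battle scores
    se = sd / math.sqrt(n)         # uncertainty of the averaged score
    return sd, se


# Hypothetical per-battle scores against one bot (illustrative numbers only).
scores = [70.0, 74.0, 66.0, 72.0, 68.0, 70.0, 73.0, 67.0, 71.0]
sd, se = stddev_and_stderr(scores)
print(f"mean = {statistics.mean(scores):.2f}, stddev = {sd:.2f}, stderr = {se:.2f}")
```

The stddev stays roughly constant as more battles are run, while the standard error shrinks with the square root of the battle count, so a low ± after only a few battles means something different depending on which one is being reported.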