calculating confidence of an APS score

Jump to navigation Jump to search
Revision as of 14 August 2012 at 12:47.
The highlighted comment was created in this revision.

calculating confidence of an APS score

Hey resident brainiacs - I'm displaying confidence using standard error calculations on a per bot basis in RoboRunner now. What I'm not sure of is how to calculate the confidence of the overall score.

If I had the same number of battles for each bot, then the average of all battles would equal the average of all per bot scores. So I think then I could just calculate the overall average and standard error, ignoring per bot averages, and get the confidence interval of overall score that way. But what I want is the average of the individual bot scores, each of which has a different number of battles.

Something like (average standard error / sqrt(num bots)) makes intuitive sense, but I have no idea if it's right. Or maybe sqrt(average(variance relative to per bot average)) / sqrt(num battles)?

This would also allow me to measure the benefits of the smart battle selection.

    Voidious19:45, 13 August 2012

    I don't actually think this can be correctly modelled by a unimodal distribution - you will be adding thin gaussians to fat gaussians, making horrible bumps which don't like to be approximated by a single gaussian mean+stdev. I almost wonder if some sort of Monte-Carlo solution wouldn't be most accurate in this instance - at least the math would be easy to understand.

      Skilgannon22:11, 13 August 2012
       

      Good call! That was super easy. I don't recall this Monte-Carlo stuff, but the name rings a bell so maybe I learned about it at some point.

      So I calculate 100 random versions of the overall score. For each battle that goes into it, instead of the real score, I generate a random score, assuming a normal distribution using the mean and standard deviation I have for that bot. Then I take the standard deviation of those randomized overall scores and multiply by 1.96 for the confidence interval. Seems like a lot of calculations, but only taking a few hundredths of a second even with 250 bots/3000 battles, so I can afford to do it even when I print the overall score after every battle. Nice!

        Voidious03:03, 14 August 2012

        Now I'm just bummed that it's still +- 0.06 after 3000 battles. :-(

          Voidious03:11, 14 August 2012
           

          I'm curious - did you use the Monte-Carlo method for calculating the non-smart-battles deviations?

          Also, how long did it take to get the 3000 battles compared to the non-smart-battles?

            Skilgannon07:32, 14 August 2012
             

            I'm using the same Monte-Carlo method for confidence either way. I hadn't run too many side by sides yet, but I'll do some more soon. Over night, I ran a test of 25 seasons of TCRM in regular vs smart battles mode on my laptop. They took about the same amount of time, and both ended up showing +- 0.363. But the smart battles came out to 89.32, very close to the 89.31 I got when I ran 100 (non-smart) seasons before, while the normal battles ended at 88.76.

            So I'm a little disappointed it wasn't faster nor showed a better confidence, but it was a lot closer to the true average. And I guess my confidence calculation sucks or something weird happened, since 88.76 is much farther than .363 from the true average. (And yes, my TCRM score has tanked that much since its glory days!)

              Voidious13:47, 14 August 2012