calculating confidence of an APS score
The highlighted comment was created in this revision.
Hey resident brainiacs - I'm displaying confidence using standard error calculations on a per bot basis in RoboRunner now. What I'm not sure of is how to calculate the confidence of the overall score.
If I had the same number of battles for each bot, then the average of all battles would equal the average of all per bot scores. So I think then I could just calculate the overall average and standard error, ignoring per bot averages, and get the confidence interval of overall score that way. But what I want is the average of the individual bot scores, each of which has a different number of battles.
Something like (average standard error / sqrt(num bots)) makes intuitive sense, but I have no idea if it's right. Or maybe sqrt(average(variance relative to per bot average)) / sqrt(num battles)?
This would also allow me to measure the benefits of the smart battle selection.
I don't actually think this can be correctly modelled by a unimodal distribution - you will be adding thin gaussians to fat gaussians, making horrible bumps which don't like to be approximated by a single gaussian mean+stdev. I almost wonder if some sort of Monte-Carlo solution wouldn't be most accurate in this instance - at least the math would be easy to understand.
Good call! That was super easy. I don't recall this Monte-Carlo stuff, but the name rings a bell so maybe I learned about it at some point.
So I calculate 100 random versions of the overall score. For each battle that goes into it, instead of the real score, I generate a random score, assuming a normal distribution using the mean and standard deviation I have for that bot. Then I take the standard deviation of those randomized overall scores and multiply by 1.96 for the confidence interval. Seems like a lot of calculations, but only taking a few hundredths of a second even with 250 bots/3000 battles, so I can afford to do it even when I print the overall score after every battle. Nice!