calculating confidence of an APS score
Hey resident brainiacs - I'm displaying confidence using standard error calculations on a per bot basis in RoboRunner now. What I'm not sure of is how to calculate the confidence of the overall score.
If I had the same number of battles for each bot, then the average of all battles would equal the average of all per bot scores. So I think then I could just calculate the overall average and standard error, ignoring per bot averages, and get the confidence interval of overall score that way. But what I want is the average of the individual bot scores, each of which has a different number of battles.
Something like (average standard error / sqrt(num bots)) makes intuitive sense, but I have no idea if it's right. Or maybe sqrt(average(variance relative to per bot average)) / sqrt(num battles)?
This would also allow me to measure the benefits of the smart battle selection.
I don't actually think this can be correctly modelled by a unimodal distribution - you will be adding thin gaussians to fat gaussians, making horrible bumps which don't like to be approximated by a single gaussian mean+stdev. I almost wonder if some sort of Monte-Carlo solution wouldn't be most accurate in this instance - at least the math would be easy to understand.