@Voidious -- I'm not sure what your plan for confidence is, but I eagerly went ahead and developed my own confidence calculator. I was looking over your code for calculating confidence and was having trouble following it, so I instead went to my wife's Principles of Biostatistics book and read the chapter on Confidence Intervals. For the sake of simplicity, I will stick with 95% confidence intervals, as that is what you used in your code (that's where the 1.96 comes from) and it seems reasonable. The confidence interval for a single robot turns out to be pretty simple to calculate (in special-character-challenged terms, it is x +- 1.96 * s / sqrt(n), where x is the mean, s is the standard deviation, and n is the sample size). Where it gets more complicated is in calculating the confidence interval for groups and the overall total score.
Lets talk groups first. What I did for a group was to take the first score for each opponent, average them all, and that becomes data point 1. Then take the second score for each opponent, average them, and that becomes data point 2. I determine how many data points to use by calculating the average number of battles for an opponent in the group, rounded. This means some data points for opponents with more scores end up getting thrown away, and some data points for opponents with fewer scores don't have enough scores. For the latter, I use as many extra randomly generated scores as I need where the random score falls within the confidence interval of scores for that particular robot. Once I have all of the data points, I then use the original means for calculating a confidence interval on the collected data points.
Now for the overall total. If there is only 1 group (or no groups, depending on how you look at it), then there is nothing more to do -- use the values calculated for the 1 group. But if there are multiple groups, then what? We should probably respect that the overall total is an average of the group totals. This would end up being just like calculating the group confidence intervals, only treating the groups like the robots.
Did that make sense? How is this different from what what you have done in RoboRunner?
Heh, well, what I did is a little complicated, but I think it's about the best you can do for a set of bots that each have their own distributions. Basically I run 1,000 or whatever random simulations of the overall score, based on the averages / standard deviations of each individual bot's score distribution. Then I can take those "overall score" samples, supposedly generated from the same distribution as the real scores, and use them as additional samples to calculate the confidence interval of the overall score. It's a fairly basic Monte Carlo method.
I see there was a discussion about it on the RoboRunner page. I should probably go read that. Never heard of the Monte Carlo method, so I'll look into it.
I'd heard the term, but it was totally Skilgannon that knew enough to suggest it. Once I looked into it, though, it was pretty simple.
But I also wanted to mention, I was planning to pass some object with all the confidence interval info you might need about the current battle in the new listener. I figured that was among the things you'd want in the application output, since it's among the things I print in the console version. But of course you're free to use whatever you like. :-)
I'll use it if it's there. I use the ScoreLog to show data from past battles, and wasn't sure if confidence information would also be available from the ScoreLog after your updates. If not, I can keep using my own confidence calculator for past data.
Hmm. Well first off, I am pretty sure you should make sure you are using the [t-distribution], not the normal distribution. Using that, I would generate a confidence interval for each individual bot. I am nearly certain that there is a way to generate a confidence interval from the mean of several other intervals. I can't remember off the top of my head but I vaguely recall it being something like the square root of the sum of the squares of the standard errors (not standard deviations since the sample size is presumably fairly small). I'll tell you if I can find it.
I didn't read through them carefully (kind of busy with school), but skimming through them quickly, it appears that the square root of the sum of the variances of the individual distributions is correct.
I think that's correct if all of them have the same number of samples. However, with the cool new 'variance minimizer' pairings selection algorithm that isn't necessarily guaranteed. Although you may be right - could you see if your Monte Carlo gives the same results as a root-sum-of-squares, Voidious?