# Talk:Darkcanuck/RRServer/Ratings

## Revision as of 17:18, 26 September 2008

## Battles per Pairing

I just wanted to comment on the statement, "It's uncertain how well it works with less battles or incomplete pairings." My experiment with the MC2K7 shows that separate runs of 75 battles can still show more than 1% variation for a given pairing. This affects any scoring system, and is a fact that we have to live with. The reliability of output can only be as good as input, no matter how fancy the interpolation is for incomplete pairings. The hope is that the variance will become a wash when seen over 600+ pairings. --Simonton 15:25, 26 September 2008 (UTC)

I think David Alves commented that targeting challenge scores also varied by almost 1% at 15 seasons, so I agree there's lots of evidence that more battles per pairing are needed, which would take a very, very long time in a 600+ competitor environment. You're right that as the number of competitors increases, variabilities cancel each other out. But at the same time, the bigger the competition, the more risk of a "black swan" competitor whose scores are *all* skewed in one direction. -- Darkcanuck 15:31, 26 September 2008 (UTC)

After scratching some things down on paper which are mostly intuition rather than statistics, I believe the odds of having such a "black swan" are either exactly the same or reduced by increasing the number of bots. --Simonton 16:05, 26 September 2008 (UTC)

Well, if there are 3 bots, the chance of one getting lucky against both others is 1/4th, multiply by 3 bots, and the chance of a "black swan" in 3 bots is 75% I believe. With 4 bots, the chance of one getting lucky against all others is 1/8th, multiply by 4 bots, and the chance of a black swan is 50%. For 5 bots... it is 31.25% chance of a black swan. For 650 bots with one pairing each, the chance of a bot having above average score in every pairing is about 2.78*10^-193. So if we presume getting lucky is anything above the mean score and there's a 50% chance of that in any pairing, and that a "black swan" is only when *all* pairings are lucky, then the chance of a black swan sharply decreases as the number of bots becomes larger. Of course, perhaps what would be more useful than simply the chance of there being a bot with *all* pairings lucky would be the chance of luck making the score 1% different. I could calculate this, but only if I had a number for what the "standard deviation" of the percent score of an average robocode battle is. --Rednaxela 16:28, 26 September 2008 (UTC)
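The arithmetic in the comment above can be checked with a short sketch. It assumes, as the comment does, that each pairing is a fair coin flip (50% chance of a "lucky" result) and that a bot's pairings are independent; the function name `expected_all_lucky` and the `p_lucky` parameter are made up for illustration. Multiplying the per-bot chance by the number of bots gives the expected number of all-lucky bots, which is an upper bound on the chance that at least one exists.

```python
def expected_all_lucky(bots: int, p_lucky: float = 0.5) -> float:
    """Expected number of bots that are 'lucky' in every one of their pairings.

    Assumption: each bot has (bots - 1) pairings, each independently lucky
    with probability p_lucky.
    """
    per_bot = p_lucky ** (bots - 1)  # chance one given bot is lucky in all pairings
    return bots * per_bot            # summed over all bots


if __name__ == "__main__":
    for n in (3, 4, 5, 650):
        print(f"{n} bots: {expected_all_lucky(n):.4g}")
```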

- My intuitive hypothesis remains unshaken, but I don't have any numbers to prove it. But I can't argue with something to the power of 193. :) I'll look into adding standard deviation to some of the tables. What would be most useful, within a pairing, across all pairings, or across all final scores? -- Darkcanuck 16:43, 26 September 2008 (UTC)

Ah, now that you put the statistics that way I can see how to do it. With 3 bots each has 2 pairings, so the chance of both coin flips being "lucky" is indeed 25%. However, the chance of at least 1 of those bots hitting its 25% is `1 - 75%^3 ~= 57.8%`. Generalized, this formula is `1 - (1 - .5^(bots - 1))^bots`. If you graph that you can see it reduces to pretty much zero pretty quickly. --Simonton 17:18, 26 September 2008 (UTC)
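A minimal sketch of the generalized formula, for anyone who wants to tabulate it rather than graph it. It takes the formula at face value, which treats each bot's "all lucky" event as independent of the others; that is an approximation if one bot's lucky result in a pairing is the other bot's unlucky one. The name `p_any_black_swan` is hypothetical. `log1p` and `expm1` are used only because the naive expression rounds to exactly zero in floating point for large bot counts.

```python
import math


def p_any_black_swan(bots: int, p_lucky: float = 0.5) -> float:
    """Chance that at least one bot is lucky in all (bots - 1) pairings.

    Evaluates 1 - (1 - p_lucky^(bots - 1))^bots, assuming the bots'
    all-lucky events are independent (an approximation).
    """
    per_bot = p_lucky ** (bots - 1)
    # 1 - (1 - x)^n == -expm1(n * log1p(-x)); stable when x is tiny.
    return -math.expm1(bots * math.log1p(-per_bot))


if __name__ == "__main__":
    for n in (3, 4, 5, 10, 20, 650):
        print(f"{n} bots: {p_any_black_swan(n):.4g}")
```

For 3 bots this gives 0.578125, matching the 57.8% figure above.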