Darkcanuck/RRServer/Ratings
I'd like to open up a discussion on what ratings are meaningful in the rumble. There are various discussions scattered around the old wiki, but right now we have an opportunity to experiment with new things (on my server) and compare to existing results (ABC's server).
Here's what I've implemented on the new server and my reasons for doing so.
Average Percentage Score (APS)
This is probably the "purest" measure of a bot's performance. Under ideal conditions (i.e. full pairings and at least 20-50 battles per pairing to reduce variability), APS would allow an accurate comparison between all bots in the rumble. It's uncertain how well it works with fewer battles or incomplete pairings.
The server calculates APS for each bot by:
- taking the average percentage score of all battles against each opponent separately to get an APS for each pairing,
- then averaging those pairing scores to obtain the final APS (see the sketch below).
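A minimal sketch of that two-level average, assuming the per-battle % scores have already been grouped by opponent (the class and method names are hypothetical, not the server's actual code):

    import java.util.List;
    import java.util.Map;

    // Hypothetical sketch: APS = mean over opponents of (mean % score vs that opponent).
    public class ApsCalculator {

        // battlesByOpponent maps an opponent name to this bot's % scores
        // (0..100) from every battle fought against that opponent.
        public static double averagePercentageScore(Map<String, List<Double>> battlesByOpponent) {
            double sumOfPairingAverages = 0.0;
            for (List<Double> scores : battlesByOpponent.values()) {
                double pairingTotal = 0.0;
                for (double score : scores) {
                    pairingTotal += score;
                }
                // Average % score for this single pairing.
                sumOfPairingAverages += pairingTotal / scores.size();
            }
            // APS = average of the per-pairing averages.
            return sumOfPairingAverages / battlesByOpponent.size();
        }
    }

Note that each pairing counts equally no matter how many battles it contains, which keeps a flood of battles against one opponent from skewing the result.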
Glicko Rating System
Created by Mark Glickman, the system is described here: http://math.bu.edu/people/mg/glicko/glicko.doc/glicko.html The main difference from the Elo system is that each competitor has a ratings deviation (RD) in addition to their rating. New competitors start out with a high RD, which gradually drops as the rating settles. In Elo, winner and loser receive equal but opposite ratings adjustments after a battle, whereas in the Glicko system the adjustment is weighted by RD: a high RD results in a bigger adjustment, so new competitors' ratings move quickly, while established competitors' ratings should change more slowly.
The server implements the system to the letter, with the one exception that RD values in the rumble do not "decay" (increase) with inactivity. Ratings start at 1500, RD values start at 350.
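For reference, here is a minimal sketch of a single Glicko-1 update following the formulas in Glickman's paper; applying it per pairing with the fractional % score as the game outcome is my assumption about the server, and the names are hypothetical:

    // Sketch of one Glicko-1 update against a single opponent.
    public class GlickoUpdate {

        static final double Q = Math.log(10) / 400.0;  // ~0.0057565

        // g(RD): discounts the influence of opponents with uncertain ratings.
        static double g(double rd) {
            return 1.0 / Math.sqrt(1.0 + 3.0 * Q * Q * rd * rd / (Math.PI * Math.PI));
        }

        // Expected outcome for a player (r) against an opponent (rOpp, rdOpp).
        static double expected(double r, double rOpp, double rdOpp) {
            return 1.0 / (1.0 + Math.pow(10.0, -g(rdOpp) * (r - rOpp) / 400.0));
        }

        // Returns {newRating, newRD} after one result 'score' in [0,1].
        static double[] update(double r, double rd, double rOpp, double rdOpp, double score) {
            double e = expected(r, rOpp, rdOpp);
            double gOpp = g(rdOpp);
            double dSquaredInv = Q * Q * gOpp * gOpp * e * (1.0 - e);   // 1/d^2
            double denom = 1.0 / (rd * rd) + dSquaredInv;
            double newRating = r + (Q / denom) * gOpp * (score - e);
            double newRd = Math.sqrt(1.0 / denom);
            return new double[] { newRating, newRd };
        }
    }

A new bot enters at (1500, 350); since RD never decays here, it only shrinks as results come in, so an established bot's rating moves less and less per battle.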
Current Status
Pairings are still far from complete on the new server, but the APS values for the most part are up-to-date with the latest battles. I just added the Glicko ratings yesterday and the server is incrementally building them from scratch, so expect them to catch up by tomorrow. You'll notice that the two don't correspond nicely yet. -- Darkcanuck 03:28, 26 September 2008 (UTC)
Squashed a bottleneck in the scoring update code and doubled the rate. Due to the increase in clients (plus I'm uploading melee results too, which flood the server with new data) we weren't catching up nearly fast enough. This morning we turned the corner: fewer unrated results than rated! Once we catch up, newer stuff (including melee) will finally show up in the rankings. -- Darkcanuck 16:03, 26 September 2008 (UTC)
As seen on the updates page, the ratings rebuild has completed and ratings/rankings are up-to-date within the last minute! (going to remove this section soon, don't need two "update" areas to maintain) -- Darkcanuck 18:26, 27 September 2008 (UTC)
Elo Ratings on the Old/Current Server
I'm trying to understand the old server's rating scheme. Thanks mostly to the work of Nfwu and his commented EloSim code (http://robowiki.net/cgi-bin/robowiki?Nfwu/EloSim), plus details scattered about the old wiki, I've pieced together the following (a code sketch follows the list):
- New competitors start at a rating of 1600
- The expected % score outcome of a pairing between bots A and B is E(A,B) = 1 / (1 + 20^((ratingB - ratingA)/800)), so the higher-rated bot is expected to score above 50%
- When a new pairing result is submitted to the server:
  1. The new % score for bot A vs bot B is New(A,B) = scoreA / (scoreA + scoreB)
  2. The running pairing score of A vs B becomes Pair(A,B)' = 0.7 * Pair(A,B) + 0.3 * New(A,B)
  3. Then calculate the rating change for A by iterating over all ranked bots Ri:
     - deltaRatingA += 3.0 * (Pair(A,Ri) - E(A,Ri))
  4. Do the same for B
  5. Update the ratings for A and B by adding the new deltas to their current ratings.
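A minimal sketch of that update as I read it, assuming the running Pair(A,Ri) values are kept in a table keyed by opponent and that bots with no pairing data are skipped (names are hypothetical; see Nfwu's EloSim for the real code):

    import java.util.Map;

    // Sketch of the old server's Elo-style update, as pieced together above.
    public class OldEloUpdate {

        static final double INITIAL_RATING = 1600.0;

        // Expected % score of A against B.
        static double expected(double ratingA, double ratingB) {
            return 1.0 / (1.0 + Math.pow(20.0, (ratingB - ratingA) / 800.0));
        }

        // Smoothed pairing score: 70% old value, 30% newest result.
        static double updatePairing(double oldPair, double scoreA, double scoreB) {
            double newResult = scoreA / (scoreA + scoreB);
            return 0.7 * oldPair + 0.3 * newResult;
        }

        // Rating delta for bot A, iterating over the ranked bots Ri it has met.
        // pairings: opponent name -> running Pair(A,Ri); ratings: opponent name -> current rating.
        static double ratingDelta(double ratingA, Map<String, Double> pairings, Map<String, Double> ratings) {
            double delta = 0.0;
            for (Map.Entry<String, Double> entry : pairings.entrySet()) {
                double e = expected(ratingA, ratings.get(entry.getKey()));
                delta += 3.0 * (entry.getValue() - e);
            }
            return delta;
        }
    }

The interesting consequence is that a single uploaded battle recomputes A's delta across every opponent it has ever faced, not just against B.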