Darkcanuck/RRServer/Ratings

Revision as of 17:03, 26 September 2008 by Darkcanuck (talk | contribs) (ratings status update)



I'd like to open up a discussion on what ratings are meaningful in the rumble. There are various discussions scattered around the old wiki, but right now we have an opportunity to experiment with new things (on my server) and compare to existing results (ABC's server).

Here's what I've implemented on the new server and my reasons for doing so.


Average Percentage Score (APS)

This is probably the "purest" measure of a bot's performance. Under ideal conditions (i.e. full pairings and at least 20-50 battles per pairing to reduce variability), APS would allow an accurate comparison between all bots in the rumble. It's uncertain how well it works with fewer battles or incomplete pairings.

The server calculates APS for each bot by:

  • taking the average percentage score of all battles against each opponent separately to get an APS for each pairing,
  • then averaging all pairing scores to obtain the final average.
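
The two steps above can be sketched roughly as follows. This is only an illustration, not the server's actual code: the function name, the `(opponent, own_score, opponent_score)` tuple layout, and the definition of a battle's percentage score as the bot's share of the combined score are all assumptions made for the example.

```python
from collections import defaultdict


def average_percentage_score(battles):
    """Return the APS for one bot, or None if it has no usable battles.

    `battles` is an iterable of (opponent, own_score, opponent_score)
    tuples. This layout is hypothetical, chosen for the sketch.
    """
    per_pairing = defaultdict(list)
    for opponent, own_score, opponent_score in battles:
        total = own_score + opponent_score
        if total <= 0:
            continue  # skip battles with no score at all
        # Assumption: percentage score = bot's share of the combined score.
        per_pairing[opponent].append(100.0 * own_score / total)

    if not per_pairing:
        return None

    # Step 1: average all battles against each opponent separately.
    pairing_aps = {opp: sum(p) / len(p) for opp, p in per_pairing.items()}
    # Step 2: average the pairing scores, so every opponent counts equally
    # no matter how many battles were fought against it.
    return sum(pairing_aps.values()) / len(pairing_aps)
```

Averaging per pairing first means that a pairing with many battles does not outweigh a pairing with only a few. For example, two battles against one opponent at 75% and 25%, plus a single 100% battle against another opponent, give pairing scores of 50 and 100 and an APS of 75.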


Glicko Rating System

Created by Mark Glickman, the system is described here: http://math.bu.edu/people/mg/glicko/glicko.doc/glicko.html

The main difference from the Elo system is that each competitor has a ratings deviation (RD) in addition to their rating. New competitors start out with a high RD, which gradually drops as the rating settles. In Elo, winner and loser receive equal but opposite rating adjustments after a battle, whereas in the Glicko system each competitor's adjustment depends on the RD values. A high RD results in a bigger adjustment, so new competitors are adjusted more quickly; established competitors' ratings should change more slowly.

The server implements the system to the letter, with the one exception that RD values in the rumble do not "decay" (increase) with inactivity. Scores start at 1500, RD values start at 350.
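
For reference, here is a minimal sketch of the Glicko update as given in Glickman's description, with the RD-increase step left out to match the "no decay" exception above. It is not the server's code: the function names are made up for the example, and how the server groups battles into rating periods and turns a battle result into a score between 0 and 1 is not stated here, so both are left to the caller.

```python
import math

Q = math.log(10) / 400  # Glicko constant q

INITIAL_RATING = 1500.0  # starting rating, as stated above
INITIAL_RD = 350.0       # starting ratings deviation, as stated above


def g(rd):
    """Weight that shrinks the impact of an opponent with a high RD."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q ** 2 * rd ** 2 / math.pi ** 2)


def expected_score(rating, opp_rating, opp_rd):
    """Expected score (0..1) against a single opponent."""
    exponent = -g(opp_rd) * (rating - opp_rating) / 400.0
    return 1.0 / (1.0 + 10 ** exponent)


def glicko_update(rating, rd, results):
    """Return (new_rating, new_rd) after one rating period.

    `results` is a list of (opp_rating, opp_rd, score) tuples with
    score in [0, 1]. An empty list leaves rating and RD unchanged,
    since this sketch applies no inactivity increase to RD.
    """
    if not results:
        return rating, rd

    d2_inv = 0.0  # accumulates 1 / d^2
    delta = 0.0   # accumulates sum of g_j * (s_j - E_j)
    for opp_rating, opp_rd, score in results:
        g_j = g(opp_rd)
        e_j = expected_score(rating, opp_rating, opp_rd)
        d2_inv += Q ** 2 * g_j ** 2 * e_j * (1.0 - e_j)
        delta += g_j * (score - e_j)

    denom = 1.0 / rd ** 2 + d2_inv
    new_rating = rating + (Q / denom) * delta
    new_rd = math.sqrt(1.0 / denom)
    return new_rating, new_rd
```

With the worked example from Glickman's description (a 1500-rated player with RD 200 who beats a 1400/RD 30 opponent and loses to 1550/RD 100 and 1700/RD 300 opponents), `glicko_update(1500, 200, [(1400, 30, 1), (1550, 100, 0), (1700, 300, 0)])` gives a rating of about 1464 and an RD of about 151.4. Note how the size of the adjustment scales with the player's own RD through the `1 / rd ** 2` term.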


Current Status

Pairings are still far from complete on the new server, but the APS values for the most part are up-to-date with the latest battles. I just added the Glicko ratings yesterday and the server is incrementally building them from scratch, so expect them to catch up by tomorrow. You'll notice that the two don't correspond nicely yet. -- Darkcanuck 03:28, 26 September 2008 (UTC)

Squashed a bottleneck in the scoring update code and doubled the rate. Due to the increase in clients (plus I'm uploading melee results too, which flood the server with new data) we weren't catching up nearly fast enough. This morning we turned the corner: fewer unrated results than rated! Once we catch up, newer stuff (including melee) will finally show up in the rankings. -- Darkcanuck 16:03, 26 September 2008 (UTC)