Darkcanuck/RRServer/Ratings

From Robowiki
Revision as of 05:28, 26 September 2008 by Darkcanuck (talk | contribs) (ratings discussion)

I'd like to open up a discussion on what ratings are meaningful in the rumble. There are various discussions scattered around the old wiki, but right now we have an opportunity to experiment with new things (on my server) and compare to existing results (ABC's server).

Here's what I've implemented on the new server and my reasons for doing so.


Average Percentage Score (APS)

This is probably the "purest" measure of a bot's performance. Under ideal conditions (i.e. full pairings and at least 20-50 battles per pairing to reduce variability), APS would allow an accurate comparison between all bots in the rumble. It's uncertain how well it works with fewer battles or incomplete pairings.

The server calculates APS for each bot by:

  • taking the average percentage score of all battles against each opponent separately to get an APS for each pairing,
  • then averaging all pairing scores to obtain the final average.
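The two steps above can be sketched roughly as follows. This is only an illustration, not the server's actual code: the function name `average_percentage_score` and the `(opponent, percentage_score)` input format are assumptions made for the example.

```python
from collections import defaultdict


def average_percentage_score(battles):
    """Compute APS for one bot.

    `battles` is an iterable of (opponent_name, percentage_score) tuples,
    one per battle. This input layout is hypothetical, chosen for the sketch.
    """
    scores_by_opponent = defaultdict(list)
    for opponent, pct in battles:
        scores_by_opponent[opponent].append(pct)
    if not scores_by_opponent:
        raise ValueError("no battles recorded for this bot")

    # Step 1: average the battles against each opponent separately.
    pairing_aps = [sum(s) / len(s) for s in scores_by_opponent.values()]

    # Step 2: average the per-pairing scores, so every opponent counts
    # equally no matter how many battles were fought against it.
    return sum(pairing_aps) / len(pairing_aps)


# Two battles against BotA (60% and 40%), one against BotB (80%).
battles = [("BotA", 60.0), ("BotA", 40.0), ("BotB", 80.0)]
aps = average_percentage_score(battles)  # (50 + 80) / 2 = 65.0
```

Note that a plain average over all three battles would give 60 rather than 65. Averaging per pairing first keeps a heavily-played pairing from dominating the result.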


Glicko Rating System

Created by Mark Glickman, the system is described here: http://math.bu.edu/people/mg/glicko/glicko.doc/glicko.html

The main difference from the Elo system is that each competitor has a ratings deviation (RD) in addition to their rating. New competitors start out with a high RD, which gradually drops as the rating settles. In Elo, winner and loser receive equal but opposite rating adjustments after a battle, whereas in the Glicko system the size of each competitor's adjustment depends on the RD value. A high RD results in a bigger adjustment, so new competitors are adjusted more quickly; established competitors' ratings should change more slowly.

The server implements the system to the letter, with the one exception that RD values in the rumble do not "decay" (increase) with inactivity. Ratings start at 1500 and RD values start at 350.
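For reference, here is a minimal sketch of the update step from Glickman's description, with the inactivity step left out as noted above. The function names are made up for this example. How the server turns a battle result into a score `s` and how it groups battles into rating periods are not stated here, so the sketch simply assumes `s` is a value between 0 and 1 and that all results passed in belong to one period.

```python
import math

INITIAL_RATING = 1500.0   # starting rating, as stated above
INITIAL_RD = 350.0        # starting ratings deviation, as stated above
Q = math.log(10) / 400    # Glicko's scaling constant q


def g(rd):
    """Attenuation factor: discounts results against opponents with high RD."""
    return 1 / math.sqrt(1 + 3 * Q**2 * rd**2 / math.pi**2)


def expected_score(r, r_j, rd_j):
    """Expected score against an opponent rated r_j with deviation rd_j."""
    return 1 / (1 + 10 ** (-g(rd_j) * (r - r_j) / 400))


def glicko_update(r, rd, results):
    """Return (new_rating, new_rd) after one rating period.

    `results` is a list of (opponent_rating, opponent_rd, score) tuples,
    with score in [0, 1] (assumption: 1 = win, 0 = loss, or a fraction).
    No inactivity increase is applied to rd before the update.
    """
    if not results:
        return r, rd
    inv_d2 = 0.0   # accumulates 1 / d^2
    total = 0.0    # accumulates sum of g(RD_j) * (s_j - E_j)
    for r_j, rd_j, s in results:
        e = expected_score(r, r_j, rd_j)
        inv_d2 += Q**2 * g(rd_j) ** 2 * e * (1 - e)
        total += g(rd_j) * (s - e)
    denom = 1 / rd**2 + inv_d2
    return r + (Q / denom) * total, math.sqrt(1 / denom)


# Worked example from Glickman's paper: a 1500-rated player with RD 200
# beats a 1400 (RD 30), then loses to a 1550 (RD 100) and a 1700 (RD 300).
new_r, new_rd = glicko_update(
    1500, 200, [(1400, 30, 1), (1550, 100, 0), (1700, 300, 0)]
)
```

With these inputs the paper reports a new rating of about 1464 and a new RD of about 151.4. A larger starting RD makes `denom` smaller, which is why new competitors move faster than established ones.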


Current Status

Pairings are still far from complete on the new server, but the APS values are mostly up to date with the latest battles. I just added the Glicko ratings yesterday and the server is incrementally building them from scratch, so expect them to catch up by tomorrow. You'll notice that the two don't correspond nicely yet. -- Darkcanuck 03:28, 26 September 2008 (UTC)