Offline batch ELO rating system

From Robowiki
Revision as of 14:45, 12 August 2011 by MN (talk | contribs) (Offline batch ELO rating system)


Using data handled by Darkcanuck on 09 August 2011, I tried a different rating system algorithm I had been thinking about for a long time. I hope people here like it.

How does the algorithm work?

- It is heavily based on the chess ELO rating system. Scores per battle are either a win (1), a draw (0.5) or a loss (0). For expected scores, the same logistic distribution used in the chess ELO rating system is used (base 10, i.e. growth rate ln(10), and scale s = 400).
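As a concrete illustration, the expected score under this model can be sketched as follows (a minimal Python sketch of the standard Elo formula; the function name is mine, not from the rating server):

```python
def expected_score(r_a, r_b, s=400.0):
    # Standard Elo expected score of A against B: a logistic curve
    # in the rating difference, base 10 with scale s = 400.
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / s))
```

With equal ratings the expected score is 0.5; with a 400-point advantage it is 10/11, roughly 0.909.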

- The difference comes in adjusting the ratings of each competitor. I replaced the online K-factor adjustment with an offline Rprop batch learning algorithm similar to the ones used in neural network training (error = sum of scores - sum of expected scores). All battles are treated as happening in the same rating period, and the algorithm keeps iterating over all battles until the rankings converge to a stable configuration. The idea is to squeeze as much accuracy as possible out of the battle data through the use of raw CPU power.
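The batch iteration described above might look roughly like this: a simplified sign-based Rprop sketch, where each bot keeps its own adaptive step size that grows while the error keeps the same sign and shrinks when it flips. All names, step-size constants and the iteration count here are my own illustrative choices, not the actual RRServer code:

```python
def expected(r_a, r_b):
    # Elo expected score (base 10, scale 400), as above.
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))

def rprop_elo(battles, bots, iters=500, eta_plus=1.2, eta_minus=0.5,
              step_init=1.0, step_min=1e-4, step_max=50.0):
    # battles: list of (bot_a, bot_b, score_a) with score_a in {1, 0.5, 0}.
    # All battles belong to one rating period; we sweep them repeatedly.
    rating = {b: 1500.0 for b in bots}
    step = {b: step_init for b in bots}
    prev_sign = {b: 0 for b in bots}
    for _ in range(iters):
        # Per-bot error = sum of actual scores - sum of expected scores.
        resid = {b: 0.0 for b in bots}
        for a, b, s_a in battles:
            e_a = expected(rating[a], rating[b])
            resid[a] += s_a - e_a
            resid[b] += (1.0 - s_a) - (1.0 - e_a)
        for b in bots:
            sign = (resid[b] > 0) - (resid[b] < 0)
            if sign * prev_sign[b] > 0:      # same direction: speed up
                step[b] = min(step[b] * eta_plus, step_max)
            elif sign * prev_sign[b] < 0:    # overshot: slow down
                step[b] = max(step[b] * eta_minus, step_min)
            rating[b] += sign * step[b]      # move by sign only, Rprop-style
            prev_sign[b] = sign
    return rating
```

Because the update depends only on the accumulated per-bot error over the whole batch, the result does not depend on the order in which battles are fed in.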

- After the rankings stabilize, an anti-inflation algorithm shifts all ratings up or down so they all average to 1500.
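The anti-inflation step is just a constant shift applied to every rating, which leaves all rating differences (and hence all expected scores) unchanged. A trivial sketch, with a function name of my own choosing:

```python
def normalize(ratings, target=1500.0):
    # Shift every rating by the same constant so the average equals `target`.
    # Relative differences, and therefore expected scores, are unchanged.
    shift = target - sum(ratings.values()) / len(ratings)
    return {bot: r + shift for bot, r in ratings.items()}
```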

What are the advantages over the current rating systems?

- It is independent of data ordering, unlike the classic ELO and Glicko-2 systems. This is one of the main strengths of offline batch learning algorithms: there is no rating drift problem. But it is only possible because all battle data is available for processing.

- It remains accurate with incomplete pairings, unlike the APS and PL systems.

- It makes fewer assumptions about results than the APS and PL systems, making it less biased. I believe it also makes the system less prone to king-making (weak competitors interfering with strong competitors' scores).

- It controls inflation, unlike the classic ELO and Glicko-2 systems.

What are the disadvantages?

- It is slow, like any offline batch learning algorithm. On a 3.4 GHz CPU, it took 10 minutes to iterate once over all 27 million battles from the Darkcanuck/RRServer, counting all wins/draws/losses from all pairings. It took another minute and a half iterating a few hundred times over the pairings, and another half minute generating the XHTML pages. This is for the 1v1 General league alone. There is a lot of room for optimization, though.

- By making fewer assumptions about results, it needs more data for the same accuracy, although I believe the extra accuracy achieved with batch processing compensates for that.


The resulting XHTML files are available for download here: http://sites.google.com/site/mnrobocode/files/rankings.rar --MN 12:30, 12 August 2011 (UTC)