Approximate ELO
← Thread:User talk:Chase-san/Approximate ELO/reply (4)
You do not have permission to edit this page, for the following reasons:
You can view and copy the source of this page.
Return to Thread:User talk:Chase-san/Approximate ELO/reply (4).
The main advantage of ELO/Glicko-2 over other systems is they work well with incomplete pairings, and yes, we almost don´t have problems related to incomplete pairings.
But there are other smaller advantages, like higher accuracy with scores near 0% or 100% due to its S shaped (logistic distribution) function. Being able to "forget" the past also makes ELO/Glicko-2 adequate in meleerumble where historical contamination is an issue.
And I would really like to see a Ranked Pairs ranking. This would bring something new to the rumble. It is superior to the current PL (Copeland) system in almost every way, except maybe CPU/memory performance. I was thinking in building another RoboRumble server myself, designed specifically to calculate complex rankings like Ranked Pairs. But didn´t find the time to do it yet.
The thing about ranked pairs is that I haven't seen any which provide absolute scores per bot, rather than relative scores - I guess this is just part of the definition? Because of this there isn't any way to make partial improvements however, which makes it meaningless as a testing metric. Also, they are usually difficult to understand (conceptually), so understanding which pairing is causing a score loss is harder. That's why I like the Win% and Vote% as metrics for robustness, and APS for overall strength, because they are easy to understand, calculate and identify problem bots.
In Ranked Pairs, competitors causing problems are always above in the ranking, with the one causing most problems almost always directly above. Competitors below in the ranking never cause problems.
Unless you are competing for the bottom, when you reverse the logic. Competitors causing problems are always below, and competitors above never cause problems.
Although the calculation is complex, the resulting ranking follows a rather straighforward transitive logic. Winners staying above losers, with this always being true between adjacent competitors in the ranking.
This is due to 3 criteria being met at the same time, ISDA, LIIA and cloneproof.
But Ranked Pairs is not meant to replace APS or ELO, it is meant to replace PL (Copeland). APS and ELO give a rating which helps track improvement while Ranked Pairs and Copeland don´t.