Approximate ELO

Jump to navigation Jump to search

I'd be fine with a new ELO formula to replace the currently useless one, but I can't see myself ever again caring about ELO or Glicko over straight APS for the main overall score rating. APS is clear, stable, accurate and meaningful, and ELO/Glicko just seem like attempts to solve a problem we don't have. As far as new permanent scoring methods, I'm much more interested in the Condorcet / Schulze / APW ideas brought up in the discussions on Talk:Offline batch ELO rating system, Talk:King maker, and Talk:Darkcanuck/RRServer/Ratings#Schulze_.26_Tideman_Condorcet. I also really like what Skilgannon did with ANPP in LiteRumble, where 0 is the worst score against a bot and 100 is the best score against a bot, scaling linearly.

Voidious14:45, 29 October 2012

The main advantage of ELO/Glicko-2 over other systems is they work well with incomplete pairings, and yes, we almost don´t have problems related to incomplete pairings.

But there are other smaller advantages, like higher accuracy with scores near 0% or 100% due to its S shaped (logistic distribution) function. Being able to "forget" the past also makes ELO/Glicko-2 adequate in meleerumble where historical contamination is an issue.

And I would really like to see a Ranked Pairs ranking. This would bring something new to the rumble. It is superior to the current PL (Copeland) system in almost every way, except maybe CPU/memory performance. I was thinking in building another RoboRumble server myself, designed specifically to calculate complex rankings like Ranked Pairs. But didn´t find the time to do it yet.

MN15:54, 29 October 2012
 

The thing about ranked pairs is that I haven't seen any which provide absolute scores per bot, rather than relative scores - I guess this is just part of the definition? Because of this there isn't any way to make partial improvements however, which makes it meaningless as a testing metric. Also, they are usually difficult to understand (conceptually), so understanding which pairing is causing a score loss is harder. That's why I like the Win% and Vote% as metrics for robustness, and APS for overall strength, because they are easy to understand, calculate and identify problem bots.

Skilgannon23:11, 29 October 2012

You do not have permission to edit this page, for the following reasons:

  • The action you have requested is limited to users in the group: Users.
  • You must confirm your email address before editing pages. Please set and validate your email address through your user preferences.

You can view and copy the source of this page.

Return to Thread:User talk:Chase-san/Approximate ELO/reply (11).