Approximate ELO

Jump to navigation Jump to search

I'd be fine with a new ELO formula to replace the currently useless one, but I can't see myself ever again caring about ELO or Glicko over straight APS for the main overall score rating. APS is clear, stable, accurate and meaningful, and ELO/Glicko just seem like attempts to solve a problem we don't have. As far as new permanent scoring methods, I'm much more interested in the Condorcet / Schulze / APW ideas brought up in the discussions on Talk:Offline batch ELO rating system, Talk:King maker, and Talk:Darkcanuck/RRServer/Ratings#Schulze_.26_Tideman_Condorcet. I also really like what Skilgannon did with ANPP in LiteRumble, where 0 is the worst score against a bot and 100 is the best score against a bot, scaling linearly.

Voidious15:45, 29 October 2012

You do not have permission to edit this page, for the following reasons:

  • The action you have requested is limited to users in the group: Users.
  • You must confirm your email address before editing pages. Please set and validate your email address through your user preferences.

You can view and copy the source of this page.

Return to Thread:User talk:Chase-san/Approximate ELO/reply (6).

 

The thing about ranked pairs is that I haven't seen any which provide absolute scores per bot, rather than relative scores - I guess this is just part of the definition? Because of this there isn't any way to make partial improvements however, which makes it meaningless as a testing metric. Also, they are usually difficult to understand (conceptually), so understanding which pairing is causing a score loss is harder. That's why I like the Win% and Vote% as metrics for robustness, and APS for overall strength, because they are easy to understand, calculate and identify problem bots.

Skilgannon00:11, 30 October 2012

In Ranked Pairs, competitors causing problems are always above in the ranking, with the one causing most problems almost always directly above. Competitors below in the ranking never cause problems.

Unless you are competing for the bottom, when you reverse the logic. Competitors causing problems are always below, and competitors above never cause problems.

Although the calculation is complex, the resulting ranking follows a rather straighforward transitive logic. Winners staying above losers, with this always being true between adjacent competitors in the ranking.

This is due to 3 criteria being met at the same time, ISDA, LIIA and cloneproof.

But Ranked Pairs is not meant to replace APS or ELO, it is meant to replace PL (Copeland). APS and ELO give a rating which helps track improvement while Ranked Pairs and Copeland don´t.

MN15:18, 30 October 2012