Talk:Offline batch ELO rating system

Quite interesting. I fail to see how this makes "less assumptions on results than APS" however, since I don't believe APS makes any assumptions whatsoever, being a simple averaging of all pairings a robot is involved in. I don't see this as being less biased. Similarly valid certainly, but not less biased.
One thing I think is important to note is, I'm pretty sure the fact that you're rounding the result to a win/loss/draw probably explains the *vast* majority of the difference between this and the APS/ELO/Glicko-2 on the rumble server. I suspect that APS or the iterative roborumble ELO result wouldn't be that different if they performed the same rounding.
--Rednaxela 13:12, 12 August 2011 (UTC)

APS assumes a bots strength is proportional to the proportional difference in scores.

If a bot scores 2 and the opponent scores 1 (67%/33%), it is stronger than if it scored 3 and bot B scored 2 (60%/40%). Both bots increased the score by the same amount, but the APS system assumes one is stronger than the other.

If a bot scores 100% against MyFirstRobot and 40% against DrussGT, it is stronger than if it scored 60% against MyFirstRobot and 60% against DrussGT. APS assumes that +40% against MyFirstRobot is worth more than the +20% against DrussGT, leading to king-making.

Arpard Elo original rating system inferred strength on frequency of wins/losses alone, not proportional score difference. If a bot beats another, it is only better, not a little better or a lot better. --MN 16:32, 12 August 2011 (UTC)

If it scores 100% against MyFirstRobot and 40% against DrussGT, it is stronger than if it scored 60% against MyFirstRobot and 60% against DrussGT. In my opinion anyway, getting 100% against any robot is more time consuming and bug hunting and hard work then getting an okayish score against many robots (which many rambots, mirror bots and and random targeters can do). Are you telling me you think a random targeter is stronger then a highly tuned statistical targeting? — Chase-san 17:36, 12 August 2011 (UTC)

Talk:Offline batch ELO rating system

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Navigation

Tools