Difference between revisions of "Talk:Offline batch ELO rating system"

From Robowiki
Jump to navigation Jump to search
m
Line 2: Line 2:
 
One thing I think is important to note is, I'm pretty sure the fact that you're rounding the result to a win/loss/draw probably explains the *vast* majority of the difference between this and the APS/ELO/Glicko-2 on the rumble server. I suspect that APS or the iterative roborumble ELO result wouldn't be that different if they performed the same rounding. <br/>
 
One thing I think is important to note is, I'm pretty sure the fact that you're rounding the result to a win/loss/draw probably explains the *vast* majority of the difference between this and the APS/ELO/Glicko-2 on the rumble server. I suspect that APS or the iterative roborumble ELO result wouldn't be that different if they performed the same rounding. <br/>
 
--[[User:Rednaxela|Rednaxela]] 13:12, 12 August 2011 (UTC)
 
--[[User:Rednaxela|Rednaxela]] 13:12, 12 August 2011 (UTC)
 +
 +
:APS assumes a bots strength is proportional to the proportional difference in scores.
 +
 +
:If a bot scores 2 and the opponent scores 1 (67%/33%), it is stronger than if it scored 3 and bot B scored 2 (60%/40%). Both bots increased the score by the same amount, but the APS system assumes one is stronger than the other.
 +
 +
:If a bot scores 100% against MyFirstRobot and 40% against DrussGT, it is stronger than if it scored 60% against MyFirstRobot and 60% against DrussGT. APS assumes that +40% against MyFirstRobot is worth more than the +20% against DrussGT, leading to king-making.
 +
 +
:Arpard Elo original rating system inferred strength on frequency of wins/losses alone, not proportional score difference. If a bot beats another, it is only better, not a little better or a lot better. --[[User:MN|MN]] 16:32, 12 August 2011 (UTC)

Revision as of 17:34, 12 August 2011

Quite interesting. I fail to see how this makes "less assumptions on results than APS" however, since I don't believe APS makes any assumptions whatsoever, being a simple averaging of all pairings a robot is involved in. I don't see this as being less biased. Similarly valid certainly, but not less biased.
One thing I think is important to note is, I'm pretty sure the fact that you're rounding the result to a win/loss/draw probably explains the *vast* majority of the difference between this and the APS/ELO/Glicko-2 on the rumble server. I suspect that APS or the iterative roborumble ELO result wouldn't be that different if they performed the same rounding.
--Rednaxela 13:12, 12 August 2011 (UTC)

APS assumes a bots strength is proportional to the proportional difference in scores.
If a bot scores 2 and the opponent scores 1 (67%/33%), it is stronger than if it scored 3 and bot B scored 2 (60%/40%). Both bots increased the score by the same amount, but the APS system assumes one is stronger than the other.
If a bot scores 100% against MyFirstRobot and 40% against DrussGT, it is stronger than if it scored 60% against MyFirstRobot and 60% against DrussGT. APS assumes that +40% against MyFirstRobot is worth more than the +20% against DrussGT, leading to king-making.
Arpard Elo original rating system inferred strength on frequency of wins/losses alone, not proportional score difference. If a bot beats another, it is only better, not a little better or a lot better. --MN 16:32, 12 August 2011 (UTC)