Problem Bot Index
The highlighted comment was created in this revision.
I'm not sure, but I think PBI with Elo/Glicko was based on some magical formula between the ratings. Maybe an APS-based measure would just be the average score your neighbors get against that bot. So like if you're ranked #25, the average score against bot B for ranks 15-35 is your expected score.
Of course, if you're #1, you can only go from 2-11, but that's probably still useful info. And in that case (or in every case), you could shift everything so your average PBI is still 0.
PBI is the difference between expected score and real score. The expected score is based on difference between ratings.
- Zero difference is 50% expected score
- -800 difference is 1 to 20 odds, which is 1/(20+1) or 4,76% expected score
- -1600 difference is 1 to 20^2 odds or 0,25% expected score
- -Infinite difference is 0% expected score
Yeah, I was trying to think of a way of handling this elegantly without having to resort to a KNN type lookup or doing a whole ELO calculation. I was thinking something like:
Expected_for_bot_a = (bot_a_APS + (100 - bot_b_APS))/2
Eg: If BotA has APS of 70% and BotB 30% it predicts the 70%, 30% which seems intuitive to me. If BotA has APS of 80% and BotB 80% it predicts the 50%, 50% perfectly. If BotA has APS of 80% and BotB 60% it predicts 60%, 40%, which seems OK.
I think the trouble with this is that it assumes that there is a linear relationship between average score and pairwise score. I think it is more of a sigmoidal relationship, because once you have taken out the low hanging fruit there is less increase to draw from. Because of this I think a modified version of the above formula, something like:
Expected_for_bot_a = ((bot_a_APS^Q + (100 - bot_b_APS)^Q)/2)^(1/Q)
for some magic value of Q would probably be a better fit.
I've added a simple 'Vote' rankings page, where each bot votes for their worst pairing. The majority of bots don't get anything, predictably, but this is interesting for use in comparing who does the best. Again, this is a winner takes all ranking, so makes no differentiation between the bot that got 79.9% and 50% against another, where the worst pairing was 80%, and this makes me uncomfortable as there is clearly lost information. Perhaps I should change it so that every bot gets a vote of weight 100*pair%/worst pair%
, but I'll leave it as it is for a day or so.
The batch pairings get updated once an hour for any rumble which has had battles since the last pairs run.