Talk:LiteRumble
Contents
Thread title | Replies | Last modified |
---|---|---|
Problem Bot Index | 7 | 13:49, 1 June 2012 |
I'm not sure, but I think PBI with Elo/Glicko was based on some magical formula between the ratings. Maybe an APS-based measure would just be the average score your neighbors get against that bot. So like if you're ranked #25, the average score against bot B for ranks 15-35 is your expected score.
Of course, if you're #1, you can only go from 2-11, but that's probably still useful info. And in that case (or in every case), you could shift everything so your average PBI is still 0.
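To pin the rank-window idea down a bit (this is just my reading of it, with s(X,B) as X's percentage score against B and N(A) the bots ranked within 10 places of A):

$$\hat{s}(A,B) = \frac{1}{|N(A)|} \sum_{X \in N(A)} s(X,B)$$

and the APS-style PBI would then be your actual s(A,B) minus this estimate.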
PBI is the difference between the expected score and the real score. The expected score is based on the difference between ratings.
- Zero difference is a 50% expected score
- A -800 difference is 1 to 20 odds, which is 1/(20+1) or 4.76% expected score
- A -1600 difference is 1 to 20^2 odds, or 0.25% expected score
- An infinite negative difference is a 0% expected score
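For what it's worth, those data points all sit on one logistic curve in the rating difference Δ (a curve matching the odds quoted above, not necessarily the exact code the old rumble used):

$$E = \frac{1}{1 + 20^{-\Delta/800}}$$

which has the same shape as the standard Elo expectation, just with base 20 over 800 points instead of base 10 over 400.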
Yeah, I was trying to think of a way of handling this elegantly without having to resort to a KNN-type lookup or doing a whole Elo calculation. I was thinking something like:
Expected_for_bot_a = (bot_a_APS + (100 - bot_b_APS))/2
E.g. if BotA has an APS of 70% and BotB 30%, it predicts 70%, 30%, which seems intuitive to me. If BotA has an APS of 80% and BotB 80%, it predicts 50%, 50% perfectly. If BotA has an APS of 80% and BotB 60%, it predicts 60%, 40%, which seems OK.
I think the trouble with this is that it assumes a linear relationship between average score and pairwise score. I think it is more of a sigmoidal relationship, because once you have taken out the low-hanging fruit there is less increase to draw from. Because of this I think a modified version of the above formula, something like:
Expected_for_bot_a = ((bot_a_APS^Q + (100 - bot_b_APS)^Q)/2)^(1/Q)
for some magic value of Q, would probably be a better fit.
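In code form, just so the two variants sit side by side (nothing here beyond the formulas as stated; Q is left as a free parameter that would need fitting against real pairing data):

```python
# Sketch of the two expected-score formulas above.
# Q = 1 reduces the power-mean version back to the plain average.
def expected_linear(aps_a, aps_b):
    # (bot_a_APS + (100 - bot_b_APS)) / 2
    return (aps_a + (100.0 - aps_b)) / 2.0

def expected_power_mean(aps_a, aps_b, q):
    # ((bot_a_APS^Q + (100 - bot_b_APS)^Q) / 2)^(1/Q)
    return ((aps_a ** q + (100.0 - aps_b) ** q) / 2.0) ** (1.0 / q)

# expected_linear(70, 30) == 70.0 and expected_linear(80, 80) == 50.0,
# matching the examples above.
```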
I've added a simple 'Vote' rankings page, where each bot votes for its worst pairing. The majority of bots don't get anything, predictably, but this is interesting for comparing who does the best. Again, this is a winner-takes-all ranking, so it makes no differentiation between the bot that got 79.9% against another and the one that got 50%, where the worst pairing was 80%, and this makes me uncomfortable as there is clearly lost information. Perhaps I should change it so that every bot gets a vote of weight 100*pair%/worst pair%, but I'll leave it as it is for a day or so.
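In case a sketch helps, this is how I read the two schemes (assuming pair% means the enemy's score against the voting bot and worst pair% the score of its worst pairing; enemy_scores is a hypothetical {voter: {enemy: score}} structure, not the site's actual data model):

```python
from collections import defaultdict

# Winner-takes-all voting as currently implemented, plus the proposed
# 100 * pair% / worst pair% weighting as an option.
def vote_rankings(enemy_scores, weighted=False):
    votes = defaultdict(float)
    for voter, against in enemy_scores.items():
        worst_enemy, worst_score = max(against.items(), key=lambda kv: kv[1])
        if weighted:
            for enemy, score in against.items():
                votes[enemy] += 100.0 * score / worst_score
        else:
            votes[worst_enemy] += 1.0
    return dict(votes)
```

Under the weighted version the 79.9% bot in the example above would get 99.9 points of that vote instead of nothing.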
The batch pairings get updated once an hour for any rumble which has had battles since the last batch run.
If you try to figure out a sigmoidal relationship, you will eventually end up with the same logistic distribution used in Elo and Glicko.
Thinking on this more, I actually really like the KNN idea. It's the only one that really tells you "you can and should be doing better against this bot", as opposed to "this bot might just have a weird score profile". (RamBots are the perfect/extreme example of this - they can show up as Problem Bots even if you're doing well against them.)
I know when I'm trying to figure out who I could do better against, I don't look at PBI; I compare to DrussGT. ;) I understand it would be a lot of calculations, but it should still be simple to code up, and it's all just basic math operations.
Another thought is, if you already have the best score vs any bot, a useful number might be that score minus your score. Calling it "PBI" would be a misnomer, but it tells you how much room you have to improve.
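As a formula that's about as simple as it gets (assuming best_vs[b] is the best score anyone in the rumble gets against bot b and my_score[b] is your own; both names are just placeholders):

```python
# "Room to improve" per bot: the best known score against it minus yours.
# Zero means you already hold the best pairing against that bot.
def room_to_improve(my_score, best_vs):
    return {b: best_vs[b] - my_score[b] for b in my_score if b in best_vs}
```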
If you look at the site, you might just notice the errors ;-) That's because I ran out of Datastore Read quota. I think it's because of the batch rankings - before them I never even got to 20% of read quota. So I've changed batch rankings to every 6 hours, so in about 17 hours the quota will reset and we can see how it works =)
Since I'm only doing updates once every 6 hours I should have lots of quota for long, tedious calculations. So I'll whip up a KNN-based PBI over the next few days to see how it does. Any ideas on how to calculate K? How about sqrt(participants)?
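A rough sketch of what I have in mind (defining "nearest" by APS distance is an assumption on my part; rank distance or a full score-profile distance would slot in the same way, and aps/scores are placeholder structures):

```python
import math

# KNN-based PBI: the expected score vs 'enemy' is the average score of the
# K bots with APS closest to yours, with K = sqrt(number of participants).
# PBI is then your actual score minus that expectation.
def knn_pbi(bot, enemy, aps, scores):
    k = max(1, int(math.sqrt(len(aps))))
    neighbours = sorted(
        (b for b in aps if b != bot and enemy in scores.get(b, {})),
        key=lambda b: abs(aps[b] - aps[bot]),
    )[:k]
    if not neighbours:
        return None
    expected = sum(scores[b][enemy] for b in neighbours) / len(neighbours)
    return scores[bot][enemy] - expected
```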
It seems we have similar ideas about 'max improvement indexes'. Thinking further on my comment above about my %pair/(%worst pair) idea, I'm thinking about an interesting new ranking system that I'd like to call 'Average Normalised Percentage Pairs', or ANPP. Each bot normalises all of its pairings by subtracting the min score and then dividing by (max - min). Your score is then calculated as the average, over all enemies, of (100 - the enemy's normalised score against you). Thus, if the best anybody does against a rambot is 75%, and the worst anybody does is 30%, 30% will be treated as 0 and 75% treated as 100%. This would make it very easy to see problembots: if your NPP against a bot is less than your average NPP, you should focus on it more. Thus, the worst bot against everybody would get 0%, and the best bot against everybody would get 100%.
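A sketch of that calculation as I understand it (normalising each pairing within the range of scores the whole rumble gets against that enemy, which should be equivalent to the description above since score(A,B) = 100 - score(B,A); scores[a][b] is a placeholder for a's percentage against b):

```python
# ANPP: min-max normalise each pairing against the spread of scores that
# everybody gets against that enemy, then average. The 75%/30% rambot
# example above maps to 100 and 0 respectively.
def anpp(bot, scores):
    values = []
    for enemy in scores.get(bot, {}):
        against_enemy = [scores[x][enemy] for x in scores
                         if enemy in scores.get(x, {})]
        lo, hi = min(against_enemy), max(against_enemy)
        if hi > lo:
            values.append(100.0 * (scores[bot][enemy] - lo) / (hi - lo))
    return sum(values) / len(values) if values else None
```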
Just thought I'd say... I rather do like the notion of a KNN-based "expected score" system. The sigmoidal relationship given by Elo/Glicko is a reasonable fit for predicting score based on each bot's overall rating, but it really does miss the sort of interesting subtleties/patterns that a system considering multiple axes of strength would pick up.