Problem Bot Index

I'm not sure, but I think PBI with Elo/Glicko was based on some magic formula relating the two ratings. Maybe an APS-based measure would just be the average score your ranking neighbors get against that bot. So if you're ranked #25, the average score that ranks 15-35 get against bot B is your expected score.

Of course, if you're #1, the window can only cover ranks 2-11, but that's probably still useful info. And in that case (or in every case), you could shift everything so that your average PBI is still 0.
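Something like this minimal sketch is what I have in mind (all names are mine, and it assumes a dense rank-indexed table of percentage scores; window = 10 gives the ranks 15-35 example):

    // scores[rank][b] = percentage score of the bot at 'rank' against bot b.
    static double expectedScore(double[][] scores, int myRank, int b, int window) {
        int lo = Math.max(0, myRank - window);          // clamp at rank #1
        int hi = Math.min(scores.length - 1, myRank + window);
        double sum = 0.0;
        int count = 0;
        for (int rank = lo; rank <= hi; rank++) {
            if (rank == myRank) continue;               // skip our own pairing
            sum += scores[rank][b];
            count++;
        }
        return sum / count;
    }

    // APS-style PBI vs bot b: actual score minus the neighbors' average.
    static double apsPbi(double[][] scores, int myRank, int b, int window) {
        return scores[myRank][b] - expectedScore(scores, myRank, b, window);
    }

The clamping at the ends is what handles the #1 case, and subtracting the rumble-wide mean of apsPbi afterwards would re-center the average at 0.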

    Voidious 21:17, 30 May 2012

    PBI is the difference between the expected score and the real score. The expected score is based on the difference between the two ratings:

    Zero difference is a 50% expected score
    A -800 difference is 1:20 odds, which is 1/(20+1) or a 4.76% expected score
    A -1600 difference is 1:20^2 odds, or about a 0.25% expected score
    An infinite negative difference is a 0% expected score
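    In formula form, one curve consistent with those numbers is 20:1 odds per 800 rating points; this is my reconstruction from the examples above, not necessarily the rumble's exact code:

        // Expected score (0..1) for diff = yourRating - theirRating.
        // Reproduces the examples: 0 -> 50%, -800 -> 4.76%, -1600 -> 0.25%.
        static double expectedScore(double diff) {
            return 1.0 / (1.0 + Math.pow(20.0, -diff / 800.0));
        }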
      MN 23:23, 30 May 2012

      Yeah, I was trying to think of a way of handling this elegantly without having to resort to a KNN-type lookup or a full Elo calculation. I was thinking of something like:

      Expected_for_bot_a = (bot_a_APS + (100 - bot_b_APS))/2

      E.g.: if BotA has an APS of 70% and BotB 30%, it predicts 70%/30%, which seems intuitive to me. If BotA has an APS of 80% and BotB 80%, it predicts 50%/50% exactly. If BotA has an APS of 80% and BotB 60%, it predicts 60%/40%, which seems OK.

      I think the trouble with this is that it assumes a linear relationship between average score and pairwise score. I suspect the relationship is more sigmoidal, because once the low-hanging fruit has been taken there is less score left to draw from. Because of this, a modified version of the formula, something like Expected_for_bot_a = ((bot_a_APS^Q + (100 - bot_b_APS)^Q)/2)^(1/Q) for some magic value of Q, would probably be a better fit.
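      Both predictors in code, as a sketch (Q is the magic exponent and still needs tuning; nothing here is tested):

          // Linear prediction from the two bots' APS values (percent, 0-100).
          static double expectedLinear(double apsA, double apsB) {
              return (apsA + (100.0 - apsB)) / 2.0;
          }

          // Power-mean variant: a generalized mean of apsA and (100 - apsB).
          // Q = 1 reduces to the linear version; Q > 1 weights the larger
          // of the two terms more heavily.
          static double expectedPowerMean(double apsA, double apsB, double q) {
              double a = Math.pow(apsA, q);
              double b = Math.pow(100.0 - apsB, q);
              return Math.pow((a + b) / 2.0, 1.0 / q);
          }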

      I've added a simple 'Vote' rankings page, where each bot votes for its worst pairing. The majority of bots don't get anything, predictably, but this is interesting for comparing who does the best. Again, this is a winner-takes-all ranking: if a bot's worst pairing is an opponent that scored 80% against it, then opponents that scored 79.9% and 50% are treated identically (no vote at all), and this makes me uncomfortable as there is clearly lost information. Perhaps I should change it so that every bot gets a vote of weight 100*pair%/worst pair%, but I'll leave it as it is for a day or so.
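      Reading pair% as the opponent's score in the pairing (my interpretation of the example above), the per-voter weights could be computed roughly like this, with the worst pairing getting exactly 100:

          import java.util.HashMap;
          import java.util.Map;

          // opponentScores: opponent name -> that opponent's % score against
          // the voting bot. Hypothetical names throughout.
          static Map<String, Double> voteWeights(Map<String, Double> opponentScores) {
              double worst = 0.0; // worst pairing = opponent's highest score
              for (double s : opponentScores.values()) {
                  worst = Math.max(worst, s);
              }
              Map<String, Double> weights = new HashMap<>();
              for (Map.Entry<String, Double> e : opponentScores.entrySet()) {
                  weights.put(e.getKey(), worst == 0.0 ? 0.0 : 100.0 * e.getValue() / worst);
              }
              return weights;
          }

      Summing each opponent's weights over all voters would then give the ranking.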

      The batch pairings get updated once an hour for any rumble which has had battles since the last batch run.

        Skilgannon 09:40, 31 May 2012

        Thinking on this more, I actually really like the KNN idea. It's the only one that really tells you "you can and should be doing better against this bot", as opposed to "this bot might just have a weird score profile". (RamBots are the perfect/extreme example of this - they can show up as Problem Bots even if you're doing well against them.)

        I know when I'm trying to figure out who I could do better against, I don't look at PBI; I compare to DrussGT. ;) I understand it would be a lot of calculations, but it should still be simple to code up, and it's all just basic math operations.

          Voidious 14:16, 31 May 2012

          Another thought: if you already have the best score anyone gets against each bot, a useful number might be that score minus your score. Calling it "PBI" would be a misnomer, but it tells you how much room you have to improve.
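          As a quick sketch of that (the map names are hypothetical), sorting the gaps surfaces where the most headroom is:

              import java.util.*;

              // bestScores/myScores: bot name -> percentage score (best in the
              // rumble against that bot, and your own score against it).
              static List<Map.Entry<String, Double>> headroom(
                      Map<String, Double> bestScores, Map<String, Double> myScores) {
                  List<Map.Entry<String, Double>> gaps = new ArrayList<>();
                  for (Map.Entry<String, Double> e : myScores.entrySet()) {
                      double gap = bestScores.get(e.getKey()) - e.getValue();
                      gaps.add(Map.entry(e.getKey(), gap)); // biggest gap = most room
                  }
                  gaps.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));
                  return gaps;
              }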

            Voidious 14:21, 31 May 2012