Outlier resistant APS system

Jump to navigation Jump to search
Revision as of 16 February 2012 at 16:52.
The highlighted comment was created in this revision.

Outlier resistant APS system

Bad uploads are becoming a recurrent issue.

But there is a way to shield the ranking from these uploads using median instead of mean when calculating pairing APS, as long as bad uploads don´t outnumber the others. The drawback would be a performance hit in CPU/database during uploads.

    MN14:18, 16 February 2012

    I think an outlier resistant APS system may be good, but I do have a concern that using the median may 1) distort scores when valid data would cause a distribution that has a skew, and 2) In the cases where there are no outliers for it to fix anyway, it would generate more noisy values (The median generally has larger fluctuations than the mean as samples are added).

    I'd think it may be worth considering statistical methods to calculate the probability of a data point being an outlier, and ignoring it if it's beyond a threshold. It may be possible for such methods to not alter the means of skew caused by valid data, and will have smaller score fluctuations.

    Another thought is, regardless of if we change the APS system or not, it may make sense for the rumble to have a page that lists recent outlier results, to make it easier to spot them.

      Rednaxela17:46, 16 February 2012
       

      Taking that a step further, you could take the probability of a data point being wrong and look at how many of those are coming from each user, and over a certain threshold, all their uploads (or all within X hours of this happening) are ignored. (Or an email is fired off to Darkcanuck to look into it. =)) I like the idea of public page listing outlier results, too.

      The median idea also seems good though, and very simple. I'd be curious to see if/how much the Rumble scores change using median instead of mean. My guess is by no noticeable amount.

        Voidious17:52, 16 February 2012