Outlier resistant APS system

Fair enough about the skewed distributions. I am still concerned about medians being noisier than means, though.

The more sophisticated method I had in mind was to calculate the [[wikipedia:Standard score|z-score]] of each sample within a pairing, discard the samples whose z-score is too extreme, and use the mean of the remaining samples. This appeals to me because it changes the existing scoring system as little as possible.

Most bad results we see are near-zero scores, which a z-score test should detect quite distinctly, so I think reliably tossing them out without changing the overall scoring system would be quite doable.
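To make the idea concrete, here is a rough sketch of the z-score filter for a single pairing. It is only an illustration: the function name <code>trimmed_pairing_score</code>, the cutoff of 2.0, and the sample scores are all made up for the example, and nothing here reflects how the rumble server actually stores or averages results.

```python
from statistics import mean, stdev


def trimmed_pairing_score(samples, max_abs_z=2.0):
    """Mean of one pairing's samples after dropping extreme z-scores.

    `samples` is a non-empty list of per-battle scores for a single pairing.
    `max_abs_z` is a hypothetical cutoff, not a value anyone has agreed on.
    """
    # With fewer than 3 samples a z-score test tells us nothing useful,
    # so fall back to the plain mean (the current behaviour).
    if len(samples) < 3:
        return mean(samples)

    mu = mean(samples)
    sigma = stdev(samples)  # sample standard deviation

    # All samples identical: nothing can be an outlier.
    if sigma == 0:
        return mu

    # Keep only the samples whose z-score is within the cutoff.
    kept = [s for s in samples if abs(s - mu) / sigma <= max_abs_z]

    # Guard against a cutoff so strict that it rejects everything.
    return mean(kept) if kept else mu


# Invented scores for one pairing, with one near-zero result (0.3).
scores = [78.2, 81.5, 79.9, 80.4, 0.3, 82.1, 79.0, 80.8]
print(round(mean(scores), 2))                   # plain mean, dragged down
print(round(trimmed_pairing_score(scores), 2))  # mean with 0.3 discarded
```

One caveat with this approach: when the z-score uses the sample standard deviation, a single outlier among n samples can never have a z-score larger than (n-1)/sqrt(n). With a cutoff of 2.0 that means nothing can be rejected until a pairing has at least 6 samples, so the cutoff would need to be chosen with the typical number of battles per pairing in mind.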