Effectiveness


I tried something like this for my gun, but it didn't really work out. I ended up getting much better results using a square kernel. From Wikipedia, if you assume that your distribution is Gaussian, then the optimal bandwidth would be:

<math>h = \left(\frac{4\hat{\sigma}^5}{3n}\right)^{\frac{1}{5}} \approx 1.06 \hat{\sigma} n^{-1/5}</math>, where <math>\hat{\sigma}</math> is the standard deviation of the samples and <math>n</math> is the number of samples.

Perhaps this would work well for movement, where there is much less data to work with. It might also be necessary to add some sanity checks to the calculated h value in case there are only one or two samples, etc.
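
For concreteness, here is a rough Java sketch of that rule of thumb plus the sanity check (the class and method names are just illustrative, and the fallback value is an arbitrary choice, not something I've tuned):

<syntaxhighlight lang="java">
// Rough sketch of Silverman's rule-of-thumb bandwidth with a tiny-sample fallback.
public class BandwidthEstimator {

    /** Returns h = 1.06 * sigma * n^(-1/5), or a fixed fallback when sigma can't be estimated. */
    public static double silvermanBandwidth(double[] samples, double fallback) {
        int n = samples.length;
        if (n < 2) {
            // Not enough data to estimate a spread, so use a fixed fallback bandwidth.
            return fallback;
        }
        double mean = 0;
        for (double s : samples) {
            mean += s;
        }
        mean /= n;
        double variance = 0;
        for (double s : samples) {
            variance += (s - mean) * (s - mean);
        }
        variance /= (n - 1);
        double sigma = Math.sqrt(variance);
        if (sigma == 0) {
            // All samples identical; again fall back to a fixed bandwidth.
            return fallback;
        }
        return 1.06 * sigma * Math.pow(n, -0.2);
    }
}
</syntaxhighlight>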

Of course, I'm fairly sure our distributions are not at all Gaussian, or even uni-modal, so this formula might not be relevant at all.

Skilgannon 12:00, 23 September 2012

I actually think this formula might not adapt to multimodal distributions very well at all, because it will try to fit the bandwidth to one big, centered danger. That may not be a reasonable assumption to make, considering things like segmented VCS buffers and the different ways of measuring similar attributes (vel+offset vs latvel+advvel comes to mind) in the gun systems we're trying to dodge.

Maybe it would be more effective to cluster the logs into groups based on their GFs and then apply this formula to each cluster, so that each set of logs gets its own bandwidth in the final kernel density estimate.
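
Roughly, I'm picturing something like this sketch (again, names are made up, and it assumes the logs have already been split into clusters somehow, and that a Gaussian kernel is used for the estimate):

<syntaxhighlight lang="java">
// Sketch: give each cluster of logged GFs its own Silverman bandwidth,
// then sum Gaussian kernels over all clusters for the final density estimate.
public class ClusteredKde {

    /** Density at guess factor x, given pre-clustered GF logs. */
    public static double density(double x, java.util.List<double[]> clusters, double fallbackBandwidth) {
        double density = 0;
        int total = 0;
        for (double[] cluster : clusters) {
            total += cluster.length;
        }
        for (double[] cluster : clusters) {
            double h = BandwidthEstimator.silvermanBandwidth(cluster, fallbackBandwidth);
            for (double gf : cluster) {
                double u = (x - gf) / h;
                // Gaussian kernel, normalized so every logged GF contributes equal weight.
                density += Math.exp(-0.5 * u * u) / (h * Math.sqrt(2 * Math.PI));
            }
        }
        return total > 0 ? density / total : 0;
    }
}
</syntaxhighlight>

(Here every logged GF is weighted equally; weighting each cluster equally instead would be a different design choice.)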

Skilgannon 16:58, 23 September 2012

Hmm... at least with sufficient data, I'd agree that applying bandwidth estimation separately to different clusters would make sense. The cases where the appropriate bandwidth changes most significantly are when there's limited data, though (each additional data point changes the uncertainty less once there are already many data points).

Rednaxela 17:05, 23 September 2012