Effectiveness
I tried something like this for gun, but it didn't really work out. I ended up getting much better results using a square kernel. From Wikipedia, if you assume that your distribution is Gaussian then the optimal bandwidth would be:
- <math>h = \left(\frac{4\hat{\sigma}^5}{3n}\right)^{\frac{1}{5}} \approx 1.06 \hat{\sigma} n^{-1/5}</math>, where <math>\hat{\sigma}</math> is the standard deviation of the samples and <math>n</math> is the number of samples.
Perhaps this would work well for movement, where there is much less data to work with. It might also be necessary to add some sanity checks to the calculated h value in case there are only 1 or 2 samples, etc.
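A minimal sketch of that rule-of-thumb bandwidth with the sanity checks mentioned above. The class name, the clamp limits and the fallback value are all hypothetical placeholders chosen for illustration, not values taken from any existing bot:

```java
/** Rule-of-thumb bandwidth for guess-factor samples (illustrative sketch only). */
final class RuleOfThumbBandwidth {
    // Hypothetical limits in GF units; these are assumptions to be tuned.
    static final double MIN_H = 0.02;
    static final double MAX_H = 0.5;
    static final double DEFAULT_H = 0.2; // used when there are too few samples

    /** Sample standard deviation (n - 1 in the denominator). */
    static double sampleStdDev(double[] xs) {
        int n = xs.length;
        if (n < 2) return Double.NaN; // undefined for fewer than 2 samples
        double mean = 0;
        for (double x : xs) mean += x;
        mean /= n;
        double ss = 0;
        for (double x : xs) ss += (x - mean) * (x - mean);
        return Math.sqrt(ss / (n - 1));
    }

    /** h = (4 * sigma^5 / (3 * n))^(1/5), clamped to [MIN_H, MAX_H]. */
    static double bandwidth(double[] gfs) {
        int n = gfs.length;
        if (n < 3) return DEFAULT_H; // sanity check: 0, 1 or 2 samples
        double sigma = sampleStdDev(gfs);
        double h = Math.pow(4.0 * Math.pow(sigma, 5) / (3.0 * n), 0.2);
        // Identical samples give sigma = 0 and h = 0, so clamp from below too.
        return Math.max(MIN_H, Math.min(MAX_H, h));
    }
}
```

The lower clamp matters as much as the small-sample fallback: a handful of nearly identical samples would otherwise produce a bandwidth close to zero.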
Of course, I'm fairly sure our distributions are not at all Gaussian, or even unimodal, so this formula might not be relevant at all.
I've considered something like this before, but when you have fewer than 20 samples or so, your estimate of the standard deviation itself is going to have a large amount of uncertainty. I suspect one would need to determine the "typical" standard deviation for most bots, use that as the initial value, and slowly transition to a value calculated from the data.
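One way that transition could look, as a rough sketch: treat the "typical" standard deviation as if it came from a fixed number of pseudo-samples and blend variances by sample count. The prior value `PRIOR_SIGMA` and the weight `PRIOR_WEIGHT` below are made-up numbers for illustration, and the blending rule itself is just one assumption among several that would work:

```java
/** Blends a prior ("typical") sigma with the sample sigma (illustrative sketch only). */
final class BlendedSigma {
    static final double PRIOR_SIGMA = 0.4;   // hypothetical "typical" std dev in GF units
    static final double PRIOR_WEIGHT = 20.0; // hypothetical: prior counts as 20 pseudo-samples

    static double blendedSigma(double[] gfs) {
        int n = gfs.length;
        if (n < 2) return PRIOR_SIGMA; // sample variance is undefined here
        double mean = 0;
        for (double x : gfs) mean += x;
        mean /= n;
        double ss = 0;
        for (double x : gfs) ss += (x - mean) * (x - mean);
        double sampleVar = ss / (n - 1);
        // Weight on the data grows from near 0 toward 1 as n increases.
        double w = n / (n + PRIOR_WEIGHT);
        double var = (1.0 - w) * PRIOR_SIGMA * PRIOR_SIGMA + w * sampleVar;
        return Math.sqrt(var);
    }
}
```

With these numbers the prior and the data carry equal weight at 20 samples, and the result can then be fed into the bandwidth formula in place of the raw sample standard deviation.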
Regarding the distribution not being Gaussian, indeed it wouldn't be... but I think that formula may still be roughly applicable if we apply a correction factor for how the uncertainty gets smaller near the maximum escape angle, perhaps modeled on the behavior of the binomial distribution near the edges.
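One possible reading of the binomial analogy, sketched under my own assumptions: a binomial proportion p has a standard deviation proportional to sqrt(p(1-p)), and mapping a guess factor in [-1, 1] to p = (gf + 1) / 2 gives sqrt(1 - gf^2) / 2, which is largest at GF 0 and shrinks to zero at the maximum escape angle. Normalizing so the factor is 1 at the center gives the scaling below. The `FLOOR` constant is a hypothetical guard so the bandwidth never collapses to zero at the edges:

```java
/** Shrinks a base bandwidth toward the edges of the GF range (illustrative sketch only). */
final class EdgeCorrectedBandwidth {
    static final double FLOOR = 0.2; // hypothetical minimum scale factor at |gf| = 1

    /** Returns baseH * sqrt(1 - gf^2), never less than baseH * FLOOR. */
    static double bandwidthAt(double baseH, double gf) {
        double g = Math.max(-1.0, Math.min(1.0, gf)); // clamp to the valid GF range
        double factor = Math.sqrt(1.0 - g * g);       // binomial-style sqrt(p(1-p)), normalized
        return baseH * Math.max(FLOOR, factor);
    }
}
```

Whether this is the right shape for real movement data is an open question; it only shows how such a correction could be layered on top of the rule-of-thumb value.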
I actually think this formula might not adapt to multimodal distributions very well at all, because it will try to fit the bandwidth to one big, centered danger. That may not be a reasonable assumption, considering things like segmented VCS buffers and different ways of measuring similar attributes (vel+offset vs latvel+advvel comes to mind) in the gun systems we're trying to dodge.
Maybe it would be more effective to cluster the logs into different groups based on their GFs and then use this formula on each group, so that each set of logs has its own bandwidth in the final kernel density estimate.
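A rough sketch of that idea, with several assumptions of mine: the clustering is a crude 1-D split wherever neighbouring GFs are further apart than a hypothetical `GAP` threshold, each cluster gets the 1.06 σ n^(-1/5) bandwidth with a hypothetical floor `MIN_H`, and the kernel is Gaussian (a square kernel could be swapped in):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** KDE over GFs with one bandwidth per cluster (illustrative sketch only). */
final class ClusteredKde {
    static final double GAP = 0.25;   // hypothetical split threshold in GF units
    static final double MIN_H = 0.05; // hypothetical bandwidth floor

    /** Sorts the samples and splits them wherever neighbours are more than GAP apart. */
    static List<double[]> cluster(double[] gfs) {
        double[] sorted = gfs.clone();
        Arrays.sort(sorted);
        List<double[]> clusters = new ArrayList<>();
        int start = 0;
        for (int i = 1; i <= sorted.length; i++) {
            if (i == sorted.length || sorted[i] - sorted[i - 1] > GAP) {
                clusters.add(Arrays.copyOfRange(sorted, start, i));
                start = i;
            }
        }
        return clusters;
    }

    /** Rule-of-thumb bandwidth for one cluster, with a floor for tiny clusters. */
    static double bandwidth(double[] c) {
        int n = c.length;
        if (n < 2) return MIN_H;
        double mean = 0;
        for (double x : c) mean += x;
        mean /= n;
        double ss = 0;
        for (double x : c) ss += (x - mean) * (x - mean);
        double sigma = Math.sqrt(ss / (n - 1));
        return Math.max(MIN_H, 1.06 * sigma * Math.pow(n, -0.2));
    }

    /** Density estimate at x; every sample uses its own cluster's bandwidth. */
    static double density(double[] gfs, double x) {
        if (gfs.length == 0) return 0.0;
        double sum = 0;
        for (double[] c : cluster(gfs)) {
            double h = bandwidth(c);
            for (double s : c) {
                double u = (x - s) / h;
                sum += Math.exp(-0.5 * u * u) / (h * Math.sqrt(2.0 * Math.PI));
            }
        }
        return sum / gfs.length;
    }
}
```

For example, with samples around GF -0.75 and GF 0.55, this yields two clusters with narrow bandwidths instead of one wide bandwidth stretched across both peaks.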
Hmm... at least with sufficient data I'd agree that applying bandwidth estimation separately to different clusters would make sense. The cases where the appropriate bandwidth changes most significantly would be when there's limited data though (each additional data point doesn't change uncertainty as much once there are already many data points).