Effectiveness


I tried something like this for my gun, but it didn't really work out. I ended up getting much better results using a square kernel. From Wikipedia, if you assume that your distribution is Gaussian, then the optimal bandwidth would be:

<math>h = \left(\frac{4\hat{\sigma}^5}{3n}\right)^{\frac{1}{5}} \approx 1.06 \hat{\sigma} n^{-1/5}</math>, where <math>\hat{\sigma}</math> is the standard deviation of the samples and <math>n</math> is the number of samples.

Perhaps this would work well for movement, where there is much less data to work with. It might also be necessary to add some sanity checks to the calculated h value in case there are only one or two samples, etc.
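
For concreteness, here is a rough Java sketch of that rule of thumb plus the sanity check (the class and method names are just illustrative, and the fallback value is an arbitrary choice, not something I've tuned):

<syntaxhighlight lang="java">
// Rough sketch of Silverman's rule-of-thumb bandwidth with a tiny-sample fallback.
public class BandwidthEstimator {

    /** Returns h = 1.06 * sigma * n^(-1/5), or a fixed fallback when sigma can't be estimated. */
    public static double silvermanBandwidth(double[] samples, double fallback) {
        int n = samples.length;
        if (n < 2) {
            // Not enough data to estimate a spread, so use a fixed fallback bandwidth.
            return fallback;
        }
        double mean = 0;
        for (double s : samples) {
            mean += s;
        }
        mean /= n;
        double variance = 0;
        for (double s : samples) {
            variance += (s - mean) * (s - mean);
        }
        variance /= (n - 1);
        double sigma = Math.sqrt(variance);
        if (sigma == 0) {
            // All samples identical; again fall back to a fixed bandwidth.
            return fallback;
        }
        return 1.06 * sigma * Math.pow(n, -0.2);
    }
}
</syntaxhighlight>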

Of course, I'm fairly sure our distributions are not at all Gaussian, or even uni-modal, so this formula might not be relevant at all.

Skilgannon 12:00, 23 September 2012

I actually think this formula might not adapt to multimodal distributions very well at all, because it will try to fit the bandwidth to one big, centered danger. That may not be a reasonable assumption to make, considering things like segmented VCS buffers and the different ways of measuring similar attributes (vel+offset vs latvel+advvel comes to mind) in the gun systems we're trying to dodge.

Maybe it would be more effective to cluster the logs into groups based on their GFs and then apply this formula to each cluster, so that each set of logs gets its own bandwidth in the final kernel density estimate.
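
Roughly, I'm picturing something like this sketch (again, names are made up, and it assumes the logs have already been split into clusters somehow, and that a Gaussian kernel is used for the estimate):

<syntaxhighlight lang="java">
// Sketch: give each cluster of logged GFs its own Silverman bandwidth,
// then sum Gaussian kernels over all clusters for the final density estimate.
public class ClusteredKde {

    /** Density at guess factor x, given pre-clustered GF logs. */
    public static double density(double x, java.util.List<double[]> clusters, double fallbackBandwidth) {
        double density = 0;
        int total = 0;
        for (double[] cluster : clusters) {
            total += cluster.length;
        }
        for (double[] cluster : clusters) {
            double h = BandwidthEstimator.silvermanBandwidth(cluster, fallbackBandwidth);
            for (double gf : cluster) {
                double u = (x - gf) / h;
                // Gaussian kernel, normalized so every logged GF contributes equal weight.
                density += Math.exp(-0.5 * u * u) / (h * Math.sqrt(2 * Math.PI));
            }
        }
        return total > 0 ? density / total : 0;
    }
}
</syntaxhighlight>

(Here every logged GF is weighted equally; weighting each cluster equally instead would be a different design choice.)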

Skilgannon 16:58, 23 September 2012

Hmm... at least with sufficient data, I'd agree that applying bandwidth estimation separately to different clusters would make sense. The cases where the appropriate bandwidth changes most significantly are when there's limited data, though (each additional data point changes the uncertainty less once there are already many data points).

Rednaxela 17:05, 23 September 2012