Mahalanobis Distance

Jump to navigation Jump to search
Revision as of 19 October 2012 at 16:53.
The highlighted comment was created in this revision.

Mahalanobis Distance

So, I was reading about Mahalanobis distance and it sounds as if it is designed for classifying data that have a definite center about which they are clustered. Has anyone tried using this with robocode?

    AW00:16, 19 October 2012

    I've thought about it, but it needs a covariance matrix and it is slow to build a covariance matrix of the entire tree. There might be faster approximate methods though...

      Skilgannon10:08, 19 October 2012
       

      Hmm, Mahalanobis distance looks interesting. I have considered using principal component analysis before, which I'd expect to give somewhat similar results (i.e. both normalize for covariance), but never got around to trying it.

      Regarding performance issues building a covariance matrix for the entire tree, I also think that doing research into incremental methods of building/updating a covariance matrix may be key to making mahalanobis distance practical for robocode usage.

      I also wonder if it may make the most sense to make smaller covariance matrices based on either the most recent X data points, or certain regions in the data, etc. They would be faster to compute, and may be able to take advantage of how the ideal weight of each axis can vary over the search space.

      Edit: Oh, one other thought that comes to mind is... perhaps it would make sense to precompute a covariance matrix for your targeting attributes, based on a large data set from battles with many robots? It wouldn't be as nice as computing it on-the-fly for what you're against, but I'd expect it to have some positive impact anyway.

        Rednaxela15:16, 19 October 2012
         

        I've spent many a CPU-hour experimenting with quite a few clustering algorithms and never been able to exceed the performance of KNN for Robocode targeting, and KNN is also much faster. So I'm a little skeptical at this point about approaches that assume a data point falls into a specific cluster. But this is interesting stuff... definitely WaveSim (or similar) makes testing this kind of thing a lot more feasible.

          Voidious18:53, 19 October 2012