Mahalanobis Distance


So, I was reading about Mahalanobis distance and it sounds as if it is designed for classifying data that have a definite center about which they are clustered. Has anyone tried using this with Robocode?

AW 23:16, 18 October 2012

I've thought about it, but it needs a covariance matrix and it is slow to build a covariance matrix of the entire tree. There might be faster approximate methods though...

Skilgannon 09:08, 19 October 2012
 

Hmm, Mahalanobis distance looks interesting. I have considered using principal component analysis before, which I'd expect to give somewhat similar results (i.e. both normalize for covariance), but never got around to trying it.

Regarding the performance issues of building a covariance matrix for the entire tree, I also think that research into incremental methods of building/updating a covariance matrix may be key to making Mahalanobis distance practical for Robocode usage.

I also wonder if it may make the most sense to build smaller covariance matrices based on either the most recent X data points, or on certain regions of the data, etc. They would be faster to compute, and may be able to take advantage of how the ideal weight of each axis can vary over the search space.

Edit: Oh, one other thought that comes to mind is... perhaps it would make sense to precompute a covariance matrix for your targeting attributes, based on a large data set from battles with many robots? It wouldn't be as nice as computing it on the fly for whoever you're up against, but I'd expect it to have some positive impact anyway.
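To make the incremental idea above concrete, here's a rough Java sketch (class and method names are hypothetical, not from any existing bot) of an online covariance update in the style of Welford's algorithm. Each new data point costs O(d^2), so the matrix never has to be rebuilt from the whole tree:

```java
// Sketch only: online (incremental) covariance matrix, updated one point
// at a time. Hypothetical names, not from any published Robocode bot.
public class OnlineCovariance {
    private final int d;
    private long n = 0;
    private final double[] mean;
    private final double[][] comoment; // running sum of deviation outer products

    public OnlineCovariance(int dims) {
        d = dims;
        mean = new double[d];
        comoment = new double[d][d];
    }

    /** Fold in one observation; O(d^2) instead of an O(n*d^2) rebuild. */
    public void add(double[] x) {
        n++;
        double[] delta = new double[d]; // deviation from the OLD mean
        for (int i = 0; i < d; i++) {
            delta[i] = x[i] - mean[i];
            mean[i] += delta[i] / n;
        }
        for (int i = 0; i < d; i++) {
            double delta2i = x[i] - mean[i]; // deviation from the NEW mean
            for (int j = 0; j < d; j++) {
                comoment[i][j] += delta[j] * delta2i;
            }
        }
    }

    /** Sample covariance matrix; needs n >= 2 observations. */
    public double[][] covariance() {
        double[][] cov = new double[d][d];
        for (int i = 0; i < d; i++)
            for (int j = 0; j < d; j++)
                cov[i][j] = comoment[i][j] / (n - 1);
        return cov;
    }
}
```

This matches the batch sample covariance exactly, so it could be kept up to date alongside the tree as waves break, rather than recomputed over all stored points.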

Rednaxela 14:16, 19 October 2012
 

I've spent many a CPU-hour experimenting with quite a few clustering algorithms and never been able to exceed the performance of KNN for Robocode targeting, and KNN is also much faster. So I'm a little skeptical at this point about approaches that assume a data point falls into a specific cluster. But this is interesting stuff... definitely WaveSim (or similar) makes testing this kind of thing a lot more feasible.

Voidious 17:53, 19 October 2012

Mahalanobis distance can be combined with KNN search. Swap the point-distance calculation from Euclidean to Mahalanobis and you have it.

Another way of viewing Mahalanobis distance is that it rotates the data space, or has "diagonal" weights, which is good when the classifiers are not completely independent of each other.

For example, lateral velocity and vertical velocity: if lateral velocity is 8, vertical velocity is always 0, so coordinates where both velocities are 8 are always empty. Mahalanobis distance then uses a "diagonal" weight, so a point with 5.6 lateral velocity and 5.6 vertical velocity (5.6, 5.6) is normalized as (8, 8), while (8, 0) is still normalized as (8, 0).
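As a concrete sketch of that swap (hypothetical names, and assuming the inverse covariance matrix has already been computed, e.g. incrementally as discussed above), the distance function a KNN search would call could look like:

```java
// Sketch only: squared Mahalanobis distance for use as a drop-in
// replacement for squared Euclidean distance in a KNN search.
// Hypothetical class name; assumes the inverse covariance is precomputed.
public final class MahalanobisDistance {
    private final double[][] invCov; // inverse of the covariance matrix, S^-1

    public MahalanobisDistance(double[][] inverseCovariance) {
        invCov = inverseCovariance;
    }

    /** Squared Mahalanobis distance: (a-b)^T * S^-1 * (a-b). */
    public double distanceSq(double[] a, double[] b) {
        int d = a.length;
        double[] diff = new double[d];
        for (int i = 0; i < d; i++) diff[i] = a[i] - b[i];
        double sum = 0;
        for (int i = 0; i < d; i++) {
            double row = 0;
            for (int j = 0; j < d; j++) row += invCov[i][j] * diff[j];
            sum += row * diff[i];
        }
        return sum;
    }
}
```

With the identity matrix as S^-1 this reduces to plain squared Euclidean distance; the off-diagonal entries are what produce the "diagonal" weighting described above.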

MN 21:23, 19 October 2012
 

I've also been a bit wary of class-based methods as a solution for our problem, which is actually pure regression. However, I do have one idea which might just make using classes worth it - spectral clustering. On certain types of datasets spectral clustering can get ~30% better performance than pure convex-Euclidean-geometry-based classification systems, and Robocode, with its patterns and limited possibilities for positioning setups, might just be one of those. The problem is getting spectral clustering to run at a reasonable speed - it depends on getting the eigenvectors of an NxN matrix (and never mind populating that matrix with e^x values first), which is O(n^3) and, for us with our 20k data points, isn't really feasible without some major modifications. We could use a sparse matrix and lots of kNN instead of a brute force for the population of that matrix, but it's still going to be horribly slow.

If anybody wants to mess around with spectral clustering here is a good paper on it (yes, I've been thinking about this for a while): [1] and you'll need a matrix library like EJML or Colt - EJML is faster for dense matrices but Colt supports sparse matrices which might be a necessary component of our solution to this problem.
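For the sparse-matrix idea, a rough sketch (hypothetical names; the inner brute-force scan here just stands in for a real kNN tree search) of populating only k Gaussian-kernel affinities per row instead of all N:

```java
import java.util.*;

// Sketch only: build a sparse affinity "matrix" row by row, keeping only
// the k nearest neighbors per point with the Gaussian kernel
// exp(-dist^2 / (2*sigma^2)). Hypothetical names; the naive scan below
// would be replaced by a kD-tree kNN query in practice.
public final class SparseAffinity {
    /** For each point, a map from neighbor index to affinity. */
    public static List<Map<Integer, Double>> build(double[][] points, int k, double sigma) {
        int n = points.length;
        List<Map<Integer, Double>> rows = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Integer[] idx = new Integer[n];
            double[] d2 = new double[n];
            for (int j = 0; j < n; j++) {
                idx[j] = j;
                double s = 0;
                for (int a = 0; a < points[i].length; a++) {
                    double diff = points[i][a] - points[j][a];
                    s += diff * diff;
                }
                d2[j] = s; // squared Euclidean distance to point j
            }
            Arrays.sort(idx, Comparator.comparingDouble(j -> d2[j]));
            Map<Integer, Double> row = new HashMap<>();
            for (int r = 1; r <= k && r < n; r++) { // r = 0 is the point itself
                int j = idx[r];
                row.put(j, Math.exp(-d2[j] / (2 * sigma * sigma)));
            }
            rows.add(row);
        }
        return rows;
    }
}
```

A row map like this translates directly into Colt's sparse matrix types; the expensive eigenvector step is still the bottleneck, as noted above.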

As always, an incremental solution would probably be best =) And sorry to hijack a Mahalanobis thread!

Skilgannon 18:34, 19 October 2012
 


Because the weights are always positive, it should actually be possible to use a kD-tree for Mahalanobis distance. Bounding box calculations can be done exactly as before; they just need to be adjusted by the weights.
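One alternative sketch that avoids touching the tree's bounding-box code at all (hypothetical names, and assuming the covariance is fixed before the tree is built): Cholesky-factor the inverse covariance as S^-1 = L*L^T and store the whitened points y = L^T*x in an ordinary Euclidean kD-tree, since then Euclidean distance between whitened points equals Mahalanobis distance between the originals.

```java
// Sketch only: whitening transform so an unmodified Euclidean kD-tree
// effectively searches by Mahalanobis distance. Hypothetical names;
// assumes a fixed, symmetric positive-definite inverse covariance.
public final class Whitener {
    private final double[][] lt; // L^T from Cholesky of S^-1 = L * L^T

    public Whitener(double[][] invCov) {
        int d = invCov.length;
        double[][] l = new double[d][d];
        for (int i = 0; i < d; i++) {      // basic Cholesky decomposition
            for (int j = 0; j <= i; j++) {
                double s = invCov[i][j];
                for (int k = 0; k < j; k++) s -= l[i][k] * l[j][k];
                l[i][j] = (i == j) ? Math.sqrt(s) : s / l[j][j];
            }
        }
        lt = new double[d][d];             // store the transpose
        for (int i = 0; i < d; i++)
            for (int j = 0; j < d; j++) lt[i][j] = l[j][i];
    }

    /** Map a point into the whitened space the kD-tree indexes. */
    public double[] whiten(double[] x) {
        int d = x.length;
        double[] y = new double[d];
        for (int i = 0; i < d; i++) {
            double s = 0;
            for (int j = 0; j < d; j++) s += lt[i][j] * x[j];
            y[i] = s;
        }
        return y;
    }
}
```

The catch is that if the covariance is updated incrementally, the stored points would need re-whitening (or the tree rebuilding), so the weight-adjusted bounding-box approach above may fit a live bot better.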

Skilgannon 21:39, 19 October 2012