Mahalanobis Distance

Fragment of a discussion from User talk:AW/KNN
Jump to navigation Jump to search

I tried. Found 2 problems in using mahalanobis distance instead of euclidean distance.

First problem is finding a covariance matrix, which is the mahalanobis equivalent of a weight vector. I tried using sampled covariance matrixes, but they are not always valid. You need a positive semi-definite covariance matrix to use it in mahalanobis distance calculations. There are methods for finding a nearest positive semi-definite matrix from a sampled matrix, but the methods are too slow to be calculated online.

Second problem is optimizing KNN search. Methods like kd-trees only work with euclidean distance. I don´t know of any data structure designed for mahalanobis distances and which supports online learning at the same time.

The first problem can be addressed with genetic tuning to find a valid covariance matrix offline, but I didn´t try it yet. In theory it can be stronger than euclidean distance search due to higher amount of weights to tune.

The second problem is not a serious problem if the data set is not too big. I managed to make a bot work using only real waves (no tick waves), a sampled covariance matrix and simple linear KNN search without skipping turns.

The problem is the matrix was sometimes valid, sometimes not. It fallbacked to euclidean distance (using only the main diagonal) whenever the matrix became invalid. The bot performance decreased and I never uploaded it to the rumble. But again, genetic tuning with an offline covariance matrix may fix it.

MN21:53, 19 October 2012

Because the weights are always positive it should actually be possible to use a kD-Tree for Mahalanobis distance. Bounding box calculations can be done exactly like before, they just need to be adjusted by the weights.

Skilgannon22:39, 19 October 2012