User:Duyn/kd-tree Tutorial

From Robowiki
< User:Duyn
Revision as of 05:03, 27 February 2010 by Duyn (talk | contribs) (Beginning of a new tutorial.)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search

[this is the beginning of a tutorial on writing a k-d tree].

This tutorial will walk you through the process of writing a k-d tree for k-nearest neighbour search. The tree here will hold multiple items in each leaf, splitting only when a leaf overflows. It will split on the mean of the dimension with the largest variance. There are already several k-d trees on this wiki, see some of the others for ideas.

A k-d tree is a binary tree which successively splits a k-dimensional space in half. This lets us speed up a nearest neighbour search by not examining points in partitions which are too far away. A k-nearest neighbour search can be implemented in from a nearest neighbour algorithm by not shrinking the search radius until k items have been found. The rest of this tutorial will refer to both as nearest neighbour queries.

An Exemplar class

We will call each item in our k-d tree an exemplar. Each exemplar has a domain—its spatial co-ordinates in the k-d tree's space. Each exemplar could also have an arbitrary payload, but our tree does not need to know about that. It will only handle storing exemplars based on their domain and returning them in a nearest neighbour search.

You might already have a class somewhere called Point which handles 2D co-ordinates. This terminology avoids conflict with that.

public class Exemplar {
  public final double[] domain;

  public Exemplar(final double[] coords) {
    this.domain = coords;
  }

  // Short hand. Shorter than calling Arrays.equals() each time.
  public boolean domainEquals(final Exemplar other) {
    return Arrays.equals(domain, other.domain);
  }
}

While this class is fully usable as is, rarely will you be interested in just the domain of nearest neighbours in a search. It is expected that specific data (eg. guess factors) will be loaded by sub-classing this Exemplar class. Our k-d tree will be parameterised based on this expectation.

...TBC...