User:Duyn/kd-tree Tutorial
[Work-in-Progress]
  
This page will walk you through the implementation of a ''k''d-tree. We start with a basic tree and progressively add levels of optimisation. Writing a fully optimised ''k''d-tree is an advanced task, but starting with the basics helps. There are already other ''k''d-trees on this wiki; see some of them for more ideas.
  
 
==Theory==

A ''k''d-tree is a binary tree which successively splits a ''k''-dimensional space in half. This lets us speed up a nearest neighbour search by not examining points in partitions which are too far away. A ''k''-nearest neighbour search can be implemented from a nearest neighbour algorithm by not shrinking the search radius until ''k'' items have been found. This page won't make a distinction between the two.
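To make the pruning test concrete, here is a minimal sketch in the squared-distance convention used by the search code later on this page (the method name is ours, for illustration only):

<pre>
// Sketch only: once we already have k tentative neighbours, a
// partition on the far side of a splitting plane can be skipped
// when the gap between the query point and the plane exceeds the
// distance to the farthest tentative neighbour.
static boolean canSkipFarSide(double queryCoord, double split,
    int resultsFound, int k, double worstDistanceSq) {
  final double gap = queryCoord - split;
  return resultsFound >= k && gap*gap > worstDistanceSq;
}
</pre>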
 
  
 
==An Exemplar class==

We might already have a class elsewhere called Point which handles 2D co-ordinates. This terminology avoids conflict with that.
  
<pre>
import java.util.Arrays;

public class Exemplar {
  public final double[] domain;

  public Exemplar(final double[] coords) {
    this.domain = coords;
  }

  // Short hand. Shorter than calling Arrays.equals() each time.
  public final boolean
  collocated(final Exemplar other) {
    return Arrays.equals(domain, other.domain);
  }
}
</pre>
 
  
While this class is fully usable as is, rarely will the domain of nearest neighbours be of any interest. Often, useful data (such as [[GuessFactor]]s) will be loaded by sub-classing this Exemplar class. Our ''k''d-tree will be parameterised based on this expectation.
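For instance, a targeting gun might attach a [[GuessFactor]] to each point. A minimal sketch (this subclass is illustrative, not part of the tutorial's code):

<pre>
// Illustrative subclass: carries a guess factor alongside the domain.
public class GuessFactorExemplar extends Exemplar {
  public final double guessFactor;

  public GuessFactorExemplar(final double[] coords, double guessFactor) {
    super(coords);
    this.guessFactor = guessFactor;
  }
}
</pre>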
  
 
==Basic Tree==

First, we will build a basic ''k''d-tree as described by Wikipedia's [http://en.wikipedia.org/w/index.php?title=Kd-tree&oldid=346156055 kd-tree] page. We start off with a standard binary tree. Each tree is either a '''node''' with both left and right sub-trees defined, or a '''leaf''' carrying an exemplar. For simplicity, we won't allow nodes to contain any exemplars.
<pre>
public class BasicKdTree<X extends Exemplar> {
  // Basic tree structure
  X data = null;
  BasicKdTree<X> left = null, right = null;

  // Only need to test one branch since we always populate both
  // branches at once
  private boolean
  isTree() { return left != null; }

  ...
}
</pre>
  
===Adding===

Each of the public API functions defers to a private static method. This is to avoid accidentally referring to instance variables while we walk the tree. This is a common pattern we will use for much of the actual behaviour code for this tree.

<pre>
public void
add(X ex) {
  BasicKdTree.addToTree(this, ex);
}
</pre>

To add an exemplar, we traverse the tree from top down until we find a leaf. If the leaf doesn't already have an exemplar, we can just stow it there. Otherwise, we split the leaf and put the two exemplars (the existing one and the new one) into the new sub-trees.

To split a leaf, we cycle through split dimensions in order, as in most descriptions of ''k''d-trees. Because we only allow each leaf to hold a single exemplar, we can use a simple splitting strategy: smallest on the left, largest on the right. The split value is half way between the two points along the split dimension.
<pre>
int splitDim = 0;
double split = Double.NaN;

private static <X extends Exemplar> void
addToTree(BasicKdTree<X> tree, X ex) {
  while(tree != null) {
    if (tree.isTree()) {
      // Traverse in search of a leaf
      tree = ex.domain[tree.splitDim] <= tree.split
        ? tree.left : tree.right;
    } else {
      if (tree.data == null) {
        tree.data = ex;
      } else {
        // Split tree and add
        // Find smallest exemplar to be our split point
        final int d = tree.splitDim;
        X leftX = ex, rightX = tree.data;
        if (rightX.domain[d] < leftX.domain[d]) {
          leftX = tree.data;
          rightX = ex;
        }
        tree.split = 0.5*(leftX.domain[d] + rightX.domain[d]);

        final int nextSplitDim =
          (tree.splitDim + 1)%tree.dimensions();

        tree.left = new BasicKdTree<X>();
        tree.left.splitDim = nextSplitDim;
        tree.left.data = leftX;

        tree.right = new BasicKdTree<X>();
        tree.right.splitDim = nextSplitDim;
        tree.right.data = rightX;
      }

      // Done.
      tree = null;
    }
  }
}

private int
dimensions() { return data.domain.length; }
</pre>
  
===Searching===

Before we start coding a search method, we need a helper class to store search results along with their distance from the query point. This is called <code>PrioNode</code> because we will eventually re-use it to implement a custom priority queue.
<pre>
public final class PrioNode<T> {
  public final double priority;
  public final T data;

  public PrioNode(double priority, T data) {
    this.priority = priority;
    this.data = data;
  }
}
</pre>
  
Like with our <code>add()</code> method, our <code>search()</code> method delegates to a static method to avoid introducing bugs by accidentally referring to member variables while we descend the tree.

<pre>
public Iterable<? extends PrioNode<X>>
search(double[] query, int nResults) {
  return BasicKdTree.search(this, query, nResults);
}
</pre>

To do a nearest neighbours search, we walk the tree, preferring to search sub-trees which are on the same side of the split as the query point first. Once we have found our initial candidates, we can contract our search sphere. We only search the other sub-tree if our search sphere might spill over onto the other side of the split.

Results are collected in a <code>java.util.PriorityQueue</code> so we can easily remove the farthest exemplars as we find closer ones.
<pre>
private static <X extends Exemplar> Iterable<? extends PrioNode<X>>
search(BasicKdTree<X> tree, double[] query, int nResults) {
  final Queue<PrioNode<X>> results =
    new PriorityQueue<PrioNode<X>>(nResults,
      new Comparator<PrioNode<X>>() {

        // Max-heap on distance: the head of the queue is the
        // farthest of our tentative results, ready to be evicted
        public int
        compare(PrioNode<X> o1, PrioNode<X> o2) {
          return o1.priority == o2.priority ? 0
            : o1.priority > o2.priority ? -1
            : 1;
        }

      }
    );
  final Deque<BasicKdTree<X>> stack
    = new LinkedList<BasicKdTree<X>>();
  stack.addLast(tree);
  while (!stack.isEmpty()) {
    tree = stack.removeLast();

    if (tree.isTree()) {
      // Guess nearest tree to query point
      BasicKdTree<X> nearTree = tree.left, farTree = tree.right;
      if (query[tree.splitDim] > tree.split) {
        nearTree = tree.right;
        farTree = tree.left;
      }

      // Only search far tree if our search sphere might
      // overlap with splitting plane
      if (results.size() < nResults
        || sq(query[tree.splitDim] - tree.split)
          <= results.peek().priority)
      {
        stack.addLast(farTree);
      }

      // Always search the nearest branch
      stack.addLast(nearTree);
    } else if (tree.data != null) {
      // Leaf. The null check covers searching a tree which is
      // still empty.
      final double dSq = distanceSqFrom(query, tree.data.domain);
      if (results.size() < nResults
        || dSq < results.peek().priority)
      {
        while (results.size() >= nResults) {
          results.poll();
        }

        results.offer(new PrioNode<X>(dSq, tree.data));
      }
    }
  }
  return results;
}

private static double
sq(double n) { return n*n; }
</pre>
  
Our distance calculation is optimised because it will be called often. We use squared distances to avoid an unnecessary <code>sqrt()</code> operation. We also don't use <code>Math.pow()</code> because the JRE's default implementation must do some extra work before it can return with a simple multiplication (see the source in [http://www.netlib.org/fdlibm/e_pow.c fdlibm]).
<pre>
private static double
distanceSqFrom(double[] p1, double[] p2) {
  // Note: profiling shows this is called lots of times, so it pays
  // to be well optimised
  double dSq = 0;
  for(int d = 0; d < p1.length; d++) {
    final double dst = p1[d] - p2[d];
    if (dst != 0)
      dSq += dst*dst;
  }
  return dSq;
}
</pre>
  
And that's it. Barely 200 lines of code and we have ourselves a working ''k''d-tree.
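As a quick check that the pieces fit together, here is a small usage sketch (the co-ordinates are made up for illustration):

<pre>
// Usage sketch: build a 2-dimensional tree and query it.
BasicKdTree<Exemplar> tree = new BasicKdTree<Exemplar>();
tree.add(new Exemplar(new double[] { 0.0, 1.0 }));
tree.add(new Exemplar(new double[] { 0.5, 0.2 }));
tree.add(new Exemplar(new double[] { 0.9, 0.9 }));

// Find the 2 nearest neighbours of (0.4, 0.3). Each result carries
// its squared distance from the query point as its priority.
for (PrioNode<Exemplar> result
    : tree.search(new double[] { 0.4, 0.3 }, 2)) {
  System.out.println(
    Arrays.toString(result.data.domain) + " @ " + result.priority);
}
</pre>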
===Evaluation===

As a guide to performance, we will use the [[k-NN algorithm benchmark]] with the [http://homepages.ucalgary.ca/~agschult/robocode/gun-data-Diamond-vs-jk.mini.CunobelinDC%200.3.csv.gz Diamond vs CunobelinDC gun data]. For comparison, the benchmark included an optimised linear search to represent the lower bound of performance, and Rednaxela's tree&mdash;credited as the fastest tree on the wiki. Full results are near the bottom of the page. Here are the relevant extracts for our tree:
  
<pre>
RESULT << k-nearest neighbours search with Voidious' Linear search >>
: Average searching time       = 0.5740 miliseconds
: Average worst searching time = 2.3482 miliseconds
: Average adding time          = 0.4528 microseconds
: Accuracy                     = 100.0%

[...]

RESULT << k-nearest neighbours search with Rednaxela's kd-tree (2nd gen) >>
: Average searching time       = 0.0378 miliseconds
: Average worst searching time = 0.2625 miliseconds
: Average adding time          = 1.5687 microseconds
: Accuracy                     = 100.0%

[...]

RESULT << k-nearest neighbours search with duyn's basic kd-tree >>
: Average searching time       = 0.3939 miliseconds
: Average worst searching time = 11.5167 miliseconds
: Average adding time          = 0.5165 microseconds
: Accuracy                     = 100.0%

[...]

BEST RESULT:
  - #1 Rednaxela's kd-tree (2nd gen) [0.0378]
[...]
  - #8 duyn's basic kd-tree [0.3939]
  - #9 Voidious' Linear search [0.5740]
</pre>
  
This test run showed that average add time was close to a linear search, while average searching time dropped by (1 - 0.3939/0.5740) ≈ 31%. It's still an order of magnitude slower than Rednaxela's included tree, which shows there is a lot to gain by selecting appropriate optimisations.
The following code was used in the benchmark:

* KNNBenchmark.java
<pre>
public KNNBenchmark(final int dimension, final int numNeighbours,
  final SampleData[] samples, final int numReps)
{
  final Class<?>[] searchAlgorithms = new Class<?>[] {
    FlatKNNSearch.class,
    SimontonTreeKNNSearch.class,
    NatTreeKNNSearch.class,
    VoidiousTreeKNNSearch.class,
    DuynBasicKNNSearch.class,
    RednaxelaTreeKNNSearch.class,
  };
  ...
}
</pre>

* DuynBasicKNNSearch.java
<pre>
public class DuynBasicKNNSearch extends KNNImplementation {
  final BasicKdTree<StringExemplar> tree;

  public DuynBasicKNNSearch(final int dimension) {
    super(dimension);
    tree = new BasicKdTree<StringExemplar>();
  }

  @Override public void
  addPoint(final double[] location, final String value) {
    tree.add(new StringExemplar(location, value));
  }

  @Override public String
  getName() {
    return "duyn's basic kd-tree";
  }

  @Override public KNNPoint[]
  getNearestNeighbors(final double[] location, final int size) {
    final List<KNNPoint> justPoints = new LinkedList<KNNPoint>();
    for(PrioNode<StringExemplar> sr : tree.search(location, size)) {
      final double distance = sr.priority;
      final StringExemplar pt = sr.data;
      justPoints.add(new KNNPoint(pt.value, distance));
    }
    final KNNPoint[] retVal = new KNNPoint[justPoints.size()];
    return justPoints.toArray(retVal);
  }

  class StringExemplar extends Exemplar {
    public final String value;

    public StringExemplar(final double[] coords, final String value) {
      super(coords);
      this.value = value;
    }
  }
}
</pre>
  
==Optimisations to the Tree==

In this section, we will introduce the following optimisations to our tree:

* '''Bucket leaves.''' Leaves will store multiple exemplars, only being split when they overflow. This makes our split points less dependent on the order in which we receive points and gives us more opportunity to choose a better split point.

* '''Bounds-overlap-ball testing.''' Each tree stores the bounds of the smallest hyperrect which will contain all the data in that tree. This is smaller&mdash;maybe significantly so&mdash;than the bounds of the space that sub-tree occupies, giving us more opportunities to skip sub-trees. We will check whether the minimum distance to each tree's content bounds falls within our search hypersphere. This lets us avoid walking all the way to a leaf if a tree's content bounds overlap with our search hypersphere but none of its children's content bounds do.

* '''Splitting choice.''' We will split on the mean of the dimension with the largest variance. Splitting on the mean does not guarantee that our tree will be balanced, but it is easier for us since we will have already calculated the mean in finding the dimension with largest variance. This adds more overhead to our <code>add</code> times since we compute running means and variances each time an exemplar is added.

* '''Singularity tracking.''' We will actively check whether all the exemplars in a leaf have the same domain. This is necessary to avoid getting caught in an infinite loop trying to repeatedly split an unsplittable leaf. It also lets us avoid repeated distance calculations when searching for nearest neighbours. Practically, unless all your dimensions hold limited, discrete values, this optimisation is unlikely to make much impact.
===Basic Tree===

Once again, we start off with a basic binary tree. Each tree must store its own bucket size since it will not have a reference to its parent. Bucket size is not static because we might want to build multiple trees with different bucket sizes.
<pre>
public final class OptKdTree<X extends Exemplar> {
  final Queue<X> data;
  OptKdTree<X> left = null, right = null;

  // Number of exemplars to hold in a leaf before splitting
  private final int bucketSize;
  private static final int DEFAULT_BUCKET_SIZE = 10;

  public OptKdTree() {
    this(DEFAULT_BUCKET_SIZE);
  }

  public OptKdTree(int bucketSize) {
    this.bucketSize = bucketSize;
    this.data = new LinkedList<X>();
  }

  private final boolean
  isTree() { return left != null; }

  [...]
}
</pre>
  
===Adding===

Now that our code is a little more sophisticated, it becomes profitable to have a dedicated method to bulk-load our tree. This also makes it easier to add dynamic re-balancing support later if we need it. To take advantage of the additional information we get from bulk loading, we decide whether to split a leaf only after all exemplars have been added.
<pre>
public void
add(X ex) {
  OptKdTree<X> tree = addNoSplit(this, ex);
  if (shouldSplit(tree)) {
    split(tree);
  }
}

public void
addAll(Collection<X> exs) {
  final Set<OptKdTree<X>> modTrees =
    new HashSet<OptKdTree<X>>();
  for(X ex : exs) {
    modTrees.add(addNoSplit(this, ex));
  }

  for(OptKdTree<X> tree : modTrees) {
    if (shouldSplit(tree)) {
      split(tree);
    }
  }
}
</pre>
  
To add an exemplar, we traverse the tree from top down until we find a leaf. Then we add the exemplar to the list at that leaf. For now, there is no special ordering to the list of exemplars in a leaf. To decide which sub-tree to traverse, each tree stores two values: a splitting dimension and a splitting value. If our new exemplar's domain along the splitting dimension is greater than the tree's splitting value, we put it in the right sub-tree. Otherwise, it goes in the left one.

Since adding takes little time compared to searching, we take this opportunity to make some optimisations:

* We keep track of the actual hyperrect the points in this tree occupy. This lets us rule out a tree, even though its space may intersect with our search sphere, if it has no chance of containing any points within our search sphere. This hyperrect is fully defined by just storing its two extreme corners.
* To save us having to do a full iteration when we come to split a leaf, we compute the running mean and variance for each dimension using [http://www.johndcook.com/standard_deviation.html Welford's method] (a small sanity-check sketch of these formulas follows this list):
    <math>
    M_1 = x_1,\qquad S_1 = 0</math>
    <math>
    M_k = M_{k-1} + {x_k - M_{k-1} \over k} \qquad(exMean)</math>
    <math>
    S_k = S_{k-1} + (x_k - M_{k-1})\times(x_k - M_k) \qquad(exSumSqDev)</math>

* We keep track of whether all exemplars in this leaf have the same domain. If they do, we know that comparisons on one exemplar apply to all exemplars in that leaf.
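As promised, here is a quick sanity check on Welford's formulas (this standalone snippet is ours, not part of the tree), comparing the running statistics against a direct two-pass computation:

<pre>
// Sketch only: compare Welford's running mean and sum of squared
// deviations against a direct two-pass computation.
public class WelfordCheck {
  public static void main(String[] args) {
    final double[] xs = { 2, 4, 4, 4, 5, 5, 7, 9 };

    // Welford's method, updated one exemplar at a time
    double mean = 0, sumSqDev = 0;
    int k = 0;
    for (double x : xs) {
      k++;
      final double oldMean = mean;
      mean += (x - oldMean)/k;              // M_k
      sumSqDev += (x - oldMean)*(x - mean); // S_k
    }

    // Direct two-pass computation for comparison
    double directMean = 0;
    for (double x : xs) directMean += x;
    directMean /= xs.length;
    double directSumSqDev = 0;
    for (double x : xs) {
      directSumSqDev += (x - directMean)*(x - directMean);
    }

    // Both pairs should agree (up to floating-point rounding):
    // mean = 5.0, sum of squared deviations = 32.0
    System.out.println(mean + " vs " + directMean);
    System.out.println(sumSqDev + " vs " + directSumSqDev);
  }
}
</pre>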
<pre>
// These aren't initialised until add() is called.
private double[] exMean = null, exSumSqDev = null;

// Optimisation when sub-tree contains only duplicates
private boolean singularity = true;

// Optimisation for searches. This lets us skip a node if its
// scope intersects with a search hypersphere but it doesn't contain
// any points that actually intersect.
private double[] contentMax = null, contentMin = null;

private int
dimensions() { return contentMax.length; }

// Addition

// Adds an exemplar without splitting overflowing leaves.
// Returns leaf to which exemplar was added.
private static <X extends Exemplar> OptKdTree<X>
addNoSplit(OptKdTree<X> tree, X ex) {
  // Some spurious function calls. Optimised for readability over
  // efficiency.
  OptKdTree<X> cursor = tree;
  while (cursor != null) {
    updateBounds(cursor, ex);
    if (cursor.isTree()) {
      // Sub-tree
      cursor = ex.domain[cursor.splitDim] <= cursor.split
        ? cursor.left : cursor.right;
    } else {
      // Leaf

      // Add exemplar to leaf
      cursor.data.add(ex);

      // Calculate running mean and sum of squared deviations
      final int nExs = cursor.data.size();
      final int dims = cursor.dimensions();
      if (nExs == 1) {
        cursor.exMean = Arrays.copyOf(ex.domain, dims);
        cursor.exSumSqDev = new double[dims];
      } else {
        for(int d = 0; d < dims; d++) {
          final double coord = ex.domain[d];
          final double oldMean = cursor.exMean[d], newMean;
          cursor.exMean[d] = newMean =
            oldMean + (coord - oldMean)/nExs;
          cursor.exSumSqDev[d] = cursor.exSumSqDev[d]
            + (coord - oldMean)*(coord - newMean);
        }
      }

      // Check that data are still uniform
      if (cursor.singularity) {
        final Queue<X> cExs = cursor.data;
        if (cExs.size() > 0 && !ex.collocated(cExs.peek()))
          cursor.singularity = false;
      }

      // Finished walking
      return cursor;
    }
  }
  throw new RuntimeException("Walked tree without adding anything");
}
</pre>
  
To update the bounding hyperrect for the contents of a tree, we iterate through each dimension, extending (if necessary) the bounds in that dimension to contain the new exemplar.
<pre>
private static <T extends Exemplar> void
updateBounds(OptKdTree<T> tree, Exemplar ex) {
  final int dims = ex.domain.length;
  if (tree.contentMax == null) {
    tree.contentMax = Arrays.copyOf(ex.domain, dims);
    tree.contentMin = Arrays.copyOf(ex.domain, dims);
  } else {
    for(int d = 0; d < dims; d++) {
      final double dimVal = ex.domain[d];
      if (dimVal > tree.contentMax[d])
        tree.contentMax[d] = dimVal;
      else if (dimVal < tree.contentMin[d])
        tree.contentMin[d] = dimVal;
    }
  }
}
</pre>
  
===Splitting===

We only split when a leaf's exemplars exceed its bucket size. It is only worth splitting if the exemplars don't all have the same domain.
<pre>
private static <T extends Exemplar> boolean
shouldSplit(OptKdTree<T> tree) {
  return tree.data.size() > tree.bucketSize
    && !tree.singularity;
}
</pre>
  
Thanks to our pre-computation, splitting is usually straightforward. We iterate through each dimension to find the one with the largest variance (we can skip the unnecessary division), then directly look up the mean of that dimension. This strategy might not successfully split the tree if exemplars are so close together that rounding errors in computing the mean leave it not lying strictly between the exemplars in a leaf. When this happens, we resort to repeatedly trying a dimension and an exemplar at random for our splitting point. We only call <code>split()</code> if the leaf is not a singularity, so eventually we will find a point that does divide our exemplars.

To make sure the point actually does divide our data, we separate our data into two lists destined for each sub-tree. Once we have found a successful split point, we bulk load our sub-trees, store information about our split point and let go of the exemplars stored in the tree.
<pre>
// Split properties. Not initialised until split occurs.
private int splitDim = 0;
private double split = Double.NaN;

@SuppressWarnings("unchecked") private static <T extends Exemplar> void
split(OptKdTree<T> tree) {
  assert !tree.singularity;
  // Find dimension with largest variance to split on
  double largestVar = -1;
  int splitDim = 0;
  for(int d = 0; d < tree.dimensions(); d++) {
    // Don't need to divide by number of data to find largest
    // variance
    final double var = tree.exSumSqDev[d];
    if (var > largestVar) {
      largestVar = var;
      splitDim = d;
    }
  }

  // Find mean as position for our split
  double splitValue = tree.exMean[splitDim];

  // Check that our split actually splits our data. This also lets
  // us bulk load data into sub-trees, which is more likely
  // to keep optimal balance.
  final Queue<T> leftExs = new LinkedList<T>();
  final Queue<T> rightExs = new LinkedList<T>();
  for(T s : tree.data) {
    if (s.domain[splitDim] <= splitValue)
      leftExs.add(s);
    else
      rightExs.add(s);
  }
  int leftSize = leftExs.size();
  final int treeSize = tree.data.size();
  if (leftSize == treeSize || leftSize == 0) {
    System.err.println(
      "WARNING: Randomly splitting non-uniform tree");
    // We know the data aren't all the same, so try picking
    // an exemplar and a dimension at random for our split point

    // This might take several tries, so we copy our data to
    // an array to speed up process of picking a random point
    Object[] exs = tree.data.toArray();
    while (leftSize == treeSize || leftSize == 0) {
      leftExs.clear();
      rightExs.clear();

      splitDim = (int)
        Math.floor(Math.random()*tree.dimensions());
      final int splitPtIdx = (int)
        Math.floor(Math.random()*exs.length);
      // Cast is inevitable consequence of java's inability to
      // create a generic array
      splitValue = ((T)exs[splitPtIdx]).domain[splitDim];
      for(T s : tree.data) {
        if (s.domain[splitDim] <= splitValue)
          leftExs.add(s);
        else
          rightExs.add(s);
      }
      leftSize = leftExs.size();
    }
  }

  // We have found a valid split. Start building our sub-trees
  final OptKdTree<T> left = new OptKdTree<T>(tree.bucketSize);
  final OptKdTree<T> right = new OptKdTree<T>(tree.bucketSize);
  left.addAll(leftExs);
  right.addAll(rightExs);

  // Finally, commit the split
  tree.splitDim = splitDim;
  tree.split = splitValue;
  tree.left = left;
  tree.right = right;

  // Let go of data (and their running stats) held in this leaf
  tree.data.clear();
  tree.exMean = tree.exSumSqDev = null;
}
</pre>
  
===Distance calculations===

Before we can start searching, we need to define our distance metric. The distance calculation was already introduced with the basic tree.

:Squared Euclidean distance:
:<math>distance_{x\rightarrow y}^2 = \sum_d (x[d] - y[d])^2</math>
  
This will be called for every point examined, so we should optimise it well.

<pre>
private static double
distanceSqFrom(double[] p1, double[] p2) {
  double dSq = 0;
  for(int d = 0; d < p1.length; d++) {
    final double dst = p1[d] - p2[d];
    if (dst != 0)
      dSq += dst*dst;
  }
  return dSq;
}
</pre>
  
We will also need to find the minimum distance from a hyperrect to a query point. This will let us rule out trees whose content hyperrects are outside our search sphere. To find the minimum distance between a hyperrect and a given point, we first find the closest point on the bounds of that hyperrect to the point. This is easy to do, even if not immediately obvious.

:Along each dimension,
:<math>nearestPoint[d]=\begin{cases}
min[d] & target[d]\le min[d]\\
target[d] & min[d]<target[d]<max[d]\\
max[d] & target[d]\ge max[d]\end{cases}</math>
  
After that, we just calculate the distance from our query point to this nearest point. The code below has been optimised because it gets called a lot during a search.
  
<pre>
// Gets distance from target of nearest point on hyper-rect defined
// by supplied min and max bounds
private static double
minDistanceSqFrom(double[] target, double[] min, double[] max) {
  double distanceSq = 0;
  for(int d = 0; d < target.length; d++) {
    if (target[d] < min[d]) {
      final double dst = min[d] - target[d];
      distanceSq += dst*dst;
    } else if (target[d] > max[d]) {
      final double dst = max[d] - target[d];
      distanceSq += dst*dst;
    }
  }
  return distanceSq;
}
</pre>
  
===Searching===

Before we dive into searching, we will introduce two helper classes.

* <code>SearchStackEntry</code> stores a tree along with any data which is useful to keep on the search stack. Currently, this additional information consists only of whether a bounds check should be done. This lets us skip a redundant bounds check on the root of the tree.
<pre>
//
// class SearchStackEntry
//
private static class SearchStackEntry<T extends Exemplar> {
  public final boolean needBoundsCheck;
  public final OptKdTree<T> tree;

  public SearchStackEntry(boolean needBoundsCheck, OptKdTree<T> tree) {
    this.needBoundsCheck = needBoundsCheck;
    this.tree = tree;
  }
}
</pre>
  
* <code>SearchState</code> stores the current state of the search. Its sole purpose is to let us separate searching sub-trees and leaves into different methods. This breaks up a long method and makes profiling easier.
<pre>
//
// class SearchState
//
// Holds data about current state of the search. Used for live updating
// of pruning distance.
private static class SearchState<X extends Exemplar> {
  final int nResults;
  final PriorityQueue<PrioNode<X>> results;

  public SearchState(int nResults) {
    this.nResults = nResults;
    results = new PriorityQueue<PrioNode<X>>(nResults,
      new Comparator<PrioNode<X>>() {

        // Max-heap on distance: the head is the farthest of our
        // tentative results
        public int
        compare(PrioNode<X> o1, PrioNode<X> o2) {
          return o1.priority == o2.priority ? 0
            : o1.priority > o2.priority ? -1 : 1;
        }

      });
  }
}
</pre>
  
Our search method does a depth-first search using a stack. Each time we pop our stack, we do a bounds check if necessary, then defer to the appropriate method.
<pre>
public Iterable<PrioNode<X>>
search(double[] query, int nResults) {
  // Forward to a static method to avoid accidental reference to
  // instance variables while descending the tree
  return search(this, query, nResults);
}

private static <X extends Exemplar> Iterable<PrioNode<X>>
search(OptKdTree<X> tree, double[] query, int nResults) {
  final SearchState<X> state = new SearchState<X>(nResults);
  final Deque<SearchStackEntry<X>> stack =
    new LinkedList<SearchStackEntry<X>>();
  if (tree.contentMin != null)
    stack.addLast(new SearchStackEntry<X>(false, tree));

  while (!stack.isEmpty()) {
    final SearchStackEntry<X> entry = stack.removeLast();
    final OptKdTree<X> cur = entry.tree;

    if (entry.needBoundsCheck && state.results.size() >= nResults) {
      final double d = minDistanceSqFrom(query,
        cur.contentMin, cur.contentMax);
      if (d > state.results.peek().priority)
        continue;
    }

    if (cur.isTree()) {
      // Until we have nResults tentative neighbours, there is no
      // pruning distance yet
      final double maxPriority = state.results.size() >= nResults
        ? state.results.peek().priority : Double.POSITIVE_INFINITY;
      searchTree(query, cur, stack, maxPriority);
    } else {
      searchLeaf(query, cur, state);
    }
  }

  return state.results;
}
</pre>
  
====Searching a Tree====

We basically want to add both sub-trees to the stack, if they are not empty. As a heuristic, we expect the sub-tree on the same side of the split as our query point to be more likely to contain closer points, so we search that sub-tree first by adding it to the stack last. You might try to avoid a distance calculation by not requiring a bounds check for the heuristically nearer sub-tree, but that does not seem to significantly improve performance.
<pre>
private static <X extends Exemplar> void
searchTree(double[] query, OptKdTree<X> tree,
  Deque<SearchStackEntry<X>> stack, double maxPriority)
{
  OptKdTree<X> nearTree = tree.left, farTree = tree.right;
  if (query[tree.splitDim] > tree.split) {
    nearTree = tree.right;
    farTree = tree.left;
  }

  // These variables let us skip empty sub-trees
  boolean nearEmpty = nearTree.contentMin == null;
  boolean farEmpty = farTree.contentMin == null;

  // Add nearest sub-tree to stack later so we descend it
  // first. This is likely to constrict our max distance
  // sooner, resulting in less visited nodes
  if (!farEmpty && minDistanceSqFrom(query,
      farTree.contentMin, farTree.contentMax) < maxPriority) {
    stack.addLast(new SearchStackEntry<X>(true, farTree));
  }

  if (!nearEmpty && minDistanceSqFrom(query,
      nearTree.contentMin, nearTree.contentMax) < maxPriority) {
    stack.addLast(new SearchStackEntry<X>(true, nearTree));
  }
}
</pre>
  
====Searching a Leaf====

By the time we reach a leaf, there is little left to optimise away. We must go through each exemplar in turn to see if it is a nearest neighbour to our query point. If the leaf is a singularity, we only compute the distance once and re-use it for every exemplar in the leaf.
<pre>
private static <X extends Exemplar> void
searchLeaf(double[] query, OptKdTree<X> leaf, SearchState<X> state) {
  double exD = Double.NaN;
  for(X ex : leaf.data) {
    if (!leaf.singularity || Double.isNaN(exD)) {
      exD = distanceSqFrom(query, ex.domain);
    }

    if (examine(exD, state)) {
      while (state.results.size() >= state.nResults) {
        state.results.poll();
      }

      state.results.offer(new PrioNode<X>(exD, ex));
    }
  }
}

private static <X extends Exemplar> boolean
examine(double distance, SearchState<X> state) {
  return state.results.size() < state.nResults
    || distance < state.results.peek().priority;
}
</pre>
  
===Evaluation===

From the same benchmark run, our optimised tree obtains the following results:
    farEmpty = tmpEmpty;
 
  }
 
  
  // Add nearest sub-tree to stack later so we descend it
+
RESULT << k-nearest neighbours search with Voidious' Linear search >>
  // first. This is likely to constrict our max distance
+
: Average searching time      = 0.5740 miliseconds
  // sooner, resulting in less visited nodes
+
: Average worst searching time = 2.3482 miliseconds
  if (!farEmpty && (farD <= searchState.maxDistance
+
: Average adding time          = 0.4528 microseconds
                     || searchState.nResults < nMinResults))
+
: Accuracy                    = 100.0%
  {
+
    stack.addFirst(new SearchStackEntry<T>(farD, farTree));
+
[...]
   }
+
 +
RESULT << k-nearest neighbours search with Rednaxela's kd-tree (2nd gen) >>
 +
: Average searching time      = 0.0378 miliseconds
 +
: Average worst searching time = 0.2625 miliseconds
 +
: Average adding time          = 1.5687 microseconds
 +
: Accuracy                     = 100.0%
 +
 +
[...]
 +
 +
RESULT << k-nearest neighbours search with duyn's basic kd-tree >>
 +
: Average searching time      = 0.3939 miliseconds
 +
: Average worst searching time = 11.5167 miliseconds
 +
: Average adding time          = 0.5165 microseconds
 +
: Accuracy                    = 100.0%
 +
 +
RESULT << k-nearest neighbours search with duyn's optimised kd-tree >>
 +
: Average searching time      = 0.0536 miliseconds
 +
: Average worst searching time = 1.6742 miliseconds
 +
: Average adding time          = 2.9380 microseconds
 +
: Accuracy                    = 100.0%
 +
 +
[...]
 +
 +
BEST RESULT:
 +
  - #1 Rednaxela's kd-tree (2nd gen) [0.0378]
 +
[...]
 +
  - #4 duyn's optimised kd-tree [0.0536]
 +
[...]
 +
  - #8 duyn's basic kd-tree [0.3939]
 +
   - #9 Voidious' Linear search [0.5740]
 +
 +
This is promising. We are a whole order of magnitude faster and finally in the same league as other ''k''d-tree implementations on this wiki. We have not changed our API, so the following shim is only for completeness:
  
* DuynOptKNNSearch.java
<pre>
public class DuynOptKNNSearch extends KNNImplementation {
  final OptKdTree<StringExemplar> tree;

  public DuynOptKNNSearch(final int dimension) {
    super(dimension);
    tree = new OptKdTree<StringExemplar>();
  }

  @Override public void
  addPoint(final double[] location, final String value) {
    tree.add(new StringExemplar(location, value));
  }

  @Override public String
  getName() {
    return "duyn's optimised kd-tree";
  }

  @Override public KNNPoint[]
  getNearestNeighbors(final double[] location, final int size) {
    final List<KNNPoint> justPoints = new LinkedList<KNNPoint>();
    for(PrioNode<StringExemplar> sr : tree.search(location, size)) {
      final double distance = sr.priority;
      final StringExemplar pt = sr.data;
      justPoints.add(new KNNPoint(pt.value, distance));
    }
    final KNNPoint[] retVal = new KNNPoint[justPoints.size()];
    return justPoints.toArray(retVal);
  }

  class StringExemplar extends Exemplar {
    public final String value;

    public StringExemplar(final double[] coords, final String value) {
      super(coords);
      this.value = value;
    }
  }
}
</pre>
  
==A Better Results Heap==

A <code>TreeMap</code> does unnecessary processing, keeping all its keys in order when all we really care about is the farthest tentative result we have found. By implementing a custom heap, we can squeeze even more performance out of our tree.
  
In this section, we will implement an implicit binary heap. This structure does not have the best theoretical performance, but often does well in practice. More theoretically efficient heaps often rely on batching inserts, only fully processing them when the top element is removed. Since we do almost one removal for every insert, this does not offer much benefit to us. For our usage pattern, the only heap faster than a binary heap is a buggy heap.
  
===Implicit d-ary Heap===

Implicit binary heaps can be very memory efficient since all elements are stored in an array. They don't grow as easily as other heaps, but for our purposes that is not a problem&mdash;we can set a cap on the number of results we are interested in. A good description of implicit heaps is provided by John Bentley in [http://www.akira.ruc.dk/~keld/teaching/algoritmedesign_f04/Artikler/02/ Programming Pearls: Thanks, Heaps].
Below, we implement a d-ary heap. The index arithmetic is similar to a binary heap, and there are some suggestions that a 4-way heap may be more efficient than a binary heap. Our heap can be used as a min- or a max-heap. Using the enum <code>Direction</code> ensures the user doesn't get this parameter wrong.
      if (exD > searchState.maxDistance
 
        || searchState.maxDistance == Double.POSITIVE_INFINITY)
 
      {
 
        searchState.maxDistance = exD;
 
      }
 
  
      if (searchState.maxDistance == exD)
+
// Final for performance, may be removed if sub-classing necessary
        searchState.nExsAtMaxD = exsAtD.size();
+
public final class FastBinaryHeap<T> implements Iterable<PrioNode<T>> {
 +
  private final int branches;
 +
  // Direction = -1 or 1
 +
  // root * direction > element * direction
 +
  private final int direction;
 +
  public static enum Direction {
 +
    MIN(-1), MAX(1);
 +
 +
    public final int value;
 +
    private Direction(int value) {
 +
      this.value = value;
 +
    }
 +
  }
 +
  public static final Direction MAX = Direction.MAX, MIN = Direction.MIN;
 +
 +
  private final PrioNode<T>[] data;
 +
  private int size = 0;
 +
  private static final int DEFAULT_N_CHILDREN = 4;
 +
 +
  public FastBinaryHeap(int capacity, Direction direction) {
 +
    this(capacity, DEFAULT_N_CHILDREN, direction);
 +
  }
 +
 +
  @SuppressWarnings("unchecked")
 +
  public FastBinaryHeap(int capacity, int branches, Direction dir) {
 +
    this.branches = branches;
 +
    // This implicit conversion is safe because we only refer to
 +
    // this array through our data member.
 +
    this.data = new PrioNode[capacity];
 +
    this.direction = dir.value;
 +
  }
 +
}
  
We implement an interface similar to the standard PriorityQueue:

public Iterator<PrioNode<T>>
iterator() {
  return new Iterator<PrioNode<T>>() {

    int idx = 0;

    public boolean
    hasNext() { return idx < size; }

    public PrioNode<T>
    next() { return data[idx++]; }

    public void
    remove() {
      throw new UnsupportedOperationException("Not supported.");
    }

  };
}

public int
size() { return size; }

public PrioNode<T>
peek() { return data[0]; }

Our offer() method takes a priority separately from the data we want to associate with it. We store the priority separately so we don't fall into the same performance trap as PriorityQueue, which funnels every comparison through a Comparator. We do not implement a poll() method because we don't need one. If we are about to overflow, we just replace our top element with the new one.
  
public boolean
offer(double priority, T result) {
  if (size < data.length) {
    // Trivial case: heap not yet full
    siftUp(size, new PrioNode<T>(priority, result));
    size++;
    return true;
  }

  if (priority*direction > data[0].priority*direction) {
    // Reject: the new element is worse than our current root
    return false;
  }

  siftDown(0, new PrioNode<T>(priority, result));
  return true;
}

Now to the meat of our heap.

* siftUp treats the given index as empty and progressively moves parents into the hole left behind if inserting the given element there would violate heap order.
: Parents are calculated as follows:
: <math>parent[i] = \left\lfloor {i - 1\over branches} \right\rfloor</math>
: Once we have that, we can compare the values at both indexes. Multiplying them by <code>direction</code> lets us use the same code for both a min- and a max-heap.

private void
siftUp(int hole, PrioNode<T> value) {
  if (size > 0) {
    final double vPD = value.priority*direction;
    while (hole > 0) {
      final int parent = (hole - 1)/branches;
      if (vPD > data[parent].priority*direction) {
        data[hole] = data[parent];
        hole = parent;
      } else {
        break;
      }
    }
  }

  data[hole] = value;
}
  
* siftDown does the opposite, starting from the given index and moving child elements up into the hole if inserting the new value there would violate heap order. It swaps the hole with its worst heap-offending child so that with a single swap per level, it can restore heap order at that level.
: In a 0-based index, children are calculated as follows:
: <math>\begin{align}child_{1}[i] & =branches\times i+1\\
\ldots\\
child_{branches}[i] & =branches\times i+branches\end{align}</math>

private void
siftDown(int hole, PrioNode<T> value) {
  if (size > 0) {
    // Push the hole down along the path of worst children
    final double vPD = value.priority*direction;
    int firstChild;
    while ((firstChild = branches*hole + 1) < data.length) {

      // Find the child which most violates heap order
      double swapPD = data[firstChild].priority*direction;
      int swapChild = firstChild;
      for(int c = firstChild + 1;
        c < firstChild + branches && c < size; c++)
      {
        final double cPD = data[c].priority*direction;
        if (cPD > swapPD) {
          swapPD = cPD;
          swapChild = c;
        }
      }

      if (swapPD > vPD) {
        data[hole] = data[swapChild];
        hole = swapChild;
      } else {
        break;
      }
    }
  }

  data[hole] = value;
}
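
For example, with <code>branches = 4</code>, the children of index 2 occupy indexes 9 through 12, and each maps back to its parent via <math>\lfloor (i-1)/4 \rfloor = 2</math>. A quick check of the index math (purely illustrative):

int branches = 4, parent = 2;
for (int c = branches*parent + 1; c <= branches*parent + branches; c++) {
  assert (c - 1)/branches == parent;  // each child's parent index maps back
}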
  
Running 30 repetition(s) for k-nearest neighbours searching:
+
Both of these are slightly different from standard textbook presentations of the algorithm. The versions here do less swaps by shifting other elements out of the way until the new element's rightful place has been found.
:: 13 dimension(s); 40 neighbour(s)
 
Warming up the JIT with 5 repetitions first...
 
  
Running tests... COMPLETED.
+
===Changes to our Tree===
 +
Because the only change we have made is to the method of collecting results, we will skim over the changes necessary.
  
RESULT << k-nearest neighbours search with Voidious' Linear search >>
+
public final class FastKdTree<X extends Exemplar> {
: Average searching time      = 0.57 miliseconds
+
....
: Average worst searching time = 14.992 miliseconds
+
}

* Our SearchState must be updated to use our new heap:

//
// class SearchState
//
// Holds data about the current state of the search. Used for live
// updating of the pruning distance.

private static class SearchState<X extends Exemplar> {
  final int nResults;
  final FastBinaryHeap<X> results;

  public SearchState(int nResults) {
    this.nResults = nResults;
    results = new FastBinaryHeap<X>(
      nResults, 4, FastBinaryHeap.MAX);
  }
}
  
* Our searchLeaf() method no longer needs to explicitly poll the top since our heap will just replace it if it has no space:

private static <X extends Exemplar> void
searchLeaf(double[] query, FastKdTree<X> leaf, SearchState<X> state) {
  double exD = Double.NaN;
  for(X ex : leaf.data) {
    if (!leaf.singularity || Double.isNaN(exD)) {
      exD = distanceSqFrom(query, ex.domain);
    }

    if (examine(exD, state)) {
      state.results.offer(exD, ex);
    }
  }
}
  
* We have also replaced all instances of LinkedList with ArrayDeque. Although a LinkedList is theoretically appropriate, the javadocs for ArrayDeque state that:
: "This class is likely to be faster than Stack when used as a stack, and faster than LinkedList when used as a queue."&mdash;[http://java.sun.com/javase/6/docs/api/java/util/ArrayDeque.html Java 6 API docs]

public FastKdTree(int bucketSize) {
  this.bucketSize = bucketSize;
  this.data = new ArrayDeque<X>();
}

@SuppressWarnings("unchecked") private static <X extends Exemplar> void
split(FastKdTree<X> tree) {
  ...
  final Queue<X> leftExs = new ArrayDeque<X>();
  final Queue<X> rightExs = new ArrayDeque<X>();
  ...
}

private static <X extends Exemplar> Iterable<PrioNode<X>>
search(FastKdTree<X> tree, double[] query, int nResults) {
  final SearchState<X> state = new SearchState<X>(nResults);
  final Deque<SearchStackEntry<X>> stack =
    new ArrayDeque<SearchStackEntry<X>>();
  ...
}
  
===Evaluation===
Our improved heap squeezes a bit more performance (~10%) out of our tree:

RESULT << k-nearest neighbours search with Voidious' Linear search >>
: Average searching time       = 0.5740 miliseconds
: Average worst searching time = 2.3482 miliseconds
: Average adding time          = 0.4528 microseconds
: Accuracy                     = 100.0%

[...]

RESULT << k-nearest neighbours search with Rednaxela's kd-tree (2nd gen) >>
: Average searching time       = 0.0378 miliseconds
: Average worst searching time = 0.2625 miliseconds
: Average adding time          = 1.5687 microseconds
: Accuracy                     = 100.0%

[...]

RESULT << k-nearest neighbours search with duyn's basic kd-tree >>
: Average searching time       = 0.3939 miliseconds
: Average worst searching time = 11.5167 miliseconds
: Average adding time          = 0.5165 microseconds
: Accuracy                     = 100.0%

RESULT << k-nearest neighbours search with duyn's optimised kd-tree >>
: Average searching time       = 0.0536 miliseconds
: Average worst searching time = 1.6742 miliseconds
: Average adding time          = 2.9380 microseconds
: Accuracy                     = 100.0%

RESULT << k-nearest neighbours search with duyn's fast kd-tree >>
: Average searching time       = 0.0479 miliseconds
: Average worst searching time = 25.0995 miliseconds
: Average adding time          = 2.1288 microseconds
: Accuracy                     = 100.0%

[...]

BEST RESULT:
  - #1 Rednaxela's kd-tree (2nd gen) [0.0378]
[...]
  - #3 duyn's fast kd-tree [0.0479]
  - #4 duyn's optimised kd-tree [0.0536]
[...]
  - #8 duyn's basic kd-tree [0.3939]
  - #9 Voidious' Linear search [0.5740]
  
Our client code has barely changed:

* DuynFastKNNSearch.java
public class DuynFastKNNSearch extends KNNImplementation {
  final FastKdTree<StringExemplar> tree;
  public DuynFastKNNSearch(final int dimension) {
    super(dimension);
    tree = new FastKdTree<StringExemplar>();
  }

  @Override public void
  addPoint(final double[] location, final String value) {
    tree.add(new StringExemplar(location, value));
  }

  @Override public String
  getName() {
    return "duyn's fast kd-tree";
  }

  @Override public KNNPoint[]
  getNearestNeighbors(final double[] location, final int size) {
    final List<KNNPoint> justPoints = new LinkedList<KNNPoint>();
    for(PrioNode<StringExemplar> sr : tree.search(location, size)) {
      final double distance = sr.priority;
      final StringExemplar pt = sr.data;
      justPoints.add(new KNNPoint(pt.value, distance));
    }
    final KNNPoint[] retVal = new KNNPoint[justPoints.size()];
    return justPoints.toArray(retVal);
  }

  class StringExemplar extends Exemplar {
    public final String value;
    public StringExemplar(final double[] coords, final String value) {
      super(coords);
      this.value = value;
    }
  }
}

==Full Benchmark Results==
Below are the full benchmark results quoted throughout this page. They put the performance of our trees in context against the other ''k''d-trees available on this wiki. The benchmark ([[k-NN algorithm benchmark]]) was run with the [http://homepages.ucalgary.ca/~agschult/robocode/gun-data-Diamond-vs-jk.mini.CunobelinDC%200.3.csv.gz Diamond vs CunobelinDC gun data].

K-NEAREST NEIGHBOURS ALGORITHMS BENCHMARK
-----------------------------------------
Reading data from gun-data-Diamond-vs-jk.mini.CunobelinDC 0.3.csv.gz
Read 25621 saves and 10300 searches.

Running 30 repetition(s) for k-nearest neighbours searching:
:: 13 dimension(s); 40 neighbour(s)
Warming up the JIT with 5 repetitions first...

Running tests...test 1/30
[...]
  COMPLETED.

RESULT << k-nearest neighbours search with Voidious' Linear search >>
: Average searching time       = 0.5740 miliseconds
: Average worst searching time = 2.3482 miliseconds
: Average adding time          = 0.4528 microseconds
: Accuracy                     = 100.0%

RESULT << k-nearest neighbours search with Simonton's Bucket PR k-d tree >>
: Average searching time       = 0.1609 miliseconds
: Average worst searching time = 1.4184 miliseconds
: Average adding time          = 1.3322 microseconds
: Accuracy                     = 100.0%

RESULT << k-nearest neighbours search with Voidious' Bucket PR k-d tree >>
: Average searching time       = 0.2081 miliseconds
: Average worst searching time = 21.9191 miliseconds
: Average adding time          = 1.0301 microseconds
: Accuracy                     = 100.0%

RESULT << k-nearest neighbours search with Rednaxela's kd-tree (2nd gen) >>
: Average searching time       = 0.0378 miliseconds
: Average worst searching time = 0.2625 miliseconds
: Average adding time          = 1.5687 microseconds
: Accuracy                     = 100.0%

RESULT << k-nearest neighbours search with Rednaxela's kd-tree (3rd gen) >>
: Average searching time       = 0.0407 miliseconds
: Average worst searching time = 0.2259 miliseconds
: Average adding time          = 1.6342 microseconds
: Accuracy                     = 100.0%

RESULT << k-nearest neighbours search with duyn's basic kd-tree >>
: Average searching time       = 0.3939 miliseconds
: Average worst searching time = 11.5167 miliseconds
: Average adding time          = 0.5165 microseconds
: Accuracy                     = 100.0%

RESULT << k-nearest neighbours search with duyn's optimised kd-tree >>
: Average searching time       = 0.0536 miliseconds
: Average worst searching time = 1.6742 miliseconds
: Average adding time          = 2.9380 microseconds
: Accuracy                     = 100.0%

RESULT << k-nearest neighbours search with duyn's fast kd-tree >>
: Average searching time       = 0.0479 miliseconds
: Average worst searching time = 25.0995 miliseconds
: Average adding time          = 2.1288 microseconds
: Accuracy                     = 100.0%

RESULT << k-nearest neighbours search with Chase-san's kd-tree (KDTreeC) >>
: Average searching time       = 0.1200 miliseconds
: Average worst searching time = 19.5727 miliseconds
: Average adding time          = 1.4222 microseconds
: Accuracy                     = 100.0%


BEST RESULT:
  - #1 Rednaxela's kd-tree (2nd gen) [0.0378]
  - #2 Rednaxela's kd-tree (3rd gen) [0.0407]
  - #3 duyn's fast kd-tree [0.0479]
  - #4 duyn's optimised kd-tree [0.0536]
  - #5 Chase-san's kd-tree (KDTreeC) [0.1200]
  - #6 Simonton's Bucket PR k-d tree [0.1609]
  - #7 Voidious' Bucket PR k-d tree [0.2081]
  - #8 duyn's basic kd-tree [0.3939]
  - #9 Voidious' Linear search [0.5740]

Benchmark running time: 587.77 seconds
  
==Full Source Code==
The full source code for all three trees presented in this tutorial can be downloaded together: [[File:duyn_tutorial_kd_trees.jar]]. The code is available under the [[RWPCL]].

==Extensions==
This tree does not include any support for deletion or re-balancing. You can get away without these if you plan to re-build the entire tree regularly.

'''Deletion''' support would be easy to add to this tree. Since all exemplars are stored in leaves at the bottom, we need only walk the tree and remove the exemplar from the leaf containing it. We would then merge sub-trees if multiple deletions have left one of them empty. Unless our bucket width is really small, we don't need to merge sub-trees after each deletion since the addition of a subsequent exemplar is unlikely to result in choosing a significantly different split point.

'''Re-balancing''' with this tree is trickier. It does not currently store depth, so we don't know when a re-balance is needed. Because exemplars are only stored in leaves, collecting all the exemplars under a sub-tree would require exhaustively walking that sub-tree.

* If depth is to be stored, each tree would need a pointer to its parent tree so we can propagate depth updates when a leaf is split.

* We can avoid an exhaustive walk by storing each exemplar in every tree on its path from root to leaf. We can then re-balance easily by dropping both sub-trees and splitting the given tree anew. This approach has the downside of making inserts and deletes more complex since we have to update the exemplar list of every tree along the path from root to leaf.
 
==See also==
* [[Kd-tree]] for further information on kd-trees.
* Andrew Moore, ''[http://www.autonlab.org/autonweb/14665 An introductory tutorial on kd-trees]''. A good introduction to kd-trees.
* Neal Sample, Matthew Haines, Mark Arnold, Timothy Purcell, ''[http://ilpubs.stanford.edu:8090/723/ Optimizing Search Strategies in k-d Trees]''. Details some of the optimisations used in this tree.

Latest revision as of 16:43, 27 January 2012

[Work-in-Progress]

This page will walk you through the implementation of a kd-tree. We start with a basic tree and progressively add levels of optimisation. Writing a fully optimised kd-tree is an advanced task, but starting with the basics helps. There are already other kd-trees on this wiki, see some of the others for ideas.

Theory

A k-d tree is a binary tree which successively splits a k-dimensional space in half. This lets us speed up a nearest neighbour search by not examining points in partitions which are too far away. A k-nearest neighbour search can be implemented from a nearest neighbour algorithm by not shrinking the search radius until k items have been found. This page won't make a distinction between the two.

An Exemplar class

We will call each item in our k-d tree an exemplar. Each exemplar has a domain—its co-ordinates in the k-d tree's space. Each exemplar could also have an arbitrary payload, but our tree does not need to know about that. It will only handle storing exemplars based on their domain and returning them in a nearest neighbour search.

We might already have a class elsewhere called Point which handles 2D co-ordinates. This terminology avoids conflict with that.

public class Exemplar {
  public final double[] domain;

  public Exemplar(double[] domain) {
    this.domain = domain;
  }

  public final boolean
  collocated(final Exemplar other) {
    return Arrays.equals(domain, other.domain);
  }
}

While this class is fully usable as is, the domain of a nearest neighbour is rarely of any interest by itself. Often, useful data (such as GuessFactors) will be attached by sub-classing this Exemplar class. Our k-d tree will be parameterised based on this expectation.
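For example, a payload could be attached like this (the GuessFactorExemplar name and field are illustrative only, not part of the tutorial's code):

// Hypothetical payload sub-class: carries a GuessFactor with each point.
public class GuessFactorExemplar extends Exemplar {
  public final double guessFactor;

  public GuessFactorExemplar(double[] domain, double guessFactor) {
    super(domain);
    this.guessFactor = guessFactor;
  }
}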

Basic Tree

First, we will build a basic kd-tree as described by Wikipedia's kd-tree page. We start off with a standard binary tree. Each tree is either a node with both left and right sub-trees defined, or a leaf carrying an exemplar. For simplicity, we won't allow nodes to contain any exemplars.

public class BasicKdTree<X extends Exemplar> {
  // Basic tree structure
  X data = null;
  BasicKdTree<X> left = null, right = null;

  // Only need to test one branch since we always populate both
  // branches at once
  private boolean
  isTree() { return left != null; }

  ...
}

Adding

Each of the public API functions defers to a private static method. This is to avoid accidentally referring to instance variables while we walk the tree. This is a common pattern we will use for much of the actual behaviour code for this tree.

public void
add(X ex) {
  BasicKdTree.addToTree(this, ex);
}

To add an exemplar, we traverse the tree from top down until we find a leaf. If the leaf doesn't already have an exemplar, we can just stow it there. Otherwise, we split the leaf and put the free exemplars in each sub-tree.

To split a leaf, we cycle through split dimensions in order as in most descriptions of kd-trees. Because we only allow each leaf to hold a single exemplar, we can use a simple splitting strategy: smallest on the left, largest on the right. The split value is half way between the two points along the split dimension.

int splitDim = 0;
double split = Double.NaN;

private static <X extends Exemplar> void
addToTree(BasicKdTree<X> tree, X ex) {
  while(tree != null) {
    if (tree.isTree()) {
      // Traverse in search of a leaf
      tree = ex.domain[tree.splitDim] <= tree.split
        ? tree.left : tree.right;
    } else {
      if (tree.data == null) {
        tree.data = ex;
      } else {
        // Split tree and add
        // Find smallest exemplar to be our split point
        final int d = tree.splitDim;
        X leftX = ex, rightX = tree.data;
        if (rightX.domain[d] < leftX.domain[d]) {
          leftX = tree.data;
          rightX = ex;
        }
        tree.split = 0.5*(leftX.domain[d] + rightX.domain[d]);
        
        final int nextSplitDim =
          (tree.splitDim + 1)%tree.dimensions();
        
        tree.left = new BasicKdTree<X>();
        tree.left.splitDim = nextSplitDim;
        tree.left.data = leftX;
        
        tree.right = new BasicKdTree<X>();
        tree.right.splitDim = nextSplitDim;
        tree.right.data = rightX;
      }

      // Done.
      tree = null;
    }
  }
}

private int
dimensions() { return data.domain.length; }

Searching

Before we start coding a search method, we need a helper class to store search results along with their distance from the query point. This is called PrioNode because we will eventually re-use it to implement a custom priority queue.

public final class PrioNode<T> {
  public final double priority;
  public final T data;

  public PrioNode(double priority, T data) {
    this.priority = priority;
    this.data = data;
  }
}

Like with our add() method, our search() method delegates to a static method to avoid introducing bugs by accidentally referring to member variables while we descend the tree.

public Iterable<? extends PrioNode<X>>
search(double[] query, int nResults) {
  return BasicKdTree.search(this, query, nResults);
}

To do a nearest neighbours search, we walk the tree, preferring to search sub-trees which are on the same side of the split as the query point first. Once we have found our initial candidates, we can contract our search sphere. We only search the other sub-tree if our search sphere might spill over onto the other side of the split.

Results are collected in a java.util.PriorityQueue so we can easily remove the farthest exemplars as we find closer ones.

private static <X extends Exemplar> Iterable<? extends PrioNode<X>>
search(BasicKdTree<X> tree, double[] query, int nResults) {
  final Queue<PrioNode<X>> results =
    new PriorityQueue<PrioNode<X>>(nResults,
      new Comparator<PrioNode<X>>() {

        // Inverted ordering: head of this queue is our farthest tentative result
        public int
        compare(PrioNode<X> o1, PrioNode<X> o2) {
          return o1.priority == o2.priority ? 0
            : o1.priority > o2.priority ? -1
            : 1;
        }

      }
    );
  final Deque<BasicKdTree<X>> stack
    = new LinkedList<BasicKdTree<X>>();
  stack.addLast(tree);
  while (!stack.isEmpty()) {
    tree = stack.removeLast();
    
    if (tree.isTree()) {
      // Guess nearest tree to query point
      BasicKdTree<X> nearTree = tree.left, farTree = tree.right;
      if (query[tree.splitDim] > tree.split) {
        nearTree = tree.right;
        farTree = tree.left;
      }
      
      // Only search far tree if our search sphere might
      // overlap with splitting plane
      if (results.size() < nResults
        || sq(query[tree.splitDim] - tree.split)
          <= results.peek().priority)
      {
        stack.addLast(farTree);
      }

      // Always search the nearest branch
      stack.addLast(nearTree);
    } else {
      final double dSq = distanceSqFrom(query, tree.data.domain);
      if (results.size() < nResults
        || dSq < results.peek().priority)
      {
        while (results.size() >= nResults) {
          results.poll();
        }

        results.offer(new PrioNode<X>(dSq, tree.data));
      }
    }
  }
  return results;
}
 
private static double
sq(double n) { return n*n; }

Our distance calculation is optimised because it will be called often. We use squared distances to avoid an unnecessary sqrt() operation. We also don't use Math.pow() because the JRE's default implementation must do some extra work before it can return with a simple multiplication (see the source in fdlibm).

private static double
distanceSqFrom(double[] p1, double[] p2) {
  // Note: profiling shows this is called lots of times, so it pays
  // to be well optimised
  double dSq = 0;
  for(int d = 0; d < p1.length; d++) {
    final double dst = p1[d] - p2[d];
    if (dst != 0)
      dSq += dst*dst;
  }
  return dSq;
}

And that's it. Barely 200 lines of code and we have ourselves a working kd-tree.
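As a quick illustration of the API we have just built, a minimal usage sketch (the three-dimensional sample data and the choice of k = 2 are made up; java.util.Arrays is assumed imported):

// Build a small tree and query it.
BasicKdTree<Exemplar> tree = new BasicKdTree<Exemplar>();
tree.add(new Exemplar(new double[] { 0.1, 0.5, 0.9 }));
tree.add(new Exemplar(new double[] { 0.4, 0.2, 0.7 }));
tree.add(new Exemplar(new double[] { 0.8, 0.8, 0.1 }));

// Ask for the two nearest neighbours of a query point.
// Note: priorities are squared distances.
for (PrioNode<Exemplar> near : tree.search(new double[] { 0.5, 0.5, 0.5 }, 2)) {
  System.out.println(Math.sqrt(near.priority) + " away: "
    + Arrays.toString(near.data.domain));
}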

Evaluation

As a guide to performance, we will use the k-NN algorithm benchmark with the Diamond vs CunobelinDC gun data. For comparison, the benchmark includes an optimised linear search to represent the lower bound of performance, and Rednaxela's tree, credited as the fastest tree on the wiki. Full results are near the bottom of the page. Here are the relevant extracts for our tree:

RESULT << k-nearest neighbours search with Voidious' Linear search >>
: Average searching time       = 0.5740 miliseconds
: Average worst searching time = 2.3482 miliseconds
: Average adding time          = 0.4528 microseconds
: Accuracy                     = 100.0%

[...]

RESULT << k-nearest neighbours search with Rednaxela's kd-tree (2nd gen) >>
: Average searching time       = 0.0378 miliseconds
: Average worst searching time = 0.2625 miliseconds
: Average adding time          = 1.5687 microseconds
: Accuracy                     = 100.0%

[...]

RESULT << k-nearest neighbours search with duyn's basic kd-tree >>
: Average searching time       = 0.3939 miliseconds
: Average worst searching time = 11.5167 miliseconds
: Average adding time          = 0.5165 microseconds
: Accuracy                     = 100.0%

[...]

BEST RESULT: 
 - #1 Rednaxela's kd-tree (2nd gen) [0.0378]
[...]
 - #8 duyn's basic kd-tree [0.3939]
 - #9 Voidious' Linear search [0.5740]

This test run showed that average add time was close to a linear search while average search time dropped by (1 - 0.3939/0.5740) ~= 31%. It's still an order of magnitude slower than Rednaxela's included tree, which shows there is a lot to gain by selecting appropriate optimisations.

The following code was used in the benchmark:

  • KNNBenchmark.java
public KNNBenchmark(final int dimension, final int numNeighbours, final SampleData[] samples, final int numReps) {
  final Class<?>[] searchAlgorithms = new Class<?>[] {
    FlatKNNSearch.class,
    SimontonTreeKNNSearch.class,
    NatTreeKNNSearch.class,
    VoidiousTreeKNNSearch.class,
+   DuynBasicKNNSearch.class,
    RednaxelaTreeKNNSearch.class,
  };
  ...
}
 
  • DuynBasicKNNSearch.java
public class DuynBasicKNNSearch extends KNNImplementation {
  final BasicKdTree<StringExemplar> tree;
  public DuynBasicKNNSearch(final int dimension) {
    super(dimension);
    tree = new BasicKdTree<StringExemplar>();
  }

  @Override public void
  addPoint(final double[] location, final String value) {
    tree.add(new StringExemplar(location, value));
  }

  @Override public String
  getName() {
    return "duyn's basic kd-tree";
  }

  @Override public KNNPoint[]
  getNearestNeighbors(final double[] location, final int size) {
    final List<KNNPoint> justPoints = new LinkedList<KNNPoint>();
    for(PrioNode<StringExemplar> sr : tree.search(location, size)) {
      final double distance = sr.priority;
      final StringExemplar pt = sr.data;
      justPoints.add(new KNNPoint(pt.value, distance));
    }
    final KNNPoint[] retVal = new KNNPoint[justPoints.size()];
    return justPoints.toArray(retVal);
  }

  class StringExemplar extends Exemplar {
    public final String value;
    public StringExemplar(final double[] coords, final String value) {
      super(coords);
      this.value = value;
    }
  }
}

Optimisations to the Tree

In this section, we will introduce the following optimisations to our tree:

  • Bucket leaves. Leaves will store multiple exemplars, only being split when they overflow. This makes our split points less dependent on the order in which we receive points and gives us more opportunity to choose a better split point.
  • Bounds-overlap-ball testing. Each tree stores the bounds of the smallest hyperrect which will contain all the data in that tree. This is smaller—maybe significantly so—than the space which that sub-tree occupies, giving us more opportunities to skip sub-trees. We will check the minimum distance from the query point to each tree's content bounds against our search radius. This lets us avoid walking all the way to a leaf if a tree's content bounds overlap with our search hypersphere but none of its children's content bounds do.
  • Splitting choice. We will split on the mean of the dimension with the largest variance. Splitting on the mean does not guarantee that our tree will be balanced, but is easier for us since we would have already calculated the mean in finding the dimension with largest variance. This will add more overhead to our add times since we will compute running means and variances each time an exemplar is added.
  • Singularity tracking. We will actively check whether all the exemplars in a leaf have the same domain. This is necessary to avoid getting ourselves caught in an infinite loop trying to repeatedly split an unsplittable leaf. It also lets us avoid repeated distance calculations when searching for nearest neighbours. Practically, unless all your dimensions are limited, discrete values, this optimisation is unlikely to make much impact.

Basic Tree

Once again, we start off with a basic binary tree. Each tree must store its own bucket size since it will not have a reference to its parent. Bucket size is not static because we might want to build multiple trees with different bucket sizes.

public final class OptKdTree<X extends Exemplar> {
  final Queue<X> data;
  OptKdTree<X> left = null, right = null;

  // Number of exemplars to hold in a leaf before splitting
  private final int bucketSize;
  private static final int DEFAULT_BUCKET_SIZE = 10;

  public OptKdTree() {
    this(DEFAULT_BUCKET_SIZE);
  }

  public OptKdTree(int bucketSize) {
    this.bucketSize = bucketSize;
    this.data = new LinkedList<X>();
  }

  private final boolean
  isTree() { return left != null; }

  [...]
}

Adding

Now that our code is a little more sophisticated, it becomes profitable to have a dedicated method to bulk-load our tree. This also makes it easier to add dynamic rebalance support later if we need it. To take advantage of the additional information we get from bulk loading, we decide whether to split a leaf only after all exemplars have been added.

public void
add(X ex) {
  OptKdTree<X> tree = addNoSplit(this, ex);
  if (shouldSplit(tree)) {
    split(tree);
  }
}

public void
addAll(Collection<X> exs) {
  final Set<OptKdTree<X>> modTrees =
    new HashSet<OptKdTree<X>>();
  for(X ex : exs) {
    modTrees.add(addNoSplit(this, ex));
  }

  for(OptKdTree<X> tree : modTrees) {
    if (shouldSplit(tree)) {
      split(tree);
    }
  }
}

To add an exemplar, we traverse the tree from top down until we find a leaf. Then we add the exemplar to the list at that leaf. For now, there is no special ordering to the list of exemplars in a leaf. To decide which sub-tree to traverse, each tree stores two values—a splitting dimension and a splitting value. If our new exemplar's domain along the splitting dimension is greater than the tree's splitting value, we put it in the right sub-tree. Otherwise, it goes in the left one.

Since adding takes little time compared to searching, we take this opportunity to make some optimisations:

  • We keep track of the actual hyperrect the points in this tree occupy. This lets us rule out a tree, even though its space may intersect with our search sphere, if it has no chance of containing any points within our search sphere. This hyperrect is fully defined by just storing its two extreme corners.
  • To save us having to do a full iteration when we come to split a leaf, we compute the running mean and variance for each dimension using Welford's method:
   <math>M_1 = x_1,\qquad S_1 = 0</math>
   <math>M_k = M_{k-1} + {x_k - M_{k-1} \over k} \qquad(exMean)</math>
   <math>S_k = S_{k-1} + (x_k - M_{k-1})\times(x_k - M_k) \qquad(exSumSqDev)</math>
  • We keep track of whether all exemplars in this leaf have the same domain. If they do, we know that comparisons on one exemplar apply to all exemplars in that leaf.
// These aren't initialised until add() is called.
private double[] exMean = null, exSumSqDev = null;

// Optimisation when sub-tree contains only duplicates
private boolean singularity = true;

// Optimisation for searches. This lets us skip a node if its
// scope intersects with a search hypersphere but it doesn't contain
// any points that actually intersect.
private double[] contentMax = null, contentMin = null;

private int
dimensions() { return contentMax.length; }

// Addition

// Adds an exemplar without splitting overflowing leaves.
// Returns leaf to which exemplar was added.
private static <X extends Exemplar> OptKdTree<X>
addNoSplit(OptKdTree<X> tree, X ex) {
  // Some spurious function calls. Optimised for readability over
  // efficiency.
  OptKdTree<X> cursor = tree;
  while (cursor != null) {
    updateBounds(cursor, ex);
    if (cursor.isTree()) {
      // Sub-tree
      cursor = ex.domain[cursor.splitDim] <= cursor.split
        ? cursor.left : cursor.right;
    } else {
      // Leaf

      // Add exemplar to leaf
      cursor.data.add(ex);

      // Calculate running mean and sum of squared deviations
      final int nExs = cursor.data.size();
      final int dims = cursor.dimensions();
      if (nExs == 1) {
        cursor.exMean = Arrays.copyOf(ex.domain, dims);
        cursor.exSumSqDev = new double[dims];
      } else {
        for(int d = 0; d < dims; d++) {
          final double coord = ex.domain[d];
          final double oldMean = cursor.exMean[d], newMean;
          cursor.exMean[d] = newMean =
            oldMean + (coord - oldMean)/nExs;
          cursor.exSumSqDev[d] = cursor.exSumSqDev[d]
            + (coord - oldMean)*(coord - newMean);
        }
      }

      // Check that data are still uniform
      if (cursor.singularity) {
        final Queue<X> cExs = cursor.data;
        if (cExs.size() > 0 && !ex.collocated(cExs.peek()))
          cursor.singularity = false;
      }

      // Finished walking
      return cursor;
    }
  }
  throw new RuntimeException("Walked tree without adding anything");
}
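
As an aside, Welford's running statistics can be sanity-checked against a naive two-pass computation. A throwaway sketch (the sample values are made up):

// Welford's method versus a naive two-pass computation, one dimension.
double[] xs = { 0.2, 0.4, 0.4, 0.9 };

double mean = 0, sumSqDev = 0;
for (int k = 1; k <= xs.length; k++) {
  final double x = xs[k - 1];
  final double oldMean = mean;
  mean += (x - oldMean)/k;               // M_k
  sumSqDev += (x - oldMean)*(x - mean);  // S_k
}

double naiveMean = 0;
for (double x : xs) naiveMean += x/xs.length;
double naiveSumSqDev = 0;
for (double x : xs) naiveSumSqDev += (x - naiveMean)*(x - naiveMean);

// Both agree up to floating-point error. Note the tree never divides
// sumSqDev by the sample count: only the largest value matters when
// picking a split dimension.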

To update the bounding hyperrect for the contents of a tree, we iterate through each dimension, extending (if necessary) the bounds in that dimension to contain the new exemplar.

private static <T extends Exemplar> void
updateBounds(OptKdTree<T> tree, Exemplar ex) {
  final int dims = ex.domain.length;
  if (tree.contentMax == null) {
    tree.contentMax = Arrays.copyOf(ex.domain, dims);
    tree.contentMin = Arrays.copyOf(ex.domain, dims);
  } else {
    for(int d = 0; d < dims; d++) {
      final double dimVal = ex.domain[d];
      if (dimVal > tree.contentMax[d])
        tree.contentMax[d] = dimVal;
      else if (dimVal < tree.contentMin[d])
        tree.contentMin[d] = dimVal;
    }
  }
}

Splitting

We only split when a leaf's exemplars exceed its bucket size. It is only worth splitting if the exemplars don't all have the same domain.

private static <T extends Exemplar> boolean
shouldSplit(OptKdTree<T> tree) {
  return tree.data.size() > tree.bucketSize
    && !tree.singularity;
}

Thanks to our pre-computation, splitting is usually straight-forward. We iterate through each dimension to find the one with the largest variance (skip the unnecessary division), then we can directly look up the mean of that dimension. This strategy might not successfully split the tree if exemplars are so close together that rounding errors in computing the mean result in the mean not lying strictly between the exemplars in a leaf. When this happens, we resort to repeatedly trying a dimension and an exemplar at random for our splitting point. We only call split if the leaf is not a singularity, so eventually we will find a point that does divide our exemplars.

To make sure the point actually does divide our data, we separate our data into two lists destined for each sub-tree. Once we have found a successful split point, we bulk load our sub-trees, store information about our split point and let go of the exemplars stored in the tree.

// Split properties. Not initialised until split occurs.
private int splitDim = 0;
private double split = Double.NaN;

@SuppressWarnings("unchecked") private static <T extends Exemplar> void
split(OptKdTree<T> tree) {
  assert !tree.singularity;
  // Find dimension with largest variance to split on
  double largestVar = -1;
  int splitDim = 0;
  for(int d = 0; d < tree.dimensions(); d++) {
    // Don't need to divide by number of data to find largest
    // variance
    final double var = tree.exSumSqDev[d];
    if (var > largestVar) {
      largestVar = var;
      splitDim = d;
    }
  }

  // Find mean as position for our split
  double splitValue = tree.exMean[splitDim];

  // Check that our split actually splits our data. This also lets
  // us bulk load data into sub-trees, which is more likely
  // to keep optimal balance.
  final Queue<T> leftExs = new LinkedList<T>();
  final Queue<T> rightExs = new LinkedList<T>();
  for(T s : tree.data) {
    if (s.domain[splitDim] <= splitValue)
      leftExs.add(s);
    else
      rightExs.add(s);
  }
  int leftSize = leftExs.size();
  final int treeSize = tree.data.size();
  if (leftSize == treeSize || leftSize == 0) {
    System.err.println(
      "WARNING: Randomly splitting non-uniform tree");
    // We know the data aren't all the same, so try picking
    // an exemplar and a dimension at random for our split point

    // This might take several tries, so we copy our data to
    // an array to speed up process of picking a random point
    Object[] exs = tree.data.toArray();
    while (leftSize == treeSize || leftSize == 0) {
      leftExs.clear();
      rightExs.clear();

      splitDim = (int)
        Math.floor(Math.random()*tree.dimensions());
      final int splitPtIdx = (int)
        Math.floor(Math.random()*exs.length);
      // Cast is inevitable consequence of java's inability to
      // create a generic array
      splitValue = ((T)exs[splitPtIdx]).domain[splitDim];
      for(T s : tree.data) {
        if (s.domain[splitDim] <= splitValue)
          leftExs.add(s);
        else
          rightExs.add(s);
      }
      leftSize = leftExs.size();
    }
  }

  // We have found a valid split. Start building our sub-trees
  final OptKdTree<T> left = new OptKdTree<T>(tree.bucketSize);
  final OptKdTree<T> right = new OptKdTree<T>(tree.bucketSize);
  left.addAll(leftExs);
  right.addAll(rightExs);

  // Finally, commit the split
  tree.splitDim = splitDim;
  tree.split = splitValue;
  tree.left = left;
  tree.right = right;

  // Let go of data (and their running stats) held in this leaf
  tree.data.clear();
  tree.exMean = tree.exSumSqDev = null;
}

Distance calculations

Before we can start searching, we need to define our distance metric. The distance calculation was already introduced with the basic tree.

Squared Euclidean distance:
<math>distance_{x\rightarrow y}^2 = \sum (x[d] - y[d])^2</math>

This will be called for every point examined, so we should optimise it well.

private static double
distanceSqFrom(double[] p1, double[] p2) {
  double dSq = 0;
  for(int d = 0; d < p1.length; d++) {
    final double dst = p1[d] - p2[d];
    if (dst != 0)
      dSq += dst*dst;
  }
  return dSq;
}

We will also need to find the minimum distance from a hyperrect to a query point. This will let us rule out trees whose content hyperrects are entirely outside our search sphere. To find the minimum distance between a hyperrect and a given point, we first need to find the closest point on the bounds of that hyperrect to the point. This is easy to do, even if not immediately obvious.

Along each dimension,
<math>nearestPoint[d]=\begin{cases}
min[d] & target[d]\le min[d]\\
target[d] & min[d]<target[d]<max[d]\\
max[d] & target[d]\ge max[d]\end{cases}</math>

After that, we just calculate the distance from our query point to this nearest point. The code below has been optimised because it gets called a lot during a search.

// Gets distance from target of nearest point on hyper-rect defined
// by supplied min and max bounds
private static double
minDistanceSqFrom(double[] target, double[] min, double[] max) {
  double distanceSq = 0;
  for(int d = 0; d < target.length; d++) {
    if (target[d] < min[d]) {
      final double dst = min[d] - target[d];
      distanceSq += dst*dst;
    } else if (target[d] > max[d]) {
      final double dst = max[d] - target[d];
      distanceSq += dst*dst;
    }
  }
  return distanceSq;
}

Searching

Before we dive into searching, we will introduce two helper classes.

  • SearchStackEntry stores a tree along with any data which is useful to keep on the search stack. Currently, this additional information consists only of whether a bounds check should be done. This lets us skip a redundant bounds check on the root of the tree.
//
// class SearchStackEntry
//

private static class SearchStackEntry<T extends Exemplar> {
  public final boolean needBoundsCheck;
  public final OptKdTree<T> tree;

  public SearchStackEntry(boolean needBoundsCheck, OptKdTree<T> tree) {
    this.needBoundsCheck = needBoundsCheck;
    this.tree = tree;
  }
}
  • SearchState stores the current state of the search. Its sole purpose is to let us separate searching sub-trees and leaves into different methods. This breaks up a long method and makes profiling easier.
//
// class SearchState
//
// Holds data about current state of the search. Used for live updating
// of pruning distance.

private static class SearchState<X extends Exemplar> {
  final int nResults;
  final PriorityQueue<PrioNode<X>> results;

  public SearchState(int nResults) {
    this.nResults = nResults;
    results = new PriorityQueue<PrioNode<X>>(nResults,
      new Comparator<PrioNode<X>>() {

        public int
        compare(PrioNode<X> o1, PrioNode<X> o2) {
          return o1.priority == o2.priority ? 0
            : o1.priority > o2.priority ? -1 : 1;
        }

      });
  }
}

Our search method does a depth-first search using a stack. Each time we pop our stack, we do a bounds check if necessary, then defer to the appropriate method.

public Iterable<PrioNode<X>>
search(double[] query, int nResults) {
  // Forward to a static method to avoid accidental reference to
  // instance variables while descending the tree
  return search(this, query, nResults);
}

private static <X extends Exemplar> Iterable<PrioNode<X>>
search(OptKdTree<X> tree, double[] query, int nResults) {
  final SearchState<X> state = new SearchState<X>(nResults);
  final Deque<SearchStackEntry<X>> stack =
    new LinkedList<SearchStackEntry<X>>();
  if (tree.contentMin != null)
    stack.addLast(new SearchStackEntry<X>(false, tree));
  while (!stack.isEmpty()) {
    final SearchStackEntry<X> entry = stack.removeLast();
    final OptKdTree<X> cur = entry.tree;

    if (entry.needBoundsCheck && state.results.size() >= nResults) {
      final double d = minDistanceSqFrom(query,
        cur.contentMin, cur.contentMax);
      // Prune this tree if it cannot contain anything nearer
      // than our current farthest tentative result
      if (d > state.results.peek().priority)
        continue;
    }

    if (cur.isTree()) {
      // Until the heap is full, nothing may be pruned
      final double maxPriority = state.results.size() >= nResults
        ? state.results.peek().priority : Double.POSITIVE_INFINITY;
      searchTree(query, cur, stack, maxPriority);
    } else {
      searchLeaf(query, cur, state);
    }
  }

  return state.results;
}

Searching a Tree

We basically want to add both sub-trees to the stack, if they are not empty. As a heuristic, we think the sub-tree on the same side of the split as our query point is more likely to contain closer points, so we search that sub-tree first by adding it to the stack last. You might try to avoid a distance calculation by not requiring a bounds check for the heuristically nearer sub-tree, but that does not seem to significantly improve performance.

private static <X extends Exemplar> void
searchTree(double[] query, OptKdTree<X> tree,
  Deque<SearchStackEntry<X>> stack, double maxPriority)
{
  OptKdTree<X> nearTree = tree.left, farTree = tree.right;
  if (query[tree.splitDim] > tree.split) {
    nearTree = tree.right;
    farTree = tree.left;
  }

  // These variables let us skip empty sub-trees
  boolean nearEmpty = nearTree.contentMin == null;
  boolean farEmpty = farTree.contentMin == null;

  // Add nearest sub-tree to stack later so we descend it
  // first. This is likely to constrict our max distance
  // sooner, resulting in less visited nodes
  if (!farEmpty && minDistanceSqFrom(query,farTree.contentMin,farTree.contentMax) < maxPriority) {
    stack.addLast(new SearchStackEntry<X>(true, farTree));
  }

  if (!nearEmpty && minDistanceSqFrom(query,nearTree.contentMin,nearTree.contentMax) < maxPriority) {
    stack.addLast(new SearchStackEntry<X>(true, nearTree));
  }
}

Searching a Leaf

By the time we reach a leaf, there is little left to optimise away. We must go through each exemplar in turn to see if it is a nearest neighbour to our query point.

private static <X extends Exemplar> void
searchLeaf(double[] query, OptKdTree<X> leaf, SearchState<X> state) {
  double exD = Double.NaN;
  for(X ex : leaf.data) {
    if (!leaf.singularity || Double.isNaN(exD)) {
      exD = distanceSqFrom(query, ex.domain);
    }

    if (examine(exD, state)) {
      while (state.results.size() >= state.nResults) {
        state.results.poll();
      }

      state.results.offer(new PrioNode<X>(exD, ex));
    }
  }
}

private static <X extends Exemplar> boolean
examine(double distance, SearchState<X> state) {
  return state.results.size() < state.nResults
    || distance < state.results.peek().priority;
}
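
Before benchmarking, it is worth sanity-checking the tree against a brute-force scan on random data. A minimal test sketch (the data, dimensions and choice of k are arbitrary; run with -ea to enable assertions):

// Cross-check: the tree's k nearest neighbours must match a
// brute-force scan over the same points.
Random rnd = new Random(42);
OptKdTree<Exemplar> tree = new OptKdTree<Exemplar>();
double[][] points = new double[1000][3];
for (double[] p : points) {
  for (int d = 0; d < p.length; d++) p[d] = rnd.nextDouble();
  tree.add(new Exemplar(p));
}

double[] q = { 0.5, 0.5, 0.5 };
// Brute force: squared distance from q to every point, sorted
double[] want = new double[points.length];
for (int i = 0; i < points.length; i++) {
  double dSq = 0;
  for (int d = 0; d < q.length; d++) {
    final double dst = q[d] - points[i][d];
    dSq += dst*dst;
  }
  want[i] = dSq;
}
Arrays.sort(want);

// The tree's 10 nearest, sorted, must agree exactly
double[] got = new double[10];
int i = 0;
for (PrioNode<Exemplar> n : tree.search(q, 10)) got[i++] = n.priority;
Arrays.sort(got);
for (int k = 0; k < got.length; k++) assert got[k] == want[k];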

Evaluation

From the same benchmark run, our optimised tree obtains the following results:

RESULT << k-nearest neighbours search with Voidious' Linear search >>
: Average searching time       = 0.5740 miliseconds
: Average worst searching time = 2.3482 miliseconds
: Average adding time          = 0.4528 microseconds
: Accuracy                     = 100.0%

[...]

RESULT << k-nearest neighbours search with Rednaxela's kd-tree (2nd gen) >>
: Average searching time       = 0.0378 miliseconds
: Average worst searching time = 0.2625 miliseconds
: Average adding time          = 1.5687 microseconds
: Accuracy                     = 100.0%

[...]

RESULT << k-nearest neighbours search with duyn's basic kd-tree >>
: Average searching time       = 0.3939 miliseconds
: Average worst searching time = 11.5167 miliseconds
: Average adding time          = 0.5165 microseconds
: Accuracy                     = 100.0%

RESULT << k-nearest neighbours search with duyn's optimised kd-tree >>
: Average searching time       = 0.0536 miliseconds
: Average worst searching time = 1.6742 miliseconds
: Average adding time          = 2.9380 microseconds
: Accuracy                     = 100.0%

[...]

BEST RESULT: 
 - #1 Rednaxela's kd-tree (2nd gen) [0.0378]
[...]
 - #4 duyn's optimised kd-tree [0.0536]
[...]
 - #8 duyn's basic kd-tree [0.3939]
 - #9 Voidious' Linear search [0.5740]

This is promising. We are a whole order of magnitude faster and finally in the same league as other kd-tree implementations on this wiki. We have not changed our API, so the following shim is only for completeness:

  • DuynOptKNNSearch.java
public class DuynOptKNNSearch extends KNNImplementation {
  final OptKdTree<StringExemplar> tree;
  public DuynOptKNNSearch(final int dimension) {
    super(dimension);
    tree = new OptKdTree<StringExemplar>();
  }

  @Override public void
  addPoint(final double[] location, final String value) {
    tree.add(new StringExemplar(location, value));
  }

  @Override public String
  getName() {
    return "duyn's optimised kd-tree";
  }

  @Override
  public KNNPoint[]
  getNearestNeighbors(final double[] location, final int size) {
    final List<KNNPoint> justPoints = new LinkedList<KNNPoint>();
    for(PrioNode<StringExemplar> sr : tree.search(location, size)) {
      final double distance = sr.priority;
      final StringExemplar pt = sr.data;
      justPoints.add(new KNNPoint(pt.value, distance));
    }
    final KNNPoint[] retVal = new KNNPoint[justPoints.size()];
    return justPoints.toArray(retVal);
  }

  class StringExemplar extends Exemplar {
    public final String value;
    public StringExemplar(final double[] coords, final String value) {
      super(coords);
      this.value = value;
    }
  }
}

A Better Results Heap

A TreeMap does unnecessary work, keeping all of its keys in order when all we really care about is the farthest tentative result found so far. By implementing a custom heap, we can squeeze even more performance out of our tree.

In this section, we will implement an implicit binary heap. This structure does not have the best theoretical performance, but it often does well in practice. More theoretically efficient heaps tend to rely on batching inserts, only fully processing them when the top element is removed. Since we do almost one removal for every insert, this offers us little benefit. For our usage pattern, the only heap faster than a binary heap is a buggy heap.

Implicit d-ary Heap

Implicit binary heaps can be very memory efficient since all elements are stored in a single array. They don't grow as easily as other heaps, but for our purposes that is not a problem: we can cap the number of results we keep up front. A good description of implicit heaps is given by Jon Bentley in his Programming Pearls column "Thanks, Heaps".
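
For example, in a 0-based array a binary heap stores the children of the element at index i at indexes 2i + 1 and 2i + 2, and its parent at (i - 1)/2 rounded down, so the structure needs no pointers at all.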

Below, we implement a d-ary heap. The arithmetic is a straightforward generalisation of a binary heap, and there is some evidence that a 4-way heap can outperform a binary one in practice. Our heap can be used as either a min- or a max-heap; taking the enum Direction as a parameter ensures the user can't get this wrong.

// Final for performance; may be removed if sub-classing is necessary
public final class FastBinaryHeap<T> implements Iterable<PrioNode<T>> {
  private final int branches;
  // direction is -1 (min-heap) or +1 (max-heap); the root always
  // maximises priority*direction
  private final int direction;
  public static enum Direction {
    MIN(-1), MAX(1);

    public final int value;
    private Direction(int value) {
      this.value = value;
    }
  }
  public static final Direction MAX = Direction.MAX, MIN = Direction.MIN;

  private final PrioNode<T>[] data;
  private int size = 0;
  private static final int DEFAULT_N_CHILDREN = 4;

  public FastBinaryHeap(int capacity, Direction direction) {
    this(capacity, DEFAULT_N_CHILDREN, direction);
  }

  @SuppressWarnings("unchecked")
  public FastBinaryHeap(int capacity, int branches, Direction dir) {
    this.branches = branches;
    // This unchecked array creation is safe because the array
    // never escapes this class.
    this.data = new PrioNode[capacity];
    this.direction = dir.value;
  }
}
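
We default to four children per node: a wider node makes the tree shallower, so sift operations cross fewer levels, at the cost of a few more comparisons per level.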

We implement an interface similar to the standard PriorityQueue:

public Iterator<PrioNode<T>>
iterator() {
  return new Iterator<PrioNode<T>>() {

    int idx = 0;

    public boolean
    hasNext() { return idx < size; }

    public PrioNode<T>
    next() { return data[idx++]; }

    public void
    remove() {
      throw new UnsupportedOperationException("Not supported.");
    }

  };
}

public int
size() { return size; }

public PrioNode<T>
peek() { return data[0]; }

Our offer() method takes the priority separately from the data we want to associate with it. Storing the priority in a plain double field means we don't fall into the same performance trap as PriorityQueue, which must invoke a comparator for every comparison. We do not implement a poll() method because we never need one: if we are about to overflow, we simply replace our top element with the new one.

public boolean
offer(double priority, T result) {
  if (size < data.length) {
    // Trivial case: heap not yet full, insert at the end and sift up
    siftUp(size, new PrioNode<T>(priority, result));
    size++;
    return true;
  }

  // Heap is full. If the new element is worse than our current
  // worst, reject it outright.
  if (priority*direction > data[0].priority*direction) {
    return false;
  }

  // Otherwise, replace the top element and restore heap order.
  siftDown(0, new PrioNode<T>(priority, result));
  return true;
}

Now to the meat of our heap.

  • siftUp treats the given index as empty and progressively moves parents into the hole left behind if inserting the given element there would violate heap order.
Parents are calculated as follows:
<math>parent[i] = \left\lfloor \frac{i - 1}{branches} \right\rfloor</math>
Once we have that, we can compare the values at the two indexes. Multiplying both by direction lets the same code serve both a min- and a max-heap. For example, with branches = 4, the element at index 9 has parent (9 - 1)/4 = 2.
private void
siftUp(int hole, PrioNode<T> value) {
  if (size > 0) {
    final double vPD = value.priority*direction;
    while (hole > 0) {
      final int parent = (hole - 1)/branches;
      // Move the parent down while it would sit below the new
      // element in heap order
      if (vPD > data[parent].priority*direction) {
        data[hole] = data[parent];
        hole = parent;
      } else {
        break;
      }
    }
  }

  data[hole] = value;
}
  • siftDown does the opposite, starting from the given index and moving elements up if inserting the new value into the hole would violate heap order. At each level it selects the child which most violates heap order (the largest child in a max-heap), so a single move restores order at that level.
In a 0-based index, children are calculated as follows:
<math>\begin{align}child_{1}[i] & = branches\times i+1\\ \ldots\\ child_{branches}[i] & = branches\times i+branches\end{align}</math>
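For example, with branches = 4, the children of the element at index 2 occupy indexes 9 through 12.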

private void
siftDown(int hole, PrioNode<T> value) {
  if (size > 0) {
    // Push the hole down, following the child which most violates
    // heap order. Bounding on size (not data.length) keeps this
    // correct even if the heap is not full.
    final double vPD = value.priority*direction;
    int firstChild;
    while ((firstChild = branches*hole + 1) < size) {

      // Find the child to swap with: the largest in a max-heap,
      // the smallest in a min-heap
      double swapPD = data[firstChild].priority*direction;
      int swapChild = firstChild;
      for(int c = firstChild + 1;
        c < firstChild + branches && c < size; c++)
      {
        final double cPD = data[c].priority*direction;
        if (cPD > swapPD) {
          swapPD = cPD;
          swapChild = c;
        }
      }

      if (swapPD > vPD) {
        data[hole] = data[swapChild];
        hole = swapChild;
      } else {
        break;
      }
    }
  }

  data[hole] = value;
}

Both of these differ slightly from standard textbook presentations of the algorithm. The versions here perform fewer swaps by shifting other elements out of the way until the new element's rightful place has been found.
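
To make the replace-on-overflow behaviour concrete, here is a minimal usage sketch. It uses only the class defined above; FastBinaryHeapDemo and the specific priorities are illustrative, not part of the tutorial source.

public class FastBinaryHeapDemo {
  public static void main(String[] args) {
    // Capacity 3, MAX direction: the heap retains the three smallest
    // priorities offered, and peek() returns the largest of those.
    FastBinaryHeap<String> heap =
      new FastBinaryHeap<String>(3, FastBinaryHeap.MAX);
    heap.offer(5.0, "a");
    heap.offer(1.0, "b");
    heap.offer(3.0, "c");  // heap now full; worst element is 5.0
    heap.offer(2.0, "d");  // better than 5.0, so the top is replaced
    heap.offer(9.0, "e");  // rejected: worse than the current worst
    System.out.println(heap.peek().priority);  // prints 3.0
  }
}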

Changes to our Tree

Because the only change we have made is to the method of collecting results, we will skim over the changes necessary.

public final class FastKdTree<X extends Exemplar> {
  ...
}
  • Our SearchState will have to be updated to use our new heap:
//
// class SearchState
//
// Holds data about current state of the search. Used for live updating
// of pruning distance.

private static class SearchState<X extends Exemplar> {
  final int nResults;
  final FastBinaryHeap<X> results;

  public SearchState(int nResults) {
    this.nResults = nResults;
    results = new FastBinaryHeap<X>(
      nResults, 4, FastBinaryHeap.MAX);
  }
}
  • Our searchLeaf() method no longer needs to explicitly poll the top since our heap will just replace it if it has no space:
private static <X extends Exemplar> void
searchLeaf(double[] query, FastKdTree<X> leaf, SearchState<X> state) {
  double exD = Double.NaN;
  for(X ex : leaf.data) {
    if (!leaf.singularity || Double.isNaN(exD)) {
      exD = distanceSqFrom(query, ex.domain);
    }

    if (examine(exD, state)) {
      state.results.offer(exD, ex);
    }
  }
}
  • We have also replaced all instances of LinkedList with ArrayDeque. Although a LinkedList would work in theory, the javadocs for ArrayDeque state:
"This class is likely to be faster than Stack when used as a stack, and faster than LinkedList when used as a queue."—Java 6 API docs
public FastKdTree(int bucketSize) {
  this.bucketSize = bucketSize;
  this.data = new ArrayDeque<X>();
}

@SuppressWarnings("unchecked") private static <X extends Exemplar> void
split(FastKdTree<X> tree) {
  ...
  final Queue<X> leftExs = new ArrayDeque<X>();
  final Queue<X> rightExs = new ArrayDeque<X>();
  ...
}

private static <X extends Exemplar> Iterable<PrioNode<X>>
search(FastKdTree<X> tree, double[] query, int nResults) {
  final SearchState<X> state = new SearchState<X>(nResults);
  final Deque<SearchStackEntry<X>> stack =
    new ArrayDeque<SearchStackEntry<X>>();
  ...
}

Evaluation

Our improved heap squeezes a bit more performance (~10%) out of our tree:

RESULT << k-nearest neighbours search with Voidious' Linear search >>
: Average searching time       = 0.5740 miliseconds
: Average worst searching time = 2.3482 miliseconds
: Average adding time          = 0.4528 microseconds
: Accuracy                     = 100.0%

[...]

RESULT << k-nearest neighbours search with Rednaxela's kd-tree (2nd gen) >>
: Average searching time       = 0.0378 miliseconds
: Average worst searching time = 0.2625 miliseconds
: Average adding time          = 1.5687 microseconds
: Accuracy                     = 100.0%

[...]

RESULT << k-nearest neighbours search with duyn's basic kd-tree >>
: Average searching time       = 0.3939 miliseconds
: Average worst searching time = 11.5167 miliseconds
: Average adding time          = 0.5165 microseconds
: Accuracy                     = 100.0%

RESULT << k-nearest neighbours search with duyn's optimised kd-tree >>
: Average searching time       = 0.0536 miliseconds
: Average worst searching time = 1.6742 miliseconds
: Average adding time          = 2.9380 microseconds
: Accuracy                     = 100.0%

RESULT << k-nearest neighbours search with duyn's fast kd-tree >>
: Average searching time       = 0.0479 miliseconds
: Average worst searching time = 25.0995 miliseconds
: Average adding time          = 2.1288 microseconds
: Accuracy                     = 100.0%

[...]

BEST RESULT: 
 - #1 Rednaxela's kd-tree (2nd gen) [0.0378]
[...]
 - #3 duyn's fast kd-tree [0.0479]
 - #4 duyn's optimised kd-tree [0.0536]
[...]
 - #8 duyn's basic kd-tree [0.3939]
 - #9 Voidious' Linear search [0.5740]

Our client code has barely changed:

  • DuynFastKNNSearch.java
public class DuynFastKNNSearch extends KNNImplementation {
  final FastKdTree<StringExemplar> tree;
  public DuynFastKNNSearch(final int dimension) {
    super(dimension);
    tree = new FastKdTree<StringExemplar>();
  }

  @Override public void
  addPoint(final double[] location, final String value) {
    tree.add(new StringExemplar(location, value));
  }

  @Override public String
  getName() {
    return "duyn's fast kd-tree";
  }

  @Override public KNNPoint[]
  getNearestNeighbors(final double[] location, final int size) {
    final List<KNNPoint> justPoints = new LinkedList<KNNPoint>();
    for(PrioNode<StringExemplar> sr : tree.search(location, size)) {
      final double distance = sr.priority;
      final StringExemplar pt = sr.data;
      justPoints.add(new KNNPoint(pt.value, distance));
    }
    final KNNPoint[] retVal = new KNNPoint[justPoints.size()];
    return justPoints.toArray(retVal);
  }

  class StringExemplar extends Exemplar {
    public final String value;
    public StringExemplar(final double[] coords, final String value) {
      super(coords);
      this.value = value;
    }
  }
}

Full Benchmark Results

Below are the full benchmark results quoted throughout this page. They put the performance of our trees in context alongside the other kd-trees available on this wiki. The benchmark (k-NN algorithm benchmark) was run with the Diamond vs CunobelinDC gun data.

K-NEAREST NEIGHBOURS ALGORITHMS BENCHMARK
-----------------------------------------
Reading data from gun-data-Diamond-vs-jk.mini.CunobelinDC 0.3.csv.gz
Read 25621 saves and 10300 searches.

Running 30 repetition(s) for k-nearest neighbours searching:
:: 13 dimension(s); 40 neighbour(s)
Warming up the JIT with 5 repetitions first...

Running tests...test 1/30
[...]
 COMPLETED.

RESULT << k-nearest neighbours search with Voidious' Linear search >>
: Average searching time       = 0.5740 miliseconds
: Average worst searching time = 2.3482 miliseconds
: Average adding time          = 0.4528 microseconds
: Accuracy                     = 100.0%

RESULT << k-nearest neighbours search with Simonton's Bucket PR k-d tree >>
: Average searching time       = 0.1609 miliseconds
: Average worst searching time = 1.4184 miliseconds
: Average adding time          = 1.3322 microseconds
: Accuracy                     = 100.0%

RESULT << k-nearest neighbours search with Voidious' Bucket PR k-d tree >>
: Average searching time       = 0.2081 miliseconds
: Average worst searching time = 21.9191 miliseconds
: Average adding time          = 1.0301 microseconds
: Accuracy                     = 100.0%

RESULT << k-nearest neighbours search with Rednaxela's kd-tree (2nd gen) >>
: Average searching time       = 0.0378 miliseconds
: Average worst searching time = 0.2625 miliseconds
: Average adding time          = 1.5687 microseconds
: Accuracy                     = 100.0%

RESULT << k-nearest neighbours search with Rednaxela's kd-tree (3rd gen) >>
: Average searching time       = 0.0407 miliseconds
: Average worst searching time = 0.2259 miliseconds
: Average adding time          = 1.6342 microseconds
: Accuracy                     = 100.0%

RESULT << k-nearest neighbours search with duyn's basic kd-tree >>
: Average searching time       = 0.3939 miliseconds
: Average worst searching time = 11.5167 miliseconds
: Average adding time          = 0.5165 microseconds
: Accuracy                     = 100.0%

RESULT << k-nearest neighbours search with duyn's optimised kd-tree >>
: Average searching time       = 0.0536 miliseconds
: Average worst searching time = 1.6742 miliseconds
: Average adding time          = 2.9380 microseconds
: Accuracy                     = 100.0%

RESULT << k-nearest neighbours search with duyn's fast kd-tree >>
: Average searching time       = 0.0479 miliseconds
: Average worst searching time = 25.0995 miliseconds
: Average adding time          = 2.1288 microseconds
: Accuracy                     = 100.0%

RESULT << k-nearest neighbours search with Chase-san's kd-tree (KDTreeC) >>
: Average searching time       = 0.1200 miliseconds
: Average worst searching time = 19.5727 miliseconds
: Average adding time          = 1.4222 microseconds
: Accuracy                     = 100.0%


BEST RESULT: 
 - #1 Rednaxela's kd-tree (2nd gen) [0.0378]
 - #2 Rednaxela's kd-tree (3rd gen) [0.0407]
 - #3 duyn's fast kd-tree [0.0479]
 - #4 duyn's optimised kd-tree [0.0536]
 - #5 Chase-san's kd-tree (KDTreeC) [0.1200]
 - #6 Simonton's Bucket PR k-d tree [0.1609]
 - #7 Voidious' Bucket PR k-d tree [0.2081]
 - #8 duyn's basic kd-tree [0.3939]
 - #9 Voidious' Linear search [0.5740]

Benchmark running time: 587.77 seconds

Full Source Code

The full source code for all three trees presented in this tutorial can be downloaded together: File:Duyn tutorial kd trees.jar. The code is available under the RWPCL.

Extensions

This tree does not include any support for deletion or re-balancing. You can get away with this if you plan to re-build the entire tree regularly.

Deletion support would be easy to add to this tree. Since all exemplars are stored in leaves, we need only walk the tree and remove the exemplar from the leaf containing it. We could then merge sub-trees if multiple deletions have left one of them empty. Unless our bucket size is really small, we don't need to merge sub-trees after every deletion, since adding a subsequent exemplar is unlikely to produce a significantly different split point.
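
As a rough sketch only: the field names left, right, splitDim, splitValue and data below are assumptions and may not match the tutorial source exactly. With those assumptions, deletion could look something like this:

// Hypothetical sketch: assumes branches null out their data bucket
// when split, and expose splitDim/splitValue plus left/right children.
static <X extends Exemplar> boolean
delete(FastKdTree<X> tree, X ex) {
  // Descend to the leaf which would contain the exemplar
  while (tree.data == null) {
    tree = ex.domain[tree.splitDim] <= tree.splitValue
      ? tree.left : tree.right;
  }
  return tree.data.remove(ex);
}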

Re-balancing with this tree is trickier. It does not currently store its depth, so we don't know when a re-balance is needed. And because exemplars are only stored in leaves, collecting all the exemplars under a sub-tree requires exhaustively walking that sub-tree (see the sketch after this list).

  • If depth is to be stored, each tree would need a pointer to its parent so we can propagate depth updates when a leaf is split.
  • We can avoid an exhaustive walk by storing each exemplar in every tree on its path from root to leaf. We could then re-balance easily by dropping both sub-trees and splitting the given tree anew. The downside is that inserts and deletes become more complex, since we have to update the exemplar list of every tree along the path from root to leaf.
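
For reference, the exhaustive walk itself is simple, just slow on large sub-trees. It assumes the same hypothetical field names as the deletion sketch above, and Collection is java.util.Collection:

// Gathers every exemplar stored under the given sub-tree.
static <X extends Exemplar> void
collect(FastKdTree<X> tree, Collection<X> out) {
  if (tree.data != null) {
    out.addAll(tree.data);      // leaf: exemplars live here
  } else {
    collect(tree.left, out);    // branch: recurse into both children
    collect(tree.right, out);
  }
}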

See also