Thread history
Viewing a history listing
Time | User | Activity | Comment |
---|---|---|---|
00:05, 6 May 2012 | Ojd (talk | contribs) | New thread created | |
13:59, 7 May 2012 | Nat (talk | contribs) | New reply created | (Reply to Unit tests for KdTree) |
17:31, 8 May 2012 | Rednaxela (talk | contribs) | New reply created | (Reply to Unit tests for KdTree) |
16:02, 22 May 2012 | Ojd (talk | contribs) | New reply created | (Reply to Unit tests for KdTree) |
22:21, 27 May 2012 | Arie (talk | contribs) | New reply created | (Reply to Unit tests for KdTree) |
00:21, 28 May 2012 | Rednaxela (talk | contribs) | New reply created | (Reply to Unit tests for KdTree) |
00:24, 28 May 2012 | Rednaxela (talk | contribs) | Comment text edited | |
17:20, 28 May 2012 | Arie (talk | contribs) | New reply created | (Reply to Unit tests for KdTree) |
05:20, 29 May 2012 | Rednaxela (talk | contribs) | New reply created | (Reply to Unit tests for KdTree) |
05:23, 29 May 2012 | Rednaxela (talk | contribs) | Comment text edited | (additional notes) |
14:02, 29 May 2012 | AW (talk | contribs) | New reply created | (Reply to Unit tests for KdTree) |
Hello Rednaxela,
I just converted your KD-tree Java code to C#. Do you have unit tests for KD-Tree? They would allow me to test if the conversion succeded.
I don't know if you could modified it to run the C# implementation, but K-NN algorithm benchmark has an accuracy checker with linear search, as with performance benchmark. You can give it a try.
Yeah, I didn't use any unit tests when writing the KD-Tree, just some functional tests. The most thorough of which is from the K-NN algorithm benchmark that Nat and I wrote. It repeatedly tests the KD-Trees with a very large set of random set of points against known-good results from a very very simple linear-time k-nn search.
I just published my C# implementation of a KD-Tree based on Rednaxela's java implementation here on robowiki.
Hi Rednaxela & Ojd,
I'd like to have remove functionality and added the method below to KdNode. It is based on Ojd's C# variant and adjusts. KdNode needs and extra field: parent. It seems to do what I want, but feel not confident about some points:
- do the bounds of parent nodes to be adjusted as well?
- how to add it nicely to the NearestNeighborIterator?
- what if you want to re-weight after n removals? (remove remainder and add again?)
Thanks, Arie
public boolean remove(double[] point, T value) { KdNode<T> cursor = this; while (!cursor.isLeaf()) { if (point[cursor.splitDimension] > cursor.splitValue) { cursor = cursor.left; } else { cursor = cursor.right; } } for (int i = 0; i < cursor.size; i++) { if (cursor.data[i].equals(value)) { double[] cursorPoint = cursor.points[i]; // remove this points cursor.data = ArrayUtil.remove(cursor.data, i); cursor.points = ArrayUtil.remove(cursor.points, i); // fix tree size all the way up to the root KdNode<T> parent = cursor.parent; do { parent.size--; parent = cursor.parent; } while (parent != null); // adjust _minBound and _maxBound after removing an item. if(cursor.points.length > 0){ for (int j = 0; j < dimensions; j++) { if (Double.isNaN(cursorPoint[j])) { if (!Double.isNaN(minBound[j]) || !Double.isNaN(maxBound[j])) { singlePoint = false; } minBound[j] = Double.NaN; maxBound[j] = Double.NaN; } else if (minBound[j] > cursorPoint[j]) { minBound[j] = cursorPoint[i]; singlePoint = false; } else if (maxBound[j] < cursorPoint[i]) { maxBound[j] = cursorPoint[i]; singlePoint = false; } } } else{ // last point at this cursor was removed. minBound = null; maxBound = null; } return true; } } return false; }
To answer your questions:
- Actually, technically you don't need to adjust ANY bounds at all. It won't affect the correctness of the result. The only consequence is that the searches may be slightly slower than optimal. If you are adjusting the bounds though, you might as well adjust the parent ones as well, because that's where a large chunk of the (smallish) performance impact would be.
- On one hand adjusting the bounds makes the removal slower, but on the other hand it makes the search a little faster. To really know if it's worth adjusting the bounds, would probably require benchmarking with some code that's making realistic use of the removal.
- To add it nicely to the iterator, I think you'd need to have the iterator keep track of the last point it returned, so it can call a remove on that.
- If I understand what you mean... You should note that the structure of the tree is such that you can't just remove a leaf node, because every node must either be a leaf node or have two child nodes. To prune back the tree after removing items, you'd need to implement a "merge siblings" algorithm. This algorithm would need to first check that the combined size of both would fit comfortably in one node, and if that's the case basically perform the reverse of what "splitLeafNode" does. It'll also need to take into account the possibility that one sibling may not be a leaf node while the other is.
Some other notes:
- You don't appear to be changing the "size" of the node itself, just the parents, which would be a problem. I also don't know that "ArrayUtil.remove" is what you want there.
- Note that the "data" and "points" arrays have extra capacity beyond what's used and that "size" (or "_count" in the case of Ojd's C# variant) tracks the number of elements actually used.
- (I also can't seem to find any documentation of a "ArrayUtil.remove" in either Java or C# so I'm a bit confused)
- The code to adjust the bounds there appears to be incorrect. It looks like what's happening in that code is extending the bounds to contain the removed point... and the bounds already contain that anyway.
Thanks for your quick reply!
Good to know that the bounds don't affect the correctness. I agree, then it is a performance trade-off to adjust them or not.
You only really need to perform "merge siblings" when ALL points from the node are removed, isn't it?
Indeed, the size of the node itself must be adjusted as well:
KdNode<T> parent = cursor.parent; do { parent.size--; parent = cursor.parent; } while (parent != null);
Also ArrayUtil.remove is my own method:
/** * Remove the element from the array at location i. * @param as original array. * @param i index indicating the element to be removed. * @return new array with length (as.length-1) if a is contained in as, or else a copy of the original array. */ public static int[] remove(int[] as, int i){ int n = as.length; if(i < 0 || i >= n) { return Arrays.copyOf(as, n); } int[] copy = new int[n-1]; System.arraycopy(as,0,copy,0,i); System.arraycopy(as,i+1,copy,i,n-1-i); return copy; } public static int[] remove(int[] as, int i){ int n = as.length; if(i < 0 || i >= n) { return copyOf(as, n); } int[] copy = new int[n-1]; System.arraycopy(as,0,copy,0,i); System.arraycopy(as,i+1,copy,i,n-1-i); return copy; }
Removal is O(n), but you cant do much better. There is also a variant for double[] and double[][].
The bounds indeed really need to determine the min and max by iterating over all points.
Back to the keyboard...
Well, to be clear the bounds affect the correctness if they are *too small*, but the bounds being too big (as is the case after removing) has no impact.
Really, you don't have to "merge siblings" when all points are removed from a node even, and it could (for sake of tree balance) be worth merging sibling nodes before they're completely empty (i.e. if both nodes had half their points removed)
With regards to the removing from the array, actually you can do it in O(1) time due to the fact that 1) unused array elements are expected, and 2) the array of points in a leaf node are unordered. You just need to overwrite the removed element with the last element, and then remove the last element:
points[i] = points[size-1];
data[i] = data[size-1];
points[size-1] = null;
data[size-1] = null;
size--;
(Oh, and when removing elements, be careful with how you're looping over them at the same time. You need to decrement "i" when the remove occurs, because what points[i] refers to changes when you remove the element)
If you wanted to take a look at Gilgalad, I think I have this working correctly in my tree. However, the feature is never used since it is faster to just have one large bucket unless the size limit is greater than about 10000 or so, and if I used that many points I would probably just use them all.