Unit tests for KdTree
Thanks for your quick reply!
Good to know that the bounds don't affect the correctness. I agree, then it is a performance trade-off to adjust them or not.
You only really need to perform "merge siblings" when ALL points from the node are removed, isn't it?
Indeed, the size of the node itself must be adjusted as well:
KdNode<T> parent = cursor.parent; do { parent.size--; parent = cursor.parent; } while (parent != null);
Also ArrayUtil.remove is my own method:
/** * Remove the element from the array at location i. * @param as original array. * @param i index indicating the element to be removed. * @return new array with length (as.length-1) if a is contained in as, or else a copy of the original array. */ public static int[] remove(int[] as, int i){ int n = as.length; if(i < 0 || i >= n) { return Arrays.copyOf(as, n); } int[] copy = new int[n-1]; System.arraycopy(as,0,copy,0,i); System.arraycopy(as,i+1,copy,i,n-1-i); return copy; } public static int[] remove(int[] as, int i){ int n = as.length; if(i < 0 || i >= n) { return copyOf(as, n); } int[] copy = new int[n-1]; System.arraycopy(as,0,copy,0,i); System.arraycopy(as,i+1,copy,i,n-1-i); return copy; }
Removal is O(n), but you cant do much better. There is also a variant for double[] and double[][].
The bounds indeed really need to determine the min and max by iterating over all points.
Back to the keyboard...
Well, to be clear the bounds affect the correctness if they are *too small*, but the bounds being too big (as is the case after removing) has no impact.
Really, you don't have to "merge siblings" when all points are removed from a node even, and it could (for sake of tree balance) be worth merging sibling nodes before they're completely empty (i.e. if both nodes had half their points removed)
With regards to the removing from the array, actually you can do it in O(1) time due to the fact that 1) unused array elements are expected, and 2) the array of points in a leaf node are unordered. You just need to overwrite the removed element with the last element, and then remove the last element:
points[i] = points[size-1];
data[i] = data[size-1];
points[size-1] = null;
data[size-1] = null;
size--;
(Oh, and when removing elements, be careful with how you're looping over them at the same time. You need to decrement "i" when the remove occurs, because what points[i] refers to changes when you remove the element)
If you wanted to take a look at Gilgalad, I think I have this working correctly in my tree. However, the feature is never used since it is faster to just have one large bucket unless the size limit is greater than about 10000 or so, and if I used that many points I would probably just use them all.