User talk:Rednaxela/kD-Tree

From Robowiki
< User talk:Rednaxela
Revision as of 21:23, 26 August 2009 by Rednaxela (talk | contribs) (reply to Skilgannon)
Jump to navigation Jump to search

So this mean I can use it int close-sourced robot? » Nat | Talk » 13:24, 25 August 2009 (UTC)

Yep, so long as you don't misrepresent it's origin (i.e. claim you wrote the kd-tree in use) :) --Rednaxela 13:29, 25 August 2009 (UTC)
You used to say that it will be licensed under RWPCL and CC-BY, but I'm more happy with this though, for that I can release the code under more tricky way than usual in-jar method =) » Nat | Talk » 14:25, 25 August 2009 (UTC)
I decided I wanted to avoid as much complication as possible, and I found the zlib licence. Since I don't mind how people use it so long as they don't make false claims to have written it, and it doesn't conflict with the RWPCL (i.e. can be used in RWPCL bots), it seemed like a good match. I removed one clause that I deemed unnecessary. It seems reasonable to me :) --Rednaxela 14:46, 25 August 2009 (UTC)

I just tried this tree, and I'm getting problems where every now and again an Entry will have a null value in it. I've got workaround code for it, but I know it shouldn't be doing that, and I'm not sure why it is. Also, DrussGT is still skipping a LOT of turns with this tree, so I'll have to find a dimension or 3 to cull... I could even run 2 trees with 8 dimensions in less time than my 1 tree with 11. Oh yes, I changed the 2 distance functions to be Manhattan distance, but that shouldn't affect things too much. --Skilgannon 18:21, 26 August 2009 (UTC)

Huh... that seems odd. I don't see why null entries could ever happen, I'll look into it when I get home. As far as DrussGT skipping a lot of turns still, I assume you don't mean any turns than normal do you? And yeah, even a few dimensions will make a huge difference with any kd-tree. --Rednaxela 18:33, 26 August 2009 (UTC)
Yeah, I know it passed all the benchmarks etc. so I'm not sure what's going on. Maybe look in DrussGT 1.3.10wilo for the version I'm using if you want to test it. Lines 614 to 618 of DrussGunDC.java are where my workaround is (basically just removing all Entries with null values from the cluster). If I take that out it starts throwing errors every few rounds. (Note: just to prevent any confusion, due to me adding the first workaround - just skipping null values in the loop - it now throws the errors down on line 663 when it sorts the Indices instead of in the loop when trying to access the data inside the null value). Thanks for any light you can shed on this =) --Skilgannon 18:55, 26 August 2009 (UTC)
Aha! I haven't tested it yet, but this should fix the nulls. :) --Rednaxela 19:23, 26 August 2009 (UTC)

You cannot post new threads to this discussion page because it has been protected from new threads, or you do not currently have permission to edit.

Contents

Thread titleRepliesLast modified
Interval Heap vs MinHeap620:54, 20 July 2013
removeOld() not working? Or locationCount not decrementing after removeOld()?2019:52, 1 May 2013
Null Pointer Exception - 3rd Gen Tree516:30, 26 March 2013
Unit tests for KdTree815:02, 29 May 2012

Interval Heap vs MinHeap

I'm curious, why did you go with the MinHeap instead of the IntervalHeap for your path ordering? Surely IntervalHeap can be kept pruned and small, resulting in less overhead?

Skilgannon (talk)08:38, 20 July 2013

I hadn't actually thought to try that. The IntervalHeap is something I added specifically when I was thinking about implementing to search iterator, rather than optimizing the search steps themselves.

I just tried it right now. Turns out that the more complex procedure to maintain the IntervalHeap's ordering adds more cost than is saved by keeping the path ordering queue pruned, at least for this size of tree in the 'standard' data set.

I tried two seperate approaches of using the IntervalHeap to keep that queue pruned:

  1. Remove any paths which are now known to be too distant, whenever we finish processing a leaf node (aka, always keep it pruned as much as possible)
  2. When we're adding a new path to the queue, if the furthest path in the queue is too distant, replace it to avoid growing the heap (aka, don't actively prune it, merely don't grow it if we can avoid it)

Both approaches performed worse overall than the unpruned MinHeap. Even if I do a run with JIT warmup allowed to compensate for the IntervalHeap's more complex code, it still performs slightly worse on average. When the JIT warmup is allowed however, the "worst case" time appears to improve though, just not the average.

Rednaxela (talk)15:45, 20 July 2013

How about keeping a flag in your leaf node processing so that you only try pruning if you added a new item?

Skilgannon (talk)15:48, 20 July 2013

Just tried that. The pruning is still not paying off compared to the extra overhead of the IntervalHeap.

Rednaxela (talk)16:05, 20 July 2013

And how about if you just use the IntervalHeap, but don't try any pruning? I'm curious whether the pruning is slow, or the different heap.

Skilgannon (talk)16:26, 20 July 2013

I was thinking it was the different heap... but actually turns out the heaps are approximately the same performance, with the difference being the pruning:

 - #1 Rednaxela's kd-tree (3rd gen, Interval Heap) [0.0290]
 - #2 Rednaxela's kd-tree (3rd gen) [0.0290]
 - #3 Rednaxela's kd-tree (3rd gen, Interval Heap, Prune When Points Added) [0.0293]
 - #4 Rednaxela's kd-tree (3rd gen, Interval Heap, Avoid Growing Heap When Possible) [0.0294]
 - #5 Skilgannon's Cache-hit KDTree [0.0296]
Rednaxela (talk)17:26, 20 July 2013
 
 
 
 
 

removeOld() not working? Or locationCount not decrementing after removeOld()?

I've added this KD-tree to my bot as I do a ground-up rewrite of it. I'm not even to the point of extracting k.n.n. data from the tree yet, I was just in the process of testing dropping data into the tree, and noticed something odd. The tree isn't pruning when locationCount exceeds sizeLimit. Or if it is, then it's not being reflected in the value of locationCount.

I've set sizeLimit in the tree instantiation to a mere 35 as a test, and by the end of a 35 round match against Walls, the tree reports containing nearly 2000 nodes. After reading through the code, I don't see a lower bound on pruning the tree other than bucketSize, which is only 24 anyway. I've added a debug print message, and removeOld() is definitely being called, but is returning before the "weird" comment at the end of that method....

I'm sure I'm missing something... but what? Should I presume this is working as intended, despite the reported size() growing without bound?

Tkiesel00:38, 23 November 2011

Okay. A bit of testing with grabbing knn's via nearestNeighbor() shows that even if I request more knn points than. sizeLimit, the function only returns <= sizeLimit points. Excellent! So, is it problematic that locationCount (and thus the output of size() )is going to head without limit for Integer.MAX_VALUE and never gets decremented to reflect the current actual size of the tree? The former isn't much of a concern, RC matches are never that long.. but the latter?

Tkiesel00:59, 23 November 2011
 

I found the error. In method RemoveOld, the loop condition of the decrementing loop should be

   cursor != null    // OK

instead of

   cursor.parent != null     // WRONG

This prevented the count in the root node of being decremented.

Whole loop (corrected):

   do {
       cursor.locationCount--;
       cursor = cursor.parent;
   } while (cursor != null);

Also RemoveOld does not adapt minLimit and maxLimit. I don't know what the impact of this would be in practice.

Ojd17:09, 7 May 2012
 

That works! Excellent! I'll edit the code on the page to reflect it.

Tkiesel17:54, 8 May 2012
 

Hey, thanks for finding that bug.

By the way, just so you know, the version with the code on the page is not the version I actively use anymore. The "up to date" version is here under the java package "ags.utils.dataStructures.trees.thirdGenKD". This version is a re-write what is cleaner (IMO anyway), slightly faster, though with a slightly different feature set. It's missing that "remove old elements beyond a size limit" feature because nobody I knew of was using it at the time, but it has some additional features like an iterator to allow one to iterate over the nearest points in sorted order, and if you stop early it saves a notable amount of cpu. The advantage of this iterator is that it can be significantly faster in situations where the number of points you need is uncertain until looking at the closest few points (i.e. if you're filtering out some of the points after the k-nn search based on some other criteria).

Rednaxela18:23, 8 May 2012

Hey Rednaxela,

I'm switching over to your 3rd gen tree, and had a question re: your statement here:

"... it has some additional features like an iterator to allow one to iterate over the nearest points in sorted order, and if you stop early it saves a notable amount of cpu."

How early are we talking about? For instance, if I grab an iterator for 200 maxPointsReturned, iterate over the first 150 of them and decide I'm done, is that still in the "saves a notable amount of cpu" territory?

Tkiesel14:53, 21 May 2012

I'd have to benchmark it to be sure, and it depends on the distribution of data of course, but I'd estimate stopping at 150 out of 200 points would shave off something like 10-15% of the search time compared to just getting 200 points.

Rednaxela16:40, 21 May 2012
 

IIRC it depends on the structure of your data. From what I understand, it slowly expands the hypersphere of contained points in bursts (grabbing further and further tree branches), sorting them in order of relevancy as it goes. It depends on whether you don't have to do an extra expansion to get the extra points. An iterator for 200, then getting 150, will be slower than getting an iterator for 150, but chances are it will be faster than getting an iterator for 200 and using all 200.

Skilgannon16:42, 21 May 2012

Pretty much yeah, though it does avoid the full effort of sorting them by using a min-max heap that tosses the most distant points off and keeps the closest point accessible in constant time. The search algorithm is exactly the same as searching for all 200 (it needs to remember the 200 closest points it's found so far, and know what the furthest and closest ones of that set are), except that it pauses the search when when it is able to determine that no unchecked branch could have anything closer than the closest point not yet returned by the iterator.

Rednaxela18:15, 21 May 2012
 

Excellent info from you both, thanks!

One more question: the DistanceFunction file defines an Interface, but features no comments to describe what the two member methods are supposed to do.

distance() is utterly obvious... especially when reading your EuclideanDistanceFunction implementation. But distanceToRect()... I think I know what it requires, but I don't want to screw it up when I write my own DistanceFunction.

What precisely is distanceToRect defined as?

Tkiesel17:08, 21 May 2012
 

It's the minimum distance from that point you are testing to the hyper-rectangle defined by the min and max co-ordinates on each dimension.

So considering dimension x: if the point val is less than the min, the distance is (min - val), if it is between min and max the value is 0 (it is inside the rectangle), if it is more than max the distance is (val - max).

If you are doing Euclidean, square each distance then add them together. I do Manhattan, so just use the absolute value.

Skilgannon17:36, 21 May 2012
 

Yep, what Skilgannon said. Sorry I forgot to put comments in that interface.

If your'e wondering, this is used to compare the search point to the bounding box associated with each branch of the tree, and allows efficient skipping of irrelevant branches.

Rednaxela18:27, 21 May 2012
 

Excellent. I was guessing it was that or related to that from reading your euclidian distance implementation.

When I write a WeightedSquareEuclidieanDistanceFunction and/or WeightedManhattanDistanceFunction, would you like me to commit them to your bitbucket hg repo? I'm happy to help contribute! :)

Tkiesel18:50, 21 May 2012
 

Sure, or if posted on the wiki or in a bot I can upload it to that some time. Thanks :)

Rednaxela19:06, 21 May 2012

DeBroglie rev0026 is up. Only change from rev0025 is your 3rd gen tree. Should perform pretty close to what it was doing before, though some differences are to be expected since your 3rd gen tree doesn't drop points. We'll see, though I have to get going to the doctor at the moment.

I tested with both the SqrEuclidean and Manhattan versions of my Weighted trees. Both seemed to work fine in several test battles with some bots I had sitting around. I ended up making a WeightedDistanceFunction class to be a superclass of both the WeightedManhattanDistanceFunction and WeightedSquareEuclideanDistanceFunction.. to duplicate less of the code involved in weighting.

The weighted DF should failover gracefully if given weights that mismatch the number of tree dimensions. Only thing I didn't implement was doing a Math.abs() on weights, since someone out there might invent a DistanceFunction that utilizes negative weights.

If the code on my bitbucket fork meets your approval, I can toss you a pull request. :)

EDIT: Made a new bitbucket with all the work in a single commit, and decided to make WeightedDistanceFunction abstract.

Tkiesel20:35, 22 May 2012

Neat! When I first took a quick look at the version you initially posted, I was thinking to myself that WeightedDistanceFunction should have been abstract yeah.

I'll merge it in some time shortly.

Rednaxela01:22, 28 May 2012
 
 

Alright. I've got them written on a fork of your bitbucket repo. Once I test them in DeBroglie to be sure they work, I'll drop a pull request to you. :)

Tkiesel20:23, 21 May 2012
 
 

!!!! That's a feature that I've really been wanting for an idea I had for DeBroglie's gun! Excellent! You ought to edit the main page to mention that the most up-to-date version of the code is available off-wiki. Thanks for the heads-up Rednaxela!

Tkiesel22:45, 8 May 2012
 

Seems pretty great - is the 3rd gen tree threadsafe? The earlier iterations do not appear so.

Elliptic19:25, 1 May 2013

No. Use of multiple threads isn't normal in Robocode, so it wasn't designed to handle that. Also, adding thread safety checks would add a fairly notable performance penalty.

Rednaxela19:52, 1 May 2013
 
 

Null Pointer Exception - 3rd Gen Tree

Hiya,

Getting an occasional null pointer exception. I've not worked out why its happening, and looking at the source I can't quite see how it could happen. Call stack looks like

at KDTree.KdNode.addLeafPoint(KdNode.java:70) at KDTree.KdNode.splitLeafNode(KdNode.java:173) at KDTree.KdNode.addLeafPoint(KdNode.java:79) at KDTree.KdNode.addPoint(KdNode.java:63) at KDTree.KdTree.addPoint(KdTree.java:1)

Any ideas?

Wolfman00:45, 26 March 2013

Seems that the "points" array is getting set to 'null' in a newly created child node. Assuming you haven't modified the code of the tree, and you're not doing any multithreading things, I can only see one possible cause: *All* of the data points are being put in the same child node, causing the newly created child node to itself be split while the upper level's splitLeafNode is still running. This implies that calculateSplit() is returning true when it shouldn't.

Could you add code to line 169 to print the list of coordinates stored in the node being split, or use a debugger to obtain similar information? My suspicion is that the coordinates you're putting into the tree are very unusual in some manner, possibly involving NaN values.

Rednaxela01:28, 26 March 2013
 

Sorry for the wall of numbers. Here are the values inside the splitLeafNode function:

Points array: [ [0.27476300750221805, 0.0, 0.5, 0.5, 0.05944248946787964, 0.05944248946787964], [0.27476300750221805, 0.0, 0.5, 0.5, 0.05944248946787964, 0.05944248946787964], [0.27476300750221805, 0.0, 0.5, 0.5, 0.05944248946787964, 0.05944248946787964], [0.27476300750221805, 0.0, 0.5, 0.5, 0.05944248946787964, 0.05944248946787964], [0.27476300750221805, 0.0, 0.5, 0.5, 0.05944248946787964, 0.05944248946787964], [0.27476300750221805, 0.0, 0.5, 0.5, 0.05944248946787964, 0.05944248946787964], [0.27476300750221805, 0.0, 0.5, 0.5, 0.05944248946787964, 0.05944248946787964], [0.27476300750221805, 0.0, 0.5, 0.5, 0.05944248946787964, 0.05944248946787964], [0.27476300750221805, 0.0, 0.5, 0.5, 0.05944248946787964, 0.05944248946787964], [0.27476300750221805, 0.0, 0.5, 0.5, 0.05944248946787964, 0.05944248946787964], [0.27476300750221805, 0.0, 0.5, 0.5, 0.05944248946787964, 0.05944248946787964], [0.27476300750221805, 0.0, 0.5, 0.5, 0.05944248946787964, 0.05944248946787964], [0.27476300750221805, 0.0, 0.5, 0.5, 0.05944248946787964, 0.05944248946787964], [0.27476300750221805, 0.0, 0.5, 0.5, 0.05944248946787964, 0.05944248946787964], [0.27476300750221805, 0.0, 0.5, 0.5, 0.05944248946787964, 0.05944248946787964], [0.27476300750221805, 0.0, 0.5, 0.5, 0.05944248946787964, 0.05944248946787964], [0.27476300750221805, 0.0, 0.5, 0.5, 0.05944248946787964, 0.05944248946787964], [0.27476300750221805, 0.0, 0.5, 0.5, 0.05944248946787964, 0.05944248946787964], [0.27476300750221805, 0.0, 0.5, 0.5, 0.05944248946787964, 0.05944248946787964], [0.27476300750221805, 0.0, 0.5, 0.5, 0.05944248946787964, 0.05944248946787964], [0.27476300750221805, 0.0, 0.5, 0.5, 0.05944248946787964, 0.05944248946787964], [0.27476300750221805, 0.0, 0.5, 0.5, 0.05944248946787964, 0.05944248946787964], [0.27476300750221805, 0.0, 0.5, 0.5, 0.05944248946787964, 0.05944248946787964], [0.27476300750221805, 0.0, 0.5, 0.5, 0.05944248946787964, 0.05944248946787964], [0.25781116652220715, 0.08692892561404503, 0.49442777954156447, 0.525842295924762, 0.06786100210077804, 0.14692687699385826], [0.2578177856998261, 0.08372619661301722, 0.49463837126786636, 0.5064605739811905, 0.06785023783184158, 0.1469334424631162], [0.2578214298492213, 0.08292549845220197, 0.4946909579122118, 0.5016151434952976, 0.06784441533138576, 0.14693385451422164], [0.25782433041558794, 0.08272532316252756, 0.49470410169693013, 0.5004037858738244, 0.06783982767738157, 0.14693272660841677], [0.25782704507786713, 0.0826752795385066, 0.4947073884210326, 0.500100946468456, 0.06783554880964597, 0.14693121360417843], [0.25782971324440696, 0.0826627688903635, 0.4947082111092713, 0.5000252366171141, 0.0678313472047577, 0.14692960432136848], [0.2578323697665857, 0.08265964148995872, 0.4947084178030773, 0.5000063091542785, 0.06782716496882069, 0.14692797097446295], [0.25783502335720493, 0.08265885990173155, 0.49470847049921196, 0.5000015772885695, 0.06782298762433847, 0.14692633161848062], [0.2578376761944629, 0.0826586647665601, 0.49470848469597223, 0.5000003943221424, 0.06781881155096657, 0.14692469076738635], [0.25784032882290964, 0.08265861624464657, 0.49470848926786565, 0.5000000985805355, 0.06781463584332002, 0.14692304954976892], [0.2578429813786835, 0.0826586043760398, 0.49470849143351225, 0.5000000246451339, 0.06781046027498529, 0.14692140824779062], [0.2578456338958197, 0.08265860167075158, 0.4947084929975659, 0.5000000061612835, 0.06780628478936644, 0.1469197669319714], [0.2578482863828274, 0.08265860125628488, 0.49470849441118986, 0.5000000015403209, 0.06780210937228809, 0.1469181256199622], [0.2578509388418348, 0.08265860141451545, 0.49470849578717485, 0.5000000003850803, 0.06779793402021453, 0.14691648431616464], [0.2578535912733745, 0.08265860171591212, 0.49470849715371845, 0.5000000000962701, 0.06779375873223772, 0.14691484302170008], [0.2578562436775801, 0.08265860205309222, 0.4947084985178703, 0.5000000000240675, 0.0677895835081634, 0.146913201736816], [0.257858896054486, 0.08265860239921004, 0.49470849988139254, 0.5000000000060169, 0.06778540834789157, 0.14691156046162138], [0.2578615484041011, 0.0826586027475541, 0.49470850124472593, 0.5000000000015042, 0.06778123325144109, 0.1469099191960998], [0.25786420072642846, 0.08265860309644661, 0.4947085026079805, 0.500000000000376, 0.06777705821881806, 0.1469082779402436], [0.2578668530214694, 0.08265860344546808, 0.4947085039711838, 0.500000000000094, 0.06777288324998178, 0.14690663669408965], [0.25786950528922475, 0.08265860379451359, 0.4947085053343428, 0.5000000000000234, 0.06776870834495623, 0.1469049954576125], [0.25787215752969567, 0.08265860414355704, 0.49470850669745914, 0.5000000000000058, 0.06776453350371553, 0.14690335423083412], [0.2578748097428828, 0.08265860449259181, 0.4947085080605333, 0.5000000000000014, 0.06776035872627743, 0.1469017130137357], [0.25787746192878697, 0.08265860484161626, 0.4947085094235653, 0.5000000000000003, 0.0677561840126425, 0.1469000718063138], [0.2578801140874092, 0.08265860519062998, 0.4947085107865553, 0.5000000000000001, 0.06775200936277642, 0.14689843060859753], null ] bucket capacity is 50 Size is currently 48 Split dimension is 5 Split value is 0.10318817199105064

Wolfman09:43, 26 March 2013
 

The last point there is null...

AW15:50, 26 March 2013
 

Yes but its index 49, and size is currently 48. :)

Wolfman16:06, 26 March 2013
 

Yeah, that null there is not a problem, though it does look like I accidentally made the nodes split one entry too early. That wouldn't cause any issue besides a tiny amount of wasted memory though.

Later today I'll see if I can replicate the issue with that list of points.

Rednaxela16:30, 26 March 2013
 

Unit tests for KdTree

Hello Rednaxela,

I just converted your KD-tree Java code to C#. Do you have unit tests for KD-Tree? They would allow me to test if the conversion succeded.

Ojd01:05, 6 May 2012

I don't know if you could modified it to run the C# implementation, but K-NN algorithm benchmark has an accuracy checker with linear search, as with performance benchmark. You can give it a try.

Nat Pavasant14:59, 7 May 2012
 

Yeah, I didn't use any unit tests when writing the KD-Tree, just some functional tests. The most thorough of which is from the K-NN algorithm benchmark that Nat and I wrote. It repeatedly tests the KD-Trees with a very large set of random set of points against known-good results from a very very simple linear-time k-nn search.

Rednaxela18:31, 8 May 2012
 

I just published my C# implementation of a KD-Tree based on Rednaxela's java implementation here on robowiki.

Ojd17:02, 22 May 2012
 

Hi Rednaxela & Ojd,

I'd like to have remove functionality and added the method below to KdNode. It is based on Ojd's C# variant and adjusts. KdNode needs and extra field: parent. It seems to do what I want, but feel not confident about some points:

  • do the bounds of parent nodes to be adjusted as well?
  • how to add it nicely to the NearestNeighborIterator?
  • what if you want to re-weight after n removals? (remove remainder and add again?)

Thanks, Arie

    public boolean remove(double[] point, T value)
    {
        KdNode<T> cursor = this;
        while (!cursor.isLeaf()) {
            if (point[cursor.splitDimension] > cursor.splitValue) {
                cursor = cursor.left;
            } else {
                cursor = cursor.right;
            }
        }
        for (int i = 0; i < cursor.size; i++) {
            if (cursor.data[i].equals(value)) {
                double[] cursorPoint = cursor.points[i];
                // remove this points
                cursor.data = ArrayUtil.remove(cursor.data, i);
                cursor.points = ArrayUtil.remove(cursor.points, i);

                // fix tree size all the way up to the root
                KdNode<T> parent = cursor.parent;
                do {
                    parent.size--;
                    parent = cursor.parent;
                } while (parent != null);


                // adjust _minBound and _maxBound after removing an item.
                if(cursor.points.length > 0){
                    for (int j = 0; j < dimensions; j++) {
                        if (Double.isNaN(cursorPoint[j])) {
                            if (!Double.isNaN(minBound[j]) || !Double.isNaN(maxBound[j])) {
                                singlePoint = false;
                            }
                            minBound[j] = Double.NaN;
                            maxBound[j] = Double.NaN;
                        }
                        else if (minBound[j] > cursorPoint[j]) {
                            minBound[j] = cursorPoint[i];
                            singlePoint = false;
                        }
                        else if (maxBound[j] < cursorPoint[i]) {
                            maxBound[j] = cursorPoint[i];
                            singlePoint = false;
                        }
                    }
                }
                else{
                    // last point at this cursor was removed.
                    minBound = null;
                    maxBound = null;
                }
                return true;
            }
        }
        return false;
    }
Arie23:21, 27 May 2012

To answer your questions:

  • Actually, technically you don't need to adjust ANY bounds at all. It won't affect the correctness of the result. The only consequence is that the searches may be slightly slower than optimal. If you are adjusting the bounds though, you might as well adjust the parent ones as well, because that's where a large chunk of the (smallish) performance impact would be.
On one hand adjusting the bounds makes the removal slower, but on the other hand it makes the search a little faster. To really know if it's worth adjusting the bounds, would probably require benchmarking with some code that's making realistic use of the removal.
  • To add it nicely to the iterator, I think you'd need to have the iterator keep track of the last point it returned, so it can call a remove on that.
  • If I understand what you mean... You should note that the structure of the tree is such that you can't just remove a leaf node, because every node must either be a leaf node or have two child nodes. To prune back the tree after removing items, you'd need to implement a "merge siblings" algorithm. This algorithm would need to first check that the combined size of both would fit comfortably in one node, and if that's the case basically perform the reverse of what "splitLeafNode" does. It'll also need to take into account the possibility that one sibling may not be a leaf node while the other is.

Some other notes:

  • You don't appear to be changing the "size" of the node itself, just the parents, which would be a problem. I also don't know that "ArrayUtil.remove" is what you want there.
Note that the "data" and "points" arrays have extra capacity beyond what's used and that "size" (or "_count" in the case of Ojd's C# variant) tracks the number of elements actually used.
(I also can't seem to find any documentation of a "ArrayUtil.remove" in either Java or C# so I'm a bit confused)
  • The code to adjust the bounds there appears to be incorrect. It looks like what's happening in that code is extending the bounds to contain the removed point... and the bounds already contain that anyway.
Rednaxela01:21, 28 May 2012
 

Thanks for your quick reply!

Good to know that the bounds don't affect the correctness. I agree, then it is a performance trade-off to adjust them or not.

You only really need to perform "merge siblings" when ALL points from the node are removed, isn't it?

Indeed, the size of the node itself must be adjusted as well:

KdNode<T> parent = cursor.parent;
do {
    parent.size--;
    parent = cursor.parent;
} while (parent != null);

Also ArrayUtil.remove is my own method:

        /**
     * Remove the element from the array at location i.
     * @param as original array.
     * @param i index indicating the element to be removed.
     * @return new array with length (as.length-1) if a is contained in as, or else a copy of the original array.
     */
    public static int[] remove(int[] as, int i){
        int n = as.length;
        if(i < 0 || i >= n) {
            return Arrays.copyOf(as, n);
        }
        int[] copy = new int[n-1];
        System.arraycopy(as,0,copy,0,i);
        System.arraycopy(as,i+1,copy,i,n-1-i);
        return copy;
    }
    public static int[] remove(int[] as, int i){
        int n = as.length;
        if(i < 0 || i >= n) {
            return copyOf(as, n);
        }
        int[] copy = new int[n-1];
        System.arraycopy(as,0,copy,0,i);
        System.arraycopy(as,i+1,copy,i,n-1-i);
        return copy;
    }

Removal is O(n), but you cant do much better. There is also a variant for double[] and double[][].

The bounds indeed really need to determine the min and max by iterating over all points.

Back to the keyboard...

Arie18:20, 28 May 2012

Well, to be clear the bounds affect the correctness if they are *too small*, but the bounds being too big (as is the case after removing) has no impact.

Really, you don't have to "merge siblings" when all points are removed from a node even, and it could (for sake of tree balance) be worth merging sibling nodes before they're completely empty (i.e. if both nodes had half their points removed)

With regards to the removing from the array, actually you can do it in O(1) time due to the fact that 1) unused array elements are expected, and 2) the array of points in a leaf node are unordered. You just need to overwrite the removed element with the last element, and then remove the last element:

points[i] = points[size-1];
data[i] = data[size-1];
points[size-1] = null;
data[size-1] = null;
size--;

(Oh, and when removing elements, be careful with how you're looping over them at the same time. You need to decrement "i" when the remove occurs, because what points[i] refers to changes when you remove the element)

Rednaxela06:20, 29 May 2012
 

If you wanted to take a look at Gilgalad, I think I have this working correctly in my tree. However, the feature is never used since it is faster to just have one large bucket unless the size limit is greater than about 10000 or so, and if I used that many points I would probably just use them all.

AW15:02, 29 May 2012