removeOld() not working? Or locationCount not decrementing after removeOld()?

Jump to navigation Jump to search
Revision as of 21 May 2012 at 17:06.
The highlighted comment was created in this revision.

removeOld() not working? Or locationCount not decrementing after removeOld()?

I've added this KD-tree to my bot as I do a ground-up rewrite of it. I'm not even to the point of extracting k.n.n. data from the tree yet, I was just in the process of testing dropping data into the tree, and noticed something odd. The tree isn't pruning when locationCount exceeds sizeLimit. Or if it is, then it's not being reflected in the value of locationCount.

I've set sizeLimit in the tree instantiation to a mere 35 as a test, and by the end of a 35 round match against Walls, the tree reports containing nearly 2000 nodes. After reading through the code, I don't see a lower bound on pruning the tree other than bucketSize, which is only 24 anyway. I've added a debug print message, and removeOld() is definitely being called, but is returning before the "weird" comment at the end of that method....

I'm sure I'm missing something... but what? Should I presume this is working as intended, despite the reported size() growing without bound?

    Tkiesel23:38, 22 November 2011

    Okay. A bit of testing with grabbing knn's via nearestNeighbor() shows that even if I request more knn points than. sizeLimit, the function only returns <= sizeLimit points. Excellent! So, is it problematic that locationCount (and thus the output of size() )is going to head without limit for Integer.MAX_VALUE and never gets decremented to reflect the current actual size of the tree? The former isn't much of a concern, RC matches are never that long.. but the latter?

      Tkiesel23:59, 22 November 2011
       

      I found the error. In method RemoveOld, the loop condition of the decrementing loop should be

         cursor != null    // OK
      

      instead of

         cursor.parent != null     // WRONG
      

      This prevented the count in the root node of being decremented.

      Whole loop (corrected):

         do {
             cursor.locationCount--;
             cursor = cursor.parent;
         } while (cursor != null);
      

      Also RemoveOld does not adapt minLimit and maxLimit. I don't know what the impact of this would be in practice.

        Ojd16:09, 7 May 2012
         

        That works! Excellent! I'll edit the code on the page to reflect it.

          Tkiesel16:54, 8 May 2012
           

          Hey, thanks for finding that bug.

          By the way, just so you know, the version with the code on the page is not the version I actively use anymore. The "up to date" version is here under the java package "ags.utils.dataStructures.trees.thirdGenKD". This version is a re-write what is cleaner (IMO anyway), slightly faster, though with a slightly different feature set. It's missing that "remove old elements beyond a size limit" feature because nobody I knew of was using it at the time, but it has some additional features like an iterator to allow one to iterate over the nearest points in sorted order, and if you stop early it saves a notable amount of cpu. The advantage of this iterator is that it can be significantly faster in situations where the number of points you need is uncertain until looking at the closest few points (i.e. if you're filtering out some of the points after the k-nn search based on some other criteria).

            Rednaxela17:23, 8 May 2012

            Hey Rednaxela,

            I'm switching over to your 3rd gen tree, and had a question re: your statement here:

            "... it has some additional features like an iterator to allow one to iterate over the nearest points in sorted order, and if you stop early it saves a notable amount of cpu."

            How early are we talking about? For instance, if I grab an iterator for 200 maxPointsReturned, iterate over the first 150 of them and decide I'm done, is that still in the "saves a notable amount of cpu" territory?

              Tkiesel13:53, 21 May 2012

              I'd have to benchmark it to be sure, and it depends on the distribution of data of course, but I'd estimate stopping at 150 out of 200 points would shave off something like 10-15% of the search time compared to just getting 200 points.

                Rednaxela15:40, 21 May 2012
                 

                IIRC it depends on the structure of your data. From what I understand, it slowly expands the hypersphere of contained points in bursts (grabbing further and further tree branches), sorting them in order of relevancy as it goes. It depends on whether you don't have to do an extra expansion to get the extra points. An iterator for 200, then getting 150, will be slower than getting an iterator for 150, but chances are it will be faster than getting an iterator for 200 and using all 200.

                  Skilgannon15:42, 21 May 2012

                  Pretty much yeah, though it does avoid the full effort of sorting them by using a min-max heap that tosses the most distant points off and keeps the closest point accessible in constant time. The search algorithm is exactly the same as searching for all 200 (it needs to remember the 200 closest points it's found so far, and know what the furthest and closest ones of that set are), except that it pauses the search when when it is able to determine that no unchecked branch could have anything closer than the closest point not yet returned by the iterator.

                    Rednaxela17:15, 21 May 2012
                     

                    Excellent info from you both, thanks!

                    One more question: the DistanceFunction file defines an Interface, but features no comments to describe what the two member methods are supposed to do.

                    distance() is utterly obvious... especially when reading your EuclideanDistanceFunction implementation. But distanceToRect()... I think I know what it requires, but I don't want to screw it up when I write my own DistanceFunction.

                    What precisely is distanceToRect defined as?

                      Tkiesel16:08, 21 May 2012
                       

                      It's the minimum distance from that point you are testing to the hyper-rectangle defined by the min and max co-ordinates on each dimension.

                      So considering dimension x: if the point val is less than the min, the distance is (min - val), if it is between min and max the value is 0 (it is inside the rectangle), if it is more than max the distance is (val - max).

                      If you are doing Euclidean, square each distance then add them together. I do Manhattan, so just use the absolute value.

                        Skilgannon16:36, 21 May 2012
                         

                        Yep, what Skilgannon said. Sorry I forgot to put comments in that interface.

                        If your'e wondering, this is used to compare the search point to the bounding box associated with each branch of the tree, and allows efficient skipping of irrelevant branches.

                          Rednaxela17:27, 21 May 2012
                           

                          Excellent. I was guessing it was that or related to that from reading your euclidian distance implementation.

                          When I write a WeightedSquareEuclidieanDistanceFunction and/or WeightedManhattanDistanceFunction, would you like me to commit them to your bitbucket hg repo? I'm happy to help contribute! :)

                            Tkiesel17:50, 21 May 2012
                             

                            Sure, or if posted on the wiki or in a bot I can upload it to that some time. Thanks :)

                              Rednaxela18:06, 21 May 2012
                               
                               

                              !!!! That's a feature that I've really been wanting for an idea I had for DeBroglie's gun! Excellent! You ought to edit the main page to mention that the most up-to-date version of the code is available off-wiki. Thanks for the heads-up Rednaxela!

                                Tkiesel21:45, 8 May 2012