User talk:Chase-san/Kd-Tree

From Robowiki
< User talk:Chase-san
Revision as of 10:42, 1 July 2010 by RednaxelaBot (talk | contribs) (Using <syntaxhighlight>.)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search

NullPointerExceptions and inaccuracy

Trying to test your Kd-Tree in the benchmark framework, I'm getting some NullPointerExceptions due to right.rect.getNearest(k) returning null sometimes. I don't know if it's the proper fix, I changed the relevant lines to be like:

PointKD p = right.rect.getNearest(k);
if(p != null && k.distanceSq(p) < t) {

Something else is wrong also I believe, because it's only getting "88% accuracy" out of the benchmark results, so it's not finding all of the nearest neighbors correctly. --Rednaxela 22:23, 1 March 2010 (UTC)

That should only happen if the rectangle hasn't gotten expanded ever, which shouldn't happen, unless you haven't added anything to the tree at all (darn me and removing all my safeties). Since this gets set during the add. Can you give me some idea of the data you are feeding into it? --Chase 22:44, 1 March 2010 (UTC)
Well, it's adding to the tree before it's searching. See, File:KNN.jar along with Diamond vs CunobelinDC gun data, a recording I made of every kd-tree operation performed by an older version of Voidious's Diamond during a battle. I can upload an updated KNN.jar that includes the testing of your tree if you wish. Personally I've found the framework invaluable for testing my tree, making it easy to see when something I just did added a bug. --Rednaxela 22:55, 1 March 2010 (UTC)
I see, I got it running, and I get the error too. Let me see whats going on. --Chase 23:25, 1 March 2010 (UTC)
Well I found parts of the problem, it gets 100% now, but its about pretty slow (not as slow as Linear search), I suppose that is to be expected, seeing how hard it is on the GC. Still not sure why Rect is returning null, since it shouldn't (probably something stupid per usual). Well I based this new one off my old one, maybe I should write one from scratch. :P --Chase 00:02, 2 March 2010 (UTC)
K-NEAREST NEIGHBOURS ALGORITHMS BENCHMARK
-----------------------------------------
Reading data from data.csv
Read 25621 saves and 10300 searches.

Running 10 repetition(s) for k-nearest neighbours searching:
:: 13 dimension(s); 40 neighbour(s)
Warming up the JIT with 5 repetitions first...

Running tests... COMPLETED.

RESULT << k-nearest neighbours search with Voidious' Linear search >>
: Average searching time       = 2.007 miliseconds
: Average worst searching time = 37.627 miliseconds
: Average adding time          = 3.96 microseconds
: Accuracy                     = 100%

RESULT << k-nearest neighbours search with Chase-san's Bucket PR k-d tree >>
: Average searching time       = 1.023 miliseconds
: Average worst searching time = 29.937 miliseconds
: Average adding time          = 8.38 microseconds
: Accuracy                     = 100%

RESULT << k-nearest neighbours search with Simonton's Bucket PR k-d tree >>
: Average searching time       = 0.493 miliseconds
: Average worst searching time = 45.444 miliseconds
: Average adding time          = 7.15 microseconds
: Accuracy                     = 100%

RESULT << k-nearest neighbours search with Voidious' Bucket PR k-d tree >>
: Average searching time       = 0.836 miliseconds
: Average worst searching time = 35.729 miliseconds
: Average adding time          = 6.39 microseconds
: Accuracy                     = 100%

RESULT << k-nearest neighbours search with Rednaxela's Bucket kd-tree >>
: Average searching time       = 0.152 miliseconds
: Average worst searching time = 38.736 miliseconds
: Average adding time          = 7.43 microseconds
: Accuracy                     = 100%


BEST RESULT: 
 - #1 Rednaxela's Bucket kd-tree [0.1518]
 - #2 Simonton's Bucket PR k-d tree [0.4928]
 - #3 Voidious' Bucket PR k-d tree [0.8362]
 - #4 Chase-san's Bucket PR k-d tree [1.0227]
 - #5 Voidious' Linear search [2.0067]

Benchmark running time: 493.49 seconds

Some results... All I can say is.. this current implementation, the worst search time is the lowest of all of them (though not by a ton, I suppose). But I wonder why that is. --Chase 00:29, 2 March 2010 (UTC)

RESULT << k-nearest neighbours search with Chase-san's Bucket PR k-d tree >>
: Average searching time       = 0.649 miliseconds
: Average worst searching time = 34.114 miliseconds
: Average adding time          = 8.92 microseconds
: Accuracy                     = 10

It seems Caching the distance cut the time by about 40%. (other times are close to what they are above). Oh well this is probably my last post on the wiki tonight. --Chase 00:45, 2 March 2010 (UTC)

As far as "I wonder why that is", I would say a variety of factors are involved. Firstly, I'd say the way you use comparators slows things down, due to the calling overhead. Secondly, a default bucket size of 200 seems odd; I've found that the optimal bucket size is more in the area of 24 to 32, depending on the specifics of the tree. With those things changed, as well as some other optimization details around the place, I'd guess the speed could get to would be around that of Duyn's tree perhaps, at least as fast as Simonton's tree. To go a step further, one could replace the Dequeue with a heap, in order to avoid full sorting overhead on points that will not be part of the final set of returned points. Anyway, not a bad job though. It does beat Voidious' tree so far after all which is a well enough made tree. :) --Rednaxela 01:15, 2 March 2010 (UTC)

Here is the low down. It seems what was causing the problem with the null was the fact of a bucket getting points with exactly the same value on a certain dimension, meanin that all the points were placed to one side or the other (in this case the left) of that dimension, and the other side got nothing (poor child). The fact is the rectangles were only defined with something was added to the child, and I assumed since I divided the domain perfectly in half, this error was impossible to get (as I mentioned in my first bit there).
That should only happen if the rectangle hasn't gotten expanded ever, which shouldn't happen, unless you haven't added anything to the tree at all
I had pretty much answered my own question, but was too dense to realize it. however I got tired of poking at my old tree (preformance just kept getting worse), so I made a cut down version with a ShiftArray in it (basically array copy stuff back along the array and just insert the item in place, which is what my prioritydeque pretty much did anyway, but this does it with much less code). I got rid of the whole PointKD, RectangleKD functions, now I have just a thing that takes double[], I added a value clause, so it is just not a key-tree or set, anymore. I am running 10 iterations with KNN to get a benchmark, I will post those when they finish (on another computer). --Chase 03:38, 2 March 2010 (UTC)
K-NEAREST NEIGHBOURS ALGORITHMS BENCHMARK
-----------------------------------------
Reading data from data.csv
Read 25621 saves and 10300 searches.

Running 10 repetition(s) for k-nearest neighbours searching:
:: 13 dimension(s); 40 neighbour(s)
Warming up the JIT with 5 repetitions first...

Running tests... COMPLETED.

RESULT << k-nearest neighbours search with Voidious' Linear search >>
: Average searching time       = 1.873 miliseconds
: Average worst searching time = 34.274 miliseconds
: Average adding time          = 3.42 microseconds
: Accuracy                     = 100%

RESULT << k-nearest neighbours search with Chase-san's Bucket PR k-d tree >>
: Average searching time       = 0.532 miliseconds
: Average worst searching time = 24.898 miliseconds
: Average adding time          = 7.76 microseconds
: Accuracy                     = 100%

RESULT << k-nearest neighbours search with Chase's Hyper Bucket KD-Tree >>
: Average searching time       = 0.433 miliseconds
: Average worst searching time = 24.474 miliseconds
: Average adding time          = 6.82 microseconds
: Accuracy                     = 100%

RESULT << k-nearest neighbours search with Simonton's Bucket PR k-d tree >>
: Average searching time       = 0.437 miliseconds
: Average worst searching time = 38.07 miliseconds
: Average adding time          = 6.36 microseconds
: Accuracy                     = 100%

RESULT << k-nearest neighbours search with Voidious' Bucket PR k-d tree >>
: Average searching time       = 0.763 miliseconds
: Average worst searching time = 31.013 miliseconds
: Average adding time          = 5.62 microseconds
: Accuracy                     = 100%

RESULT << k-nearest neighbours search with Rednaxela's Bucket kd-tree >>
: Average searching time       = 0.138 miliseconds
: Average worst searching time = 36.875 miliseconds
: Average adding time          = 6.69 microseconds
: Accuracy                     = 100%


BEST RESULT: 
 - #1 Rednaxela's Bucket kd-tree [0.1375]
 - #2 Chase's Hyper Bucket KD-Tree [0.4326]
 - #3 Simonton's Bucket PR k-d tree [0.4372]
 - #4 Chase-san's Bucket PR k-d tree [0.5324]
 - #5 Voidious' Bucket PR k-d tree [0.7634]
 - #6 Voidious' Linear search [1.873]

Benchmark running time: 463.63 seconds

I am not you Rednaxela, I think this is as far as I am going to get, I think I will post the hyper version though (it is smaller). --Chase 03:44, 2 March 2010 (UTC)

Oh yeah, am I the only one that has a Range function?

Pretty sure. At least, the only one with a Hyperrectangle style one. --Rednaxela 05:17, 2 March 2010 (UTC)