User:AW/Kd-Tree/thoughts
I was thinking about how to write a kd-tree and I think that none of the ones I have seen are 100% accurate. For example assume a 2 dimensional space which for convenience will be a the refered to by the cartesian coordinate system ({x | 0 < x < 1} {y | 0 < y < 1}). Suppose we had these points added in this order and the tree was partitioned with one point in each node (if you don't understand / I described this incorrectly, please see: Spatial Index Demos).
1) point one: (0.5, 0.8)
2) point two: (0.45, 0.5)
at this point the data is partitioned (ie. the graph is divided by the line x = 0.5)
3) point three: (0.55, 0.45)
point three is closer to point two than it is to point one, but the kd-tree will say the closest match is point one. The proper way to do partition this would look more like a voronoi diagram, but this would make the trees slower.
Am I right?
--AW 00:56, 14 March 2011 (UTC)
Nope, it will say the closest point is point two, but in this case it won't be any faster than a brute force search. It will descend to the right because 0.55 > 0.5, but on the way back up it will check if it's possible there are any points on the other side of each split that could be closer than the current closest point.
In this case - and I'm kind of mangling the algorithm here, but for the sake of argument - let's say it gets back to the root and the closest point so far is point one=(0.5, 0.8). It then checks: what is the shortest possible distance for any theoretical point ont he other side of the partition? This is just the distance on the splitting axis, so 0.55-0.5=0.05 for the theoretical point (0.5, 0.45) that could exist on the other side. Since this is closer than the distance to the current nearest point, it will descend the other side of that node and look for closer points. Does that make sense? --Voidious 01:43, 14 March 2011 (UTC)
I think so. Basically, the tree only trims points in nodes whose boundries are farther away than the current closest point after first descending to a leaf, right? Has anyone tried using a voronoi diagram type aproach to eliminate this? (assuming two dimensions again) One couldn't do a simple less than greater than, but one could store the equation of a line f(x) and then given a point (a, b) trim based on whether f(a) > b. Do you think this is worth a try? -- AW 02:30, 14 March 2011 (UTC)
Well, it doesn't change the result of the nearest neighbor search, so it's just a question of speed. A variant that doesn't restrict partitions to being orthogonal to dimensions would definitely increase efficiency in terms of "nodes visited per serach", however I strongly suspect it'll be much slower in practice due to the computation costs of comparing across the partition boundary. Now if only we had processors that has single-cycle comparisons of arbitrary-dimension lines and points... :) --Rednaxela 03:44, 14 March 2011 (UTC)
I think this may work depending on the processor. (however I am not that familiar with Java so I am thinking in terms of C) the kd-trees go down to a leaf, and then, as Voidious just taught me go up. However they would compare to go down and then again to go up (my assumption, he did say he was mangling the algorithm). Based on this, I think it would take 2 comparisons to get to each appropriate node, but these comparisons are not done in one clock cycle because you go down, compare, then up, compare.
So I think it would take 2 cycles per level of the tree we traverse, and if the nearest point was on a node that had so to speak several dividing lines between it and the point we are searching for a nearest neighbor to we would need to perform these comparisons more than once (of course this is assuming a poor case which should get more rare as the tree gets more points). To use a voronoi diagram you would need to perform one multiplication, one addition, and one comparison. I think this would take 3 clock cycles to perform on a dual core processor, thereby possibly saving time. Also each node has only one point so there is no need to "search" for it.
However, besides the assumptions I already mentioned it could be take longer to generate the voronoi diagram. Since points in voronoi diagrams are usually rather far away from the edges there may be a way to speed up the multiplication at the cost of accuracy. Imagine the faces of Intel employees when reading, "Dear Sirs: I need an instruction to perform less accurate floating point multiplication in two clock cycles for this computer game. ...", Then write to Oracle about using it in the JVM, then I'd have to write the program. And perhaps I could have a tree that performs 0.5% better in rare instances. -- AW 17:47, 14 March 2011 (UTC)
On second thought, you probably can't make a tree out of a voronoi diagram as it gets more complex because the "dividing lines" won't divide the entire domain. Its easy for a person to see which box a given point is in, but with a computer doing that would take a lot more than just a tree. --AW 18:13, 14 March 2011 (UTC)