Talk:Range Search
Your KNN algorithm illustration isn't correct, as it is showing a range box like the others. A more accurate depiction would be lines drawn to the nearest 4/8 neighbors. As for the range search, My KdTree Implementation supports them. — Chase-san 10:47, 8 August 2011 (UTC)
- Yes, you're rigth. i'll redraw it later. but it's any way will be rhomb for Manhettan distance formula and circle for Euclidean distance formula, where radius is max distance to neighbor. --Jdev 11:07, 8 August 2011 (UTC)
Thanks to all for help. A little clarification: imho, advantage of RS, that if you have many data points, but current situation is placed in area with low density of data, you did not get not corresponding data --Jdev 16:54, 8 August 2011 (UTC)
I quite agree this is an idea worth looking into. I've tried it in guns and not found any improvement over KNN, but it's something I'd tinker with again. It's worth noting that some of us weight data points by inverse distance to the current data point, so theoretically we're not being hurt by including the less relevant data because we already weight it accordingly. --Voidious 18:46, 8 August 2011 (UTC)
- here another case: current situation is in area with high density, and you can get much more relevant data than k --Jdev 18:54, 8 August 2011 (UTC)
- Yeah, I agree with that notion too. It fueled a lot of my research into other clustering techniques, where I'd sometimes be using up to 2000 data points in the densest parts of the graph. But in the end I settled on KNN with a pretty large k - right now in Diamond, I scale up linearly from 1 to 340, which I hit at around 5,000 data points. I never got my fancy clustering guns to outperform my best KNN and they were 5-10x slower... --Voidious 18:59, 8 August 2011 (UTC)
- I quite strongly disagree with that notion actually. Why? Because, if the way you weight points based on distance is ideal, then there is never an advantage to a small "k" value. If the weighting is formulated right, the only reason to cap the "k" value is for performance reasons (and if a number of points is too slow for KNN output, it's also too high for Range Search anyway). --Rednaxela 19:18, 8 August 2011 (UTC)
- Well, maybe we should be more specific about what notion we're talking about. =) I think we can agree that in the densest areas of the graph, most to all of us outside of VCS are ignoring gobs of relevant data. There can be spots with thousands of data points very, very tightly grouped. Taking all points within a certain distance instead of a fixed number of points seems like a pretty elegant way to handle the situation. Always using a huge k and weighting things perfectly sounds good to me, too, but it isn't necessarily easy or even possible... --Voidious 20:30, 8 August 2011 (UTC)
- Well, say you have thousands of data points very tightly grouped. If you have the CPU time to process them all with a range search, I don't see why you don't have the CPU time to process the same number of points from a KNN search. The CPU time constraints should be extremely similar. If it's impractical to process thousands of points from KNN due to CPU constraints, it would also be impractical to process those thousands of points from a range search.
- One little aside, which applies equally to both KNN and range search: Processing thousands of points could be sped up by caching aggregate data at nodes of the tree, so that if a node is fully within the selected points, it's aggregate statistics could be used. --Rednaxela 20:54, 8 August 2011 (UTC)
Just for my understanding, shouldn't the illustrations of k-nearest and range-search be circles around the point instead of rectangles? Or is that an ideal but impractical situation due to boundary checks or cpu constraints. --GrubbmGait 23:18, 8 August 2011 (UTC)
- [View source↑]
- [History↑]