Talk:Range Search

From Robowiki
Revision as of 21:30, 8 August 2011 by Voidious (talk | contribs) (notions ;))
Jump to navigation Jump to search

Your KNN algorithm illustration isn't correct, as it is showing a range box like the others. A more accurate depiction would be lines drawn to the nearest 4/8 neighbors. As for the range search, My KdTree Implementation supports them. — Chase-san 10:47, 8 August 2011 (UTC)

Yes, you're rigth. i'll redraw it later. but it's any way will be rhomb for Manhettan distance formula and circle for Euclidean distance formula, where radius is max distance to neighbor. --Jdev 11:07, 8 August 2011 (UTC)
Actually not so, as points of equal distance will be selected in order of first or last visited in a KNN depending on implementation. — Chase-san 11:32, 8 August 2011 (UTC)

Thanks to all for help. A little clarification: imho, advantage of RS, that if you have many data points, but current situation is placed in area with low density of data, you did not get not corresponding data --Jdev 16:54, 8 August 2011 (UTC)

I quite agree this is an idea worth looking into. I've tried it in guns and not found any improvement over KNN, but it's something I'd tinker with again. It's worth noting that some of us weight data points by inverse distance to the current data point, so theoretically we're not being hurt by including the less relevant data because we already weight it accordingly. --Voidious 18:46, 8 August 2011 (UTC)

here another case: current situation is in area with high density, and you can get much more relevant data than k --Jdev 18:54, 8 August 2011 (UTC)
Yeah, I agree with that notion too. It fueled a lot of my research into other clustering techniques, where I'd sometimes be using up to 2000 data points in the densest parts of the graph. But in the end I settled on KNN with a pretty large k - right now in Diamond, I scale up linearly from 1 to 340, which I hit at around 5,000 data points. I never got my fancy clustering guns to outperform my best KNN and they were 5-10x slower... --Voidious 18:59, 8 August 2011 (UTC)
I quite strongly disagree with that notion actually. Why? Because, if the way you weight points based on distance is ideal, then there is never an advantage to a small "k" value. If the weighting is formulated right, the only reason to cap the "k" value is for performance reasons (and if a number of points is too slow for KNN output, it's also too high for Range Search anyway). --Rednaxela 19:18, 8 August 2011 (UTC)
Well, maybe we should be more specific about what notion we're talking about. =) I think we can agree that in the densest areas of the graph, most to all of us outside of VCS are ignoring gobs of relevant data. There can be spots with thousands of data points very, very tightly grouped. Taking all points within a certain distance instead of a fixed number of points seems like a pretty elegant way to handle the situation. Always using a huge k and weighting things perfectly sounds good to me, too, but it isn't necessarily easy or even possible... --Voidious 20:30, 8 August 2011 (UTC)