Difference between revisions of "Talk:Range Search"
(yeah, data density too...) |
(disagree ;-)) |
||
Line 10: | Line 10: | ||
: here another case: current situation is in area with high density, and you can get much more relevant data than k --[[User:Jdev|Jdev]] 18:54, 8 August 2011 (UTC) | : here another case: current situation is in area with high density, and you can get much more relevant data than k --[[User:Jdev|Jdev]] 18:54, 8 August 2011 (UTC) | ||
: Yeah, I agree with that notion too. It fueled a lot of my research into other clustering techniques, where I'd sometimes be using up to 2000 data points in the densest parts of the graph. But in the end I settled on KNN with a pretty large k - right now in [[Diamond]], I scale up linearly from 1 to 340, which I hit at around 5,000 data points. I never got my fancy clustering guns to outperform my best KNN and they were 5-10x slower... --[[User:Voidious|Voidious]] 18:59, 8 August 2011 (UTC) | : Yeah, I agree with that notion too. It fueled a lot of my research into other clustering techniques, where I'd sometimes be using up to 2000 data points in the densest parts of the graph. But in the end I settled on KNN with a pretty large k - right now in [[Diamond]], I scale up linearly from 1 to 340, which I hit at around 5,000 data points. I never got my fancy clustering guns to outperform my best KNN and they were 5-10x slower... --[[User:Voidious|Voidious]] 18:59, 8 August 2011 (UTC) | ||
+ | : I quite strongly disagree with that notion actually. Why? Because, if the way you weight points based on distance is ideal, then there is never an advantage to a small "k" value. If the weighting is formulated right, the only reason to cap the "k" value is for performance reasons (and if a number of points is too slow for KNN output, it's also too high for Range Search anyway). --[[User:Rednaxela|Rednaxela]] 19:18, 8 August 2011 (UTC) |
Revision as of 20:18, 8 August 2011
Your KNN algorithm illustration isn't correct, as it is showing a range box like the others. A more accurate depiction would be lines drawn to the nearest 4/8 neighbors. As for the range search, My KdTree Implementation supports them. — Chase-san 10:47, 8 August 2011 (UTC)
- Yes, you're rigth. i'll redraw it later. but it's any way will be rhomb for Manhettan distance formula and circle for Euclidean distance formula, where radius is max distance to neighbor. --Jdev 11:07, 8 August 2011 (UTC)
Thanks to all for help. A little clarification: imho, advantage of RS, that if you have many data points, but current situation is placed in area with low density of data, you did not get not corresponding data --Jdev 16:54, 8 August 2011 (UTC)
I quite agree this is an idea worth looking into. I've tried it in guns and not found any improvement over KNN, but it's something I'd tinker with again. It's worth noting that some of us weight data points by inverse distance to the current data point, so theoretically we're not being hurt by including the less relevant data because we already weight it accordingly. --Voidious 18:46, 8 August 2011 (UTC)
- here another case: current situation is in area with high density, and you can get much more relevant data than k --Jdev 18:54, 8 August 2011 (UTC)
- Yeah, I agree with that notion too. It fueled a lot of my research into other clustering techniques, where I'd sometimes be using up to 2000 data points in the densest parts of the graph. But in the end I settled on KNN with a pretty large k - right now in Diamond, I scale up linearly from 1 to 340, which I hit at around 5,000 data points. I never got my fancy clustering guns to outperform my best KNN and they were 5-10x slower... --Voidious 18:59, 8 August 2011 (UTC)
- I quite strongly disagree with that notion actually. Why? Because, if the way you weight points based on distance is ideal, then there is never an advantage to a small "k" value. If the weighting is formulated right, the only reason to cap the "k" value is for performance reasons (and if a number of points is too slow for KNN output, it's also too high for Range Search anyway). --Rednaxela 19:18, 8 August 2011 (UTC)