Difference between revisions of "User talk:Pedersen/kdTree"
(kdTree suffering) |
(comment on what's perhaps wrong) |
||
Line 1: | Line 1: | ||
I don't have a application for a tree yet (though conceivably it would work with a gun if I bothered to develop weights for the various dimensions, after determining what those dimensions should be) but I figured it would be fun to whip one up. It turns out it was pretty quick to crank out, and after a little debugging I not only had blazing fast speeds at high volumes, faster by far than the samples that came with the benchmark at about 5000 data points, I found that at about that range I started dipping below 100% accuracy. I have spent most of today, longer than it took to write the first draft, trying to find out what is going wrong. As the day winds to a close, I've found that I can get it to bleed accuracy by setting the capacities really low, which will also help with debugging (in terms of diagramming out in code and by hand). When I set the dimensions to 2, neighbors to 10, and sample size to 50, along with a bucket size of 4, I caused it to hit 90% accuracy. My time is up today though, so I'll have to proceed with debugging another evening. --[[User:Pedersen|Martin]] 00:27, 6 March 2010 (UTC) | I don't have a application for a tree yet (though conceivably it would work with a gun if I bothered to develop weights for the various dimensions, after determining what those dimensions should be) but I figured it would be fun to whip one up. It turns out it was pretty quick to crank out, and after a little debugging I not only had blazing fast speeds at high volumes, faster by far than the samples that came with the benchmark at about 5000 data points, I found that at about that range I started dipping below 100% accuracy. I have spent most of today, longer than it took to write the first draft, trying to find out what is going wrong. As the day winds to a close, I've found that I can get it to bleed accuracy by setting the capacities really low, which will also help with debugging (in terms of diagramming out in code and by hand). When I set the dimensions to 2, neighbors to 10, and sample size to 50, along with a bucket size of 4, I caused it to hit 90% accuracy. My time is up today though, so I'll have to proceed with debugging another evening. --[[User:Pedersen|Martin]] 00:27, 6 March 2010 (UTC) | ||
+ | |||
+ | The fact that accuracy bleeds as bucket capacity goes down, suggests to me that something is wrong with the logic for when to descend a branch or not is broken but the logic for processing points within a leaf is fine. Chances are you'll see accuracy decrease much further if you increase the count of fetched neighbors while keeping bucket size constant. Given the results you've said, it's quite possible that it is never descending into 'second choice' paths ever in fact. --[[User:Rednaxela|Rednaxela]] 01:10, 6 March 2010 (UTC) |
Revision as of 02:10, 6 March 2010
I don't have a application for a tree yet (though conceivably it would work with a gun if I bothered to develop weights for the various dimensions, after determining what those dimensions should be) but I figured it would be fun to whip one up. It turns out it was pretty quick to crank out, and after a little debugging I not only had blazing fast speeds at high volumes, faster by far than the samples that came with the benchmark at about 5000 data points, I found that at about that range I started dipping below 100% accuracy. I have spent most of today, longer than it took to write the first draft, trying to find out what is going wrong. As the day winds to a close, I've found that I can get it to bleed accuracy by setting the capacities really low, which will also help with debugging (in terms of diagramming out in code and by hand). When I set the dimensions to 2, neighbors to 10, and sample size to 50, along with a bucket size of 4, I caused it to hit 90% accuracy. My time is up today though, so I'll have to proceed with debugging another evening. --Martin 00:27, 6 March 2010 (UTC)
The fact that accuracy bleeds as bucket capacity goes down, suggests to me that something is wrong with the logic for when to descend a branch or not is broken but the logic for processing points within a leaf is fine. Chances are you'll see accuracy decrease much further if you increase the count of fetched neighbors while keeping bucket size constant. Given the results you've said, it's quite possible that it is never descending into 'second choice' paths ever in fact. --Rednaxela 01:10, 6 March 2010 (UTC)