Difference between revisions of "Talk:K-NN algorithm benchmark"

From Robowiki
Jump to navigation Jump to search
(reply to Nat)
(Spoke too soon)
Line 215: Line 215:
  
 
Yeah, the 10 test result isn't very precise I know, I'm going to run a longer one, but first I'm going to modify the test to use ThreadMXBean for timing, to ensure other things running on the system to not have a significant impact on the benchmark. And yes, that worst case time isn't too great but I have ideas of how to improve it notably hopefully. Additionally, that test above was with a bucket size of 50, and I've since found that a bucket size of 20 improves the result quite notably. I think there are still plenty of ways I can make it faster. My secret though? TAoR = Total Avoidance of Recursion :) --[[User:Rednaxela|Rednaxela]] 12:14, 20 August 2009 (UTC)
 
Yeah, the 10 test result isn't very precise I know, I'm going to run a longer one, but first I'm going to modify the test to use ThreadMXBean for timing, to ensure other things running on the system to not have a significant impact on the benchmark. And yes, that worst case time isn't too great but I have ideas of how to improve it notably hopefully. Additionally, that test above was with a bucket size of 50, and I've since found that a bucket size of 20 improves the result quite notably. I think there are still plenty of ways I can make it faster. My secret though? TAoR = Total Avoidance of Recursion :) --[[User:Rednaxela|Rednaxela]] 12:14, 20 August 2009 (UTC)
 +
: Hm, spoke too soon. It seems that for this the resolution provided by ThreadMXBean is simply insufficent. In any case I'll be running more tests, and further improving my tree... :) --[[User:Rednaxela|Rednaxela]] 12:28, 20 August 2009 (UTC)

Revision as of 13:28, 20 August 2009

(Continue from Talk:Kd-Tree#Bucket_PR_k-d_tree)

Thank you, I have changed my local implementation to fixed your. Just found some bug in the benchmark, all trees except Voidious' and Rednaxela's does full Euclidian distance. Now it use squared distance, but the time didn't effect much. And just found some bug in my tree that make it run a little faster, or not? Anyway, my benchmarks is now running... with large input set(15,400000,150) and 50 tests at once. Will post new result and upload new version soon. It is now fully automated and get the input size/dimensions/neighbours from argv. » Nat | Talk » 15:03, 17 August 2009 (UTC)

NAT'S K-NEAREST NEIGHBOURS ALGORITHMS BENCHMARK
------------------------------------------------
Running 50 test(s) for k-nearest neighbours search:
:: 15 dimension(s); 400000 data point(s); 150 neighbour(s)


Running tests... COMPLETED.

RESULT << k-nearest neighbours search with flat/linear searching >>
: Averaged used time              = 478.0353 miliseconds
: Average adding time             = 1.359 microseconds
: Average last node adding time   = 2.406 microseconds
: Averaged  accuracy              = 100%
: Worst case used time            = 776.3058 miliseconds
: Best case used time             = 423.6457 miliseconds

RESULT << k-nearest neighbours search with Rednaxela's k-d tree >>
: Averaged used time              = 318.4575 miliseconds
: Average adding time             = 14.808 microseconds
: Average last node adding time   = 13.042 microseconds
: Averaged  accuracy              = 83%
: Worst case used time            = 453.7179 miliseconds
: Best case used time             = 157.7133 miliseconds

RESULT << k-nearest neighbours search with Simonton's Bucket PR k-d tree >>
: Averaged used time              = 78.3186 miliseconds
: Average adding time             = 3.446 microseconds
: Average last node adding time   = 4.113 microseconds
: Averaged  accuracy              = 100%
: Worst case used time            = 247.4606 miliseconds
: Best case used time             = 41.4053 miliseconds

RESULT << k-nearest neighbours search with Nat's Bucket PR k-d tree >>
: Averaged used time              = 193.9578 miliseconds
: Average adding time             = 3.556 microseconds
: Average last node adding time   = 3.515 microseconds
: Averaged  accuracy              = 100%
: Worst case used time            = 805.729 miliseconds
: Best case used time             = 41.7864 miliseconds

RESULT << k-nearest neighbours search with Voidious' Bucket PR k-d tree >>
: Averaged used time              = 83.3344 miliseconds
: Average adding time             = 4.1 microseconds
: Average last node adding time   = 3.832 microseconds
: Averaged  accuracy              = 100%
: Worst case used time            = 96.4601 miliseconds
: Best case used time             = 79.8629 miliseconds


BEST RESULT: 
 - #1 Simonton's Bucket PR k-d tree [78.3186]
 - #2 Voidious' Bucket PR k-d tree [83.3344]
 - #3 Nat's Bucket PR k-d tree [193.9578]
 - #4 Rednaxela's k-d tree [318.4575]
 - #5 flat/linear searching [478.0353]

Even though Simonton's is the fastest averaged, it has quite large worst case. Voidious' the best here. And it seems that my tweak make it slower... My worst case is larger than linear worst case. This test run for 15 minutes on my machine so if you run it yourself please be patient. » Nat | Talk » 15:16, 17 August 2009 (UTC)

It might be more useful to use 6-10 dimensions and 30-50 neighbors, as I think that's more common for DC guns. With 15/150, the brute force is not even that slow, and would probably outperform the kd-trees in a normal battle (~25,000 data points). Over 400,000 data points with 6 dimensions / 30 neighbors, the kd-tree will be waaay faster than brute force. Also, I wonder what Simonton is using for bucket size, as that impacts the speed, too. It might be worth modifying the Bucket PR kd-trees to use the same bucket size for fair comparisons. Glad to see mine is looking reliable. =) --Voidious 15:32, 17 August 2009 (UTC)

One note about bucket size: The optimal bucket size would depend on implementation details somewhat, so I think that if they are made to use the same bucket size, it should be tested at a variety of different bucket sizes. Also, not that I care much for my old/crappy/inefficient/inaccurate tree, but it is also a bucket variant Nat, despite what it's title in the tests implies. --Rednaxela 15:42, 17 August 2009 (UTC)

Oh, I forget that. I use bucket size of 8 in Simonton's tree. I'll re-run it this evening. Note that 478ms is only half a second! But I wonder, I haven't checked yet, which distance does Rednaxela's tree use? I'll change my tree to binary, change to 8 buckets, check Voidious' and Rednaxela's (is Rednaxela's bucketPR or just plain K-d tree?) and apply final speed update to my tree and run the test again with suggested neighbours/dimensions. » Nat | Talk » 07:18, 18 August 2009 (UTC)

ARGH!!!

  • Simonton's : 8 buckets since he told that bucket size of 8-16 is the best.
  • Voidious's : 32
  • Rednaxela's : 20
  • Nat's : 22 due a bit of test result with difference m-ary and bucket size.

Will change to 10 for all. Result next. » Nat | Talk » 11:47, 18 August 2009 (UTC)

Having problem =( Trying to do configurable bucket size, and result in memory leak plus several exception and main NPE with Rednaxela's... I think I just revert and just change a constant...

Here is result from 10 tests:

NAT'S K-NEAREST NEIGHBOURS ALGORITHMS BENCHMARK
------------------------------------------------
Running 10 test(s) for k-nearest neighbours search:
:: 8 dimension(s); 400000 data point(s); 45 neighbour(s)


Running tests... COMPLETED.

RESULT << k-nearest neighbours search with flat/linear searching >>
: Averaged used time              = 503.8392 miliseconds
: Average adding time             = 1.379 microseconds
: Average last node adding time   = 2.409 microseconds
: Averaged  accuracy              = 100%
: Worst case used time            = 776.594 miliseconds
: Best case used time             = 410.977 miliseconds

RESULT << k-nearest neighbours search with Rednaxela's k-d tree >>
: Averaged used time              = 12.7223 miliseconds
: Average adding time             = 15.628 microseconds
: Average last node adding time   = 10.357 microseconds
: Averaged  accuracy              = 47%
: Worst case used time            = 46.5419 miliseconds
: Best case used time             = 5.269 miliseconds

RESULT << k-nearest neighbours search with Simonton's Bucket PR k-d tree >>
: Averaged used time              = 3.3432 miliseconds
: Average adding time             = 3.734 microseconds
: Average last node adding time   = 3.422 microseconds
: Averaged  accuracy              = 100%
: Worst case used time            = 9.7065 miliseconds
: Best case used time             = 1.2692 miliseconds

RESULT << k-nearest neighbours search with Nat's Bucket PR k-d tree >>
: Averaged used time              = 64.9766 miliseconds
: Average adding time             = 3.485 microseconds
: Average last node adding time   = 3.289 microseconds
: Averaged  accuracy              = 100%
: Worst case used time            = 603.5304 miliseconds
: Best case used time             = 2.4253 miliseconds

RESULT << k-nearest neighbours search with Voidious' Bucket PR k-d tree >>
: Averaged used time              = 6.5825 miliseconds
: Average adding time             = 4.049 microseconds
: Average last node adding time   = 3.54 microseconds
: Averaged  accuracy              = 100%
: Worst case used time            = 14.8861 miliseconds
: Best case used time             = 3.835 miliseconds


BEST RESULT: 
 - #1 Simonton's Bucket PR k-d tree [3.3432]
 - #2 Voidious' Bucket PR k-d tree [6.5825]
 - #3 Rednaxela's k-d tree [12.7223]
 - #4 Nat's Bucket PR k-d tree [64.9766]
 - #5 flat/linear searching [503.8392]

Benchmark running time: 67.82 seconds

Sorry, I didn't have enough patient to run it for enough test. But, just thinking, this is "K-NEAREST NEIGHBOURS ALGORITHMS BENCHMARK", not "BUCKET PR K-D TREE BENCHMARK" so I think I will use the default configuration for each tree. Any comment? » Nat | Talk » 12:21, 18 August 2009 (UTC)

I'd say that the trees should be kept in default configuration, unless a provably better configuration for that tree is found, in which case that should be used and documented. What configuration (i.e. bucket size) is best depends on the particular implementation, and chances are the default configuration is decent, but if we find a better configuration for a particular one there's no reason not to use it and let the author know. --Rednaxela 14:39, 18 August 2009 (UTC)

Since I kept a whole lot of points in both the tree and array in my KNNRunner, last night I run the test before I slept so my computer didn't run any other programs. Today, now, I run a lot of other programs/applications so... I just run another test and get OutOfMemoryError with -Xmx1536M... I wonder if in my last change with my tree create some memory leak... I couldn't identify it anyway... But overall I have finished my own worked K-d tree so I think I will just use Voidious' in my next robot. I've already Learned this algorithm anyway. » Nat | Talk » 12:42, 18 August 2009 (UTC)

Hmm... well I have a better tree almost ready. There are still some considerable improvements I still plan to planned to make, but I'm pretty sure it currently outperforms anything else that's been benchmarked here... Full benchmark results pending. Hint: TAoR :) --Rednaxela 06:48, 20 August 2009 (UTC)

Oh, just as I finish typing the benchmark finishes:

NAT'S K-NEAREST NEIGHBOURS ALGORITHMS BENCHMARK
------------------------------------------------
Running 10 test(s) for k-nearest neighbours search:
:: 8 dimension(s); 400000 data point(s); 45 neighbour(s)


Running tests... COMPLETED.

RESULT << k-nearest neighbours search with flat/linear searching >>
: Averaged used time              = 656.5265 miliseconds
: Average adding time             = 3.289 microseconds
: Average last node adding time   = 4.476 microseconds
: Averaged  accuracy              = 100%
: Worst case used time            = 1068.1753 miliseconds
: Best case used time             = 460.1359 miliseconds

RESULT << k-nearest neighbours search with Rednaxela's k-d tree >>
: Averaged used time              = 24.5219 miliseconds
: Average adding time             = 14.996 microseconds
: Average last node adding time   = 10.993 microseconds
: Averaged  accuracy              = 60%
: Worst case used time            = 84.1178 miliseconds
: Best case used time             = 5.0381 miliseconds

RESULT << k-nearest neighbours search with Simonton's Bucket PR k-d tree >>
: Averaged used time              = 14.1402 miliseconds
: Average adding time             = 6.581 microseconds
: Average last node adding time   = 6.097 microseconds
: Averaged  accuracy              = 100%
: Worst case used time            = 64.0482 miliseconds
: Best case used time             = 1.9734 miliseconds

RESULT << k-nearest neighbours search with Nat's Bucket PR k-d tree >>
: Averaged used time              = 36.161 miliseconds
: Average adding time             = 5.812 microseconds
: Average last node adding time   = 5.377 microseconds
: Averaged  accuracy              = 100%
: Worst case used time            = 168.6711 miliseconds
: Best case used time             = 3.9539 miliseconds

RESULT << k-nearest neighbours search with Voidious' Bucket PR k-d tree >>
: Averaged used time              = 8.8872 miliseconds
: Average adding time             = 7.051 microseconds
: Average last node adding time   = 7.892 microseconds
: Averaged  accuracy              = 100%
: Worst case used time            = 20.6054 miliseconds
: Best case used time             = 3.1616 miliseconds

RESULT << k-nearest neighbours search with Rednaxela's BETTER tree >>
: Averaged used time              = 8.4796 miliseconds
: Average adding time             = 5.032 microseconds
: Average last node adding time   = 6.467 microseconds
: Averaged  accuracy              = 100%
: Worst case used time            = 48.2494 miliseconds
: Best case used time             = 1.6761 miliseconds


BEST RESULT: 
 - #1 Rednaxela's BETTER tree [8.4796]
 - #2 Voidious' Bucket PR k-d tree [8.8872]
 - #3 Simonton's Bucket PR k-d tree [14.1402]
 - #4 Rednaxela's k-d tree [24.5219]
 - #5 Nat's Bucket PR k-d tree [36.161]
 - #6 flat/linear searching [656.5265]

--Rednaxela 06:50, 20 August 2009 (UTC)

If you don't mind, I would like to see 100 or 50 tests result. But still 48.2494 > 20.6054. I have a strange habit to choose the algorithm by its worst case (so I prefer merge sort than quick sort =)). Anyway, its best case is better than Simonton's! Congrats. Do you mind sharing your secret? Oh! I just realised that milli- has 2 t... » Nat | Talk » 11:24, 20 August 2009 (UTC)

Yeah, the 10 test result isn't very precise I know, I'm going to run a longer one, but first I'm going to modify the test to use ThreadMXBean for timing, to ensure other things running on the system to not have a significant impact on the benchmark. And yes, that worst case time isn't too great but I have ideas of how to improve it notably hopefully. Additionally, that test above was with a bucket size of 50, and I've since found that a bucket size of 20 improves the result quite notably. I think there are still plenty of ways I can make it faster. My secret though? TAoR = Total Avoidance of Recursion :) --Rednaxela 12:14, 20 August 2009 (UTC)

Hm, spoke too soon. It seems that for this the resolution provided by ThreadMXBean is simply insufficent. In any case I'll be running more tests, and further improving my tree... :) --Rednaxela 12:28, 20 August 2009 (UTC)