Cache effects on benchmark
The highlighted comment was created in this revision.
I was thinking that running *just* the benchmark vs a single tree at a time would result in the KD-Tree code and data being cached quite a bit better than would be realistic for real-life situations. Perhaps it would make more sense to run all of the trees at the same time, giving each tree the new search/data one tree after the next. This would simulate the cache thrashing between turns that happens when you are running multiple robots at the same time, because the trees would be competing with each other for cache space.
Thoughts?
The reason I ask is that I designed/wrote a tree to deal with the cache problem. It outperforms Rednaxela Gen2 on large (2mil points, 12 dim) random datasets by ~2X, but ties on smaller (30k point, 12 dim) datasets. I think the fact that for the small dataset the entire thing is in cache might be causing the difference.
Maybe using a reference bot would make benchmarking more meaninful?
Pick one bot, put each tree inside it, one at a time, and run it against a 1v1 test bed.
Running them all at the same time could make sense, but I would suggest being if you do that, because the order that they get run in may matter. Even when running them one at a time in sequence, I've recall noticing that the order in which they are run could very slightly impact the apparent performance, I suspect due to caching, JIT, and/or garbage collection characteristics. It's been a while, but IIRC the System.gc() call I have in there between running different trees was to lessen that effect.
Cache performance is one of those things that's tricky with Robocode, because your robot is also sharing the CPU with another bot which could be doing who knows that with it's memory accesses. For that reason I wouldn't trust optimizations for better caching behavior to necessarily pan out in practice with bots. I may be wrong about that though.
One could put it in a reference bot yeah, though the tests would be far slower and less consistent, thus requiring a much greater number of test iterations to have a reliable result.