20 MB is too small. I generally record 2 GB of data via RoboRunner: 4 Robocode instances with 500 MB each.
I’m not experiencing data truncation. I’m using a worker thread that logs data asynchronously with a java.nio FileChannel. However, the OutputStream API should be enough, and you shouldn’t experience data truncation anyway. Where do you do the file writing? Did you flush the higher level stream when it’s done? If you don’t, Robocode will only close the lower-level ones, and whatever is still buffered in the higher-level stream is lost.
- "Did you flush the higher level stream when it’s done?" I really don't have any idea about its meaning =(
- How long does a generation take with 2 GB of data? Even when I do not fill the quota, a single generation takes about 30 seconds with a population size of 102.
- I use the compressed serialization method in the wiki.
- Edit: Data truncation problem just disappeared after I restarted my computer.
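On the flushing question above, here is a minimal sketch of what "flush the higher level stream" means. It assumes a compressed serialization chain of ObjectOutputStream over GZIPOutputStream; the ByteArrayOutputStream is only a stand-in for the file stream a robot would really use, and the class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class, not part of Robocode.
final class FlushDemo {
    // Writes data through ObjectOutputStream -> GZIPOutputStream -> sink.
    // The sink stands in for the lowest-level (file) stream.
    static byte[] save(double[] data, boolean closeTopLevel) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            ObjectOutputStream out = new ObjectOutputStream(new GZIPOutputStream(sink));
            out.writeObject(data);
            if (closeTopLevel) {
                // Closing the outermost stream flushes every layer below it
                // and lets GZIP write its final block and trailer.
                out.close();
            }
            // Otherwise the upper layers are simply abandoned, as if only
            // the file stream had been closed by someone else.
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return sink.toByteArray();
    }

    // Reads the array back; fails if the bytes were truncated.
    static double[] load(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(
                new GZIPInputStream(new ByteArrayInputStream(bytes)))) {
            return (double[]) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With closeTopLevel set to false the byte array comes out shorter than the complete one, which is the kind of truncation discussed here. In a robot, the simplest habit is to call close() on the outermost stream (or use try-with-resources on it) rather than on the file stream underneath.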
2 GB of data takes me 5 s (4 threads in parallel). That is 1NN with fewer than 5 attributes, which should be lightning fast anyway.
Using all the waves (including virtual ones) with maxK=100 and a huge tree of 10+ attributes takes me less than a minute (still 4 threads in parallel).
I'm using NIO for file reading, and I use handmade serialization instead of Java's built-in one, which is the secret to the speed.
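To illustrate what handmade serialization over NIO can look like, here is a rough sketch. The record layout (a count followed by a fixed number of doubles per wave), the attribute count and the class name are all assumptions for the example, not the format actually used by anyone in this thread.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical fixed-width wave file: [int count][count * ATTRIBUTES doubles].
final class WaveFile {
    static final int ATTRIBUTES = 4; // assumed number of attributes per wave
    static final int RECORD_BYTES = ATTRIBUTES * Double.BYTES;

    static void write(Path file, double[][] waves) {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + waves.length * RECORD_BYTES);
        buf.putInt(waves.length);
        for (double[] wave : waves) {
            if (wave.length != ATTRIBUTES) {
                throw new IllegalArgumentException("expected " + ATTRIBUTES + " attributes");
            }
            for (double v : wave) buf.putDouble(v);
        }
        buf.flip(); // switch the buffer from filling to draining
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            while (buf.hasRemaining()) ch.write(buf);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static double[][] read(Path file) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate((int) ch.size());
            while (buf.hasRemaining() && ch.read(buf) != -1) {
                // keep reading until the whole file is in the buffer
            }
            buf.flip();
            double[][] waves = new double[buf.getInt()][ATTRIBUTES];
            for (double[] wave : waves) {
                for (int i = 0; i < ATTRIBUTES; i++) wave[i] = buf.getDouble();
            }
            return waves;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The point of a layout like this is that reading is just copying raw doubles out of a buffer, with no class descriptors or reflection involved. For files in the gigabyte range a single heap buffer would not fit, so reading in chunks or memory-mapping would be needed; that is left out here.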
- 5 seconds?? I just started using 4 threads, and it takes 11 seconds with 1.4 MB of data, without virtual waves, with max K 100 and a population size of 102.
- What is your fitness function? Mine perfectly simulates WhiteFang's targeting, including bot width calculations. I don't think the 51-bin system slows the robot down, since it should actually be faster as long as K is greater than 51.
- I convert all the data into ArrayLists, so file reading speed shouldn't matter much (or does the memory it takes slow it down?).
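For reference, evaluating one generation across a fixed number of threads might be sketched as below. The fitness function is passed in as a parameter because the real one (the targeting simulation) is not shown in this thread; the class name and the genome representation as double[] are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.ToDoubleFunction;

// Hypothetical helper: scores every genome of one generation in parallel.
final class ParallelFitness {
    static double[] evaluate(List<double[]> population,
                             ToDoubleFunction<double[]> fitness,
                             int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Double>> tasks = new ArrayList<>();
            for (double[] genome : population) {
                tasks.add(() -> fitness.applyAsDouble(genome));
            }
            // invokeAll blocks until every task is done and keeps task order.
            List<Future<Double>> results = pool.invokeAll(tasks);
            double[] scores = new double[population.size()];
            for (int i = 0; i < scores.length; i++) {
                scores[i] = results.get(i).get();
            }
            return scores;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

This only helps if the fitness function reads shared data (such as the loaded waves) without modifying it. With a population of 102 and 4 threads, each thread ends up scoring roughly 25 genomes per generation.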
It's 1NN with only firing waves. It seems that the kd-tree is the only slow part.
It's worth mentioning that I already store everything that is slow to compute in the file, e.g. precise intersection, precise MEA, etc. So all I do is load those attributes, transform them with my formula, load them into the tree and do KDE for every firing wave.
Anyway, this can be considered as 1 population and 1 generation, as I'm still tuning it by hand.
- OK, now I understand. I was afraid that I had a big flaw in the algorithm that made it slow. What I have learned is that a genetic algorithm always works better than manual tuning in the long run. What I do when it ends is roll the numbers that are really high or really low to the max/min values, and then I get about a 1% boost in score, which easily surpasses hand tuning. Since only my GA is multi-threaded, hand tuning is a little slower too.
- One final question: where do the files I save go in RoboRunner-GUI? I didn't even test (standard me) before putting WhiteFang against 28 surfers for 10 seasons, and then I accidentally compiled the project (me again).
Update: after some profiling, it's confirmed that the kd-tree is the only bottleneck.
However, it seems that file reading time grows as kd-tree time grows.
And after putting deserialization into a separate thread and using a producer-consumer pattern to communicate, total run time stays the same while file reading time decreases greatly. Maybe my profiling tool is yielding inaccurate results.
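The producer-consumer arrangement described above could look roughly like this sketch: one thread decodes records and hands them over through a bounded queue, while the calling thread consumes them (in the real setup, that would be the kd-tree insertion). The class name, the queue size and the flat double record format are assumptions for illustration.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

// Hypothetical loader: decoding runs on its own thread, consumption on the caller's.
final class PipelinedLoader {
    private static final double[] END = new double[0]; // sentinel marking end of data

    // Returns the number of records handed to the sink.
    static int load(ByteBuffer encoded, int attributes, Consumer<double[]> sink) {
        if (attributes <= 0) throw new IllegalArgumentException("attributes must be positive");
        BlockingQueue<double[]> queue = new ArrayBlockingQueue<>(1024);

        Thread producer = new Thread(() -> {
            try {
                while (encoded.remaining() >= attributes * Double.BYTES) {
                    double[] record = new double[attributes];
                    for (int i = 0; i < attributes; i++) record[i] = encoded.getDouble();
                    queue.put(record); // blocks when the consumer falls behind
                }
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        int count = 0;
        try {
            // Compare by reference: only the sentinel instance ends the loop.
            for (double[] r = queue.take(); r != END; r = queue.take()) {
                sink.accept(r);
                count++;
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return count;
    }
}
```

If the consumer is the slow side, the bounded queue makes the producer wait in put(), so the total run time stays dominated by the consumer. That would be consistent with the total staying the same after the change, though it says nothing certain about what the profiler was actually measuring.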