How to build a good test bed?
I'm doing nearly the same thing now. I write kNN data points and GFs to files, so all I do is:
read the data from file; add it to the tree; run kNN/KDE; count inliers vs. outliers. And I'm only doing kNN/KDE on firing waves.
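A minimal Java sketch of that scoring pipeline, for illustration only. It swaps the kd-tree for a brute-force neighbour search, uses a Gaussian kernel over the neighbours' guess factors, and counts a firing tick as an "inlier" when the real GF lies within a fixed tolerance of the KDE peak. All of these choices, and the names `KnnKdeEval`, `hitRate`, `bandwidth` and `tolerance`, are assumptions rather than anyone's actual code. It also replays the log in order and predicts before adding each point, so a prediction never sees its own answer.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class KnnKdeEval {
    // One logged tick: feature vector plus the guess factor the enemy actually reached.
    record Point(double[] features, double gf) {}

    // Weighted squared Euclidean distance; the weights are what the GA would tune.
    static double dist(double[] a, double[] b, double[] w) {
        double s = 0;
        for (int i = 0; i < a.length; i++) {
            double d = (a[i] - b[i]) * w[i];
            s += d * d;
        }
        return s;
    }

    // Brute-force k nearest neighbours (assumption: a stand-in for a real kd-tree).
    static List<Point> knn(List<Point> data, double[] q, double[] w, int k) {
        List<Point> sorted = new ArrayList<>(data);
        sorted.sort(Comparator.comparingDouble(p -> dist(p.features(), q, w)));
        return sorted.subList(0, Math.min(k, sorted.size()));
    }

    // Gaussian KDE over the neighbours' GFs, evaluated on a grid in [-1, 1]; returns the peak GF.
    static double kdePeak(List<Point> nbrs, double bandwidth, int bins) {
        double best = 0, bestDensity = -1;
        for (int i = 0; i < bins; i++) {
            double x = -1 + 2.0 * i / (bins - 1);
            double density = 0;
            for (Point p : nbrs) {
                double u = (x - p.gf()) / bandwidth;
                density += Math.exp(-0.5 * u * u);
            }
            if (density > bestDensity) { bestDensity = density; best = x; }
        }
        return best;
    }

    // Replays the log in order: on firing ticks, aim using only earlier points, then add the point.
    // Returns the fraction of firing ticks whose real GF is within `tolerance` of the KDE peak.
    static double hitRate(List<Point> log, boolean[] firing, double[] w,
                          int k, double bandwidth, double tolerance) {
        List<Point> tree = new ArrayList<>();
        int hits = 0, shots = 0;
        for (int i = 0; i < log.size(); i++) {
            Point p = log.get(i);
            if (firing[i] && !tree.isEmpty()) {
                double aim = kdePeak(knn(tree, p.features(), w, k), bandwidth, 101);
                if (Math.abs(aim - p.gf()) <= tolerance) hits++;
                shots++;
            }
            tree.add(p);
        }
        return shots == 0 ? 0 : (double) hits / shots;
    }
}
```

Because each weight vector is scored by replaying the same files, the GA's fitness for one individual is just `hitRate(...)` averaged over all bots.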
However, it takes me ~10 min per generation with only 1500 TCRM battles.
My population size is also 20, and I'm also using 4 threads. It's a Core i7 with 4 cores at 2.6 GHz, so it should be even faster than an i5-2410M, which has only 2 cores.
Are you reading the data and adding it to the tree at the same time, or reading all the data into memory in one go and then adding it to the tree?
It was: read a line, add it to the tree, and if it was a firing tick, do a prediction. For parallelization I just started a new thread for each bot and joined the thread when the bot was processed. It would probably be a bit faster with a thread pool.
Unfortunately, I think I lost this code; I think it was on my university computer...
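A sketch of the thread-pool variant, assuming one task per bot: submit each bot to a fixed-size `ExecutorService` and collect the results through `Future.get()`, which blocks much like `Thread.join()`. The class `ParallelEval`, the method `meanScore`, and the per-bot evaluator `scoreBot` are hypothetical placeholders, not code from this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.ToDoubleFunction;

public class ParallelEval {
    // Scores every bot's data file on a fixed-size pool and returns the mean score.
    // `scoreBot` is a hypothetical per-bot evaluator (e.g. the read-line / add / predict replay).
    static double meanScore(List<String> botFiles, ToDoubleFunction<String> scoreBot, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Double>> futures = new ArrayList<>();
            for (String file : botFiles) {
                Callable<Double> task = () -> scoreBot.applyAsDouble(file);
                futures.add(pool.submit(task)); // queued; at most `threads` run at once
            }
            double sum = 0;
            for (Future<Double> f : futures) {
                sum += f.get(); // waits for that bot, like Thread.join()
            }
            return botFiles.isEmpty() ? 0 : sum / botFiles.size();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown(); // worker threads are reused for all bots, then released here
        }
    }
}
```

The saving over one thread per bot comes from reusing the pool's threads and from never oversubscribing the cores, e.g. `meanScore(files, scorer, 4)` on a 4-core machine.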
I'm even using a thread pool and NIO for potentially faster execution. Still, 5000 RoboRumble battles may not take 3x as long as 1500 TCRM battles, since the rumble contains a lot of easy targets that get destroyed in a second. I'll experiment later.
Btw, my crossover code doesn't simply do gradient descent; for each weight it either does gradient descent or takes the weight directly from one parent, chosen by a random process. Random noise is also added with a small probability. I think this process explores more of the search space than plain gradient descent plus a random component. In my experience, the search space of kNN weights is non-trivial, although most good weights share some patterns.
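A rough sketch of a crossover along those lines, under stated assumptions: "gradient descent" is read here as a step that starts at the better parent and continues in the direction from the worse parent to the better one. The probabilities, step size, and the clamp to non-negative weights are hypothetical parameters, not the values used above.

```java
import java.util.Random;

public class Crossover {
    // Hypothetical parameters: pCopy = chance to take a parent's weight directly,
    // pNoise / noiseScale = chance and size of random noise, step = max step along the parent difference.
    static double[] cross(double[] better, double[] worse, Random rng,
                          double pCopy, double pNoise, double noiseScale, double step) {
        double[] child = new double[better.length];
        for (int i = 0; i < child.length; i++) {
            if (rng.nextDouble() < pCopy) {
                // Take this weight unchanged from one parent, chosen at random.
                child[i] = rng.nextBoolean() ? better[i] : worse[i];
            } else {
                // Gradient-style step (assumption): move past the better parent,
                // away from the worse one, by a random fraction of `step`.
                child[i] = better[i] + step * rng.nextDouble() * (better[i] - worse[i]);
            }
            if (rng.nextDouble() < pNoise) {
                child[i] += noiseScale * rng.nextGaussian(); // rare random noise
            }
            child[i] = Math.max(0, child[i]); // assumption: kNN weights kept non-negative
        }
        return child;
    }
}
```

Mixing per-weight copying with the directed step lets a child keep whole blocks of a good parent while still moving other weights, which is one way to read the "explores more of the search space" claim.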
One more question: how many generations do you generally use?
For me, 10 generations produce good-enough results, and increasing it further to 100 doesn't improve much.
However, tuning on 1500 TCRM battles seems to suffer a lot from overfitting, so I'm trying the full rumble now.
Each time I collect data and do genetic tuning with 1500 TCRM battles, the hit rate increases from ~16% to ~17%, but the actual TCRM score sometimes even decreases.
It depended on the population size and the sampling strategy I used. With a larger population and a sampling strategy that converged less aggressively, I could run up to about 100 generations before it converged.
And I think the solution space is very non-convex with lots of local minima: I ran quite a few simulations, and it converged to a different solution each time.
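A minimal generational GA sketch showing where population size, generation count, and convergence speed come in. Tournament size stands in for the "sampling strategy" here, which is an assumption: larger tournaments make the population converge quickly, smaller ones keep more diversity and can make use of more generations. The class `WeightGa`, the fitness function, uniform crossover, mutation rate, and elitism are all illustrative choices.

```java
import java.util.Random;
import java.util.function.ToDoubleFunction;

public class WeightGa {
    // Generational GA with elitism. `tournament` controls selection pressure
    // (assumption: used here as the "sampling strategy" knob).
    static double[] evolve(ToDoubleFunction<double[]> fitness, int dims, int popSize,
                           int generations, int tournament, long seed) {
        Random rng = new Random(seed);
        double[][] pop = new double[popSize][dims];
        for (double[] ind : pop)
            for (int i = 0; i < dims; i++) ind[i] = rng.nextDouble();
        double[] best = null;
        double bestFit = Double.NEGATIVE_INFINITY;
        for (int g = 0; g < generations; g++) {
            double[] fit = new double[popSize];
            for (int j = 0; j < popSize; j++) {
                fit[j] = fitness.applyAsDouble(pop[j]); // e.g. mean hit rate over all bots
                if (fit[j] > bestFit) { bestFit = fit[j]; best = pop[j].clone(); }
            }
            double[][] next = new double[popSize][];
            next[0] = best.clone(); // elitism: best individual evaluated so far survives
            for (int j = 1; j < popSize; j++) {
                double[] a = pop[pick(fit, tournament, rng)];
                double[] b = pop[pick(fit, tournament, rng)];
                double[] child = new double[dims];
                for (int i = 0; i < dims; i++) {
                    child[i] = rng.nextBoolean() ? a[i] : b[i]; // uniform crossover
                    if (rng.nextDouble() < 0.1) child[i] += 0.1 * rng.nextGaussian(); // mutation
                }
                next[j] = child;
            }
            pop = next;
        }
        return best;
    }

    // Tournament selection: index of the fittest of `size` randomly drawn individuals.
    static int pick(double[] fit, int size, Random rng) {
        int winner = rng.nextInt(fit.length);
        for (int t = 1; t < size; t++) {
            int c = rng.nextInt(fit.length);
            if (fit[c] > fit[winner]) winner = c;
        }
        return winner;
    }
}
```

Calling `evolve` with several seeds and comparing the returned weight vectors is a cheap way to see whether separate runs settle in different optima.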