How to build a good test bed?
Still not quite, because it uses a population like a GA does, and uses linear combinations of the population members to estimate a gradient, similar to how gradient descent would. Honestly, there were probably better or faster algorithms that would have worked out of the box, but this worked fine; it just took a bit longer.
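To make the idea concrete, here is a minimal sketch of a population-based tuner that estimates a gradient from linear combinations of its members. It is not the actual tuner discussed here; the names `estimate_gradient`, `tune`, and `fitness`, and all parameter values, are assumptions made for illustration. In practice `fitness` would run (or replay) battles and return a score.

```python
import random

def estimate_gradient(population, scores):
    """Estimate an ascent direction as a score-weighted linear
    combination of each member's offset from the population mean."""
    n = len(population)
    dim = len(population[0])
    mean = [sum(p[d] for p in population) / n for d in range(dim)]
    mean_score = sum(scores) / n
    grad = [0.0] * dim
    for p, s in zip(population, scores):
        for d in range(dim):
            # Members that score above average pull the estimate toward them.
            grad[d] += (s - mean_score) * (p[d] - mean[d])
    return [g / n for g in grad]

def tune(fitness, start, generations=200, pop_size=20,
         sigma=0.1, step=0.05, seed=0):
    """Hypothetical tuning loop: sample a population around the current
    weights, estimate a gradient from it, and take a fixed-length step."""
    rng = random.Random(seed)
    center = list(start)
    for _ in range(generations):
        population = [[c + rng.gauss(0, sigma) for c in center]
                      for _ in range(pop_size)]
        scores = [fitness(p) for p in population]
        grad = estimate_gradient(population, scores)
        norm = sum(g * g for g in grad) ** 0.5
        if norm > 0:
            # Normalize so the step length does not depend on score scale.
            center = [c + step * g / norm for c, g in zip(center, grad)]
    return center
```

Unlike a traditional GA there is no crossover or mutation of individuals here; the population only serves to probe the fitness landscape around the current weights.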
Well, this combination sounds great, and it is more like how I tune weights by hand than a traditional GA. It should also work much better than tuning by hand, since it runs far more battles with a far larger population.
And it's much faster (and has less deviation) with recorded battles. The only problem is overfitting to the recorded battles, but many tune–rerecord iterations should take care of that.
Still, I wonder: will it forget the earlier tune–rerecord iterations and overfit to the newest one? Since this sounds more like metric learning, it won't surprise me if it behaves differently. Did you try rerunning the old battles after tuning on newer ones to check?
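The check asked about above could look roughly like this: keep the recorded battles and the tuned weights from every iteration, then re-score each older battle set with the latest weights. This is only a sketch; `forgetting_report` and the `evaluate(weights, battles)` callback are hypothetical names, and how a score is computed from recorded battles is left to the caller.

```python
def forgetting_report(weights_per_iter, battles_per_iter, evaluate):
    """For each tune-rerecord iteration k, compare the score on that
    iteration's recorded battles under the weights tuned at the time
    with the score under the latest weights.

    Returns a list of (k, score_then, score_now, delta) tuples; a clearly
    negative delta suggests the tuner has forgotten that iteration."""
    latest = weights_per_iter[-1]
    report = []
    for k, (w_then, battles) in enumerate(
            zip(weights_per_iter, battles_per_iter)):
        score_then = evaluate(w_then, battles)
        score_now = evaluate(latest, battles)
        report.append((k, score_then, score_now, score_now - score_then))
    return report
```

If the deltas for old iterations stay near zero, the newer tuning rounds are not undoing the older ones; if they drop steadily, mixing old recordings back into the tuning set would be one thing to try.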