Thanks for the offer. Could you test DrussGT 3.2.1 vs. Garm, Shadow, Hydra and Gilgalad? Functionally, 3.2.1 should be almost equivalent to 2.8.16 (3.2 has 3-series bullet shielding disabled), but as this comparison shows they are getting very different scores.
There are also a few bots DrussGT now gets 100% against - this worries me, as it points to them crashing.
What should be the setting on a client to perform such test? Or should I use RoboRunner?
Also my observation showed that sometimes short test set shows drastically (1% APS) higher scores. I was fulled by it couple times thinking that my older bot is more superior to a new version. They end up to have the same score or even reversed as I keep waiting for stats accumulation.