I found out good test bed - 7 seasons against every second bot from roborumble (~2900 battles). It produces repeating results within +/- 0.05 APS and error against real RR within +/- 0.1 APS