Initialization Code Runtime Reduction Effort
The highlighted comment was created in this revision.
Here I am going to post information on CPU performance of configuration, construction, drive, gun, and radar. Configuration is one time setting of parameters at beginning of first round. Construction is construction of the scenarios, drives, guns, etc. and loading them into my component chain, also a one time event at beginning of first round. Load stats is loading the previous set of statistics from disk so they can be updated and written back out at the end of the battle. Drive, gun, and radar times are averages and peaks over every tick for the entire battle.
In all I ran averages against 10 seasons.
12.2 is XanderCat 12.2, while 12.3 is the development version of XanderCat 12.3 with whatever CPU performance improvements I can make. Originally I was going to focus on trying to improve initialization, but since turn 0 apparently wasn't getting skipped, I decided to focus on drive and gun improvements instead.
12.2 Normal | 12.2 Shielding | 12.3 Normal | 12.3 Shielding | |
---|---|---|---|---|
Opponent | Tron | Virus | Tron | Virus |
Configure AVG | 0.454 | 0.478 | 0.459 | 0.469 |
Construction AVG | 1.304 | 1.353 | 1.306 | 1.310 |
Load Stats AVG | 3.730 | 3.686 | 3.690 | 3.321 |
Drive AVG | 0.478 | 0.031 | 0.299 | 0.016 |
Drive P1 | 12.51 | 5.25 | 7.34 | 3.82 |
Drive P2 | 11.91 | 4.70 | 6.95 | 3.18 |
Drive P3 | 11.46 | 4.55 | 6.81 | 2.99 |
Gun AVG | 0.465 | 0.153 | 0.267 | 0.082 |
Gun P1 | 7.12 | 7.17 | 5.23 | 5.84 |
Gun P2 | 6.03 | 5.13 | 4.44 | 4.09 |
Gun P3 | 5.65 | 4.26 | 3.77 | 2.66 |
Radar AVG | 0.0019 | 0.0018 | 0.0018 | 0.0018 |
Radar P1 | 0.10 | 0.28 | 0.07 | 0.35 |
Radar P2 | 0.06 | 0.14 | 0.05 | 0.05 |
Radar P3 | 0.04 | 0.04 | 0.03 | 0.04 |
I hadn't thought about it previously, but I can't help but wonder if the loading of battle statistics is one of the bigger problems with skipping turns at the beginning of the first round. On my system, the CPU hit is not all that dramatic, but it could be worse on other systems. Maybe I could defer loading the battle stats until the end of the battle? They are not really needed until the end. How much time does Robocode give at the end of battle for final processing like writing to files?
I can think of 2 factors which can mess up skipped turns. Dynamic overclocking and excessive amounts of client instances.
- Dynamic overclocking changes CPU speed based on load. If the Robocode engine CPU constant is calibrated for an overclocked CPU, then in the beginning of a battle, the reduced clock will provoke skipped turns.
- Too many client instances make them interfere with each other. Sometimes clients use 2 threads, sometimes only 1. When they use 2 and there isn't any idle core, they steal CPU from another instance and the slower processing speed can provoke skipped turns.
Some thoughts...
- For what it's worth, I tried to generate different CPU constants based on whether it was dynamic overclocking or not, but wasn't able to. Maybe the trigger threshold is well below what a single Robocode instance or the CPU constant calculation uses on my machine.
- While Robocode does use multiple threads - you might see 50% on two cores or 100% on one core - it only uses 1 at a time, so I'm not sure it's ever really stealing CPU.
- Since a lot of modern multi-core processors include hyperthreading, there should be some extra buffer such that using 1 Robocode per core is ok.
To me, the bigger problems are:
- Measuring CPU time for a single tick in nanoseconds is not nearly as accurate as we need it to be.
- There's always other system stuff that could use some CPU. Combine with the lack of accuracy and it's really dangerous to penalize for taking too long over a timeframe of just one tick.
I don't know, I agree that dynamic overclocking is theoretically a concern, but from what I can remember, skipped turns have been quirky since long before we all had huge multi-core machines with hyperthreading and dynamic overclocking.
Quirky like skipping more turns vs an opponent that uses a lot of CPU (but doesn't skip turns!), or skipping lots more turns in the first 50-100 ticks of a battle. Obviously you do some initialization in the first tick, but shouldn't the subsequent ticks be mostly unaffected by that? Seems like that was never the case and still isn't. I guess garbage collection being outside of the scope of Robocode's vision is the most likely culprit for all this. But it's pretty frustrating, in any case.
That's a good point I wasn't even thinking about. Looking at the log Wompi posted again, I can see that all setup was completed on turn 0, and turn 0 wasn't skipped.
Voidious -- did you check to see if XanderCat appeared to be skipping a lot of turns on your Ubuntu/OJDK machine?
I can try to improve the CPU performance of my wave surfing drive and guess factor guns, and that might help, but it doesn't really explain the huge number of skipped turns on the first round. In the meantime, if skipped turns is a problem on Voidious' machine, I suppose for now I could just modify my bullet shielding system to try to account for skipped turns.
Remembered a 3rd possible cause of skipped turns. The JVM executes all code in interpreted mode for a while until JIT compiler kicks in. And at least in Sun/Oracle JVMs, the default JIT mode, client or server, varies with OS.
Hmm... frustratingly, XanderCat seems to have a higher success rate vs Virus (the most problematic on my system) when running through the UI. I did catch one 56% score with only 2 skipped turns, though, which raises some doubt that skipped turns are the issue (or only issue). Running some more single-threaded seasons with RoboRunner now, in case it's something like a fresh reboot helped (after testing in Windows earlier).
I thought once about Robocode being more multi-thread friendly if the engine called Thread.yield() in strategic places, like between ticks. But I never posted this suggestion in sourceforge.
This way, most processing outside a Robocode instance would be done between ticks and not interfere with turn skipping.
About dynamic overclocking, here I delete config/robocode.properties and run a single instance, which will measure CPU constant at minimum possible speed. Then I copy the file to all other installations. Tried to disable dynamic overclocking as well, but my laptop doesn't have the option in the BIOS.
Hi mates.
We had once a quite long discussion about skipped turns (Skipped Turns) and I remembered this gc-tuning page. I played around with some of the gc settings and it generated a very different robocode experience. To sad, I don't have the time right now to provide some serious tests but maybe it gives someone a hunch to find some appropriate settings for the gc.
One other thing, do you look out for 'hidden' skipped turns or just take the skipped turn events?
My overall guess would be, that initialization is not a big problem and could easily detected if you spot skipped turns at the start of the round (I remember DrussGT had once an issue where he skipped the first 10+ turns). It is more likely that bots generate a bunch of initialization objects and this bunch of objects get garbage collected a couple of turns later (lets say turn 30+) and that would be the time where it hurts most. Of course this is just a wild guess and i could completely wrong but it could be one explanation why quite a few bots have issues with skipped turns mainly within the first round.
I think another reason the first round is difficult is that the JIT hasn't yet optimized code, but also a few ticks later the JIT figures out which code needs optimization and then starts running the compiler in a parallel thread. This competes with the robot time, so not only do you not yet have optimized code, but the time is also being shared with the JIT compiler. This would also explain skipping turns starting a few ticks in instead of on the first turn. Also, the more code you have that gets run regularly, the more the JIT will try to optimize, so it will take longer and possibly cause more skipped turns.
Well, I couldn't hold it and had to run some tests.
I tested all three collectors on XanderCat 12.3 vs Virus and Diamond vs DrussGT
-XX:+UseSerialGC
DvsD: both bots dropped turns like crazy and DrussGT almost ever won with 64% APS
XvsV: XanderCat dropped constantly turns over all battle rounds (sightly more on the first two rounds) and lost quite a few rounds against Virus
-XX:+UseParallelGC -XX:+UseParallelOldGC -XX:MaxGCPauseMillis=1
DvsD: both bots dropped even more turns than with the SerialGC - DrussGT still won with 64+% APS
XvsV: XanderCat dropped a lot turnes in the first round and some in the second - all other rounds had no skipped turns (because of the crummy first round he took the wrong movement and lost a bunch of APS vs Virus)
-XX:+UseParallelGC -XX:+UseParallelOldGC -XX:MaxGCPauseMillis=30
DvsD: less skipped turns for both but still to much - DrussGT still on 64% APS
XvsV: no changes to 1ms pause
-XX:+UseConcMarkSweepGC
DvsD: DrussGT drops a turn here and there - Diamond still drops a lot turns but with a lower frequency - DrussGT wins with 50-55% APS
XvsV: XanderCat stops dropping turns at all (just a few here and there) and wins against Virus with 99%
I also changed some other ratio settings but could not see any visible changes to the overall behavior. To me the concurrent collector looks quite promising and I think I play a litlte with the CMSIncremental.. options to see if there is still room for some improvements.
Very interesting. Thanks again for all the testing. Even if the issue can be fixed by GC settings, maybe I should look to see if I can make my robot more environmentally friendly and stop producing so much garbage? My first thought turns to the way I pick data out of my KD trees (because I sloppily move all data points from a MaxHeap into a List in order to stick to my existing interface, and that List gets created and thrown away on every tree read), but I'm not sure that would help on the first round issue.
Wow! Great research here. I wonder if you enable incremental mode for the concurrent mark sweep GC using -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode
how that would work. Also, the concurrent collector uses additional cores to do the work, so might not work so well unless there are cores free. Of course, designing a bot so that it is resistant to skipped turns, and uses little CPU and memory, is probably the most important aspect of all.
Uh, I wouldn't change anything in this direction. It was just a quick test (<10 seasons) and I doubt it solves anything. To me it just shows a dependency and might worth further test. I'm still convinced that another skipped turn handling (like Voidious suggested) would be far better. Maybe raising the cpu variable for rumble clients would be a workaround - but this would need that everyone develop his bots within the normal cpu range and not abuse this workaround. Probably not a good idea but like many others I get more and more annoyed by the skipped turns because it complicates even very simple tasks.