Initialization Code Runtime Reduction Effort

Jump to navigation Jump to search

Initialization Code Runtime Reduction Effort

Edited by author.
Last edit: 06:04, 17 February 2013

Here I am going to post information on CPU performance of configuration, construction, drive, gun, and radar. Configuration is one time setting of parameters at beginning of first round. Construction is construction of the scenarios, drives, guns, etc. and loading them into my component chain, also a one time event at beginning of first round. Load stats is loading the previous set of statistics from disk so they can be updated and written back out at the end of the battle. Drive, gun, and radar times are averages and peaks over every tick for the entire battle.

In all I ran averages against 10 seasons.

12.2 is XanderCat 12.2, while 12.3 is the development version of XanderCat 12.3 with whatever CPU performance improvements I can make. Originally I was going to focus on trying to improve initialization, but since turn 0 apparently wasn't getting skipped, I decided to focus on drive and gun improvements instead.

12.2 Normal 12.2 Shielding 12.3 Normal 12.3 Shielding
Opponent Tron Virus Tron Virus
Configure AVG 0.454 0.478 0.459 0.469
Construction AVG 1.304 1.353 1.306 1.310
Load Stats AVG 3.730 3.686 3.690 3.321
Drive AVG 0.478 0.031 0.299 0.016
Drive P1 12.51 5.25 7.34 3.82
Drive P2 11.91 4.70 6.95 3.18
Drive P3 11.46 4.55 6.81 2.99
Gun AVG 0.465 0.153 0.267 0.082
Gun P1 7.12 7.17 5.23 5.84
Gun P2 6.03 5.13 4.44 4.09
Gun P3 5.65 4.26 3.77 2.66
Radar AVG 0.0019 0.0018 0.0018 0.0018
Radar P1 0.10 0.28 0.07 0.35
Radar P2 0.06 0.14 0.05 0.05
Radar P3 0.04 0.04 0.03 0.04
Skotty22:25, 16 February 2013

I hadn't thought about it previously, but I can't help but wonder if the loading of battle statistics is one of the bigger problems with skipping turns at the beginning of the first round. On my system, the CPU hit is not all that dramatic, but it could be worse on other systems. Maybe I could defer loading the battle stats until the end of the battle? They are not really needed until the end. How much time does Robocode give at the end of battle for final processing like writing to files?

Skotty22:30, 16 February 2013
 

I can think of 2 factors which can mess up skipped turns. Dynamic overclocking and excessive amounts of client instances.

- Dynamic overclocking changes CPU speed based on load. If the Robocode engine CPU constant is calibrated for an overclocked CPU, then in the beginning of a battle, the reduced clock will provoke skipped turns.

- Too many client instances make them interfere with each other. Sometimes clients use 2 threads, sometimes only 1. When they use 2 and there isn't any idle core, they steal CPU from another instance and the slower processing speed can provoke skipped turns.

MN22:49, 16 February 2013
 

You get extra processing time on ticks that have disk access, but I'm not sure how much.

Skilgannon22:56, 16 February 2013
 

Some thoughts...

  • For what it's worth, I tried to generate different CPU constants based on whether it was dynamic overclocking or not, but wasn't able to. Maybe the trigger threshold is well below what a single Robocode instance or the CPU constant calculation uses on my machine.
  • While Robocode does use multiple threads - you might see 50% on two cores or 100% on one core - it only uses 1 at a time, so I'm not sure it's ever really stealing CPU.
  • Since a lot of modern multi-core processors include hyperthreading, there should be some extra buffer such that using 1 Robocode per core is ok.

To me, the bigger problems are:

  • Measuring CPU time for a single tick in nanoseconds is not nearly as accurate as we need it to be.
  • There's always other system stuff that could use some CPU. Combine with the lack of accuracy and it's really dangerous to penalize for taking too long over a timeframe of just one tick.
Voidious22:59, 16 February 2013

Hmm, maybe I'm wrong and System.nanoTime() is accurate enough... [1]

Voidious23:09, 16 February 2013
 

I don't know, I agree that dynamic overclocking is theoretically a concern, but from what I can remember, skipped turns have been quirky since long before we all had huge multi-core machines with hyperthreading and dynamic overclocking.

Quirky like skipping more turns vs an opponent that uses a lot of CPU (but doesn't skip turns!), or skipping lots more turns in the first 50-100 ticks of a battle. Obviously you do some initialization in the first tick, but shouldn't the subsequent ticks be mostly unaffected by that? Seems like that was never the case and still isn't. I guess garbage collection being outside of the scope of Robocode's vision is the most likely culprit for all this. But it's pretty frustrating, in any case.

Voidious23:36, 16 February 2013

That's a good point I wasn't even thinking about. Looking at the log Wompi posted again, I can see that all setup was completed on turn 0, and turn 0 wasn't skipped.

Voidious -- did you check to see if XanderCat appeared to be skipping a lot of turns on your Ubuntu/OJDK machine?

I can try to improve the CPU performance of my wave surfing drive and guess factor guns, and that might help, but it doesn't really explain the huge number of skipped turns on the first round. In the meantime, if skipped turns is a problem on Voidious' machine, I suppose for now I could just modify my bullet shielding system to try to account for skipped turns.

Skotty00:48, 17 February 2013
 

Remembered a 3rd possible cause of skipped turns. The JVM executes all code in interpreted mode for a while until JIT compiler kicks in. And at least in Sun/Oracle JVMs, the default JIT mode, client or server, varies with OS.

MN03:58, 17 February 2013
 

Hmm... frustratingly, XanderCat seems to have a higher success rate vs Virus (the most problematic on my system) when running through the UI. I did catch one 56% score with only 2 skipped turns, though, which raises some doubt that skipped turns are the issue (or only issue). Running some more single-threaded seasons with RoboRunner now, in case it's something like a fresh reboot helped (after testing in Windows earlier).

Voidious05:15, 17 February 2013

If you want to play around with it, I went ahead and released XanderCat 12.3. It's basically the same as 12.2 but I fixed some CPU performance issues. The main drives and guns execute quite a bit faster now with much lower peaks. Maybe it will make a difference...maybe not.

Skotty07:26, 17 February 2013
 
 

I thought once about Robocode being more multi-thread friendly if the engine called Thread.yield() in strategic places, like between ticks. But I never posted this suggestion in sourceforge.

This way, most processing outside a Robocode instance would be done between ticks and not interfere with turn skipping.

About dynamic overclocking, here I delete config/robocode.properties and run a single instance, which will measure CPU constant at minimum possible speed. Then I copy the file to all other installations. Tried to disable dynamic overclocking as well, but my laptop doesn't have the option in the BIOS.

MN23:50, 16 February 2013
 

Hi mates.

We had once a quite long discussion about skipped turns (Skipped Turns) and I remembered this gc-tuning page. I played around with some of the gc settings and it generated a very different robocode experience. To sad, I don't have the time right now to provide some serious tests but maybe it gives someone a hunch to find some appropriate settings for the gc.
One other thing, do you look out for 'hidden' skipped turns or just take the skipped turn events?
My overall guess would be, that initialization is not a big problem and could easily detected if you spot skipped turns at the start of the round (I remember DrussGT had once an issue where he skipped the first 10+ turns). It is more likely that bots generate a bunch of initialization objects and this bunch of objects get garbage collected a couple of turns later (lets say turn 30+) and that would be the time where it hurts most. Of course this is just a wild guess and i could completely wrong but it could be one explanation why quite a few bots have issues with skipped turns mainly within the first round.

Wompi10:04, 17 February 2013
 

I think another reason the first round is difficult is that the JIT hasn't yet optimized code, but also a few ticks later the JIT figures out which code needs optimization and then starts running the compiler in a parallel thread. This competes with the robot time, so not only do you not yet have optimized code, but the time is also being shared with the JIT compiler. This would also explain skipping turns starting a few ticks in instead of on the first turn. Also, the more code you have that gets run regularly, the more the JIT will try to optimize, so it will take longer and possibly cause more skipped turns.

Skilgannon10:42, 17 February 2013
 

Well, I couldn't hold it and had to run some tests.

I tested all three collectors on XanderCat 12.3 vs Virus and Diamond vs DrussGT

-XX:+UseSerialGC

DvsD: both bots dropped turns like crazy and DrussGT almost ever won with 64% APS
XvsV: XanderCat dropped constantly turns over all battle rounds (sightly more on the first two rounds) and lost quite a few rounds against Virus

-XX:+UseParallelGC -XX:+UseParallelOldGC -XX:MaxGCPauseMillis=1

DvsD: both bots dropped even more turns than with the SerialGC - DrussGT still won with 64+% APS
XvsV: XanderCat dropped a lot turnes in the first round and some in the second - all other rounds had no skipped turns (because of the crummy first round he took the wrong movement and lost a bunch of APS vs Virus)

-XX:+UseParallelGC -XX:+UseParallelOldGC -XX:MaxGCPauseMillis=30

DvsD: less skipped turns for both but still to much - DrussGT still on 64% APS
XvsV: no changes to 1ms pause

-XX:+UseConcMarkSweepGC

DvsD: DrussGT drops a turn here and there - Diamond still drops a lot turns but with a lower frequency - DrussGT wins with 50-55% APS
XvsV: XanderCat stops dropping turns at all (just a few here and there) and wins against Virus with 99%

I also changed some other ratio settings but could not see any visible changes to the overall behavior. To me the concurrent collector looks quite promising and I think I play a litlte with the CMSIncremental.. options to see if there is still room for some improvements.

Wompi12:04, 17 February 2013

Very interesting. Thanks again for all the testing. Even if the issue can be fixed by GC settings, maybe I should look to see if I can make my robot more environmentally friendly and stop producing so much garbage? My first thought turns to the way I pick data out of my KD trees (because I sloppily move all data points from a MaxHeap into a List in order to stick to my existing interface, and that List gets created and thrown away on every tree read), but I'm not sure that would help on the first round issue.

Skotty15:23, 17 February 2013
 

Wow! Great research here. I wonder if you enable incremental mode for the concurrent mark sweep GC using -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode how that would work. Also, the concurrent collector uses additional cores to do the work, so might not work so well unless there are cores free. Of course, designing a bot so that it is resistant to skipped turns, and uses little CPU and memory, is probably the most important aspect of all.

Skilgannon16:54, 17 February 2013
 

Uh, I wouldn't change anything in this direction. It was just a quick test (<10 seasons) and I doubt it solves anything. To me it just shows a dependency and might worth further test. I'm still convinced that another skipped turn handling (like Voidious suggested) would be far better. Maybe raising the cpu variable for rumble clients would be a workaround - but this would need that everyone develop his bots within the normal cpu range and not abuse this workaround. Probably not a good idea but like many others I get more and more annoyed by the skipped turns because it complicates even very simple tasks.

Wompi16:55, 17 February 2013
 

I did a little reading on the garbage collector looking for information on how I could reduce my garbage collection footprint. Short of stripping things out of my robot permanently, I did come up with a few things I could potentially do to reduce the amount of garbage collection. I'm going to try to implement some of it, but it's just guesswork as to whether or not it will actually improve things or not.

Skotty02:30, 18 February 2013
 

The recommended garbage collector for real-time applications is the copy collector. Or the treadmill collector, which is a variation of the copy collector.

All JVMs have the copy collector, but only for young objects. You can make it being used for all objects by increasing the young generation size. Set -Xmn<value> to the highest value the JVM allows, like "-Xmn511M -Xmx512M" and the copy collector will become the main GC. There are other parameters as well, but I can't find them right now.

MN17:15, 17 February 2013