ThreadDeath problem and large amount of skipped turns

Fragment of a discussion from Talk:RoboRumble

It's interesting: I actually see a considerable speed difference between running at 1000TPS and at max, even though my CPU constant is 5.3ms. I think this is because many ticks need much less than 1ms of calculation, and those ticks cause a sleep in 1000TPS mode. Also, just to confirm, I only see the ThreadDeath issues when running at max, and never at 1000TPS.

From what I remember the engine works something like this:

  1. Engine triggers condition variable to release bot thread
  2. Engine goes to sleep for up to CPU_CONSTANT milliseconds
  3. Bot thread runs
  4. Bot thread finishes, triggers interrupts to wake up engine thread early
  5. Engine thread wakes up and checks the time. If the time limit was exceeded, it divides the time taken by CPU_CONSTANT milliseconds, subtracts 1, and records that the bot skipped that many turns; if the skipped turns exceed 30, it kills the bot. Otherwise it continues as normal.
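The arithmetic in step 5 could look roughly like the sketch below. This is only an illustration of the steps as recalled above, not Robocode's actual code: the class name SkippedTurnEstimate and its methods are hypothetical, and the threshold of 30 is taken from the description.

```java
// Hypothetical helper; names and structure are illustrative, not Robocode's own.
final class SkippedTurnEstimate {
    // Kill threshold taken from the description of step 5.
    static final int MAX_SKIPPED_TURNS = 30;

    // Turns skipped for one bot turn, given the measured time and the CPU constant.
    static int skippedTurns(long elapsedNanos, long cpuConstantNanos) {
        if (elapsedNanos <= cpuConstantNanos) {
            return 0; // finished within the time budget
        }
        // Divide the time taken by the CPU constant and subtract 1.
        long skipped = elapsedNanos / cpuConstantNanos - 1;
        return (int) Math.min(skipped, Integer.MAX_VALUE);
    }

    // The bot is killed only when the count exceeds the threshold.
    static boolean shouldKill(int skippedTurns) {
        return skippedTurns > MAX_SKIPPED_TURNS;
    }
}
```

Note that the elapsed time here is whatever the engine thread measures after waking up, so any delay in waking the engine is charged to the bot.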

So I'm wondering if maybe there is another thread that is delaying the engine thread from waking up, perhaps to do some maintenance (garbage collection, the JIT compiler swapping out methods for optimized versions, etc.). There are probably heuristics that say a switch between threads is a good time to do maintenance if nothing else is available. In 1000TPS mode these actions would happen during the sleep (i.e. nicely scheduled, not interfering with bot timing), since the heuristics would rather do the work during a sleep than slow down a switch between threads.

In my mind, the easiest way to fix this would be by also doing timing in the bot thread, and only checking the timing in the engine thread if the bot thread hasn't finished yet.
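A minimal sketch of that idea, assuming the bot thread can record its own start and end timestamps: the engine uses the bot's own measurement when the turn has finished and falls back to its own clock only when it has not. BotTurnTimer and its method names are hypothetical, not part of Robocode.

```java
// Hypothetical timer shared between the bot thread and the engine thread.
final class BotTurnTimer {
    private volatile long startNanos;
    private volatile long endNanos;
    private volatile boolean finished;

    // Called by the bot thread when it starts its turn.
    void begin(long nowNanos) {
        finished = false;
        startNanos = nowNanos;
    }

    // Called by the bot thread when it finishes its turn.
    void end(long nowNanos) {
        endNanos = nowNanos;
        finished = true; // volatile write makes endNanos visible to the engine
    }

    // Called by the engine thread after it wakes up.
    long elapsedNanos(long engineNowNanos) {
        if (finished) {
            return endNanos - startNanos; // bot's own measurement
        }
        return engineNowNanos - startNanos; // bot still running: use engine clock
    }
}
```

In real use both threads would pass System.nanoTime(); the timestamps are parameters here only so the behaviour is easy to check. With this arrangement a late engine wake-up no longer inflates the time charged to a bot that already finished.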

Thoughts? Does anybody see issues with my thinking here?

Skilgannon (talk) 20:54, 7 September 2017

It does not look like item 5 is performed in the reasonable way you describe.

Have a look at checkSkippedTurn(), where the decision about the penalty is made (I believe it is actually your code :). It does not check CPU time; it makes a comparison based on the internal Robocode ticks.

 int numSkippedTurns = (currentExecutionTime - lastExecutionTime) - 1;

Robocode should call something to increase the time (tick) inside the robot peer. If it does not do so for a bot, that bot will be punished. I still cannot find the part of the code where the time++ logic is executed. These threads drive me nuts.
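To make the tick-based comparison concrete, here is a small sketch built around the quoted line. Only the expression with currentExecutionTime and lastExecutionTime comes from the code quoted above; the class TickSkipCheck, its constructor, and the clamping to zero are assumptions for illustration, and the real checkSkippedTurn() may be organised differently.

```java
// Hypothetical wrapper around the quoted tick comparison.
final class TickSkipCheck {
    private long lastExecutionTime;

    TickSkipCheck(long firstTick) {
        this.lastExecutionTime = firstTick;
    }

    // Called with the engine tick at which the bot executes again.
    int onExecute(long currentExecutionTime) {
        // Same expression as the quoted line: the gap in ticks, minus one.
        int numSkippedTurns = (int) (currentExecutionTime - lastExecutionTime) - 1;
        lastExecutionTime = currentExecutionTime;
        return Math.max(numSkippedTurns, 0); // clamp is an assumption of this sketch
    }
}
```

A bot that runs on consecutive ticks gets 0, while a bot that ran on tick 11 and next runs on tick 15 is charged 3 skipped turns, regardless of how much CPU time it actually used.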

So, I like your proposal to time the bot inside its thread.

Beaming (talk) 21:45, 7 September 2017

Hmm, there is still a problem with my idea: Robocode will still kill the thread if it doesn't respond in time.

I see two different ways to combat this:

  1. Add a turnState enum with states {START, RUNNING, FINISHED} that can be polled to know whether the bot thread has finished, and combine this with timing in the bot thread. If the bot has finished, we know it is not the bot causing the delay, so we don't kill the thread.
  2. Add some small sleeps every ~100 ticks to give the JVM time to perform optimizations and cleanup without interfering with the bot threads.

These could be combined, since they attack the problem from different sides.
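A rough sketch of the first option, assuming the engine polls a shared state before deciding to kill. The enum values come from the proposal above; TurnMonitor, its method names, and the timeout parameter are hypothetical.

```java
import java.util.concurrent.atomic.AtomicReference;

// States from the proposal above.
enum TurnState { START, RUNNING, FINISHED }

// Hypothetical monitor shared between the engine thread and one bot thread.
final class TurnMonitor {
    private final AtomicReference<TurnState> state =
            new AtomicReference<>(TurnState.START);

    void newTurn()     { state.set(TurnState.START); }    // engine, before releasing the bot
    void botStarted()  { state.set(TurnState.RUNNING); }  // bot thread, on wake-up
    void botFinished() { state.set(TurnState.FINISHED); } // bot thread, when done

    TurnState poll() { return state.get(); }

    // Engine-side decision: only kill when the timeout passed AND the bot is not done.
    boolean shouldKillBot(long engineElapsedNanos, long killTimeoutNanos) {
        boolean timedOut = engineElapsedNanos > killTimeoutNanos;
        return timedOut && state.get() != TurnState.FINISHED;
    }
}
```

If the engine wakes up late but the state is already FINISHED, the delay is attributed to something other than the bot and the thread is left alone.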

Skilgannon (talk) 22:20, 7 September 2017