ThreadDeath problem and large amount of skipped turns

From RoboWiki
Fragment of a discussion from Talk:RoboRumble
Jump to: navigation, search

It's interesting, I actually see considerable speed difference between running at 1000TPS and max, even though my CPU constant is 5.3ms. I think this is because many ticks have much less than 1ms of calculations, and these cause a sleep in 1000TPS mode. Also, just to confirm, I also only see the ThreadDeath issues when running at max, and never at 1000TPS.

From what I remember the engine works something like this:

  1. Engine triggers condition variable to release bot thread
  2. Engine goes to sleep for up to CPU_CONSTANT milliseconds
  3. Bot thread runs
  4. Bot thread finishes, triggers interrupts to wake up engine thread early
  5. Engine thread wakes up, and checks time. If time exceeded, divide time taken by CPU_CONSTANT milliseconds, subtract 1, say bot skipped this many turns, if exceeds 30 kill bot. Otherwise continue as normal.

So I'm wondering if maybe there is another thread that is delaying the engine thread from waking up, perhaps to do some maintenance (garbage collection, JIT compiler swapping out methods with optimized versions, etc). There are probably heuristics that say it is a good time to do maintenance, when switching between threads, if nothing else is available. When in 1000TPS mode these actions would happen when the sleep happens (ie. nicely scheduled, not interfering with bot timing etc), since the heuristics say it is better to work in a sleep than to not switch between threads quickly.

In my mind, the easiest way to fix this would be by also doing timing in the bot thread, and only checking the timing in the engine thread if the bot thread hasn't finished yet.

Thoughts? Does anybody see issues with my thinking here?

Skilgannon (talk)19:54, 7 September 2017

It's interesting, I actually see considerable speed difference between running at 1000TPS and max, even though my CPU constant is 5.3ms. I think this is because many ticks have much less than 1ms of calculations, and these cause a sleep in 1000TPS mode. Also, just to confirm, I also only see the ThreadDeath issues when running at max, and never at 1000TPS.

From what I remember the engine works something like this:

  1. Engine triggers condition variable to release bot thread
  2. Engine goes to sleep for up to CPU_CONSTANT milliseconds
  3. Bot thread runs
  4. Bot thread finishes, triggers interrupts to wake up engine thread early
  5. Engine thread wakes up, and checks time. If time exceeded, divide time taken by CPU_CONSTANT milliseconds, subtract 1, say bot skipped this many turns, if exceeds 30 kill bot. Otherwise continue as normal.

So I'm wondering if maybe there is another thread that is delaying the engine thread from waking up, perhaps to do some maintenance (garbage collection, JIT compiler swapping out methods with optimized versions, etc). There are probably heuristics that say it is a good time to do maintenance, when switching between threads, if nothing else is available. When in 1000TPS mode these actions would happen when the sleep happens (ie. nicely scheduled, not interfering with bot timing etc), since the heuristics say it is better to work in a sleep than to not switch between threads quickly.

In my mind, the easiest way to fix this would be by also doing timing in the bot thread, and only checking the timing in the engine thread if the bot thread hasn't finished yet.

Thoughts? Does anybody see issues with my thinking here?

Beaming (talk)20:45, 7 September 2017

It's interesting, I actually see considerable speed difference between running at 1000TPS and max, even though my CPU constant is 5.3ms. I think this is because many ticks have much less than 1ms of calculations, and these cause a sleep in 1000TPS mode. Also, just to confirm, I also only see the ThreadDeath issues when running at max, and never at 1000TPS.

From what I remember the engine works something like this:

  1. Engine triggers condition variable to release bot thread
  2. Engine goes to sleep for up to CPU_CONSTANT milliseconds
  3. Bot thread runs
  4. Bot thread finishes, triggers interrupts to wake up engine thread early
  5. Engine thread wakes up, and checks time. If time exceeded, divide time taken by CPU_CONSTANT milliseconds, subtract 1, say bot skipped this many turns, if exceeds 30 kill bot. Otherwise continue as normal.

So I'm wondering if maybe there is another thread that is delaying the engine thread from waking up, perhaps to do some maintenance (garbage collection, JIT compiler swapping out methods with optimized versions, etc). There are probably heuristics that say it is a good time to do maintenance, when switching between threads, if nothing else is available. When in 1000TPS mode these actions would happen when the sleep happens (ie. nicely scheduled, not interfering with bot timing etc), since the heuristics say it is better to work in a sleep than to not switch between threads quickly.

In my mind, the easiest way to fix this would be by also doing timing in the bot thread, and only checking the timing in the engine thread if the bot thread hasn't finished yet.

Thoughts? Does anybody see issues with my thinking here?

Skilgannon (talk)21:20, 7 September 2017
 
 
Personal tools