ThreadDeath problem
I decided to put a bit more info about the ThreadDeath problem. The offending log entry would look like
Round 4 of 10 ========================= java.lang.ThreadDeath at java.lang.Thread.stop(Thread.java:853) at net.sf.robocode.host.security.RobotThreadManager.stop(RobotThreadManager.java:190) at net.sf.robocode.host.security.RobotThreadManager.stopSteps(RobotThreadManager.java:172) at net.sf.robocode.host.security.RobotThreadManager.forceStop(RobotThreadManager.java:143) at net.sf.robocode.host.proxies.HostingRobotProxy.forceStopThread(HostingRobotProxy.java:144) at net.sf.robocode.host.proxies.BasicRobotProxy.forceStopThread(BasicRobotProxy.java:44) at net.sf.robocode.battle.peer.RobotPeer.checkSkippedTurn(RobotPeer.java:649) at net.sf.robocode.battle.peer.RobotPeer.executeImpl(RobotPeer.java:492) at net.sf.robocode.host.proxies.BasicRobotProxy.executeImpl(BasicRobotProxy.java:384) at net.sf.robocode.host.proxies.BasicRobotProxy.execute(BasicRobotProxy.java:122) at robocode.AdvancedRobot.execute(AdvancedRobot.java:552) at eem.frame.gameInfo.gameInfo.run(gameInfo.java:95) at eem.frame.core.CoreBot.run(CoreBot.java:206) at net.sf.robocode.host.proxies.HostingRobotProxy.callUserCode(HostingRobotProxy.java:274) at net.sf.robocode.host.proxies.HostingRobotProxy.run(HostingRobotProxy.java:221) at net.sf.robocode.host.proxies.BasicRobotProxy.run(BasicRobotProxy.java:44) at java.lang.Thread.run(Thread.java:748)
Note that something was very wrong. My bot did not received even SYSTEM onDeath event as can be seen in the more complete log below. Though, my bot was counted as a looser for the first 3 rounds by the other bot. Also, my bot outputted some log during rounds 1 to 4, so the error happens somewhere during the round and steal control from the bot. Also note some time profiling at the end, many methods report max as a factor of 100 times more than an average time. I.e. some catastrophic event happend. One issue that the number of operation is different in every method execution, but I bet not by factor of 100.
========================= Round 1 of 10 ========================= =========== Round #1============= 100007 best target penguin.Joker .611wr SYSTEM: EvBotNG vtest skipped turn 19 Skipped turn 19 reported at 100020 SYSTEM: EvBotNG vtest skipped turn 32 Skipped turn 32 reported at 100033 100221: penguin.Joker .611wr hit eem.EvBotNG vtest at GF = -0.6450138864756005 with real bullet 100314: penguin.Joker .611wr hit eem.EvBotNG vtest at GF = -0.2827484567498845 with real bullet 100387: penguin.Joker .611wr hit eem.EvBotNG vtest at GF = -0.20989274379107656 with real bullet 100431: penguin.Joker .611wr hit eem.EvBotNG vtest at GF = 0.19347401753112997 with real bullet ========================= Round 2 of 10 ========================= =========== Round #2============= Probing new bullet energy 1.95 SYSTEM: EvBotNG vtest skipped turn 8 Skipped turn 8 reported at 200009 200055: penguin.Joker .611wr hit eem.EvBotNG vtest at GF = 0.3557362360118722 with real bullet 200307: penguin.Joker .611wr hit eem.EvBotNG vtest at GF = 0.08198800256947705 with real bullet 200335: penguin.Joker .611wr hit eem.EvBotNG vtest at GF = 0.06521832333874987 with real bullet penguin.Joker .611wr hit enemy: eem.EvBotNG vtest with gun: circularGun fired at dist: 566.5311086790286 200354: penguin.Joker .611wr hit eem.EvBotNG vtest at GF = -0.10610623841145142 with real bullet SYSTEM: EvBotNG vtest skipped turn 396 Skipped turn 396 reported at 200397 ========================= Round 3 of 10 ========================= java.lang.ThreadDeath at java.lang.Thread.stop(Thread.java:853) at net.sf.robocode.host.security.RobotThreadManager.stop(RobotThreadManager.java:190) at net.sf.robocode.host.security.RobotThreadManager.stopSteps(RobotThreadManager.java:172) at net.sf.robocode.host.security.RobotThreadManager.forceStop(RobotThreadManager.java:143) at net.sf.robocode.host.proxies.HostingRobotProxy.forceStopThread(HostingRobotProxy.java:144) at net.sf.robocode.host.proxies.BasicRobotProxy.forceStopThread(BasicRobotProxy.java:44) at net.sf.robocode.battle.peer.RobotPeer.checkSkippedTurn(RobotPeer.java:649) at net.sf.robocode.battle.peer.RobotPeer.executeImpl(RobotPeer.java:492) at net.sf.robocode.host.proxies.BasicRobotProxy.executeImpl(BasicRobotProxy.java:384) at net.sf.robocode.host.proxies.BasicRobotProxy.execute(BasicRobotProxy.java:122) at robocode.AdvancedRobot.execute(AdvancedRobot.java:552) at eem.frame.gameInfo.gameInfo.run(gameInfo.java:95) at eem.frame.core.CoreBot.run(CoreBot.java:206) at net.sf.robocode.host.proxies.HostingRobotProxy.callUserCode(HostingRobotProxy.java:274) at net.sf.robocode.host.proxies.HostingRobotProxy.run(HostingRobotProxy.java:221) at net.sf.robocode.host.proxies.BasicRobotProxy.run(BasicRobotProxy.java:44) at java.lang.Thread.run(Thread.java:748) =========== Round #3============= Probing new bullet energy 1.95 300093: penguin.Joker .611wr hit eem.EvBotNG vtest at GF = 0.37068349323008726 with real bullet SYSTEM: EvBotNG vtest skipped turn 156 Skipped turn 156 reported at 300157 SYSTEM: EvBotNG vtest skipped turn 191 Skipped turn 191 reported at 300192 300301: penguin.Joker .611wr hit eem.EvBotNG vtest at GF = -0.2802401527715257 with real bullet 300493: penguin.Joker .611wr hit eem.EvBotNG vtest at GF = 0.1975865759018777 with real bullet SYSTEM: EvBotNG vtest skipped turn 559 Skipped turn 559 reported at 300560 ========================= Round 4 of 10 ========================= java.lang.ThreadDeath at java.lang.Thread.stop(Thread.java:853) at net.sf.robocode.host.security.RobotThreadManager.stop(RobotThreadManager.java:190) at net.sf.robocode.host.security.RobotThreadManager.stopSteps(RobotThreadManager.java:172) at net.sf.robocode.host.security.RobotThreadManager.forceStop(RobotThreadManager.java:143) at net.sf.robocode.host.proxies.HostingRobotProxy.forceStopThread(HostingRobotProxy.java:144) at net.sf.robocode.host.proxies.BasicRobotProxy.forceStopThread(BasicRobotProxy.java:44) at net.sf.robocode.battle.peer.RobotPeer.checkSkippedTurn(RobotPeer.java:649) at net.sf.robocode.battle.peer.RobotPeer.executeImpl(RobotPeer.java:492) at net.sf.robocode.host.proxies.BasicRobotProxy.executeImpl(BasicRobotProxy.java:384) at net.sf.robocode.host.proxies.BasicRobotProxy.execute(BasicRobotProxy.java:122) at robocode.AdvancedRobot.execute(AdvancedRobot.java:552) at eem.frame.gameInfo.gameInfo.run(gameInfo.java:95) at eem.frame.core.CoreBot.run(CoreBot.java:206) at net.sf.robocode.host.proxies.HostingRobotProxy.callUserCode(HostingRobotProxy.java:274) at net.sf.robocode.host.proxies.HostingRobotProxy.run(HostingRobotProxy.java:221) at net.sf.robocode.host.proxies.BasicRobotProxy.run(BasicRobotProxy.java:44) at java.lang.Thread.run(Thread.java:748) =========== Round #4============= Probing new bullet energy 1.95 400161: penguin.Joker .611wr hit eem.EvBotNG vtest at GF = 0.6168884721887381 with real bullet 400292: penguin.Joker .611wr hit eem.EvBotNG vtest at GF = 0.09859694511664721 with real bullet 400317: penguin.Joker .611wr hit eem.EvBotNG vtest at GF = -0.47485553157064525 with real bullet 400345: penguin.Joker .611wr hit eem.EvBotNG vtest at GF = 0.2798221445512617 with real bullet 400476: penguin.Joker .611wr hit eem.EvBotNG vtest at GF = 0.2817195309815388 with real bullet 400492: penguin.Joker .611wr hit eem.EvBotNG vtest at GF = 0.34782608695652234 with real bullet ========================= Round 5 of 10 ========================= =========== Round #5============= Probing new bullet energy 1.95 SYSTEM: Bonus for killing penguin.Joker .611wr: 11 SYSTEM: eem.EvBotNG vtest wins the round. Hit by bullet: [5, 4, 3, 6, 0, 0, 0, 0, 0, 0] Wall hits stats: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0] Skipped turns stats: [2, 2, 3, 0, 0, 0, 0, 0, 0, 0] Hit rate stats: 16/158 = 10.13% Bullet hit bullet stats:: 10/158 = 6.33% Rounds ratio of win/lose = 1/2 Finishing places stats: [1, 2] myWinLosePlace = 0 My score in this round = 113.4 enemy score = 0.0 My total score = 176.89999999999998 enemy score = 246.76000000000022 Round APS stats: [25.01444251877526, 0.0, 0.0, 14.727325750947783, 100.0, 0.0, 0.0, 0.0, 0.0, 0.0] Accumulated APS = 41.76% --- bot penguin.Joker .611wr stats: hit rate for the following bot(s) out of 172 shots eem.EvBotNG vtest: 18/172 = 10.47% My virtual gun hit rate stats stats enemy name circularGun headOnGun realHitsGun unknownGun eem.EvBotNG vtest 1/172 = 0.58% 0/172 = 0.00% 18/172 = 10.47% 17/172 = 9.88% --- bot eem.EvBotNG vtest stats: wrong path prediction count: 7 <------------------------------ improve me! Hit me count by the following bot(s) penguin.Joker .611wr: 18/172 = 10.47% Enemies virtual gun stats circularGun of bot penguin.Joker .611wr hit me 12/172 = 6.98% headOnGun of bot penguin.Joker .611wr hit me 13/172 = 7.56% realHitsGun of bot penguin.Joker .611wr hit me 1/172 = 0.58% hit rate for the following bot(s) out of 158 shots penguin.Joker .611wr: 16/158 = 10.13% My virtual gun hit rate stats stats enemy name circularGun circularAccelGun headOnGun kdtGF50 kdtGF100 kdtGF200 kdtGF400 kdtGF800 kdtGF1600 guessFactorGun randomGun assistedGFGun decayingGuessFactorGun flipLastGuessFactorGun penguin.Joker .611wr 11/147 = 7.48% 13/147 = 8.84% 17/147 = 11.56% 23/147 = 15.65% 14/147 = 9.52% 12/147 = 8.16% 19/147 = 12.93% 14/147 = 9.52% 18/147 = 12.24% 11/147 = 7.48% 17/147 = 11.56% 13/147 = 8.84% 7/147 = 4.76% 17/147 = 11.56% bot penguin.Joker .611wr intercepted my bullet 10/172 = 5.81% Profiler stats--------------------------- getOverlap was executed 22026 execution times: min 67 nS, average 118 nS, max 58 uS getJoin was executed 5 execution times: min 2 uS, average 2 uS, max 3 uS choseNewPath was executed 2529 execution times: min 95 uS, average 321 uS, max 14 mS rankAimAtAllSolutions was executed 2358 execution times: min 243 nS, average 4 uS, max 2 mS waveWithBullets.getDanger was executed 312940 execution times: min 74 nS, average 331 nS, max 2 mS calculateDanger was executed 10899 execution times: min 801 nS, average 43 uS, max 13 mS waveWithBullets.getWaveDanger was executed 26899 execution times: min 450 nS, average 2 uS, max 2 mS gunManagerManage was executed 2529 execution times: min 3 uS, average 586 uS, max 12 mS --------------------------- 500353: penguin.Joker .611wr hit eem.EvBotNG vtest at GF = -0.31909710498646854 with real bullet =========================
Would you try modifing the source of robocode to make it ouput the result of checkSkippedTurn? I think the bug may be whether the skipped turns are counted incorrectly or some skipped turns are not logged to console.
Skipped turns are logged (and SkippedTurnEvents happens) only when the Robocode regains control over the execution of the thread. Since there is an unhandled exception being thrown right during the skipped turn checking, it makes sense that the skipped turns are not logged. I'm pretty sure you are skipping too many turns, Thread.stop() is being called as it is supposed to be (by looking at Robocode source), the exception is being thrown (because Thread.stop() is deprecated) and for some reason it's not being handled.
So, the error message is pretty much "you have skipped too many turns, I tried to stop your thread but for this I called a deprecated function which throws an error". The bot punishment for skipping too many turns is correct. The correct questions are: this should really happen? not the error, but the "too many skipped turns" thing. Does it happen when locking to 1000 FPS? If it was a "fair" turn skipping, it should happen while locked to 1000 FPS as well? If yes, why it's behaving differently when on full speed? Can others reproduce the same issue? (pick Roborio vs Neuromancer, for example, and go full speed). Why it's more likely to happen when a slow bot is against a fast bot? And so on.
Does you guys have any other interesting observations on that?
Ok. Roborio also fall into ThreadDeath in the battle:
rsalesc.roborio.Roborio 1.2.7 vs penguin.Joker .611wr
It does not happens at 1000 FPS, only if I push the slider to max.
I also see DrussGT falling into ThreadDeath in the battle:
jk.mega.DrussGT 3.2.1 vs penguin.Joker .611wr
It seems that in a pair of bots the most CPU demanding gets the exception.
I looked at the internals. The difference between desired 1000 TPS and max is that in the later case the Thread.sleep is called (actually if you have more than 0.5 ms to spare). I am guessing that probabilistically Thread.Sleep is called couple times per round.
I do not know what java does when thread is asleep, but I would imagine that the garbage collector and other gizmos kick in during the sleep. While at max TPS the java chooses as it wishes when to run GC and if you are the unlucky one, the GC time will be counted against your bot. This would explain sporadic super long executions in my bot time profiling.
I must be very distracted, the skipped turns should be logged, right?
They are logged before the piece of code which calls the thread stopping procedure. It's logged if the bot is still alive. That means that the bots which are getting the ThreadError are not alive at all?
Yes, that's very strange. And that's why I think hack into robocode source code and make it output some useful information in checkSkippedTurn would help.
This would help us find out 1. why skipped turns are getting greater than 30 eventually. 2. why there is no output before at all (is it still alive? )
I'm also thinking about the possibility that the main thread of robocode is also skipping turns itself ;p
I cannot remember how I get this knowledge but I recall that the logic is following. Robocode let you consume as much time as you want per tic (see the exception below), than if you exceed time quota, it will force you to skip as many turns as you excess time over permited time per tic. That is why you often see skipped turns in succession (though, not in my log above).
However, if a bot have a glitch and never returns control back to the robocode (let's say there is an infinite loop), then robocode have a way to force stop such bot. I would not be surprised that it first kill the bot and then executes checkSkippedTurn. In this case a bot will see nothing in console.
For your questions:
- I/O robots (bots that read/write from the data directory) get 240 skipped turns. Apparently a robot can get marked as an I/O robot just by calling
getDataDirectory()
. - This is where things stop making sense. The stacktrace shows
RobotPeer.checkSkippedTurn()
callingBasicRobotProxy.forceStopThread()
, but the code shows that it can't happen without going throughprintln("SYSTEM: ...")
, there literally is no other code path. Areprintln()
calls getting silently dropped? - The Robocode main thread can't "skip turns" because it controls the turn counter.
Relevant code:
public void checkSkippedTurn() {
// Store last and current execution time for detecting skipped turns
lastExecutionTime = currentExecutionTime;
currentExecutionTime = battle.getTime();
int numSkippedTurns = (currentExecutionTime - lastExecutionTime) - 1;
if (numSkippedTurns >= 1) {
events.get().clear(false);
if (isAlive()) {
for (int skippedTurn = lastExecutionTime + 1; skippedTurn < currentExecutionTime; skippedTurn++) {
addEvent(new SkippedTurnEvent(skippedTurn));
println("SYSTEM: " + getShortName() + " skipped turn " + skippedTurn);
}
}
if ((!isIORobot && (numSkippedTurns > MAX_SKIPPED_TURNS))
|| (isIORobot && (numSkippedTurns > MAX_SKIPPED_TURNS_WITH_IO))) {
println("SYSTEM: " + getShortName() + " has not performed any actions in a reasonable amount of time.");
println("SYSTEM: No score will be generated.");
setHalt(true);
waitWakeupNoWait();
punishBadBehavior(BadBehavior.SKIPPED_TOO_MANY_TURNS);
robotProxy.forceStopThread();
}
}
}
Here is my current best guess.
Note that Beaming's stacktrace includes:
... at net.sf.robocode.host.proxies.BasicRobotProxy.forceStopThread(BasicRobotProxy.java:44) at net.sf.robocode.battle.peer.RobotPeer.checkSkippedTurn(RobotPeer.java:649) ...
Therefore, RobotPeer.checkSkippedTurn()
called BasicRobotProxy.forceStopThread()
The source code of RobotPeer.checkSkippedTurn()
shows one and only one possible call to BasicRobotProxy.forceStopThread()
:
public void checkSkippedTurn() {
// ...
if (/* snip */) {
println("SYSTEM: " + getShortName() + " has not performed any actions in a reasonable amount of time.");
println("SYSTEM: No score will be generated.");
/* snip */
robotProxy.forceStopThread();
}
}
Therefore, control flow must have passed through this path.
Note that this code sequence includes a println("SYSTEM: <Bot> has not performed any actions...")
call, which must have been executed, yet no such SYSTEM message is present in Beaming's log.
The source code of the println()
method:
public void println(String s) {
synchronized (proxyText) {
battleText.append(s);
battleText.append("\n");
}
}
Note the following:
- Robocode is multithreaded. Heavy multithreading causes race conditions.
- The variable
battleText
is aStringBuilder
, which is not thread-safe (unlike aStringBuffer
), which is. - The code modifies
battleText
, but only synchronizes againstproxyText
.
Therefore, EvBotNG is skipping turns, but no log messages (from either Robocode or the bot) are printed, because they are silently dropped due to race conditions in the println()
method. The only thing you ultimately see is the ThreadDeath exception caused by Thread.stop()
forcibly terminating the thread.
Presumably, this occurs because of unlucky interaction between RobotPeer.println()
and RobotPeer.readOutText()
.
readOutText() | println() ============================================= | =============================== final String robotText = | battleText.toString() + proxyText.toString(); | | battleText.append(s); | battleText.append("\n"); battleText.setLength(0); | proxyText.setLength(0); | return robotText; |
Therefore, the solution is to add synchronized(battleText) {
guards around readOutText()
and println()
.
Or maybe I'm way off track.
I think you're on the right track, it definitely feels like race conditions to me. However in this case you're wrong - both code blocks (println and readOutText) are synchronized with the same object, so they can't run at the same time. It isn't important which one they are synchronized with, it could even be a different object. See the sharedLock example here.
I think a good place to start looking is in the 'fixes' to the skipped turn behavior, which were documented here, as well as a very busy discussion on the wiki.
Well, println()
is definitely broken somehow. There is no way that checkSkippedTurn()
could have called forceStopThread()
(as is shown in the stacktrace), without going through println()
first, yet the print message never made its way to the console (as is shown in Beaming's log).
battleText
is also cleared in initializeRound()
and cleanup()
, both of which have no synchronization guards at all.
I think skipped turns are working as they should, they're just not being reported.
Good catch. I've added synchronization guards on the initializeRound() and cleanup() and now I get the SYSTEM: Neuromancer 5.4 has not performed any actions in a reasonable amount of time.
printed inside the correct round. The ThreadDeath is still printed in the round after, but I can live with that I guess.
Now for the next question, why is this even happening?
You do not have permission to edit this page, for the following reasons:
You can view and copy the source of this page.
Return to Thread:Talk:RoboRumble/ThreadDeath problem/reply (27).
To add to mysteries. Why I do not see dramatic drop in scores if I run Voidious' RoboRunner? I see missing SYSTEM messages in logs but it does not affect scores the same drastic way as with the literumble client. I also see no ThreadDeath messages. I am guessing RoboRunner catches those somehow and process the score differently.
Just looked at those functions showing in error stack, in checkSkippedTurn(), it says: SYSTEM: No score will be generated. before the thread is killed. Therefore the good news is that these errors should not affect your scores.
And oracle says An instance of ThreadDeath is thrown in the victim thread when the Thread.stop() method is invoked. and The top-level error handler does not print out a message if ThreadDeath is never caught.
therefore the error message you see in the console could simply because robocode caught that error to make some clean-ups.
actually robocode kills your thread when you are skipping 30 turns in a row (or if you have file access, this will be increased to 240).
Searching on my system and old backups I was able to find Robocode versions for 1.7.4.2, 1.8.1.0. I was also able to download 1.8.0.0 and 1.9.0.0. So I ran some experiments using Neuromaner vs RaikoMicro. Note, dev version of Neuromancer I'm using here prints out each time it gets a SkippedTurnEvent.
- I can't reproduce it in at all 1.7.4.2. All output looks perfect - every round has printed SYSTEM lines either saying win or death, and Neuromancer correctly gets win/death events. Neuromancer doesn't skip any turns.
- I can't reproduce it at all in 1.8.0.0 - not even "SYSTEM: Neuromancer 5.4 has not performed any actions in a reasonable amount of time." However there are some rounds where nothing is printed, which is very suspicious. Still no skipped turn events.
- In 1.8.1.0 I can sometimes get a "SYSTEM: Neuromancer 5.4 has not performed any actions in a reasonable amount of time.". I saw this 3 times out of 10+ battles. There are also the occasional suspicious blank rounds where Neuromancer prints nothing, and SYSTEM prints nothing. Still no skipped turn events.
- In 1.9.0.0 I can reproduce ThreadDeath exceptions. On average 1 per battle, but sometimes 0 and max I saw was 4. Average of 3 skipped turn events per battle.
To note: all of them had similar CPU constants (between 5.3ms and 5.4ms).
Other things I tried:
- Changing the GC algorithm doesn't affect the number of ThreadDeaths in 1.9.0.0
- Doing a single getDataDirectory() at the beginning of each round doesn't affect the number of ThreadDeaths in 1.9.0.0
This makes me think it is a Robocode bug - possibly multiple bugs, since we have very different behaviors here over the different versions. It also greatly worries me that bots which were fine in 1.7.4.2 have problems in new versions.
Hi Skilgannon,
Isolating robocode issues from jdk version. Would it be correct to assume that you ran everything with openjdk-8?
Yes, all OpenJDK 8
openjdk version "1.8.0_131" OpenJDK Runtime Environment (build 1.8.0_131-8u131-b11-2ubuntu1.16.04.3-b11) OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)
Here is the old installs I used to do these tests, I'd appreciate if someone could try them on Windows as well.
Tying into what Beaming said about the sleep in the internals not happening when set to max, perhaps this has been changed since earlier versions of Robocode. However there are more things that may be an issue as well, I don't think that is the total of what is buggy here. The missing prints in a round particularly concerns me (and yes, this still happens in recent versions).