ThreadDeath problem

Jump to navigation Jump to search

Would you try modifing the source of robocode to make it ouput the result of checkSkippedTurn? I think the bug may be whether the skipped turns are counted incorrectly or some skipped turns are not logged to console.

Xor (talk)03:21, 5 September 2017

Skipped turns are logged (and SkippedTurnEvents happens) only when the Robocode regains control over the execution of the thread. Since there is an unhandled exception being thrown right during the skipped turn checking, it makes sense that the skipped turns are not logged. I'm pretty sure you are skipping too many turns, Thread.stop() is being called as it is supposed to be (by looking at Robocode source), the exception is being thrown (because Thread.stop() is deprecated) and for some reason it's not being handled.

So, the error message is pretty much "you have skipped too many turns, I tried to stop your thread but for this I called a deprecated function which throws an error". The bot punishment for skipping too many turns is correct. The correct questions are: this should really happen? not the error, but the "too many skipped turns" thing. Does it happen when locking to 1000 FPS? If it was a "fair" turn skipping, it should happen while locked to 1000 FPS as well? If yes, why it's behaving differently when on full speed? Can others reproduce the same issue? (pick Roborio vs Neuromancer, for example, and go full speed). Why it's more likely to happen when a slow bot is against a fast bot? And so on.

Does you guys have any other interesting observations on that?

Rsalesc (talk)03:35, 5 September 2017

Ok. Roborio also fall into ThreadDeath in the battle:

rsalesc.roborio.Roborio 1.2.7  vs penguin.Joker .611wr

It does not happens at 1000 FPS, only if I push the slider to max.

I also see DrussGT falling into ThreadDeath in the battle:

jk.mega.DrussGT 3.2.1 vs penguin.Joker .611wr

It seems that in a pair of bots the most CPU demanding gets the exception.

Beaming (talk)04:33, 5 September 2017

Yeah, here is the heart of the issue in my opinion. Weird things going under the hood when going full speed. I still have not fully understood Robocode skipped turn source so I'm not sure what's happening. I'll do more tests tomorrow.

Rsalesc (talk)12:51, 5 September 2017
 

I looked at the internals. The difference between desired 1000 TPS and max is that in the later case the Thread.sleep is called (actually if you have more than 0.5 ms to spare). I am guessing that probabilistically Thread.Sleep is called couple times per round.

I do not know what java does when thread is asleep, but I would imagine that the garbage collector and other gizmos kick in during the sleep. While at max TPS the java chooses as it wishes when to run GC and if you are the unlucky one, the GC time will be counted against your bot. This would explain sporadic super long executions in my bot time profiling.

Beaming (talk)19:45, 5 September 2017
 

I would gladly try. But how do I call it within the AdvancedBot?

Beaming (talk)04:24, 5 September 2017

What I mean is that you hack in to the robocode to make it output some useful information about these bugs. There is no need for your bot to call that, as checkSkippedTurn is called by the game every turn.

Xor (talk)05:17, 5 September 2017
 

I must be very distracted, the skipped turns should be logged, right?

They are logged before the piece of code which calls the thread stopping procedure. It's logged if the bot is still alive. That means that the bots which are getting the ThreadError are not alive at all?

Rsalesc (talk)14:45, 5 September 2017

Yes, that's very strange. And that's why I think hack into robocode source code and make it output some useful information in checkSkippedTurn would help.

This would help us find out 1. why skipped turns are getting greater than 30 eventually. 2. why there is no output before at all (is it still alive? )

I'm also thinking about the possibility that the main thread of robocode is also skipping turns itself ;p

Xor (talk)15:14, 5 September 2017

I cannot remember how I get this knowledge but I recall that the logic is following. Robocode let you consume as much time as you want per tic (see the exception below), than if you exceed time quota, it will force you to skip as many turns as you excess time over permited time per tic. That is why you often see skipped turns in succession (though, not in my log above).

However, if a bot have a glitch and never returns control back to the robocode (let's say there is an infinite loop), then robocode have a way to force stop such bot. I would not be surprised that it first kill the bot and then executes checkSkippedTurn. In this case a bot will see nothing in console.

Beaming (talk)18:26, 5 September 2017
 

For your questions:

  1. I/O robots (bots that read/write from the data directory) get 240 skipped turns. Apparently a robot can get marked as an I/O robot just by calling getDataDirectory().
  2. This is where things stop making sense. The stacktrace shows RobotPeer.checkSkippedTurn() calling BasicRobotProxy.forceStopThread(), but the code shows that it can't happen without going through println("SYSTEM: ..."), there literally is no other code path. Are println() calls getting silently dropped?
  3. The Robocode main thread can't "skip turns" because it controls the turn counter.

Relevant code:

public void checkSkippedTurn() {
    // Store last and current execution time for detecting skipped turns
    lastExecutionTime = currentExecutionTime;
    currentExecutionTime = battle.getTime();

    int numSkippedTurns = (currentExecutionTime - lastExecutionTime) - 1;
    if (numSkippedTurns >= 1) {
        events.get().clear(false);

        if (isAlive()) {
            for (int skippedTurn = lastExecutionTime + 1; skippedTurn < currentExecutionTime; skippedTurn++) {
                addEvent(new SkippedTurnEvent(skippedTurn));
                println("SYSTEM: " + getShortName() + " skipped turn " + skippedTurn);
            }
        }

        if ((!isIORobot && (numSkippedTurns > MAX_SKIPPED_TURNS))
                || (isIORobot && (numSkippedTurns > MAX_SKIPPED_TURNS_WITH_IO))) {
            println("SYSTEM: " + getShortName() + " has not performed any actions in a reasonable amount of time.");
            println("SYSTEM: No score will be generated.");
            setHalt(true);
            waitWakeupNoWait();
            punishBadBehavior(BadBehavior.SKIPPED_TOO_MANY_TURNS);
            robotProxy.forceStopThread();
        }
    }
}
MultiplyByZer0 (talk)18:48, 5 September 2017

Here is my current best guess.

Note that Beaming's stacktrace includes:

...
at net.sf.robocode.host.proxies.BasicRobotProxy.forceStopThread(BasicRobotProxy.java:44)
at net.sf.robocode.battle.peer.RobotPeer.checkSkippedTurn(RobotPeer.java:649)
...

Therefore, RobotPeer.checkSkippedTurn() called BasicRobotProxy.forceStopThread()

The source code of RobotPeer.checkSkippedTurn() shows one and only one possible call to BasicRobotProxy.forceStopThread():

public void checkSkippedTurn() {
    // ...
    if (/* snip */) {
        println("SYSTEM: " + getShortName() + " has not performed any actions in a reasonable amount of time.");
        println("SYSTEM: No score will be generated.");
        /* snip */
        robotProxy.forceStopThread();
    }
}

Therefore, control flow must have passed through this path.

Note that this code sequence includes a println("SYSTEM: <Bot> has not performed any actions...") call, which must have been executed, yet no such SYSTEM message is present in Beaming's log.

The source code of the println() method:

public void println(String s) {
    synchronized (proxyText) {
        battleText.append(s);
        battleText.append("\n");
    }
}

Note the following:

  1. Robocode is multithreaded. Heavy multithreading causes race conditions.
  2. The variable battleText is a StringBuilder, which is not thread-safe (unlike a StringBuffer), which is.
  3. The code modifies battleText, but only synchronizes against proxyText.

Therefore, EvBotNG is skipping turns, but no log messages (from either Robocode or the bot) are printed, because they are silently dropped due to race conditions in the println() method. The only thing you ultimately see is the ThreadDeath exception caused by Thread.stop() forcibly terminating the thread.

Presumably, this occurs because of unlucky interaction between RobotPeer.println() and RobotPeer.readOutText().

readOutText()                                 | println()
============================================= | ===============================
final String robotText =                      |
battleText.toString() + proxyText.toString(); |
                                              | battleText.append(s);
                                              | battleText.append("\n");
battleText.setLength(0);                      |
proxyText.setLength(0);                       |
return robotText;                             |

Therefore, the solution is to add synchronized(battleText) { guards around readOutText() and println().

Or maybe I'm way off track.

MultiplyByZer0 (talk)20:46, 5 September 2017

I think you're on the right track, it definitely feels like race conditions to me. However in this case you're wrong - both code blocks (println and readOutText) are synchronized with the same object, so they can't run at the same time. It isn't important which one they are synchronized with, it could even be a different object. See the sharedLock example here.

I think a good place to start looking is in the 'fixes' to the skipped turn behavior, which were documented here, as well as a very busy discussion on the wiki.

Skilgannon (talk)21:37, 5 September 2017

Well, println() is definitely broken somehow. There is no way that checkSkippedTurn() could have called forceStopThread() (as is shown in the stacktrace), without going through println() first, yet the print message never made its way to the console (as is shown in Beaming's log).

battleText is also cleared in initializeRound() and cleanup(), both of which have no synchronization guards at all.

I think skipped turns are working as they should, they're just not being reported.

MultiplyByZer0 (talk)21:59, 5 September 2017

Good catch. I've added synchronization guards on the initializeRound() and cleanup() and now I get the SYSTEM: Neuromancer 5.4 has not performed any actions in a reasonable amount of time. printed inside the correct round. The ThreadDeath is still printed in the round after, but I can live with that I guess.

Now for the next question, why is this even happening?

Skilgannon (talk)22:18, 5 September 2017

You do not have permission to edit this page, for the following reasons:

  • The action you have requested is limited to users in the group: Users.
  • You must confirm your email address before editing pages. Please set and validate your email address through your user preferences.

You can view and copy the source of this page.

Return to Thread:Talk:RoboRumble/ThreadDeath problem/reply (24).

It seems that Skilgannon has filed a bug report on this issue, including the necessary patch.

MultiplyByZer0 (talk)02:19, 6 September 2017

See it now. Thanks.

Beaming (talk)02:28, 6 September 2017
 
 
 

@ MultiplyByZer0, excellent analysis. I would never spotted it.

Beaming (talk)00:53, 6 September 2017
 

To add to mysteries. Why I do not see dramatic drop in scores if I run Voidious' RoboRunner? I see missing SYSTEM messages in logs but it does not affect scores the same drastic way as with the literumble client. I also see no ThreadDeath messages. I am guessing RoboRunner catches those somehow and process the score differently.

Beaming (talk)01:38, 6 September 2017