Difference between revisions of "User talk:Krabb"

From Robowiki

Latest revision as of 21:40, 28 April 2010

Hi Krabb,

A bug report has been created for Robocode 1.7.2.0 Beta 2 regarding your robot Krabb.sliNk.Garm 0.9u here.

I would like you to assist us in finding out why your robot is skipping turns and eventually gets stopped in the game in MeleeRumble (1000x1000, 10 robots). Perhaps you could try out your robot with the same setup and tell us what causes the problem for your robot? Please follow up on the bug tracker at the link above. --Fnl 23:35, 22 March 2010 (UTC)

I found Krabb's email on the old wiki's contact information page, and sent him an email notifying him of this. No clue if the email address is up to date or not. --Rednaxela 01:47, 23 March 2010 (UTC)
Well, we can ask Voidious to check this wiki's email address =) --Nat Pavasant 02:23, 23 March 2010 (UTC)
No need for that - he has the wiki e-mail feature enabled. I just clicked "E-mail this user" in the toolbox and dropped him a note to check his Talk page. ;) --Voidious 02:58, 23 March 2010 (UTC)

Hi! 6 new mails ;) Thanks for the info! I don't have the source code of 0.9u anymore; however, I haven't changed the drawing stuff in the last versions (as far as I remember). I'll check if something is wrong there. I remember having a test for whether drawing is enabled or not, but maybe it's broken. But first of all I need to install Robocode; I'm currently on vacation at my parents' home... Hmm, I would like to do some robocoding again ;( I really need to get that bachelor thesis finished! --Krabb

Yes, it seems like the debug graphics are causing the error; enabling them stops the skipped turns. Oh, 0.9z does some division by 0 :/ But I don't have any sources here, and there's no SSH server on my PC... we need to wait until Monday. --Krabb
Er, do you mean disabling them stops the skipped turns? Also, any guesses why exactly it's affecting 1.7 and not 1.6? Is this using onPaint() or is it using getGraphics()? :) --Rednaxela 12:16, 23 March 2010 (UTC)
When I disable the robot paintings in Robocode, and remove all the painting queues and stuff, 0.9u still skips turns?! In addition, getGraphics() is never called, but the onPaint() handler is. I am not sure, but it seems that sometimes the robot builds up some statistical data that uses too much CPU power in some battles (at random), but not in all battles. If I set the CPU constant time to e.g. 3-5 times the amount, it still skips turns. I am not sure what causes the problem, but it could be some timing differences (some methods are faster, and others are slower) in Robocode 1.6 vs 1.7 that trigger the problem for the robot. I believe the problem lies within the robot somehow, and the robot has been "lucky" that the problem was not triggered in 1.6.1.4. I have used JAD for decompiling the robot, and fixed the stuff JAD could not figure out. I have put the sources here, if you guys want to see what might cause the trouble. I hope you don't mind, Krabb? I am not sure if I got all sources correct, but it will be a good pointer to where the issue might be. My question is: does this issue need to be fixed before we can release the final version of Robocode 1.7.2.0? --Fnl 21:56, 23 March 2010 (UTC)
The way it sounds to me, the problem is a few issues combined:
  • Robocode 1.7 appears to always be calling onPaint, whereas 1.6 only called it when necessary
  • Robocode 1.7 appears to penalize bots for onPaint time whereas 1.6 did not. Some bots need to do a substantial amount of extra processing to compute what to paint, so this makes it impossible to debug with debugging graphics and at the same time run the robot's core code under ordinary CPU constraints.
  • Garm's painting overhead in melee is unreasonably slow for practical use even in 1.6, though it does run.
IMO, at the very least resolving the first would be important for a Robocode 1.7.2.0 release. The second also seems important to me. The third of course is a Garm-specific concern, but it shouldn't matter more in 1.7 than in 1.6 if the above is addressed. --Rednaxela 01:41, 24 March 2010 (UTC)

Hang on. I wasn't aware of the fact that onPaint() is still called in 1.7 even when not painting. This seems like a major deviation from previous versions, as previously the only way to know if painting was enabled was to check if onPaint() was being called. Now it seems that old bots will be executing all their painting code even if painting is not enabled. DrussGT tries to be backwards compatible by setting a global variable painting to true if onPaint() is called, but it seems that this isn't correct anymore? Could somebody verify this? --Skilgannon 15:33, 24 March 2010 (UTC)

Same concerns here... onPaint() being called is how I determine that painting is enabled, which triggers a lot of extra processing. --Voidious 15:40, 24 March 2010 (UTC)
Same here. When the event is triggered, I set flags in parts of the code to start calculating things it otherwise would not in order to produce the graphics, so definitely a hit to performance. And keep in mind onPaint is supposed to be outside the bot's processing time allotment, so this could be a serious overall performance hit to the Rumble. There is absolutely no reason to count debugging graphics against the allowed processing time of a bot. Bottom line, this change is an oversight that should be rolled back.

On a side note, I can give an example of how enabling and then disabling graphics could cause a crash: let's say when you enable graphics you set a flag to start putting Shape objects in a list to be rendered. The list is fed to the graphics console each turn and emptied. Then you disable drawing. The list is still being filled, but is no longer being emptied. Crash imminent. --Martin 15:51, 24 March 2010 (UTC)
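A minimal sketch of the pattern described above (detecting painting through onPaint() calls) that also avoids Martin's unbounded list. It assumes a simplified turn loop instead of the real Robocode API: the class PaintAwareBot and its methods onPaint(), doTurn(), pendingCount() and drawnCount() are hypothetical stand-ins (in a real robot onPaint(Graphics2D) is inherited from robocode.Robot). The "painting seen" flag is re-armed every turn, so if the engine stops calling onPaint() the extra debug work stops too, and the shape queue is cleared every turn whether or not it was drawn.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a robot; it does not use the real Robocode API.
class PaintAwareBot {
    private boolean paintSeen = false;    // set whenever onPaint() fires
    private boolean debugEnabled = false; // gates the extra debug computation
    private final List<String> pendingShapes = new ArrayList<>();
    private int drawnCount = 0;           // shapes handed to the painter so far

    // Called by the (simulated) engine only while painting is enabled.
    void onPaint() {
        paintSeen = true;
        drawnCount += pendingShapes.size(); // "draw" last turn's shapes
    }

    // Called once per turn by the (simulated) engine, after any onPaint().
    void doTurn(int turn) {
        // Do debug work only if onPaint() fired since the previous turn.
        debugEnabled = paintSeen;
        paintSeen = false;
        // Always clear, so disabling painting can never make the queue grow.
        pendingShapes.clear();
        if (debugEnabled) {
            pendingShapes.add("wave@" + turn);
        }
    }

    boolean debugEnabled() { return debugEnabled; }
    int pendingCount() { return pendingShapes.size(); }
    int drawnCount() { return drawnCount; }
}
```

Two turns with painting on followed by a hundred with painting off leave debugging disabled and the queue empty, instead of a list that keeps growing until the robot runs out of memory.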

Heh, sorry for jumping to conclusions: I had guessed it was always running onPaint() based on some of the reported symptoms from Krabb and Fnl, but making a quick test bot now, it turns out that Robocode 1.7 isn't always calling onPaint. Disregard that bit... --Rednaxela 16:09, 24 March 2010 (UTC)

With my test I removed all robot painting entirely from Robocode, and the robot still skipped turns, but only in about 30% of the battles (and in all rounds of the same battle). I let the robot fight itself with 10x Garm, and then either all robots skipped turns, or none of them. To me it sounds like something going on in the entire battle that caused Garm to skip turns. Have you guys seen the same behavior with other robots? --Fnl 21:25, 24 March 2010 (UTC)

Fnl: A version compiled from the JAD output you have there isn't skipping turns but is giving "Error: Nullpointer exception in getWay!". Something is different in the JAD version... trying to figure out what now. --Rednaxela 16:23, 24 March 2010 (UTC)

If I silently try/catch the single offending line it seems to run properly, and also skip turns again, eventually dying with no score. Now what's left is diagnosing why exactly it happens, and subsequently why only in 1.7. --Rednaxela 16:41, 24 March 2010 (UTC)
Issue confirmed to still occur with all onPaint functions/references bulk-removed. --Rednaxela 16:49, 24 March 2010 (UTC)

It's not very nice to decompile closed-source code and even publish it without agreement... But never mind. Garm is probably just too slow in melee games. I changed a lot during the last releases (speed should be better now); however, the ranking performance decreased. If only Garm has this problem we can simply ignore this issue and remove Garm from the rumble for now. But you should make sure that onPaint isn't called every turn; Garm also checks this and enables/disables debugging. --Krabb

I agree, and I am sorry that I have offended you. I have removed the sources again. The guys here on this page are probably the only ones who know about it, and they are all trying to help. Hmm... in order to really protect the code in a robot, you need to obfuscate it. That way the code is not easily decompiled; it would require hard-core skills. --Fnl 21:20, 24 March 2010 (UTC)

Very strongly agreed that decompiles shouldn't be published without asking, heh. Though I suppose, now that Fnl had done that, we might as well use it to double-check theories. Anyway, I've tested that onPaint isn't being called unnecessarily, and I doubt it's as simple as Garm being too slow in melee, because the amount of skipping is an order of magnitude different from Robocode 1.6, which is not something I'd expect from simple slowness. Personally, I'm very concerned that the cause could affect other bots in more subtle ways. --Rednaxela 18:04, 24 March 2010 (UTC)

Well, we might be able to test this if we compare the results for robots in the rumble between 1.6 and 1.7. This might also help us find other robots that have some issues with 1.7. --Fnl 21:30, 24 March 2010 (UTC)
Yes, for that exact reason I'm running 1.7.2.0 Beta2 on a localhost rumble server (it's how I discovered this issue). The only issue is that, when testing all bots, it takes a long time to get enough accuracy to discern anything but the largest differences. One quick note is that DemonicRage is another bot showing a very large discrepancy; however, in this case I can see no obvious difference watching its behaviour, nor any unusual messages or skipped turns. --Rednaxela 22:19, 24 March 2010 (UTC)

I think I need to point out some things regarding robot paintings (onPaint() and getGraphics()):

  • The robots never pay CPU time for painting: not in 1.6, not in 1.7, and not in future versions.
  • With 1.7 it is even cheaper (CPU time) to invoke painting methods, as the robot paintings are totally decoupled from the Graphics context due to:
    • All painting commands like e.g. fillRect(x, y, width, height) are immediately put into a buffer with the command and parameters. No painting is immediately performed on the Graphics context. This is done later, when Robocode needs to paint the robot paintings. Note that painting immediately on the Graphics context is more expensive than immediately putting a few bytes in a buffer that is painted later.
    • The game pays CPU time for the painting, when/if it needs to paint all debugging graphics.
    • Decoupling the robot painting from the Graphics context to support more languages/platforms like e.g. C#/.NET.
  • The decoupling in 1.7 was made due to:
    • Allowing .NET robots (and robots for future languages) to paint themselves as well, as they also use the buffer mechanism, and cannot paint directly on the Java Graphics context. It uses a .NET based context instead.
    • Allowing paintings to be stored in replays, if the developer wants to see the paintings later on (saved replay), even when paintings are disabled. Again, the robot paintings are not stored with the robot; they are stored internally by the game and might be saved to a file (done by the robot developer).

Hence, the changes to onPaint() and getGraphics() were made for a very good reason, and should not have any impact on the robots at all, other than perhaps visual bugs in the robot paintings. --Fnl 22:07, 24 March 2010 (UTC)

Makes sense, some of the early symptoms caused me to make some false assumptions. One question: "even when paintings are disabled" only applies to getGraphics() paintings, right? --Rednaxela 22:19, 24 March 2010 (UTC)
getGraphics() and onPaint(Graphics2D) are handled the same way, meaning that paintings are stored in a buffer. With getGraphics() the buffer gets the commands whenever a painting method is invoked on "Graphics". With onPaint(), Robocode calls this when the robot should paint itself, and all painting commands on "Graphics" are stored in the buffer too (in one go). The Graphics object is fake (actually a proxy), as it just contains a buffer that is later processed onto a real Graphics object when the real painting is done by Robocode. :-) --Fnl 22:32, 24 March 2010 (UTC)
What I meant, is that since onPaint() is only called when paintings are enabled, I'd presume only painting via getGraphics() is saved to replays when painting is disabled. Right? --Rednaxela 22:37, 24 March 2010 (UTC)
Ahh... Well, if paintings are disabled, then nothing is stored in the buffers, and nothing is painted (for both onPaint() and getGraphics()). Paintings are always disabled if the robot painting is disabled on the robot console, unless replay is enabled in the game (as replay allows the paintings to be saved and shown later). --Fnl 23:03, 24 March 2010 (UTC)
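The buffering described above can be illustrated as a command recorder: painting calls are captured cheaply as deferred commands and only replayed onto a real Graphics2D when the game actually renders. This is a sketch of the idea, not Robocode's internal proxy; the class PaintBuffer and its methods are hypothetical, and the `paintingEnabled || replayEnabled` guard simply mirrors the rule stated in the last comment.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical recorder; Robocode's real Graphics proxy is internal and differs.
class PaintBuffer {
    private final List<Consumer<Graphics2D>> commands = new ArrayList<>();
    private final boolean recording;

    PaintBuffer(boolean paintingEnabled, boolean replayEnabled) {
        // Nothing is buffered unless painting or replay recording is on.
        this.recording = paintingEnabled || replayEnabled;
    }

    // Robot-side calls: cheap, they only append a deferred command.
    void setColor(Color c) {
        if (recording) commands.add(g -> g.setColor(c));
    }

    void fillRect(int x, int y, int width, int height) {
        if (recording) commands.add(g -> g.fillRect(x, y, width, height));
    }

    int size() { return commands.size(); }

    // Game-side call: performs the real painting on a real context, in order.
    void replay(Graphics2D g) {
        for (Consumer<Graphics2D> cmd : commands) {
            cmd.accept(g);
        }
    }
}
```

Replaying onto an off-screen BufferedImage shows that the recorded commands produce the same pixels a direct call would, while the robot itself only paid for appending to a list.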

Since Garm 0.9u skips turns, even when I ripped out all robot paintings of Robocode (local setup), the problem must be somewhere else. I am still investigating this by examining the internals of the robot. --Fnl 23:08, 24 March 2010 (UTC)

Hmm, maybe it's due to memory consumption? I remember loading a lot of precomputed data for fast math and precise-maxescapeangle calculations, and the gun/movement statistics are possibly not that small either. I don't know if it can happen that memory is spilled to the hard drive? Or some other memory-related issue? Sorry, I currently have no time for testing. --Krabb

Not seeing the memory usage in Robocode getting anywhere close to the limit, nor is my system memory going into swap space in the slightest, so it doesn't seem like a memory related issue to me. No worries about time for testing. --Rednaxela 14:12, 25 March 2010 (UTC)

It seems like we have a bug in the game engine running the battle in Robocode 1.7.x.x, and I have raised a bug report for it here (http://sourceforge.net/tracker/?func=detail&aid=2976754&group_id=37202&atid=419486). In short, it seems like the battle is consuming more and more CPU power as it progresses, which causes the battle to be much slower and robots to skip turns and get disabled due to too many calls to get methods etc. So after all, there is nothing wrong with Garm 0.9u. This bug must be fixed before the final version of Robocode 1.7.1.2, of course. Naturally, this bug currently has my full attention. --Fnl 23:20, 25 March 2010 (UTC)

Any recent progress? Anything I can do for you? Finally I'm now back at home :) --Krabb 14:57, 11 April 2010 (UTC)

Yes, I have made lots of progress in various directions. I did not want to spam your page too much. :-) Currently I am working on several issues leading to robots "hanging". It seems like part of the problem occurs with MessageEvents sent between robots that carry references to e.g. TeamRobot instances in the message. This is a problem (for the robot), as the message itself is declared Serializable, and hence the message is serialized when the robot tries to read and handle it. Imagine how much CPU must be used to serialize a TeamRobot instance. The robot author is not in control of how Robot instances are serialized, so robots should not refer to robot instances in their messages. With 1.6.2 and 1.7.x, robots are more expensive to serialize, which explains why some robots hang, giving e.g. skipped turns and similar problems. Currently, I am figuring out how to deal with this so the robots will be able to run under 1.7.2.0. This is tricky. Note that I also need to fix some streaming issues besides the issue I mentioned here; this is very internal stuff, but it seems to improve performance. :-) --Fnl 19:42, 11 April 2010 (UTC)
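To see why a message holding a robot reference is costly, here is a sketch using plain Java serialization: ObjectOutputStream writes the whole object graph reachable from the message, so a reference drags the referenced object (and everything it points to) along. BigState is a hypothetical stand-in for a robot instance, not a real Robocode class, and the array size is an arbitrary assumption; the lighter message carries only the values the receiving teammate needs.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a heavy robot object (statistics, buffers, ...).
class BigState implements Serializable {
    final double[] stats = new double[50_000];
}

// Anti-pattern: the message references the whole sender object.
class HeavyMessage implements Serializable {
    final BigState sender;
    HeavyMessage(BigState sender) { this.sender = sender; }
}

// Preferred: send only plain values the teammate needs.
class LightMessage implements Serializable {
    final String senderName;
    final double x, y;
    LightMessage(String senderName, double x, double y) {
        this.senderName = senderName;
        this.x = x;
        this.y = y;
    }
}

class MessageSize {
    // Number of bytes ObjectOutputStream produces for the given message.
    static int serializedSize(Serializable msg) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(msg);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }
}
```

The heavy message costs hundreds of kilobytes per send, while the light one stays in the low hundreds of bytes, which is the difference between a cheap message and one that can eat a robot's turn.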
Hey Krabb, just to let you know, the issue has probably finally been resolved. See the latest comments in the bug report (http://sourceforge.net/tracker/?func=detail&aid=2974178&group_id=37202&atid=419486). It seems that what was really going on was that Garm was skipping lots of turns in both Robocode 1.6 and 1.7, but only got killed by the engine for it in 1.7. The cause? A bug in Robocode 1.7 that killed robots after they skipped 30 turns in total in a round, instead of 30 consecutive turns as was required to force the bot to be killed in 1.6. I'm not sure why I didn't notice Garm skipping so much in 1.6 before... It's probably because both newer versions of Robocode and my own bots printed a skip message for each skipped turn, so I've come to strongly associate skipped turns with those messages, and missed Garm's own skipped turn count. Just thought you'd like to know, Krabb. :) --Rednaxela 20:38, 28 April 2010 (UTC)
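The difference between the two rules can be sketched as a small counter. The threshold of 30 comes from the comment above; the class SkipTracker and its methods are hypothetical, and this is not the engine's actual code.

```java
// Hypothetical skip-turn tracker contrasting the two disabling rules.
class SkipTracker {
    static final int LIMIT = 30;
    private int consecutive = 0;            // rule intended (1.6 behaviour)
    private int total = 0;                  // buggy 1.7 behaviour
    private boolean killedByConsecutive = false;
    private boolean killedByTotal = false;

    // Record one turn; skipped == true means the robot missed its deadline.
    void recordTurn(boolean skipped) {
        if (skipped) {
            consecutive++;
            total++;
        } else {
            consecutive = 0;                // a completed turn resets the streak
        }
        // Once disabled, a robot stays disabled for the rest of the round.
        if (consecutive >= LIMIT) killedByConsecutive = true;
        if (total >= LIMIT) killedByTotal = true;
    }

    boolean disabledByConsecutiveRule() { return killedByConsecutive; }
    boolean disabledByTotalRule() { return killedByTotal; }
}
```

A robot that skips every other turn never reaches 30 skips in a row, so it survives under the consecutive rule, but it crosses 30 total skips after about 60 turns and is killed under the total rule, which matches what happened to Garm.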