User talk:Voidious/Robocode Version Tests

From Robowiki

Revision as of 15:41, 17 August 2009

I don't think there are any rule differences between 1.6.1.4 and 1.7.1.1, yet the difference is quite big, 0.82%! But I think we are still using the fix, right? » Nat | Talk » 15:51, 17 July 2009 (UTC)

Yes, it's a pretty big difference (0.72% actually). 250 battles might not be enough, though, maybe I should run more for that pairing. If there is a 0.72% difference, I'd probably be against the change. But there's lots more to test first - I see my 1.6.1.4 CPU constant is ~20% higher than the 1.7* versions, that could even play a part. --Voidious 20:36, 17 July 2009 (UTC)

Surfer-vs-Surfer battles seem to be affected much more than Surfer-vs-Random battles. » Nat | Talk » 05:08, 18 July 2009 (UTC)

Not necessarily, Nat. These results could just mean that Komarious and Ascendant are affected differently than Diamond and DrussGT. Komarious/Ascendant vs the random movers would be needed to tell that. Really, this test would be more revealing if there was a full round-robin between the involved bots. I'd also wonder: do Komarious or Ascendant actually assume anything that breaks in Alpha2? It's also worth noting that tests of score alone can't tell us if changes better match assumptions a bot is making, since a bot could just by fluke happen to operate in a way that is happy with conditions that the code didn't try to assume at all. --Rednaxela 05:26, 18 July 2009 (UTC)
There's also just a much larger variance in those pairings, so maybe 500 battles isn't even enough. My initial goal was just to find out if there was a measurable difference among these versions, so I was going for diversity in the battles I used. Full round robin between all bots (in all versions) might help in deducing causes, but this testing is already taking a "metric ass-ton" of CPU cycles =), so I'd definitely reduce the # of bots if I were to try that. It would still be pretty speculative, though.
Assuming that we have enough battles, the Diamond vs Komarious result I believe shows there are other differences between 1.6.1.4 and 1.7.x that are contributing. I noticed the CPU constant is a little different, but I'm pretty sure nobody's skipping turns anyway. The Alpha2 updateMovement code shouldn't change anything for these two bots, I don't think.
The PrairieWolf result is bizarre. 2% is well above any margin of error on 500 battles, I think. I thought PrairieWolf would have decreased performance, if anything, when we changed the +1/-1 decel rules, but he does better in Alpha3.
--Voidious 05:48, 18 July 2009 (UTC)
ATWHEB (Assuming that we have enough battles), it seems that the surfers gain score from these changes, which seems weird, since most surfers use the old way, including the old decel-through-zero rules. I wonder what the DrussGT vs. Diamond score will look like.
I think PrairieWolf vibrates a bit shorter, so DuelistMini missed him. » Nat | Talk » 06:07, 18 July 2009 (UTC)
You might have to test DuelistMini against another opponent -- maybe it's the one that's doing worse, as opposed to PrairieWolf doing better? Thanks for running all these tests, btw. Hopefully they'll help pick the right version and not just confuse things further! =) --Darkcanuck 06:25, 18 July 2009 (UTC)
No problem, though I will be taking some time to work on Diamond this weekend. =) Yeah, I don't think we can draw many conclusions just yet. I also just realized that both PW and DM do data saving... I'm actually not sure if that might throw off results or not (hopefully it does?), but maybe I'll try to find another vibrating test bot. I'm gonna run some stuff in 1.7.1.1 overnight here. --Voidious 06:36, 18 July 2009 (UTC)

Just want to know, what do you guys consider 'unacceptable' (the red value)? Currently I use ±0.1 as the 'margin of error' (green value) and ±0.4 as 'acceptable' (black value); beyond that is all 'unacceptable'. I don't normally run a lot of battles, so I don't know where it should be. » Nat | Talk » 07:30, 18 July 2009 (UTC)

Margin of error = ±0.1 when comparing results from 500 battles is probably about right, maybe even ±0.2. (That would mean margin of error on each 500-battle result is half that.) I'm unsure of my opinion on "acceptable". My first instinct is "any measurable change is unacceptable", but I need to think about it. I just really, really hate the idea of all our old Robocode bots slowly becoming (artificially) weaker and weaker, just because of changes to Robocode... --Voidious 16:37, 18 July 2009 (UTC)
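
One way to put numbers on a margin like this is to compute it from the raw per-battle scores. The sketch below is a hypothetical helper (the class and method names are made up; they are not part of Robocode or RoboResearch). It computes the standard error of each run's mean score and combines them into an approximate 95% margin for the difference between two runs, assuming independent battles and a normal approximation (z = 1.96). Under those assumptions the standard error of a difference is the square root of the sum of the squared per-run standard errors.

```java
// Hypothetical helper: approximate 95% margin of error for the difference in
// mean score between two runs (e.g. 500 battles on 1.6.1.4 vs 500 on an alpha).
// Assumes independent battles and a normal approximation (z = 1.96).
class ScoreMargin {
    static double mean(double[] scores) {
        double sum = 0;
        for (double s : scores) {
            sum += s;
        }
        return sum / scores.length;
    }

    /** Standard error of the mean, using the sample variance (n - 1 denominator). */
    static double standardError(double[] scores) {
        double m = mean(scores);
        double squares = 0;
        for (double s : scores) {
            squares += (s - m) * (s - m);
        }
        double variance = squares / (scores.length - 1);
        return Math.sqrt(variance / scores.length);
    }

    /** Half-width of an approximate 95% confidence interval for mean(a) - mean(b). */
    static double marginOfDifference(double[] a, double[] b) {
        double seA = standardError(a);
        double seB = standardError(b);
        return 1.96 * Math.sqrt(seA * seA + seB * seB);
    }
}
```

Bots that save data between battles (PrairieWolf, DuelistMini, and Raiko come up in this thread) may break the independence assumption, so a margin computed this way could be too optimistic for those pairings.
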
I too dislike the idea of changes progressively making old bots weaker and weaker, but I really don't think that these changes are something that makes old bots progressively weaker. I have a feeling that most of the score changes that could be seen would not be due to slightly breaking old assumptions, but instead are just flukes arising from things just plain being a little different. The tests show DrussGT vs Ascendant having stronger DrussGT scores than before, but I doubt that Ascendant assumes old behavior in a way that DrussGT doesn't. At the very least, I think the number of bots that are not acting quite as intended due to old Robocode movement being unintuitive/weird is greater than the number of bots that truly intentionally assume old behavior that differs from the Alpha2 mode of operation (Alpha3 being a slightly different story). --Rednaxela 22:35, 18 July 2009 (UTC)
I mostly agree, which is the reason I might be OK with some measurable score differences. However, I think that if there continue to be changes that have measurable impacts on scoring, older bots will gradually get weaker (or should I say, "weakerer", since they're bound to get relatively "weaker", anyway). Even if the code doesn't assume anything explicitly, it may implicitly -- it was tuned in a certain Robocode environment that is no longer reflective of how Robocode works. --Voidious 22:59, 18 July 2009 (UTC)

Let me throw in my 2 cents to the discussion. Although I am a rather conservative guy, I also like clear and simple behaviour. Therefore my vote undoubtedly goes to the Alpha-2 variant. Differences in score between the old code and Alpha-2 I take for granted, as the old code just seems flawed.
I think that Alpha-3 is one step too far, as the vibrating movement was rather popular in the days before Wavesurfing, and is used in several bots from that time. As for solving 'bugs' in the old code, there have been more changes that did influence the score, and we accepted them because they were logical and followed the rules better than before. Think about onBulletHitBullet, which happens 50% more often now than in 1.0.6, favouring the bots that take it into account. --GrubbmGait 00:08, 19 July 2009 (UTC)

I took the weekend to work on Diamond, but I'm back to running lots more tests: DrussGT vs Diamond, Raiko vs Toorkild, Raiko vs Moebius, and Toorkild vs Moebius. 1.7.1.4-Alpha3 should be done before I get home today, then I'll run Alpha2. --Voidious 15:13, 21 July 2009 (UTC)

I thought my original RoboResearch install dir used 1.7.1.1, but it was actually 1.5.4, so I've removed those scores for now. Doh! The rest are definitely right. :-P --Voidious 22:13, 21 July 2009 (UTC)

Hmm... Raiko vs Moebius has an interesting score... So Raiko moving at integer velocities makes the nano PM match better? » Nat | Talk » 13:11, 22 July 2009 (UTC)

I am back from vacation. It seems like most of you like Alpha 2 more than Alpha 3, and I bet that most people would want the old behaviour, even though it's buggy and hard to understand - at least harder than the new version. I don't intend to make any more changes to Robocode movement after this one. So, is Alpha 2 the version we go for? Also note that bugs fixed in Robocode 1.7.1.x might have an impact on the scores (e.g. 'Bug [2793464]', 'Bug [2740708] - Fair Play!', and 'broken AdvancedRobot.setMaxTurnRate() that did not work since 1.5.4') --Fnl 12:15, 25 July 2009 (UTC)

I personally don't have much preference between Alpha 2 and Alpha 3, but think that Alpha 2, with a limit to prevent acceleration from -0.1 to 1.9, would be preferable, so that it wouldn't be strangely advantageous for a bot to avoid ever hitting 0 speed. Even though some bots are known to have movement optimized for the case when you can accelerate from -1.0 to +1.0, I'm doubtful there are currently any that do things like hit -0.0001 speed instead of 0 in order to accelerate to 1.9999 --Rednaxela 15:03, 25 July 2009 (UTC)

That's the only difference between Alpha2 and Alpha3. Alpha2 still allows decel-thru-zero (e.g. going from -0.1 to 1.9) whereas Alpha3 switches from decel to accel once zero is reached. --Darkcanuck 18:34, 25 July 2009 (UTC)
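
The two rules Darkcanuck describes can be sketched as follows. This is a hypothetical illustration, not Robocode's actual getNewVelocity(): the class and method names are made up, it only models a robot that wants to reverse direction this tick, and it assumes Robocode's ACCELERATION = 1 and DECELERATION = 2. The Alpha3 branch follows the split-tick idea: the fraction of the tick left over after braking to zero is spent accelerating in the new direction.

```java
// Hypothetical sketch, not Robocode's real getNewVelocity(): models only a robot
// that wants to reverse direction this tick. Constants mirror Robocode's
// ACCELERATION (1.0) and DECELERATION (2.0).
class ReversalRules {
    static final double ACCELERATION = 1.0;
    static final double DECELERATION = 2.0;

    /** Alpha2 / old rules: brake by the full 2.0, even if that carries past zero. */
    static double alpha2Reverse(double velocity) {
        if (velocity == 0) {
            throw new IllegalArgumentException("velocity must be nonzero to reverse");
        }
        return velocity - Math.signum(velocity) * DECELERATION;
    }

    /** Alpha3 / split-tick rules: brake to zero, then accelerate for the rest of the tick. */
    static double alpha3Reverse(double velocity) {
        if (velocity == 0) {
            throw new IllegalArgumentException("velocity must be nonzero to reverse");
        }
        double speed = Math.abs(velocity);
        if (speed >= DECELERATION) {
            // Zero is not reached within this tick, so this is plain braking.
            return velocity - Math.signum(velocity) * DECELERATION;
        }
        double timeToZero = speed / DECELERATION;            // fraction of the tick spent braking
        double newSpeed = (1.0 - timeToZero) * ACCELERATION; // remainder spent accelerating
        return -Math.signum(velocity) * newSpeed;
    }

    public static void main(String[] args) {
        for (double v : new double[] {-0.1, -1.0, 1.0, 3.0}) {
            System.out.printf("v=%5.2f  alpha2=%5.2f  alpha3=%5.2f%n",
                    v, alpha2Reverse(v), alpha3Reverse(v));
        }
    }
}
```

Under these assumptions, -0.1 becomes 1.9 under the old rule but 0.95 under the split-tick rule, and -1.0 becomes 1.0 versus 0.5. Once the speed is at least 2.0, both rules reduce to plain braking.
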

I myself prefer Alpha3, since Mat himself suggested it, though I'm newer than everybody else here except Positive. Also, Alpha2 would bring a strange behavior that allows a robot to accelerate from -0.000000000000000001 to 1.999999999999999999, which is not fair. Rednaxela, I don't think there are any robots that do that, but it may have an impact on score. If we do that, we must have an Alpha4 with that limit to make Voidious happy ;) » Nat | Talk » 15:23, 25 July 2009 (UTC)

Well, I was thinking that an Alpha4 might not be a bad idea :P --Rednaxela 15:28, 25 July 2009 (UTC)

I'm still firmly in the Alpha3 camp. As Voidious pointed out, there's no reason to introduce a change in behaviour that only fixes things halfway. Either the decel-thru zero quirk gets fixed (Alpha3) or the old behaviour should stay (Alpha2). In either case, all bots benefit from an improved Robocode movement algorithm. --Darkcanuck 18:34, 25 July 2009 (UTC)

We should have a poll somewhere... » Nat | Talk » 18:50, 25 July 2009 (UTC)

I'm torn at this point. In general, I'm against changing Robocode rules more than we need to, and I don't see the -0.1 to 1.9 quirk as that big a deal. (We've known about it for a long time, but who even cares to do it? It's very little gain for really uglying up your movement code.) I also greatly respect GrubbmGait's opinion, and while I don't speak for them, I believe ABC and David Alves are usually against changing Robocode physics rules, too.

On the other hand, the Alpha3 solution is very elegant; there are noticeable score differences, but nothing major or seeming to favor specific bots; and the "vibrating" movement is still possible, just between ±0.7 instead of ±1.0. I also respect all of your opinions =), and Darkcanuck has seemed very conservative about Robocode versions in the rumble, and he says Alpha3... Sorry for such a long post to basically say "I don't know", but just felt I should post my thoughts.

--Voidious 20:17, 25 July 2009 (UTC)
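
The ±0.7 figure can be checked with a short derivation. It assumes a bot that reverses direction every tick, and the split-tick rule (brake at 2 per tick until zero, then accelerate at 1 for the remainder of the tick); real vibrating movements may not reverse on every tick.

```latex
\begin{aligned}
&\text{Alpha3 (split tick):} && |v_{t+1}| = 1 - \tfrac{1}{2}\,|v_t|, \quad 0 < |v_t| \le 2 \\
&&& v^* = 1 - \tfrac{1}{2}v^* \;\Rightarrow\; v^* = \tfrac{2}{3} \approx 0.67 \\
&&& |v_{t+1}| - \tfrac{2}{3} = -\tfrac{1}{2}\left(|v_t| - \tfrac{2}{3}\right) \\
&\text{Alpha2 (old rules):} && |v_{t+1}| = 2 - |v_t|, \quad 0 < |v_t| \le 2 \\
&&& \text{symmetric case } v^* = 1, \qquad |v_{t+1}| - 1 = -\left(|v_t| - 1\right)
\end{aligned}
```

So under the split-tick rule an every-tick reversal settles toward ±2/3 (the deviation halves each tick), while under the old rules the symmetric ±1.0 vibration is preserved and an asymmetric start simply alternates between two speeds.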

We could do an Alpha 4 - no problem. It might turn out to be better. Could you do a Hijack 3 with the code needed? Then I will make a new Alpha 4 for it, and everybody will be able to see the code for it. :-) --Fnl 20:59, 25 July 2009 (UTC)

I believe the needed getNewVelocity() is already done here: Positive/Optimal Velocity. I haven't tested it, but it looks right to me. --Rednaxela 21:48, 25 July 2009 (UTC)

The version on the bottom of this page is the most recent.

If you're interested in my opinion, I think keeping alpha-2 is probably the cleanest solution as well. It provides the most backward and forward compatibility. If you want to design very precise robots now, you will need to do so for 1.6.1.4 anyway, so it's nice if you know you won't eventually have to adjust again.

The only reason to change it would be to make life easier for future perfectionist coders. You won't bother them with having to write special -0.1 > 1.9 code to use the full available potential. Alpha-3 would solve that in a mathematically elegant way, but would force them to deal with fractional velocities instead. Alpha-4 solves it a little less elegantly, but I believe it solves the problem just as "correctly" (going from -2>-1>1 isn't really better than -2>0>1, because it gives different displacement and turn rates), but with more backward compatibility and no fractional velocities. So I think that would be best. :) --Positive 23:13, 25 July 2009 (UTC)
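
The Alpha4 ("cap at v=1") idea can be sketched in the same style. This is a guess at the idea under stated assumptions, not the code from User:Positive/Optimal Velocity: the class and method names are hypothetical, it keeps the old decel-through-zero step, and it assumes the cap means the speed gained in the new direction never exceeds ACCELERATION (1.0).

```java
// Hypothetical sketch of an "Alpha4-style" reversal: the old decel-through-zero
// step, but the speed gained in the new direction is capped at ACCELERATION.
// This is NOT the actual code from User:Positive/Optimal Velocity.
class CappedReversal {
    static final double ACCELERATION = 1.0;
    static final double DECELERATION = 2.0;

    static double alpha4Reverse(double velocity) {
        if (velocity == 0) {
            throw new IllegalArgumentException("velocity must be nonzero to reverse");
        }
        double direction = Math.signum(velocity);
        double next = velocity - direction * DECELERATION; // full 2.0 braking, as in the old rules
        if (Math.signum(next) == -direction) {
            // Crossed zero this tick: allow at most ACCELERATION in the new direction.
            next = -direction * Math.min(Math.abs(next), ACCELERATION);
        }
        return next;
    }

    public static void main(String[] args) {
        // Reversing from -2.0 gives 0.0; normal acceleration (not modeled here)
        // would then give 1.0 on the next tick.
        for (double v : new double[] {-0.1, -1.0, -2.0, -3.0}) {
            System.out.printf("v=%5.2f  alpha4=%5.2f%n", v, alpha4Reverse(v));
        }
    }
}
```

Under this assumption -0.1 maps to 1.0 (instead of 1.9 under the old rules), -1.0 still maps to 1.0, and -2.0 maps to 0.0, which matches the -2>0>1 sequence and keeps velocities integral when they start integral.
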

Okay, I have now assembled the Alpha 4 for download based on the version on the bottom of this page. Perhaps we should run the tests for this version also, so we can compare it against the results for the Alpha 2 and 3? --Fnl 21:47, 26 July 2009 (UTC)

I'll kick off some tests today, starting with the most varied ones among the current results. I think it will take a couple of days to run through them all (assuming I can resist tweaking Diamond for that long =)). --Voidious 18:47, 27 July 2009 (UTC)

I've not yet seen any strange behaviour resulting from the changes in Alpha-4, having run and watched about 100 rounds of 10-round battles. So I'd say it's implemented correctly. I don't know if it differs on score, though. --Positive 16:01, 28 July 2009 (UTC)

Hmm, the way that the new versions are always notably higher in "Raiko vs Moebius" seems to say to me that either 1) something else that happened in 1.7.x is affecting either Raiko or Moebius, or 2) the 1.6 battle just happened to be lucky for Moebius and more seasons are needed. --Rednaxela 14:36, 29 July 2009 (UTC)

Sure, I will run some more battles, but I do suspect it's something else in 1.7.x causing the big difference compared to 1.6.1.4. --Voidious 15:43, 29 July 2009 (UTC)
Not sure if this has an effect? http://sourceforge.net/tracker/?func=detail&aid=2828479&group_id=37202&atid=419486 » Nat | Talk » 10:40, 30 July 2009 (UTC)
Nat, I don't think so. I've only seen it happen in melee fights. And I don't believe many 1v1 bots check the onRobotDeath event (they usually use the onDeath and onWin events) --Positive 14:36, 2 August 2009 (UTC)

By the way, perhaps this would be an opportunity to fix that bug where robots sometimes get to a location of x=17.99998 and other outside-of-arena locations? --Positive 14:36, 2 August 2009 (UTC)

Is there really such a bug? Could you please explain the situation some more? » Nat | Talk » 15:41, 2 August 2009 (UTC)
Perhaps you could create a robot that can reproduce the issue, and attach it to a new bug report at SourceForge for Robocode? This way we can fix this issue faster. :-) --Fnl 19:33, 2 August 2009 (UTC)
I just finished testing it, and it seems I was wrong. Whenever I try, getX() and getY() return the correct values. Sometimes when my robot scans opponents it does get values like Y=582.0000000000002, but I just realized those must be rounding errors. Well sorry... and that's one less bug to fix. :) --Positive 21:55, 2 August 2009 (UTC)
Okay, no harm done. =) --Fnl 19:49, 3 August 2009 (UTC)

Guys, I finally managed to fix the Missed onRobotDeath events issue. :-) Hence, it might make sense to make an Alpha-5 (or a real Beta) to see what difference this bugfix makes compared to the other alphas. However, I do not know which getNextVelocity() method I should base it on. That is, which alpha version should I base the new alpha on? Alpha-2, Alpha-3, or Alpha-4? Which alpha do you prefer? Did we ever reach a conclusion about which alpha version is the better one? I know there are different opinions here. In addition, I would like to release a real Beta soon in order to get rid of other bugs as well. So I really need to decide which getNextVelocity() method to use. --Fnl 20:08, 4 August 2009 (UTC)

In the end, I consider it your decision, Fnl. But if you'd like, we could set up a poll on this page? I'm actually still torn on the issue, but in that case, I have to err on the side of caution towards legacy bots... No hard feelings if/when Alpha3 wins out, though. =) (And maybe you should be able to vote for as many choices as you like.) --Voidious 20:35, 4 August 2009 (UTC)

Name Alpha2 (old rules) Alpha3 (new rules) Alpha4 (cap at v=1) Comment
Voidious X
Positive X X
Rednaxela X X
Darkcanuck X
Nat X
Fnl X X
GrubbmGait X X
Skilgannon X
Jab X

Hi guys. I am very sorry, but I discovered a bug in the updateMovement() compared to 1.6.1.4, which is described on the Optimal_Velocity page. The problem is that getDistanceRemaining() will return an invalid value when the robot calls setAhead(0) or setBack(0) but still needs to brake. AFAIK, this must have an impact on the scores and hence the results for all the alphas! :-(

I intend to make at least an Alpha-5 with all bugfixes since 1.7.1.3. Perhaps I should make 3 new alphas to replace Alpha-2, Alpha-3, and Alpha-4 so we can retest with these? Suggestions are very welcome. We could also just decide which rules to use and only make an Alpha-5, e.g. stick to the old rules (like Alpha-2)? --Fnl 22:13, 7 August 2009 (UTC)

New Alphas

I have now created 3 new alphas (5, 6, 7) to replace the old alphas (2, 3, 4):

I propose that we make a new table and compare each new alpha against only version 1.6.1.4. --Fnl 22:47, 7 August 2009 (UTC)

Note that the new alphas contain a bugfix for the missing RobotDeathEvents too. =) --Fnl 22:53, 7 August 2009 (UTC)

Is the code for the new alphas available in SVN? I'd like to test some of the routines. I'm still studying the new code at User:Positive/Optimal Velocity and I'm a little confused now as to what was used in each alpha... --Darkcanuck 04:55, 8 August 2009 (UTC)

No, current code is on Fnl's machine. It isn't available in SVN yet, though I want it too (maybe movement-1/2/3-workspace branches?) » Nat | Talk » 06:02, 8 August 2009 (UTC)
Nat is right. I don't commit the code for alphas, as the intention of an alpha for Robocode is to determine whether the code should be committed to SVN. An alpha is considered a prototype/experiment, and is never "official" (hence not in SVN). However, I have updated the links on the listed alphas so you are able to find the code. --Fnl 06:13, 8 August 2009 (UTC)

If you want to study the code, all you need to do is check out the source code from the SVN trunk/head and replace the getNextVelocity() depending on which alpha you need to examine (from the links above). In addition, you need to replace the buggy updateMovement() method with the one I added at the bottom here yesterday. That's it. =) --Fnl 06:16, 8 August 2009 (UTC)


Okay guys, I decided to add the last two alphas before the Beta:

Now we can retest, and if you have not yet added your vote to the vote table above, please do so. :-) --Fnl 20:12, 9 August 2009 (UTC)

Great - I'll start running tests this evening / tomorrow morning, should have a lot of results posted in the next couple of days. --Voidious 20:53, 9 August 2009 (UTC)

Super! I can't wait to see the results. I hope that the new version of updateMovement (by Nat) and also the bugfix for the missing RobotDeathEvents will give smaller score differences compared to version 1.6.1.4 than the old alphas did. We might even see a smaller difference between the old vs. new rules after all, since the old alphas were buggy. =) --Fnl 22:25, 9 August 2009 (UTC)

I've come around to the idea of the split-tick decel rules, despite the fact that they make DrussGT's movement simulator wrong. The discontinuity that Rednaxela pointed out was the swaying argument. --Skilgannon 21:57, 9 August 2009 (UTC)

Heh. I was convinced that you would vote for Alpha-2. Great that you added your vote. =) --Fnl 22:17, 9 August 2009 (UTC)
No tie-breaker yet. But we need to see the new results before voting again. » Nat | Talk » 09:05, 10 August 2009 (UTC)

Hmm.. looks like the difference between Alpha8 and Alpha2 largely fixes the "Raiko vs Moebius" discrepancy. I wonder how Alpha9 will turn out now :) --Rednaxela 04:06, 11 August 2009 (UTC)

Oh, and I just noticed you said "eg -1 to 0.4" Voidious. Shouldn't that be 0.5, not 0.4? -1 to 0 is half the decel that can occur in a tick, and 0 to 0.5 would be half the accel of a tick. --Rednaxela 04:11, 11 August 2009 (UTC)
Yep, just a typo. =) Good catch - thanks! --Voidious 04:23, 11 August 2009 (UTC)

IIRC Raiko also saves data, so having 1000 battles in 1.6.1.4 vs. only 500 in Alpha9 would naturally favor the 1.6.1.4 results. --Skilgannon 10:57, 11 August 2009 (UTC)

Wow, I had no idea Raiko saved data. Looks like he's not saving it against Moebius because he only saves if he wins < 70% of rounds, but is saving against Toorkild. How crazy to use up so much space with file saving, restoring, and logic behind it in a MiniBot! --Voidious 12:36, 11 August 2009 (UTC)

I think I will run another 500 battles for Raiko vs Moebius in the new alphas. Very strange that the difference is the opposite of the Alpha2 vs Alpha3 difference... --Voidious 23:37, 11 August 2009 (UTC)

I wonder whether my updateMovement() has some bugs? It should be all green... Looking at all the scores & diffs again, there are a lot of 'weird' results. » Nat | Talk » 00:04, 12 August 2009 (UTC)

Hmm.. I wonder if it is due to the other bugfixes in the two recent alphas (8 & 9)? At least the score differences for Alpha-2, 3, and 4 seem to be smaller than with the new alphas. --Fnl 21:16, 16 August 2009 (UTC)

Should we update the poll now that the new results from Voidious are ready? I need to know which version to include in the coming Beta. ;-) --Fnl 21:16, 16 August 2009 (UTC)

Hey guys, conclusion please? I think jab has changed his vote to Alpha-2, so I think it's a tie now. I think you can choose what you want, Fnl, we all respect your decision, don't we? =) » Nat | Talk » 13:11, 17 August 2009 (UTC)

I don't have any more tests running at the moment. What else do you guys think I should run? Or is this enough to draw a conclusion? I think I'm sticking with my alpha 2 vote, though I'm still torn, and I should probably investigate the two problem pairings further. I'm not sure jab really decided on one. It looks to me like the new rules will win the vote. --Voidious 14:28, 17 August 2009 (UTC)

Well... I'd say it would be worth investigating "Raiko vs Moebius" if anyone has any ideas about what may be going wrong. "PrairieWolf vs DuelistMini" may be worth investigating before the final release, but I don't think it would affect the choice between Alpha8 and Alpha9, as it scores similarly high either way. I'm not sure what the issue could be, though. I've looked at both the Moebius and Raiko code and didn't see anything that I'd expect to be significantly affected by the rule fix. Until further notice, though, my vote sticks with Alpha9 style. --Rednaxela 14:41, 17 August 2009 (UTC)