User talk:Voidious/Robocode Version Tests

From Robowiki
Revision as of 15:36, 29 July 2009 by Rednaxela (talk | contribs) (Raiko vs Moebius)

I don't think there are any differences between 1.6.1.4 and 1.7.1.1. The difference is quite big, 0.82%! But I think we are still using the fix, right? » Nat | Talk » 15:51, 17 July 2009 (UTC)

Yes, it's a pretty big difference (0.72% actually). 250 battles might not be enough, though, maybe I should run more for that pairing. If there is a 0.72% difference, I'd probably be against the change. But there's lots more to test first - I see my 1.6.1.4 CPU constant is ~20% higher than the 1.7* versions, that could even play a part. --Voidious 20:36, 17 July 2009 (UTC)

The Surfer-vs-Surfer battle seems to affect much more than Surfer-vs-Random battles. » Nat | Talk » 05:08, 18 July 2009 (UTC)

Not necessarily, Nat. These results could just mean that Komarious and Ascendant are affected differently than Diamond and DrussGT. Komarious/Ascendant vs the random movers would be needed to tell that. Really, this test would be more revealing if there was a full round-robin between the involved bots. I'd also wonder: do Komarious or Ascendant actually assume anything that breaks in Alpha2? It's also worth noting that tests of score alone can't tell us if changes better match assumptions a bot is making, since a bot could just by fluke happen to operate in a way that is happy with conditions that the code didn't try to assume at all. --Rednaxela 05:26, 18 July 2009 (UTC)
There's also just a much larger variance in those pairings, so maybe 500 battles isn't even enough. My initial goal was just to find out if there was a measurable difference among these versions, so I was going for diversity in the battles I used. Full round robin between all bots (in all versions) might help in deducing causes, but this testing is already taking a "metric ass-ton" of CPU cycles =), so I'd definitely reduce the # of bots if I were to try that. It would still be pretty speculative, though.
Assuming that we have enough battles, the Diamond vs Komarious result I believe shows there are other differences between 1.6.1.4 and 1.7.x that are contributing. I noticed the CPU constant is a little different, but I'm pretty sure nobody's skipping turns anyway. The Alpha2 updateMovement code shouldn't change anything for these two bots, I don't think.
The PrairieWolf result is bizarre. 2% is well above any margin of error on 500 battles, I think. I thought PrairieWolf would have decreased performance, if anything, when we changed the +1/-1 decel rules, but he does better in Alpha3.
--Voidious 05:48, 18 July 2009 (UTC)
ATWHEB (Assuming that we have enough battles), it seems that the surfer gains score from these changes, which seems weird, since most surfers use the old way, including the old decel-through-zero rules. I wonder what the DrussGT vs. Diamond score will look like.
I think PrairieWolf vibrates a bit shorter, so DuelistMini missed him. » Nat | Talk » 06:07, 18 July 2009 (UTC)
You might have to test DuelistMini against another opponent -- maybe it's the one that's doing worse, as opposed to PrairieWolf doing better? Thanks for running all these tests, btw. Hopefully they'll help pick the right version and not just confuse things further! =) --Darkcanuck 06:25, 18 July 2009 (UTC)
No problem, though I will be taking some time to work on Diamond this weekend. =) Yeah, I don't think we can draw many conclusions just yet. I also just realized that both PW and DM do data saving... I'm actually not sure if that might throw off results or not (hopefully it does?), but maybe I'll try to find another vibrating test bot. I'm gonna run some stuff in 1.7.1.1 overnight here. --Voidious 06:36, 18 July 2009 (UTC)

Just want to know: what do you guys consider 'unacceptable' (the red value)? Currently I use ±0.1 as 'margin of error' (green value), ±0.4 as 'acceptable' (black value), and anything beyond that as 'unacceptable'. I don't normally run a lot of battles, so I don't know where it should be. » Nat | Talk » 07:30, 18 July 2009 (UTC)

Margin of error = ±0.1 when comparing results from 500 battles is probably about right, maybe even ±0.2. (That would mean margin of error on each 500-battle result is half that.) I'm unsure of my opinion on "acceptable". My first instinct is "any measurable change is unacceptable", but I need to think about it. I just really, really hate the idea of all our old Robocode bots slowly becoming (artificially) weaker and weaker, just because of changes to Robocode... --Voidious 16:37, 18 July 2009 (UTC)
I too dislike the idea of changes progressively making old bots weaker and weaker, but I really don't think that these changes are something that makes old bots progressively weaker. I have a feeling that most of the score changes that could be seen would not be due to slightly breaking old assumptions, but instead are just flukes arising from things just plain being a little different. The tests show DrussGT vs Ascendant having stronger DrussGT scores than before, but I doubt that Ascendant assumes old behavior in a way that DrussGT doesn't. At very least, I think the number of bots that are not acting quite as intended due to old Robocode movement being unintuitive/weird, is greater than the number of bots that truly intentionally assume old behavior that differs from the Alpha2 mode of operation (Alpha3 being a slightly different story). --Rednaxela 22:35, 18 July 2009 (UTC)
I mostly agree, which is the reason I might be OK with some measurable score differences. However, I think that if there continue to be changes that have measurable impacts on scoring, older bots will gradually get weaker (or should I say, "weakerer", since they're bound to get relatively "weaker", anyway). Even if the code doesn't assume anything explicitly, it may implicitly -- it was tuned in a certain Robocode environment that is no longer reflective of how Robocode works. --Voidious 22:59, 18 July 2009 (UTC)
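As a rough illustration of the margin-of-error reasoning above, here is a minimal sketch of how one might estimate a 95% margin for a mean battle score and for the difference between two versions. The class and method names are made up for this example, and the normal approximation (the 1.96 factor) and independence of the two runs are assumptions. Under those assumptions the two margins combine in quadrature, so the margin on each result comes out at about 0.7 of the margin on the difference, which makes "half" a rough figure. The scores in `main` are placeholder numbers, not results from these tests.

```java
class MarginOfError {
    /** Sample mean of the scores. */
    static double mean(double[] xs) {
        double sum = 0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    /** Sample standard deviation, using n - 1 in the denominator. */
    static double stdDev(double[] xs) {
        double m = mean(xs);
        double sq = 0;
        for (double x : xs) sq += (x - m) * (x - m);
        return Math.sqrt(sq / (xs.length - 1));
    }

    /** Approximate 95% margin of error of the mean: 1.96 * s / sqrt(n). */
    static double margin95(double[] xs) {
        return 1.96 * stdDev(xs) / Math.sqrt(xs.length);
    }

    /** Margin for the difference of two independent means (quadrature sum). */
    static double marginOfDifference(double marginA, double marginB) {
        return Math.sqrt(marginA * marginA + marginB * marginB);
    }

    public static void main(String[] args) {
        // Placeholder per-battle percentage scores, purely illustrative.
        double[] oldVersion = {61.2, 58.9, 60.4, 62.1, 59.7, 60.8};
        double[] newVersion = {60.1, 59.5, 61.0, 58.8, 60.9, 59.9};
        double mOld = margin95(oldVersion);
        double mNew = margin95(newVersion);
        System.out.println("difference: " + (mean(newVersion) - mean(oldVersion))
                + " +/- " + marginOfDifference(mOld, mNew));
    }
}
```

With real data, each array would hold one score per battle (500 entries here), and a difference smaller than the combined margin would not count as measurable.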

Let me throw in my 2 cents to the discussion. Although I am a rather conservative guy, I also like clear and simple behaviour. Therefore my vote undoubtedly goes for the Alpha-2 variant. Differences in score between the old code and Alpha-2 I take for granted, as the old code just seems flawed.
I think that the Alpha-3 is one step too far, as the vibrating movement was rather popular in the days before Wavesurfing, and is used in several bots from that time. As for solving 'bugs' in the old code, there have been more changes that did influence the score, and we accepted them because they were logical and better following the rules than before. Think about onBulletHitBullet, which happens 50% more often now than in 1.0.6, favouring the bots that take them into account. --GrubbmGait 00:08, 19 July 2009 (UTC)

I took the weekend to work on Diamond, but I'm back to running lots more tests: DrussGT vs Diamond, Raiko vs Toorkild, Raiko vs Moebius, and Toorkild vs Moebius. 1.7.1.4-Alpha3 should be done before I get home today, then I'll run Alpha2. --Voidious 15:13, 21 July 2009 (UTC)

I thought my original RoboResearch install dir used 1.7.1.1, but it was actually 1.5.4, so I've removed those scores for now. Doh! The rest are definitely right. :-P --Voidious 22:13, 21 July 2009 (UTC)

Hmm... Raiko vs Moebius has an interesting score... So does Raiko moving at integer velocities make the nano PM match better? » Nat | Talk » 13:11, 22 July 2009 (UTC)

I am back from vacation. It seems like most of you like Alpha 2 more than Alpha 3, and I bet that most people would want the old behaviour, even though it's buggy and hard to understand - at least harder than the new version. I don't intend to make any more changes to Robocode's movement after this change. So, is Alpha 2 the version we go for? Also note that bugs fixed in Robocode 1.7.1.x might have an impact on the scores (e.g. 'Bug [2793464]', 'Bug [2740708] - Fair Play!', and 'broken AdvancedRobot.setMaxTurnRate() that did not work since 1.5.4') --Fnl 12:15, 25 July 2009 (UTC)

I personally don't have much preference between Alpha 2 and Alpha 3, but think that Alpha 2, with a limit to prevent acceleration from -0.1 to 1.9, would be preferable, so that it wouldn't be strangely advantageous for a bot to avoid hitting 0 speed ever. Even though some bots are known to have movement optimized for the case when you can accelerate from -1.0 to +1.0, I'm doubtful there are currently any that do things like hit -0.0001 speed instead of 0, in order to accelerate to 1.9999 --Rednaxela 15:03, 25 July 2009 (UTC)

That's the only difference between Alpha2 and Alpha3. Alpha2 still allows decel-thru-zero (e.g. going from -0.1 to 1.9) whereas Alpha3 switches from decel to accel once zero is reached. --Darkcanuck 18:34, 25 July 2009 (UTC)
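To make the difference concrete, here is a minimal sketch of the two rules as described above, for a bot that wants full speed forward. It is not the actual Robocode or Alpha source: the class and method names are invented for illustration, and the Alpha3 branch assumes that "switches from decel to accel once zero is reached" means the remainder of the turn is spent accelerating.

```java
class DecelThroughZero {
    static final double ACCELERATION = 1.0; // max speed gain per turn
    static final double DECELERATION = 2.0; // max speed loss per turn
    static final double MAX_VELOCITY = 8.0;

    /** Alpha2-style (assumed): a bot moving backward brakes at 2.0, even across zero. */
    static double alpha2(double v) {
        double next = (v < 0) ? v + DECELERATION : v + ACCELERATION;
        return Math.min(next, MAX_VELOCITY);
    }

    /** Alpha3-style (assumed): brake until zero, then accelerate for the rest of the turn. */
    static double alpha3(double v) {
        if (v >= 0) {
            return Math.min(v + ACCELERATION, MAX_VELOCITY);
        }
        double decelTime = -v / DECELERATION; // fraction of the turn spent stopping
        if (decelTime >= 1) {
            return v + DECELERATION;          // still braking when the turn ends
        }
        return (1 - decelTime) * ACCELERATION; // leftover time spent accelerating
    }

    public static void main(String[] args) {
        double[] starts = {-2.0, -1.0, -0.1, 0.0};
        for (double v : starts) {
            System.out.println(v + " -> alpha2: " + alpha2(v) + ", alpha3: " + alpha3(v));
        }
    }
}
```

Under these assumptions the two rules only disagree when the starting speed is strictly between -2.0 and 0, which is exactly the decel-thru-zero case.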

I myself prefer Alpha3, since Mat himself suggested it, though I'm newer than everybody else here except Positive. Also, Alpha2 would bring a strange behaviour that allows a robot to accelerate from -0.000000000000000001 to 1.999999999999999999, which is not fair. Rednaxela, I don't think there are any robots that do that, but it may have an impact on scores. If we do that, we must have an Alpha4 with that to make Voidious happy ;) » Nat | Talk » 15:23, 25 July 2009 (UTC)

Well, I was thinking that an Alpha4 might not be a bad idea :P --Rednaxela 15:28, 25 July 2009 (UTC)

I'm still firmly in the Alpha3 camp. As Voidious pointed out, there's no reason to introduce a change in behaviour that only fixes things halfway. Either the decel-thru zero quirk gets fixed (Alpha3) or the old behaviour should stay (Alpha2). In either case, all bots benefit from an improved Robocode movement algorithm. --Darkcanuck 18:34, 25 July 2009 (UTC)

We should have a poll somewhere... » Nat | Talk » 18:50, 25 July 2009 (UTC)

I'm torn at this point. In general, I'm against changing Robocode rules more than we need to, and I don't see the -0.1 to 1.9 quirk as that big a deal. (We've known about it for a long time, but who even cares to do it? It's very little gain for really uglying up your movement code.) I also greatly respect GrubbmGait's opinion, and while I don't speak for them, I believe ABC and David Alves are usually against changing Robocode physics rules, too.

On the other hand, the Alpha3 solution is very elegant; there are noticeable score differences, but nothing major or seeming to favor specific bots; and the "vibrating" movement is still possible, just between ±0.7 instead of ±1.0. I also respect all of your opinions =), and Darkcanuck has seemed very conservative about Robocode versions in the rumble, and he says Alpha3... Sorry for such a long post to basically say "I don't know", but just felt I should post my thoughts.

--Voidious 20:17, 25 July 2009 (UTC)
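The ±1.0 and ±0.7 figures can be checked with a short calculation. This assumes deceleration of 2 and acceleration of 1 per turn, and that Alpha3 spends the part of the turn left over after braking on accelerating; $s$ is the steady vibration speed, so a bot starting a turn at $-s$ must end it at $+s$.

```latex
\begin{align*}
\text{Old rules / Alpha2:}\quad & -s + 2 = s
  &&\Rightarrow\; s = 1 \\
\text{Alpha3:}\quad & \underbrace{\tfrac{s}{2}}_{\text{time to brake}}
  + \underbrace{\left(1 - \tfrac{s}{2}\right)}_{\text{time to accelerate}} = 1,
  \qquad s = \left(1 - \tfrac{s}{2}\right)\cdot 1
  &&\Rightarrow\; s = \tfrac{2}{3} \approx 0.67
\end{align*}
```

So under these assumptions the vibration settles at about ±0.67, which rounds to the ±0.7 mentioned above.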

We could do an Alpha 4 - no problem. It might turn out to be better. Could you do a Hijack 3 with the code needed? Then I will make a new Alpha 4 for it, and everybody will be able to see the code for it. :-) --Fnl 20:59, 25 July 2009 (UTC)

I believe the needed getNewVelocity() is already done here: Positive/Optimal Velocity. I haven't tested it, but it looks right to me. --Rednaxela 21:48, 25 July 2009 (UTC)

The version on the bottom of this page is the most recent.

If you're interested in my opinion, I think keeping alpha-2 is probably the cleanest solution as well. It provides the most backward and forward compatibility. If you want to design very precise robots now, you will need to do so for 1.6.1.4 anyway, so it's nice if you know you won't eventually have to adjust again.

The only reason to change it would be to make life easier for future perfectionist coders. You won't bother them with having to code special -0.1 > 1.9 codes to use the full available potential. Alpha-3 would solve that in a mathematically elegant way, but would force them to deal with fractional velocities instead. Alpha-4 solves it a little less elegantly, but I believe it solves the problem just as "correctly" (going from -2>-1>1 isn't really better than -2>0>1, because it gives different displacement and turn rates), but with more backward compatibility and no fractional velocities. So I think that would be best. :) --Positive 23:13, 25 July 2009 (UTC)
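For readers trying to picture the Alpha-4 idea, here is one possible reading of it as a sketch, not the code from Positive/Optimal Velocity: brake at 2.0 as in Alpha-2, but cap the speed reached after crossing zero at 1.0, the most a bot could gain by accelerating from rest. The cap value, the class name, and the method name are all assumptions made for this example.

```java
class CappedDecelThroughZero {
    static final double ACCELERATION = 1.0; // max speed gain per turn
    static final double DECELERATION = 2.0; // max speed loss per turn
    static final double MAX_VELOCITY = 8.0;

    /**
     * Hypothetical Alpha-4-style rule for a bot that wants to go forward:
     * decel-through-zero is allowed, but the result may not exceed what
     * one turn of acceleration from a standstill would give.
     */
    static double alpha4(double v) {
        if (v >= 0) {
            return Math.min(v + ACCELERATION, MAX_VELOCITY);
        }
        return Math.min(v + DECELERATION, ACCELERATION);
    }

    public static void main(String[] args) {
        // Both routes discussed above end at the same speed after two turns.
        double viaZero = alpha4(alpha4(-2.0)); // -2 > 0 > 1
        double viaMinusOne = alpha4(-1.0);     // second step of -2 > -1 > 1
        System.out.println(viaZero + " " + viaMinusOne + " " + alpha4(-0.1));
    }
}
```

In this reading, -0.1 and -1.0 both lead to 1.0 on the next turn, so integer-velocity bots keep their old behaviour while the -0.1 > 1.9 trick gains nothing.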

Okay, I have now assembled the Alpha 4 for download based on the version on the bottom of this page. Perhaps we should run the tests for this version also, so we can compare it against the results for the Alpha 2 and 3? --Fnl 21:47, 26 July 2009 (UTC)

I'll kick off some tests today, starting with the most varied ones among the current results. I think it will take a couple of days to run through them all (assuming I can resist tweaking Diamond for that long =)). --Voidious 18:47, 27 July 2009 (UTC)

I've not yet seen any strange behaviour resulting from the changes in Alpha-4 so far, having run and watched about 100 rounds of 10-round battles. So I'd say it's implemented correctly. I don't know if it differs on score, though. --Positive 16:01, 28 July 2009 (UTC)

Hmm, the way the new versions are always getting notably higher scores in "Raiko vs Moebius" seems to say to me that either 1) something else that happened in 1.7.x is affecting either Raiko or Moebius, or 2) the 1.6 battle just happened to be lucky for Moebius and more seasons are needed. --Rednaxela 14:36, 29 July 2009 (UTC)
