CPU benchmark advice

Say, any of you Robocoders have a fast quad-core machine (like Core i5/i7 or comparable) and feel like advising me? I'm considering buying a Core i7 (2600k) quad-core box that would mainly (for now) be for Robocode. But I'm wondering how much of a speed increase this will offer me.

How long does a minimized 35-round battle of Diamond vs itself take? (Maybe run one then "Restart", if that helps JIT things up...) I'd need to know the Diamond version, Robocode version, and what CPU you've got to make full sense of that info.
How much of a speed hit do you take per battle when running 4-threaded RoboResearch? I.e., if a given battle takes 60 seconds when you run single-threaded, does it still take 60 seconds when you run 4 Robocodes, or how much of a hit does it take?

This would be a huge, geeky indulgence, so I'd love to get some idea what I'd be getting for my money if I actually pull the trigger. =) Thanks!

I have an AMD Phenom II X4 @ 3.6 GHz. My AMD is considered slower than, say, a higher-end i7.

You can see here for an ALU comparison: http://www.tomshardware.com/charts/desktop-cpu-charts-2010/ALU-Performance-SiSoftware-Sandra-2010-Pro-ALU,2408.html

Mine is closest to the AMD Phenom II X4 975 Black Edition on this chart (Overclocked 965 to 3.6). Since Robocode is math heavy you can see the result each chip gets.

For a real performance reference, see the amount of rumble battles I can run in a given period (4 clients).

On this chart, the 2600K gets over twice the score of my CPU. 114.30 vs 55.0.

— Chase-san‎

Well, pretty sure I've done 100k battles in a month, so this tells me it's 2.3x as fast. I probably wouldn't shell out $700 for double the Robocode power, but I'm guessing it's much more of a multiplier than that. Also, I reckon performance could scale differently with simple bots (many of your rumble battles) vs high-end bots, which are surely much more memory-intensive, and thus perhaps not as much sped up by an increase in raw CPU power.

So I'd still really love to know the time a certain battle takes and how close to linearly your Robocode power scales with # of cores...

I have done about 164,359 battles so far, so about 41,090 per client. That is about 10 days in, so multiply by 3 for a full month: about 123,270 per client. But that CPU is about twice mine in math, so estimate around 250,000 per client over a month, or 2.5 times yours with a single client (even if you have more than one CPU, a single client only uses one CPU's worth of CPU time). Times 4 for 4 clients equals about 1 million. That totals about 10 times yours if you only run a single client, or 5 times if you run two.
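The projection above can be written out step by step. This is only a sketch of the arithmetic in the post: the variable names are made up for illustration, the factor of 2 is the rounded ALU-score ratio quoted earlier (114.30 vs 55.0), and the 100,000 battles per month baseline is the rough figure from the previous reply. None of the inputs is a measurement of Robocode speed on a 2600K.

```python
# Rough projection from the post; every input is an estimate, not a measurement.
total_battles = 164_359        # battles run so far across all clients
clients = 4                    # rumble clients running in parallel
days_elapsed = 10              # "about 10 days in"
alu_ratio = 2.0                # assumed speed-up, rounded from the ALU chart
baseline_per_month = 100_000   # rough "100k battles in a month" from the reply

per_client = total_battles / clients                       # about 41,090
per_client_month = per_client * 30 / days_elapsed          # about 123,270
projected_per_client = per_client_month * alu_ratio        # "around 250,000"
projected_four_clients = projected_per_client * clients    # "about 1 million"

print(f"per client so far:        {per_client:,.0f}")
print(f"per client per month:     {per_client_month:,.0f}")
print(f"projected, one client:    {projected_per_client:,.0f}")
print(f"projected, four clients:  {projected_four_clients:,.0f}")
print(f"ratio to the baseline:    {projected_four_clients / baseline_per_month:.1f}x")
```

The whole chain rests on the assumption that Robocode throughput scales linearly with the ALU benchmark score, which is exactly the part the measured battle times further down are meant to check.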

That's all things being equal. However, to answer your original question: I do not know the exact amount of time it takes, but it isn't very long. Also, as long as I stick to only 4 threads, the speed is equivalent to running only 1 thread, provided my computer is doing nothing else.

Because of Intel's Hyper-Threading, you may be able to get away with 5 or 6 threads without much overall hit.

— Chase-san‎

Well, thx for the info. That assumes Robocode power scales linearly with benchmark scores, which is something I don't trust, or I wouldn't even be asking this. :-) And comparing RR client battle count is a very rough estimate, too. (Maybe I've done 150k? Don't remember, and who knows what bots or if that was full time...)

If anyone wants to serve up some cold hard battle times and single vs 4-thread comparisons, I'd still much appreciate it!

Just in the last week I got an AMD Phenom II X6 1090 at 3.2GHz here. Sure, it's slower per core than a high-end i7 like the 2600K, but on the other hand 1) the CPU is practically half the price of a 2600K, and 2) six cores rather than four is nothing to sneeze at for Robocode purposes.

Running Diamond 1.6.7 versus itself, 35 rounds:

50.265s average (Trials: 49.735s, 43.292s, 51.542s, 53.741s, 50.764s, 52.277s, 52.839s, 47.930s)

This is without GUI, and including robocode startup time (about 1-2 sec)

Running Diamond 1.6.7 versus itself, 35 rounds in 2 separate robocode instances:

29.305s per battle

58.611s per instance (Trials: 58.016s, 61.019s, 56.540s, 59.753s, 55.980s, 62.418s, 57.420s, 57.741s)

Robocode startup time increased to 4 seconds. This would not be a factor in a battle runner which runs multiple battles in the same JVM!

Running Diamond 1.6.7 versus itself, 35 rounds in 4 separate robocode instances:

16.294s per battle

65.174s per instance (Trials: 65.207s, 66.350s, 67.326s, 66.458s, 62.473s, 62.683s, 65.189s, 65.709s)

Note that Robocode startup already seems to be highly parallel, because startup now took up to 8 seconds for one instance! As such, about 6 seconds of the increased time can be attributed to Robocode startup.

Running Diamond 1.6.7 versus itself, 35 rounds in 6 separate robocode instances:

15.736s per battle

94.417s per instance (Trials: 91.325s, 92.572s, 93.809s, 94.710s, 95.344s, 95.738s, 97.421s)

The gains seem to flatten out about here. One note: because one instance of Robocode on its own uses something like 115% of a core, I should reach a CPU limit at 5-ish instances, not 4-ish, so I suspect I'm hitting a memory bandwidth bottleneck.
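For anyone who wants to redo the averages, here is a minimal Python sketch that recomputes seconds per battle from the trial times listed above and adds a speedup and a parallel-efficiency figure. The function and variable names are illustrative, and "efficiency" (speedup divided by instance count) is an added convenience measure, not something from the original posts. Robocode startup time is included in every trial, as noted above.

```python
from statistics import mean

# Per-instance wall times in seconds, copied from the trials above
# (Diamond 1.6.7 vs itself, 35 rounds, no GUI, startup time included).
trials = {
    1: [49.735, 43.292, 51.542, 53.741, 50.764, 52.277, 52.839, 47.930],
    2: [58.016, 61.019, 56.540, 59.753, 55.980, 62.418, 57.420, 57.741],
    4: [65.207, 66.350, 67.326, 66.458, 62.473, 62.683, 65.189, 65.709],
    6: [91.325, 92.572, 93.809, 94.710, 95.344, 95.738, 97.421],
}

def seconds_per_battle(instances, times):
    """Average wall time per instance divided by the number of parallel instances."""
    return mean(times) / instances

baseline = seconds_per_battle(1, trials[1])
for instances, times in trials.items():
    per_battle = seconds_per_battle(instances, times)
    speedup = baseline / per_battle          # throughput relative to one instance
    efficiency = speedup / instances         # 1.0 would be perfect linear scaling
    print(f"{instances} instances: {per_battle:.3f} s/battle, "
          f"speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

The step from 4 to 6 instances barely changes seconds per battle, which is the flattening described above.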

One instance of Robocode can use more than 115% of a core. It oscillates between 100% and 200%. A performance decrease in benchmarks is to be expected when you run more instances than half of your cores (all instances using 200% at the same time).

MN‎

Why would it use 200%? According to Pavel, different robots can run on different cores, but they are synchronized so only one is running at once, basically capping your actual performance at the speed of one core. So it should be 100% + some JVM / Robocode engine overhead, I'd think.

That overhead happens about 30% of the time, so an instance uses about 130% of a core on average. But there are peaks of 200%. When I run 3 instances on 4 cores, they use all cores most of the time, but you see one idle core sometimes (and it's not uploading).
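As a back-of-the-envelope check, the 130% figure and the "5-ish instances" estimate from earlier both fall out of the same simple model. The sketch below assumes an instance is always in one of two states (one core busy, or two cores busy during the overhead bursts), which is a simplification, and the function names are made up for illustration.

```python
def average_cores_used(overhead_fraction, base_cores=1.0, peak_cores=2.0):
    """Time-weighted average core usage of one instance in a two-state model."""
    return (1 - overhead_fraction) * base_cores + overhead_fraction * peak_cores

def instances_before_saturation(total_cores, cores_per_instance):
    """How many instances fit before average demand exceeds the available cores."""
    return total_cores / cores_per_instance

avg = average_cores_used(0.30)  # overhead active about 30% of the time
print(f"average usage per instance: {avg:.2f} cores")
print(f"4 cores at that usage: {instances_before_saturation(4, avg):.2f} instances")
print(f"6 cores at 1.15 cores each: {instances_before_saturation(6, 1.15):.2f} instances")
```

This only covers average demand. During the 200% peaks, several instances can still contend for cores at once, which would explain occasional slowdowns even below the saturation point.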

When running test beds, I run one instance per core (and disable turn skipping), so all cores are used all of the time.

Running a benchmark restricting each instance to a single core would remove that parallel overhead.

MN‎

If I had to guess... I'd guess that the peaks would be JVM garbage collection, because that does happen in bursts and does run in its own thread(s), independent of whatever Java code is running.

Can also be the JIT compiler, which compiles code in parallel by default. It activates at least once for each new battle.

MN‎

Oh yeah, that does remind me, I do have some pretty wicked memory in here, tuned just so to get maximum speed out of it (which, in most things I do, usually affects the overall feel of speed my computer has more than pure CPU power).

If I recall my DDR3 is running at 1600 with timings 7-8-7-20.

— Chase-san‎

Quick little note to compare, DDR3 running at 1600 here too, but with 9-9-9-24 timings. Anyway, at 4 threads I don't suspect I'm hitting memory bandwidth bottlenecks, whereas it looks like I may be at 6 threads.

The cores all share an L3 cache too... I wonder if it's worth the extra $100 to get 1600 RAM (and the mobo that supports it). I've been buying Macs for the last 5 years, I feel like such a noob examining this kinda thing again. =)

Extra $100 for 1600MHz RAM? My ram only cost $50 for 8GB, and I didn't see notably cheaper in slower ram really. As far as motherboard, mine was a little fancier than some others, but it was only about $115. So... It shouldn't cost $100 extra for 1600MHz ram.

Yeah, I'm looking at barebones kits which default to a pretty cheapy motherboard, so most of that was to upgrade to a decent one. Prolly worth it anyway, and while it's not quite Apple-level gouging on memory itself, I guess it's universally true that I should buy/install my own. ;)

That's great info Rednaxela, thanks! Btw, how are you timing the battles so precisely, and measuring JVM startup time? I wasn't expecting 3 decimal places. =)

I'm getting 79s / battle single-threaded and 42s / battle dual-threaded on my MacBook Pro (Core 2 Duo 2.8 GHz), just trusting the times output by RoboResearch. So it looks like you're almost 3x as fast, which is pretty darn close to the PassMark scores (6053 vs 2029). So maybe I can hope for 5x as fast with the 2600K after all, which would be fabulous!

For timing I'm just running the *nix command "time ./robocode.sh -nodisplay -battle battles/diamond.battle". For Robocode startup time (including JVM startup but not just that), I'm just roughly estimating by watching the command line output.

Here's some fun... I tried using my motherboard's "automatic overclocking" functionality where it autonomously tries to see how high it can clock things, and it decided it could get it up from 3.2GHz to 4.2GHz (+30%). Both Windows and Linux booted fine, so initially I thought it was stable, and it ran one robocode battle at a time fine, but as soon as I tried to run multiple in parallel, the JVM kept crashing and it became apparent that the +30% overclock was not stable despite OSs booting fine. Interesting thing was, the +30% overclocking seemed fine thermally even with the stock cooler, it just had other stability issues.

I'm now running a more modest +12.5% CPU overclock, and I got the Diamond versus Diamond runs down to 12.837 seconds-per-battle, running 6 in parallel. This is still with memory running at 1600MHz, so I guess what I was hitting before wasn't purely a memory bottleneck anyway. Also, huh, 22.5% increase from a 12.5% CPU overclock...
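To make the percentages explicit, here is a small sketch of the comparison using the two 6-instance results from this thread. The variable names are illustrative. Note that a throughput gain and a reduction in time per battle are different numbers for the same pair of measurements.

```python
before = 15.736      # s/battle, 6 instances at the stock 3.2GHz
after = 12.837       # s/battle, 6 instances with the +12.5% overclock
clock_gain = 0.125   # 3.2GHz -> 3.6GHz

throughput_gain = before / after - 1   # extra battles per unit of time
time_saved = 1 - after / before        # reduction in seconds per battle

print(f"clock: 3.2GHz -> {3.2 * (1 + clock_gain):.1f}GHz")
print(f"throughput gain: {throughput_gain:.1%}")
print(f"time per battle saved: {time_saved:.1%}")
```

The increase quoted above corresponds to the throughput figure. Since it exceeds the clock increase, some of the difference is presumably run-to-run variation or startup overhead rather than pure CPU speed, though the posts do not settle that.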

I have an Intel 2600K (3.4GHz, 4.4GHz w/ Turbo Boost) and I did a quick benchmark for you, using Diamond 1.6.8, no GUI, and PowerShell to measure.

1 Instance, 35 Rounds:

- 48.1 seconds Total

2 Instances, 35 Rounds:

- 47.28 seconds Total
- 24.17 seconds per Battle

4 Instances, 35 Rounds:

- 1:05 Minutes Total
- 15.2 seconds per Battle

8 Instances, 35 Rounds:

- 1:31 Minutes Total
- 11.37 seconds per Battle

Though I noticed that Powershell had a small delay between creating each instance, not sure why. I haven't had a look at RoboResearch, so maybe I'll have a look at that later.

Not sure if it's just my benchmark setup, but if Rednaxela could send over the benchmark setup, maybe I'll be able to test it in the same way.

I'm just running the unix command "for i in {1..8}; do sh -c "time ./robocode.sh -nodisplay -battle battles/sample.battle > /dev/null &"; done" with the battle file set to run diamond versus diamond 35 rounds. I then take the average time outputted from each "time" command and use that.
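For those without a *nix shell handy, roughly the same measurement can be sketched in Python. This is an approximation of the procedure described above rather than the exact setup used: it starts N copies of a command at the same moment, records each copy's wall-clock time, and divides the average by N. The helper names are hypothetical, and the demo runs a harmless stand-in command so the sketch works anywhere.

```python
import subprocess
import sys
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

def timed_run(command):
    """Run one command to completion and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(command, stdout=subprocess.DEVNULL,
                   stderr=subprocess.DEVNULL, check=False)
    return time.perf_counter() - start

def time_parallel(command, instances):
    """Launch `instances` copies of `command` at once; return one wall time per copy."""
    with ThreadPoolExecutor(max_workers=instances) as pool:
        return list(pool.map(timed_run, [command] * instances))

def seconds_per_battle(times):
    """Average per-instance time divided by the number of parallel instances."""
    return mean(times) / len(times)

# Stand-in command (an empty Python process), NOT a Robocode battle.
demo_command = [sys.executable, "-c", "pass"]
times = time_parallel(demo_command, 4)
print(f"{mean(times):.3f} s per instance, {seconds_per_battle(times):.3f} s per battle")
```

To time real battles, replace `demo_command` with the command line from the post, for example `["./robocode.sh", "-nodisplay", "-battle", "battles/sample.battle"]`, run from the Robocode directory. Unlike the shell `time` command, this reports only wall-clock time, not user and system CPU time.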

Hmmm... 8 instances... I just tried 8 instances here and got the following result on my Phenom II 1090 X6 that's clocked up from 3.2GHz to 3.6GHz..... 1m35s per instance average, 11.90 seconds per battle.

Voidious: It seems like the 2600K may not be as fast for robocode as non-robocode benchmarks would lead you to believe?

Yep, I was guessing closer to 5-6x - the PassMark score is 5x my current CPU, and I figured if anything Robocode would scale better to more cores than the average benchmark. I really appreciate all the real world info! Definitely impacting my purchasing decision.

Oh, and one little warning, when I run the X6 1090 at the OCed 3.6GHz, with 8 robocode instances, and stock cooling, it pushes the CPU temperature awfully high (61C when the CPU is spec'ed for a maximum of 62C). Pondering clocking it back to 3.2GHz now that I noticed that, haha.

On a completely unrelated note, the maximum temp is 62? That's pretty low... I know I have pushed my Core 2 P8400 (mobile processor) to 95C before the system shut down to protect the CPU. 60C is my standard CPU temp when I am not in an air-conditioned room (50C in an air-conditioned room). My graphics card also goes up to 108C without problems...

Nat Pavasant‎

Mobile processors are spec'ed completely differently for heat, from what I've seen (my old Core2Duo laptop was spec'ed for up to 105C, for instance).

Aha, I think I found why...
Compare i7-2640M (mobile) and i7-2600K. Notice that the mobile part is specified with "Tjunction" whereas the desktop is specified with "Tcase". It appears desktop CPUs use temperature measurements of the packaging temperature whereas mobile parts directly measure the die temperature. Fun stuff :)

Thanks Rednaxela! I've rerun the test without the little delay between starting up instances and calculated it the same way (if I understood it right). This is what I got: 1:17 per instance average, 9.70 per battle.

I don't have *nix to test on my home computer at the moment. My server has *nix, but it's stuck with a Quad-core Xeon in it ;)

Cool, thanks Cuoq! I wouldn't have expected hyperthreads to help so much to scale beyond 4 threads. So it looks like up to 4x as much Robocode throughput as I have now is a pretty solid estimate. Now I just need to grapple with whether that's worth a few hundred bucks... =)

I re-ran some tests here with Diamond 1.6.7 on a fresh Robocode 1.7.3.2, using the time command like Rednaxela. I'm seeing about 57s/battle when I run one at a time and 39s/battle when I run 2 in parallel (just duplicating the command, adding &, taking the max elapsed). I forgot my RoboResearch dir had a cranked CPU constant.
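Putting the best seconds-per-battle figures from the thread side by side gives a rough throughput ratio. This is only a sketch: the dictionary labels are shorthand, and the runs used different Diamond and Robocode versions and different timing methods, so the ratios are ballpark figures at best.

```python
# Seconds per battle reported in this thread (Diamond vs itself, 35 rounds).
results = {
    "MacBook Pro Core 2 Duo, 2 in parallel": 39.0,
    "Phenom II X6 1090 @ 3.6GHz, 8 in parallel": 11.90,
    "Core i7-2600K, 8 in parallel": 9.70,
}

baseline = results["MacBook Pro Core 2 Duo, 2 in parallel"]
for name, seconds in results.items():
    # Throughput ratio: how many battles finish in the time the baseline runs one.
    print(f"{name}: {baseline / seconds:.2f}x")
```

On these numbers the 2600K lands at roughly 4x the MacBook Pro, in line with the estimate given earlier in the thread.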

Belated thanks to everyone for the input... Finally pulled the trigger on a Core i7-3770 this weekend. I'm so stoked! Waiting a few hours for dev versions of Diamond to run 500-1000 battles against my test bed has been unbearable lately!

FYI, running Diamond 1.6.7 vs itself as above, on Robocode 1.7.4.0, I get:

1 instance: 65s (yes, 1 is taking longer than 4 right now, wtf?)
4 instances: 4th one finishes at 53.34s. (~13.34s / battle)
8 instances: 8th one finishes at 80.47s. (~10.11s / battle)

One thing I find very odd is it's setting the CPU constant higher here than on my MacBook Pro (5.8m vs 4.0m). Seems like running Diamond 1.7.37 vs itself is about twice as fast, so I'm not sure what's up with that.

Anyway, enough screwing around, time to kick off some RoboResearch. =)

Probably Intel Turbo Boost is confusing the Robocode engine. The engine assumes CPU speed is constant, which is not true with Turbo Boost. The CPU constant is being measured while the clock is still low.

I'd guess dynamic clocking (i.e. Turbo Boost) also explains the 4 instances running faster than 1, since 1 instance may not be driving CPU usage high enough for the CPU to go to its full clock speed.

I figured you guys were right, but I tried running 1-3 threads of RoboResearch to trigger any clock increase and recalculating the CPU constant while that was going on, and it came out even higher. My best guess is the benchmark used to calculate CPU constant is just way more optimized on Mac and/or on Apple's JVM than in Ubuntu/OpenJDK. Still sounds like a good bet on the slow result for 1 instance though.

Try running background threads in lower priority, or clients in higher priority.

But I've never tried OpenJDK; I'm using the Oracle/Sun HotSpot JVM here.

MN‎

Retrieved from "http://robowiki.net/wiki/Thread:User_talk:Voidious/CPU_benchmark_advice#CPU_benchmark_advice_1131"