View source for Talk:RoboRunner
Contents
Thread title | Replies | Last modified |
---|---|---|
Can i ask you to have a look at .... | 9 | 08:45, 28 August 2012 |
Possible bug report | 13 | 22:17, 23 August 2012 |
calculating confidence of an APS score | 12 | 02:31, 15 August 2012 |
smart battles | 9 | 16:15, 13 August 2012 |
priorities | 7 | 09:24, 1 August 2012 |
Congrats! | 6 | 17:21, 28 July 2012 |
Hi mate. I'm not sure if i can bother you to have a look at the RoboRunner changes i made. Maybe the development state is a little too early, but i would like to know what you think.
What is new:
- configuration - included, no need for external scripts, and Windows should be supported as well
  - just type CONFIG and go through the options
  - this creates all internal Robocode directories (depending on how many installations you want)
  - it should be quite fail-proof and checks the input for validity
  - you can also reconfigure it if you want to switch to another Robocode version or something
- challenges - can be switched on the fly
  - type CHAL to go through the options
  - the challenge file format should be the same as it was before
  - all missing bots will be copied to the instances (well, not new, but it works like before)
- the processes stay initialized
  - that means once you have started the instances, they are ready to take more battles after the challenge is over
  - or you can switch to another challenge and run it on these instances as well
- you can stop running challenges
  - if you type STOP while the challenge is running, all processes stop the current battles and can be restarted (they stay initialized)
- with DEBUG you get additional information (about the messages and some standard output from the processes) - i plan to make this configurable so you can see what you want
- with AUTORUN the next time you start the program it takes the last configuration and challenge and runs it automatically
- with STATUS you get the configuration and running state of all processes
- HELP shows some help (not much yet)
What is not ready yet:
- everything with result output is not finished yet
  - the results are coming back from the processes (and will be printed to the console) but there is no processing of this information right now
  - i plan to take your code and then it will be possible to print results with whatever output you prefer (also offline results)
  - basically this means you can say: show me result bla (avgDmg,score,cats,dogs - whatever) and it will be extracted from the currently available results
I would be interested to know if it runs with all the CPUs you have and has no concurrency issues, and if the usability is ok.
I had to change quite a lot, but it still has the spirit of your RoboRunner and should work the same as yours. It is based on a very basic communication protocol to make it extensible for later needs.
You can find it here: roborunner_wompi.zip. To start it, just run ./rr.sh (same as yours).
If you don't trust the class files, the sources are included or available from GitHub as well.
You don't have to play much with it, just a quick start and configuration and one of the example challenges would be great. This should only take a couple of minutes.
Take Care
Cool, I'd be happy to take a look! I particularly like the idea of having a Windows-compatible setup. I think my main concerns from your changes are:
- If RoboRunner stays running after finishing a challenge, that would screw up how I often use it, which is having multiple dev versions queued up in separate runs via shell script. So I'd either need to make that configurable, or also add support for running batches (which I guess would also be necessary if ever we have a GUI).
- The interactive commands sound really powerful and I would definitely use them :-), like when a dev version is tanking and I want to just kill it and move onto the next one I have ready to go. But I also like simplicity and not having a big learning curve to using RoboRunner, so I just want to make sure it doesn't feel like "you have to learn a bunch of commands" in order to use it. So I'd like to make sure you can get by without knowing them, and/or that they're really easy to find and learn about. A lot of how I use RoboRunner is queueing up a few runs and leaving it for hours, so I definitely want to keep full support for non-interactive batch runs too.
Thanks man! Very cool to have someone else using and contributing to this. :-)
And I promise I have not forgotten about the custom scoring, I just haven't gotten to trying out some of my ideas with it. I was curious if you're still using that in your development? And if so, what kind of stuff do you collect and how do you like it?
Yes, i rewrote most of the original stuff to be highly configurable. I'm a little unhappy with all the changes right now, but i hope in the end it will pay off to have something really nice to run test beds. It's fairly easy to provide batch runs with multiple dev versions. I just have to make the challenger input ','-separated and then it runs all challengers against the current challenge. Or maybe an input file where all challengers are included (linked to whatever challenge).
I guess i use RoboRunner in a slightly different manner right now. While working on my bot i make a quick dev jar and let the runner run a couple of battles against my test beds. That way i can still make changes, and in the background the first results can show me if i was wrong or if i'm on the right track with my changes. That's why i wanted the processes alive. What i had in mind was having RoboRunner running indefinitely, and if a new version arrives it just grabs it and runs the challenge against the new bot version. I can also switch the challenge on the fly, so if i think i need another view of my development state it's just one switch at the console. I'm definitely on your side about having RoboRunner do its stuff without maintenance. The console is just a tool to make changes if you think you have to. And besides the config stuff, it's just a run command now.
The main reason i switched to a communication protocol is to have the possibility to improve later versions with more fancy stuff, like diagrams of certain battle states and such. I don't know if you run melee tests as well, but for me it's better to have just a couple of precise test beds rather than letting the challenger run against everyone above a certain level. I guess this will be more important if i go for an appropriate 1v1 strategy (someday :)).
About the custom scoring, it's more custom battlefield statistics for me. It's still one of my main targets for RoboRunner. Some of my statistics gave me quite a nice view of what's going on on the battlefield and what i should watch out for. Like average field population (where are the most crowded spots and how was my survival at these spots) or bullets fired far away from me with more than 6 opponents on the field (how often did i catch a hit from these bullets and where would be a better place to stay). Yes, it's quite a bunch of other stuff too and most of it is just not worth it but, you know, sometimes you have to think strange.
Oh no - real tabs, braces on their own lines, lines over 80 chars?! :-) It's funny, at my last job, almost every file had its own different code style, so I was very flexible. My current job is much more rigid on code style, and now I realize I've also become more rigid... Might have to reformat some stuff, at least in the main package. :-) I just took a quick look for now, though. I'll look more and test out your stuff when I get home later. Btw, should I hold off playing with the code much if you're still making major changes?
The first thing I wanted to do with custom scoring was to track start positions for Diamond in 1v1, to see if there was any pattern to which rounds he lost. Like, when he loses 1 random round out of 35 to Raiko, is it because we started near each other? Or I started in a corner? Or I didn't start in a corner? Or is it just always the first round before we have much data? Seems like there could be a lot to gain just shoring up some of the "unlucky" stuff that can happen in a battle. But yeah, I realized most of the useful stuff you'd do with it would be custom stats like that, not traditional "scores" like with percent score or bullet damage. Passing values back from RoboRunner and storing them in the XML should be no problem, I just want to come up with a nice/simple/flexible API for the listener to log the values and RoboRunner to format them. I might want to peek at some of your code for the dynamic class loading, since I haven't done much of that before.
If you're wondering about Raiko (or any other multi-mode random movement bot), what I find is that I win the first round because of their Musashi trick / stop-n-go / anti-simple movement, and then whether I win the next two rounds is a matter of luck, until I can unlearn their anti-simple movement and learn their actual movement. Simultaneously, these rounds are the ones where their non-rolling-average VCS guns are still adapting quickly, so they get a decent hitrate against the surfing still.
Ah, very good points. The Musashi trick shouldn't last long (stops as soon as they're hit once), but certainly the 1+ rounds of stop and go would screw up my guns. Maybe it's worth special casing that and clearing gun data once you detect the switch.
And the relative fast-learning of their guns in early rounds isn't something I'd thought about much. Maybe there's a place for some light flattening early on as soon as you know they're using something besides simple targeting.
I've just been thinking a lot about all these bots that take 1-2 rounds off of Diamond (and DrussGT). If you're winning 95% of rounds vs a given bot, maybe you’re just a little consistency away from 99%. And it could be from simple stuff, like starting a round cornered or too close. But it might take some real research to figure out some causes (or just give up and accept "randomness" =)).
Hehe, the formatting style discussion i know all too well :). I have no problem if you reformat it to whatever you think is appropriate, i'm used to reading all kinds of code style (working on it is another discussion :) ). I don't know if you develop with Eclipse, but if so, just give me your formatting file and we will see how it works :). I started programming when all these formatting rules made sense (80x40 terminals). You wouldn't get far with braces on their own line and lines over 80 chars, but these days are long, long gone :) and with today's monitor resolutions i don't see why i shouldn't use them. I'm surprised that you still have discussions on code style at work. We had a check-in format at work and everyone had to use it before checking in files to CVS. If you check out the files, just format them to whatever you like and that's it. Maybe i can convince you to give up the 80 chars per line rule, because with all these method().method().method() calls it is quite hard to maintain a readable code line. Well, like i said, format to whatever you think is good :).
When it comes to changes, don't hold back, do whatever you want (in your branch or in mine, no matter). This Git stuff is way more fail-resistant on merges than CVS/Subversion and i have no doubt that i can handle the changes. The code right now is still in draft status and i haven't looked at where i could bring some things together or should open things up and pull them apart. I just wanted to bring it to life and start the improvements from there. I'm thinking of changing the communication to RMI or TCP anyway, but for now i'm fine with the process in/out like you did. It's also ready to take a GUI (but that's not really something i'm thinking of right now). I'm not sure if i have the dynamic stuff included right now, i guess it is still in another project, but i think i will bring it over within the next couple of updates.
Reading about Diamond's start positions made me wonder whether it would be rewarding to use the start position feature of the RobocodeEngine. This would give you the opportunity to set all kinds of interesting start positions and just rumble it out.
I basically use the default Oracle/Sun format and style guide, just with 2-space tabs (no real tabs). 2-space tabs makes 80 char lines a lot more reasonable, too. I generally don't auto-format because some stuff is still a judgement call, like breaking lines in the clearest way.
I certainly learned a long time ago that code style is just something you have to compromise on to get anything done in a collaborative setting. =) I didn't quite realize I’d become so accustomed to one Java code style until browsing your code. And yeah, I was also surprised at first that code style was so enforced at my current job. It was an adjustment after having the exact opposite situation at my previous job (just stay consistent within a given file). But now I like it, and/or am brainwashed. At a large enough company with a lot of code sharing, it kind of makes sense to just settle on something. Deep down I’m not actually a psycho about code style, but maybe keeping it consistent within each package makes sense, and wouldn't be too painful for either of us.
Ok, gave it a shot and everything seems to be working fine with 8 threads on my Linux box. Actually I like the feel of this environment more than I expected to, it's very cool. And I like the extra stuff you save in roborunner.properties now. Looks like some of the output is just incomplete for now (I didn't see overall scores at all?), and there's definitely plenty of room to polish up the usability side of things, but that stuff's a pleasure to work on once you have all the core stuff working. =)
I'd be happy to take a pass at improving some of the usability stuff, or just writing up a list of ideas, but maybe I'll wait a bit if you've still got some stuff in progress with this. Nice work man!
Thanks man. Sure thing, your improvements and ideas are very welcome. The current design is open for quite a lot of directions.
Like i said, everything that relates to score processing (collect, save, average, overall score, smart battles) is not implemented yet. The results you can see are the results of every battle and contain all available data fields of the onCompleted() BattleResults. They just have to be parsed and processed now. I just wanted to have the config, chal and messaging stuff ready because i know myself too well. Once implemented, i probably won't touch it ever again because i don't like writing code around path, file and input checks. Now that that's finished, i can spend all my time on making the result/statistics stuff smooth and fluffy. That's why i wanted to have your opinion about it, to make the related changes now.
Heya Voidious,
I think I may have found a bug.
I finished a run of deBroglie rev0130 last night on the test bed you made for me. Score was in the lower 80s.
Just now, I manually made a .rrc testbed with some high performing bots. Started running it, and here's the output. Looks like RoboRunner is carrying over the score from the other challenge file?
~/roborunner $ ./rr.sh -bot tjk.deBroglie rev0130 -c debroglie_mega.rrc -seasons 20
Copying missing bots... 0 JAR copies done!
Initializing engine: ./robocodes/r1... done!
Initializing engine: ./robocodes/r3... done!
Initializing engine: ./robocodes/r2... done!
Challenger: tjk.deBroglie rev0130
Challenge: deBroglie Megabot test
Seasons: 20
Threads: 3
tjk.deBroglie rev0130 vs lxx.Tomcat 3.67c: 39.79, took 57.6s, avg: 39.79
Overall score: 81.16, 170.42 seasons
tjk.deBroglie rev0130 vs voidious.Diamond 1.8.1: 31.91, took 72.3s, avg: 31.91
Overall score: 80.83, 170.5 seasons
tjk.deBroglie rev0130 vs jk.mega.DrussGT 2.7.3: 37.2, took 82.0s, avg: 37.2
Overall score: 80.54, 170.58 seasons
Yep, it seems I'm printing the overall score for every bot you've faced, not just the ones in the current challenge file that's loaded. I'll see about fixing that later today. You can just delete (or rename for now) the file from the data directory if you want to start fresh. Thanks!
Or you could keep/copy just the lines for those bots in the data file, if you feel like mucking with it.
Ok, posted the fix in 1.0.1. The only things to update are the RoboRunner JAR and rr.sh, which points to it. It was just a problem with the output, so things should work fine with your old data file, if you still have it.
Hi mate. I got a little Exception :)
java.util.concurrent.ExecutionException: java.util.concurrent.ExecutionException: java.lang.ArithmeticException: / by zero
    at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
    at java.util.concurrent.FutureTask.get(FutureTask.java:83)
    at robowiki.runner.BattleRunner.getAllFutures(BattleRunner.java:95)
    at robowiki.runner.BattleRunner.runBattles(BattleRunner.java:80)
    at robowiki.runner.RoboRunner.runBattles(RoboRunner.java:338)
    at robowiki.runner.RoboRunner.main(RoboRunner.java:89)
    ...
Caused by: java.lang.ArithmeticException: / by zero
    at robowiki.runner.RoboRunner.printOverallScores(RoboRunner.java:485)
    at robowiki.runner.RoboRunner.access$4(RoboRunner.java:466)
    at robowiki.runner.RoboRunner$3.processResults(RoboRunner.java:734)
    at robowiki.runner.BattleRunner$BattleCallable$2.run(BattleRunner.java:197)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
One question. If i fork RoboRunner to my GitHub repositories and make changes, does it mean i have a new project or is it more like a separate branch and we could merge some changes i made?
Take care
edit: stupid me, i posted just the head
Seems like this would only happen when printing the overall score for 0 battles? Is it possible that was the situation? If so I'm not as worried about it being a bug, but we should check for it and print something nicer. If it shouldn't have had 0 total battles, then it's a deeper problem with the score tallying I guess.
This is my first experience with GitHub, so I don't know for sure, but I'm pretty sure the main idea behind forking is for you to make changes and then I can pull them back in. I think you issue a "pull request" once you've made your changes. I also think it can function fine as a new project if you don't ever intend to merge back.
Yep, the problem is deeper. It looks like everything is fine if i had no battles before (not 100% sure). Then, if i restart the test run, this Exception comes up. I ran it with 20 seasons (melee).
I just made a quick fix for myself so i can still use it. The only thing i lost was the 'Overall' score output - but i'm fine with the 'Average' output.
I have forked your repository and made a new branch from the main branch, not sure if you can see it on your side too. The only thing i changed so far is the output of the melee score (just formatting). Yes, i guess i will mainly use it as a new project and tweak it to my needs, but i thought for little bug fixes it would be easier to just merge the branches.
So you can see the latest and average score for each battle, but overall score throws that exception? How strange. Could you post your data file somewhere so I can try to reproduce? That would be super helpful. (roborunner/data/package.BotName version.xml.gz)
I'd certainly like to pull back any bug fixes or awesome new features. =) What's your melee score output look like? I've used it for Melee a little but mostly 1v1, and even for Melee I tend to focus on overall score, so I'm open to suggestion. I've also considered a -verbose option (or something) for printing extra scoring details, like survival/bullet damage even when you specify APS as the scoring style.
Ok, there it is: RoboRunner-bugtrace.zip. Looks like i was wrong, it happens straight from the start. I deleted all xml files and the output of the first run is shown in the zip file. Maybe it helps :). Let me know if you need more. I broke off the run after the second season with 'CTRL-C'.
Well, i just made the melee output a little more 'eye' friendly :) but i guess i will enhance the output in the next days to something like what i use in my other outputs (nothing serious, just a little more info on the bullet hit ratio of all bots, some movement stats, sorted output of APS and a table of all bot scores against each other). Based on an early RoboRunner version, i rewrote it as a console-like application. So basically you start the program and use console commands to configure, run and output some stuff. Unfortunately it does not use multiple threads, so i'm now back to the latest RoboRunner and maybe i can merge the two somehow.
I think if you look in GitHub at Network you should see the forks that go off of your main branch.
Great, thanks! I was able to duplicate it here and figured out the problem. RoboRunner gets confused by having 2 of the same bot in a battle (mld.DustBunny 3.8 in this case). It looks like BattleListener eats the result right away when it builds a map of scores by bot name/version. (Edit: So RoboRunner has zero scores for the actual bot list when it tries to calculate overall score.) I have to head out in a few minutes, but I'll try to get a fix out later tonight or tomorrow.
Ok, I think it's all set. Tested it with the challenge you provided, dropped it into my currently running melee test a half hour ago and that still looks right, and did my usual round of manual setup and tests. The fix was mostly pretty easy thanks to Guava's Multimap stuff, but it also led to some minor refactoring so that nothing is based on looking up a score only by bot name, besides the challenger bot. I think it should work fine even if the challenger is also a reference bot, even though that seems silly - the first score for that bot in each battle would be considered the challenger score.
Hopefully it won't be too painful of a merge for you. ;)
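A minimal sketch of the kind of collision described above, for readers who want to see it in isolation. This is not RoboRunner's BattleListener code: the class and method names are hypothetical, the scores are made up, and plain `java.util` collections stand in for Guava's Multimap to keep the example dependency-free. It shows how a map keyed only by bot name silently collapses two copies of the same bot, while a name-to-list grouping keeps both results.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: hypothetical helper, not RoboRunner's actual code. */
final class DuplicateBotScores {
  /** One score per name: a later entry for the same bot replaces the earlier one. */
  static Map<String, Double> lastScoreByName(String[] names, double[] scores) {
    Map<String, Double> byName = new LinkedHashMap<>();
    for (int i = 0; i < names.length; i++) {
      byName.put(names[i], scores[i]);
    }
    return byName;
  }

  /** Every score per name: duplicates are kept, in battle order. */
  static Map<String, List<Double>> allScoresByName(String[] names, double[] scores) {
    Map<String, List<Double>> byName = new LinkedHashMap<>();
    for (int i = 0; i < names.length; i++) {
      List<Double> list = byName.get(names[i]);
      if (list == null) {
        list = new ArrayList<>();
        byName.put(names[i], list);
      }
      list.add(scores[i]);
    }
    return byName;
  }

  public static void main(String[] args) {
    // Three robots in one melee battle, two of them the same bot; scores are invented.
    String[] names = {"wompi.Wallaby 4.7", "mld.DustBunny 3.8", "mld.DustBunny 3.8"};
    double[] scores = {120.0, 80.0, 95.0};
    // The plain map ends up with fewer entries than there were robots.
    System.out.println(lastScoreByName(names, scores).size());
    // The grouped map still holds both DustBunny scores.
    System.out.println(allScoresByName(names, scores).get("mld.DustBunny 3.8"));
  }
}
```

Any code that later expects one map entry per robot in the battle will come up short with the first variant, which is one plausible route to a zero count and a division by zero.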
Yep works fine. Thanks. It wasn't supposed to have two of the same bots within the challenge :) - i realized that i just took an old challenge file while switching to the new RoboRunner version. But i guess in this case it was luck to detect the bug.
I tried yesterday to make the challenger a development bot. I changed the copy bot function to let bot names with ..* through, but somewhere it lost the name. Can you give me a hint where the bot name comes back from the process? The RobocodeEngine can work with development bots if the properties contain the right path. What it does, if you give it, let's say, wompi.Wallaby*, is change it to wompi.Wallaby* 4.7 for the result output (this works so far). Now i thought i could just change the name back to my original (wompi.Wallaby*) within the BattleResultHandler (i guess this is where the results come back from the process) and then work with development bots. It was just a quick try and i will try it more seriously today, but maybe you have a quick solution. I guess you are more used to your code and could say where it stores references between name and score. The sad thing is, even if i give it the complete name (wompi.Wallaby* 4.7) it doesn't work :(. I guess somewhere the "*" is a delimiter or gets lost. Please don't put any time into this, it would just be nice if you have a quick hint.
I have to admit that this GitHub stuff is very neat. It's so easy to work with - thanks for pointing me at this by releasing RoboRunner over GitHub. I'm a little more used to it now and figured out how the forking works.
It's basically:
- fork your origin -> my origin
- clone to local
- (optional) make branches
- add your origin as remote (this keeps me up to date with changes on your side)
- merge remote -> my branch
- push my branch -> my origin
- (optional) make a pull request to you
It's pretty straightforward, and with GitX you have a nice graphical view of the branches too :)
Take Care
I don't think the name should be interpreted as a regex anywhere or anything like that. I think that whatever comes back from robotResults.getRobot().getNameAndVersion() in BattleListener should be handled by the rest of the code OK. The other points of concern that come to mind are:
- Copying the dev bot into the Robocode install directories means copying your package dir and classes into the robots dirs of each Robocode install, which is not as simple as copying one file. (Unless you have them all configured to look at some other directory?)
- Assuming Robocode can find it, checking whether the dev bot you specify is actually running in the battles.
For the second point, you could try:
- Modifying BattleProcess to do _engine.setVisible(true), so you could see the battles that get run.
- Run robowiki.runner.BattleProcess (with -path to Robocode, -rounds, -width, -height) and try running battles with your dev bot. BattleProcess is a command line application where you can type in a comma delimited list of bots (like "jam.mini.Raiko 0.43,voidious.Diamond 1.8.1") and it runs the battle and spits out the result.
And yeah, I'm liking GitHub too! I know PEZ is a big fan, though he's not doing Robocode stuff. I didn't know about GitX, I'll have to give that a shot. Maybe it will encourage me to make better use of branches. ;)
Hey resident brainiacs - I'm displaying confidence using standard error calculations on a per bot basis in RoboRunner now. What I'm not sure of is how to calculate the confidence of the overall score.
If I had the same number of battles for each bot, then the average of all battles would equal the average of all per bot scores. So I think then I could just calculate the overall average and standard error, ignoring per bot averages, and get the confidence interval of overall score that way. But what I want is the average of the individual bot scores, each of which has a different number of battles.
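To make the distinction concrete, here is a small sketch of the two averages. The class and method names are hypothetical and the scores are invented; the only point is that the two numbers agree when every bot has the same number of battles and drift apart when the counts differ.

```java
/** Illustrative only: hypothetical names, made-up scores. */
final class OverallScore {
  static double mean(double[] values) {
    double sum = 0;
    for (double v : values) {
      sum += v;
    }
    return sum / values.length;
  }

  /** Average of per-bot averages: each bot counts once, however many battles it had. */
  static double meanOfBotMeans(double[][] scoresByBot) {
    double sum = 0;
    for (double[] botScores : scoresByBot) {
      sum += mean(botScores);
    }
    return sum / scoresByBot.length;
  }

  /** Average over all battles: bots with more battles weigh more. */
  static double pooledMean(double[][] scoresByBot) {
    double sum = 0;
    int count = 0;
    for (double[] botScores : scoresByBot) {
      for (double s : botScores) {
        sum += s;
        count++;
      }
    }
    return sum / count;
  }

  public static void main(String[] args) {
    double[][] equalCounts = {{100, 100}, {50, 50}};
    double[][] unequalCounts = {{100, 100, 100, 100}, {50}};
    System.out.println(meanOfBotMeans(equalCounts) + " vs " + pooledMean(equalCounts));
    System.out.println(meanOfBotMeans(unequalCounts) + " vs " + pooledMean(unequalCounts));
  }
}
```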
Something like (average standard error / sqrt(num bots)) makes intuitive sense, but I have no idea if it's right. Or maybe sqrt(average(variance relative to per bot average)) / sqrt(num battles)?
This would also allow me to measure the benefits of the smart battle selection.
I don't actually think this can be correctly modelled by a unimodal distribution - you will be adding thin gaussians to fat gaussians, making horrible bumps which don't like to be approximated by a single gaussian mean+stdev. I almost wonder if some sort of Monte-Carlo solution wouldn't be most accurate in this instance - at least the math would be easy to understand.
Good call! That was super easy. I don't recall this Monte-Carlo stuff, but the name rings a bell so maybe I learned about it at some point.
So I calculate 100 random versions of the overall score. For each battle that goes into it, instead of the real score, I generate a random score, assuming a normal distribution using the mean and standard deviation I have for that bot. Then I take the standard deviation of those randomized overall scores and multiply by 1.96 for the confidence interval. Seems like a lot of calculations, but only taking a few hundredths of a second even with 250 bots/3000 battles, so I can afford to do it even when I print the overall score after every battle. Nice!
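The procedure described above could be sketched roughly as follows. This is a reading of the post, not RoboRunner's actual implementation: the class name, method signature, and sample numbers are all assumptions, and the overall score is taken to be the plain average of per-bot averages.

```java
import java.util.Random;

/** Illustrative only: a sketch of the Monte-Carlo idea, with hypothetical names. */
final class MonteCarloConfidence {
  /**
   * Simulates `trials` versions of the overall score and returns 1.96 times their
   * standard deviation. Assumes every bot has at least one battle and trials >= 2.
   */
  static double overallHalfWidth95(
      double[] means, double[] stdDevs, int[] numBattles, int trials, Random rng) {
    double[] overall = new double[trials];
    double overallSum = 0;
    for (int t = 0; t < trials; t++) {
      double sumOfBotAverages = 0;
      for (int b = 0; b < means.length; b++) {
        double botSum = 0;
        for (int i = 0; i < numBattles[b]; i++) {
          // Replace each real battle score with a draw from that bot's normal model.
          botSum += means[b] + stdDevs[b] * rng.nextGaussian();
        }
        sumOfBotAverages += botSum / numBattles[b];
      }
      overall[t] = sumOfBotAverages / means.length;
      overallSum += overall[t];
    }
    double overallMean = overallSum / trials;
    double squares = 0;
    for (double o : overall) {
      squares += (o - overallMean) * (o - overallMean);
    }
    return 1.96 * Math.sqrt(squares / (trials - 1));
  }

  public static void main(String[] args) {
    // Made-up per-bot statistics: mean score, standard deviation, battles so far.
    double[] means = {98.4, 81.3};
    double[] stdDevs = {1.2, 5.5};
    int[] numBattles = {13, 33};
    System.out.println(overallHalfWidth95(means, stdDevs, numBattles, 100, new Random()));
  }
}
```

With only 100 trials the result itself is noisy from run to run. Under the same assumptions (independent, normally distributed battle scores), the simulated value should hover around 1.96 * sqrt(sum of stdDev_i^2 / numBattles_i) / numBots, which makes a handy sanity check.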
I'm curious - did you use the Monte-Carlo method for calculating the non-smart-battles deviations?
Also, how long did it take to get the 3000 battles compared to the non-smart-battles?
I'm using the same Monte-Carlo method for confidence either way. I hadn't run too many side by sides yet, but I'll do some more soon. Over night, I ran a test of 25 seasons of TCRM in regular vs smart battles mode on my laptop. They took about the same amount of time, and both ended up showing +- 0.363. But the smart battles came out to 89.32, very close to the 89.31 I got when I ran 100 (non-smart) seasons before, while the normal battles ended at 88.76.
So I'm a little disappointed it wasn't faster nor showed a better confidence, but it was a lot closer to the true average. And I guess my confidence calculation sucks or something weird happened, since 88.76 is much farther than .363 from the true average. (And yes, my TCRM score has tanked that much since its glory days!)
Are you sure that you're first averaging all the scores into each bot before averaging the scores together for the section? It wouldn't make a difference in the old method, since they all had the same number of battles, but it would affect things in the new one.
I guess the other possibility is that Diamond is so much slower than the bots it is facing that it doesn't make much difference which one you face. What was the spread of battles like on the TCRM? Were they spread fairly evenly, or were certain battles highly prioritised?
Yeah, that's a good point, especially with the TC bots that are just simple random movements and no gun. If the variation in confidence is higher than the variation in speed, it could take longer for same number of battles. I guess the puzzling thing is the overall confidence calculation showing the same both ways. With a limited amount of sample data, I guess it can only be so accurate, but I'm thinking I may have a bug there. The spread was:
- apv.AspidMovement 1.0: 95.6 +- 0.83 (16 battles)
- dummy.micro.Sparrow 2.5TC: 98.43 +- 0.64 (13 battles)
- kawigi.mini.Fhqwhgads 1.1TC: 96.95 +- 1.11 (21 battles)
- emp.Yngwie 1.0: 98.15 +- 0.77 (14 battles)
- kawigi.sbf.FloodMini 1.4TC: 94.91 +- 1.25 (24 battles)
- abc.Tron 2.01: 88.15 +- 1.42 (26 battles)
- wiki.etc.HTTC 1.0: 88.83 +- 1.45 (28 battles)
- wiki.etc.RandomMovementBot 1.0: 92.23 +- 1.04 (22 battles)
- davidalves.micro.DuelistMicro 2.0TC: 86.22 +- 1.61 (31 battles)
- gh.GrubbmGrb 1.2.4TC: 81.29 +- 1.87 (33 battles)
- pe.SandboxDT 1.91: 85.48 +- 1.8 (31 battles)
- cx.mini.Cigaret 1.31TC: 86.82 +- 1.62 (31 battles)
- kc.Fortune 1.0: 80.6 +- 1.77 (29 battles)
- simonton.micro.WeeklongObsession 1.5TC: 87.02 +- 1.48 (26 battles)
- jam.micro.RaikoMicro 1.44TC: 79.16 +- 1.8 (30 battles)
Going to leave some tests with Diamond 1.8.16 in real battles running today and see how that compares.
Those +-, are they the standard error or the stddev?
The only thing I can think of testing is whether you are calculating the right number of random battles for each in the Monte-Carlo method. If you were only doing one battle for each, then the numbers you are getting would be the same for the standard as for the smart battles. It looks like the prioritisation is working well though - Sparrow and Yngwie both have low number of battles as well as low error/stddev.
The per bot +- is the 95% (or 97.5%?) confidence = 1.96 * standard error = 1.96 * standard deviation / sqrt(num battles).
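On the "95% (or 97.5%?)" question: 1.96 is the 97.5th percentile of the standard normal distribution, so +- 1.96 standard errors leaves 2.5% in each tail and gives a two-sided 95% interval. A minimal sketch of the per-bot calculation follows; the class name is hypothetical, and using the sample standard deviation (dividing by n - 1) is an assumption, since the post does not say which variant RoboRunner uses.

```java
/** Illustrative only: hypothetical helper for a per-bot confidence half-width. */
final class Confidence {
  /** Half-width of the two-sided 95% interval: 1.96 * stdDev / sqrt(n). */
  static double halfWidth95(double[] scores) {
    int n = scores.length;
    if (n < 2) {
      return Double.NaN; // a standard deviation needs at least two battles
    }
    double sum = 0;
    for (double s : scores) {
      sum += s;
    }
    double mean = sum / n;
    double squares = 0;
    for (double s : scores) {
      squares += (s - mean) * (s - mean);
    }
    double stdDev = Math.sqrt(squares / (n - 1)); // sample standard deviation (assumed)
    double standardError = stdDev / Math.sqrt(n);
    return 1.96 * standardError;
  }

  public static void main(String[] args) {
    double[] scores = {80.0, 90.0, 100.0, 90.0}; // made-up battle scores
    System.out.println("+- " + halfWidth95(scores));
  }
}
```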
It probably is something silly like the one battle per bot you mentioned, but at a glance it seems like the overall confidence calculation isn't doing anything stupid. I'll have a longer look this evening. I do think the smart battles are working well, though, I'd just like to have some numbers to back me up. =)
The spread is a bit more interesting in real battles. HOT bots with 99.9% scores will get 2-3 battles in 12 seasons. RamBots get lots of battles because they have fairly high variance and run super fast.
Some results with normal battles. Diamond 1.8.16 vs 50 random bots for 10 seasons.
- Dumb battles: took 6338.8s, 89.87 +- 0.188
- Smart battles: took 6010.6s, 89.94 +- 0.148
Looks like it hit ~0.18 by 5 seasons with smart battles. Right now I'm using a much rougher calculation for printing overall confidence between battles, for speed. I will be improving this with some caching of the random samples for the overall scores. I do a much more thorough calculation for the final score.
It's a slightly different calculation with the scoring groups, so maybe I only have a bug there. Or maybe there just wasn't much difference in the TCRM. Or maybe TC scores are so far from normally distributed that it throws it off. Or maybe it was just a fluke - the same confidence down to 3 digits seems pretty unlikely even with the same battle selection.
Well, the verdict is in. Looks like a combination of fluke and the TCRM battles just not being particularly optimizable. I ran another 25 seasons each way and got:
- Dumb battles: Took 2690.4s, 89.13 +- 0.362
- Smart battles: Took 2858.8s, 89.4 +- 0.338
So this time smart battles actually took longer, but had a better confidence and were again much closer to the true average. I also tested that the groups and non-groups versions of overall confidence were giving the same for TCRM (because groups are of equal size). I'm going to skip any fancy attempts to optimize for a more accurate overall confidence between battles, round the final confidence to 2 digits instead of 3, and get this posted.
So I'm planning to implement smart battle selection this weekend. Every bot (or bot set) will get at least two battles, then I will choose battles to run (in batches since I don't want idle threads) based on trying to decrease standard error in the least amount of time. Maybe with some random battles sprinkled in as well.
I'm thinking I will choose bots with the highest value for: <math>{{stDev \over \sqrt{numBattles}} - {stDev \over \sqrt{numBattles + 1}}} \over {avgBattleTime}</math>
I think this will lead to an overall result with the highest confidence in the least amount of time.
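A rough sketch of that priority value, just to make the formula concrete. The class and parameter names are hypothetical, not taken from RoboRunner's code.

```java
final class BattlePriority {
    /**
     * Expected drop in standard error from one more battle, per second of
     * battle time. Assumes numBattles >= 1 and avgBattleTime > 0.
     */
    static double priority(double stDev, int numBattles, double avgBattleTime) {
        double currentError = stDev / Math.sqrt(numBattles);
        double errorAfterOneMore = stDev / Math.sqrt(numBattles + 1);
        return (currentError - errorAfterOneMore) / avgBattleTime;
    }
}
```

A bot with near-zero variance (like a 99.9% score) gets a priority close to zero no matter how fast its battles run, which is what keeps it from eating CPU time.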
I like testing against a test bed with an average score about the same as my RoboRumble APS. The problem with this is it includes a lot of bots with super low variance (eg, 99.9% scores), so running lots of battles against them is a waste of time. But ignoring them and using a stronger test bed risks specializing against stronger bots.
That looks like a good metric for choosing fast stability. Now I'm wishing I'd included variance in the LiteRumble scores...
Yeah, do you just store a running tally of average score? I'll need to update RoboRunner to keep scores from every individual battle, too, along with battle times.
Yeah, I do an online mean calculation, so newMean = oldMean*(n/(n+1)) + newScore/(n+1), n++
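For what it's worth, the same running tally can carry variance too. This is a sketch using Welford's method, which is a swapped-in technique and not what LiteRumble does; the class name is hypothetical.

```java
final class RunningScore {
    private long n;
    private double mean;
    private double m2; // running sum of squared deviations from the mean

    /** Folds one battle score into the tally without storing past scores. */
    void add(double score) {
        n++;
        double delta = score - mean;
        mean += delta / n;             // same result as oldMean*((n-1)/n) + score/n
        m2 += delta * (score - mean);  // uses the already-updated mean
    }

    long count() {
        return n;
    }

    double mean() {
        return mean;
    }

    /** Sample variance; reported as 0 until there are at least two scores. */
    double variance() {
        return n > 1 ? m2 / (n - 1) : 0.0;
    }
}
```

It only needs one extra number per pairing, so it would fit a store that currently keeps just the mean and the battle count.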
I've actually thought quite a bit about this, and it all depends what score you're trying to stabilise. If you're trying to stabilise the PL, for instance, you need to run lots of battles for pairings at or near the 50/50 mark. If you're doing Schultz then lots of battles need to go to where a weak bot beat a strong bot. It's all about which battle has the most potential influence.
Yeah, for sure you would focus on different battles to optimize other rankings. I'm not sure I need to add a "focus on win/loss" flag to RoboRunner, since you'd probably just test against your toughest matchups if that's what you were working on. It does support smart battles for all the score types, though (eg survival, bullet damage).
If we do implement this type of smart battle selection in a rumble system, maybe we could have a client side setting for what you're interested in optimizing. =) I guess to start it would just be APS vs win/loss, but it could include Schultz or Vote at some point.
Got this working, just dogfooding it a bit myself before posting it since it's a pretty major change. Data files are now (gzipped) XMLs with the raw scores from every battle and everything's recalculated on the fly. (That was actually most of the work.) Comes out to about 100 kb for 3k battles.
It runs 2 seasons vs each bot then does smart battle selection with the formula above to try to increase overall accuracy as quickly as possible. It's nice to see test runs where only 2 battles were run vs HawkOnFire. =) 5% of the time, it instead chooses randomly among the bots with fewest battles, to try to mitigate cases where the variance was randomly low in the initial battles. (I can make this configurable if/when anyone cares.)
It won't schedule two battles vs the same bot unless the number of bots is <= the number of threads. Otherwise, you'd keep scheduling that bot until the battle finishes. I could instead estimate how many times in a row it would still be worth scheduling it, but that seems like a lot of work for a corner case.
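The selection rule described above could be sketched roughly like this. All names here are hypothetical, it assumes every bot already has its initial battles, and "skip bots with a battle in flight unless nothing else is left" is a simplification of the bots <= threads rule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.Set;

final class SmartSelector {
    /** Hypothetical per-bot summary; the field names are illustrative only. */
    static final class BotStats {
        final String name;
        final double stDev;
        final int numBattles;
        final double avgBattleTime;

        BotStats(String name, double stDev, int numBattles, double avgBattleTime) {
            this.name = name;
            this.stDev = stDev;
            this.numBattles = numBattles;
            this.avgBattleTime = avgBattleTime;
        }

        /** Expected drop in standard error per second from one more battle. */
        double priority() {
            return (stDev / Math.sqrt(numBattles) - stDev / Math.sqrt(numBattles + 1))
                    / avgBattleTime;
        }
    }

    /** Picks the next bot to battle; assumes bots is non-empty. */
    static BotStats choose(List<BotStats> bots, Set<String> inFlight,
                           Random random, double exploreRate) {
        // Skip bots that already have a battle running, unless that leaves nothing.
        List<BotStats> candidates = new ArrayList<>();
        for (BotStats bot : bots) {
            if (!inFlight.contains(bot.name)) {
                candidates.add(bot);
            }
        }
        if (candidates.isEmpty()) {
            candidates.addAll(bots);
        }

        // Occasionally (0.05 in the post) pick randomly among the least-battled bots.
        if (random.nextDouble() < exploreRate) {
            int fewest = Integer.MAX_VALUE;
            for (BotStats bot : candidates) {
                fewest = Math.min(fewest, bot.numBattles);
            }
            List<BotStats> least = new ArrayList<>();
            for (BotStats bot : candidates) {
                if (bot.numBattles == fewest) {
                    least.add(bot);
                }
            }
            return least.get(random.nextInt(least.size()));
        }

        // Otherwise take the bot whose next battle buys the most accuracy per second.
        BotStats best = candidates.get(0);
        for (BotStats bot : candidates) {
            if (bot.priority() > best.priority()) {
                best = bot;
            }
        }
        return best;
    }
}
```

Passing the Random in keeps the random fallback reproducible when testing with a fixed seed.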
I think this is going to save a heck of a lot of CPU time. The XML data files will also make it easier to let you store arbitrary score data in the custom scoring stuff.
Just a quick note. Maybe you know this already, but you can add a shutdown hook to the Runtime. It runs on CTRL-C, so you can cleanly shut down the gzipping. Not sure if that is what you're looking for.
Cool, yeah, that might do the trick. I'm trying just doing a fresh save of the score data in the shutdown hook and I'll see if I can ever replicate the problem.
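A minimal sketch of what such a hook might look like. ScoreSaver, the file argument and the XML supplier are all hypothetical names for illustration, not RoboRunner's actual classes.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Supplier;
import java.util.zip.GZIPOutputStream;

final class ScoreSaver {
    /** Writes the XML as a complete gzip file; closing the stream finishes the gzip trailer. */
    static void saveGzipped(Path file, String xml) {
        try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(file))) {
            out.write(xml.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Registers a hook that does a fresh save when the JVM shuts down (e.g. on CTRL-C). */
    static Thread registerSaveHook(Path file, Supplier<String> currentXml) {
        Thread hook = new Thread(() -> saveGzipped(file, currentXml.get()), "score-save-hook");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }
}
```

Shutdown hooks run on normal exit and on CTRL-C, but not if the process is killed forcibly (kill -9) or halted with Runtime.halt, so a truncated file is still possible in those cases.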
Still getting the feel for how many seasons to run with smart battles. It indeed seems to be much more accurate in less time, but I'm not sure to what degree I should:
- Run fewer seasons because it's more accurate per number of battles.
- Run the same number of seasons, since it will run faster and still be more accurate.
- Run more seasons, in about the same amount of time as before, but with much more accuracy.
I guess it partly depends on how patient you were being before this feature. =)
Edit: Part of the dilemma is that this focuses on accuracy per time, not per number of battles. So maybe with a certain test bed, you don't gain accuracy in 10 seasons vs traditional battle selection, but it completes in 25% less time and gives the same accuracy. So you could up it to 12 seasons to do better on both time and accuracy.
At this point, this tool does everything I need and I'm really happy with it, so if anyone wants to offer feedback as far as features or prioritizing the to-do's, let me know. =) I'll probably bang out some of the more important stuff in the next week or so (like letting you configure JVM arguments), and the option for dynamically loading battle listeners for custom scoring sounds really cool, so I might tinker with that soon too.
Hi mate. I got a little intimate with your code and finally figured out how it works :). I wrote a dynamic class loader that can load classes from a specific directory/jar. The classes will only be loaded if they implement a certain interface. So far so good. After this I dug through the code looking for a good place to use these classes. Unfortunately, it looks like there is no good way to pass classes between the 'BattleProcess' and the 'BattleRunner'. I tried to redirect the 'System.out/in' of the BattleProcess to serialization streams, but as I now know this does not work. I guess object serialization over temp files is nothing that you are fond of, not to speak of RMI. The other idea that came to me: would it be possible to map the events of the BattleProcess BattleListener (on...()) to strings, then pass them over the in/out stream to the BattleRunner and rebuild the events there? In my opinion this would have the advantage that you can pass the events to the user-made score class and would have no need to do all the score parsing within your code. If the user class decides it has no need for the event, it will simply be ignored.
Hmm, I'm having a hard time explaining this right now :). Let me give you a scenario.
I write a score class for the PatternChallenge. The score class interface has a getName() method, and this name has to be in the 'pattern.rrc' too. The class will be loaded: RoboRunner reads the "rrc" file, looks for the available score classes, and finds my PatternChallenge class. Now you can register this class on the BattleRunner (similar to the BattleResultHandler you have). The score interface has, let's say, onBattleCompleted(..) implemented, and you pass all the events (in this case just one) to my score class. There I can read the damage fields and calculate my score, and if I want to print the results to the console I can do this as well (no work for you so far :)). If the score interface provides a toString() method, I could use this to provide an output string for the data file. The only thing you would have to do is get this string and write it to the data file at the end of everything. I'm sure I missed something, but as far as I can see, you could get rid of all the hard-coded scoring you have right now.
Well, I hope what I've said makes at least a little bit of sense. If you think I'm wrong on one/all points let me know, I'm not offended at all by it.
Anyway enough mumbling for today :)
Take Care
Cool! Well, I have a few thoughts on how all this could tie together:
- Instead of (or in addition to) RoboRunner/BattleRunner dynamically loading the listener/scoring class, I think we should pass a flag to BattleProcess that tells it the name of the listeners to load and attach to Robocode engine.
- I think it would be good if the listener interface extends IBattleListener, or includes one, so you can just attach it to the RobocodeEngine (addBattleListener) and have it listen to the events it wants.
- Then I guess it would need some setup to pass its output back to BattleRunner so we can store it and print it. I'm fine with printing to stdout or writing to temp files or whatever. I guess if we load the interface on the RoboRunner side, too, it could also have a method that runs after each battle to print whatever it wants from the data file.
- I don't think it's reasonable for BattleProcess to always listen to all events and pass all that data back for every battle. If you look at IBattleListener, it's possible to listen to every detail about every single turn in the battle. That's a lot of extra processing if you're not using it. =)
Does most of that make sense? Thanks for getting the ball rolling on this! I think it'd be a really exciting feature. Even if nobody but us uses it. =)
Hmm ...
- Is there another way than dynamically loading a class, if the program does not know about it? Maybe including the score class directory in the class path and making the challenge name the fully qualified class name, but this would still need the class loader part.
- I started with the interface being IBattleListener, but I could not get the event classes within BattleRunner and therefore I mapped it to the same methods but with different parameter objects.
- This sounds interesting. I was playing with this but ran into some issues that I could not solve. Loading the same class in different environments but not using all methods equally would be very inconsistent (not to say bad style :)), I guess. The user is probably not aware that the class has no idea where the events are processed and would put his output stuff right in the on..() methods, but would never get a result, because it runs in a different environment. And making two different classes (one for BattleRunner, one for BattleProcess) would not be very user friendly and increases the probability of doing something wrong.
If you have no problem with temp files, I guess this would be a good way to solve the issues. This way you can load the score class (it should extend BattleAdaptor) and RoboRunner can check if a certain method is overridden (which translates to: is the user interested in this information). This information could be flagged to the BattleProcess, which can use it to process the needed events. If you use temp files, you have the possibility to serialize almost every event to the file, pass the temp file name to BattleRunner, restore the events and pass them to the score class. I cannot put my finger on it, but something tells me that there is something wrong with this approach :)
- Yep, you are right :) - I was not fully aware of the cascading level of the onTurn..() events, and this could lead to some issues with the temp files too, I guess. If you are, let's say, just interested in the energy level of all bots, it would certainly not make sense to save the whole turn event cascade. Maybe you have an idea to overcome this.
Hehe, that's quite a point you got there :). But I hope it will pay off somehow, especially if I look at the time I have spent writing output classes to get some data visualized through GnuPlot. I can easily see some nice GUI statistic diagrams or movement plots for later runs and that really excites me :).
Take Care
Edit: Another incredibly easy to use IPC would be named pipes. But this would put Windows users out of business until someone is willing to write a JNI adaptor, or finds another way to establish a named pipe there.
So I guess there's two major things being weighed here:
- User code running in 1 vs 2 places - Having the user code running on just the RoboRunner side of things may avoid some programming pitfalls if someone tries to store state between the battle listener and the score output.
- Having to flatten the battle events for post-processing - If the user code is not in the BattleProcess, we need to figure out what events to listen to, log them, and pass them back to the other side for post-processing after the battle.
I guess I have a pretty strong preference for having user code in the battle listener itself instead of processing and transferring all the desired events. Figuring out which methods to listen to, serializing all the events, then processing them on the other side just seems like a lot of unnecessary work, and possibly error prone. A big note in the Javadoc that the listener methods should be idempotent, or using separate interfaces both seem like OK options to me.
I get the impression you'd rather make the other trade-off. =) The main thing I'm not sure of is whether reflection can figure out which methods you actually override. All of them would be overridden by BattleAdaptor, so I'm just not sure we can tell the difference. You could end up with some big temp files if you listen to onTurnEnded, but I don't think processing time would be much compared to running the battle itself.
So I guess what I'm imagining is something like:
- RoboRunner finds the custom listeners (command line argument and/or in challenge file). It loads an instance to process scoring output and passes the listener names to BattleProcess, which also loads them.
- BattleProcess sets some object on the listening class, which the listener can use to store custom values. (Eg, "skipped_turns" = 50, or "score_snapshots = {100, 150, 250, 575}".) Maybe an XML or JSON object.
- BattleProcess loads the battle listener and attaches it to the Robocode Engine, and runs the battle. The listener processes things on the fly and stores data in the data object.
- The values stored by the listener would be output by BattleProcess, read by BattleRunner, and stored in the bot's data file. (With XML or JSON, converting to/from ASCII like this would be pretty easy.)
- The scoring method would take the score data for that bot set and/or battle and display whatever it wants.
If you're using some sort of IPC, why not TCP? Then it opens the option of running remote battle runners.
This could actually work really smoothly. By default, it spins up the processes as now, but passing a port number to each process and communicating over TCP/IP. The data sent / received could remain the same. Then we could add command line arguments for:
- Launching Robocode processes and doing nothing, just listening for commands.
- Accepting a list of host:port of additional processes. In addition to the normal processes, launch a thread for each remote process.
So on your extra machine, you do #1, and on your primary machine you do #2, and voila!
Edit: Except for copying the necessary bot JARs. That would be a little more complicated.
Well :), of course TCP would be the obvious choice for IPC, but I think you bring a whole new bunch of complexity into the program and I'm not sure if it is worth the struggle.
Besides copying the bot JARs, copying the user score classes, and configuring the robocode path on every extra machine, there are some other more technical issues to consider. Of course, if done right it would be a very nice and strong feature, beyond question.
The scenario you described, with JSON, sounds quite interesting; maybe I should reconsider my concerns about having the user classes running in two different places. I'm sure I'm nitpicking too much on that point.
Side note: It is possible with reflection to check if a method is overridden just by doing
myBattleAdaptor.getClass().getMethod("onBattleCompleted", BattleCompletedEvent.class).getDeclaringClass()
If it gives back the class of myBattleAdaptor rather than BattleAdaptor, the method is overridden.
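Here is a self-contained sketch of that reflection check. CompletedEvent and Adaptor are stand-ins for robocode's BattleCompletedEvent and BattleAdaptor so that it compiles without the Robocode jar, and the helper name is made up.

```java
final class OverrideCheck {
    /** Stand-in for robocode's BattleCompletedEvent (hypothetical). */
    static class CompletedEvent {}

    /** Stand-in for robocode's BattleAdaptor: empty default handlers. */
    static class Adaptor {
        public void onBattleCompleted(CompletedEvent event) {}
        public void onTurnEnded(CompletedEvent event) {}
    }

    /** A user listener that only cares about one event. */
    static class MyScorer extends Adaptor {
        @Override
        public void onBattleCompleted(CompletedEvent event) {}
    }

    /** True if the listener's class (or a superclass below Adaptor) overrides the method. */
    static boolean overrides(Object listener, String method, Class<?>... params) {
        try {
            return listener.getClass().getMethod(method, params).getDeclaringClass()
                    != Adaptor.class;
        } catch (NoSuchMethodException e) {
            return false; // no such public method at all
        }
    }
}
```

Comparing against the base adaptor class, rather than the listener's own class, also counts an override inherited from an intermediate superclass.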
Right now I have discarded the temp file approach, simply because I didn't like it, and switched to named pipes. The BattleRunner got some watcher threads that communicate with the BattleProcesses using ObjectStreams, watch out for errors, and feed the score class. Don't worry, I'm doing all this just out of curiosity and will be fine with whatever you come up with.
Take Care
Congratulations on releasing this!
I've got it working already. Super easy.
It is moving along noticeably quicker than RoboResearch! Awesome work!
Cool, so nice to hear! =) I think some people will miss the RoboResearch GUI, but maybe I or someone else can add one sometime. And there's still quite a few little things left on the to do list. But I'm pretty happy with it. =)
Hi Voidious. Nice program you have put together there, respect. I'm really excited about the melee feature. I had a couple of tries with RoboResearch but never got it to work on melee benchmarks. Easy to install and use, nice job.
Have you thought about some sort of dynamic score output? For me it would be very useful if I could write my own benchmark score, because in melee it is sometimes better to get a score view along certain battle states, like start/middle/end game, or the score against every opponent on its own. If you have, for example, Diamond :) and some samples together, I would like to know how much score I lose to the samples (or weaker bots in general) if a top bot is on the field.
I will have a look at the sources, and maybe it is possible to make the scores dynamic. Maybe you have something in mind and we could share some ideas. I'm very fond of the idea to have a nice and easy melee test platform.
The remote client feature of Jdev Distributed_Robocode would be awesome. Unfortunately it seems to need Java 7 and therefore is out of my reach.
Take Care
Hey Wompi, thanks for the thoughts. The score output could definitely use a lot more features/options, it's pretty bare bones right now. You also make me realize that I don't even store per bot scores in the data file, so I'll need to fix that first. One thing is, as much as possible, I want to make the right decision automatically about how to show scores instead of making you remember lots of settings, but in cases where different things make sense to different people I'm OK with adding optional flags or whatever.
So for Melee, right now we have something like:
voidious.Diamond 1.8.4.x12 vs abc.Shadow 3.84i, sample.Crazy 1.0: 61.02, positive.Portia 1.26e took 34.8s, avg: 59.93. Overall score: 55.34, 1.5 seasons
So maybe if there's more than one opponent, we'd add the per bot scores each on their own line after that? Like:
voidious.Diamond 1.8.4.x12 vs abc.Shadow 3.84i, sample.Crazy 1.0: 61.02, positive.Portia 1.26e took 34.8s, avg: 59.93.
vs abc.Shadow 3.84i: 55.05 (22000 : 19000), avg: 53.70
vs sample.Crazy 1.0: 90.1 (22000 : 2000), avg: 90.2
vs positive.Portia 1.26e: 53.43 (22000 : 20341), avg: 54.15
Overall score: 55.34, 1.5 seasons
What do you think? Would you also like to see bullet damage / survival data? I always collect all the different fields for scoring, but for the most part was only going to show whatever you had configured as the scoring style. But I've been thinking lately it might be nice to show bullet damage / survival too.
Oh, and about the options for mid-battle score data, that sounds like a really cool idea. Do you mean like you could write and plug in your own scoring code? I'm not really sure of the best way to set it up so you could write your scoring class and pass it to RoboRunner, but from a technical standpoint I don't think it would be too tough.
I was just thinking yesterday that it would be cool to integrate some stuff like what Rednaxela did here for collecting hit rates and stuff during a battle, too.
Yes, each per line would be great.
For the damage/survival data, hmm, personally i look at the damage in very rare cases (mostly if i run my 100+k/40k benchmark against the samples) and survival is most interesting if you see all places (to spot some movement leaks in early/mid game) but it couldn't hurt to show these data :)
As I said, I have quite a bunch of 'odd' scoring patterns and a way to implement these dynamically would be great. My first thought was to provide a dynamic ClassLoader and a directory where you can put your own score patterns. That way you can release RoboRunner with some default patterns (score, damage) and still provide the possibility to write your own. I guess it would be fairly easy to just provide an interface and pass the 'ScoreObject'. In that way you don't have to put much effort into the scoring table. Some challenges also need some unusual scoring, I guess.
To use the 'robocode.control' (like Rednaxela did) would be extraordinaire, i constantly write new output classes and pass the results to GnuPlot but having this bundled - awesome!
If you like, I can try to put a first draft together for the scoring tomorrow. Not sure if you are fond of the idea of having someone messing with your code.
Take Care
Well, for the next round of changes, I think I'll add the per bot data for Melee battles and the other basic scoring options (like survival and bullet damage). There's still a bunch of basic things I need to check off my to-do list before I get too deep into the custom scoring stuff. But I do think it sounds awesome and really powerful.
Reading a custom battle listener at runtime and attaching it to the Robocode engine via control API (which I'm already using to run battles) should be pretty easy. Then you could listen to whatever events you want to and do whatever you want with the data. And if you're comfortable doing everything outside the RoboRunner infrastructure, that will be all you need.
What I see as the hard part is crafting a nice way for you to store/retrieve these score data in the data files and format custom output, which I think is necessary to make this a lot more useful. It's not rocket surgery, but I think the data file format would need an overhaul - maybe switch to something XML based. And I'm not sure about just passing a custom ScoreObject, because, for instance, right now I'm only listening for the final score from the Robocode engine, so I don't even have the data you'd want. You'd need to listen to other stuff for the per-round survival and stuff. There's a lot of options for what type of data to collect, so I don't think I want to guess and try to record all types of data you might want and just pass it along.
And sure, feel free to experiment with some of this stuff, or fork the GitHub repo and go nuts. =) I made the code public domain so people can do whatever they want with it. I'm super stoked that anybody's even interested in this - I thought I'd be the only one using it while everyone else just stuck with their RoboResearch setups. =)