View source for Talk:RoboRunner
Contents
Thread title | Replies | Last modified |
---|---|---|
Possible bug report | 13 | 22:17, 23 August 2012 |
calculating confidence of an APS score | 12 | 02:31, 15 August 2012 |
smart battles | 9 | 16:15, 13 August 2012 |
priorities | 7 | 09:24, 1 August 2012 |
Congrats! | 6 | 17:21, 28 July 2012 |
Heya Voidious,
I think I may have found a bug.
I finished a run of deBroglie rev0130 last night on the test bed you made for me. Score was in the lower 80s.
Just now, I manually made a .rrc testbed with some high performing bots. Started running it, and here's the output. Looks like RoboRunner is carrying over the score from the other challenge file?
~/roborunner $ ./rr.sh -bot tjk.deBroglie rev0130 -c debroglie_mega.rrc -seasons 20
Copying missing bots... 0 JAR copies done!
Initializing engine: ./robocodes/r1... done!
Initializing engine: ./robocodes/r3... done!
Initializing engine: ./robocodes/r2... done!
Challenger: tjk.deBroglie rev0130
Challenge: deBroglie Megabot test
Seasons: 20
Threads: 3
tjk.deBroglie rev0130 vs lxx.Tomcat 3.67c: 39.79, took 57.6s, avg: 39.79
Overall score: 81.16, 170.42 seasons
tjk.deBroglie rev0130 vs voidious.Diamond 1.8.1: 31.91, took 72.3s, avg: 31.91
Overall score: 80.83, 170.5 seasons
tjk.deBroglie rev0130 vs jk.mega.DrussGT 2.7.3: 37.2, took 82.0s, avg: 37.2
Overall score: 80.54, 170.58 seasons
Yep, it seems I'm printing the overall score for every bot you've faced, not just the ones in the current challenge file that's loaded. I'll see about fixing that later today. You can just delete (or rename for now) the file from the data directory if you want to start fresh. Thanks!
Or you could keep/copy just the lines for those bots in the data file, if you feel like mucking with it.
Ok, posted the fix in 1.0.1: [1] Only things to update are the RoboRunner JAR and rr.sh which points to it. It was just a problem with the output, so things should work fine with your old data file, if you still have it.
Hi mate. I got a little Exception :)
java.util.concurrent.ExecutionException: java.util.concurrent.ExecutionException: java.lang.ArithmeticException: / by zero
    at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
    at java.util.concurrent.FutureTask.get(FutureTask.java:83)
    at robowiki.runner.BattleRunner.getAllFutures(BattleRunner.java:95)
    at robowiki.runner.BattleRunner.runBattles(BattleRunner.java:80)
    at robowiki.runner.RoboRunner.runBattles(RoboRunner.java:338)
    at robowiki.runner.RoboRunner.main(RoboRunner.java:89)
    ...
Caused by: java.lang.ArithmeticException: / by zero
    at robowiki.runner.RoboRunner.printOverallScores(RoboRunner.java:485)
    at robowiki.runner.RoboRunner.access$4(RoboRunner.java:466)
    at robowiki.runner.RoboRunner$3.processResults(RoboRunner.java:734)
    at robowiki.runner.BattleRunner$BattleCallable$2.run(BattleRunner.java:197)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
One question. If I fork RoboRunner to my GitHub repositories and make changes, does that mean I have a new project, or is it more like a separate branch, so we could merge some changes I made?
Take care
edit: stupid me, i posted just the head
Seems like this would only happen when printing the overall score for 0 battles? Is it possible that was the situation? If so I'm not as worried about it being a bug, but we should check for it and print something nicer. If it shouldn't have had 0 total battles, then it's a deeper problem with the score tallying I guess.
This is my first experience with GitHub, so I don't know for sure, but I'm pretty sure the main idea behind forking is for you to make changes and then I can pull them back in. I think you issue a "pull request" once you've made your changes. I also think it can function fine as a new project if you don't ever intend to merge back.
Yep, the problem is deeper. It looks like everything is fine if I had no battles before (not 100% sure). Then, if I restart the test run, this exception comes up. I ran it with 20 seasons (melee).
I made just a quick fix for myself, so I can still use it. The only thing I lost was the 'Overall' score output, but I'm fine with the 'Average' output.
I have forked your repository and made a new branch from the main branch; not sure if you can see it on your side too. The only thing I've changed so far is the output of the melee score (just formatting). Yes, I guess I will use it mainly as a new project and tweak it to my needs, but I thought for little bug fixes it would be easier to just merge the branches.
So you can see the latest and average score for each battle, but overall score throws that exception? How strange. Could you post your data file somewhere so I can try to reproduce? That would be super helpful. (roborunner/data/package.BotName version.xml.gz)
I'd certainly like to pull back any bug fixes or awesome new features. =) What's your melee score output look like? I've used it for Melee a little but mostly 1v1, and even for Melee I tend to focus on overall score, so I'm open to suggestion. I've also considered a -verbose option (or something) for printing extra scoring details, like survival/bullet damage even when you specify APS as the scoring style.
Ok, there it is: RoboRunner-bugtrace.zip. Looks like I was wrong; it happens straight from the start. I deleted all the XML files, and the output of the first run is in the zip file. Maybe it helps :). Let me know if you need more. I broke off the run after the second season with Ctrl-C.
Well, I just made the melee output a little more 'eye' friendly :) but I guess in the next few days I will enhance it to something closer to what I use in my other outputs (nothing serious, just a little more info on the bullet hit ratio of all bots, some movement stats, sorted output of APS, and a table of every bot's score against each other bot). Based on an early RoboRunner version, I rewrote it into a console-like application. So basically you start the program and use console commands to configure, run, and output some stuff. Unfortunately it does not use multiple threads, so I'm now back to the latest RoboRunner, and maybe I can merge the two somehow.
I think if you look in GitHub at Network, you should see the forks that go off of your main branch.
Great, thanks! I was able to duplicate it here and figured out the problem. RoboRunner gets confused by having 2 of the same bot in a battle (mld.DustBunny 3.8 in this case). It looks like BattleListener eats the result right away when it builds a map of scores by bot name/version. (Edit: So RoboRunner has zero scores for the actual bot list when it tries to calculate overall score.) I have to head out in a few minutes, but I'll try to get a fix out later tonight or tomorrow.
Ok, I think it's all set. Tested it with the challenge you provided, dropped it into my currently running melee test a half hour ago and that still looks right, and did my usual round of manual setup and tests. The fix was mostly pretty easy thanks to Guava's Multimap stuff, but it also led to some minor refactoring so that nothing is based on looking up a score only by bot name, besides the challenger bot. I think it should work fine even if the challenger is also a reference bot, even though that seems silly - the first score for that bot in each battle would be considered the challenger score.
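The duplicate-bot problem and the multimap-style fix described above can be sketched in plain Java. This is only an illustration, not RoboRunner's code: the class `ScoreTally` and its methods are made up here, and a `Map<String, List<Double>>` stands in for Guava's Multimap so the example has no dependencies. It also returns NaN for a bot with no scores instead of dividing by zero.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not one of RoboRunner's actual classes.
class ScoreTally {
  // One list of scores per bot name, so two copies of the same bot in a
  // battle keep both results instead of the second replacing the first.
  private final Map<String, List<Double>> scores = new LinkedHashMap<>();

  void addScore(String botName, double score) {
    scores.computeIfAbsent(botName, k -> new ArrayList<>()).add(score);
  }

  int numScores(String botName) {
    List<Double> list = scores.get(botName);
    return list == null ? 0 : list.size();
  }

  // Returns NaN when there is nothing to average, rather than failing.
  double average(String botName) {
    List<Double> list = scores.get(botName);
    if (list == null || list.isEmpty()) {
      return Double.NaN;
    }
    double sum = 0;
    for (double s : list) {
      sum += s;
    }
    return sum / list.size();
  }
}
```

With a plain `Map<String, Double>`, the second `mld.DustBunny 3.8` result would overwrite the first; here both are kept and the caller decides how to combine them.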
Hopefully it won't be too painful of a merge for you. ;)
Yep works fine. Thanks. It wasn't supposed to have two of the same bots within the challenge :) - i realized that i just took an old challenge file while switching to the new RoboRunner version. But i guess in this case it was luck to detect the bug.
Yesterday I tried to make the challenger a development bot. I changed the copy bot function to let bot names with ..* through, but somewhere it lost the name. Can you give me a hint where the bot name comes back from the process? The RobocodeEngine can work with development bots if the properties contain the right path. If you give it, let's say, wompi.Wallaby*, it changes it to wompi.Wallaby* 4.7 for the result output (this works so far). So I thought I would just change the name back to my original (wompi.Wallaby*) within the BattleResultHandler (I guess this is where the results come back from the process), and then I could work with development bots. It was just a quick try, and I will try it more seriously today, but maybe you have a quick solution. I guess you know your code better and could say where it stores the references between name and score. The sad thing is that even if I give it the complete name (wompi.Wallaby* 4.7), it doesn't work :(. I guess somewhere the "*" is a delimiter or gets lost. Please don't put any time into this; it would just be nice if you have a quick hint.
I have to admit that this GitHub stuff is very neat. It's so easy to work with - thanks for pointing me at this by releasing RoboRunner over GitHub. I'm a little more used to it now and figured out how the forking works.
It's basically:
- fork your origin -> my origin
- clone to local
- (optional) make branches
- add your origin as remote (this keeps me up with changes at your side)
- merge remote -> my branch
- push my branch -> my origin
- (optional) make a pull request to you
It's pretty straightforward, and with GitX you have a nice graphical view of the branches too :)
Take Care
I don't think the name should be interpreted as a regex anywhere or anything like that. I think that whatever comes back from robotResults.getRobot().getNameAndVersion() in BattleListener should be handled by the rest of the code OK. The other points of concern that come to mind are:
- Copying the dev bot into the Robocode install directories means copying your package dir and classes into the robots dirs of each Robocode install, which is not as simple as copying one file. (Unless you have them all configured to look at some other directory?)
- Assuming Robocode can find it, checking whether the dev bot you specify is actually running in the battles.
For the second point, you could try:
- Modifying BattleProcess to do _engine.setVisible(true), so you could see the battles that get run.
- Run robowiki.runner.BattleProcess (with -path to Robocode, -rounds, -width, -height) and try running battles with your dev bot. BattleProcess is a command line application where you can type in a comma delimited list of bots (like "jam.mini.Raiko 0.43,voidious.Diamond 1.8.1") and it runs the battle and spits out the result.
And yeah, I'm liking GitHub too! I know PEZ is a big fan, though he's not doing Robocode stuff. I didn't know about GitX, I'll have to give that a shot. Maybe it will encourage me to make better use of branches. ;)
Hey resident brainiacs - I'm displaying confidence using standard error calculations on a per bot basis in RoboRunner now. What I'm not sure of is how to calculate the confidence of the overall score.
If I had the same number of battles for each bot, then the average of all battles would equal the average of all per bot scores. So I think then I could just calculate the overall average and standard error, ignoring per bot averages, and get the confidence interval of overall score that way. But what I want is the average of the individual bot scores, each of which has a different number of battles.
Something like (average standard error / sqrt(num bots)) makes intuitive sense, but I have no idea if it's right. Or maybe sqrt(average(variance relative to per bot average)) / sqrt(num battles)?
This would also allow me to measure the benefits of the smart battle selection.
I don't actually think this can be correctly modelled by a unimodal distribution - you will be adding thin gaussians to fat gaussians, making horrible bumps which don't like to be approximated by a single gaussian mean+stdev. I almost wonder if some sort of Monte-Carlo solution wouldn't be most accurate in this instance - at least the math would be easy to understand.
Good call! That was super easy. I don't recall this Monte-Carlo stuff, but the name rings a bell so maybe I learned about it at some point.
So I calculate 100 random versions of the overall score. For each battle that goes into it, instead of the real score, I generate a random score, assuming a normal distribution using the mean and standard deviation I have for that bot. Then I take the standard deviation of those randomized overall scores and multiply by 1.96 for the confidence interval. Seems like a lot of calculations, but only taking a few hundredths of a second even with 250 bots/3000 battles, so I can afford to do it even when I print the overall score after every battle. Nice!
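The Monte-Carlo procedure described above might look roughly like the following. It is a sketch under assumptions, not RoboRunner's implementation: the class and method names are invented, the overall score is taken to be the plain average of per-bot averages, every bot is assumed to have at least one battle, and at least two samples are assumed.

```java
import java.util.Random;

// Hypothetical sketch; names and structure are not taken from RoboRunner.
class MonteCarloConfidence {
  // means[i], stdDevs[i] and battles[i] describe bot i. The overall score is
  // the average of per-bot averages, so each bot counts equally no matter
  // how many battles it has. Assumes battles[i] >= 1 and samples >= 2.
  static double overallConfidence(double[] means, double[] stdDevs,
      int[] battles, int samples, Random random) {
    double[] overall = new double[samples];
    for (int s = 0; s < samples; s++) {
      double sumOfBotAverages = 0;
      for (int i = 0; i < means.length; i++) {
        double botSum = 0;
        for (int b = 0; b < battles[i]; b++) {
          // Replace each real battle score with a random normal draw.
          botSum += means[i] + stdDevs[i] * random.nextGaussian();
        }
        sumOfBotAverages += botSum / battles[i];
      }
      overall[s] = sumOfBotAverages / means.length;
    }
    // 1.96 standard deviations of the simulated overall scores.
    return 1.96 * standardDeviation(overall);
  }

  // Sample standard deviation (divides by n - 1).
  static double standardDeviation(double[] values) {
    double mean = 0;
    for (double v : values) {
      mean += v;
    }
    mean /= values.length;
    double sumSquares = 0;
    for (double v : values) {
      sumSquares += (v - mean) * (v - mean);
    }
    return Math.sqrt(sumSquares / (values.length - 1));
  }
}
```

For a single bot with standard deviation 10 and 100 battles, the result should land near 1.96 * 10 / sqrt(100) = 1.96, which is a handy sanity check against the per-bot formula.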
I'm curious - did you use the Monte-Carlo method for calculating the non-smart-battles deviations?
Also, how long did it take to get the 3000 battles compared to the non-smart-battles?
I'm using the same Monte-Carlo method for confidence either way. I hadn't run too many side by sides yet, but I'll do some more soon. Over night, I ran a test of 25 seasons of TCRM in regular vs smart battles mode on my laptop. They took about the same amount of time, and both ended up showing +- 0.363. But the smart battles came out to 89.32, very close to the 89.31 I got when I ran 100 (non-smart) seasons before, while the normal battles ended at 88.76.
So I'm a little disappointed it wasn't faster nor showed a better confidence, but it was a lot closer to the true average. And I guess my confidence calculation sucks or something weird happened, since 88.76 is much farther than .363 from the true average. (And yes, my TCRM score has tanked that much since its glory days!)
Are you sure that you're first averaging all the scores into each bot before averaging the scores together for the section? It wouldn't make a difference in the old method, since they all had the same number of battles, but it would affect things in the new one.
I guess the other possibility is that Diamond is so much slower than the bots it is facing that it doesn't make much difference which one you face. What was the spread of battles like on the TCRM? Were they spread fairly evenly, or were certain battles highly prioritised?
Yeah, that's a good point, especially with the TC bots that are just simple random movements and no gun. If the variation in confidence is higher than the variation in speed, it could take longer for same number of battles. I guess the puzzling thing is the overall confidence calculation showing the same both ways. With a limited amount of sample data, I guess it can only be so accurate, but I'm thinking I may have a bug there. The spread was:
apv.AspidMovement 1.0: 95.6 +- 0.83 (16 battles)
dummy.micro.Sparrow 2.5TC: 98.43 +- 0.64 (13 battles)
kawigi.mini.Fhqwhgads 1.1TC: 96.95 +- 1.11 (21 battles)
emp.Yngwie 1.0: 98.15 +- 0.77 (14 battles)
kawigi.sbf.FloodMini 1.4TC: 94.91 +- 1.25 (24 battles)
abc.Tron 2.01: 88.15 +- 1.42 (26 battles)
wiki.etc.HTTC 1.0: 88.83 +- 1.45 (28 battles)
wiki.etc.RandomMovementBot 1.0: 92.23 +- 1.04 (22 battles)
davidalves.micro.DuelistMicro 2.0TC: 86.22 +- 1.61 (31 battles)
gh.GrubbmGrb 1.2.4TC: 81.29 +- 1.87 (33 battles)
pe.SandboxDT 1.91: 85.48 +- 1.8 (31 battles)
cx.mini.Cigaret 1.31TC: 86.82 +- 1.62 (31 battles)
kc.Fortune 1.0: 80.6 +- 1.77 (29 battles)
simonton.micro.WeeklongObsession 1.5TC: 87.02 +- 1.48 (26 battles)
jam.micro.RaikoMicro 1.44TC: 79.16 +- 1.8 (30 battles)
Going to leave some tests with Diamond 1.8.16 in real battles running today and see how that compares.
Those +-, are they the standard error or the stddev?
The only thing I can think of testing is whether you are calculating the right number of random battles for each in the Monte-Carlo method. If you were only doing one battle for each, then the numbers you are getting would be the same for the standard as for the smart battles. It looks like the prioritisation is working well though - Sparrow and Yngwie both have a low number of battles as well as low error/stddev.
The per bot +- is the 95% confidence interval (1.96 is the 97.5th percentile of the normal distribution, so it bounds a two-sided 95% interval) = 1.96 * standard error = 1.96 * standard deviation / sqrt(num battles).
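As a small worked version of that per-bot formula, here is a sketch in Java. The class name `BotConfidence` is invented for this example, and it assumes the sample standard deviation (dividing by n - 1), which the thread does not specify.

```java
// Hypothetical helper for illustration; not part of RoboRunner.
class BotConfidence {
  // 1.96 * sample standard deviation / sqrt(number of battles).
  static double confidence95(double[] scores) {
    int n = scores.length;
    if (n < 2) {
      return Double.NaN; // a deviation needs at least two battles
    }
    double mean = 0;
    for (double s : scores) {
      mean += s;
    }
    mean /= n;
    double sumSquares = 0;
    for (double s : scores) {
      sumSquares += (s - mean) * (s - mean);
    }
    double stdDev = Math.sqrt(sumSquares / (n - 1));
    return 1.96 * stdDev / Math.sqrt(n);
  }
}
```

For scores of 80, 90, 100 and 90, the mean is 90, the sample standard deviation is about 8.165, and the +- comes out at about 8.0.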
It probably is something silly like the one battle per bot you mentioned, but at a glance it seems like the overall confidence calculation isn't doing anything stupid. I'll have a longer look this evening. I do think the smart battles are working well, though, I'd just like to have some numbers to back me up. =)
The spread is a bit more interesting in real battles. HOT bots with 99.9% scores will get 2-3 battles in 12 seasons. RamBots get lots of battles because they have fairly high variance and run super fast.
Some results with normal battles. Diamond 1.8.16 vs 50 random bots for 10 seasons.
- Dumb battles: took 6338.8s, 89.87 +- 0.188
- Smart battles: took 6010.6s, 89.94 +- 0.148
Looks like it hit ~0.18 by 5 seasons with smart battles. Right now I'm using a much rougher calculation for printing overall confidence between battles, for speed. I will be improving this with some caching of the random samples for the overall scores. I do a much more thorough calculation for the final score.
It's a slightly different calculation with the scoring groups, so maybe I only have a bug there. Or maybe there just wasn't much difference in the TCRM. Or maybe TC scores are so far from normally distributed that it throws it off. Or maybe it was just a fluke - the same confidence down to 3 digits seems pretty unlikely even with the same battle selection.
Well, the verdict is in. Looks like a combination of fluke and the TCRM battles just not being particularly optimizable. I ran another 25 seasons each way and got:
- Dumb battles: Took 2690.4s, 89.13 +- 0.362
- Smart battles: Took 2858.8s, 89.4 +- 0.338
So this time smart battles actually took longer, but had a better confidence and were again much closer to the true average. I also tested that the groups and non-groups versions of overall confidence were giving the same for TCRM (because groups are of equal size). I'm going to skip any fancy attempts to optimize for a more accurate overall confidence between battles, round the final confidence to 2 digits instead of 3, and get this posted.
So I'm planning to implement smart battle selection this weekend. Every bot (or bot set) will get at least two battles, then I will choose battles to run (in batches since I don't want idle threads) based on trying to decrease standard error in the least amount of time. Maybe with some random battles sprinkled in as well.
I'm thinking I will choose bots with the highest value for: <math>{{stDev \over \sqrt{numBattles}} - {stDev \over \sqrt{numBattles + 1}}} \over {avgBattleTime}</math>
I think this will lead to an overall result with the highest confidence in the least amount of time.
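The selection metric above can be sketched as code. This is an illustration only: `SmartBattleSelector`, `priority` and `chooseNext` are made-up names, and it assumes every bot already has at least one battle (the plan above gives each bot at least two before prioritising).

```java
// Hypothetical sketch of the metric; not RoboRunner's actual code.
class SmartBattleSelector {
  // Expected drop in standard error from one more battle, per unit of
  // battle time. Assumes numBattles >= 1 and avgBattleTime > 0.
  static double priority(double stDev, int numBattles, double avgBattleTime) {
    double gain = stDev / Math.sqrt(numBattles)
        - stDev / Math.sqrt(numBattles + 1);
    return gain / avgBattleTime;
  }

  // Index of the bot whose next battle has the highest priority,
  // or -1 if the arrays are empty.
  static int chooseNext(double[] stDevs, int[] numBattles, double[] avgTimes) {
    int best = -1;
    double bestPriority = Double.NEGATIVE_INFINITY;
    for (int i = 0; i < stDevs.length; i++) {
      double p = priority(stDevs[i], numBattles[i], avgTimes[i]);
      if (p > bestPriority) {
        bestPriority = p;
        best = i;
      }
    }
    return best;
  }
}
```

A bot with a near-constant 99.9% score has a tiny standard deviation and so a tiny priority, while a fast, high-variance bot rises to the top.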
I like testing against a test bed with an average score about the same as my RoboRumble APS. The problem with this is it includes a lot of bots with super low variance (eg, 99.9% scores), so running lots of battles against them is a waste of time. But ignoring them and using a stronger test bed risks specializing against stronger bots.
That looks like a good metric for choosing fast stability. Now I'm wishing I'd included variance in the LiteRumble scores...
Yeah, do you just store a running tally of average score? I'll need to update RoboRunner to keep scores from every individual battle, too, along with battle times.
Yeah, I do an online mean calculation, so newMean = oldMean*(n/(n+1)) + newScore/(n+1), n++
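That running update can be written out as a tiny Java class. The class name `RunningMean` is just for this example; the update line is the formula quoted above, with a cast so that n/(n+1) is not done in integer arithmetic.

```java
// Illustrative only; the class name is an assumption.
class RunningMean {
  private double mean = 0;
  private int n = 0;

  // newMean = oldMean * (n / (n + 1)) + newScore / (n + 1), then n++.
  void add(double newScore) {
    mean = mean * ((double) n / (n + 1)) + newScore / (n + 1);
    n++;
  }

  double mean() {
    return mean;
  }

  int count() {
    return n;
  }
}
```

Only the mean and the count are stored, which is why a tally like this cannot report variance afterwards without keeping more state.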
I've actually thought quite a bit about this, and it all depends what score you're trying to stabilise. If you're trying to stabilise the PL, for instance, you need to run lots of battles for pairings at or near the 50/50 mark. If you're doing Schultz then lots of battles need to go to where a weak bot beat a strong bot. It's all about which battle has the most potential influence.
Yeah, for sure you would focus on different battles to optimize other rankings. I'm not sure I need to add a "focus on win/loss" flag to RoboRunner, since you'd probably just test against your toughest matchups if that's what you were working on. It does support smart battles for all the score types, though (eg survival, bullet damage).
If we do implement this type of smart battle selection in a rumble system, maybe we could have a client side setting for what you're interested in optimizing. =) I guess to start it would just be APS vs win/loss, but it could include Schultz or Vote at some point.
Though I'm still figuring out how to avoid potentially corrupting the data file if you ctrl-C your run. I'm not sure if skipping the gzipping would help or if it's just become more likely because the data files are so much larger. Maybe I need to add a keyboard option to safely exit.
Just a quick note. Maybe you know this already, but you can add a shutdown hook to the Runtime. This would catch Ctrl-C, and you could cleanly shut down the gzipping. Not sure if that is what you're looking for.
Cool, yeah, that might do the trick. I'm trying just doing a fresh save of the score data in the shutdown hook and I'll see if I can ever replicate the problem.
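A minimal sketch of the shutdown hook idea, assuming the score data can be produced as a string on demand. `ShutdownSave`, `save` and `installHook` are hypothetical names, and the plain-text write stands in for whatever gzipped XML writing RoboRunner really does.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Supplier;

// Hypothetical sketch; RoboRunner's real save logic is not shown here.
class ShutdownSave {
  // Writes the given contents to the file, replacing what was there.
  static void save(Path file, String contents) throws IOException {
    Files.write(file, contents.getBytes(StandardCharsets.UTF_8));
  }

  // Registers a hook that does a fresh save when the JVM shuts down,
  // which includes a Ctrl-C from the terminal.
  static Thread installHook(Path file, Supplier<String> currentData) {
    Thread hook = new Thread(() -> {
      try {
        save(file, currentData.get());
      } catch (IOException e) {
        System.err.println("Could not save scores: " + e.getMessage());
      }
    });
    Runtime.getRuntime().addShutdownHook(hook);
    return hook;
  }
}
```

Shutdown hooks do not run if the process is killed forcibly, so this narrows the window for a corrupted file rather than closing it.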
Still getting the feel for how many seasons to run with smart battles. It indeed seems to be much more accurate in less time, but I'm not sure to what degree I should:
- Run fewer seasons because it's more accurate per number of battles.
- Run the same number of seasons, since it will run faster and still be more accurate.
- Run more seasons, in about the same amount of time as before, but with much more accuracy.
I guess it partly depends on how patient you were being before this feature. =)
Edit: Part of the dilemma is that this focuses on accuracy per time, not per number of battles. So maybe with a certain test bed, you don't gain accuracy in 10 seasons vs traditional battle selection, but it completes in 25% less time and gives the same accuracy. So you could up it to 12 seasons to do better on both time and accuracy.
At this point, this tool does everything I need and I'm really happy with it, so if anyone wants to offer feedback as far as features or prioritizing the to-do's, let me know. =) I'll probably bang out some of the more important stuff in the next week or so (like letting you configure JVM arguments), and the option for dynamically loading battle listeners for custom scoring sounds really cool, so I might tinker with that soon too.
Hi mate. I got a little intimate with your code and finally figured out how it works :). I wrote a dynamic class loader that can load classes from a specific directory/jar. The classes will only be loaded if they implement a certain interface. So far so good. After this I dug through the code looking for a good point to use these classes. Unfortunately, it looks like there is no good way to pass classes between the 'BattleProcess' and the 'BattleRunner'. I tried to redirect the 'System.out/in' of the BattleProcess to serialization streams, but as I now know, this does not work. I guess object serialization over temp files is nothing you are fond of, not to speak of RMI. The other idea that came to me: would it be possible to map the events of the BattleProcess BattleListener (on...()) to strings, pass them over the in/out stream to the BattleRunner, and rebuild the events there? In my opinion this would have the advantage that you can pass the events to the user-made score class and would have no need to do all the score parsing within your code. If the user class decides it has no need for an event, the event will simply be ignored.
Hmm, I'm having a hard time explaining this right now :). Let me give you a scenario.
I write a score class for the PatternChallenge. The score class interface has a getName() method, and this name has to be in the 'pattern.rrc' too. The class will be loaded, RoboRunner reads the "rrc" file, looks for the available score classes, and finds my PatternChallenge class. Now you can register this class on the BattleRunner (similar to the BattleResultHandler you have). The score interface has, let's say, onBattleCompleted(..) implemented, and you pass all the events (in this case just one) to my score class. There I can read the damage fields and calculate my score, and if I want to print the results to the console, I can do this as well (no work for you so far :)). If the score interface provides a toString() method, I could use this to provide an output string for the data file. The only thing you would have to do is get this string and write it to the data file at the end of everything. I'm sure I missed something, but as far as I can see, you could get rid of all the hard-coded scoring you have right now.
Well, i hope it makes at least a little bit of sense what i have said. If you think i'm wrong on one/all points let me know, i'm not offended at all by it.
Anyway enough mumbling for today :)
Take Care
Cool! Well, I have a few thoughts on how all this could tie together:
- Instead of (or in addition to) RoboRunner/BattleRunner dynamically loading the listener/scoring class, I think we should pass a flag to BattleProcess that tells it the name of the listeners to load and attach to Robocode engine.
- I think it would be good if the listener interface extends IBattleListener, or includes one, so you can just attach it to the RobocodeEngine (addBattleListener) and have it listen to the events it wants.
- Then I guess it would need some setup to pass its output back to BattleRunner so we can store it and print it. I'm fine with printing to stdout or writing to temp files or whatever. I guess if we load the interface on the RoboRunner side, too, it could also have a method that runs after each battle to print whatever it wants from the data file.
- I don't think it's reasonable for BattleProcess to always listen to all events and pass all that data back for every battle. If you look at IBattleListener, it's possible to listen to every detail about every single turn in the battle. That's a lot of extra processing if you're not using it. =)
Does most of that make sense? Thanks for getting the ball rolling on this! I think it'd be a really exciting feature. Even if nobody but us uses it. =)
Hmm ...
- Is there another way than dynamically loading a class, if the program does not know about it? Maybe including the score class directory in the class path and making the challenge name the full package name, but this would still need the class loader part.
- I started with the interface being IBattleListener, but I could not get the event classes within BattleRunner, and therefore I mapped it to the same methods but with different parameter objects.
- This sounds interesting. I was playing with this but faced some issues that I could not solve. Loading the same class in different environments but not using all methods equally would be very inconsistent (not to say bad style :)), I guess. The user is probably not aware that the class has no idea where the events are processed and would put his output stuff right within the on..() methods, but would never get a result, because it runs in a different environment. And making two different classes (one for BattleRunner, one for BattleProcess) would not be very user friendly and increases the probability of doing something wrong.
If you have no problem with temp files, I guess this would be a good way to solve the issues. This way you can load the score class (it should extend BattleAdaptor), and RoboRunner can check if a certain method is overridden (which translates to: is the user interested in this information). This information could be flagged to the BattleProcess, which can use it to process the needed events. If you use temp files, you have the possibility to serialize almost every event to the file, pass the temp file name to BattleRunner, restore the events, and pass them to the score class. I cannot put my finger on it, but something tells me that there is something wrong with this approach :)
- Yep, you are right :). I was not fully aware of the cascading level of the onTurn..() events, and this could lead to some issues with the temp files too, I guess. If you are, let's say, just interested in the energy level of all bots, it would certainly not make sense to save the whole turn event cascade. Maybe you have an idea to overcome this.
Hehe, that's quite a point you've got there :). But I hope it will pay off somehow, especially if I look at the time I have spent writing output classes to get some data visualized through GnuPlot. I can easily see some nice GUI statistic diagrams or movement plots for later runs, and that really excites me :).
Take Care
Edit: Another incredibly easy-to-use IPC would be named pipes. But this would put Windows users out of business until someone is willing to write a JNI adaptor or finds another way to establish a named pipe there.
So I guess there's two major things being weighed here:
- User code running in 1 vs 2 places - Having the user code running on just the RoboRunner side of things may avoid some programming pitfalls if someone tries to store state between the battle listener and the score output.
- Having to flatten the battle events for post-processing - If the user code is not in the BattleProcess, we need to figure out what events to listen to, log them, and pass them back to the other side for post-processing after the battle.
I guess I have a pretty strong preference for having user code in the battle listener itself instead of processing and transferring all the desired events. Figuring out which methods to listen to, serializing all the events, then processing them on the other side just seems like a lot of unnecessary work, and possibly error prone. A big note in the Javadoc that the listener methods should be idempotent, or using separate interfaces both seem like OK options to me.
I get the impression you'd rather make the other trade-off. =) The main thing I'm not sure of is whether reflection can figure out which methods you actually override. All of them would be overridden by BattleAdaptor, so I'm just not sure we can tell the difference. You could end up with some big temp files if you listen to onTurnEnded, but I don't think processing time would be much compared to running the battle itself.
So I guess what I'm imagining is something like:
- RoboRunner finds the custom listeners (command line argument and/or in challenge file). It loads an instance to process scoring output and passes the listener names to BattleProcess, which also loads them.
- BattleProcess sets some object on the listening class, which the listener can use to store custom values. (Eg, "skipped_turns" = 50, or "score_snapshots = {100, 150, 250, 575}".) Maybe an XML or JSON object.
- BattleProcess loads the battle listener and attaches it to the Robocode Engine, and runs the battle. The listener processes things on the fly and stores data in the data object.
- The values stored by the listener would be output by BattleProcess, read by BattleRunner, and stored in the bot's data file. (With XML or JSON, converting to/from ASCII like this would be pretty easy.)
- The scoring method would take the score data for that bot set and/or battle and display whatever it wants.
If you're using some sort of IPC, why not TCP? Then it opens the option of running remote battle runners.
This could actually work really smoothly. By default, it spins up the processes as now, but passing a port number to each process and communicating over TCP/IP. The data sent / received could remain the same. Then we could add command line arguments for:
- Launching Robocode processes and doing nothing, just listening for commands.
- Accepting a list of host:port of additional processes. In addition to the normal processes, launch a thread for each remote process.
So on your extra machine, you do #1, and on your primary machine you do #2, and voila!
Edit: Except for copying the necessary bot JARs. That would be a little more complicated.
Well :), of course TCP would be the obvious choice for IPC, but I think you bring a whole new bunch of complexity into the program and I'm not sure if it is worth the struggle.
Besides copying the bot JARs, copying the user score classes, and configuring the Robocode path on every extra machine, there are some other, more technical issues to consider. Of course, if done right it would be a very nice and strong feature, beyond question.
Using the scenario you described, with JSON, sounds quite interesting; maybe I should reconsider my concerns about having the user classes running in two different places. I'm sure I'm nitpicking too much on that point.
Sidenote: With reflection it is possible to check whether a method is overridden just by doing
myBattleAdaptor.getClass().getMethod("onBattleCompleted",BattleCompletedEvent.class).getDeclaringClass()
If it returns the class of myBattleAdaptor, the method is overridden.
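Here is a self-contained version of that check. The Fake* classes are stand-ins invented so the example compiles without the Robocode JAR; the real BattleAdaptor and event classes live in robocode.control.events. One deliberate difference: the sketch compares the declaring class against the base adaptor rather than the listener's own class, so an override inherited from an intermediate superclass still counts.

```java
import java.lang.reflect.Method;

// Stand-ins for Robocode's BattleAdaptor and event types (assumptions made
// so the example compiles on its own).
class FakeCompletedEvent {}
class FakeTurnEndedEvent {}

class FakeBattleAdaptor {
    public void onBattleCompleted(FakeCompletedEvent e) {}
    public void onTurnEnded(FakeTurnEndedEvent e) {}
}

// A user listener that overrides only one of the two callbacks.
class MyListener extends FakeBattleAdaptor {
    @Override
    public void onBattleCompleted(FakeCompletedEvent e) {}
}

class OverrideCheck {
    // True if some class below 'base' in the listener's hierarchy declares
    // the public method, i.e. the default empty implementation was replaced.
    static boolean overrides(Object listener, Class<?> base,
                             String name, Class<?>... params) {
        try {
            Method m = listener.getClass().getMethod(name, params);
            return m.getDeclaringClass() != base;
        } catch (NoSuchMethodException ex) {
            return false; // no such public method at all
        }
    }
}
```

With this, a runner could skip subscribing to expensive events like onTurnEnded when the listener never overrides them. Note that getMethod only sees public methods, which is fine for listener callbacks.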
Right now I have discarded the tmp file approach, simply because I didn't like it, and switched to named pipes. The BattleRunner got some watcher threads that communicate with the BattleProcesses using ObjectStreams, watch for errors, and feed the score class. Don't worry, I'm doing all this just out of curiosity and will be fine with whatever you come up with.
Take Care
Congratulations on releasing this!
I've got it working already. Super easy.
It is moving along noticeably quicker than RoboResearch! Awesome work!
Cool, so nice to hear! =) I think some people will miss the RoboResearch GUI, but maybe I or someone else can add one sometime. And there's still quite a few little things left on the to do list. But I'm pretty happy with it. =)
Hi Voidious. Nice program you have put together there, respect. I'm really excited about the melee feature. I had a couple of tries with RoboResearch but never got it to work on melee benchmarks. Easy to install and use, nice job.
Have you thought about some sort of dynamic score output? For me it would be very useful if I could write my own benchmark score, because in melee it is sometimes better to get a score view for certain battle states, like start/middle/end game, or the score against each opponent on its own. If you have, for example, Diamond :) and some samples together, I would like to know how much score I lose to the samples (or weaker bots in general) when a top bot is on the field.
I will have a look at the sources, and maybe it is possible to make the scores dynamic. Maybe you have something in mind and we could share some ideas. I'm very fond of the idea of having a nice and easy melee test platform.
The remote client feature of Jdev Distributed_Robocode would be awesome. Unfortunately it seems to need Java 7 and therefore is out of my reach.
Take Care
Hey Wompi, thanks for the thoughts. The score output could definitely use a lot more features/options, it's pretty bare bones right now. You also make me realize that I don't even store per-bot scores in the data file, so I'll need to fix that first. One thing is, as much as possible, I want to make the right decision automatically about how to show scores instead of making you remember lots of settings, but in cases where different things make sense to different people I'm OK with adding optional flags or whatever.
So for Melee, right now we have something like:
voidious.Diamond 1.8.4.x12 vs abc.Shadow 3.84i, sample.Crazy 1.0: 61.02, positive.Portia 1.26e took 34.8s, avg: 59.93.
Overall score: 55.34, 1.5 seasons
So maybe if there's more than one opponent, we'd add the per bot scores each on their own line after that? Like:
voidious.Diamond 1.8.4.x12 vs abc.Shadow 3.84i, sample.Crazy 1.0: 61.02, positive.Portia 1.26e took 34.8s, avg: 59.93.
vs abc.Shadow 3.84i: 55.05 (22000 : 19000), avg: 53.70
vs sample.Crazy 1.0: 90.1 (22000 : 2000), avg: 90.2
vs positive.Portia 1.26e: 53.43 (22000 : 20341), avg: 54.15
Overall score: 55.34, 1.5 seasons
What do you think? Would you also like to see bullet damage / survival data? I always collect all the different fields for scoring, but for the most part was only going to show whatever you had configured as the scoring style. But I've been thinking lately it might be nice to show bullet damage / survival too.
Oh, and about the options for mid-battle score data, that sounds like a really cool idea. Do you mean like you could write and plug in your own scoring code? I'm not really sure of the best way to set it up so you could write your scoring class and pass it to RoboRunner, but from a technical standpoint I don't think it would be too tough.
I was just thinking yesterday that it would be cool to integrate some stuff like what Rednaxela did here for collecting hit rates and stuff during a battle, too.
Yes, one per line would be great.
For the damage/survival data, hmm, personally I look at the damage only in very rare cases (mostly if I run my 100+k/40k benchmark against the samples), and survival is most interesting if you see all places (to spot some movement leaks in early/mid game), but it couldn't hurt to show this data :)
As I said, I have quite a bunch of 'odd' scoring patterns, and a way to implement these dynamically would be great. My first thought was to provide a dynamic ClassLoader and a directory where you can put your own score patterns. That way you can release RoboRunner with some default patterns (score, damage) and still provide the possibility to write your own. I guess it would be fairly easy just to provide an interface and pass the 'ScoreObject'. That way you don't have to put much effort into the scoring table. Some challenges also need some unusual scores, I guess.
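To make the ClassLoader idea concrete, here is a minimal sketch. ScorePattern, ScorePatternLoader, and PercentScore are hypothetical names, not part of RoboRunner, and the interface takes two raw point totals only as a placeholder for whatever the 'ScoreObject' would carry. It assumes the plugin directory exists and holds compiled .class files.

```java
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical plugin interface; RoboRunner does not define this.
interface ScorePattern {
    String name();
    double score(double myPoints, double enemyPoints);
}

class ScorePatternLoader {
    // A default pattern that could ship with the tool: percent score.
    public static class PercentScore implements ScorePattern {
        public String name() { return "score"; }
        public double score(double myPoints, double enemyPoints) {
            return 100.0 * myPoints / (myPoints + enemyPoints);
        }
    }

    // Loads className from pluginDir, falling back to the application's own
    // classes through normal parent delegation. The loader is intentionally
    // left open so the plugin can still load helper classes later.
    static ScorePattern load(File pluginDir, String className) throws Exception {
        URL[] urls = { pluginDir.toURI().toURL() };
        URLClassLoader loader =
            new URLClassLoader(urls, ScorePatternLoader.class.getClassLoader());
        Class<? extends ScorePattern> type =
            Class.forName(className, true, loader).asSubclass(ScorePattern.class);
        // Plugins are assumed to have a public no-argument constructor.
        return type.getDeclaredConstructor().newInstance();
    }
}
```

The same load call works for bundled defaults and for user classes dropped into the directory, so the runner would not need to treat them differently.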
Using 'robocode.control' (like Rednaxela did) would be extraordinaire. I constantly write new output classes and pass the results to GnuPlot, but having this bundled - awesome!
If you like, I can try to put a first draft together for the scoring tomorrow. Not sure if you are fond of the idea of having someone messing with your code.
Take Care
Well, for the next round of changes, I think I'll add the per bot data for Melee battles and the other basic scoring options (like survival and bullet damage). There's still a bunch of basic things I need to check off my to-do list before I get too deep into the custom scoring stuff. But I do think it sounds awesome and really powerful.
Reading a custom battle listener at runtime and attaching it to the Robocode engine via control API (which I'm already using to run battles) should be pretty easy. Then you could listen to whatever events you want to and do whatever you want with the data. And if you're comfortable doing everything outside the RoboRunner infrastructure, that will be all you need.
What I see as the hard part is crafting a nice way for you to store/retrieve these score data in the data files and format custom output, which I think is necessary to make this a lot more useful. It's not rocket surgery, but I think the data file format would need an overhaul - maybe switch to something XML based. And I'm not sure about just passing a custom ScoreObject, because, for instance, right now I'm only listening for the final score from the Robocode engine, so I don't even have the data you'd want. You'd need to listen to other stuff for the per-round survival and stuff. There's a lot of options for what type of data to collect, so I don't think I want to guess and try to record all types of data you might want and just pass it along.
And sure, feel free to experiment with some of this stuff, or fork the GitHub repo and go nuts. =) I made the code public domain so people can do whatever they want with it. I'm super stoked that anybody's even interested in this - I thought I'd be the only one using it while everyone else just stuck with their RoboResearch setups. =)