Talk:Diamond/Version History
For old discussions, see Archived talk:Diamond/Version History 20110905.
Contents
Thread title | Replies | Last modified |
---|---|---|
DiamondWhoosh vs DookiCape | 0 | 12:22, 29 August 2017 |
DiamondFist vs DookiLighting | 0 | 12:22, 29 August 2017 |
1.8.23 (melee) Stop it :) | 1 | 19:07, 24 August 2012 |
Survival and the PL | 5 | 08:47, 29 July 2012 |
1.8.1 | 3 | 15:43, 24 July 2012 |
1.7.57 frustration | 0 | 13:50, 20 July 2012 |
Problem bots from Literumble | 5 | 20:12, 18 July 2012 |
New King! | 5 | 17:15, 18 July 2012 |
kernel density is important | 13 | 22:08, 16 July 2012 |
gun tuning tangent | 26 | 20:31, 15 July 2012 |
fading weights | 6 | 18:25, 29 June 2012 |
1.7.30 - As the wave breaks | 4 | 17:46, 23 June 2012 |
1.7.20 Anti-Surfer gun fix | 0 | 18:11, 13 June 2012 |
1.7.x refactoring | 0 | 14:22, 5 June 2012 |
1.6.x PL brags | 6 | 16:57, 26 September 2011 |
1.6.7 with fancy Anti-Surfer gun | 0 | 01:42, 6 September 2011 |
Hi mate. This version is awesome, the survivability is extraordinary. I know it's only 3k battles so far, but I guess it will hold its score quite well. And that right after I had made Wallaby some sort of a survival king :) (at least in micro) - well done. It will be thrilling to watch how far you can push it.
I'm just kidding with the 'Stop it!' :)
Take Care
Thanks dude! I think there's still a lot of room to grow at the top in Melee. Justin (of DemonicRage) and I both kind of just stopped working on it when we got to that virtual tie for #1.
I haven't played with Melee for a while, so right now I'm still kind of just getting my bearings again. But I think between having a better testing tool (RoboRunner vs hacked up RoboResearch) and way more CPU power, it should be a lot easier for me to find real improvements now.
And it might be a good idea to build up a buffer before Numbat and Neuromancer come for my head. ;)
It seems that lower bullet power is a real boost against upper bots... methinks some sort of adaptive bullet power really is the way to go.
I can only speak for melee, but it was very rewarding to put a rule in my bots where I only shoot 0.1 bullets if I have more energy than the other bot (endgame 1v1). This worked against every kind of bot. If the bot is weak, he will drain his own energy anyway (bad hit ratio) and I get the win score. And if the bot is strong, there is a good chance that he will also drain his energy, and I can't hit him anyway. If I have less energy than the opponent, I just shoot with normal bullet power to prevent the stronger bot from scoring more bullet damage. I guess this works because in melee you mostly have significantly less energy left for the 1v1 than in normal 1v1, and therefore the stronger bots don't have much time to adapt their movement and gun.
I'm sure this doesn't work very well in 1v1, but I guess it shows that it is worth thinking about, because there are a lot of bots with bad energy management.
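The rule described above could be sketched roughly like this. It is a minimal illustration, not the poster's actual code: the class name, method name, and the `normalPower` input are assumptions, and treating an energy tie like the "behind" case is also a guess. 0.1 is Robocode's minimum bullet power.

```java
/** Hypothetical sketch of the melee endgame (last two bots) bullet power rule. */
final class EndgamePower {
    /** Robocode's minimum bullet power. */
    static final double MIN_POWER = 0.1;

    /**
     * @param myEnergy    our current energy
     * @param enemyEnergy energy of the last remaining opponent
     * @param normalPower the power the bot would otherwise fire (assumed input)
     */
    static double choosePower(double myEnergy, double enemyEnergy, double normalPower) {
        if (myEnergy > enemyEnergy) {
            // Ahead on energy: fire cheap bullets and let the opponent drain itself.
            return MIN_POWER;
        }
        // Behind (or tied, by assumption): fire normally so the opponent
        // does not out-score us on bullet damage.
        return normalPower;
    }
}
```

For example, `EndgamePower.choosePower(50, 20, 2.0)` picks 0.1, while `EndgamePower.choosePower(20, 50, 2.0)` keeps the normal 2.0.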
That would make sense, as most of the time only the strongest bots get to the endgame. I wonder if it would help even more if you only shoot 0.1 against everybody in the endgame...
I have tried shooting 0.1 only, but it didn't work very well, because if you have less energy than the other bot you will most likely lose anyway, and you give the other bot the chance to collect more bullet damage. With normal power, if you have a weak opponent you have a good chance to hit him, pull ahead on energy at some point, and win; and if it is a strong bot you deny him the additional bullet damage.
I had played with the energy slope to detect when he might reach zero and adjusted my bullet power to this value. The energy slope was quite nice for reacting to bullet hits and other energy losses and worked very well, but it turned out that you have to react to some battle situations where it is better to shoot normally, otherwise the opponent gets too much bullet score.
Yeah, what a pleasant surprise - I was going for APS. =) This was a .09 improvement over 3000 battles (6 seasons x 500 bots) in tests, so I was sure it would translate to the rumble. But I guess it really did hurt against the really weak bots I excluded from this test bed, which surprises me.
A few weeks ago I tried to find some really scientific bullet power selection. My thinking was that in general, if survival is not a worry, you want to maximize bullet damage - do as much damage as fast as possible, to increase your bullet damage and give him the least amount of time to get his own. But if survival is a concern, you want to maximize energy differential. So I plugged all this in to my normalized hit rates, expected rate of return for both of those, and tried to figure out when to switch between them.
Turns out maximizing bullet damage = 3.0, maximizing energy differential = 0.1, in almost every situation against almost every bot. Like you're literally best off just not firing to maximize energy differential unless your accuracy is super high. It didn't work very well. I think ~2.0 works because it's only slightly below 3.0 in damage rate, while also much better in hedging over more shots for consistent survival, and gun accuracy is better if you stick to a more consistent bullet power in data collection. But I'm not really sure. I'd still like something more intelligent than a hand crafted formula, but for now I'm skeptical about really making that work.
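To make the two objectives concrete, here is a rough sketch of the per-tick arithmetic using the standard Robocode rules (bullet damage `4p`, plus `2(p - 1)` above power 1; energy returned on a hit `3p`; gun heat `1 + p/5`; default cooling rate 0.1 per tick). It is not the calculation described above: the class and method names are made up, and the hit rate is passed in as a fixed number rather than normalized per bullet power.

```java
/** Hypothetical sketch: expected damage rate and energy swing for one bullet power. */
final class BulletPowerMath {
    /** Default gun cooling rate per tick in Robocode. */
    static final double COOLING_RATE = 0.1;

    /** Damage dealt by a bullet of the given power. */
    static double damage(double power) {
        return 4 * power + Math.max(0, 2 * (power - 1));
    }

    /** Ticks between shots when firing as soon as the gun has cooled. */
    static double ticksPerShot(double power) {
        return (1 + power / 5) / COOLING_RATE;
    }

    /** Expected bullet damage dealt per tick at a hit rate in [0, 1]. */
    static double damageRate(double power, double hitRate) {
        return hitRate * damage(power) / ticksPerShot(power);
    }

    /** Expected change per tick in (my energy - enemy energy). */
    static double energySwingRate(double power, double hitRate) {
        // On a hit the enemy loses damage(power) and we regain 3 * power;
        // every shot costs us its power whether it hits or not.
        double gainOnHit = damage(power) + 3 * power;
        return (hitRate * gainOnHit - power) / ticksPerShot(power);
    }
}
```

With a fixed 10% hit rate, `energySwingRate` comes out negative for both 3.0 and 0.1, which lines up with the observation that not firing at all maximizes energy differential unless accuracy is high.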
I think the thing to remember is that you don't have to maximize energy difference once you're certain you'll win the battle. From that point onwards it is possible to maximize bullet damage.
I'm interested in what could actually change in a surfing algorithm to increase the MEA. Are you doing a different escape angle or something?
Oh, it's pretty simple actually. Instead of simple orbiting, I'm moving in a straight line towards the point where I'd end up in a precise MEA calculation.
My main worry is it seems like a ton of extra calculations per tick, but in practice it doesn't seem to slow me down that much. I guess some combination of the "don't calculate second wave when you don't need to" optimization, and that I don't recalculate the destination for the direction I'm heading in on the first wave.
Heh, so essentially you're using a sort of Goto surfing now, albeit with a very fancy points generator?
Yeah, it is a little bit of go-to. =) I'm always moving towards a specific destination. But the overall algorithm is still pretty much unchanged from True Surfing, it's just how I choose the movement angles for each direction has changed.
Wow, a loss of 0.1 from some legit bug fixes I found during a small refactoring. My perceptual gun was always reporting HOT to the VG, which should hardly even come into play since barely any firing waves reach the enemy by the time the perceptual gun gets disabled anyway (4 shots). As far as I can tell, that's the only real behavioral change. I'd think the score loss was randomness but I also saw the difference in my benchmark (and I'd dismissed it). So, obviously I refuse to revert to broken code, and now have to figure out how the heck this should really be tuned...
Naturally I want to believe it's some other change I'm missing, but it really was very few changes before the first benchmark that showed the loss. Bummer!
Hey, I just thought you should know that the Literumble pairings are just about complete (which is where all of my CPU cycles have been going).
It seems that both of us have very low scores against rtk.Tachikoma. Something to look into.
Cool, thanks for the heads up! I think this will be a great way to find bots to improve against.
Could the Tachikoma thing be a discrepancy in Robocode versions? Are you using 1.7.3.0 or something newer?
It could be. I'm running 1.7.4.2 Alpha2 at the moment as it accepts gzip/deflate encoded data for the rumble and helps cut down on my bandwidth. A lot of the earlier results are from 1.7.3.0 though, so it could be related.
Some quick tests show that DrussGT typically gets 80-85% on my 1.7.3.2 dev Robocode, so it seems the results on Darkcanuck's server are a bit odd... I see in its Tachikoma.properties file that it is built for 1.7.3.2, but I don't think that could be giving it bad battles on 1.7.3.0.
Here is a more telling result: Tachikoma details sorted by date. Scroll down and keep an eye on the survival. Only the latest battles have any survival at all. The time they started getting survival corresponds with the time I switched my clients from 1.7.3.0 to 1.7.4.2_A2. So it seems it is a client issue.
I recall Wompi discovered some bots acting differently from 1.7.3.0 to 1.7.3.2. I'd forgotten, but Tachikoma was one of them: Talk:RoboRumble#Rumble_Client_1.7.3.2_vs_1.7.3.0_1092.
RoboRumble ‒ APS: 89.82% (1st), PL: 913-1 (2nd), Survival: 97.19%
It's one small step for a bot, one giant leap for the community! I know it's too small a margin, but congrats, Voidious, you're the best again! :)
Skilgannon, nothing personal, but I'm an opponent of absolute monarchy :)
Hmm, I'd say my best version was 2.5.5 though.
This shows DrussGT a *little* bit ahead... exciting times!
I foresee a fierce battle for the crown in the next few weeks, with an unpredictable result. And as I said earlier, I'm a fan of Diamond this time :) Actually, I think that a completely new King, Jdev for example, would be the better case :) But in fact right now only you two, Voidious and Skilgannon, are worthy of the crown :) So good luck to both, and I will go to the market for popcorn :)
P.S. Let's break the 90 APS barrier! :)
Thanks for the encouragement Jdev. =) It's exciting (and a little weird) to even be in the ballpark of DrussGT, he's been so dominant for so long.
DrussGT is indeed still the king - I think the best of each is Diamond 1.7.53 vs DrussGT 2.5.6. Diamond shows slightly ahead there, but that doesn't include the head to head battle, which DrussGT wins by a margin and would put him in the lead. But really this is all well within margin of error, anyway, which means DrussGT is still King.
Let's see what happens in the next few weeks :)
Besides, I've been making a nice run in the real competition since I got my Robocode computer. ;)
- Diamond 1.7.47: Math.exp(-0.5 * ux * ux)
- Diamond 1.7.50: Math.pow(2, -0.5 * ux * ux) (.47 vs .50)
- Diamond 1.7.51: Math.pow(2, -Math.abs(ux)) (.50 vs .51)
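For a side-by-side feel of the three formulas, they can be wrapped as plain functions. This is only an illustration: the class and method names are invented, and `ux` is assumed to be the distance between a firing angle and a data point divided by a bandwidth, which the list above does not spell out.

```java
/** The three kernel formulas from the version list, as standalone functions. */
final class VersionKernels {
    /** 1.7.47: standard Gaussian shape. */
    static double v47(double ux) {
        return Math.exp(-0.5 * ux * ux);
    }

    /** 1.7.50: same squared falloff, but base 2, so it decays more slowly. */
    static double v50(double ux) {
        return Math.pow(2, -0.5 * ux * ux);
    }

    /** 1.7.51: base-2 decay in |ux|, giving a sharp peak and much heavier tails. */
    static double v51(double ux) {
        return Math.pow(2, -Math.abs(ux));
    }
}
```

All three equal 1 at `ux = 0`. At `ux = 4` they are roughly 0.0003, 0.0039 and 0.0625 respectively, so each version smooths further across the angle range than the one before.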
I know 1.7.51 is far from stable, but it blew away my test bed enough that I'm pretty sure it's a nice jump (knock on wood).
Fun to experiment with!
I just need to figure out where my major performance problems lie, because if I try directly using Math.exp() or Math.pow(), I get hundreds or thousands of skipped turns in a round. I'm pretty heavily reliant on using a fast approximation for Math.exp() right now.. but I don't think I should have to be...
Using the approximator seems reasonable to me. I actually saw that in your version history and played with one a bit in my gun. =) In my main gun, where I do over 10k kernel density calculations per tick, I long ago abandoned gaussian because it was too slow. But I thought with an approximation, which I already had laying around from some experiments with an integral surf danger formula, it might work. It was fast enough, but it didn't perform better anyway...
I've found that a formula that smooths across the whole angle range is really important in movement. And in my movement, it's a max of 200 data points * 12 firing angles tested = 2400 kernel density calculations (across both waves). So until now, I stuck with gaussian because it's the only common kernel density formula I'd seen with that property. But I finally started playing with modifications of that and it was quite an improvement.
Bingo!
A big big problem was that I was calculating all dangers on my waves up-front. My reasoning was to take a one-time calculation hit and then surf using lookups.
Problem was, at the angular resolution I was wanting, this involved tens (maybe even hundreds) of thousands of kernel density calculations when creating my wave danger Object. Seems like a few thousand kernel density calcs each tick works a lot better for surfing. My skipped turns were probably happening when I detected enemy waves fired on the same turn as trying to make a targeting decision.
Targeting is still annoying in this sense.. the entire angular range needs to be evaluated on this tick. I like the exponential/Gaussian approach.. but want to investigate if there are less processor intensive kernel functions that work as well (or better?).
Regarding targeting being annoying in terms of evaluating the entire angular range, how are you doing that currently? Are you just calling a kernel density function on a large number of fixed points?
Here are a few examples of ways to perhaps calculate kernel density faster in the context of targeting where you only care about the maximum:
- If you take the derivative of your kernel density function, you should be able to find the zero-crossings of the slope, and only calculate the kernel density at those points.
- One could also try approaches like skipping the kernel density calculation for angles which are too far from any data points.
- Or maybe even use the data points themselves as the angles to run the kernel density calculation for.
- With certain exceptionally simple kernel density functions (i.e. rectangle like I use in RougeDC/Scarlet's targeting), you can find the peak extremely fast with specialized algorithms also.
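The second and third suggestions (skip far-away points, and use the data points themselves as candidate angles) might look something like this. It is a sketch under assumptions: the triangular kernel, the names, and the single shared bandwidth are arbitrary choices made for illustration.

```java
/** Hypothetical sketch: find a good firing angle by testing only the data points. */
final class PeakFinder {
    /** Triangular kernel with support (-1, 1); chosen only to keep the sketch simple. */
    static double kernel(double ux) {
        double a = Math.abs(ux);
        return a < 1 ? 1 - a : 0;
    }

    /** Kernel density at angle x, skipping points too far away to contribute. */
    static double density(double x, double[] points, double bandwidth) {
        double sum = 0;
        for (double p : points) {
            double ux = (x - p) / bandwidth;
            if (Math.abs(ux) >= 1) {
                continue; // outside the kernel's support
            }
            sum += kernel(ux);
        }
        return sum;
    }

    /** Returns the data point with the highest density (NaN if there are no points). */
    static double bestAngle(double[] points, double bandwidth) {
        double best = Double.NaN;
        double bestDensity = Double.NEGATIVE_INFINITY;
        for (double candidate : points) {
            double d = density(candidate, points, bandwidth);
            if (d > bestDensity) {
                bestDensity = d;
                best = candidate;
            }
        }
        return best;
    }
}
```

This evaluates the density once per data point instead of once per fixed angle, so it only pays off when there are fewer points than angles, and the true maximum can still lie between two data points.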
re #1: That seems to break for me, because (taking the Gaussian example) if I have two data points, centers -0.25 and 0.25 .. the maximum of the total area after calculating both kernels will be at x=0, which wasn't a zero-crossing of either Gaussian point in isolation.
re #2: I like this idea!
I've just now switched (experimentally) to using the Tricube kernel because I like its shape: flattish in the center and trailing off to either side. I have it adjusted to slightly overhang the precise intersection width of each data point. Since it only exists from (-1,1), I've got some of your suggestion #2 built in, and turn skipping has pretty much ceased! We'll see how well this kernel compares, of course....
For #1 I did not mean the zero-crossing of any one point, I meant the zero-crossings of the sum of all the derivatives of the kernel density function. Of course, whether it's efficient to calculate those zeros or not all depends on what the kernel density function is (probably not practical for gaussian, trivial for triangular, as two extreme cases).
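For reference, the tricube kernel has the shape `(1 - |u|^3)^3` on (-1, 1) and is zero elsewhere. A minimal version, with the usual 70/81 normalizing constant left out (an assumption that only relative densities matter) and with invented names, could be:

```java
/** Minimal tricube kernel, unnormalized. */
final class Tricube {
    static double tricube(double u) {
        double a = Math.abs(u);
        if (a >= 1) {
            return 0; // compact support: distant data points exit early
        }
        double t = 1 - a * a * a;
        return t * t * t;
    }
}
```

The early return is what gives the "skip points that are too far away" behaviour for free, and the inside of the kernel costs only a handful of multiplications with no call to `Math.exp` or `Math.pow`.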
Hmm... tricube sounds like an interesting one, though that's quite a bit of multiplication it uses. I wonder if this is the sort of thing that would be worth doing a rough approximation of really. I mean... it probably wouldn't affect the results too much to do the kernel density as a piecewise "sum of rectangles" approximation, and it would be much faster.
My solution to your problem was 2-fold:
1: Use a faster smoothing function. I've ended up at 1/(1+sqr(x))
2: A bit of dynamic programming: pre-calculate a single 'function profile' (and put it into a set of bins), centred at GF0, which runs from GF-2 to GF+2. Then whatever your GF is, you just need to scale your GF to figure out where on the function to draw your value from. So rather than doing an entire smoothing function for each hit, log all your hits (without smoothing) into a set of bins, then do the smoothing afterwards into a different set of bins by checking each bin if it is non-zero and overlaying a 'function profile' with that weight. If you're really sneaky you can even keep what the bin index of the hit is, instead of the actual GF ;-)
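One way to read the second point is sketched below. It is only an interpretation: the names, the `binsPerUnit` scaling, and the exact profile width are assumptions, with the profile covering every possible bin offset so it can be overlaid from any hit bin.

```java
/** Hypothetical sketch: smooth raw hit bins by overlaying a precomputed profile. */
final class ProfileSmoother {
    /**
     * Precomputes 1 / (1 + x^2) for every bin offset from -(bins - 1) to +(bins - 1).
     * binsPerUnit controls how many bins correspond to x = 1 in the smoothing function.
     */
    static double[] buildProfile(int bins, double binsPerUnit) {
        double[] profile = new double[2 * bins - 1];
        for (int i = 0; i < profile.length; i++) {
            double x = (i - (bins - 1)) / binsPerUnit;
            profile[i] = 1.0 / (1.0 + x * x);
        }
        return profile;
    }

    /** Overlays the profile, weighted by hit count, onto a fresh set of bins. */
    static double[] smooth(double[] rawHits, double[] profile) {
        int bins = rawHits.length;
        double[] smoothed = new double[bins];
        for (int hit = 0; hit < bins; hit++) {
            if (rawHits[hit] == 0) {
                continue; // only non-empty bins get a profile overlaid
            }
            for (int j = 0; j < bins; j++) {
                // Offset (j - hit) is shifted so the profile centre sits on the hit bin.
                smoothed[j] += rawHits[hit] * profile[j - hit + bins - 1];
            }
        }
        return smoothed;
    }
}
```

The smoothing function is evaluated once per offset when the profile is built; after that, smoothing is just multiply-and-add over array lookups.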
Until a couple versions ago, in my main gun, I was using Gaussian until a certain number of data points, then switching to if (abs(ux) < 1) { density += square(1 - square(ux)) }. After some testing with WaveSim, I'm now using (1 - cube(abs(ux))) and never using Gaussian. YMMV, but I think with the amount of data you have in a gun, you don't really need the heavy smoothing offered by formulas that cover the whole range.
One less intensive approach that covers the whole range would be something like: density += 1.0 / (1 + square(ux)), which is akin to what a lot of VCS guns do for Bin Smoothing.
Wow, nice work. I haven't really stopped robocoding (does anyone ever really stop?) but I took a break for a while and now I'm working on an R-Tree, and some rewriting for Gilgalad. I had an idea that might push Diamond to the top. As far as I can tell, you only surf three options on the second wave for each of your three options on the first wave. I suspect that the bullet shadows make the dangers much less continuous so that using more points on the second wave would help your score a bit. (For Gilgalad, I thought I had more or less fixed the skipped turns problem by using every 5th point and making sure I got the extreme points in either direction for the second wave, but I got a new computer that has an Intel processor rather than an AMD. It's more than twice as fast as my old one from four years ago, but it seems to have way more skipped turns.)
Hmm, interesting thought. My original surf algorithm was to check every point along the second wave (in the days of bins and no precise intersection), but just checking forward/stop/reverse somehow always outperformed it. It's true that a lot has changed since then, including bullet shadows, so maybe you're right. But my most recent experiments with changing my surf algorithm were even more significant and came out with almost no change in score, so now I'm a little skeptical about tweaking my surf algorithm. =)
Wow, congrats on these tweaks, although it brings Diamond a bit too close for my liking there! I think that we tend to weight the second wave so low anyways that minor inaccuracies aren't as big of a deal. Wintermute does that though, for each tick on the second wave try stopping and see where the intersection is. It mostly just made it slow.
You mean you only use three movement options on the second wave for each movement option on the first wave? And I've spent all these hundreds of hours optimizing for nothing!
Man, I'm so relieved to finally have a nicely tuned gun in 1.7.47. I hit several weird hurdles along the way that had me really confused / annoyed. The whole time, I knew it wouldn't even gain me many points, but all I wanted was to find some small gains and get warm fuzzies about my gun being nicely polished. =)
Now that I have that figured out, I'll just re-tune the perceptual gun against the same bots, hope I don't lose much or maybe even gain a little, and move on with my life. =)
The hurdles, if anyone's curious:
- Made a new version of TripHammer updated to Diamond's current code base, which has changed a lot of nitty gritty data processing stuff.
- My genetic algorithm code for the "fading KNN" was setting the parameters related to "size of k" on the wrong Classifier, so they were producing gibberish (had no impact on fitness) for several versions of Diamond.
- My KNN classifier (basically the WaveSim version of Diamond's gun) was multiplying the scan weight to the value I pass to Math.exp, instead of the result of Math.exp. No idea how/when that happened, but it sure made me feel stupid.
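The last bug in the list is easy to make and hard to spot. A stripped-down illustration, assuming the Gaussian form from the version history as the kernel and using invented names:

```java
/** Illustration of applying a scan weight inside vs. outside Math.exp. */
final class WeightedKernel {
    /** Buggy form: the weight ends up in the exponent and reshapes the kernel. */
    static double buggy(double ux, double scanWeight) {
        return Math.exp(scanWeight * (-0.5 * ux * ux));
    }

    /** Intended form: the weight scales the kernel's result. */
    static double fixed(double ux, double scanWeight) {
        return scanWeight * Math.exp(-0.5 * ux * ux);
    }
}
```

With the buggy form, a scan with weight 0 still contributes a full 1.0 at every angle, whereas the intended form makes it contribute nothing.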
It's so strange, I found, once I add an attribute it doesn't really matter how much it is weighted (within an order of magnitude or so), I still get around the same results for gun accuracy. The biggest gains I had from genetic tuning was adjusting the speed that the 'time' attribute increased, and even then once it was in the right ballpark there was very little to choose between them. Still, it does help to squeeze that extra 0.1% out =)
Hmm, is it that strange though? You have enough good attributes already that the new attribute likely correlates to a significant degree (but not entirely) with one or more of them, which I'd expect to make it so it wouldn't change which points are closest when its weighting is only changed a small amount.
True. I really need to PCA the data that gets generated by a typical battle. There must be an input transformation which can eliminate a bunch of the dimensions.
Attribute weighting is probably one of the things in Robocode that has received the most attention vs what it deserves. =) Sort of like dynamic segmentation, which used to get tons of focus, but is IMO much more elegantly implemented with KNN. I think it's worth having them tuned, but for example, Gilgalad is a super strong bot and recently got the exact same score when he removed his gun weights.
My thoughts with PCA would be that we could eliminate a large number of the dimensions stored in the tree by only taking the X main components, and make a transform which combines a large number of measurements from all sorts of things which aren't even very useful and turn them into a much more information-dense, lower dimension location. This would save on memory as well as search time while still keeping pretty much exactly the same results.
I agree that far too much effort has been put into refining weights, but it does have its place for eking out that extra little bit of performance against a known population.
Obviously I agree it's worth some effort, if you check my recent version history. =) It's a very obvious and easy knob to fiddle with. And I can see pretty clearly with WaveSim that there are accuracy gains that can come out of it.
The PCA stuff sounds pretty interesting. I think it went a bit over my head in my Machine Learning class (though I understand the basic idea).
I think another big factor is that there's so much variance in hit rate, and so much score coming from movement and survival, that increasing accuracy beyond a certain point just doesn't translate into very many rumble points. The best of guns can miss 10 shots in a row and force you to rely on good movement and energy management. It's still fun though. =)
My current view is that movement and targeting are inextricably linked with each other, and it's impossible to say which part of the points comes from movement and which from the gun. I think that both statements are correct:
- a good gun gives the enemy fewer chances to hit you (so less score for the enemy and more bullets for you, so more score for you), because it steals his energy
- a good movement gives the enemy fewer chances to hit you (so less score for the enemy and more bullets for you, so more score for you), because it steals his energy.
It's a system of equal partners, imho for the last few weeks :)
And a little off-topic: also, imho for the last few weeks, statistical targeting is at an impasse (deadlock?) and the next breakthrough may be in single-tick playing forward. Especially in light of the fact that totally annihilating weak bots is more important than destroying strong bots.
I disagree for a different reason... I think that's a bit of a false dichotomy, because I'd still classify the "single tick playing forward" methods as statistical targeting so long as the mechanism used each tick is still statistical. It adds another assumption to make each data point used more generally, but so do GuessFactors.
Really, what the technique provides is denser data, by making the assumption that on a given tick the opponent behaves in a mostly-deterministic manner according to the attributes you're targeting based on. If your attributes are sufficiently complete, it should have a quicker effective learning rate.
I do think there is value in the "single tick playing forward" idea, but as-is it uses too much CPU, especially if your targeting attributes are complex. I think one has to consider what it brings to the table and take advantage of it without making things so slow. My current view on the best approach is that it would be doing a larger number of ticks than one at a time (i.e. 10-tick-at-a-time iterative prediction).
I did not say that kNN etc. must be behind ST-PIF :) Neural networks may be used, for example. But actually yes - when I implemented this, it was kNN based. And you're completely right: although this gun gets a hit rate >95% against Walls and >60% against Crazy, it was unacceptably slow.
And I never said ST-PIF was always statistical, just that it doesn't have anything more to do with it being statistical or not than GuessFactors do (aka, nothing) :)
<random> Come to think of it, "Single-Tick" techniques and "GuessFactor" techniques have a lot in common... both "fold" data across lines of assumed symmetry. GuessFactors "fold" across the "front-versus-back" symmetry, whereas Single-Tick folds across a temporal symmetry of sorts.
GuessFactors have proven themselves highly beneficial, and Single-Tick techniques may also in the future, however both techniques would perform sub-optimally when encountering something which violates the symmetry they assume. Unless the targeting attributes include something that differentiates front/back, GuessFactors will perform sub-optimally when faced with an opponent which treats them differently. Of course, it's difficult to take advantage of this in a major way I think.
Similarly the weakness of Single-Tick techniques is when an opponent treats different ticks differently due to something that cannot be detected in the targeting attributes. For most robots, even surfers, the assumption is probably good enough... but... in contrast to guessfactors... <evil>A cleverly designed semi-random multi-mode movement could be designed so that the movement path generated by a "single-tick" technique is never where it actually ends up ;) </evil></random>
Anti-Pattern matching comes to mind.
Have you tried using k=1? How does it compare then with something like regular kNN-PIF in terms of speed and hitrate?
Sorry, but I've forgotten the details; everything I remember I've already written. Tomorrow I can publish that code, but I have no time in the near future to revive it.
I guess K=1 would make ST-PIF have the same weaknesses as neural network based Pattern matching (non-statistical).
If a bot dodges 30% of the time going straight then turning to the right and 70% of the time going straight all the way, Neural Targeting averages both patterns and shoots slightly to the right, missing both patterns. In other words, it is awful against Walls.
Increasing K is what makes the gun choose the "straight all the way" pattern alone and achieve 70% hit rate.
Yeah, but you have other factors which would affect which scan is closest, like forward distance to wall, time since decel, distance last 10 etc. which all affect what the enemy motion will be. That is the advantage of this over plain single-tick pattern matching (which works better than regular pattern matching, but is slow/memory hungry). Even having k=3 would be quite fast for each kNN compared to what works well in guns now, where I can easily use k=150 and not skip any turns.
Also, once it gets onto one of the branches which suggest it will follow the '70%' you mention, the act of following that branch will make it more likely to further follow similar branches in the future, so it won't end up in between, but rather will end up at a different path completely.
I also thought about this problem and found a possible solution: keep similar amounts of data for the different classes.
I tend to disagree - I think gaining rumble score via targeting has to come by improving against mid-range bots that are scoring significant amounts of bullet damage and survival against you. Whether or not I beat SpinBot 4000-0 or 5000-0 isn't going to make much difference. =) But going from 70% to 75% against a few dozen bots will make a difference.
Roughly, by eye, Diamond has >=90% APS against ~50% of bots, so it's better to go from 90% to 95% against ~450 bots :) More accurately, Diamond has 5 bots with 70% APS and 28 bots with 90%, so, again, it's better to go from 90% to 95% :)
More accurate data: Diamond has 70-80 APS against 134 bots, 85-95 APS against 277 bots, and 90-95 APS against 168 bots.
Sure, I'd love to go from 90% to 95% =), but that's incredibly difficult. It means cutting the enemy's score in half. And these are bots that you are already annihilating, and which are winning about zero rounds against you, so all the score increase has to come from relative bullet damage.
On the other hand, going from 70% to 75% means cutting the enemy's score by ~16%. And these bots are winning some rounds against you, which gives you more score to take from them as you improve.
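The two figures above follow from treating APS as our percentage share of the total score, so the enemy's share is 100 minus APS. A tiny check, with hypothetical names and that share interpretation as the stated assumption:

```java
/** Hypothetical helper: how much the enemy's score share must shrink for an APS gain. */
final class ScoreShare {
    /** Returns the fractional cut in enemy share when our APS moves from fromAps to toAps. */
    static double enemyShareCut(double fromAps, double toAps) {
        double before = 100 - fromAps; // enemy share before
        double after = 100 - toAps;    // enemy share after
        return (before - after) / before;
    }
}
```

`enemyShareCut(90, 95)` is 0.5 (the enemy's share drops from 10 to 5), while `enemyShareCut(70, 75)` is about 0.167 (from 30 to 25).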
I did not say that it's easy to more totally annihilate (completely totally annihilate :)) the weak bots. I'm saying that it's the place where more points are hidden :)
Well, I weight the hidden points by how easy they are to obtain. =) For instance, you won't see me talking about the 55 points still "hidden" vs DrussGT...
But they are there! :) OK, I propose we stop this little off-topic :)
(Maybe I should see how this fares before babbling about it, but anyway...)
How this works is: for each attribute, I have an initial weight, a final weight, and a final time (say, 20,000 ticks). The weight shifts linearly from initial to final value until final time, when it stays at final value. Tuning against my 500-bot, 4 season general rumble test bed, I increased my hit percentage from 23.21% to 23.90%. Not sure how much I could've gotten just re-tuning without the shifting, but testing just the initial weights or just the final weights came in around 23.1%. And the previous weights were genetically tuned in a similar fashion, so I think they were close to optimal.
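The per-attribute schedule described above amounts to a clamped linear interpolation. A minimal sketch follows; the class name and fields are illustrative, not Diamond's actual code, and `time` is assumed to be a tick count measured on the same clock as the final time.

```java
/** Hypothetical sketch of a weight that fades linearly from an initial to a final value. */
final class FadingWeight {
    private final double initialWeight;
    private final double finalWeight;
    private final long finalTime;

    FadingWeight(double initialWeight, double finalWeight, long finalTime) {
        this.initialWeight = initialWeight;
        this.finalWeight = finalWeight;
        this.finalTime = finalTime;
    }

    /** Weight to use at the given time; stays at the final value from finalTime onward. */
    double at(long time) {
        if (finalTime <= 0 || time >= finalTime) {
            return finalWeight;
        }
        if (time <= 0) {
            return initialWeight;
        }
        double progress = (double) time / finalTime;
        return initialWeight + progress * (finalWeight - initialWeight);
    }
}
```

For example, `new FadingWeight(1.0, 5.0, 20000).at(10000)` gives 3.0, and any time past 20,000 gives 5.0.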
How this came about is I was looking into a custom gun just for the first round of a battle, the way I've been using RetroGirl/Gun for the first few shots. I tried just tuning a KNN gun on 1-round battle data, and the result quickly outperformed my current settings, and the weights looked a lot different. After starting to implement it in Diamond and seeing promising (if incomplete) results, I thought of the more general solution of gradually shifting the weights. This too outperformed my current settings after very few generations. One thing that stands out is that my gun heat dimension starts at a very low weight and ends with a pretty high weight late in the match. This seems to make intuitive sense: as I have more data, it's better to favor firing waves more.
Now to cross my fingers that it produces in the rumble. =)
Neat concept! Now this has got me thinking about trying to associate KNN weightings with all sorts of things besides time... distance maybe?
What do you use for your test bed? RoboResearch?
I've been wanting to systematize my testing more so I can shake things out more thoroughly in an automated way before I throw it up on the Rumble server. I've got RoboResearch ready to go.. I just need to assemble a set of bots to test against.
Yeah, I use RoboResearch for real testing. For some gun-only testing (like this) I use WaveSim, which is a tool I wrote to test just classification against raw battle data - so it doesn't work for testing bullet power or some other nitty gritty things, but for the things it does work for, it's pretty sweet (and fast).
For test beds, I use User:Voidious/BedMaker, which is a little script I wrote to select random bots from the rumble within certain parameters. But you'd need to get a rumble server API key from Darkcanuck first. I've been thinking maybe I should make a web-based version of that for others to use freely, but I wasn't sure if anyone was interested...
Currently hunting down which of a bunch of changes between 1.7.29 and 1.7.35 caused a decrease in performance. I swear I tested rolling back each one individually yesterday and none of them fixed my score. So now the other way: start with 1.7.29 and add each change individually. Once that's done I can see if this fading KNN stuff actually helps.
I've been pretty fearless with continuous refactoring and bug fixes throughout 1.7.x, and this is the price I pay for it. =) But overall I think it's been worth it, both in terms of code quality and performance.
Orrrrr I've been chasing ghosts... In my 2,000 battle benchmark (250 bots x 8 seasons), 1.7.29 came in 0.18 above 1.7.37 and a dev version of 1.7.35 got the exact same score as 1.7.37. Rolling back individual changes gave me anywhere from 0.14 to 0.22 below 1.7.29's score. Then I recompiled 1.7.29, checked the class files came out the same (ie, I had the right source), and reran 10 seasons... and came in 0.14 below 1.7.29.
Another good reason to focus on big improvements: anything else is too small / painful to reliably benchmark. =)
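A rough way to judge whether a gap like 0.14 to 0.22 is inside the noise is to compare it against the standard error of each benchmark run. The sketch below assumes you have the individual battle scores for both versions and treats them as independent samples, which is a simplification (a paired comparison per opponent would be tighter). The function names and the threshold of 2 standard errors are assumptions, not anything taken from RoboResearch.

```python
import math
import statistics

def mean_and_stderr(scores):
    """Return the mean score and the standard error of that mean."""
    mean = statistics.fmean(scores)
    stderr = statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, stderr

def is_distinguishable(scores_a, scores_b, z=2.0):
    """True if the two means differ by more than z combined standard errors."""
    mean_a, se_a = mean_and_stderr(scores_a)
    mean_b, se_b = mean_and_stderr(scores_b)
    combined = math.hypot(se_a, se_b)  # sqrt(se_a^2 + se_b^2)
    return abs(mean_a - mean_b) > z * combined
```

If `is_distinguishable` comes back false for two runs of the same code, any difference of that size between versions is not worth chasing.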
I'm really curious to see how "cast shadows over firing angles that already would have hit" works out. This is actually a feature I had implemented in RougeDC a long time ago, but I can't remember if I saw much benefit in the rumble or not.
Yeah, it seems like a cool feature, and it doesn't seem possible it would cost you points unless there is some other bug or quirk at play. Looks like I lost 0.1 APS, which could be margin of error, but I didn't really reproduce a score increase in tests either, so I think it's accurate. Another nice thing this would let me do is surf a wave until it completely passes without sacrificing anything. If I'm modeling the danger accurately, I'll pretty much be surfing the second wave as soon as I am now anyway, and even a tick or two sooner in a lot of cases where waves break along my front edge.
The one aspect that probably needs improving is how I apply shadows to the danger calculation. I ignore any firing angles that fall within a shadow, and I multiply the final danger by (1 - the percentage of firing angles that are shadowed). Of course this yielded like 0.8 APS with bullet shadows, so it must be a decent approach, but the right way to do it is obviously with integrals and all that. I think I need to change my kernel density formula to do it right, which is something I still need to figure out.
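To make the "ignore shadowed angles, then scale by the unshadowed fraction" approach concrete, here is a minimal sketch. It is not Diamond's actual code: the Gaussian kernel, the `bandwidth` value, averaging over the visible angles, and the `(low, high)` representation of shadows are all assumptions.

```python
import math

def shadowed_danger(candidate_angle, firing_angles, shadows, bandwidth=0.1):
    """Kernel-density danger at candidate_angle with bullet shadows applied.

    firing_angles: predicted enemy firing angles (radians).
    shadows: list of (low, high) angle intervals covered by bullet shadows.
    """
    def is_shadowed(angle):
        return any(low <= angle <= high for low, high in shadows)

    visible = [a for a in firing_angles if not is_shadowed(a)]
    if not visible:
        return 0.0  # nothing to dodge, or everything is shadowed

    # Average Gaussian kernel over the unshadowed firing angles only.
    density = sum(
        math.exp(-0.5 * ((candidate_angle - a) / bandwidth) ** 2)
        for a in visible
    ) / len(visible)

    # Scale by (1 - fraction of firing angles that are shadowed).
    shadowed_fraction = 1.0 - len(visible) / len(firing_angles)
    return density * (1.0 - shadowed_fraction)
```

An integral-style version would instead integrate the density over the angles the bot's hit window covers and subtract the shadowed parts, rather than dropping whole firing angles.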
It's funny, I've thought about this idea before, but it always seemed like a hugely complex thing to try to deal with. This time I already had all the pieces in place to do it without much additional work: precise intersection, bullet shadows, plus a lot of newly refactored wave code that makes my life a lot easier. =)
Ahh, a pity it didn't seem to really help, though maybe it would more with the integral-style danger.
Makes sense. It was pretty natural for me to implement in RougeDC because it had both precise intersection and integral-style danger right from the start.
I'm also surprised this didn't work. With Goto surfing I could understand, as it doesn't make any accel/decel decisions once the wave starts breaking, but with True Surfing the decision to speed up/slow down is made as the wave is breaking over you. Are you creating the wave shadows in your simulations as well, or just using the ones which are actually created by the bot hitting the waves?
And you're making me nervous with your recent gains =)
I'm just creating shadows up to the present point in time, including the firing angles that will hit me next tick (before I can move again). I did consider that as an extension, I could shadow any angles that are totally unavoidable, starting several ticks earlier and simulating until the wave is gone, but I'm taking it one step at a time.
My best guess is that the crudeness of how I apply bullet shadows to my danger calculation is what's holding me back. My kernel density danger calculation just ignores any angles that fall within a shadow. But an angle near the edge of a shadow would otherwise cause me to go the other way, and that's probably worth doing since those angles are just guesses. With these shadows, I'm totally ignoring that firing angle. I'm going to try some integral style dangers and see where that takes me.
It's also worth noting that while it's true those angles really should be viewed as having zero danger, many of them are in common across the movement options, so they may frequently cancel each other out anyway. (Ie, what's the difference if I add zero or some other number to all the movement option dangers?)
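A toy illustration of that cancelling-out point: if the same extra danger is added to every movement option, the safest option does not change. The option names and numbers below are made up, and this only holds while the shared danger really is identical across options and is added rather than multiplied in.

```python
def safest_option(dangers):
    """Return the movement option with the lowest danger."""
    return min(dangers, key=dangers.get)

# Hypothetical dangers for three movement options on one wave.
dangers = {"forward": 0.42, "stop": 0.31, "reverse": 0.57}

# Danger from firing angles that every option has in common.
shared = 0.25
with_shared = {name: value + shared for name, value in dangers.items()}

# Same choice either way, so zeroing the shared angles changes nothing here.
same_choice = safest_option(dangers) == safest_option(with_shared)
```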
I know you can't really say anything from 1 battle, but I don't ever recall seeing a score vs Shadow like this one. =) Yay!
My Anti-Surfer gun tries to predict the enemy's surfing on the nearest wave, then restrict its min/max firing angles to those still reachable assuming the enemy surfed the nearest wave as expected. For figuring out what's reachable, I hacked up my precise MEA calculation with a different starting state. But basically I screwed that all up and it hardly worked at all. (About time I started writing unit tests...)
Despite being broken, I had tested this gun against all my worst matchups and it helped against Shadow. So hopefully it will work better now.
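For readers following along, the "restrict to what's still reachable" step might be sketched like this. It assumes the reachable range has already been computed elsewhere (the precise MEA part is not shown), and the function name and the midpoint fallback are assumptions rather than how Diamond does it.

```python
def restrict_firing_angles(candidates, min_reachable, max_reachable):
    """Keep only candidate firing angles the enemy can still reach.

    min_reachable / max_reachable: bounds of the reachable range, assumed
    to come from a precise MEA calculation started at the predicted
    post-surf state.
    """
    reachable = [a for a in candidates
                 if min_reachable <= a <= max_reachable]
    if reachable:
        return reachable
    # No candidate fits: fall back to the middle of the reachable range.
    return [(min_reachable + max_reachable) / 2.0]
```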
Ok, looks like 1.7.9 finally fixes "the" major bug(s) I introduced with the movement refactoring in 1.7.5. I guess I won't know for sure if it was one of the 1.7.8 changes unless I let it fill out its battles, but it started off tanking pretty hard, and using the wrong GFs in the flattener (fixed in 1.7.9) is exactly the type of thing I thought would be the problem. (I switched to precise intersection in the flattener, since it was free with the new wave processing, but copied a couple lines from the gun that also used precise MEA GFs.) I'm really happy to have the wave interpolation and bullet shadow fixes too, though, and wonder if they're the reason 1.7.9 is undefeated (that or luck).
Regardless, I'm happy I can move on to the rest of the movement refactor. There's still quite a lot to do. :-) It's amazing I can refactor or rewrite every 1-2 years, and every time, the previous code seems so terrible...
While the RoboRumble results don't show much evidence of it, I'm quite happy with the progress I've made against top bots recently (while maintaining or increasing APS!). I mainly focus on Shadow and DrussGT, but also test against a bed that includes 6 other strong bots. I swapped out Scarlet for Tomcat because at least twice I saw a score of 85-90 (adaptive bullet power gone haywire?).
- abc.Shadow 3.83c
- darkcanuck.Pris 0.92
- davidalves.Phoenix 1.02
- jk.mega.DrussGT 2.2.0
- kc.serpent.WaveSerpent 2.11
- lxx.Tomcat 3.17.169
- mue.Ascendant 1.2.27
- voidious.Dookious 1.573cNDS (non-data saving)
Results over 100 seasons (plus an extra 200 of 1.6.12 vs Shadow / DrussGT):
Diamond | Shadow | Pris | Phoenix | DrussGT | WaveSerpent | Tomcat | Ascendant | Dookious | Avg | Seasons |
---|---|---|---|---|---|---|---|---|---|---|
v1.6.4 | 50.20 | 51.43 | 56.41 | 45.74 | 52.06 | 58.94 | 56.05 | 51.58 | 52.80 | 100 |
v1.6.12 | 52.46 | 60.74 | 61.13 | 49.30 | 57.45 | 58.66 | 62.81 | 58.60 | 57.64 | 150 |
Tomcat's latest version has some very strange results, but its APS against Druss and Diamond is about 50% now.
Congrats! Did you test it, or are you just looking at the Rumble? 2-3 battles isn't very accurate... But I should update DrussGT and Tomcat to the latest versions in any case.
FYI:
- 100 battles vs Tomcat 3.27: 54.15
- 300 battles vs DrussGT 2.2.2: 49.00
Edit: Oops, Tomcat's at 3.29? Doh!
I lost 7-8 places with the latest version, so congratulations aren't really appropriate :) No, I didn't test, but there are 9 battles now and the APS is still about 50% :) And yes, I made my previous post to encourage you to get the latest version of Tomcat :) Good luck with the PM crown hunting :)
Thank you, but for me, -5 APS vs Diamond with +1 APS overall is the better deal :)
So far, this hurts vs Phoenix and Dookious and helps vs Shadow, but that's a trade-off I'm willing to make at the moment. It brings me from ~49 to ~51 vs Shadow. As usual, nothing helps vs DrussGT... There's plenty more I can do to tune this, and I could even end up with multiple Anti-Surfer guns in my VG, but I've made enough changes that I just want a sanity release before going much further.