Talk:LiteRumble
Contents
Thread title | Replies | Last modified |
---|---|---|
Prettifying + Bot Comparisons | 2 | 16:28, 25 July 2012 |
Lost Pairings | 2 | 17:31, 9 July 2012 |
Trying New Rumbles | 3 | 08:39, 8 June 2012 |
Nice work and some thoughs | 18 | 13:57, 1 June 2012 |
Problem Bot Index | 7 | 13:49, 1 June 2012 |
I've added a simple color scheme as well as the comparison page. I still need to add something for easily selecting previous versions of the same bot, but most of the work is there. Take a peek!
I recently switched from gzipped pickled Python objects to gzipped JSON-encoded Python dictionaries, and somehow ran into an error where the main Rumble object was deleted each time it was uploaded to. After that, every upload had its respective bots' pairings deleted. I managed to catch it fairly quickly, but about half the bots lost a good portion of their battles, so it's back to the waiting game again. Fortunately, after the changes the server uses less memory, so I can do more aggressive caching and have fewer accidental out-of-memory shutdowns =)
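For anyone curious what the pickle-to-JSON switch amounts to, here is a minimal sketch using only the Python standard library. The function names and the sample record layout are made up for illustration and are not taken from the LiteRumble code.

```python
import gzip
import json

def pack(record):
    """Serialise a plain dict to gzipped JSON bytes."""
    return gzip.compress(json.dumps(record).encode("utf-8"))

def unpack(blob):
    """Inverse of pack(): gzipped JSON bytes back to a dict."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))

# Hypothetical bot record; the field names are assumptions.
bot = {
    "name": "sample.Bot 1.0",
    "pairings": {"other.Bot 2.0": {"aps": 61.5, "battles": 4}},
}
blob = pack(bot)
restored = unpack(blob)
```

Unlike pickle, JSON only round-trips plain dicts, lists, strings, numbers, booleans and None, so any custom objects have to be flattened into dictionaries first.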
I also need to whip up a bot comparisons page to compare versions... this or next week I think.
Are you still pushing the limit on how many clients you can support? I've now got a heck of a lot more RoboRumble firepower, so let me know if you want me to point some of it at your server.
I was running Melee clients at the same time, so I've stopped those now and they're running 1v1. That puts me at 6 clients, which I think is all I can handle. Perhaps I'll upgrade to paid someday when I get more annoyed with the limits =)
Do you mind if I try submitting results for some different rumble configs? I could set up my own instance of this at home or something if you'd rather. I'm thinking of finally seeing what a StrongestBotsRumble might look like, or a PerceptualRumble.
No problem. Just don't put more than one client on at the moment, or it will probably go over write quota. I'm currently working my way up through all the pairings of the roborumble with 4 clients on an i5, and filling out pairings for the first time is probably the worst as far as writes go because every single bot gets writes evenly, so caching between writes doesn't help particularly. Because of this I currently have caching on quite aggressively, I will lower it in a while. The problem with excessive caching is that occasionally the changes made in the bots get evicted from memcache and the frontend instance gets shut down/cycled, so some battles might be lost. Don't worry, everything will stabilise in the end, and it should be robust to any problems with lost battles on one pairing but not the other, etc.
If it does hit write quota it will reset at 07h00 UTC. You will start getting messages from your client that writes are failing, again don't worry, they will get written once the quota rolls around if they are still available in memcache or instance memory. Batch rankings (currently just Vote, ie. BestBot) get recalculated at 22h00 UTC.
Here's a sample client config: roborumble.txt
Enjoy!
Btw, what's your goal with filling out the 1v1 pairings? Just as a test to see how they compare to Darkcanuck's, or are you planning to try and keep this instance up to date with new RoboRumble activity?
Once they're full I want to try out some different scoring mechanisms - in particular I want to try out my Average Normalised Pairs Percentage. I also want to see what the KNN-PBI looks like for the main rumble. I'm not sure I want to keep a ~900 bot rumble permanently updated - it would eat into the free quota quite a bit. The same with the melee rumble. They are better suited for Darkcanuck's server IMO.
I think it would be more interesting to have a few others, like the PerceptualRumble or TripleDuel/TwinMelee, but I don't want to run any rumbles with more than 100-200 bots on a long term basis. They are just too slow to stabilise, and chances are that the majority of the bots have been abandoned anyways. Perhaps the next step is to write an app that serves the participants list as a FIFO - automatically kicking out old bots as new ones are entered (although not counting versions as new bots).
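A rough sketch of the FIFO participants idea: keep at most a fixed number of bots, kick out the oldest when a genuinely new bot arrives, and treat a new version of an existing bot as an in-place update. The class name, the capacity, and the assumption that each entry looks like "package.Bot version" are all illustrative.

```python
from collections import OrderedDict

class FifoParticipants:
    def __init__(self, capacity):
        self.capacity = capacity
        self.bots = OrderedDict()  # bot name without version -> current version

    def enter(self, participant):
        # Assumes the version is the last space-separated token.
        name, version = participant.rsplit(" ", 1)
        if name in self.bots:
            # New version of a known bot: update in place, keep its queue position.
            self.bots[name] = version
            return
        if len(self.bots) >= self.capacity:
            self.bots.popitem(last=False)  # evict the oldest entry
        self.bots[name] = version

    def listing(self):
        return [f"{name} {version}" for name, version in self.bots.items()]

p = FifoParticipants(capacity=2)
for entry in ["a.A 1.0", "b.B 1.0", "a.A 1.1", "c.C 1.0"]:
    p.enter(entry)
```

Here the update to a.A does not refresh its place in the queue, so it is still the first to go when c.C arrives. Whether an update should instead move a bot to the back of the queue is a design choice left open above.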
Nice work!
Not using JavaScript sorting made it easier to link sorted tables from other pages. Also, with a melee database reset, getting rid of battles with retired bots may change the rankings. And you put %wins scoring. =D
But I miss some kind of Condorcet ranking. PL was the only one we had, and the one Combat was doing best.
Also miss some kind of statistical ranking. Elo was what we had and allowed fun statistics like problem bot index, specialization index and that non-working JavaScript diagram. Mirror bots and ram bots will lose some of their appeal without those statistics.
I tried to raise a RoboRumble server in App Engine a long time ago, but they didn't allow me into the free tier. :(
My %Wins is a bit of a cheat. It is just 1 point per PL win, divided by bots in rumble. I prefer it to PL because it is not dependent on the number of bots in the rumble. So if Combat was doing well in PL, it should do well in %Wins.
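As a sketch of that calculation, assuming a pairing counts as a PL win when the score against that opponent is above 50% (the threshold and the function name are assumptions):

```python
def percent_wins(pair_scores, bots_in_rumble):
    """pair_scores maps opponent name -> this bot's score (%) against it."""
    wins = sum(1 for score in pair_scores.values() if score > 50.0)
    return 100.0 * wins / bots_in_rumble

scores = {"b": 60.0, "c": 45.0, "d": 72.5}
result = percent_wins(scores, bots_in_rumble=4)  # 2 wins out of 4 bots
```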
I'm still not using my Backend for anything, so I was thinking that once a day I could use it to generate some sort of pseudo-problembot stats stuff. ELO/Glicko is nice, but it is really designed for being good approximations when pairings are missing. In our case, the pairings are fairly easy to fill, so that isn't a problem; APS tends to converge to the same ranking order, and it isn't full of voodoo that makes it difficult to comprehend. It is also possible to correct APS easily if results get lost due to being in memcache =)
One ranking idea I had was doing a The Best Bot calculation (get a point for being the best against any competitor). It would increase my number of database writes in the only reasonably robust/non-batch way I can think of, which is what is holding me back at the moment. I could use a Backend for calculating it once or twice a day, I guess, or make it expire once every 6 hours and be triggered by a page load. It needs n*n runtime. Maybe I can fit it into the regular rankings calculations.
The hardest part is getting the rumble to stay in the free tier. I think it will be limited to about 6 melee clients in total, or maybe 12 1v1 clients instead (fewer pairings per battle in 1v1).
The last time I checked, App Engine offered about 1GB database in the free tier. Which is enough to store all pairings and all uploaded battles, as long as you delete data from retired bots once in a while.
As for the amount of clients the server can handle, it should not really be an issue, since there are usually 3 to 4 simultaneous clients at most.
If you want to use some batch processing, adding a Ranked Pairs ranking would make my day. It has O(n^4) complexity, but I think it can still fit inside the 10-minute window from cron, so no need for a backend.
The problem isn't so much total storage space, but that I'm limited at 50k writes per day. Each bot counts as 2 writes, so effectively I have 25k updates I can do. I've figured out a caching scheme so that each melee battle comes out to 10 updates (1 per bot) instead of 45 updates (1 per pairing). I also need to update the total rumble battles count and the user upload count, so that leaves a bit of overhead, meaning I can have ~2000 melee battles per day uploaded.
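The quota arithmetic above, spelled out as a small sketch. The constant names are made up; the numbers are the ones quoted in the post.

```python
WRITES_PER_DAY = 50_000        # free-tier datastore write quota quoted above
WRITES_PER_BOT_UPDATE = 2      # each bot update costs 2 writes
BOTS_PER_MELEE_BATTLE = 10

updates_per_day = WRITES_PER_DAY // WRITES_PER_BOT_UPDATE

# Without caching: one update per pairing, and 10 bots form 10*9/2 pairings.
pairings_per_battle = BOTS_PER_MELEE_BATTLE * (BOTS_PER_MELEE_BATTLE - 1) // 2
battles_one_update_per_pairing = updates_per_day // pairings_per_battle

# With the caching scheme: one update per bot per battle.
battles_one_update_per_bot = updates_per_day // BOTS_PER_MELEE_BATTLE
```

One update per bot gives an upper bound of 2500 battles per day, against about 555 with one update per pairing. The ~2000 figure is what remains once the battle-count and user-upload counters take their share.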
I'll see what rating systems are feasible.
Batch updates are more useful in a limited environment like that. Maybe it's time for a refactoring in the upload protocol (1 update per batch upload), even if it breaks backward compatibility.
I've actually figured out a sort of temporary caching between requests where I wait for bots to accumulate a certain number of pairings before pushing them to disk. I don't think it's necessary to re-work the rumble upload protocol yet. One thing I would like the rumble to tell me is how many bots are in a melee battle though. Right now I just have it hardcoded at 10. It would help with my caching if I knew how many they were uploading per battle.
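A minimal sketch of that kind of write-behind buffer: results pile up per bot and are written in one go once enough have accumulated. The in-memory dict stands in for memcache, and the `write` callback is a hypothetical placeholder for the real datastore write.

```python
class PairingBuffer:
    def __init__(self, flush_threshold, write):
        self.flush_threshold = flush_threshold
        self.write = write      # hypothetical callback: write(bot, results)
        self.pending = {}       # bot name -> list of (enemy, score) not yet stored

    def add(self, bot, enemy, score):
        queue = self.pending.setdefault(bot, [])
        queue.append((enemy, score))
        if len(queue) >= self.flush_threshold:
            self.write(bot, queue)   # one datastore update for many pairings
            del self.pending[bot]

writes = []
buf = PairingBuffer(3, lambda bot, results: writes.append((bot, list(results))))
for enemy, score in [("b", 55.0), ("c", 48.0), ("d", 70.0), ("e", 51.0)]:
    buf.add("a", enemy, score)
```

In a 10-bot melee battle each bot receives 9 pairings, so a threshold of 9 would give one update per bot per battle. That is why knowing the actual number of bots per battle matters for picking the threshold. Anything still pending is lost if the cache is evicted first.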
Hey that's neat! A quick and lightweight rumble setup could be really useful for tournaments and experiments. You just need a participants list URL, you make up a rumble name, and everything just works? Makes me want to try some new divisions. :-) Though that never seems to gain momentum...
What is App Engine pricing like? I'll take a look. I'd certainly be willing to pitch in some for Robocode related stuff if we needed more horsepower.
There is one division I would like to see. A twin melee rumble (like 5 teams of 2 bots each). Joining concepts of both melee and team/twin.
I like the idea, but I think it would be so crowded that it would pretty much reduce to melee strategy. (Having to fight off 8 other bots with 1 ally out there somewhere is not much different than fighting off 9 other bots.) Maybe 3 teams of 3? I've thought about MegaBot TwinDuel for a while...
I think megabot TwinDuel would be awesome! Although it might reduce to wavesurfing quite quickly. I'd also be interested in a TriDuel - a 3 vs 3. I think having that extra bot will completely change the dynamics compared to twinduel, and make surfing much harder.
Maybe split teamrumble into categories?
5 bots (teamrumble bots)
4 bots (DeltaSquad)
3 bots
2 bots (twin duel bots)
Teams with fewer bots can compete in categories with more bots but not the opposite.
Imagine 2 melee bots using minimum risk movement (dominant in melee), and 2 bots using provocative movement (dominant in twin duel). The 2 melee bots will be each on a different corner, but the 2 with provocative movement will be on the same corner ganking on the lonely melee bot. But at the same time, 3 bots close together become tasty targets for swarm targeting from other 3 teams.
There must be a balance between minimum risk and provocative movement, or a third undiscovered strategy. Maybe there is still room for innovation.
Sure, but that's assuming both bots on both of those teams survive to the final stages of the round, which seems unlikely. And even if both bots on one team survive that long, I think how much energy they've retained from the "pure melee" early stage of the round will be the most important factor. Maybe on a bigger field than 1000x1000, and/or with 3-4 teams instead of 5?
It assumes ganks in the middle of a battle weaken "pure melee" strategies somewhat. Although not in the same way as in twin duel.
With 3 teams, I believe "shooting the team with lowest energy" 2x1 strategy will dominate. One team is eliminated almost on luck, and the battle is decided between the remaining 2. It happens in most 3 player games.
There is a catch though, since the API doesn't tell you which of the opponent bots belong to the same team. That is not a problem in either meleerumble or teamrumble, but estimating it in team melee might be worth it. This alone may change the game significantly... or not.
A bigger battlefield or 4 teams seems nice. I thought of 5 teams of 2 bots each to keep the 10 bots total from meleerumble/teamrumble, and 2 bots per team from twin duel. And see strategies from all 3 divisions clashing against each other.
Any of these divisions sounds pretty interesting to me. I think the main hurdle is just getting that first person to write up a 3x3 team or add TwinMelee support to one of their bots. =) Nobody wants to commit the time if nobody else is going to compete, but if someone just does it, I bet others would follow suit...
I'm kind of caught up in my Diamond refactor right now, but maybe I'll make time for something fun soon. ;) Or try running a PerceptualRumble client just for kicks.
Hmm... all of those divisions do sound interesting to me too. Now it has me thinking about how best to adapt the LunarTwins/Polylunar strategy to a bit different formats...
Yeah, my thoughts are that something like this would be perfect for school/lab/office tournaments. Just give it a new name in the client, set up a participants list somewhere and away you go.
In the free tier I'm not really going to run out of disk space any time soon, a rumble of 300 bots comes out at around 2MB, it's the database writes which are the killer. From what I can tell, App Engine pricing starts at $2.10 a week for the minimum paying tier. That gets you quite a bit more quota than the free tier, which probably should be enough for everything, pretty much forever, without crossing that $2.10 limit. For now I'm going to see how much I can push the free tier, though.
I still have a bunch of optimisations I need to make - like not pulling all of the rumble data into memory just to serve the rankings page (it's all cached, doesn't affect my quota, just speed) - which should make it more snappy both on the main rankings pages and on the RatingDetails page the RR client queries occasionally.
A hidden feature: if you add timing=1 as an argument into your GET for any of the pages it summarises the timing breakdown for CPU usage at the bottom of the page and lets you know how many bots were pulled from cache vs. from the datastore.
I'm not sure, but I think PBI with Elo/Glicko was based on some magical formula between the ratings. Maybe an APS-based measure would just be the average score your neighbors get against that bot. So like if you're ranked #25, the average score against bot B for ranks 15-35 is your expected score.
Of course, if you're #1, you can only go from 2-11, but that's probably still useful info. And in that case (or in every case), you could shift everything so your average PBI is still 0.
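The neighbour-average idea could look something like the sketch below. The data layout (`scores[a][b]` is bot a's score against bot b), the window of 10 ranks on each side, and the sign convention (real minus expected, so negative flags a problem bot) are all assumptions for illustration.

```python
def expected_score(ranking, scores, bot, enemy, window=10):
    """Average score against `enemy` among bots ranked near `bot`."""
    i = ranking.index(bot)
    lo, hi = max(0, i - window), min(len(ranking), i + window + 1)
    neighbours = [
        n for n in ranking[lo:hi]
        if n not in (bot, enemy) and enemy in scores.get(n, {})
    ]
    if not neighbours:
        return None
    return sum(scores[n][enemy] for n in neighbours) / len(neighbours)

def pbi(ranking, scores, bot, enemy, window=10):
    expected = expected_score(ranking, scores, bot, enemy, window)
    return None if expected is None else scores[bot][enemy] - expected

# Tiny made-up rumble, ranked best to worst; only scores against "d" are filled in.
ranking = ["a", "b", "c", "d"]
scores = {"a": {"d": 80.0}, "b": {"d": 65.0}, "c": {"d": 60.0}, "d": {}}
exp_b = expected_score(ranking, scores, "b", "d", window=1)  # mean of a and c
pbi_b = pbi(ranking, scores, "b", "d", window=1)
```

At the top or bottom of the ranking the window is simply clipped, so #1 only sees the bots below it, as described above.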
PBI is the difference between expected score and real score. The expected score is based on difference between ratings.
- Zero difference is 50% expected score
- -800 difference is 1 to 20 odds, which is 1/(20+1) or 4.76% expected score
- -1600 difference is 1 to 20^2 odds, which is 1/(400+1) or 0.25% expected score
- -Infinite difference is 0% expected score
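A logistic curve that reproduces the figures in the list above is sketched below. It is reconstructed only from those data points (odds multiply by 20 for every 800 rating points), so treat it as an assumption rather than the exact formula any server uses.

```python
def elo_expected(rating_diff):
    """Expected score (0..1) for a bot whose rating is `rating_diff` above its opponent's."""
    return 1.0 / (1.0 + 20.0 ** (-rating_diff / 800.0))

even = elo_expected(0.0)          # equal ratings
behind = elo_expected(-800.0)     # 1 to 20 odds
far_behind = elo_expected(-1600.0)  # 1 to 400 odds
```

PBI is then the gap between the real pairing score and this expected score.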
Yeah, I was trying to think of a way of handling this elegantly without having to resort to a KNN type lookup or doing a whole ELO calculation. I was thinking something like:
Expected_for_bot_a = (bot_a_APS + (100 - bot_b_APS))/2
Eg: If BotA has APS of 70% and BotB 30% it predicts the 70%, 30% which seems intuitive to me. If BotA has APS of 80% and BotB 80% it predicts the 50%, 50% perfectly. If BotA has APS of 80% and BotB 60% it predicts 60%, 40%, which seems OK.
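The same formula as a few lines of Python, checked against the three examples above (the function name is arbitrary):

```python
def expected_linear(aps_a, aps_b):
    """Expected pairing score for bot A, from both bots' APS (in percent)."""
    return (aps_a + (100.0 - aps_b)) / 2.0

examples = [
    expected_linear(70.0, 30.0),
    expected_linear(80.0, 80.0),
    expected_linear(80.0, 60.0),
]
```

With this form the two bots' expected scores always add up to 100.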
I think the trouble with this is that it assumes that there is a linear relationship between average score and pairwise score. I think it is more of a sigmoidal relationship, because once you have taken out the low hanging fruit there is less increase to draw from. Because of this I think a modified version of the above formula, something like:
Expected_for_bot_a = ((bot_a_APS^Q + (100 - bot_b_APS)^Q)/2)^(1/Q)
for some magic value of Q would probably be a better fit.
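A quick sketch of the Q version, to get a feel for it. Q = 2 below is an arbitrary pick, not a fitted value.

```python
def expected_power(aps_a, aps_b, q):
    """Power-mean variant of the expected score; q = 1 gives the plain average."""
    return ((aps_a ** q + (100.0 - aps_b) ** q) / 2.0) ** (1.0 / q)

linear_case = expected_power(80.0, 60.0, 1.0)
a_vs_b = expected_power(80.0, 60.0, 2.0)
b_vs_a = expected_power(60.0, 80.0, 2.0)
```

One thing worth noticing: for Q other than 1 the two expectations no longer sum to 100 (here roughly 63.2 + 44.7), so the result would need renormalising if it is meant to be read as a pairing percentage.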
I've added a simple 'Vote' rankings page, where each bot votes for its worst pairing. The majority of bots don't get anything, predictably, but this is interesting for comparing who does the best. Again, this is a winner-takes-all ranking, so it makes no distinction between a bot that got 79.9% and one that got 50% against another bot whose worst pairing was 80%, and this makes me uncomfortable as there is clearly lost information. Perhaps I should change it so that every bot gets a vote of weight 100*pair%/worst pair%, but I'll leave it as it is for a day or so.
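One reading of the weighted vote, as a sketch: the voting bot hands every opponent a weight proportional to how well that opponent scored against it, relative to the opponent that did best. This interpretation, the function name, and the sample numbers are assumptions.

```python
def weighted_votes(enemy_scores):
    """enemy_scores maps enemy name -> that enemy's score (%) against the voting bot."""
    top = max(enemy_scores.values())
    if top == 0:
        return {enemy: 0.0 for enemy in enemy_scores}
    return {enemy: 100.0 * score / top for enemy, score in enemy_scores.items()}

votes = weighted_votes({"x": 80.0, "y": 79.9, "z": 50.0})
```

With the winner-takes-all Vote only x would get anything here, while under this weighting y ends up almost level with x.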
The batch pairings get updated once an hour for any rumble which has had battles since the last batch run.
If you try to figure out a sigmoidal relationship, you will eventually end with the same logistic distribution used in Elo and Glicko.
Thinking on this more, I actually really like the KNN idea. It's the only one that really tells you "you can and should be doing better against this bot", as opposed to "this bot might just have a weird score profile". (RamBots are the perfect/extreme example of this - they can show up as Problem Bots even if you're doing well against them.)
I know when I'm trying to figure out who I could do better against, I don't look at PBI, I compare to DrussGT. ;) I understand it would be a lot of calculations, but it should still be simple to code up, and it's all just basic math operations.
Another thought is, if you already have the best score vs any bot, a useful number might be that score minus your score. Calling it "PBI" would be a misnomer, but it tells you how much room you have to improve.
If you look at the site, you might just notice the errors ;-) That's because I ran out of Datastore Read quota. I think it's because of the batch rankings - before them I never even got to 20% of read quota. So I've changed batch rankings to every 6 hours, so in about 17 hours the quota will reset and we can see how it works =)
Since I'm only doing updates once every 6 hours I should have lots of quota for long, tedious calculations. So I'll whip up a KNN-based PBI over the next few days to see how it does. Any ideas on how to calculate K? How about sqrt(participants)?
It seems we have similar ideas about 'max improvement indexes'. Thinking further on my comment above about my %pair/(%worst pair) idea, I'm thinking about an interesting new ranking system that I'd like to call 'Average Normalised Percentage Pairs' or ANPP. Each bot normalises all of its pairings by subtracting the min score and then dividing by (max - min). Your score is then calculated as the average over your pairings of (100 - enemy normalised score). Thus, if the best anybody does is 75% against a rambot, and the worst anybody does is 30%, 30% will be treated as 0 and 75% treated as 100%. This would make it very easy to see problembots, as if your NPP against them is less than your average NPP, you should focus on them more. Thus, the worst bot against everybody would get 0%, and the best bot against everybody would get 100%.
Just thought I'd say... I rather do like the notion of a KNN-based "expected score" system. The sigmoidal relationship given by Elo/Glicko is a reasonable fit for predicting score based on each bot's overall rating, but it does really miss the sort of interesting subtleties/patterns that a system that considers multiple axes of strength would.
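A small sketch of how ANPP could be computed from a full pairings table. The layout (`scores[a][b]` is bot a's percentage score against bot b) and the handling of an enemy that scores identically against everyone are assumptions.

```python
def anpp(scores):
    """Average Normalised Percentage Pairs for every bot in `scores`."""
    result = {}
    for bot in scores:
        npps = []
        for enemy, enemy_scores in scores.items():
            if enemy == bot or bot not in enemy_scores:
                continue
            lo, hi = min(enemy_scores.values()), max(enemy_scores.values())
            if hi == lo:
                continue  # enemy scores the same against everyone: nothing to normalise
            enemy_norm = 100.0 * (enemy_scores[bot] - lo) / (hi - lo)
            npps.append(100.0 - enemy_norm)  # this bot's NPP against that enemy
        result[bot] = sum(npps) / len(npps) if npps else None
    return result

# Made-up three-bot rumble: a beats b and c, b beats c.
scores = {
    "a": {"b": 60.0, "c": 80.0},
    "b": {"a": 40.0, "c": 70.0},
    "c": {"a": 20.0, "b": 30.0},
}
ranking = anpp(scores)
```

In this toy table a comes out at 100, b at 50 and c at 0, matching the property that the best bot against everybody gets 100% and the worst gets 0%.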