Talk:Darkcanuck/RRServer/Query
Desired queries
I'm currently interested in queries to enable tweets about RoboRumble entrants once they've achieved a stable rating. For that, I think I'd need the following:
- PARAMS: Bot/Version, Rumble; RETURN: Battle count
- PARAMS: Bot/Version, Rumble; RETURN: APS
- PARAMS: Bot/Version, Rumble; RETURN: Ranking (eg, 9th)
These might be nice, so I'll post them for sake of brainstorming, but they're not needed by me:
- PARAMS: Bot/Version, Rumble; RETURN: Glicko-2
- PARAMS: Bot/Version, 1v1 or Melee; RETURN: Lowest weight class
- Can be deduced from querying with each of the 4 rumbles, anyway.
- PARAMS: Bot/Version, Rumble; RETURN: Pairings complete? (boolean)
- Can be assumed complete for battle count > 2000.
--Voidious 14:37, 6 August 2009 (UTC)
- Any reason you list the first three ones separate? Wouldn't it make sense to have one query get all three of those things about a bot, since a number is so much smaller than a http header, wouldn't it be easier to just get all of those at once? --Rednaxela 21:14, 6 August 2009 (UTC)
- No particular reason, it could certainly get them all at once. One potential benefit to separating out battle count is that I wouldn't need APS / ranking until battle count reached a certain threshold, and I think ranking is a lot more overhead than the other two (requires sorting all the bots by APS). --Voidious 21:32, 6 August 2009 (UTC)
- One basic query could get you the entire row (minus rank #) from the any rankings table based on bot name and game type. Or you could get the whole table, sort it yourself by whatever column you wish. I'd rather keep the # of query types small and their functions general and then users consuming that data can do sorting, comparing, etc. --Darkcanuck 05:45, 7 August 2009 (UTC)
- Yep, makes sense. Those two queries (get row, get all rows for a given bot/rumble) would do everything I need. --Voidious 12:48, 7 August 2009 (UTC)
The one I'd mostly be interested in would be the following:
- PARAMS: Rumble, Timestamp(optional); RETURN: All paring details (bot names, score, survival, maybe battle count) in Rumble (that had a battle since Timestamp).
It would make it easy to get/update test data for doing analysis of how bots do against other bots.
--Rednaxela 15:43, 6 August 2009 (UTC)
- Pairing data (ie summarized battles for each pair) or raw battle data? It sounds very expensive... How would you use this? --Darkcanuck 05:45, 7 August 2009 (UTC)
- Pairing data, not raw battle data I mean. I was thinking of doing a huge clustering of all 269011 pairings, across 734 dimensions (one for each bot). Essentially, the same data as RatingDetailsJson, except 1) the pairings about all bots not just one, 2) being able to filter out data that's not new, and 3) Don't care about Ranking/ExpectedScore/PBI. Once this huge clustering is computed, the cluster centers could be used to categorize bots, and potentially reveal hidden categories of bots that aren't well known on the wiki. I really think the result of such a clustering would be immensely interesting :) --Rednaxela 06:28, 7 August 2009 (UTC)
Compressing the data
I'm thinking, some of this stuff could be quite a bit of data. Perhaps it would be better to put it all into a zip file before returning it? --Skilgannon 19:48, 6 August 2009 (UTC)
Err... zip file? A raw deflate or gzip stream would be less overhead and much saner. Zip files are for archiving groups of files, not compress raw data. Actually, if the web server is set up correctly, it can transparently compress anything coming out of it as it sends to the client, so long as the client said in the http header that it accepts compressed responses (which as a note, firefox for one does on any web page). Anyway, might I suggest that maybe the query results should be encoded in JSON, since this is a very simple low-overhead format, which php has a builtin function to serialize to? --Rednaxela 20:57, 6 August 2009 (UTC)
- I'm pretty sure my server is setup to send a gzip stream if the client can handle it. No zip files though, just a plain text stream that other apps can consume easily. JSON was exactly what I was thinking, it's a simple structure and already used by ABC's very cool LRP graphs. (try using RatingDetailsJson instead of RatingsDetails to see an example). --Darkcanuck 05:45, 7 August 2009 (UTC)
Applications
Well, I was eager to flesh out the rest of the logic in the Rumble tweeting script, so I got it working with parsing the RR server pages. (Developed with local copies, then final tests on the actual URLs.) I figured that if I wrote it right, it would be very simple to switch over to the query API later. I've only run it at long intervals as of last night (and killed it for now), but I think the result is pretty cool: twitter/roborumble. It watches the 1v1 and Melee participants lists; when a new version's added, it fetches / caches the rating info for the old version; when the new version hits 2000 battles in General, it tweets. It will also print the same style of results for the bot's lowest weight class if it's a MiniBot or lower, and slightly different text for a new bot (not new version). --Voidious 17:44, 9 August 2009 (UTC)
- You can re-enable it, nothing to do with Diamond's troubles. --Darkcanuck 19:50, 9 August 2009 (UTC)
I also wanted to mention another idea I had that could make use of the query interface. I know that I have become super lazy in updating Diamond/Version History, perhaps in part because pasting the URLs and transcribing all the APS/ranking info feels tedious. Well, maybe I could write a MediaWiki extension that would let you write something like [[RumbleLink|minimelee|voidious.mini.BrokenSword 1.04]], and it would formulate the link, fetch the rating/ranking info, and insert a line like I have on Diamond's version history. (I'm thinking it would run once when you make the edit, not every time the page is viewed.) Or I could do it via a form like the UseMod table converter, if the MediaWiki extension doesn't work out. --Voidious 17:44, 9 August 2009 (UTC)
- That would be pretty neat. Will be working on the server today, might take a stab at the interface... --Darkcanuck 19:50, 9 August 2009 (UTC)
Feedback
Ok, the first two query types are live. Future ones to come are for pairings (full set for one bot, plus single pairing result) and battles (full set for one pairing). Rednaxela's request for recently updated pairs will be evaluated after that. Request your keys, play with the interface (gently) and post your feedback here. The API is not yet set in stone, I'm willing to make changes to make it useable. --Darkcanuck 05:34, 16 August 2009 (UTC)
Note that the MINE type of JSON is application/json. It may produce 406 Not Acceptable with JSON client request with "Accept: application/json". I did get that sometimes when I try it (via Telnet/NetCat) » Nat | Talk » 10:06, 16 August 2009 (UTC)
- Interesting, I'll look into it. The API right now sends nothing special as headers, just the data. --Darkcanuck 17:44, 16 August 2009 (UTC)
This is great, Darkcanuck, thanks. I'm just tinkering with it now. I have a question about Glicko-2. Right now, DrussGT in the rumble shows a Glicko-2 rating of 2187.7 (3, 0.003). This is the info I get for DrussGT when I query the "roborumble" rankings:
battles: 2147 bot_id: 1222 count_wins: 736 created: 2009-08-10 13:45:44 name: jk.mega.DrussGT 1.3.10 pairings: 738 rating_classic: 1854.737 rating_glicko: 1879.878 rating_glicko2: 1878.26 rd_glicko: 0.895 rd_glicko2: 3.372 score_dmg: 79.848 score_pct: 86.827 score_survival: 93.978 state: 1 timestamp: 2009-08-16 15:51:12 vol_glicko2: 3.371
How is the Glicko-2 displayed via the web related to the data shown above? (1878.26 is very close to 300 below what the web server gives, but not exactly.) I actually don't need Glicko-2, anyway, but the discrepancy caught my eye when testing this stuff. --Voidious 17:10, 16 August 2009 (UTC)
- Good catch! A long time ago Skilgannon suggested scaling Glicko2 to match the ELO scale (of the day). I calculated a scaling factor using the numbers from that time period but didn't want to mess around in the database, so the scaling is applied when the table is displayed. Looks like the query interface is missing that -- easy to fix. I also have a few new queries ready to go on my test server, plus some formatting changes -- will upload in the next hour or so. --Darkcanuck 17:42, 16 August 2009 (UTC)
I've got my rumble Twitter script updated to use the query API. One comment and one question about error messages:
- I almost feel silly even mentioning this one, but:
message: ERROR: Could not find bot "voidious.Dookious 3.0 in database!
Just missing a closing quote. - What does the error message look like for rate limit exceeded?
As of now, if it encounters any error besides "Invalid robot name" or "Could not find bot", it immediately exits (and, after a few changes, won't get confused next iteration... I think). I doubt it would hit the rate limits very often, but if enough bots were entered/updated at once, it would. Once I learned how to work with JSON, it was all very easy to work with. Very cool. --Voidious 20:02, 16 August 2009 (UTC)
I may have broken your script with the latest update -- after adding the new queries I went back and re-arranged the JSON data for all so that they're consistent. Basically the data field will always contain an array of recordsets, regardless whether the query returns multiple (eg. rankings, details) or a single (eg. participant, pairing) record. And the ranking field has been moved to become part of each record. Fixed the missing quote - thanks! The rate error will set "error=1" and the message will be "Too many requests per hour/minute (<number>) - try again later." Try hitting it 10 times in a browser to see it in action. Nat, I've added the content type which means that I had to switch to using the JSONView FF extension to test the results. =) --Darkcanuck 20:25, 16 August 2009 (UTC)