Talk:Darkcanuck/RRServer
Fire away...
Just a suggestion for an additional check. I have never seen score a bot more than 8000 points, so this could be checked too. When examining the results that messed up the original roborumble rating beyond repair, I saw results of 20000 against 16000 (Thats what you get when running OneOnOne with MELEE=YES). For the time being I let my client running (unattended) for ABC's server, as I don't really have the time for bughunting. Your effort however seems promising. Good luck. -- GrubbmGait
- Thanks! That's a good check, will be combining that with the survival >=35 (also your suggestion I think) once I rearrange the error handling and failure output to the client. Then I'll look into ELO... --Darkcanuck
- Your checks have both been implemented. -- Darkcanuck
Looking very nice! I have a couple questions and thoughts I thought I'll mention. So what is this "Ideal" column in the results mean? One thought I had about ratings, is perhaps it would be best to make the APS fill missing pairings with Glicko-based estimates? I'm thinking that would give the best long term stability/accuracy once pairings are complete while having something a more meaningful before pairings are complete before the pairings are complete. --Rednaxela 01:18, 26 September 2008 (UTC)
Thanks! I've just posted a bit more about ratings here. The "Ideal" column is my attempt to reverse calculate a rating based on a bot's APS. I just inverted the Glicko formula for "E" (expected probability) to yield a rating given if given E (i.e. APS) and a competitor's rating and RD. For the latter two I used the defaults (1500 and 350) so theoretically if the APS represents the score vs an average bot (and there's a uniform distribution?) then the rating might converge to the "ideal" value. But I have no idea if it works, just wanted to see how close it might be. I'm not sure you could fill in the pairings using Glicko + APS -- the reason systems like Glicko exist is to get around the problem of incomplete pairings, so the Glicko rating should be enough in itself. If it's accurate, that is -- we'll see once the ratings catch up to the pairings already submitted... -- Darkcanuck 03:39, 26 September 2008 (UTC)
Ahh, I see. Thanks for the explanation. If the Glicko rating doesn't converge very very close to the "Ideal" then I'd say it alone might not be the best fit alone for Robocode due to how complete pairings are not hard to get. The reason I suggest using APS and filling missing pairings with Glicko-based percent estimates, is because my proposed method will be guaranteed to always converge to an exact APS ranking order when pairings are complete, and would quite surely be at least slightly better than APS when pairings are not complete. Perhaps I'm more picky than most, but I'd consider a hybrid necessary if "Glicko" doesn't in practice converge to "Ideal" to within an accuracy that preserves exact rankings with APS (which I think is very plain and simple the most fair when there are complete pairings). I suppose we'll see how accurately Glicko converges :) --Rednaxela 04:25, 26 September 2008 (UTC)
- Be careful about the "ideal" convergence concept! Keep in mind that I made this value up and it doesn't really have a statistical basis of any sort. I was just curious what a naive reversal with a single data point might produce, in order to get an idea of what neighbourhood DrussGT's rating might be in, for example. I also wanted to get a sense whether I had programmed the formulas correctly. I wonder though, if we're abusing these rating systems by using %score instead of absolute win/lose values (1/0)? Would the Glicko rating converge more rapidly to match the APS scale if I had chosen win/loss? I'm very curious, but no so much as to interrupt the current rebuild, which may take longer than I thought. -- Darkcanuck 04:54, 26 September 2008 (UTC)
- Well, I'm not talking about the convergence to that "Ideal" column. I'm talking about convergence of the relative rankings as opposed to specific rating numbers. If the rankings, don't converge to exactly the same order as APS, then I think there's issue enough to justify a hybrid that uses APS, with ELO or Glicko to estimate missing pairings. --Rednaxela 05:10, 26 September 2008 (UTC)
- Gotcha. I suppose you could keep track of the rating (Elo or Glicko) and just use it to calculate expected scores for missing pairings. Then generate an estimated APS for full pairings. We'll have to see how well the ratings stabilize. I'm thinking I should have used Glicko-2 instead, since it includes a volatility rating to account for erratic (read problem bot) performance. -- Darkcanuck 06:22, 26 September 2008 (UTC)
Started sending the results to your server, as long as you relay them to ABC's server. What is the delay btw? --GrubbmGait 10:08, 26 September 2008 (UTC)
- Thanks for joining in! I have no plans to stop relaying results and have been doing so for almost a week now. If by "delay" you mean occasional slow connections, it's due to the scoring update and I've posted it on the known issues page. I have this process cranked up at the moment while I try to get the ratings to catch up, but it will get faster soon. :) -- Darkcanuck 15:25, 26 September 2008 (UTC)
Great job with his server, you can always get the ranking/battles_* files from my server and sumbit them all into yours. I'm also experimenting with mySql atm. My SQL skills are a little rusty but it's all coming back pretty fast :).
I also have a few doubts about the new ranting method. The first one is: why? From what I understand Glicko is an ELO extension for rankings where the match frequency is not uniform between participants, which is not the rumble's case? As an experiment it's very cool, but for me the "old" ELO method is time tested and proven to work great, and should be the default sorting method for the ranking table. --ABC 11:23, 26 September 2008 (UTC)
- I also have some doubts about if Glicko will actually give better or much different results than ELO, however I'm not sure ELO is really the best default ranking system when full pairings are easiest to get. I suppose we'll see once your server gets to full pairings, but I'm strongly suspecting there will be some ranking deviations from the APS ranking, which I think is hard to argue is in any way biased. --Rednaxela 13:26, 26 September 2008 (UTC)
- I have doubts as well, but I wouldn't have known until I tried it. My major objection against Elo is the lack of a clear, published implementation. It was easier to implement Glicko than to sort through the RR server code. If someone can clarify this for me, sure I'll try it out. Why not? -- Darkcanuck 15:25, 26 September 2008 (UTC)
Contents
Bravo
I just want to leave a note saying you're awesome. :) It's really nice having someone put effort into improving the rumble itself. Good work! --Simonton 03:27, 11 October 2008 (UTC)
Oh, and FNL, if you're reading this, that goes double for you :). --Simonton 03:30, 11 October 2008 (UTC)
Style
Do you think you or I could restyle the page, some basic css could go a long way to making the page look more modern and less of an eyesore. An example of my work is here, though it wouldn't look like my page there but it will be clean (and it will validate). Currently its not even setup as a webpage. Which means it will render in quarks mode by all browsers, which is a very slow and cpu intensive rendering mode.
In fact there is alot you can do to both reduce html elements and increase rendering speed. Such as changing the <td><b> combo into just <th> tags, because thats what they are for <td> = table data, <th> = table header, with some css you can justify thier alignment, it wouldn't require much css, as such alot of css is actually undesirable in a simple page such as this, but css is perferred over tags because it is actually faster in most cases (very old or poorly designed browsers being the exception).
Chase-san 08:02, 14 October 2008 (UTC)
- I very strongly agree! It's on the roadmap, but I've focused on the data side first. The current "pages" were based on a view-source from the old server. A little css and valid xhtml would go a long way. I also want to switch to a template system (maybe Smarty or Zend?) for easier reading and better reuse -- having html mixed in with php makes for some very ugly code. If you want to style some static content and send it to me, that would be great! --Darkcanuck 15:04, 14 October 2008 (UTC)
- xhtml is very nice, while most browsers support true xhtml (except IE and Konqueror), the ones that do not, control a large enough majority where it would have to be described as text/html anyway. This mitigates the real purpose of an xhtml page but is nice to have the framework in place for when they catch up (all the work that would have to be done is switching the content type). I think using Smarty or Zend is overkill unless you plan on extending the system further and I only suggest them only if you plan on doing something like roborumble.org. They are template engines, meaning you would have to make templates for them, which jsut adds alot of extra overhead on something simple like this. Remember KIS, keep it simple. --Chase-san 21:36, 14 October 2008 (UTC)
- If you really want a really nice super-quick super-simple "template engine" I suggest you consider this. Instead of bothering with special "template" languages, you write your templates just in PHP, and all the "template engine" does is set up variables making really clean shorthand like
<?=$title;?>
all that's needed to put some variable in the template. I once tried it when hacking around and found it to be a really nice KIS approach to "template engines". Also the author put the code there in public domain, so there are no issues using it in here as we see fit. --Rednaxela 03:47, 15 October 2008 (UTC)
- If you really want a really nice super-quick super-simple "template engine" I suggest you consider this. Instead of bothering with special "template" languages, you write your templates just in PHP, and all the "template engine" does is set up variables making really clean shorthand like
- I at one point designed my own KIS template system, it was simular to others except that the content to replace was in {}, for example {title} and then for other parts I did things like <table>{row_start}<tr><td>{row_num}</td><td>{row_data}</td></tr>{row_end}</table>. All this was kept in a seperate file and required parsing, but otherwise was fairly simple that it was a template engine but also it only used half a dozen commands and you used the functions to fill in the data. I will see if I can locate it or remake it if you like the sound of using a template but still want to keep it very simple. --Chase-san 22:41, 15 October 2008 (UTC)
- Thanks for the suggestions guys, but I'm sticking to my original plan (Smarty). If the template engine ever becomes the bottleneck, then I'll look into something custom. --Darkcanuck 02:13, 16 October 2008 (UTC)
- Okay, cool. I would like to work on a template for the actual score page then, I am great at css and making it cross-compatible with other browsers (namely IE, Firefox, and Safari (I use Opera, so obviously it will work for that too)). Unlike making robots, web pages are not very time consuming. Do you have any kind of messenger we could talk (I have or can get any of them) --Chase-san 04:08, 16 October 2008 (UTC)
- Excellent! Don't use messenging much, and I'm currently travelling at the moment so email is better: jerome-at-darkcanuck-net --Darkcanuck 23:43, 16 October 2008 (UTC)
- Thanks for the suggestions guys, but I'm sticking to my original plan (Smarty). If the template engine ever becomes the bottleneck, then I'll look into something custom. --Darkcanuck 02:13, 16 October 2008 (UTC)
Team Rankings
Is it an idea to get the battles for teams from Pulsars server? I think they have no weird results, and your ranking will at least have a teamranking then. --GrubbmGait 17:57, 24 October 2008 (UTC)
- Good idea! I'll grab the battle file, but need to figure out how to exclude older team versions to keep the server load down. --Darkcanuck 01:58, 25 October 2008 (UTC)
Table Sorting
Very nice things things lately! I do have a couple little gripes though. One thing, is I think it would be more natural if first click does 'highest-first' unlike how TableSorter seems to operate by default. Secondly... ugh... it's so damn slow to sort. Even on my fairly modern system there's a very ugly delay when sorting the table (a 20 year old machine could sort the data faster with static code probably... not everyone uses Google Chrome or an experimental FF build) and I imagine this would become a very annoying delay on anything older. Not only is the JS sorting slower than server-side but there's no indication of it processing/loading which irks me a little. Perhaps if the JS sorting is stuck there should be a little line or two of code to makebo a 'loading...' indicator of some sort? In any case, great work lately! --Rednaxela 21:39, 26 October 2008 (UTC)
The problem with javascript when sorting big tables is not the sorting in itself but the big number of DOM document chand hges when you generate the resulting table HTML. I'm currently developing a small javascript application at work that sorts a table of around 500 entries pretty much instantaneously. It only shows the top 5 entries as a table (similar to DC targeting, curiously :)), if I generate the 500 rows it becomes very slow. --ABC 23:14, 26 October 2008 (UTC)
After some reading, I found that apparently TableSorter's slowest part, is how it READS the data from the DOM every time you sort. Perhaps a more efficent method would be to send the data in both HTML form and JSON form, and let the script change the order of the rows in the DOM based on the data efficently parsed from the JSON and stored in the JS memory. I think that model would have the fewest DOM operations and thus be the most efficent way to do client-side sorting. On a related but diverging note... once at that point, it might not be that much more work to do 'live' score updates... (which would also reduce bandwidth use in the face of mad-refereshers). I may be tempted to try and code such a fancy efficent-sorting live-updating score view some time... --Rednaxela 00:19, 27 October 2008 (UTC)
Well, it's faster than re-requesting the page, which the old sort did. :) But if you find a way to speed it up, I'm all ears -- javascript is pretty new to me. I don't like the default sort order either, but there didn't seem to be an option to start with a descending sort. The Glicko columns are also a little weird due to the RD value in brackets. I'm not sure I follow the bit about "live updates" though, the current pages are as live as you can get. Scores are updated every time a new result is uploaded. --Darkcanuck 05:47, 27 October 2008 (UTC)
Actually, I'm finding it very distinctly slower than re-requesting the whole page (of course my campus internet here is pretty damn fast). Well, I think it certainly be sped up by methods like I said above with sending the data in JSON form and keeping in JS memory, though it would likely involve using our own code instead of TableSorter (or mangling TableSorter considerably beyond recognition). And what I mean by "live updates", would be using "AJAX" stuff to every minute or so ask the server if there have been any more recent updates, the server sends any in JSON form and the results page gets updated without refreshing. --Rednaxela 12:35, 27 October 2008 (UTC)
Contributors
Just another small idea, can you distinguish the contributors per month? Long time ago, late 2004 I think, we had a sortof ranking of contributors when rebuilding the rankins after a servercrash. (He, sounds familiar . . ) This way 'new' contributors see their names without the need to scroll way down. Also: every ranking is a competition ;) --GrubbmGait 19:00, 28 October 2008 (UTC)
- Are you saying you don't like my score of 410,000+? ;) (melee is the key to high numbers, btw) Good idea to split the numbers out in more detail. I guess I could add some more columns to the users table to make some rolling counts. What interval would be best: once per day to keep a 30-day window, or start fresh every month to make a new competition? --Darkcanuck 05:40, 29 October 2008 (UTC)
- Well I personally think "once per day to keep a 30-day window" would be best for being a more meaningful and current reflection of things, but starting fresh each month would be best if we want to have something like a 'monthly rumble contributor award'. Of the tradeoff, I'm leaning to the former myself. Of course, if we really wanted we could just track both :) --Rednaxela 14:53, 29 October 2008 (UTC)
Ok, we now have current month and last-30-days upload rankings, split by game type. I've tried to scale the melee numbers to match the actual number of battles run (45 pairings uploaded per battle?). Hopefully someone will start to submit team battles (can't get my client to work). --Darkcanuck 23:51, 30 November 2008 (UTC)
Partecipant list
Can you please create a mirror of the official participant's list on your server (updated automatically)? That's good if the official page is off-line, like now ^-^ --Lestofante 22:05, 1 Dic 2008 (UTC)
- Try this: http://darkcanuck.net/rumble/particip1v1.txt . I just uploaded my copy to the server and added the 'pre' tags the rumble client is looking for. Once the old wiki comes back I'll try mirroring all of the participant lists -- shouldn't be difficult, just a daily 'wget'... --Darkcanuck 03:35, 2 December 2008 (UTC)
- Thank, now my client work. Here the modify: PARTICIPANTSURL=http://darkcanuck.net/rumble/particip1v1.txt. For the mirroring system just don't use only a wget but use a little script that control the integrity of the list.--lestofante 09:37, 2 December 2008 (UTC)
Survival
One thought is, now that removing the Glicko-1 column has cleared up a little space... maybe those survival percents that are in the details pages could be included? I think it would be nice to be able to easily see what bots are strong survivalists ;-) --Rednaxela 23:42, 2 December 2008 (UTC)
- Too easy! :) --Darkcanuck 04:53, 3 December 2008 (UTC)
- Nice. Now just to wait for all the bots to return so I can see how good 'RougeDC survival' really ranks in that... :) --Rednaxela 05:05, 3 December 2008 (UTC)
- It could be a long wait -- reactivation is just as slow as removal. But at least clients won't be fighting over the two. --Darkcanuck 05:13, 3 December 2008 (UTC)
- Aye, but at least based on the rate at which my client is currently uploading bots of which some need to be reactivated, I think there's a good chance it may be back to normal in less than 12 hours from now. --Rednaxela 05:29, 3 December 2008 (UTC)
"Suspicious Battle List"
One thought I had is that bad 0 scores could be filtered by taking a look at the expected score, and discarding 0 results where it they seem unreasonable. Of course, an alternative to automatic rejection, would be making a "suspicious battle list" page that could be watched for manually initialting removals. I would imagine it would take no more than a single SQL statement of moderate complexity to list suspicious uploads. --Rednaxela 06:25, 29 December 2008 (UTC)
Neat idea. Although bots which throw the occasional exception may get a lower than expected score once in a while. Rather than run a query against the battles table, the server could flag battles as they're submitted if the score deviates too far from the expected value. What do you think a good range would be, considering some bots have very high PBI's? --Darkcanuck 06:48, 29 December 2008 (UTC)
Well I think running a query against the battles table is necessary due to the number of bad results that are already in the server, which I'd consider quite important to fix and manually searching for all of them would be time intensive. As far as what kind of deviation? Well because of such high PBI cases I'd say something roughly like the following would be good. Flag them if: 1) The battle deviates from the Glicko-2 expected result by more than 50, or 2) The battle deviates from any results submitted by *other* clients by more than 30, or 3) The score is exactly 0 when the expected score is anything greater than 20
Of course, I strongly believe we can't get a really strong idea of exactly what thresholds are good until we to do some queries on the battles database to determine what level of sensetivity is most correct. --Rednaxela 07:07, 29 December 2008 (UTC)
Source code
Can I've your server source, please? I've write PHP and MySQL for over 3 years now and I've palnned to create new Thai RoboRumble for my country! Hope you'll give it to me. You can email me at email found on my user page. » Nat | Talk » 09:20, 10 February 2009 (UTC)
Rumble ideas
Hi! I'm very thankful to you for doing the new engine. I was thinking about brand new femto and haiku rumbles. What do you think about it? In my opinion it'd be fantastic if there were these kind of categories. They seem to be really cool and exiting, but unfortunately, there isn't any ranking or challenge for them. Femto can't be hard to implement, maybe haiku is a harder task. I imagined new categories for them with new participants list for them, but I can imagine that the actual bots can have this kind of rank, but then it would lose its importance. --HUNRobar 17:55, 14 February 2009 (UTC)
In femto battle, we really need to modify RR@H client. For Haiku, I think not. It require human to detect how many lines are there. But, wait a minute, I'm now creating my new Rumble Server which support many old rumble ideas and these rumble! That why I want his source code above. If you want to test your haiku bot or femto bot, you can see old ranking and bot in robocode little league by Kawigi. » Nat | Talk » 01:48, 15 February 2009 (UTC)
Valid versions
By the way Darkcanuck, just to let you know:
- I'm quite sure 1.6.1.4 (NOT plain 1.6.1) is at least as rumble-stable as 1.6.0 is, and is better because it fixed how ITERATE was broken.
- Also, I'm pretty sure EVERY single version from 1.6.2 to 1.7.1 Beta 2 have been bad for rumble.
- 1.7.1 Final look like they're probably good for rumble except for:
--Rednaxela 07:34, 3 April 2009 (UTC)
- Agree. I really like 1.7.1, even in alpha version, compare to 1.7.0.2, which have a ton of bugs =D I'm figuring out what behind SandboxDT and sure that Fnl is fixing another bug so expected 1.7.1.1 with better rumble :-) » Nat | Talk » 16:07, 3 April 2009 (UTC)
Ok, I can add 1.6.1.4 to the list -- but it won't matter much since that client won't report it's version number either. Nice summary though. (and you know I filed that 1.7.1 melee bug right?) Anyone interested in patching the 1.5.4/1.6.0/1.6.1.4 rumble jar(s) with the version check from 1.6.2? --Darkcanuck 15:40, 3 April 2009 (UTC)
- Either user won't download patched version. I'll try to do. Just getting head spinning around checking out from robocode svn :-) » Nat | Talk » 16:07, 3 April 2009 (UTC)
I see you patched your client already. Just few suggestion, you can (mostly) detect from user suffix. Most user suffix their name with version (except deewiant) so you can check with that. » Nat | Talk » 19:11, 5 April 2009 (UTC)
- Yes, but I like guarantees that the right version is being used. :) I've patched both 1.5.4 and 1.6.1.4 to report the client version and I'll post the new jars later today. Once rumble users have switched, then I can turn off the workaround for older clients. --Darkcanuck 22:08, 5 April 2009 (UTC)
I've been using 1.6.0 for the rumble, for what I understand I should install 1.6.1.4 (actually the version I develop with) and replace with the patched jar? Has it been tested, even a little bit, to ensure no side effects, or should I set upload to not for a while? --zyx 03:52, 6 April 2009 (UTC)
- I've tested both jars on my system and they seem to be fine. If you want to stick with 1.6.0 I can patch it tomorrow -- I got lazy and only did "ol' reliable" (1.5.4) and the latest stable version. Right now I'm using 1.6.1.4 myself, although I can't get that one to work on my Mac. --Darkcanuck 04:20, 6 April 2009 (UTC)
I tested 1.6.1.4 a fair bit and for a period of time it was what I was using for rumble. And also, like I note above, that version fixes the ITERATE option which has been broken for a long time (it still ran with ITERATE=YES in older versions, but it didn't choose the best bots properly after the first iteration). --Rednaxela 04:17, 6 April 2009 (UTC)
Rating
If in melee, both Bot A and Bot B had "0 survival", Bot A AND Bot B get "0% survival" against each other. is it 0 because 0 survival = 0% survival against the other bot, regardless of what the other bot got? or is it because of a 0 / 0 thing? --Starrynte 00:40, 4 April 2009 (UTC)
In a melee battle, if two bots have 0 survival then when the server tries to calculate the survival % for that pairing it becomes 0 / (0+0) for bot A, same for B. The divide by zero protection simply assigns 0 scores to both, although I suppose technically they should each get 50%. --Darkcanuck 05:08, 4 April 2009 (UTC)
Team Rumble
What the heck going in team rumble? What does melee bot from? » Nat | Talk » 20:26, 5 April 2009 (UTC)
Ugh, thanks for pointing this out! That's exactly the sort of thing the patched clients will prevent: they also report the MELEE and TEAMS settings from the properties file. Now if only we can get everyone to adopt them (only 6 uploaders active this month, shouldn't be too hard?) ... --Darkcanuck 03:16, 6 April 2009 (UTC)
- [View source↑]
- [History↑]
You cannot post new threads to this discussion page because it has been protected from new threads, or you do not currently have permission to edit.
Contents
Thread title | Replies | Last modified |
---|---|---|
retiring ELO column | 6 | 15:51, 17 February 2012 |
FatalFlaw's uploads have suspicious APS for Tomcat | 0 | 04:58, 16 February 2012 |
kidmumu uploads | 3 | 17:16, 1 February 2012 |
Feature Request: average APS diff in bots compare | 6 | 15:55, 17 November 2011 |
Performance | 1 | 22:48, 13 November 2011 |
Now that everyone's ELO rating is subzero in General 1v1 =), is it maybe time to retire it altogether?
I'm all for it =) Although, doesn't the LRP depend on ELO data? Maybe shift that over to Glicko data instead? And if there was some way to make the LRP show the 'expected' option by default... that would make my day =)
I'd also support removal of ELO from the rumble, and replacing it with Glicko or Glicko2 in the places that use it (LRP).
Elo is working fine, even with negative scores, but keeping both Elo and Glicko-2 is redundant. So, removing one of them is fine by me.
FatalFlaw's uploads have suspicious APS for Tomcat:
- lxx.Tomcat 3.55 VS voidious.mini.Komarious 1.88
- lxx.Tomcat 3.55 VS "baal.nano.N 1.42
- lxx.Tomcat 3.55 VS gf.Centaur.Centaur 0.6.7
Darkcanuck, can you rollback all his uploads?
I haven't had a chance to check if this could affect mn.Combat, but my #1 guess would be that perhaps it's a java version issue (i.e. kidmumu is using Java 5 and Combat requires Java 6?).
Failing that, I'd have to think that kidmumu's client may be skipping turns.
Probably a Java version issue. I´ll downgrade to 1.5 in future versions. But didn´t check other bots scores.
I'm sure there are lots of bots that require Java 6, right? We might want to have Darkcanuck rollback all his uploads until we can get kidmumu onto Java 6.
I find that until all pairings is done it's very useful to know current avarage difference in APS between two versions - after about 100 random battles this number says fairly exactly is newer version better, than older.
Darkcanuck, can you schedule to add row for columns "% Score", "% Survival" in section "+/- Difference" in bots compare page with avarage value of corresponds columns? I think, there're work for 1-2 hours maximum
I think this is already covered by the 'Common % Score (APS)' and 'Common % Survival', the lowest two lines in the top-table. At least I use it to check if my changes have a positive (or negative) result when the pairings are not complete yet.
No. May be i wrote not clear.
I mean, that i want to know average difference in pairings between 2 versions. According to my tests, this number stabilizes mach faster, than APS. And more, Common % Score does not make sense, because while there only 1 battle in every pairing it's exactly equals to APS and in another case, there may be 10 battles against Walls and 1 battle against Druss.
As far as I know, when your new version has for example 100 pairings, you will see the average APS for that 100 pairings. AND for your older version you will also see the APS for that 100 pairings. And you are right, this indicates much more reliable what your final score will be (relative to your older version) than plain APS. The one who can really answer this question is Darkcanuck.
The common %score is calculated just like APS, but only for pairings that the old and new versions have in common. That makes it easier to compare two versions when the new one is still missing many pairings, or in the case where the old bot may have pairings against a lot of retired bots (and may be missing scores vs newer bots). I think that's what you're looking for...
Yes, thank you:)
Can you turn on .htaccess browser caching for results.
#Caching ExpiresActive on ExpiresByType image/gif "access plus 1 year"
Other performance enhancing things you can do are: Set specific size for the images, inline or via CSS. CSS would be easier. This would speed up the page loading and be also less annoying while all the requests are going through (having default sized images deforming the table before they load). Minify the HTML/CSS/JS (less to send).
Not doing/doable for known reasons: Serve identical files from the same url. (flag images)