Difference between revisions of "Talk:King maker"

From Robowiki

Revision as of 16:31, 15 August 2011

Ah, I see what you mean now. The "king-making" references I found were to non-winners intentionally manipulating results to dictate the winner, which is obviously not the case in the RoboRumble. I'm fairly confident DrussGT has the strongest APS in every demographic of RoboRumble participants (low, mid, and high-end bots, surfers, pattern matchers, etc.), so simply altering the composition of the rumble would not knock him off his throne. His strength is quite clear and not all that subjective, if you ask me. Only submitting bots with hard-coded behaviors against DrussGT could have an impact, and such a move would probably not go unnoticed; the community as a whole would intervene.

But it is true that with a drastically different RoboRumble population, say only DrussGT's worst 5 matchups =), another bot like Shadow could conceivably be called #1. And it's also reasonable if you want to view results as "a win is a win" - I personally quite like that view, and agree that the APS RoboRumble is more of a shared "challenge" than a direct competition. Though I do consider it a fair challenge, and one in which I still aspire to be #1 again some day. ;)

An important point about any scoring system that applies a winner-take-all view of each matchup is that we'd have to significantly alter priority battles to get accurate rankings. For close matchups, you may need 100 or more battles to determine a winner. There really is quite a lot of variance. Given that, we'd probably want to just start a separate participants list with only current and/or strong bots. Or even run a weekly tournament where each match is like best-of-99 or something.

--Voidious 03:48, 15 August 2011 (UTC)

Oh no! That "we need 9999 battles per pairing" accuracy talk again. That's why I went all the way with that as-accurate-as-possible batch algorithm. And the priority battles algorithm was already improved. And I prefer improving the rating system instead of blaming weaker competitors and kicking them out. Leave the sample bots alone! :P --MN 04:51, 15 August 2011 (UTC)
I'm not saying we need that for every pairing. But if the difference between #1 and #2 in the RoboRumble comes down to the winner of the DrussGT vs Shadow matchup, we have a problem if there were only 2-5 battles run. The ranking system you propose puts about a million times as much weight on who wins that matchup, so it better be accurate, and you need at least 100 battles in a close matchup to be reasonably sure you get the right winner. A cool new ranking system is going to be ignored if it's so unstable. --Voidious 12:48, 15 August 2011 (UTC)
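To put a rough number on the "2-5 battles versus 100" point, here is a minimal sketch. It assumes each battle is an independent coin flip that the stronger bot wins with a fixed probability, which is a simplification of real Robocode battles, and the 55% figure is a made-up stand-in for a close matchup. The class and method names are hypothetical.

```java
final class CloseMatchupEstimate {

    /**
     * Probability that the stronger bot wins a strict majority of n battles,
     * assuming independent battles that it wins with probability p each.
     */
    static double majorityWinProbability(double p, int n) {
        // dist[k] = probability of exactly k wins after the battles processed so far
        double[] dist = new double[n + 1];
        dist[0] = 1.0;
        for (int battle = 1; battle <= n; battle++) {
            // Update from high to low so dist[wins - 1] is still the old value.
            for (int wins = battle; wins >= 1; wins--) {
                dist[wins] = dist[wins] * (1 - p) + dist[wins - 1] * p;
            }
            dist[0] *= (1 - p);
        }
        double total = 0.0;
        for (int wins = n / 2 + 1; wins <= n; wins++) {
            total += dist[wins];
        }
        return total;
    }

    public static void main(String[] args) {
        double p = 0.55; // assumed per-battle win chance in a close matchup
        for (int n : new int[] {3, 5, 21, 101}) {
            System.out.printf("n=%d -> %.3f%n", n, majorityWinProbability(p, n));
        }
    }
}
```

Under these assumptions, five battles pick the stronger bot only about 59% of the time, so a ranking that hinges on the sign of one close matchup would need many more battles in that pairing.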

First, thank you for clarifying what you were referring to with that page. Honestly though, I don't believe this is a significant problem in the rumble as it stands. Let me explain why.

Looking over things on the Wikipedia link, as Voidious also notes, it appears that problems of "king-making" usually arise when weaker opponents have an agenda to selectively hurt or help the score of certain other competitors. In the rumble, however, I believe there are no known cases of robots with such biases or agendas; it is certainly not widespread in any case.

Now, presuming no such dirty play is happening, where could the harm be? Say your high-ranking bot performs worse against low-ranking bots than another high-ranking bot does. Is that a case of "outcomes are not dictated by a competitor's own performance"? I may be misunderstanding, but I don't believe it is, because so long as no selective biases are present, it is always possible to work to gain the same performance edge that the other high-ranking bot has.

Also, as far as competitive innovations go, say you have a situation where one high-ranking bot has an innovation that allows it to score 80-90% against rambots, where most other high-ranking bots reliably score in the 60-70% range. Is that not a competitive innovation in Robocode? Means of "king-maker" prevention that round things to win/tie/loss don't value innovation of that sort, which I believe is a big shame. Is it silly that I consider such matters to be notable/interesting innovations?
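A small sketch of that difference, with invented numbers: bot A scores 85% and 90% against two rambots while bot B scores 65% and 70%, and both score 55% against a third opponent. The class name, method names, and the simple win-rate formula (win = 1, tie = 0.5, loss = 0) are assumptions for illustration, not the rumble's actual scoring code.

```java
final class ApsVersusWins {

    /** Average percentage score: the plain mean of the per-pairing percent scores. */
    static double averagePercentScore(double[] percentScores) {
        double sum = 0.0;
        for (double s : percentScores) {
            sum += s;
        }
        return sum / percentScores.length;
    }

    /** Win/tie/loss rounding: each pairing counts 1 for a win, 0.5 for a tie, 0 for a loss. */
    static double winRate(double[] percentScores) {
        double points = 0.0;
        for (double s : percentScores) {
            if (s > 50.0) {
                points += 1.0;
            } else if (s == 50.0) {
                points += 0.5;
            }
        }
        return points / percentScores.length;
    }

    public static void main(String[] args) {
        double[] botA = {85.0, 90.0, 55.0}; // hypothetical scores, strong against rambots
        double[] botB = {65.0, 70.0, 55.0}; // hypothetical scores, ordinary against rambots
        System.out.println("APS A: " + averagePercentScore(botA) + ", B: " + averagePercentScore(botB));
        System.out.println("Win rate A: " + winRate(botA) + ", B: " + winRate(botB));
    }
}
```

With these made-up scores the two bots are separated by more than 13 APS points, yet both win every pairing, so a win/tie/loss ranking cannot tell them apart.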

Ranking methods that are more immune to low-ranking bots are certainly interesting, indeed valuable, and I believe are quite worth having in the rumble, but I don't see them as objectively better or worse. Both seem like equally valid challenges to me, neither with acute problems.

--Rednaxela 03:53, 15 August 2011 (UTC)

King-maker scenarios can also happen involuntarily. Simply make a specialist bot and it's done: you will hurt everyone it's specialized against. Or miss a bug in the implementation.
And there are also the bots with pre-calculated data:
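// "bot" stands for whatever object exposes the opponent's name
if ("MyFavoriteRobot".equals(bot.getName())) {
    loadPreCalculatedData();  // data prepared offline for this one opponent
} else {
    System.out.println("Oh no!");
}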
This was discussed a long time ago and allowed in the rumble. (can't find the link now)
But yes, it will punish cool algorithms which aim at increasing score far above 50%, because at the other end there is a king-maker allowing itself to be pushed far below 50%. But I don't know any other way to stop king-maker scenarios from happening. I didn't even try figuring one out, I just copied what is being done in other places. --MN 04:51, 15 August 2011 (UTC)
Well, yes, such scenarios can be involuntary, but consider the magnitude of the effect. There are over 800 robots in the rumble. If one or two are specialized they couldn't affect the rankings substantially. If a large number are "specialized" in the same way, then I'd view it taking advantage of a generic enough weakness that it's just as worthy as rambots hurting those who are not protected against rambots. If a large number are "specialized" in diverse ways, it should tend to average out overall. It seems to me that the sheer size of the rumble provides some amount of protection. --Rednaxela 06:09, 15 August 2011 (UTC)
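As back-of-the-envelope arithmetic for the averaging argument above: if APS is a plain mean over pairings, a swing in a few pairings is diluted by the total number of pairings. The sketch below assumes exactly 800 pairings and a 20-point swing per specialist; the swing is an invented number and the class and method names are hypothetical.

```java
final class SpecialistImpact {

    /**
     * Change in a bot's APS when some of its pairings each shift by a given
     * number of percentage points, assuming APS is a plain mean over all pairings.
     */
    static double apsShift(int totalPairings, int specialists, double swingPerSpecialist) {
        return specialists * swingPerSpecialist / totalPairings;
    }

    public static void main(String[] args) {
        // One or two specialists, each costing the target 20 points in its own pairing.
        System.out.println(apsShift(800, 1, 20.0));
        System.out.println(apsShift(800, 2, 20.0));
    }
}
```

Under these assumptions a single specialist moves the target's APS by 0.025 points. This says nothing about rankings that round each pairing to win/tie/loss, where one pairing can carry far more weight.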
The huge differences between the main APS ranking and the Premier League, or the one I offered for download, tell otherwise. --MN 14:19, 15 August 2011 (UTC)
About precalculated data: this is an issue, and yes, they are allowed in the rumble. They are, however, uncommon except as temporary novelty tests, and I also doubt they impact the score much. As a brief aside, if the community were to decide to get rid of the chance of pre-calculated data, Robocode does now have the capability to "anonymize" robot names in scan data. Since we as a community have the capability to robustly negate it, I do not feel the scoring algorithm is the proper place to negate pre-calculated data impacts.
Then again, I am mostly thinking in terms of the main rumble. In the nano-codesize rumble, those issues of robots being over-specialized would play a greater role. --Rednaxela 06:09, 15 August 2011 (UTC)
You touched on the "community" aspect, so I'll get very philosophical/political now. There are basically two ways to make things happen. Let people do what they want, guide them indirectly through rewards (rating system) and accept whatever comes out. Or restrict people's choices (robust negation?) so they go the way you want, whether they want it or not. I prefer the first approach. --MN 14:19, 15 August 2011 (UTC)