Varying NUMBATTLES of RoborumbleAtHome?
Recently, I noticed that more than half of the battles are dropped because the queue is full; however, this doesn't happen if I wait a few minutes. It seems that all the rumble clients upload battles periodically, and that the uploads are pretty concentrated: e.g. all four of my clients upload ~200 battles within ~3 minutes, which fills the queue immediately. And if I take a look at literumble/statistics, I can see that there are 5 to 7 clients uploading within 2 minutes.
It generally takes a client about 15 min to finish 50 battles, but if we vary NUMBATTLES between clients, e.g. to different primes, the uploads should get more evenly distributed, reducing the high concurrency that causes a lot of dropped battles.
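To make the idea concrete, here is a toy model in Python. It is only a sketch under loud assumptions: every client starts at the same moment, every battle takes exactly 18 s (15 min / 50 battles), and a client uploads its whole batch the instant it finishes. The function names and the 3-minute window are made up for illustration and are not part of the rumble client.

```python
from collections import Counter

SECONDS_PER_BATTLE = 18  # assumption: 15 min / 50 battles, constant for all clients


def upload_times(num_battles, horizon_s):
    """Seconds at which a client with this NUMBATTLES finishes a batch and uploads."""
    period = num_battles * SECONDS_PER_BATTLE
    return list(range(period, horizon_s + 1, period))


def peak_battles_per_window(batch_sizes, horizon_s, window_s=180):
    """Largest number of battles uploaded inside any single window_s-wide bucket."""
    buckets = Counter()
    for n in batch_sizes:
        for t in upload_times(n, horizon_s):
            buckets[t // window_s] += n
    return max(buckets.values())


four_hours = 4 * 60 * 60
same = peak_battles_per_window([50, 50, 50, 50], four_hours)
varied = peak_battles_per_window([47, 53, 59, 61], four_hours)
print(same, varied)
```

In this model, four clients with NUMBATTLES of 50 always upload in the same window, while the prime-sized batches drift apart and give a lower peak. Some overlaps still occur, so the model suggests this helps but does not remove the bursts entirely.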
Reducing NUMBATTLES would probably help here too. It would also reduce the delay which is the main cause of duplicated pairings for new bots being entered. Maybe a NUMBATTLES of 20 in the main rumble would be good enough to solve the client component of this.
However, I think one of the main causes of the full queue is the batch processing for Vote/NPP/KNNPBI, since the queue needs to be paused while this is running. Because it is paused, the projected processing time goes very high, and it stops accepting new uploads. I have an idea on how to tune this; it should help a bit.
However, even a NUMBATTLES of 3 can't prevent most of the battles from being dropped ;/
Seems that with 8 clients running the rumble at the same time, no attempt will help without stopping some clients.
Worth mentioning that I also notice dropped battles when there are 6 clients, though not frequently. It seems that with 2 more clients, the effectiveness drops considerably?
Btw, one thing that's really interesting is that the duplicates of multiple versions can last for hours. It seems that some clients are not checking the participants list for hours.
Got it. Maybe after the queue is paused for batch tasks and then resumed, it stays nearly full because there are still many pairings being uploaded. Like a DoS, this decreases the ability to handle high concurrency (although the average number of pairings uploaded per minute is not very high, they come in during a short period of time and get dropped).
Then I think we could increase the queue size a little after a batch task, and then decrease it back to the normal size slowly, to make sure new uploads won't wait forever after a flood of uploads.
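A minimal sketch of such a temporarily enlarged queue limit, in Python. The numbers (`normal_size`, `boost`, `decay_s`) and the linear decay are assumptions picked for illustration, not values taken from the server.

```python
def queue_capacity(seconds_since_resume, normal_size=100, boost=50, decay_s=600):
    """Queue limit that starts at normal_size + boost right after the batch task
    ends and falls linearly back to normal_size over decay_s seconds.
    All default values are hypothetical."""
    if seconds_since_resume >= decay_s:
        return normal_size
    remaining = 1 - seconds_since_resume / decay_s
    return normal_size + round(boost * remaining)


for t in (0, 300, 600):
    print(t, queue_capacity(t))
```

The extra room absorbs the burst that arrives right after the pause, and because the limit shrinks again, a later upload never sits behind an oversized backlog for long.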
Or, we can handle uploads during the pause separately: don't let them take up space in the normal queue, but store them in a separate queue (and cap it at normal uploads per minute * pause time).
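The separate-queue idea could look roughly like this in Python. The class and method names are hypothetical, and draining the pause buffer before the normal queue is an arbitrary choice made for this sketch; it is not how the server is known to behave.

```python
from collections import deque


class PausableUploadQueue:
    """Sketch: uploads arriving during a pause go to their own capped buffer."""

    def __init__(self, normal_cap, uploads_per_minute, pause_minutes):
        self.normal = deque()
        self.pause_buffer = deque()
        self.normal_cap = normal_cap
        # Cap from the suggestion above: normal uploads per minute * pause time.
        self.pause_cap = uploads_per_minute * pause_minutes
        self.paused = False

    def pause(self):
        self.paused = True

    def resume(self):
        self.paused = False

    def offer(self, upload):
        """Return True if the upload is accepted, False if it is dropped."""
        if self.paused:
            target, cap = self.pause_buffer, self.pause_cap
        else:
            target, cap = self.normal, self.normal_cap
        if len(target) >= cap:
            return False
        target.append(upload)
        return True

    def take(self):
        """Next upload to process, or None while paused or when empty."""
        if self.paused:
            return None
        if self.pause_buffer:
            return self.pause_buffer.popleft()
        if self.normal:
            return self.normal.popleft()
        return None
```

With this split, uploads that arrive during the batch task never count against the normal queue's limit, so the normal queue is not already full the moment processing resumes.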