Rerun of Pairings
I see quite a few robots that haven't changed are re-running pairings today as if they had new versions. Any idea why that is?
Looking into it myself. From what I can tell, a bunch of battles didn't load into the Batch Rankings, so it assumed that they didn't exist and pulled them from the participants' scores list. They've been slowly added back by clients over the last few hours.
I've removed the section of code that removed the battles from the Batch Rankings, but that is just putting a bandaid over the problem. I'll have to look deeper to see what caused them not to load in the first place.
Looks like half of General 1v1 has incomplete pairings now. Should I put my clients into overdrive to fix it, or am I making it worse by running clients because of some bug?
Running a bunch of clients isn't going to make it any worse. From what I can tell it was a one-off problem to do with the backend instance being unable to load data. I've removed the mechanism it used to remove the bots, but I'm still not sure (and may never know) why it happened.
I've also just identified a bottleneck/threadlock which will severely limit the ability to upload from multiple clients at once without increasing upload latency to where it will spawn new server instances and cause my quota to be hit again, but I have a fix for that which I'll implement and test tomorrow. The load right now seems pretty healthy though, I see in the logs uploads from you, MN and Wompi, thanks guys. I'll let you know when you can unleash the full power of your machine(s) =).
Its data isn't removed, so you can still see it in the BotDetails, but the pairings info in the other competitors which points to it is removed. Otherwise, over many versions, access to other bots will get slower and slower due to increased serialising costs.
Keeping pairing data for a while can help protect the database against faults in clients removing competitors from the rumble, only to be re-added again some time later.
That sounds reasonable, yes. Perhaps adding a 30 day error window, so only if the last battle was more than 30 days ago the pairing data in the 'alive' bot gets purged. Until then it is just marked as 'removed'. I think this purging and checking will have to happen in the backend, because the frontends are fully loaded right now with your and Voidious's uploads.
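A rough sketch of how the 30 day error window could work, purely for illustration. The dict layout and the field names `removed` and `last_battle` are assumptions, not the server's actual data model.

```python
from datetime import datetime, timedelta

# Assumed from the 30 day window suggested above.
ERROR_WINDOW = timedelta(days=30)


def purge_stale_pairings(pairings, now):
    """Return a copy of `pairings` without entries that are marked
    removed AND whose last battle is older than ERROR_WINDOW.

    `pairings` maps an opponent name to a dict with the hypothetical
    keys "removed" (bool) and "last_battle" (datetime).
    """
    kept = {}
    for name, info in pairings.items():
        expired = now - info["last_battle"] > ERROR_WINDOW
        if info["removed"] and expired:
            continue  # safe to purge: removed and outside the window
        kept[name] = info
    return kept
```

A bot that is marked removed but fought within the window keeps its pairing, so a client fault that drops and re-adds it loses nothing.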
The number of bots without full pairings has gone up: we were under 400 yesterday and are back up to 471 now. I noticed an over quota message from last night. Was there another loss of data?
Let me know if I should dial back my clients or if there's anything else I can do.
The source of the problem has to be tracked down or the rumble will never stabilize.
I guessed it was the excludes feature from the clients erasing pairing data in the server. But looks like it is something else.
Sorry guys, I was trying to see if I could use the marshal module to do my serialisation instead of cPickle, because my local testing showed it is about 50% faster, but it corrupted a few bots from each pairings dict so I quickly changed it back. I'm not sure why it had these issues since I tested locally on the dev server and it worked fine, but anyway it is fixed now, and was a completely different issue to what happened before.
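One cheap guard when swapping serialisers is a round-trip check on real data before switching over. The sketch below is only an illustration: it uses Python 3's `pickle` in place of cPickle, and `round_trips` is a made-up helper name. It would not necessarily have caught the corruption described above, whose cause is unknown.

```python
import marshal
import pickle


def round_trips(serialiser, data):
    """Return True if dumps followed by loads gives back an equal object.

    `serialiser` is any module exposing dumps/loads, such as marshal
    or pickle.
    """
    try:
        return serialiser.loads(serialiser.dumps(data)) == data
    except (ValueError, TypeError):
        # marshal raises ValueError for types it cannot serialise.
        return False


# Hypothetical pairings dict made only of plain built-in types.
sample = {"BotA 1.0": {"battles": 12, "aps": 54.3}}
both_ok = round_trips(marshal, sample) and round_trips(pickle, sample)
```

Note that marshal only handles a fixed set of built-in types and its format is not guaranteed to be stable across Python versions, which is one way a dev server and production could behave differently.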
It did hit the quota last night, so perhaps tone down the clients a little. There's a threshold below which it is cheap to run, but as the load increases I start leaving the free quota for the instances as well (not just database writes), which gets expensive much more quickly.
Took my clients from 4 down to 2.
Can you protect against us overloading your server? Both to avoid hitting quota, and to avoid someone DDoSing your bank account :-), it seems like it would be good to have some throttling or something in place.
Refilling the pairings is going to take a while. Is it possible to tune things (for now) to support a higher client load, to prioritize overall throughput over losing pairings here and there, while consuming less quota?
I've been trying to think of a good way to do that, but the 'recommended way' using Task Queues (which I can then limit to 3-4 queries a second instead of the 5-6 I was getting yesterday) will break any reasonable way of having priority battles.
Also, there is no way to programmatically retrieve the current quota usage stats, which means I can't do any auto-throttling.
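Without quota stats, one option is a fixed-rate limiter that does not depend on them. Below is a minimal token bucket sketch, offered as an assumption about what could work rather than a recommendation for this server. The class name and parameters are hypothetical, and it does nothing to solve the priority battles problem mentioned above.

```python
class TokenBucket:
    """Allow at most `rate` requests per second on average, with bursts
    of up to `capacity` requests."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start with a full bucket
        self.last = 0.0         # timestamp of the previous call

    def allow(self, now):
        """Return True if a request arriving at time `now` (in seconds)
        may proceed, consuming one token."""
        elapsed = now - self.last
        self.last = now
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

In a handler this would be called with `time.time()`, and a rejected upload could be answered with a "try again later" response so the client backs off.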
I can tune it not to do database writes unless a bot has x or more pending battles, which is how I did it previously when on free tier, but the majority of the time is actually taken up with (de)serialising the pairings data, which is why I was trying to shoehorn in marshal. I'll add a min pending battles limit, and you can turn those clients back on, we'll see what happens. Of course, it will probably only hit quota tomorrow night if it's an issue, since today has been pretty slow.
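The min pending battles idea could look roughly like the sketch below: battles are buffered per bot, and the expensive write only happens once enough have accumulated. `PendingBattles` and the `write` callback are hypothetical names, not the server's real code.

```python
class PendingBattles:
    """Buffer battles per bot and flush them in one write once
    `min_pending` have accumulated."""

    def __init__(self, min_pending, write):
        self.min_pending = min_pending
        self.write = write   # callable(bot, battles): does the database write
        self.pending = {}    # bot name -> list of buffered battles

    def add(self, bot, battle):
        """Buffer one battle; return True if this call triggered a write."""
        queue = self.pending.setdefault(bot, [])
        queue.append(battle)
        if len(queue) >= self.min_pending:
            self.write(bot, queue)
            del self.pending[bot]
            return True
        return False
```

The trade-off is that buffered battles are lost if the instance dies before a flush, which matches the "losing pairings here and there" compromise raised earlier.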
I'm thinking of building a custom client which groups results from all local clients and uploads them in a single thread, so the server needs only a single instance per user to receive data.
Combined with multi-threading, clients can keep running battles in parallel while a single thread uploads everything, making it faster than the current client, while at the same time consuming less server resources.
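The shape of that design, sketched in Python for illustration (the actual client is a separate codebase, and `run_uploader`, `upload` and `batch_size` are assumed names): battle threads put results on a shared queue, and a single uploader thread drains it in batches.

```python
import queue
import threading


def run_uploader(results, upload, batch_size):
    """Drain `results` and call `upload` with lists of up to
    `batch_size` items. A None item is the signal to stop; anything
    still buffered is uploaded before returning."""
    batch = []
    while True:
        item = results.get()
        if item is None:
            break
        batch.append(item)
        if len(batch) >= batch_size:
            upload(batch)
            batch = []
    if batch:
        upload(batch)


def start_uploader(upload, batch_size=10):
    """Start the single uploader thread; return (queue, thread)."""
    results = queue.Queue()
    thread = threading.Thread(
        target=run_uploader, args=(results, upload, batch_size)
    )
    thread.start()
    return results, thread
```

Battle runners call `results.put(result)` from any thread, and the server only ever sees one upload stream per user.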