Rerun of Pairings
I see quite a few robots that haven't changed are re-running pairings today as if they had new versions. Any idea why that is?
Looking into it myself. From what I can tell a bunch of battles didn't load into the Batch Rankings, so it assumed that they didn't exist and pulled them from the participants scores list. They've been slowly added back by clients over the last few hours.
I've removed the section of code that removed the battles from the Batch Rankings, but that is just putting a bandaid over the problem. I'll have to look deeper to see what caused them not to load in the first place.
Looks like half of General 1v1 has incomplete pairings now. Should I put my clients into overdrive to fix it, or am I making it worse by running clients because of some bug?
Running a bunch of clients isn't going to make it any worse, from what I can tell it was a once-off problem to do with the backend instance being unable to load data. I've removed the mechanism it used to remove the bots, but I'm still not sure (and may never know) why it happened.
I've also just identified a bottleneck/threadlock which will severely limit the ability to upload from multiple clients at once without increasing upload latency to where it will spawn new server instances and cause my quota to be hit again, but I have a fix for that which I'll implement and test tomorrow. The load right now seems pretty healthy though, I see in the logs uploads from you, MN and Wompi, thanks guys. I'll let you know when you can unleash the full power of your machine(s) =).
It's data isn't removed, so you can still see it in the BotDetails, but the pairings info in the other competitors which points to it is removed. Otherwise over many versions the access to other bots will get slower and slower due to increased serialising costs.
Keeping pairing data for a while can help protect the database against faults in clients removing competitors from the rumble, only to be re-added again some time later.
That sounds reasonable, yes. Perhaps adding a 30 day error window, so only if the last battle was more than 30 days ago the pairing data in the 'alive' bot gets purged. Until then it is just marked as 'removed'. I think this purging and checking will have to happen in the backend, because the frontends are fully loaded right now with your and Voidious's uploads.
The number of bots with not full pairings has gone up - we were under 400 yesterday and back up to 471 now. I noticed an over quota message from last night, was there another loss of data?
Let me know if I should dial back my clients or if there's anything else I can do.
The source of the problem has to be tracked down or the rumble will never stabilize.
I guessed it was the excludes feature from the clients erasing pairing data in the server. But looks like it is something else.
Sorry guys, I was trying to see if I could use the marshal
module to do my serialisation instead of cPickle
because my local testing showed it is about 50% faster, but it corrupted a few bots from each pairings dict so I quickly changed it back. I'm not sure why it had these issues since I tested locally on the dev server and it worked fine, but anyway it is fixed now, and was a completely different issue to what happened before.
It did hit the quota last night, so perhaps tone down the clients a little. There's a threshold below which it is cheap to run, but as the load increases I start leaving the free quota for the instances as well (not just database writes), which gets expensive much more quickly.