Rerun of Pairings
I see quite a few robots that haven't changed are re-running pairings today as if they had new versions. Any idea why that is?
Looking into it myself. From what I can tell, a bunch of battles didn't load into the Batch Rankings, so the server assumed they didn't exist and pulled them from the participants' scores list. They've been slowly added back by clients over the last few hours.
I've removed the section of code that removed the battles from the Batch Rankings, but that is just putting a bandaid over the problem. I'll have to look deeper to see what caused them not to load in the first place.
Looks like half of General 1v1 has incomplete pairings now. Should I put my clients into overdrive to fix it, or am I making it worse by running clients because of some bug?
Running a bunch of clients isn't going to make it any worse. From what I can tell it was a once-off problem to do with the backend instance being unable to load data. I've removed the mechanism it used to remove the bots, but I'm still not sure (and may never know) why it happened.
I've also just identified a bottleneck/threadlock which will severely limit the ability to upload from multiple clients at once. Under that load, upload latency rises to the point where new server instances get spawned and my quota is hit again. I have a fix for that which I'll implement and test tomorrow. The load right now seems pretty healthy though. I see uploads in the logs from you, MN and Wompi, thanks guys. I'll let you know when you can unleash the full power of your machine(s) =).
Its data isn't removed, so you can still see it in the BotDetails, but the pairings info in the other competitors which points to it is removed. Otherwise, over many versions, access to other bots would get slower and slower due to increased serialising costs.
Keeping pairing data for a while can help protect the database against faults in clients removing competitors from the rumble, only to be re-added again some time later.
The number of bots with incomplete pairings has gone up: we were under 400 yesterday and are back up to 471 now. I noticed an over quota message from last night. Was there another loss of data?
Let me know if I should dial back my clients or if there's anything else I can do.
The source of the problem has to be tracked down or the rumble will never stabilize.
I guessed it was the excludes feature from the clients erasing pairing data in the server. But it looks like it is something else.
Sorry guys, I was trying to see if I could use the marshal module to do my serialisation instead of cPickle because my local testing showed it is about 50% faster, but it corrupted a few bots from each pairings dict so I quickly changed it back. I'm not sure why it had these issues since I tested locally on the dev server and it worked fine, but anyway it is fixed now, and was a completely different issue to what happened before.
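For anyone who wants to try the same comparison locally, here is a minimal sketch. It is written for Python 3, where `pickle` uses the C implementation that was `cPickle` in Python 2. The shape of the `pairings` dict and the bot names are made up for illustration and are not the server's actual data layout. It checks that both modules round-trip the data before timing them, which is the kind of check that would catch corruption, though only for the interpreter it runs on.

```python
import marshal
import pickle
import timeit

# Hypothetical pairings dict: opponent name -> [battle count, average score].
# The real LiteRumble layout is not shown in this thread.
pairings = {"bot%d 1.0" % i: [i % 7 + 1, 50.0 + (i % 50) / 2.0]
            for i in range(1000)}


def roundtrip_pickle(data):
    """Serialise and deserialise with pickle (cPickle in Python 2)."""
    return pickle.loads(pickle.dumps(data, pickle.HIGHEST_PROTOCOL))


def roundtrip_marshal(data):
    """Serialise and deserialise with marshal."""
    return marshal.loads(marshal.dumps(data))


if __name__ == "__main__":
    # Verify correctness first: a faster serialiser is no use if it loses data.
    assert roundtrip_pickle(pairings) == pairings
    assert roundtrip_marshal(pairings) == pairings

    runs = 200
    pickle_time = timeit.timeit(lambda: roundtrip_pickle(pairings), number=runs)
    marshal_time = timeit.timeit(lambda: roundtrip_marshal(pairings), number=runs)
    print("pickle:  %.4fs for %d round trips" % (pickle_time, runs))
    print("marshal: %.4fs for %d round trips" % (marshal_time, runs))
```

One thing worth keeping in mind: the Python documentation says the marshal format is specific to the Python version and is not guaranteed to stay compatible between versions. Whether that had anything to do with the corruption here is not known.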
It did hit the quota last night, so perhaps tone down the clients a little. There's a threshold below which it is cheap to run, but as the load increases I start leaving the free quota for the instances as well (not just database writes), which gets expensive much more quickly.
Took my clients from 4 down to 2.
Can you protect against us overloading your server? Both to avoid hitting quota, and to avoid someone DDoSing your bank account :-), it seems like it would be good to have some throttling or something in place.
Refilling the pairings is going to take a while. Is it possible to tune things (for now) to support a higher client load, to prioritize overall throughput over losing pairings here and there, while consuming less quota?
I've been trying to think of a good way to do that, but the 'recommended way' using Task Queues (which I can then limit to 3-4 queries a second instead of the 5-6 I was getting yesterday) will break any reasonable way of having priority battles.
Also, there is no way to programmatically retrieve the current quota usage stats, which means I can't do any auto-throttling.
I can tune it not to do database writes unless a bot has x or more pending battles, which is how I did it previously when on free tier, but the majority of the time is actually taken up with (de)serialising the pairings data, which is why I was trying to shoehorn in marshal. I'll add a min pending battles limit, and you can turn those clients back on, we'll see what happens. Of course, it will probably only hit quota tomorrow night if it's an issue, since today has been pretty slow.
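Roughly, the min pending battles idea could look like the sketch below. This is not the server's code: `PendingBuffer`, `write_fn` and the threshold value are all hypothetical names and numbers picked for illustration. Results are held in memory per bot and only handed to the database write once at least `min_pending` have piled up.

```python
MIN_PENDING_BATTLES = 5  # hypothetical value for the "x" threshold


class PendingBuffer:
    """Holds battle results in memory and writes them out in batches."""

    def __init__(self, write_fn, min_pending=MIN_PENDING_BATTLES):
        self.write_fn = write_fn      # stand-in for the real database write
        self.min_pending = min_pending
        self.pending = {}             # bot name -> list of unwritten results

    def add(self, bot, result):
        """Queue a result; return True if it triggered a database write."""
        batch = self.pending.setdefault(bot, [])
        batch.append(result)
        if len(batch) < self.min_pending:
            return False              # not enough pending battles yet
        self.write_fn(bot, batch)     # one write covers the whole batch
        del self.pending[bot]
        return True


if __name__ == "__main__":
    buf = PendingBuffer(lambda bot, batch: print("write", bot, len(batch)),
                        min_pending=3)
    for score in (51.2, 48.7, 55.0, 60.1):
        buf.add("SomeBot 1.0", score)
```

The trade-off is that anything still sitting in `pending` is lost if the instance goes away before the threshold is reached, so a higher threshold saves writes at the cost of dropping a pairing here and there.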
I'm thinking of building a custom client which groups results from all local clients and uploads them in a single thread, so the server needs only a single instance per user to receive data.
Combined with multi-threading, clients can keep running battles in parallel while a single thread uploads everything, making it faster than the current client, while at the same time consuming less server resources.
That would be great. I'm not sure how you'd do priority battles though, would you have a local queue which would be filled, and you just take from there? I guess I could sort-of do this with task queues, but it wouldn't be very pretty.
Priority battles are downloaded by the uploader thread after each pairing is uploaded. They would be sent to a queue, which would be consumed by the clients.
Battle results would be sent to another queue, which would be consumed by the uploader thread.
That is the basic idea. You can add some logic inside the queues to make them smarter, like dealing with duplicated battles, an excessive amount of data, or a lack of data, falling back to random battles in that last case.
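A minimal sketch of the two-queue design described above, in Python for brevity even though the actual client is not written in it. All the names here (`worker`, `uploader`, `run_clients`, and the `run_battle`, `random_battle` and `upload` callbacks) are hypothetical stand-ins. `upload` is assumed to return the next priority pairing from the server, or `None` if there isn't one.

```python
import queue
import threading

_STOP = object()  # sentinel telling the uploader thread to finish


def worker(battles_to_run, priority_q, results_q, run_battle, random_battle):
    """Battle-running client: prefers priority pairings, else random ones."""
    for _ in range(battles_to_run):
        try:
            pairing = priority_q.get_nowait()
        except queue.Empty:
            pairing = random_battle()  # lack of data: fall back to random
        results_q.put(run_battle(pairing))


def uploader(priority_q, results_q, upload):
    """Single upload thread: sends results, collects priority pairings."""
    while True:
        result = results_q.get()
        if result is _STOP:
            break
        next_pairing = upload(result)  # assumed to return a pairing or None
        if next_pairing is not None:
            priority_q.put(next_pairing)


def run_clients(num_workers, battles_each, run_battle, random_battle, upload):
    """Run several battle workers in parallel with one shared uploader."""
    priority_q, results_q = queue.Queue(), queue.Queue()
    up = threading.Thread(target=uploader,
                          args=(priority_q, results_q, upload))
    workers = [threading.Thread(target=worker,
                                args=(battles_each, priority_q, results_q,
                                      run_battle, random_battle))
               for _ in range(num_workers)]
    up.start()
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    results_q.put(_STOP)  # all results are queued; let the uploader drain
    up.join()
```

Since only the uploader thread talks to the server, the server sees one connection per user no matter how many workers are running battles. The smarter queue logic (duplicate filtering, size limits) is left out here.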
I've essentially implemented what you've said here, but on the server side using a Task Queue. The only thing we lost was on-the-fly updated battle numbers, but those aren't really being used now that we have priority battles. Also, priority battles are delayed by up to 100 pairings per rumble, but this new design should mean that stuff sticks around in local memory longer than before.
Once I add contributor stats I'll also add information about the current amount of queue backlog, so people can decide whether or not to run a client.
If you check your clients you can see that the uploads are going much quicker, and it tells you it is adding it to a queue instead =)