Rerun of Pairings

I see quite a few robots that haven't changed are re-running pairings today as if they had new versions. Any idea why that is?

Skotty 16:23, 30 March 2013

Looking into it myself. From what I can tell a bunch of battles didn't load into the Batch Rankings, so it assumed that they didn't exist and pulled them from the participants scores list. They've been slowly added back by clients over the last few hours.

I've removed the section of code that removed the battles from the Batch Rankings, but that is just putting a bandaid over the problem. I'll have to look deeper to see what caused them not to load in the first place.

Skilgannon 16:30, 30 March 2013
 

Looks like half of General 1v1 has incomplete pairings now. Should I put my clients into overdrive to fix it, or am I making it worse by running clients because of some bug?

Voidious 18:45, 31 March 2013
 

Running a bunch of clients isn't going to make it any worse. From what I can tell it was a once-off problem to do with the backend instance being unable to load data. I've removed the mechanism it used to remove the bots, but I'm still not sure (and may never know) why it happened.

I've also just identified a bottleneck/threadlock that severely limits uploading from multiple clients at once: upload latency rises to the point where new server instances are spawned and my quota gets hit again. I have a fix for that, which I'll implement and test tomorrow. The load right now seems pretty healthy though. I see uploads in the logs from you, MN and Wompi, thanks guys. I'll let you know when you can unleash the full power of your machine(s) =).

Skilgannon 21:54, 31 March 2013
 

When a competitor is removed from the rumble, is its data also removed?

MN 21:17, 1 April 2013
 

Its data isn't removed, so you can still see it in the BotDetails, but the pairings info in the other competitors which points to it is removed. Otherwise, over many versions, access to other bots would get slower and slower due to increased serialising costs.

Skilgannon 21:21, 1 April 2013
 

Keeping pairing data for a while can help protect the database against faulty clients that remove competitors from the rumble, only for them to be re-added some time later.

MN 21:38, 1 April 2013
 

That sounds reasonable, yes. Perhaps we could add a 30-day error window, so that the pairing data in the 'alive' bot is purged only if the last battle was more than 30 days ago. Until then it is just marked as 'removed'. I think this purging and checking will have to happen in the backend, because the frontends are fully loaded right now with your and Voidious's uploads.

Skilgannon 21:48, 1 April 2013
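
The 30-day error window proposed above could look roughly like the following. This is only a sketch under stated assumptions, not the rumble server's actual code: the function name `purge_stale_pairings`, the `ERROR_WINDOW` constant, and the dict layout (`last_battle`, `removed`) are all hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical constant taken from the "30 day error window" suggestion.
ERROR_WINDOW = timedelta(days=30)


def purge_stale_pairings(pairings, removed_bots, now):
    """Return a new pairings dict for one 'alive' bot.

    pairings:     opponent name -> {"last_battle": datetime, "removed": bool}
                  (assumed layout, for illustration only)
    removed_bots: set of opponent names no longer in the rumble
    now:          current time as a datetime
    """
    kept = {}
    for opponent, info in pairings.items():
        if opponent in removed_bots:
            if now - info["last_battle"] > ERROR_WINDOW:
                # Window expired: drop the pairing entirely.
                continue
            # Still inside the window: keep the data, only mark it.
            info = dict(info, removed=True)
        else:
            # Opponent is (back) in the rumble: clear any earlier mark.
            info = dict(info, removed=False)
        kept[opponent] = info
    return kept
```

Because a pairing is only marked during the window, a competitor that a faulty client removed and that is re-added within 30 days gets its old pairing data back without any battles being rerun.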
 

That is exactly what I had in mind.

MN 21:53, 1 April 2013
 

The number of bots without full pairings has gone up: we were under 400 yesterday and are back up to 471 now. I noticed an over-quota message from last night. Was there another loss of data?

Let me know if I should dial back my clients or if there's anything else I can do.

Voidious 17:53, 2 April 2013
 

The source of the problem has to be tracked down or the rumble will never stabilize.

I guessed it was the excludes feature from the clients erasing pairing data in the server. But it looks like it is something else.

MN 18:08, 2 April 2013
 

I suggest tons of logging, tracing all requests from clients.

MN 18:10, 2 April 2013
 

Sorry guys, I was trying to see if I could use the marshal module to do my serialisation instead of cPickle, because my local testing showed it is about 50% faster, but it corrupted a few bots from each pairings dict, so I quickly changed it back. I'm not sure why it had these issues, since I tested locally on the dev server and it worked fine. Anyway, it is fixed now, and it was a completely different issue from what happened before.

It did hit the quota last night, so perhaps tone down the clients a little. There's a threshold below which it is cheap to run, but as the load increases I start leaving the free quota for the instances as well (not just database writes), which gets expensive much more quickly.

Skilgannon 18:58, 2 April 2013
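
The thread never establishes why marshal corrupted some entries, so the following is not a diagnosis. It is only a generic sketch of a round-trip check one might run over sample data before swapping serialisers. The helper name `round_trips` and the sample pairings layout are hypothetical, and it uses Python 3's `pickle` where the post mentions cPickle.

```python
import marshal
import pickle


def round_trips(data, dumps, loads):
    """Return True if data survives a dumps/loads cycle unchanged."""
    try:
        return loads(dumps(data)) == data
    except ValueError:
        # marshal raises ValueError for types it cannot serialise.
        return False


# Hypothetical sample shaped like a small pairings dict.
sample = {"BotA 1.0": {"aps": 51.2, "battles": 3}}
marshal_ok = round_trips(sample, marshal.dumps, marshal.loads)
pickle_ok = round_trips(sample, pickle.dumps, pickle.loads)
```

A check like this only covers the data it is given. The marshal format is also documented as being specific to the Python version that wrote it, so a check passing on one machine says little about another.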
 

Took my clients from 4 down to 2.

Can you protect against us overloading your server? Both to avoid hitting quota, and to avoid someone DDoSing your bank account :-), it seems like it would be good to have some throttling or something in place.

Refilling the pairings is going to take a while. Is it possible to tune things (for now) to support a higher client load, to prioritize overall throughput over losing pairings here and there, while consuming less quota?

Voidious 20:02, 2 April 2013
 

I've been trying to think of a good way to do that, but the 'recommended way' using Task Queues (which I can then limit to 3-4 queries a second instead of the 5-6 I was getting yesterday) will break any reasonable way of having priority battles.

Also, there is no way to programmatically retrieve the current quota usage stats, which means I can't do any auto-throttling.

I can tune it not to do database writes unless a bot has x or more pending battles, which is how I did it previously when on free tier, but the majority of the time is actually taken up with (de)serialising the pairings data, which is why I was trying to shoehorn in marshal. I'll add a min pending battles limit, and you can turn those clients back on, we'll see what happens. Of course, it will probably only hit quota tomorrow night if it's an issue, since today has been pretty slow.

Skilgannon 21:12, 2 April 2013
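
The "min pending battles" limit mentioned above amounts to buffering results per bot and writing only once enough have accumulated. A minimal sketch, assuming an in-memory buffer and a caller-supplied write function; the class `PendingWrites` and its parameters are hypothetical and not taken from the server code:

```python
class PendingWrites:
    """Buffer battle results per bot and write them in batches."""

    def __init__(self, min_pending, write_fn):
        self.min_pending = min_pending  # the "x or more pending battles" threshold
        self.write_fn = write_fn        # called as write_fn(bot, batch)
        self.pending = {}               # bot name -> list of buffered results

    def add(self, bot, result):
        """Buffer one result; return True if this call triggered a write."""
        batch = self.pending.setdefault(bot, [])
        batch.append(result)
        if len(batch) >= self.min_pending:
            self.write_fn(bot, batch)
            del self.pending[bot]
            return True
        return False
```

The trade-off is the one hinted at in the thread: a higher threshold means fewer database writes and fewer (de)serialisation passes, but results still held in memory are lost if the instance goes away before the batch is written.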
 

I'm thinking of building a custom client which groups results from all local clients and uploads them in a single thread, so the server needs only a single instance per user to receive data.

Combined with multi-threading, clients can keep running battles in parallel while a single thread uploads everything, making it faster than the current client, while at the same time consuming less server resources.

MN 21:14, 2 April 2013
 

That would be great. I'm not sure how you'd do priority battles though, would you have a local queue which would be filled, and you just take from there? I guess I could sort-of do this with task queues, but it wouldn't be very pretty.

Skilgannon 21:22, 2 April 2013
 

Priority battles are downloaded by the uploader thread after each pairing is uploaded. They would be sent to a queue, which would be consumed by the clients.

Battle results would be sent to another queue, which would be consumed by the uploader thread.

That is the basic idea. You can add some logic inside the queues to make them smarter, such as handling duplicated battles, an excessive amount of data, or a lack of data with a fallback to random battles.

MN 21:35, 2 April 2013
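
The two-queue design described above can be sketched with Python's standard `queue` and `threading` modules. This is an illustration only, not the actual RoboRumble client: `fake_upload`, `uploader` and `next_battle` are hypothetical names, and the upload is a stand-in that simply returns the same pairing as a "priority battle".

```python
import queue
import threading


def fake_upload(result):
    """Hypothetical stand-in for the HTTP upload.

    Returns the priority battles the server would send back; here it
    just asks for the same pairing again.
    """
    return [result]


def uploader(results_q, battles_q, upload=fake_upload):
    """Single uploader thread: upload results, queue priority battles."""
    while True:
        result = results_q.get()
        if result is None:  # sentinel: stop the thread
            break
        for battle in upload(result):
            battles_q.put(battle)


def next_battle(battles_q, random_battle):
    """Battle-runner side: take a priority battle, else fall back to random."""
    try:
        return battles_q.get_nowait()
    except queue.Empty:
        return random_battle()
```

Battle runners put finished results on `results_q` and call `next_battle` for work, so they never block on the network, while only the one uploader thread talks to the server.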
 

I've essentially implemented what you've said here, but on the server side using a Task Queue. The only thing we lost was on-the-fly updated battle numbers, but those aren't really being used now that we have priority battles. Also, priority battles are delayed by up to 100 pairings per rumble, but this new design should mean that stuff sticks around in local memory longer than before.

Once I add contributor stats I'll also add information about the current amount of queue backlog, so people can decide whether or not to run a client.

If you check your clients you can see that the uploads are going much quicker, and it tells you it is adding it to a queue instead =)

Skilgannon 11:03, 4 April 2013
 

Yay, back to full pairings in General 1v1!

Voidious 16:03, 6 April 2013