Rerun of Pairings

Revision as of 2 April 2013 at 17:18.


I see quite a few robots that haven't changed are re-running pairings today as if they had new versions. Any idea why that is?

    Skotty16:23, 30 March 2013

    Looking into it myself. From what I can tell a bunch of battles didn't load into the Batch Rankings, so it assumed that they didn't exist and pulled them from the participants scores list. They've been slowly added back by clients over the last few hours.

    I've removed the section of code that removed the battles from the Batch Rankings, but that is just putting a bandaid over the problem. I'll have to look deeper to see what caused them not to load in the first place.

      Skilgannon16:30, 30 March 2013
       

      Looks like half of General 1v1 has incomplete pairings now. Should I put my clients into overdrive to fix it, or am I making it worse by running clients because of some bug?

        Voidious18:45, 31 March 2013
         

        Running a bunch of clients isn't going to make it any worse, from what I can tell it was a once-off problem to do with the backend instance being unable to load data. I've removed the mechanism it used to remove the bots, but I'm still not sure (and may never know) why it happened.

        I've also just identified a bottleneck/threadlock which will severely limit the ability to upload from multiple clients at once without increasing upload latency to where it will spawn new server instances and cause my quota to be hit again, but I have a fix for that which I'll implement and test tomorrow. The load right now seems pretty healthy though, I see in the logs uploads from you, MN and Wompi, thanks guys. I'll let you know when you can unleash the full power of your machine(s) =).

          Skilgannon21:54, 31 March 2013
           

          When a competitor is removed from the rumble, is its data also removed?

            MN21:17, 1 April 2013
             

            Its data isn't removed, so you can still see it in the BotDetails, but the pairings info in the other competitors which points to it is removed. Otherwise, over many versions, access to other bots would get slower and slower due to increased serialisation costs.

              Skilgannon21:21, 1 April 2013
               

              Keeping pairing data for a while can help protect the database against faults in clients removing competitors from the rumble, only to be re-added again some time later.

                MN21:38, 1 April 2013
                 

                That sounds reasonable, yes. Perhaps adding a 30 day error window, so the pairing data in the 'alive' bot only gets purged if the last battle was more than 30 days ago. Until then it is just marked as 'removed'. I think this purging and checking will have to happen in the backend, because the frontends are fully loaded right now with your and Voidious's uploads.

                  Skilgannon21:48, 1 April 2013
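
The 30 day error window proposed above could look roughly like the following. This is only a sketch of the idea, not the rumble server's actual code: the `pairings` dict layout, the `removed` and `last_battle` fields, and the function names are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

# The 30 day error window suggested in the thread.
PURGE_AFTER = timedelta(days=30)


def mark_removed(pairings, bot_name):
    """Flag a pairing as removed instead of deleting it straight away."""
    if bot_name in pairings:
        pairings[bot_name]["removed"] = True


def restore(pairings, bot_name):
    """Undo the flag if the competitor is re-added within the window."""
    if bot_name in pairings:
        pairings[bot_name]["removed"] = False


def purge_stale(pairings, now):
    """Delete pairings that are flagged as removed AND whose last battle
    is more than PURGE_AFTER in the past. Returns the purged names."""
    stale = [
        name
        for name, pairing in pairings.items()
        if pairing.get("removed") and now - pairing["last_battle"] > PURGE_AFTER
    ]
    for name in stale:
        del pairings[name]
    return stale


# Hypothetical pairing data held by one 'alive' bot.
pairings = {
    "old.Bot 1.0": {"last_battle": datetime(2013, 1, 1), "removed": False},
    "recent.Bot 2.0": {"last_battle": datetime(2013, 3, 30), "removed": False},
}
mark_removed(pairings, "old.Bot 1.0")
mark_removed(pairings, "recent.Bot 2.0")
purged = purge_stale(pairings, now=datetime(2013, 4, 2))
```

With this layout, a competitor that a faulty client removes by mistake keeps its pairing data until the window expires, and `restore` brings it back without re-running any battles.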
                   

                  That is exactly what I had in mind.

                    MN21:53, 1 April 2013
                     

                    The number of bots with incomplete pairings has gone up - we were under 400 yesterday and back up to 471 now. I noticed an over quota message from last night, was there another loss of data?

                    Let me know if I should dial back my clients or if there's anything else I can do.

                      Voidious17:53, 2 April 2013
                       

                      The source of the problem has to be tracked down or the rumble will never stabilize.

                      I guessed it was the excludes feature from the clients erasing pairing data in the server. But it looks like it is something else.

                        MN18:08, 2 April 2013
                         

                        I suggest tons of logging, tracing all requests from clients.

                          MN18:10, 2 April 2013
                           

                          Sorry guys, I was trying to see if I could use the marshal module to do my serialisation instead of cPickle because my local testing showed it is about 50% faster, but it corrupted a few bots from each pairings dict so I quickly changed it back. I'm not sure why it had these issues since I tested locally on the dev server and it worked fine, but anyway it is fixed now, and was a completely different issue to what happened before.

                          It did hit the quota last night, so perhaps tone down the clients a little. There's a threshold below which it is cheap to run, but as the load increases I start leaving the free quota for the instances as well (not just database writes), which gets expensive much more quickly.

                            Skilgannon18:58, 2 April 2013
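
One way to guard against this kind of breakage when swapping serialisers is a round-trip check on real data before switching. The sketch below is an illustration only and does not claim to reproduce the cause of the corruption described above, which the thread leaves unexplained. It uses Python 3's `pickle` (the successor to `cPickle`), and the `roundtrips` helper and the sample dicts are hypothetical. It relies on two documented properties of `marshal`: it supports only a fixed set of built-in types, and its format is not guaranteed to be stable between Python versions.

```python
import marshal
import pickle
from datetime import datetime


def roundtrips(serializer, data):
    """Return True if data survives dumps() followed by loads() unchanged.

    `serializer` is any module exposing dumps/loads, e.g. pickle or marshal.
    """
    try:
        return serializer.loads(serializer.dumps(data)) == data
    except (ValueError, TypeError):
        # marshal raises ValueError for types it cannot serialise.
        return False


# Hypothetical pairings dicts: one with plain built-in values only,
# one that also carries a datetime, which marshal does not support.
plain = {"some.Bot 1.0": {"battles": 12, "aps": 61.5}}
with_dates = {"some.Bot 1.0": {"battles": 12, "last": datetime(2013, 4, 2)}}

for name, data in (("plain", plain), ("with_dates", with_dates)):
    print(name, "pickle:", roundtrips(pickle, data),
          "marshal:", roundtrips(marshal, data))
```

Running such a check over every stored pairings dict, in the same environment as production, before switching would show up any value the faster serialiser cannot handle.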