September 2019 Outage Emails

From RoboWiki
Revision as of 23:15, 18 October 2019 by MultiplyByZer0 (Talk | contribs)

From August 29 to October 11, 2019, RoboWiki and Old RoboWiki suffered a 44-day outage. During the downtime, there was an email discussion between RoboWiki's administrators about VPS hosting, server software, and how to move forward.

The highlights:

  • David Alves pays for and manages the VPS that RoboWiki runs. PEZ pays for and administers RoboWiki's domain name.
  • A subset of RoboWiki's administrators, of about seven people, have SSH access to the server.
  • RoboWiki runs on Ubuntu 12.04 (precise), MediaWiki 1.19.6, lighttpd, and MySQL.
  • RoboWiki's server makes automatic backups regularly, but these are stored on the server and not offsite.
  • RoboWiki's server regularly requires manual reboots from SSH. On August 29, when the server was rebooted, it failed to boot.
  • MultiplyByZer0 and Flemming N. Larsen notified RoboWiki administrators about the outage. The administrators began investigating on September 24.
  • Because the server could not be restarted from SSH or cPanel, David Alves had to file a support ticket with the VPS hosting company. The host's response time (17 days) was unsatisfactory, but they fixed the issue.
  • RoboWiki administrators are now considering their options for moving to a new VPS host. Hosting companies have been scrutinized and tables have been made. The current top choice seems to be Hetzner Cloud.
  • Since the Old RoboWiki is now effectively a static website, it can be placed on static web hosting, which is free. PEZ made a proof of concept with Netlify.

The root cause of the server failure remains unknown.

August 29

The RoboWiki server begins returning "500 - Internal Server Error" for every URL.

A day later, it stops responding at all. All requests time out.

September 24-25

Date Tuesday, September 24, 2019 21:42:10 UTC
Sender MultiplyByZer0
Recipient Rednaxela, Skilgannon, Voidious
Subject RoboWiki has been down for a month

Hello,

Apologies if you already know about this and are working on fixing it, but RoboWiki and the Old RoboWiki are both down, and have been that way for almost a month (since August 29). We are forced to read pages through the Internet Archive, and we cannot edit pages, discuss Robocode, or submit bots to the RoboRumble.

Since you have shell access to the server, can you take a look at why the website is down? Thanks.

MultiplyByZer0

Date Tuesday, September 24, 2019 22:13:39 UTC
Sender Rednaxela
Recipient David Alves, Skilgannon, Voidious, MultiplyByZer0
Subject Re: RoboWiki has been down for a month

Hmm, strange, the server isn't responding at all.

Hi David, hope you're doing well. It seems the VPS isn't responding at all?

Best Regards,

Rednaxela/Alex

Date Wednesday, September 25, 2019 07:05:23 UTC
Sender Skilgannon
Recipient Rednaxela, David Alves, Voidious, MultiplyByZer0
Subject Re: RoboWiki has been down for a month

Hey

I've been trying to look into this with David; David has already contacted the hosting provider.

From time to time before this, the wiki was crashing; I suspect security issues due to running on an old 12.04 installation. Each time it would require a manual reboot. When I rebooted the final time it didn't come back up.

We have backups daily/weekly/monthly on the server, otherwise I have an older (~1 year? I'd need to check) backup of the database locally. Hopefully we can restore these onto a more modern base OS + mediawiki install.

Best

Julian

October 7

Date Monday, October 7, 2019 16:34:12 UTC
Sender David Alves
Recipient Rednaxela, Skilgannon, Voidious, MultiplyByZer0
Subject Re: RoboWiki has been down for a month

Hey all,

Sorry for the lack of updates here. I filed a ticket with the VPS company, then emailed them asking for a status on Oct 1st, then emailed them again just now. So far no useful response. I can see the container on their control panel, but I can't ping it via any of the container IP addresses (209.40.205.177, 67.223.226.21, 64.79.213.157). I have the option of rebooting the container via the web interface which I've already tried, and I can also reinstall the container but I believe that completely wipes the drive so I haven't done that so far. Not really sure what to do next. We should probably switch to something else like AWS but I think we need to get the data off the drive first, right?

David

October 11-13

Date Friday, October 11, 2019 15:14:38 UTC
Sender David Alves
Recipient Rednaxela, Skilgannon, Voidious, Flemming N. Larsen, MultiplyByZer0
Subject Re: RoboWiki has been down for a month

+Adding Flemming to this email chain (he emailed me separately).

They fixed the issue and I can now SSH to the server, so I'd love it if one of you guys could ssh in and get the web server running again.

I'm pretty unhappy with how long it took for them to resolve that considering how much this VPS costs. We should probably move robowiki to some other hosting solution (AWS? some mediawiki hosting thing?).

David

Date Friday, October 11, 2019 20:43:21 UTC
Sender Skilgannon
Recipient Rednaxela, David Alves, Voidious, Flemming N. Larsen, MultiplyByZer0
Subject Re: RoboWiki has been down for a month

Success!

After restarting mysql and lighttpd a few times it seems to be running again. I've also downloaded the backups, so even if it dies again we'll be OK.

Agreed, let's try to find some different hosting, and we can take the opportunity to upgrade the OS to Ubuntu 18.04.

Thanks for your help on this David!

Best

Julian

Date Friday, October 11, 2019 20:50:12 UTC
Sender Voidious
Recipient Rednaxela, David Alves, Skilgannon, PEZ, Flemming N. Larsen, MultiplyByZer0
Subject Re: RoboWiki has been down for a month

+PEZ

Amazing, that's great news! And I also agree, we can probably find much better (cheaper, reliable, hands-off) hosting for a wiki now than we did last time, over ten years ago...

Julian, do you want to take the lead on that? Or should I? I don't have a ton of time to contribute, but I certainly do have some, and I care about the RoboWiki living on instead of dying out while people are still trying to use it...

Date Friday, October 11, 2019 20:58:31 UTC
Sender Rednaxela
Recipient David Alves, Skilgannon, PEZ, Voidious, Flemming N. Larsen, MultiplyByZer0
Subject Re: RoboWiki has been down for a month

Nice, thanks David and Julian.

Yeah, different hosting may be good. While I don't think I have the time to take the lead on this from my side, I am interested in helping to the extent I can.

About other hosting options, I will say I tend to find AWS to be a bit overpriced for what it is, and I'm not sure about mediawiki-specific hosting but maybe there are okay options. If we want to stay with a VPS but just a different one I will say I've had good experiences with DigitalOcean, good price, easy to work with, and very reliable.

Date Friday, October 11, 2019 22:50:34 UTC
Sender Flemming N. Larsen
Recipient Rednaxela, David Alves, Skilgannon, PEZ, Voidious, MultiplyByZer0
Subject Re: RoboWiki has been down for a month

Nice job David and Julian! I love you guys! :-D

I thought it was lost for good this time, which would probably kill Robocode really fast without access to the fantastic documentation, RoboRumble etc.

I simply can't thank you enough.

I don't know anything about hosting the MediaWiki. But if there is any way I can help you out with keeping it up and running, and/or perhaps moving it to another hosting provider, I will do what I can to help you out.

Just tell me how I can help. I could also pay for the hosting etc.

I am so happy that you got it up and running again, and I know that lots of Robocoders out there will be really happy to get the news. :-D

Best,

- Flemming

Date Sunday, October 13, 2019 02:19:49 UTC
Sender MultiplyByZer0
Recipient Rednaxela, David Alves, Skilgannon, PEZ, Voidious, Flemming N. Larsen
Subject Re: RoboWiki has been down for a month

Thanks for the hard work, everyone.

I definitely agree that RoboWiki should move off its current host; good hosting services have effectively zero downtime.

I did some research, and my suggestions are:

  • The MediaWiki hosting route: In terms of free and wiki-specific hosting, there's the volunteer and donation-funded Miraheze, the ad-supported ShoutWiki, and the commercial Gamepedia. Using these hosts will save us from having to manage infrastructure and give us spam protection. Given that these free hosts exist, I see no reason to consider the paid hosting services, but there's a list of them, or you can google "mediawiki hosting". These services are "managed" – they do the backups, software upgrades, spec the servers, etc., which can be both good and bad. I don't think they give SSH access, though.
  • The VPS route: If you can make RoboWiki's server run on 0.2 CPUs, 600 MB RAM, 30 GB HDD, and 1 GB/month network egress, you can run it on Google Compute Engine for literally completely free; as far as I know, GCP is the only service to offer a free VPS. Otherwise, the cheapest hosts I found are OVH, AWS Lightsail, AWS EC2, and Hostinger; everything else is hilariously overpriced in comparison. These services give you full root access and the complete freedom to run code as you see fit, as if you had a physical server in front of you. The downside is that you have to handle the maintenance yourself. Given that nobody here has the time to do that, you should set it up to not require too much maintenance.
  • While you're at it, you can upgrade the software, set up backups, add a CAPTCHA, add HTTPS, clean some things up, etc.

Also, would it be fine to publish this email chain on RoboWiki? It contains some extremely useful information about the RoboWiki server setup.

MultiplyByZer0

Date Sunday, October 13, 2019 06:16:14 UTC
Sender Rednaxela
Recipient David Alves, Skilgannon, PEZ, Voidious, Flemming N. Larsen, MultiplyByZer0
Subject Re: RoboWiki has been down for a month

I'm extremely strongly against using any ad-supported hosting, and am perfectly willing to pay to avoid that.

Miraheze looks interesting, however they do have a "dormancy policy" where they close or delete wikis after 60 days of inactivity, which I don't see as favorable for something like RoboWiki where we want it to remain accessible regardless of how activity ebbs and flows.

As far as VPSs, the two I have experience using personally are DigitalOcean and BuyVM, both of which I've been using for many years at this point. BuyVM is a little better value on paper, and I've found it to have pretty good reliability but not perfect. DigitalOcean on the other hand I've found to be flawless for uptime. I will note though that BuyVM's unmetered bandwidth is kind of nice, giving peace of mind one won't run out of transfer or have to pay overage.

Looking at Amazon Lightsail, the pricing is in the same ballpark as DigitalOcean and BuyVM.

I tend to think a 1GB instance is plenty, but here are some comparisons in general for both 1GB and 2GB sort of class.

Provider    | DigitalOcean                      | BuyVM         | Amazon Lightsail         | OVH       | Hostinger
Ram         | 1GB                               | 1GB           | 1GB                      | 2GB*      | 1GB
Disk        | 25GB                              | 20GB          | 40GB                     | 20GB      | 20GB
Transfer    | 1TB (excess at $0.01/GB)          | Unmetered     | 2TB (excess at $0.09/GB) | Unmetered | 1TB
Cost        | $5/mo                             | $3.50/mo      | $5/mo                    | $3.35/mo  | $3.95/mo BUT $7.86/mo after the initial buy
Auto Backup | $1/mo (4 weeks of weekly backups) | None built-in | $0.05/GB/mo              | $4.19/mo  | Unclear

Provider    | DigitalOcean                      | BuyVM         | Amazon Lightsail         | OVH                         | Hostinger
Ram         | 2GB                               | 2GB           | 2GB                      | 4GB*                        | 2GB
Disk        | 50GB                              | 40GB          | 60GB                     | 40GB                        | 40GB
Transfer    | 2TB (excess at $0.01/GB)          | Unmetered     | 3TB (excess at $0.09/GB) | Unmetered                   | 2TB
Cost        | $10/mo                            | $7/mo         | $10/mo                   | $6.87/mo (sale? temporary?) | $8.95/mo BUT $15.76/mo after the initial buy
Auto Backup | $2/mo (4 weeks of weekly backups) | None built-in | $0.05/GB/mo              | Included                    | Unclear

EC2 is harder to compare as it's a very different pricing model, so far as I can tell more expensive for hosting these sorts of things, and is also much less predictable in price (big negative in my books), at least when one factors in storage+bandwidth which are charged for separately with EC2

While OVH looks like a good value on paper, it looks like there are a fair number of negative reviews around them and they seem a little sketchier. Hostinger does the cheeky thing of advertising favorable-looking introductory prices but charging more to renew, making for a pretty poor deal really.

All in all, if going for a VPS, I'd be inclined to go with DigitalOcean or Amazon Lightsail. Mostly ruling out BuyVM as it doesn't have built-in automatic backups (though if we set up backups more manually maybe it would be fine)

Between DO and Lightsail, I will say Amazon Lightsail's bandwidth overage rate spooks me a lot more than DO's rate, despite Lightsail offering a larger base transfer and more disk. As a note, I'm pretty sure both DigitalOcean and Amazon Lightsail do have ways to allow multiple folks with separate accounts to have full admin console access, if we want to distribute that. A lot of smaller VPS providers don't have that.

Best Regards,

Alex/Red

Date Sunday, October 13, 2019 08:22:31 UTC
Sender Skilgannon
Recipient Rednaxela, David Alves, PEZ, Voidious, Flemming N. Larsen, MultiplyByZer0
Subject Re: RoboWiki has been down for a month

I'd prefer the VPS route. Especially because we want a read-only copy of the old wiki, I don't see any other option TBH.

And just to throw another into the mix, Hetzner has an excellent reputation for uptime, and prices are good too.

Date Sunday, October 13, 2019 09:26:58 UTC
Sender Rednaxela
Recipient David Alves, Skilgannon, PEZ, Voidious, Flemming N. Larsen, MultiplyByZer0
Subject Re: RoboWiki has been down for a month

That's true, the read-only old wiki backup does make going the VPS route fairly preferable.

Hetzner does look pretty good... looks like equivalent to 3.27USD/mo for 2GB/20GB/20TB and backup pricing that's 20% of instance price (just like DigitalOcean), and equivalent to just 6.43USD/mo to double that ram and disk space. From a search looks like they have a relatively solid reputation too. Also looks like their traffic pricing for going over the included traffic limit is like 1/10th of DO's traffic overage pricing (and like 1/75th what it is with Amazon Lightsail), which is good if there's unexpected traffic. Would be higher ping to those of us in the Americas but I don't see that as a big deal. On paper I'm leaning toward Hetzner as the most preferred I've seen yet (though I do still like DigitalOcean too, based on my experiences with it).
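
The pricing comparison in this thread comes down to simple arithmetic, which can be sketched as below. This is an illustration added for readers, not something from the emails: the function names are hypothetical, the 20% backup surcharge and per-GB overage rates are the ones quoted in the messages above, and 1 TB is assumed to mean 1000 GB.

```python
def monthly_cost(instance_usd: float, backup_fraction: float = 0.20) -> float:
    """Instance price plus backups billed as a fraction of that price."""
    return round(instance_usd * (1 + backup_fraction), 2)


def overage_cost(used_gb: float, included_gb: float, usd_per_gb: float) -> float:
    """Charge for traffic beyond the included allowance (zero if under it)."""
    return round(max(0.0, used_gb - included_gb) * usd_per_gb, 2)


# DigitalOcean 1GB plan: $5/mo, backups at 20% of the instance price.
print(monthly_cost(5.00))
# Hetzner plan quoted at roughly 3.27 USD/mo, backups also at 20%.
print(monthly_cost(3.27))
# A hypothetical month that runs 500 GB over the included transfer:
print(overage_cost(1500, 1000, 0.01))  # DigitalOcean: 1TB included, $0.01/GB
print(overage_cost(2500, 2000, 0.09))  # Lightsail: 2TB included, $0.09/GB
```

The same overshoot costs nine times as much at the Lightsail rate as at the DigitalOcean rate, which is the concern raised about unexpected traffic.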

Date Sunday, October 13, 2019 10:43:55 UTC
Sender PEZ
Recipient Rednaxela, David Alves, Skilgannon, Voidious, Flemming N. Larsen, MultiplyByZer0
Subject Re: RoboWiki has been down for a month

Hi all!

I now see that I only sent this to Flemming:

Wonderful news! Like Flemming I also thought it was down for good this time.

Guess that means I'll keep renewing the domain name then.

It's a long time ago the wiki ran off of a Cygwin install on a computer in my laundry room, but I think it still isn't a very demanding application? Maybe it can run on AWS Free Tier? I have no time to help with setting it up, but I can certainly be on the team that knows how to bring it up if it falls down. I'm in the CET time zone.

About old wiki asking for a VPS solution. Since it is static, we can crawl it and make a static site that we then just let Netlify serve. That hosting would cost zero dollars.

Regards,

/PEZ

Date Sunday, October 13, 2019 14:15:03 UTC
Sender PEZ
Recipient Rednaxela, David Alves, Skilgannon, Voidious, Flemming N. Larsen, MultiplyByZer0
Subject Re: RoboWiki has been down for a month

Here's a POC of the static site option. I just threw wget at old.robowiki.net and gave the files to Netlify: https://old-robowiki.netlify.com

Lots of the links do not work yet. (I ran wget in windows mode to make it rename files so that Netlify accepted them. Then only replaced the most obvious pattern (? -> @) in the files.)

We would need to run wget much more surgically than I did here, because with all diff links and all it gets A LOT OF pages. I aborted the crawling after some 5K pages.

Naturally we would run such a site off the real domain name.

A bit more work, not a lot, to get it good enough, but after that no maintenance at all for the old wiki.

/PEZ
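
The link clean-up mentioned in this email (replacing `?` with `@` so that links match the file names wget writes in Windows mode) could be automated roughly as follows. This is an illustrative sketch, not the procedure PEZ actually used: `rewrite_links` is a hypothetical helper, the sample link is made up, and the code assumes that only relative `href`/`src` values need changing and that only the first `?` (the query separator) is affected.

```python
import re

# Matches href="..." or src="..." attributes, double- or single-quoted.
_ATTR = re.compile(
    r'''(?P<prefix>\b(?:href|src)\s*=\s*)(?P<quote>["'])(?P<url>.*?)(?P=quote)''',
    re.IGNORECASE,
)
# Absolute (scheme:) and protocol-relative (//) URLs point off-site.
_EXTERNAL = re.compile(r'(?:[a-z][a-z0-9+.-]*:|//)', re.IGNORECASE)


def rewrite_links(html: str) -> str:
    """Replace the query separator '?' with '@' in relative links."""

    def fix(match: re.Match) -> str:
        url = match.group("url")
        if _EXTERNAL.match(url):
            return match.group(0)  # leave external links untouched
        quote = match.group("quote")
        return f'{match.group("prefix")}{quote}{url.replace("?", "@", 1)}{quote}'

    return _ATTR.sub(fix, html)


# Hypothetical example of a crawled page fragment.
print(rewrite_links('<a href="robowiki?Robocode">Robocode</a>'))
```

In practice this would be applied to every HTML file in the crawl output before uploading it to the static host.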

Date Sunday, October 13, 2019 18:55:03 UTC
Sender Flemming N. Larsen
Recipient Rednaxela, David Alves, Skilgannon, PEZ, Voidious, MultiplyByZer0
Subject Re: RoboWiki has been down for a month

Running the old RoboWiki as a static site (read-only) at Netlify looks very promising. :-)

Regarding new hosting, for the current RoboWiki, I recommend using one which deals with spam and infrastructure etc. so it is easy to maintain it.

However, I don't know much about hosting like the rest of you guys. So I trust your expertise in this area.

- Flemming
