Greg DeVore

By: Greg DeVore on April 22nd, 2011


The Day the Docs Died


As you have probably already heard, there were some problems with ScreenSteps Live and ScreenSteps.me over the past 36 hours. ScreenSteps Live is hosted on Amazon's cloud infrastructure, and Amazon had some major problems yesterday that are still going on as I write this. Those problems affected a lot of sites, including Reddit, Quora and Foursquare, just to name a few.

The Current State of Things

Updated 4/25/2011 1:01 PM - Everything was back to normal as of Saturday morning. You can look at the end of the post for the log of updates we posted.

What Happened

Amazon had a failure of EBS volumes across multiple availability zones. Theoretically that isn't supposed to be possible, but apparently it is. This meant that not only was it impossible to get access to the ScreenSteps Live site, but we were also unable to start a new instance of the application (or reboot) in an unaffected availability zone, since they were all affected.

As a fallback, we moved ScreenSteps Live to a new region in the US. This took some time, though. As the East Coast region failed, there was a "rush" to start instances in the West, which started creating capacity issues.
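For readers curious about what that fallback order looks like, here is a minimal sketch of the decision: try another availability zone in the home region first, and only move to a different region if none is healthy. It is illustrative only and is not the tooling we actually ran. The function name, the region and zone labels, and the `healthy` map are all hypothetical, and a real version would also have to deal with the capacity issues mentioned above.

```python
from typing import Dict, List, Optional, Tuple

Location = Tuple[str, str]  # (region, zone); labels are hypothetical


def pick_failover_target(
    home: Location,
    zones_by_region: Dict[str, List[str]],
    healthy: Dict[Location, bool],
) -> Optional[Location]:
    """Return the first healthy location to move to, or None if there is none."""
    home_region, home_zone = home

    # Step 1: prefer another zone in the same region (cheapest move).
    for zone in zones_by_region.get(home_region, []):
        if zone != home_zone and healthy.get((home_region, zone), False):
            return (home_region, zone)

    # Step 2: every zone in the home region is down, so try other regions.
    for region, zones in zones_by_region.items():
        if region == home_region:
            continue
        for zone in zones:
            if healthy.get((region, zone), False):
                return (region, zone)

    # Nothing healthy anywhere.
    return None


if __name__ == "__main__":
    zones = {"east": ["a", "b", "c"], "west": ["a", "b"]}
    # Assumed situation: every zone in the east is affected.
    status = {("west", "a"): True, ("west", "b"): True}
    print(pick_failover_target(("east", "a"), zones, status))
```

In the situation described above, step 1 finds nothing because every zone in the home region was affected, so the move to another region is the only option left.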

But even with those issues, in hindsight we can see areas where we could have moved faster and saved at least a couple of hours of downtime. We apologize for this, and we will be documenting procedures that will help us be more responsive when a situation like this happens again. We are also making changes to our architecture to the same end.

We apologize for the inconvenience this has caused you. Regardless of what people say, there are a lot of people who do "read the manual" and not having your documentation available to you can have a serious impact on your business. The customers that contacted us were very understanding and we really appreciate it. We apologize that we didn't do better. We will post updates here as the ability to upload and edit content comes back online.

Status Log

Here are the status items we posted starting on 4/22/2011 (Friday) at 10:57 PM EDT.

ScreenSteps Live

ScreenSteps Live content is now available for viewing. We have disabled lesson uploading and editing until Amazon has things fully sorted out, but your customers should now be able to see all of their documentation. If you absolutely need something changed in your documentation today, send us an email at support@bluemangolearning.com and we will help you out.

ScreenSteps.me

ScreenSteps.me is working intermittently. ScreenSteps.me is hosted on Heroku, which provides a platform layer over Amazon Web Services, and they are working to restore full functionality. At the moment the service seems to be going up and down.

Update 4/22/2011 8:28 PM EDT

ScreenSteps Live is still serving content reliably for customers, but we have not brought uploading/editing back online. We will do this by Saturday morning at the latest.

ScreenSteps.me was back up for a while but is down again. We don't have an ETA for when it will be back up.

Update 4/22/2011 10:53 PM EDT 

ScreenSteps.me is back up and operational. Everything seems to be normal there.

Update 4/23/2011 12:42 AM EDT

ScreenSteps Live is back up and fully functional as is ScreenSteps.me. Thanks for your patience through this ordeal.

About Greg DeVore

CEO of ScreenSteps