You might remember earlier this week — Tuesday to be exact — a disruption of Amazon Web Services caused people all over the East Coast to experience trouble accessing a large number of websites that rely on AWS. Now we know why: An employee accidentally turned off too many computer servers.
Amazon explained the outage on Thursday, saying that an employee tasked with fixing a billing system bug entered a command incorrectly, inadvertently removing more servers than intended, including ones that supported two other S3 subsystems.
According to Amazon, in order to get all of the systems back up and running, a full restart was required.
During this time, S3 was unable to service requests, meaning that sites relying on it were unable to function properly.
An enormous number of sites, including Airbnb, Business Insider, Expedia, Medium, Netflix, Quora, Slack, Trello, and the Securities and Exchange Commission, experienced issues related to the outage, VentureBeat reported at the time.
“S3 has experienced massive growth over the last several years and the process of restarting these services and running the necessary safety checks to validate the integrity of the metadata took longer than expected,” Amazon said.
Eventually, the company was able to restore the systems and things were back to normal around 5 p.m. ET Tuesday.
“We are making several changes as a result of this operational event,” Amazon said Thursday, noting that it has modified certain tools to ensure similar issues don’t occur in the future.
“Finally, we want to apologize for the impact this event caused for our customers,” the company said. “We will do everything we can to learn from this event and use it to improve our availability even further.”
by Ashlee Kieler via Consumerist