Within 15 minutes of the kickoff of one of Amazon’s biggest sales days of the year, it was apparent the e-commerce giant didn’t have enough servers to handle the surge in traffic, so Amazon put up a scaled-down backup front page and paused all international traffic.
That’s according to CNBC, which got a look at internal Amazon documents that also suggest the company’s auto-scaling capabilities may have failed in the lead-up to the crash. “That,” according to the business news network, “led to a cascading series of failures, including a slowdown in (Amazon’s) internal computation and storage service called Sable, and other services that depend on it, like Prime, authentication, and video playback.”
The snafus ended up causing problems throughout the company. Some warehouses, for example, reported being unable to scan or pack orders. Teams responsible for Alexa, Twitch and other Amazon offerings also reported trouble.
To be sure, the company’s sales don’t appear to have suffered much (though we can’t say the same for whichever Amazon staffers likely had to feel the full fury of company founder Jeff Bezos’ well-known volcanic temper). Put another way, the company still made money hand over fist.
After Prime Day, the company raved about the more than 100 million products Prime members scooped up during the 36-hour event. Estimates put the cost of the downtime to Amazon, though, at between $72 million and $99 million in lost sales, according to Business Insider.
Here’s a closer look at what happened.
The site started getting buggy almost as soon as the much-anticipated sales day launched on Monday. Per CNBC, “Updates made at 12PM PST say Amazon switched the front page to a simpler ‘fallback’ page, as it saw a growing number of errors.
“By 12:15PM PST, Amazon decided to temporarily cut off all international traffic to ‘reduce pressure’ on its Sable system, and by 12:37 PM PST, it re-opened the default front page to only 25 percent of traffic. At 12:40PM, Amazon made certain changes that improved the performance of Sable, but just two minutes later, it went back to ‘consider’ blocking approximately 5 percent of ‘unrecognized traffic to U.S.,’ according to one of the documents.”
The site’s error rate continued to get worse before improving drastically in the afternoon, as Amazon finally got a handle on things. At one point, more than 300 people were on an emergency conference call.
This year’s Prime Day was the first run by Neil Lindsay, Amazon’s vice president of worldwide marketing and Prime. University of Southern California professor Carl Kesselman told CNBC that, all things considered, Amazon’s response was still impressive because, in many cases, “the site would have crashed entirely under those circumstances.”
“Amazon is operating at a scale we haven’t operated before,” he said. “It’s not clear there’s a bad guy or an obvious screw up — it’s just we’re in uncharted territory and it’s amazing it didn’t just fall over.”