Amazon explains the outage that wiped out much of the internet


Amazon has explained the Amazon Web Services (AWS) outage that took parts of the internet offline for several hours on December 7, and has promised more transparency if something similar happens again. As CNBC reports, Amazon said an automated capacity scaling feature triggered "unexpected behavior" from clients on its internal network. The devices connecting this internal network to the main AWS network were overwhelmed, which blocked communication between the two.

The nature of the failure made it difficult for teams to identify and resolve the issue, Amazon added. Engineers had to rely on logs to work out what had happened, and their internal tools were affected as well. The teams were "extremely deliberate" in restoring service to avoid breaking workloads that were still functioning. They also had to deal with a "latent problem" that prevented network clients from backing off and giving the systems a chance to recover.

AWS has temporarily disabled the automated scaling activity that led to the issue and will not re-enable it until fixes are in place. A fix for the latent problem is due within two weeks, Amazon said. The company is also adding network configuration to protect the affected devices should a similar failure occur again.

You may find it easier to understand outages next time. A new version of the AWS Service Health Dashboard is slated for early 2022 to provide a clearer view of outages, and a multi-region support system will help Amazon reach customers much sooner. These changes won't bring AWS back any faster during an incident, but they can remove some of the mystery when services go dark. That matters when the services affected range from Disney+ to Roomba vacuums.
