AWS Outage 2025: What Took Down Half the Internet

It’s back to business as usual at Amazon Web Services (AWS) after one of its biggest global meltdowns in years. On October 20, AWS experienced a major outage that knocked out a huge chunk of the internet for several hours. The problems started late Sunday night, Pacific time, in AWS’s US-EAST-1 region and rippled worldwide.

What went wrong

At around 7:11 a.m. GMT, AWS suffered a major outage, disrupting apps and websites across banking, gaming, and entertainment.

It began in AWS’s Northern Virginia region (US-EAST-1), its largest and oldest, after a technical update to DynamoDB, a key cloud database that stores user and platform data. The update caused a Domain Name System (DNS) error. DNS works like the internet’s address book: it translates a service’s hostname into the IP addresses of the servers behind it, so when resolution fails, apps cannot find those servers. With DNS failing, apps could not connect to DynamoDB’s API.
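Before an app can call any API, it has to resolve the service’s hostname to an IP address. The Python sketch below illustrates that first step and what happens when it fails; it is not AWS code. The `resolve` helper and its injectable `lookup` parameter are hypothetical, added so a DNS failure can be simulated without a live outage. If the lookup raises `socket.gaierror`, the client never gets as far as opening a connection, which is roughly the position apps calling DynamoDB were in.

```python
import socket
from typing import Callable, List, Optional


def resolve(host: str, port: int = 443,
            lookup: Callable = socket.getaddrinfo) -> Optional[List[str]]:
    """Return the IP addresses for `host`, or None if DNS resolution fails.

    Hypothetical helper for illustration. `lookup` defaults to the standard
    library resolver but can be replaced with a stub to simulate an outage.
    """
    try:
        infos = lookup(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # Name resolution failed: there is no address to connect to,
        # so the request never reaches the API.
        return None
    # Each entry is (family, type, proto, canonname, sockaddr), and
    # sockaddr[0] is the IP address. Deduplicate while keeping order.
    return list(dict.fromkeys(info[4][0] for info in infos))


def simulated_dns_outage(host, port, proto=0):
    """Hypothetical stub resolver that fails the way a missing DNS record would."""
    raise socket.gaierror("Name or service not known")
```

With the stub, `resolve("dynamodb.us-east-1.amazonaws.com", lookup=simulated_dns_outage)` returns `None` instead of an address list, so the caller can decide whether to retry, fail over, or report an error. The hostname is DynamoDB’s public regional endpoint; only the failure is simulated here.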

As the disruption cascaded, it affected 113 AWS services. By 10:11 a.m. GMT, Amazon said operations had returned to normal, though some backlogs would take hours to clear.

Downdetector, which tracks outages via user reports, still showed issues with services including OpenAI, ESPN, and Apple Music at the time of publication.

Who felt it

Pretty much everyone. The outage hit dozens of websites and apps, including Snapchat, Pinterest, Apple TV, WhatsApp, Signal, Zoom, and Slack. Gaming services such as Roblox, Fortnite, and Xbox were also affected, along with Starbucks and Etsy.

In the U.S., users reported problems with financial apps, including Venmo. Some said their Ring doorbells and Alexa devices stopped working, while others could not access Amazon’s website or download Kindle books.

Language app Duolingo, design tool Canva, and media outlets including The Associated Press, The New York Times, and The Wall Street Journal also experienced issues. Banks, cryptocurrency exchange Coinbase, AI firm Perplexity, and U.S. airlines Delta and United reported disruptions as well.

Why it matters

This outage reignited a familiar debate: Are we too dependent on a few cloud giants? When a single AWS region sneezes, half the internet catches a cold. Analysts and regulators say it’s time for stricter oversight and stronger multi-cloud strategies, especially for financial and public services.

For developers and companies, the lesson is clear:

  • Don’t rely on a single region or provider.
  • Design for failure.
  • Test regularly against realistic outage scenarios, in practice rather than only on paper.
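One way to act on the first two points is to wrap calls to a regional endpoint so that a failure falls through to another region instead of taking the app down. The Python sketch below assumes you already have one client call per region; the `call_with_failover` helper, the region names, and the callables are hypothetical placeholders, not an AWS SDK API. It retries each region briefly with exponential backoff before moving on to the next.

```python
import time
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


class AllRegionsFailed(Exception):
    """Raised when every configured region has been tried and failed."""


def call_with_failover(
    regions: Sequence[Tuple[str, Callable[[], T]]],
    attempts_per_region: int = 2,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[str, T]:
    """Try each (region_name, call) pair in order and return the first success.

    Hypothetical helper: `call` stands in for whatever request you make
    against that region. Any exception counts as a regional failure.
    """
    errors = []
    for name, call in regions:
        for attempt in range(attempts_per_region):
            try:
                return name, call()
            except Exception as exc:  # sketch only: narrow this in real code
                errors.append((name, exc))
                if attempt + 1 < attempts_per_region:
                    # Exponential backoff between retries within one region.
                    sleep(base_delay * (2 ** attempt))
    raise AllRegionsFailed(errors)
```

Injecting `sleep` keeps the sketch testable. Failover like this only helps if the data the app needs is also available in the secondary region (for example through replication), which is a design decision in its own right.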

Go a step further:

  1. Test your disaster recovery to ensure that when cloud services come back, your systems recover gracefully. The goal is to avoid being stuck with a broken platform while competitors resume work and win over your customers.
  2. Ensure rigorous monitoring of your own systems, so that internal alerts surface issues before customers do — and so your teams know exactly what’s going wrong.
  3. Consider chaos testing as part of your resilience strategy. Deliberately injecting failures is a well-established way to validate how your systems behave when the services they depend on break.
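Chaos testing can start small: deliberately make a dependency fail and check that the system still responds. The Python sketch below is a toy illustration of that idea, not a chaos-engineering tool; the `FlakyDependency` wrapper, its `failure_rate` parameter, and `fetch_with_fallback` are hypothetical names. It fails a configurable fraction of calls using a seeded random generator, so a test run is reproducible, and records how many failures it injected.

```python
import random
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class FlakyDependency(Generic[T]):
    """Hypothetical fault-injection wrapper: make a fraction of calls fail on purpose.

    A seeded RNG keeps the sequence of injected failures reproducible.
    """

    def __init__(self, call: Callable[[], T], failure_rate: float, seed: int = 0):
        if not 0.0 <= failure_rate <= 1.0:
            raise ValueError("failure_rate must be between 0 and 1")
        self._call = call
        self._failure_rate = failure_rate
        self._rng = random.Random(seed)
        self.injected_failures = 0

    def __call__(self) -> T:
        if self._rng.random() < self._failure_rate:
            self.injected_failures += 1
            raise ConnectionError("injected failure (chaos test)")
        return self._call()


def fetch_with_fallback(primary: Callable[[], T], fallback: Callable[[], T]) -> T:
    # The behavior under test: degrade to a fallback (e.g. cached data)
    # instead of failing outright when the primary dependency is down.
    try:
        return primary()
    except ConnectionError:
        return fallback()
```

A fixture like this can run in CI against the fallback path. Dedicated chaos tooling applies the same principle at larger scale, for example by adding latency or terminating instances in a staging environment.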

The bottom line

AWS’s October outage wasn’t the biggest ever, but it was one of the most disruptive in recent memory. It showed how fragile “the cloud” can be when critical systems go sideways — and why resilience engineering is no longer optional for anyone building on it.
