Major AWS Outage Disrupts Numerous Online Services
Date: October 20, 2023
On the morning of October 20, users around the world experienced disruptions caused by a major outage at Amazon Web Services (AWS). The incident affected a wide range of websites, applications, games, and digital tools that depend on AWS’s cloud services, including widely used platforms such as Venmo, Snapchat, Canva, and Fortnite. Amazon’s own virtual assistant, Alexa, was also affected.
Timeline of the Outage
As of 1:15 PM ET on October 20, the AWS outage had not been fully resolved, and multiple services remained unavailable. Users reported trouble using Alexa for simple tasks, such as checking the weather or controlling smart home devices. Venmo users experienced service interruptions, and the app displayed alerts acknowledging the ongoing problems. The Lyft app was likewise reported to be responding more slowly than usual.
The Root Cause
At 3:11 AM ET, AWS reported "increased error rates and latencies for multiple services" originating in its US-EAST-1 data centers in Northern Virginia. By 5:01 AM, AWS had traced the outage to a DNS resolution problem linked to its DynamoDB API. The data itself remained safely stored, but applications could not look up the address they needed to reach it.
Mike Chapple, a teaching professor at the University of Notre Dame, explained to CNN that it was akin to the internet experiencing a brief period of amnesia, where client applications were unable to locate their necessary data.
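To illustrate what a DNS resolution failure looks like from a client application's side, here is a minimal Python sketch. It is not AWS's code or any affected company's code. The function name `resolve_with_retry`, its parameters, and the retry-with-backoff policy are assumptions chosen for illustration. It tries to look up a hostname and waits progressively longer between failed attempts rather than hammering the resolver.

```python
import socket
import time


def resolve_with_retry(hostname, attempts=4, base_delay=0.5,
                       resolver=socket.getaddrinfo, sleep=time.sleep):
    """Resolve a hostname, backing off exponentially between failures.

    Hypothetical helper for illustration only. `resolver` and `sleep`
    are injectable so the behavior can be exercised without a network.
    Returns a sorted list of IP address strings, or re-raises the last
    socket.gaierror once every attempt has failed.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            results = resolver(hostname, 443)
            # Each result is (family, type, proto, canonname, sockaddr);
            # sockaddr[0] holds the resolved IP address.
            return sorted({info[4][0] for info in results})
        except socket.gaierror as exc:
            # The name could not be resolved: the data may still exist,
            # but the client has no address at which to reach it.
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    raise last_error
```

With the defaults, a client would make four lookups over roughly 3.5 seconds before giving up. During a prolonged outage like this one, every attempt would fail and the error would surface to the application.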
Resolution Efforts
At 6:35 AM, AWS reported that it had fully resolved the DNS issue and that most operations were returning to normal. Nonetheless, the initial problems had cascading effects on other AWS services, particularly EC2, which provides the virtual machines that run many companies’ applications.
By 8:48 AM, AWS said it was working to fix problems with launching new EC2 instances in the same region. It advised customers not to tie new deployments to specific Availability Zones, so that AWS could place them wherever capacity was available. At 9:42 AM, AWS acknowledged that error rates for new EC2 launches remained elevated and said it was rate limiting those launches to aid recovery.
Furthermore, the company reported at 10:14 AM that significant API errors and connectivity problems persisted across multiple services in the US-EAST-1 region, leading to a backlog that would require time to clear.
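AWS has not published how it throttled new EC2 launches during the incident. As a general illustration of rate limiting, the sketch below uses a token bucket, a common technique chosen here as an assumption and not a description of AWS's mechanism. The class name `TokenBucket` and its parameters are hypothetical. Requests are admitted only while tokens remain, and tokens refill at a fixed rate, which caps sustained load while a backlog clears.

```python
import time


class TokenBucket:
    """Admit at most `rate` requests per second, with bursts up to `capacity`.

    Illustrative only; `clock` is injectable so the logic can be checked
    deterministically.
    """

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum tokens the bucket can hold
        self.clock = clock
        self.tokens = capacity      # start with a full bucket
        self.updated = clock()

    def allow(self):
        """Return True and spend a token if the request may proceed."""
        now = self.clock()
        # Refill for the time elapsed since the last call, up to capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False                # caller should retry later
```

A service using such a limiter would reject or queue launch requests once the bucket is empty, trading slower individual requests for a steadier overall recovery.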
Industry-Wide Impact
Because so many companies run their AWS operations in the US-EAST-1 region, the outage had widespread repercussions. Online services including banks, airlines, streaming platforms such as Disney+, and social media sites were reported as sluggish or unresponsive. Downdetector recorded a surge in outage reports for brands such as Reddit, Apple Music, Pinterest, Roblox, and The New York Times, a concern for users of daily games such as Wordle.
Despite AWS’s robust features that allow for dynamic resource scaling and global infrastructure support, this incident underscored the potential vulnerabilities of relying heavily on a limited number of cloud service providers. Market analysts estimate that AWS holds approximately 30% of the global cloud infrastructure market share as of mid-2023.
Conclusion
This incident is a reminder of the critical role AWS plays in the digital landscape and of the importance of resilient infrastructure. As further updates emerge, the tech community continues to monitor the situation and its repercussions for affected services.
Updates
- 10:57 AM ET: Article initially updated to reflect services affected by the outage.
- 11:17 AM ET: Inclusion of Reddit’s status update regarding service disruptions.
- 1:15 PM ET: Current status on specific popular services, including Lyft and Venmo, updated based on ongoing assessments.



