PRP Newswire Digital News & Media Platform

Amazon orders 90-day reset after code mishaps cause millions of lost orders

Mar 11, 2026  Twila Rosenbaum

Amazon is taking significant steps to reinforce its internal protocols following a series of outages that have affected its e-commerce operations, including a disruption linked to its AI coding assistant, Q. The company's Senior Vice President of e-commerce services, Dave Treadwell, communicated to employees that a "trend of incidents" has been observed since the third quarter of 2025, with several major incidents occurring in recent weeks.

According to internal documents, these disruptions included incidents characterized by "high blast radius changes," where software updates spread widely because control planes lacked adequate safeguards. Some problems also caused data corruption that took hours to resolve, and some failures were traced to basic code-change authorization requirements that were either overlooked or bypassed.

In light of these issues, Amazon is instituting tighter controls requiring engineers to thoroughly document code changes and obtain additional approvals. The company is simultaneously developing safeguards aimed at introducing what executives have termed "controlled friction" into the code-change review process.

Treadwell wrote in the internal document, "We are implementing temporary safety practices which will introduce controlled friction to changes in the most important parts of the Retail experience. In parallel, we will invest in more durable solutions including both deterministic and agentic safeguards." The statement reflects an ongoing challenge that generative AI poses for software development: the volume of code produced has surged, but that code still requires rigorous review.

AI's Impact on Software Development

The recent issues at Amazon illustrate the disruption generative AI is causing in software development processes. AI coding tools like Claude Code and Amazon's Q and Kiro assist engineers in producing code at unprecedented rates. However, the influx of new code can overwhelm traditional software review systems, leading to unforeseen problems.

Understanding "Agentic" and "Deterministic" Approaches

Treadwell noted that Amazon's new code guardrails will combine AI-driven, "agentic" tools with more predictable, rules-based "deterministic" systems. This dual approach aims to address a fundamental challenge with AI models: they are non-deterministic. A traditional rules-based system returns the same output for the same input, whereas an AI model may return different results for the same input, which makes it difficult to rely on by itself in corporate environments that require consistent, accurate results.
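The difference between the two kinds of safeguard can be illustrated with a minimal Python sketch. Everything here is hypothetical and not drawn from Amazon's tooling: `deterministic_check` stands in for a rules-based guardrail, `sampled_review` is a random stand-in for an AI reviewer (it does not model a real system), and the 0.8 approval probability is an arbitrary assumption.

```python
import random


def deterministic_check(change: dict) -> bool:
    """Rules-based guardrail: the same input always yields the same verdict."""
    return change["approvals"] >= 2 and change["has_change_record"]


def sampled_review(change: dict, rng: random.Random) -> bool:
    """Stand-in for an AI reviewer: the verdict is sampled, so it can differ
    between calls on the same input. The 0.8 probability is an assumption."""
    return rng.random() < 0.8


def gate(change: dict, rng: random.Random) -> bool:
    """Combine both: the deterministic rule keeps a hard veto."""
    return deterministic_check(change) and sampled_review(change, rng)


# A change with only one approval, evaluated 100 times by each checker.
change = {"approvals": 1, "has_change_record": True}
rng = random.Random(0)

rule_verdicts = {deterministic_check(change) for _ in range(100)}
model_verdicts = {sampled_review(change, rng) for _ in range(100)}

print("rule verdicts: ", rule_verdicts)    # one value only
print("model verdicts:", model_verdicts)   # both True and False appear
```

In this sketch the rule always rejects the under-approved change, while the sampled reviewer sometimes approves it. Letting the deterministic rule keep the veto in `gate` is one possible way to pair the two approaches.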

On March 2, a significant disruption occurred when customers were shown incorrect delivery times while shopping, which resulted in nearly 120,000 lost orders and approximately 1.6 million website errors. An internal review linked this incident primarily to Amazon's AI tool, Q. The document stated, "GenAI's usage in control plane operations will accelerate exposure of sharp edges and places where guardrails do not exist; we need investment in control plane safety." Then, on March 5, another outage caused a 99% drop in orders across Amazon's North American marketplaces, leading to 6.3 million lost orders. This incident was attributed to a production change made without following the formal documentation and approval process known as Modeled Change Management.

New 90-Day Safety Guidelines

As part of its response to these challenges, Amazon is implementing a 90-day temporary safety guideline that acts as an addendum to its existing policies. This new policy specifically targets around 335 "Tier-1 systems," which are services that can directly impact consumers and have encountered multiple order-affecting incidents over the past year.

Under the new rules, Amazon engineers must secure two reviews of their work before making any code changes. They are also required to use an internal documentation and approval tool, along with an automated coding system that complies with Amazon's central reliability engineering guidelines. In addition, Amazon is asking all Tier-1 system owners, as well as Director and VP-level leaders, to audit all production code change activities within their departments.
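As a rough illustration, a deterministic gate enforcing rules of this kind might look like the Python sketch below. It is not Amazon's implementation: `ChangeRequest`, `violations`, the service names, and the `change_record_id` field are all hypothetical, and the choice not to count the author's own review is an added assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

REQUIRED_REVIEWS = 2  # the article describes a two-review requirement


@dataclass
class ChangeRequest:
    service: str
    author: str
    reviewers: List[str] = field(default_factory=list)
    # Hypothetical ID issued by a documentation and approval tool.
    change_record_id: Optional[str] = None


def violations(change: ChangeRequest, tier1_services: Set[str]) -> List[str]:
    """Return the rules a change breaks; an empty list means it may proceed."""
    if change.service not in tier1_services:
        return []  # the temporary policy targets Tier-1 systems only

    problems = []
    # Assumption: a review by the change's own author does not count.
    independent = set(change.reviewers) - {change.author}
    if len(independent) < REQUIRED_REVIEWS:
        problems.append(
            f"needs {REQUIRED_REVIEWS} independent reviews, has {len(independent)}"
        )
    if not change.change_record_id:
        problems.append("no change record in the approval tool")
    return problems


# Usage with made-up service names and record IDs.
tier1 = {"checkout", "delivery-estimates"}
good = ChangeRequest("checkout", "alice", ["bob", "carol"], "CR-1")
bad = ChangeRequest("checkout", "alice", ["alice", "bob"])

print(violations(good, tier1))  # []
print(violations(bad, tier1))   # two problems reported
```

Because the check is a plain function of the change request, it returns the same verdict every time for the same input, which is the property the "deterministic" label refers to.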

An Amazon spokesperson said that only one of the incidents discussed at a Tuesday meeting was AI-related and that none involved AI-generated code. The spokesperson emphasized that the meeting was part of a routine weekly review process focused on continuous improvement.

In summary, the 90-day safety reset is Amazon's attempt to restore operational reliability after recent code-change mishaps disrupted its services.


Source: Business Insider News

