System Design Library

E-commerce (Amazon)

Browse a huge catalog, manage carts, and check out with consistent inventory.

Requirements

Functional

  • Catalog + search
  • Cart
  • Checkout
  • Inventory
  • Orders

Non-functional

  • Fast catalog reads
  • Consistent inventory & payment

Scale

Hundreds of millions of items

The approach

Keep catalog and search read-optimized and cached (eventual consistency is fine); keep the cart in a fast store (Redis or DynamoDB); treat checkout as a transaction across inventory, payment, and order, usually implemented as a Saga across services.

Key components

Catalog + search · cart store · checkout (saga: inventory→payment→order)

Senior deep-dive

Catalog/search and checkout need fundamentally different consistency models: search is eventually consistent and read-optimized (Elasticsearch, heavy caching), while checkout must be transactionally consistent for inventory, payment, and order creation.

The cart is read constantly but rarely needs strong consistency. Storing it in a fast KV store (DynamoDB or Redis) and reconciling it at checkout is the right tradeoff; strong consistency on cart reads adds latency for no user-visible benefit.

Inventory is the hard part — every oversell is a business failure, so the inventory decrement must be atomic and the reservation TTL-bounded so abandoned carts don't starve stock.

Catalog: eventual consistency is intentional

Product pages are served from a read-through cache backed by a denormalized document store (price, title, images, and attributes merged into one document). Updates flow asynchronously from the source-of-truth systems (pricing service, catalog service) via an event stream. A briefly stale price display (showing $19.99 after the price changed to $21.99) is acceptable because the authoritative price check happens at checkout, not on the product page. This is by design: the price the customer confirms at checkout is the one that gets charged, which is also what consumer-protection rules in many jurisdictions expect.
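The read path and the asynchronous update path described above can be sketched as follows. This is a minimal in-memory illustration, not a real cache client: `ReadThroughCatalogCache`, `load_document`, and the event shape (`sku`, `fields`) are hypothetical names chosen for the example.

```python
from typing import Callable, Dict, Optional


class ReadThroughCatalogCache:
    """Read-through cache in front of a denormalized product-document store."""

    def __init__(self, load_document: Callable[[str], Optional[dict]]):
        # load_document stands in for a document-store lookup (assumed interface).
        self._load = load_document
        self._cache: Dict[str, dict] = {}

    def get(self, sku: str) -> Optional[dict]:
        """Serve from cache; on a miss, load from the store and remember it."""
        doc = self._cache.get(sku)
        if doc is None:
            doc = self._load(sku)
            if doc is not None:
                self._cache[sku] = doc
        return doc

    def on_event(self, event: dict) -> None:
        """Apply an asynchronous update event by merging changed fields.

        Until this runs, readers keep seeing the old (stale) document.
        """
        sku = event["sku"]
        if sku in self._cache:
            self._cache[sku] = {**self._cache[sku], **event["fields"]}
```

Between a price change in the source system and the arrival of the matching event, `get` keeps returning the old price. That window is the eventual consistency the product page tolerates.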

Cart: fast and eventually reconciled

Cart data lives in DynamoDB or Redis: O(1) reads, no joins, no transactions. The tradeoff: if the same user opens two browser tabs and modifies the cart in both, the last write wins (LWW). At checkout, the cart is validated against live inventory and pricing; this is the reconciliation point. Items may have gone out of stock or changed price since they were added. Surfacing this at checkout (not at add-to-cart time or on every page view) is a deliberate UX choice: it keeps live inventory and pricing lookups off the browsing path, so product pages stay fast.
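As a rough sketch of both halves of this tradeoff, the snippet below shows a last-write-wins put and a checkout-time reconciliation pass. The names (`CartLine`, `lww_put`, `reconcile_cart`) and the plain dicts standing in for the KV store, pricing service, and inventory service are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class CartLine:
    sku: str
    qty: int
    price_cents_when_added: int


def lww_put(store: Dict[str, Tuple[float, List[CartLine]]],
            user_id: str, cart: List[CartLine], written_at: float) -> None:
    """Last write wins: keep the cart with the newest timestamp, drop older writes."""
    current = store.get(user_id)
    if current is None or written_at >= current[0]:
        store[user_id] = (written_at, cart)


def reconcile_cart(cart: List[CartLine],
                   live_prices: Dict[str, int],
                   live_available: Dict[str, int]) -> Tuple[List[CartLine], List[str]]:
    """Validate a cart against live price and stock at checkout.

    Returns the lines that can still be bought (at the live price) and a list
    of notices to show the user.
    """
    valid: List[CartLine] = []
    notices: List[str] = []
    for line in cart:
        available = live_available.get(line.sku, 0)
        if available <= 0 or line.sku not in live_prices:
            notices.append(f"{line.sku}: out of stock")
            continue
        qty = min(line.qty, available)
        if qty < line.qty:
            notices.append(f"{line.sku}: quantity reduced to {qty}")
        price = live_prices[line.sku]
        if price != line.price_cents_when_added:
            notices.append(f"{line.sku}: price changed to {price}")
        valid.append(CartLine(line.sku, qty, price))
    return valid, notices
```

Note that `reconcile_cart` only reads stock; it does not hold it. Holding stock is the job of the reservation step in the inventory service.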

Inventory: atomic reservation with TTL

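The inventory service maintains an available count separate from the total count. `reserve(sku, qty, ttl)` atomically decrements available stock and creates a time-bounded hold. If the user doesn't complete checkout within the TTL, a sweep job releases the hold and increments available. The atomic decrement is typically a conditional update under a database row lock (decrement only where available >= qty) or, in Redis, a Lua script that checks the floor and decrements in one step. A bare DECRBY is atomic but has no floor, so the caller must inspect the returned value and add the quantity back if it went negative. The important invariant is that available never goes below zero. Distributing this across multiple databases without 2PC requires partitioning by SKU, so that each SKU's count lives on exactly one shard.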
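The reservation contract can be illustrated with a small in-memory model. It is a sketch under stated assumptions: a single process-wide lock stands in for the row lock or Lua script, and `InventoryService`, `confirm`, `release`, and `sweep_expired` are hypothetical names (only `reserve(sku, qty, ttl)` comes from the text above).

```python
import threading
import time
import uuid
from typing import Callable, Dict, Optional, Tuple


class InventoryService:
    """In-memory model of atomic, TTL-bounded stock reservations."""

    def __init__(self, stock: Dict[str, int],
                 clock: Callable[[], float] = time.monotonic):
        self._available = dict(stock)
        # hold_id -> (sku, qty, expires_at)
        self._holds: Dict[str, Tuple[str, int, float]] = {}
        self._lock = threading.Lock()  # stands in for a row lock / Lua script
        self._clock = clock

    def available(self, sku: str) -> int:
        with self._lock:
            return self._available.get(sku, 0)

    def reserve(self, sku: str, qty: int, ttl: float) -> Optional[str]:
        """Check the floor and decrement in one critical section.

        Returns a hold id, or None if the request would push available below zero.
        """
        with self._lock:
            if qty <= 0 or self._available.get(sku, 0) < qty:
                return None
            self._available[sku] -= qty
            hold_id = uuid.uuid4().hex
            self._holds[hold_id] = (sku, qty, self._clock() + ttl)
            return hold_id

    def confirm(self, hold_id: str) -> bool:
        """Checkout succeeded: drop the hold, stock stays decremented."""
        with self._lock:
            return self._holds.pop(hold_id, None) is not None

    def release(self, hold_id: str) -> bool:
        """Compensation: give the held stock back. Safe to call twice."""
        with self._lock:
            hold = self._holds.pop(hold_id, None)
            if hold is None:
                return False
            sku, qty, _ = hold
            self._available[sku] += qty
            return True

    def sweep_expired(self) -> int:
        """Sweep job: release every hold whose TTL has passed."""
        now = self._clock()
        with self._lock:
            expired = [h for h, (_, _, exp) in self._holds.items() if exp <= now]
            for hold_id in expired:
                sku, qty, _ = self._holds.pop(hold_id)
                self._available[sku] += qty
            return len(expired)
```

Because `release` removes the hold before returning stock, a late release after the sweep has already run does not increment available a second time.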

Checkout as a Saga

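Checkout spans inventory, payment, and order services, which makes it too long-running for a single ACID transaction. A Saga sequences the steps: reserve stock → authorize card → write order → capture payment. Each completed step has a compensating transaction: release the stock, void the authorization, cancel the order. The Saga orchestrator persists its state so a crash mid-saga doesn't lose the transaction. The hard case is a failure after money has been held or moved. If the card is authorized but the order write fails, the compensating void must succeed. If the payment was already captured, a void no longer applies and the compensation is a refund. When the compensation itself fails, a human-review queue handles the leaked charge.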
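An orchestrated Saga of this shape can be sketched in a few lines. This is an illustration rather than a production orchestrator: `Step`, `SagaResult`, and `run_saga` are hypothetical names, and the `journal` list stands in for the durable state store that a real orchestrator would write to before and after each step.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    name: str
    action: Callable[[], None]
    compensate: Callable[[], None]


@dataclass
class SagaResult:
    ok: bool
    needs_human_review: List[str] = field(default_factory=list)


def run_saga(steps: List[Step], journal: List[str]) -> SagaResult:
    """Run steps in order; on failure, compensate completed steps in reverse.

    journal is an append-only log (assumed durable in a real system) so that
    a restarted orchestrator can see which steps already completed.
    """
    result = SagaResult(ok=True)
    done: List[Step] = []
    for step in steps:
        try:
            step.action()
            journal.append(f"done:{step.name}")
            done.append(step)
        except Exception:
            journal.append(f"failed:{step.name}")
            result.ok = False
            break
    if not result.ok:
        for step in reversed(done):
            try:
                step.compensate()
                journal.append(f"compensated:{step.name}")
            except Exception:
                # A failed compensation (e.g. a void that did not go through)
                # cannot be ignored; route it to a human-review queue.
                journal.append(f"compensation_failed:{step.name}")
                result.needs_human_review.append(step.name)
    return result
```

Compensations run in reverse order, so a failed order write first voids the authorization and then releases the stock. A real orchestrator would also retry compensations before escalating; that is omitted here to keep the sketch short.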

Search ranking: beyond relevance

Elasticsearch gives you text relevance (BM25), but relevance alone can surface obscure matching items over bestsellers. Large retailers such as Amazon layer a ranking model on top, trained on conversion (clicks that became purchases) and incorporating signals such as sales velocity, reviews, sponsored status, and personal history. The model runs as a re-ranker over the top-K candidates from the retrieval stage, not over the full index, because running an ML model over millions of matches per query does not fit within search latency budgets.
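The two-stage shape (cheap retrieval, then an expensive re-rank over only K candidates) can be sketched as below. The `Candidate` fields and the weights in `example_score` are invented for illustration; a real re-ranker would be a trained model, not a hand-written linear formula.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    sku: str
    bm25: float             # text relevance from the retrieval stage
    conversion_rate: float  # purchases / clicks (hypothetical signal)
    sales_velocity: float   # units sold per day (hypothetical signal)


def rerank(candidates: List[Candidate], k: int,
           score: Callable[[Candidate], float]) -> List[Candidate]:
    """Keep the top-k by text relevance, then reorder only those by score."""
    top_k = sorted(candidates, key=lambda c: c.bm25, reverse=True)[:k]
    return sorted(top_k, key=score, reverse=True)


def example_score(c: Candidate) -> float:
    """Stand-in for a learned ranking model; the weights are arbitrary."""
    return 0.2 * c.bm25 + 5.0 * c.conversion_rate + 0.01 * c.sales_velocity
```

The expensive scoring function is called at most `k` times per query regardless of how many documents matched. An item with weak text relevance never reaches the re-ranker, however well it converts, which is the cost of bounding latency this way.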

What breaks at scale

Flash sales on a single SKU create a write hotspot on one inventory row. Every decrement serializes on that row's lock, so requests queue at the database and checkout latency spikes. The mitigation is a pre-sale reservation queue (virtual waiting room) that serializes and rate-limits access before requests reach the database. Payment provider outages during peak events (Prime Day, Black Friday) require graceful degradation: surface clear error messages and queue retries rather than failing silently. Search index lag during a bulk pricing update (millions of SKUs re-priced simultaneously) can leave stale prices in search results for minutes. The authoritative checkout price reconciles the difference, but customer trust erodes when prices differ between search and checkout.
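One way to picture the virtual waiting room is a FIFO queue in front of checkout that caps how many buyers are active at once. The sketch below is single-process and in-memory; `WaitingRoom`, `max_active`, `join`, `leave`, and `position` are hypothetical names, and a real deployment would keep this state in a shared store.

```python
from collections import deque
from typing import Deque, Optional, Set


class WaitingRoom:
    """FIFO admission control: at most max_active users reach checkout at once."""

    def __init__(self, max_active: int):
        self._max_active = max_active
        self._waiting: Deque[str] = deque()
        self._active: Set[str] = set()

    def join(self, user_id: str) -> bool:
        """Enter the room. Returns True if admitted now, False if queued."""
        if user_id in self._active:
            return True
        if user_id not in self._waiting:
            self._waiting.append(user_id)
        self._admit()
        return user_id in self._active

    def leave(self, user_id: str) -> None:
        """Called when a user finishes or abandons checkout; frees a slot."""
        self._active.discard(user_id)
        self._admit()

    def position(self, user_id: str) -> Optional[int]:
        """0 if admitted, 1-based queue position if waiting, None if unknown."""
        if user_id in self._active:
            return 0
        if user_id in self._waiting:
            return self._waiting.index(user_id) + 1
        return None

    def _admit(self) -> None:
        # Move users from the head of the queue into free active slots.
        while self._waiting and len(self._active) < self._max_active:
            self._active.add(self._waiting.popleft())
```

Capping active buyers bounds the number of concurrent decrements hitting the hot inventory row, trading a visible wait for predictable checkout latency.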

In production

Production systems of this kind, with Amazon as the usual reference point, separate the item page (key-value backed, eventually consistent) from the order pipeline (strongly consistent, Saga-driven). The product detail page is served from a heavily denormalized cache updated by an async pipeline from the catalog DB; inventory availability on the product page is approximate ("In Stock" vs "Low Stock") to avoid hammering the inventory service. The checkout pipeline is a Saga: reserve inventory → authorize payment → create order → confirm inventory → capture payment; each step has a compensating action (release inventory, void authorization). Search combines a text index such as Elasticsearch with a custom ranking model trained on conversion signals, because relevance without conversion optimization produces correct but useless results.
