System Design Library

Flash Sale / Limited Inventory

Sell N limited items to a massive simultaneous crowd without overselling.

Requirements

Functional

  • Decrement stock atomically
  • Queue/waiting room
  • Fairness
  • Checkout

Non-functional

  • No oversell
  • Survive a 100×+ traffic spike

Scale

Millions hitting one SKU at once

The approach

Pre-load stock into an atomic counter (Redis DECR) as the source of truth for "got one"; a waiting room/queue throttles entry; confirmed reservations flow to the DB for checkout; reject fast when sold out.

Key components

Waiting room → atomic stock counter (Redis) → order queue → DB

Senior deep-dive

The atomic counter in Redis is the inventory source of truth — the DB is for confirmed orders, not for answering 'is stock available'; mixing the two is how you oversell.

A virtual waiting room (queue/token bucket) is the architectural keystone: without it, a 100k-user spike hits your checkout stack directly and nothing survives — the queue is not a UX nicety, it's load shedding.

Reject fast at the edge: a sold-out flag in a CDN edge cache or API gateway turns a database-hammering avalanche into a cheap, cached 'sold out' response for the long tail of too-late requests, served without a round trip to origin.

Atomic counter: the single right answer for inventory

SELECT + UPDATE under a lock is how textbooks describe it; at 10k RPS on one hot row it becomes a lock-contention queue, and a source of deadlocks once transactions lock several rows in different orders. Redis DECR (or DECRBY) is atomic by design: Redis executes commands one at a time on a single thread, so operations on the key never interleave. The threshold check is a separate step, though, so wrap the DECR and the check in a Lua script to make them a single atomic operation: decrement, check whether the result is ≥ 0, then return success or roll back the decrement. This guarantees no oversell without any locking on the application side.
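
A minimal sketch of the decrement-check-rollback gate, assuming Python for illustration. The Lua text is what you might pass to Redis EVAL with the stock key as KEYS[1] and the quantity as ARGV[1]; it is shown for reference and is not executed here. `StockGate` is a hypothetical in-memory stand-in whose lock models Redis running one script at a time, so the logic can be exercised without a Redis server.

```python
import threading

# Reference Lua for Redis EVAL (not executed in this sketch).
# KEYS[1] = stock key, ARGV[1] = quantity. Redis runs the whole script without interleaving.
RESERVE_LUA = """
local left = redis.call('DECRBY', KEYS[1], ARGV[1])
if left < 0 then
  redis.call('INCRBY', KEYS[1], ARGV[1])
  return -1
end
return left
"""


class StockGate:
    """Hypothetical in-memory stand-in for the Redis stock counter.

    The lock models Redis executing one script at a time, so the decrement,
    the >= 0 check, and the rollback behave as one atomic step.
    """

    def __init__(self, stock: int) -> None:
        self._stock = stock
        self._lock = threading.Lock()

    def reserve(self, qty: int = 1) -> bool:
        with self._lock:
            self._stock -= qty          # DECRBY
            if self._stock < 0:         # not enough stock left
                self._stock += qty      # INCRBY rolls the decrement back
                return False
            return True

    @property
    def remaining(self) -> int:
        with self._lock:
            return self._stock


if __name__ == "__main__":
    gate = StockGate(10)
    results: list[bool] = []

    def buyer() -> None:
        results.append(gate.reserve())

    threads = [threading.Thread(target=buyer) for _ in range(50)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(sum(results), gate.remaining)  # prints: 10 0
```

Fifty concurrent buyers against ten units yield exactly ten successes. The counter dips below zero only inside the atomic step and is rolled back before any other request can observe it.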

Virtual waiting room: the real load-shedding mechanism

The waiting room is not primarily about fairness; it's about protecting backend capacity. Issue a signed, time-bounded waiting-room token to every arriving user. A rate limiter admits N token holders per second to the checkout flow, where N is chosen to match the RPS your checkout stack can handle at target latency. The waiting room itself must be lightweight (static HTML plus a polling endpoint) so that it stays up even when everything behind it is at capacity. A token bucket or leaky bucket at the admission point is the correct primitive.
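
A sketch of the two primitives this paragraph names, under stated assumptions: `TokenBucket` admits users at `rate` per second with bursts up to `capacity`, and `issue_token` / `verify_token` produce an HMAC-SHA256 signed token of the hypothetical form `<user_id>:<expires_at>:<signature>`. Time is passed in explicitly so behavior is deterministic; the token format, names, and secret handling are illustrative, not a standard.

```python
import hashlib
import hmac


class TokenBucket:
    """Admission limiter: `rate` users/second on average, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = now

    def try_admit(self, now: float) -> bool:
        # Refill in proportion to elapsed time, never beyond capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def issue_token(secret: bytes, user_id: str, now: float, ttl_s: int) -> str:
    """Hypothetical signed, time-bounded token: '<user_id>:<expires_at>:<hmac>'."""
    expires_at = int(now) + ttl_s
    payload = f"{user_id}:{expires_at}".encode()
    sig = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return f"{user_id}:{expires_at}:{sig}"


def verify_token(secret: bytes, token: str, now: float) -> bool:
    try:
        # Split from the right so user IDs containing ':' still parse.
        user_id, expires_s, sig = token.rsplit(":", 2)
    except ValueError:
        return False
    payload = f"{user_id}:{expires_s}".encode()
    expected = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    # Constant-time comparison; reject forged or altered tokens before trusting the expiry.
    if not hmac.compare_digest(sig, expected):
        return False
    return now < int(expires_s)
```

In this sketch the waiting-room endpoint would call `verify_token` first and `try_admit` second, so forged or expired tokens never consume admission capacity.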

Sold-out propagation: kill the tail load immediately

When the counter hits zero, publish a sold-out event to a fast propagation channel (Redis pub/sub or an edge cache purge). API gateway and CDN edge rules cache the sold-out state with a short TTL (5–30 seconds) and return a pre-baked 'sold out' response without hitting origin; the short TTL lets the flag clear if expired reservations later return units to stock. With this single optimization, the residual load from millions of hopeful retries stops being a backend problem and becomes a CDN-layer concern. Without it, sold-out traffic can be worse than in-stock traffic, because disappointed users keep refreshing.
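
A gateway-side sketch of the sold-out short circuit, assuming a hypothetical `origin` callable that returns the string `"SOLD_OUT"` when the counter is exhausted. `on_sold_out_event` is what a pub/sub subscriber would call; the cache also learns the state from origin replies, so either path arms the TTL. Class and method names are illustrative.

```python
from typing import Callable


class SoldOutEdgeCache:
    """Hypothetical gateway cache: remember 'sold out' for ttl_s seconds per SKU."""

    def __init__(self, origin: Callable[[str], str], ttl_s: float) -> None:
        self.origin = origin            # hypothetical backend handler: sku -> response
        self.ttl_s = ttl_s
        self._sold_out_until: dict[str, float] = {}

    def on_sold_out_event(self, sku: str, now: float) -> None:
        # Invoked by the propagation channel (for example a pub/sub subscriber).
        self._sold_out_until[sku] = now + self.ttl_s

    def handle(self, sku: str, now: float) -> str:
        if now < self._sold_out_until.get(sku, 0.0):
            return "SOLD_OUT"           # pre-baked response; origin is not called
        response = self.origin(sku)
        if response == "SOLD_OUT":      # learn the state from origin replies too
            self._sold_out_until[sku] = now + self.ttl_s
        return response
```

After the first sold-out reply, retries inside the TTL window are answered locally, so the origin sees roughly one call per SKU per window from each gateway instance.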

Reservation expiry: TTL is the correctness mechanism

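A user who abandons checkout must release their reservation. A short TTL on the reservation record (stored in Redis or a fast DB) plus a background sweeper that increments the counter on expiry is cleaner than relying on explicit cancellation; the sweeper is needed because a plain key expiry does not touch the counter. The failure mode: the sweeper falls behind under load, leaving reserved-but-expired inventory unavailable. Solution: run multiple sweeper workers and keep reservations in a sorted set (ZSET scored by expiry timestamp), so finding expired entries is an O(log n + m) range query for m results. With several workers, each entry must be claimed atomically (for example, only the worker whose ZREM returns 1 increments the counter); otherwise two sweepers can release the same unit twice and create phantom stock.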
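
A minimal in-memory sketch of expiry-driven release, assuming Python's `heapq` as a stand-in for the ZSET (ordered by expiry time) and a lock in place of Redis's atomic commands. `ReservationLedger`, `confirm`, and `sweep` are hypothetical names. A unit returns to stock only if the reservation is still active when the sweeper claims it, so a confirmed order is never released and a reservation is never released twice.

```python
import heapq
import threading


class ReservationLedger:
    """Hypothetical sketch: heapq stands in for a Redis ZSET scored by expiry time."""

    def __init__(self, stock: int) -> None:
        self.stock = stock
        self._by_expiry: list[tuple[float, str]] = []   # (expires_at, reservation_id)
        self._active: set[str] = set()                   # reservations still holding a unit
        self._lock = threading.Lock()

    def reserve(self, reservation_id: str, now: float, ttl_s: float) -> bool:
        with self._lock:
            if self.stock <= 0:
                return False
            self.stock -= 1
            self._active.add(reservation_id)
            heapq.heappush(self._by_expiry, (now + ttl_s, reservation_id))
            return True

    def confirm(self, reservation_id: str) -> bool:
        # Checkout completed: the unit is sold, so the sweeper must not release it.
        with self._lock:
            if reservation_id in self._active:
                self._active.discard(reservation_id)
                return True
            return False  # hold already expired and was released

    def sweep(self, now: float) -> int:
        """Release every expired, unconfirmed reservation; return units put back."""
        released = 0
        with self._lock:
            while self._by_expiry and self._by_expiry[0][0] <= now:
                _, rid = heapq.heappop(self._by_expiry)
                # Claim-then-release, mirroring "only the worker whose ZREM returned 1".
                if rid in self._active:
                    self._active.discard(rid)
                    self.stock += 1
                    released += 1
        return released
```

A `confirm` that returns False tells checkout the hold has already lapsed and the unit may have been resold, so payment should not proceed.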

Consistency between Redis counter and DB orders

Redis is the fast gate; the DB is the authoritative record of confirmed orders. After a successful DECR (reservation granted), the user proceeds to checkout; on payment confirmation, write the order to the DB. If payment fails, INCR the counter to release the reservation; this is the compensating action. The danger: a crash between payment success and the order write creates a ghost reservation, a paid user with no order. The fix: write the order first in a PENDING state, then process payment and mark it paid, and pass an idempotency key (such as the order ID) to the payment provider so a retry after a crash detects the existing charge and recovers the order instead of charging twice.
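
A sketch of the order-first flow with an idempotency key. `FakePaymentProvider` is hypothetical and models only the property that matters here: repeating a charge with the same key returns the original charge instead of creating a new one. The dict stands in for the orders table, and `crash=True` simulates the process dying between payment success and the final status update.

```python
class FakePaymentProvider:
    """Hypothetical provider that deduplicates charges by idempotency key."""

    def __init__(self) -> None:
        self.charges: dict[str, int] = {}   # idempotency_key -> amount_cents

    def charge(self, idempotency_key: str, amount_cents: int) -> str:
        # A retried call with the same key reuses the original charge.
        self.charges.setdefault(idempotency_key, amount_cents)
        return f"ch_{idempotency_key}"


class CrashAfterCharge(Exception):
    """Simulates the process dying after payment succeeded, before the DB update."""


def checkout(orders: dict[str, dict], psp: FakePaymentProvider,
             order_id: str, amount_cents: int, crash: bool = False) -> dict:
    # 1. Write the order first (PENDING) so a paid user always has a record.
    order = orders.setdefault(order_id, {"status": "PENDING", "charge_id": None})
    if order["status"] == "PAID":
        return order                      # retry after success: nothing left to do
    # 2. Charge with the order ID as the idempotency key.
    charge_id = psp.charge(order_id, amount_cents)
    if crash:
        raise CrashAfterCharge(order_id)
    # 3. Mark the order paid.
    order.update(status="PAID", charge_id=charge_id)
    return order
```

Re-running `checkout` for the same order after the simulated crash finds the PENDING row, reuses the existing charge, and marks the order PAID with a single charge on record.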

What breaks at scale

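Counter sharding becomes necessary when a single Redis key for a hot SKU saturates the one core that executes commands for it (the ~500k ops/sec figure is a rough ceiling that depends heavily on hardware, pipelining, and script cost; a Lua DECR-and-check is slower than a bare DECR). Shard the counter into N keys (counter:item:0 through counter:item:N-1), pre-split the stock so the shards sum to the total, use consistent hashing to route requests, and sum shards only for display purposes. The edge case near zero is stranded stock, not oversell: as long as each shard is decremented with its own atomic check, total sales can never exceed the initial allocation. If shard 0 is empty and shard 1 still has 1 unit, a request routed to shard 0 gets a false 'sold out' while stock remains. Near zero, fall back to probing the other shards before declaring sold out, or drain shards sequentially. Oversell only creeps in if you check the summed total and then decrement non-atomically, in which case you must accept a small oversell margin and reconcile.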
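
A sketch of per-shard atomic counters with fallback probing, assuming a lock per shard in place of each shard's Lua DECR-and-check, and a simple `routing_key % n` in place of the real routing function (such as consistent hashing). `ShardedStock` is a hypothetical name. Each shard decrements only while positive and the shards start out summing to the total, so sales cannot exceed stock; probing the remaining shards prevents a false 'sold out' while units remain elsewhere.

```python
import threading


class ShardedStock:
    """Hypothetical: N independent atomic counters whose initial values sum to the total."""

    def __init__(self, total: int, n_shards: int) -> None:
        base, extra = divmod(total, n_shards)
        self._shards = [base + (1 if i < extra else 0) for i in range(n_shards)]
        self._locks = [threading.Lock() for _ in range(n_shards)]

    def _try_shard(self, i: int) -> bool:
        with self._locks[i]:              # stands in for one shard's atomic DECR-and-check
            if self._shards[i] > 0:
                self._shards[i] -= 1
                return True
            return False

    def reserve(self, routing_key: int) -> bool:
        n = len(self._shards)
        home = routing_key % n            # placeholder for consistent hashing
        # Home shard first, then probe the others before declaring sold out.
        for step in range(n):
            if self._try_shard((home + step) % n):
                return True
        return False

    def remaining(self) -> int:
        # Display only: may be momentarily stale under concurrency.
        return sum(self._shards)
```

With two units split one per shard, two requests that both route to shard 0 still sell both units, and a third request is correctly rejected.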

In production

Amazon Lightning Deals and Nike SNKRS use a pre-loaded Redis counter with Lua script DECR-and-check as the reservation gate, backed by a queue that smooths the thundering herd into a metered checkout flow. Ticketmaster's virtual waiting room issues waiting-room tokens via a separate service and admits users in batches timed to checkout capacity. The real engineering challenge is the transition from 'in stock' to 'sold out': that boundary is a high-contention moment where thousands of requests are simultaneously decrementing the counter toward zero. Lua scripts ensure atomicity, but in Redis Cluster every key a single script touches must hash to the same slot, so a script that reserves several SKUs at once needs those keys co-located; that makes key and shard selection critical for multi-SKU sales.
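
To see why key naming matters for multi-SKU scripts, here is a sketch of how Redis Cluster maps a key to one of 16384 slots: CRC16 (XMODEM variant) of the key, modulo 16384, where a non-empty `{...}` hash tag means only the tagged substring is hashed. The `stock_key` naming scheme with a `{sale_id}` tag is a hypothetical convention; keys that share the tag land in the same slot, which is what lets one Lua script touch all of them without a CROSSSLOT error.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_hash_slot(key: str) -> int:
    """Slot = CRC16(key) mod 16384, hashing only a non-empty {hash tag} if present."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:               # tag must contain at least one character
            key = key[start + 1:end]
    return crc16_xmodem(key.encode()) % 16384


def stock_key(sale_id: str, sku: str) -> str:
    # Hypothetical naming: the {sale_id} tag pins every SKU of one sale to one slot.
    return f"stock:{{{sale_id}}}:{sku}"
```

The trade-off is that co-locating every SKU of a sale in one slot also concentrates that sale's load on one node, which is the same pressure that motivates counter sharding.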

Part of System Design Library on SystemLore: system design interview prep with 148 deep topics, interactive diagrams, and a practice game.