System Design Library

Ticketmaster / Booking

Sell event seats with no double-booking, under flash-sale spikes.


Requirements

Functional

Browse events/seats
Hold a seat (timeout)
Purchase
Release on timeout

Non-functional

No double-booking (strong consistency)
Handle thundering herd
Fair queueing

Scale

Flash sales: huge spike on one event

The approach

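Reserve seats with a short-TTL hold (a conditional status update on the seat row with an expiry timestamp, or a distributed lock such as a Redis key with a TTL); confirm the purchase in a transaction; a virtual waiting room/queue throttles the herd. Inventory is the consistency-critical core.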
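A minimal sketch of the hold/confirm flow, assuming a relational `seats` table. The schema, column names, and the `try_hold`/`confirm` helpers are hypothetical, and SQLite stands in for the production inventory DB. The hold is a single conditional UPDATE that commits immediately, so no row lock outlives the statement; expiry is enforced by comparing `hold_until` with the current time.

```python
import sqlite3
import time

HOLD_TTL_S = 600  # ~10 minutes, as in the text

# Hypothetical schema: a hold is a status plus an expiry timestamp on the seat row.
SCHEMA = """
CREATE TABLE seats (
    event_id   TEXT NOT NULL,
    seat_id    TEXT NOT NULL,
    status     TEXT NOT NULL DEFAULT 'AVAILABLE',  -- AVAILABLE | HELD | SOLD
    hold_owner TEXT,
    hold_until REAL,
    PRIMARY KEY (event_id, seat_id)
)
"""


def try_hold(conn, event_id, seat_id, user, now=None):
    """Claim a seat if it is free or its previous hold has lapsed; True on success."""
    now = time.time() if now is None else now
    with conn:  # single-statement transaction, committed immediately
        cur = conn.execute(
            """UPDATE seats
               SET status = 'HELD', hold_owner = ?, hold_until = ?
               WHERE event_id = ? AND seat_id = ?
                 AND (status = 'AVAILABLE'
                      OR (status = 'HELD' AND hold_until < ?))""",
            (user, now + HOLD_TTL_S, event_id, seat_id, now),
        )
    return cur.rowcount == 1


def confirm(conn, event_id, seat_id, user, now=None):
    """Convert a still-valid hold owned by `user` into a sale; True on success."""
    now = time.time() if now is None else now
    with conn:
        cur = conn.execute(
            """UPDATE seats
               SET status = 'SOLD', hold_until = NULL
               WHERE event_id = ? AND seat_id = ?
                 AND status = 'HELD' AND hold_owner = ? AND hold_until >= ?""",
            (event_id, seat_id, user, now),
        )
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO seats (event_id, seat_id) VALUES (?, ?)",
    [("e1", "A-1"), ("e1", "A-2")],
)
conn.commit()
```

Because a lapsed hold is simply overwritable, a second buyer can claim the seat the moment `hold_until` passes, and the original holder's late `confirm` fails instead of double-selling.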

Key components

Waiting-room queue → app → inventory DB (transactions/locks) · cache for browse

Numbers that matter

Tens of thousands of concurrent users (far more for the biggest on-sales, such as a Taylor Swift tour or the Super Bowl) can hit a single hot event in the first 60 seconds; the waiting room must absorb that before it reaches the DB.
A hold TTL of ~10 minutes is a common choice: long enough to complete payment, short enough that abandoned holds return to inventory before waiting buyers give up.
A typical stadium has ~70,000 seats, and at full concurrency each seat's DB row is a potential lock target; keeping lock duration under ~200ms per transaction stops waits from cascading across contended rows.
The payment confirmation latency budget is <3 seconds end-to-end, including the round-trip to the external payment gateway. Keep the hold in Redis while waiting on the gateway rather than holding a DB transaction (and its row locks) open across the call.
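A back-of-envelope check of what these figures imply. It assumes 50,000 arrivals and a 500 req/s admission rate (the example figures from the waiting-room discussion) plus the ~200ms lock budget above, and treats each admitted user as exactly one inventory request, which is a simplification.

```python
arrivals = 50_000          # users hitting one hot event at sale open
admit_rate_per_s = 500     # assumed waiting-room admission rate into inventory
lock_ms = 200              # max time one transaction holds a seat-row lock

drain_s = arrivals / admit_rate_per_s   # time to admit everyone once
max_tx_per_hot_row = 1000 / lock_ms     # a contended row serializes its writers

print(f"queue drains in {drain_s:.0f} s")                                   # queue drains in 100 s
print(f"one contended row sustains at most {max_tx_per_hot_row:.0f} tx/s")  # ... at most 5 tx/s
```

Five transactions per second on a single hot row is why lock duration matters: every extra 100ms per transaction cuts that ceiling further for the seats everyone wants.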

Senior deep-dive

Inventory consistency is the central hard problem; most of the rest is standard CRUD with a nicer UI.

A short-lived hold (a held status with a TTL of ~10 min, not a database row lock kept open that long) bridges discovery to payment; without it you either oversell or frustrate users who "see" seats that then vanish. Virtual waiting rooms absorb the flash-sale thundering herd before it ever hits your database.

Seat-level locking is brutal at scale — shard by event, use optimistic locking with version columns, and release holds aggressively to avoid phantom inventory.
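Optimistic locking with a version column can be sketched as follows, again with SQLite standing in for the inventory DB. The table layout and the `read_seat`/`hold_if_unchanged` helpers are hypothetical. Nothing is locked between the read and the write; the UPDATE succeeds only if the version is still the one the buyer read, so exactly one of two racing buyers wins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE seats (seat_id TEXT PRIMARY KEY, status TEXT NOT NULL, version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO seats VALUES ('B-12', 'AVAILABLE', 0)")
conn.commit()


def read_seat(seat_id):
    """Plain read: no lock is taken."""
    return conn.execute(
        "SELECT status, version FROM seats WHERE seat_id = ?", (seat_id,)
    ).fetchone()


def hold_if_unchanged(seat_id, expected_version):
    """Compare-and-set: succeeds only if nobody bumped the version since our read."""
    with conn:
        cur = conn.execute(
            """UPDATE seats SET status = 'HELD', version = version + 1
               WHERE seat_id = ? AND status = 'AVAILABLE' AND version = ?""",
            (seat_id, expected_version),
        )
    return cur.rowcount == 1


# Two buyers read the same snapshot, then race to write.
_, v_alice = read_seat("B-12")
_, v_bob = read_seat("B-12")
alice_won = hold_if_unchanged("B-12", v_alice)
bob_won = hold_if_unchanged("B-12", v_bob)
```

The loser gets `False` immediately and can be offered a neighbouring seat, instead of blocking behind a lock.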

The waiting room is a rate limiter, not a UX feature

A waiting room's job is to cap the arrival rate into your inventory service at something it can actually handle, e.g. 500 req/s instead of 50,000. It should issue metered tokens (signed JWTs carrying a queue position and issue time) so the inventory tier never sees raw demand. Without it, cache misses on a popular event send the full herd straight to the DB, however well that DB is sharded.
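A minimal sketch of metered admission tokens. It uses a stdlib HMAC-signed token as a stand-in for a JWT (no JWT library is assumed), and `SECRET`, `issue_token`, `verify_token`, and `Gate` are hypothetical names. It also assumes queue positions are assigned 0, 1, 2, ... and released at a fixed rate from the moment the sale opens.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

SECRET = b"demo-secret"  # hypothetical; a real deployment would use a managed key


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).decode().rstrip("=")


def issue_token(event_id: str, position: int, issued_at: float) -> str:
    """Signed token carrying the queue position and issue time."""
    payload = _b64(json.dumps({"ev": event_id, "pos": position, "iat": issued_at}).encode())
    sig = _b64(hmac.new(SECRET, payload.encode(), hashlib.sha256).digest())
    return f"{payload}.{sig}"


def verify_token(token: str) -> Optional[dict]:
    """Return the claims if the signature checks out, else None."""
    payload, _, sig = token.partition(".")
    expected = _b64(hmac.new(SECRET, payload.encode(), hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        return None
    padded = payload + "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


class Gate:
    """Lets queue positions through at a fixed rate measured from sale open."""

    def __init__(self, opened_at: float, admit_per_s: float):
        self.opened_at = opened_at
        self.admit_per_s = admit_per_s

    def admitted(self, token: str, now: float) -> bool:
        claims = verify_token(token)
        if claims is None:
            return False  # forged or tampered token
        released = (now - self.opened_at) * self.admit_per_s
        return claims["pos"] < released
```

Because the inventory tier only checks a signature and compares two numbers, it never has to look at raw demand; the `iat` claim could additionally be used to reject stale tokens.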

Hold granularity: seat-level vs section-level

Seat-level holds are necessary for reserved seating but create O(seats) hold rows; for a 70k-seat stadium that is manageable in a relational DB with an index on (event_id, seat_id, status). Section-level counters (the count-only model airlines use before seats are assigned) let you use an atomic Redis decrement for speed, provided the availability check and the decrement happen as one atomic step; a bare DECR can go below zero and oversell. The choice is driven by whether the product promises specific seats or just 'a seat in section B'.
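A guarded section counter can be sketched in-process, with a `threading.Lock` playing the role of Redis atomicity. In Redis the same check-and-decrement would typically be a small Lua script run via EVAL (or a DECR followed by a compensating INCR when the result goes negative). `SectionCounter` and its methods are hypothetical names.

```python
import threading


class SectionCounter:
    """Count of unsold seats in one section; never goes below zero."""

    def __init__(self, remaining: int):
        self._remaining = remaining
        self._lock = threading.Lock()

    def try_take(self, n: int = 1) -> bool:
        # Check and decrement under one lock, so two buyers can't both see "1 left".
        with self._lock:
            if self._remaining < n:
                return False
            self._remaining -= n
            return True

    def release(self, n: int = 1) -> None:
        """Return seats when a hold expires or a payment fails."""
        with self._lock:
            self._remaining += n

    @property
    def remaining(self) -> int:
        with self._lock:
            return self._remaining


# 1,000 buyers race for 300 seats in "section B".
section_b = SectionCounter(300)
results = []
results_lock = threading.Lock()


def buyer():
    ok = section_b.try_take()
    with results_lock:
        results.append(ok)


threads = [threading.Thread(target=buyer) for _ in range(1000)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Exactly 300 buyers succeed and the counter stops at zero, which is the property a bare decrement does not guarantee.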

Idempotency through the payment wall

Between hold and confirm sits an external payment gateway that can time out, retry, or double-fire. Every payment initiation must carry an idempotency key (e.g. hold_id + attempt_number, where the attempt number changes only when the user deliberately tries again, never on a network retry) so the gateway de-dupes. Your own confirm endpoint must also be idempotent: an atomic compare-and-set on a version column, rather than a separate check followed by a set, prevents double-confirmation if the client retries on a slow network.
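A minimal sketch of both idempotency layers. `FakeGateway` is a hypothetical stub that de-dupes by key (it does not model any real gateway's API), and the `holds` table and `confirm_hold` helper are illustrative. The confirm is a compare-and-set; if it loses because the hold is already sold with the same charge, the retry is reported as success rather than as an error.

```python
import sqlite3


def idempotency_key(hold_id: str, attempt: int) -> str:
    # Reused verbatim on network retries; bumped only for a deliberate new attempt.
    return f"{hold_id}:{attempt}"


class FakeGateway:
    """Hypothetical gateway stub that de-dupes charges by idempotency key."""

    def __init__(self):
        self.charges = {}

    def charge(self, key: str, amount_cents: int) -> str:
        # A replayed key returns the original charge instead of charging again.
        if key not in self.charges:
            self.charges[key] = f"ch_{len(self.charges) + 1}"
        return self.charges[key]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE holds (hold_id TEXT PRIMARY KEY, status TEXT, version INTEGER, charge_id TEXT)"
)
conn.execute("INSERT INTO holds VALUES ('h42', 'HELD', 0, NULL)")
conn.commit()


def confirm_hold(hold_id: str, expected_version: int, charge_id: str) -> bool:
    """Compare-and-set confirm; a retry carrying the same charge counts as success."""
    with conn:
        cur = conn.execute(
            """UPDATE holds SET status = 'SOLD', version = version + 1, charge_id = ?
               WHERE hold_id = ? AND status = 'HELD' AND version = ?""",
            (charge_id, hold_id, expected_version),
        )
    if cur.rowcount == 1:
        return True
    row = conn.execute(
        "SELECT status, charge_id FROM holds WHERE hold_id = ?", (hold_id,)
    ).fetchone()
    return row == ("SOLD", charge_id)  # already confirmed by an earlier attempt


gw = FakeGateway()
key = idempotency_key("h42", attempt=1)
first = gw.charge(key, 12_000)
retry = gw.charge(key, 12_000)              # client timed out and retried
ok_first = confirm_hold("h42", 0, first)
ok_retry = confirm_hold("h42", 0, retry)    # duplicate confirm on a slow network
```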

Inventory sharding by event, not by seat

Sharding on seat_id spreads load but scatters one hot event's seats across many shards, so multi-seat purchases need distributed transactions. Shard by event_id instead: all seats for an event live on one shard, making multi-seat cart operations a local transaction. A single megaevent can be scaled vertically (moved to a larger shard node) before the sale opens, since on-sale dates are known well in advance.
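Event-based routing fits in a few lines. This sketch assumes a fixed shard count and a hypothetical override table (`PINNED`) that pins known megaevents to a dedicated larger node ahead of the on-sale; the event IDs and shard names are illustrative.

```python
import hashlib

NUM_SHARDS = 16  # hypothetical cluster size

# Hypothetical overrides: events known to be huge are pinned before the sale opens.
PINNED = {"stadium-show-001": "shard-big-01"}


def shard_for_event(event_id: str) -> str:
    """Route every seat of an event to the same shard."""
    if event_id in PINNED:
        return PINNED[event_id]
    # Stable hash: Python's built-in hash() is randomized per process.
    digest = hashlib.sha256(event_id.encode()).digest()
    return f"shard-{int.from_bytes(digest[:8], 'big') % NUM_SHARDS:02d}"


def shard_for_seat(event_id: str, seat_id: str) -> str:
    # seat_id deliberately plays no part in routing, so a multi-seat cart stays local.
    return shard_for_event(event_id)
```

Plain modulo routing moves events around when `NUM_SHARDS` changes; a lookup directory or consistent hashing avoids that, and the override table is a small directory in its own right.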

Expiry reclaim: the silent second stampede

When 10,000 holds expire simultaneously at T+10min, those seats all flip back to 'available' at once. If 8,000 users are still in the waiting room, you've just re-triggered a mini-stampede. Stagger hold TTLs by ±60s by adding jitter at issue time, and batch-release reclaimed seats into a drip queue rather than flipping them all available at once.
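Both mitigations can be sketched briefly. `jittered_expiry` and `drip_batches` are hypothetical helpers; the batch size and the idea of publishing one batch per scheduler tick are assumptions, while the 10-minute TTL and ±60s jitter come from the text.

```python
import random

HOLD_TTL_S = 600   # ~10 minutes
JITTER_S = 60      # ±60 s


def jittered_expiry(issued_at: float, rng: random.Random) -> float:
    """Spread expiries so holds issued in the same second don't all lapse together."""
    return issued_at + HOLD_TTL_S + rng.uniform(-JITTER_S, JITTER_S)


def drip_batches(reclaimed_seats, batch_size):
    """Yield reclaimed seats in small batches; the caller publishes one batch per tick."""
    for i in range(0, len(reclaimed_seats), batch_size):
        yield reclaimed_seats[i:i + batch_size]


rng = random.Random(7)
expiries = [jittered_expiry(0.0, rng) for _ in range(10_000)]
batches = list(drip_batches([f"seat-{n}" for n in range(1_050)], batch_size=200))
```

Holds issued at the same instant now lapse across a 2-minute window, and reclaimed seats reach waiting users 200 at a time instead of all at once.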

What breaks at scale

Redis as the hold store can become your single point of failure: a Redis failover during a peak sale is catastrophic. Use Redis Sentinel or Cluster with AOF persistence to limit hold loss when a primary fails; replication is asynchronous, so the last few writes before a failover can still be lost, which is another reason the relational DB stays authoritative for confirmed sales. A second failure mode: the payment gateway rate-limits you during a spike (Stripe/Braintree have per-account QPS caps); implement a payment-request queue with backpressure rather than letting upstream gateway errors cascade into double-charge retries.
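A minimal sketch of a payment-request queue with backpressure. The queue size (100), the token-bucket rate (2000/s, burst 50), and the `send` callback standing in for the gateway call are arbitrary assumptions, not any gateway's real limits. A full queue refuses new requests immediately, so the client keeps its hold and retries later with the same idempotency key instead of hammering a throttled gateway.

```python
import queue
import threading
import time


class TokenBucket:
    """Limits outbound gateway calls to rate_per_s, allowing short bursts."""

    def __init__(self, rate_per_s: float, burst: int):
        self.rate = rate_per_s
        self.capacity = burst
        self.tokens = float(burst)
        self.last = time.monotonic()
        self.lock = threading.Lock()

    def acquire(self) -> None:
        """Block until one token is available, then consume it."""
        while True:
            with self.lock:
                now = time.monotonic()
                self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
                self.last = now
                if self.tokens >= 1:
                    self.tokens -= 1
                    return
                wait = (1 - self.tokens) / self.rate
            time.sleep(wait)


# Bounded buffer: when full, new payment requests are refused up front.
pending = queue.Queue(maxsize=100)


def submit_payment(request) -> bool:
    try:
        pending.put_nowait(request)
        return True
    except queue.Full:
        return False  # backpressure: tell the client to retry later (e.g. HTTP 503)


def worker(bucket, send, stop):
    """Drain the queue no faster than the bucket allows."""
    while not stop.is_set():
        try:
            req = pending.get(timeout=0.1)
        except queue.Empty:
            continue
        try:
            bucket.acquire()
            send(req)
        finally:
            pending.task_done()


sent = []
accepted = [submit_payment(f"pay-{i}") for i in range(150)]
stop = threading.Event()
t = threading.Thread(target=worker, args=(TokenBucket(2000, 50), sent.append, stop))
t.start()
pending.join()
stop.set()
t.join()
```

The first 100 requests are queued and sent in order at the capped rate; the other 50 are rejected up front rather than turning into gateway errors and retries.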

In production

Ticketmaster uses a virtual waiting room (implemented via a separate queue service that issues metered tokens) to prevent the DB from seeing the raw thundering herd. Seat holds are implemented as time-bounded reservations backed by Redis with TTL for the hold state and a relational DB for the authoritative confirmation — a pattern also used by Airbnb for listing holds. The real engineering challenge is not the lock itself but the coordinated release: when a hold expires mid-sale, reclaiming and re-exposing that inventory to still-waiting users without a secondary stampede requires careful queueing discipline.

Common mistakes

Eventual consistency on inventory (double-sells)
No hold timeout (seats stuck)
Letting the full herd hit the DB at once

Part of System Design Library on SystemLore: system design interview prep with 148 deep topics, interactive diagrams, and a practice game.