Ticketmaster / Booking
Sell event seats with no double-booking, under flash-sale spikes.
Requirements
Functional
- Browse events/seats
- Hold a seat (timeout)
- Purchase
- Release on timeout
Non-functional
- No double-booking (strong consistency)
- Handle thundering herd
- Fair queueing
Scale
Flash sales: huge spike on one event
The approach
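Reserve seats with a short-TTL hold, claimed under a brief row lock or distributed lock and stored as state with an expiry rather than as a lock held open for the whole TTL; confirm the purchase in a transaction; put a virtual waiting room/queue in front to throttle the herd. Inventory is the consistency-critical core.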
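A minimal sketch of the hold-then-confirm flow, assuming a single relational table and using SQLite purely for illustration. The seats schema, the status values, and the hold_seat/confirm_seat helpers are hypothetical names, not taken from any real system. The point it tries to show: both steps are short conditional updates, so nothing stays locked while the user pays.

```python
import sqlite3

HOLD_TTL_SECONDS = 600  # ~10 minutes, matching the hold TTL used in this guide


def make_db() -> sqlite3.Connection:
    """Create an in-memory inventory table (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """CREATE TABLE seats (
               event_id TEXT,
               seat_id TEXT,
               status TEXT NOT NULL DEFAULT 'available',  -- available | held | sold
               held_by TEXT,
               hold_expires_at REAL,
               PRIMARY KEY (event_id, seat_id))"""
    )
    return db


def hold_seat(db, event_id, seat_id, user_id, now) -> bool:
    """Claim a seat if it is free or its previous hold has expired."""
    with db:  # one short transaction, committed on exit
        cur = db.execute(
            """UPDATE seats
                  SET status = 'held', held_by = ?, hold_expires_at = ?
                WHERE event_id = ? AND seat_id = ?
                  AND (status = 'available'
                       OR (status = 'held' AND hold_expires_at <= ?))""",
            (user_id, now + HOLD_TTL_SECONDS, event_id, seat_id, now),
        )
    return cur.rowcount == 1  # exactly one caller can win the row


def confirm_seat(db, event_id, seat_id, user_id, now) -> bool:
    """Turn an unexpired hold owned by user_id into a sale."""
    with db:
        cur = db.execute(
            """UPDATE seats
                  SET status = 'sold', hold_expires_at = NULL
                WHERE event_id = ? AND seat_id = ?
                  AND status = 'held' AND held_by = ? AND hold_expires_at > ?""",
            (event_id, seat_id, user_id, now),
        )
    return cur.rowcount == 1
```

Because the WHERE clause carries the whole precondition, two users racing for the same seat cannot both see rowcount 1, and an expired hold is reclaimed lazily by the next hold attempt.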
Key components
Waiting-room queue → app → inventory DB (transactions/locks) · cache for browse
Numbers that matter
- ~50,000 concurrent users can hit a single hot event (Taylor Swift, Super Bowl) in the first 60 seconds — the waiting room must absorb that before it reaches the DB.
- A hold TTL of ~10 minutes is a common choice: long enough to complete payment, short enough to recycle inventory without users rage-quitting.
- A typical stadium has ~70,000 seats, and each seat row can be locked independently; keeping each lock under ~200ms per transaction prevents cascading waits.
- The payment confirmation latency budget is <3 seconds end-to-end, including the external payment gateway round-trip. While waiting on the gateway, keep the hold as a TTL entry in Redis rather than as an open DB transaction or row lock.
Senior deep-dive
Inventory consistency is the only hard problem — the rest is standard CRUD with a nicer UI.
A short-lived hold (a reservation with a ~10 min TTL, claimed under a brief row lock) bridges discovery to payment; without it you either oversell or frustrate users who "see" seats that vanish. Virtual waiting rooms absorb the flash-sale thundering herd before it ever hits your database.
Seat-level locking is brutal at scale — shard by event, use optimistic locking with version columns, and release holds aggressively to avoid phantom inventory.
The waiting room is a rate limiter, not a UX feature
A waiting room's job is to cap the arrival rate into your inventory service to something it can actually handle — e.g. 500 req/s instead of 50,000. It should issue metered tokens (signed JWTs with a queue position and issue time) so the inventory tier never sees raw demand. Without it, even a perfectly sharded DB gets thundering-herd cache misses on popular events.
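A rough sketch of metered admission, assuming an HMAC-signed ticket as a simplified stand-in for a real JWT library. SECRET, ADMIT_PER_SECOND, and every function name here are illustrative assumptions. Each ticket carries a queue position and issue time, and the inventory tier only accepts a ticket once its position has come up at the capped rate.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

SECRET = b"demo-secret"   # hypothetical; a real deployment would load this from a secret store
ADMIT_PER_SECOND = 500    # capped arrival rate into the inventory tier


def issue_token(user_id: str, position: int, issued_at: float) -> str:
    """Sign a queue ticket carrying queue position and issue time."""
    payload = json.dumps(
        {"sub": user_id, "pos": position, "iat": issued_at}, sort_keys=True
    ).encode("utf-8")
    sig = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return (
        base64.urlsafe_b64encode(payload).decode("ascii")
        + "."
        + base64.urlsafe_b64encode(sig).decode("ascii")
    )


def verify_token(token: str) -> Optional[dict]:
    """Return the claims if the signature matches, else None."""
    try:
        payload_b64, sig_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        sig = base64.urlsafe_b64decode(sig_b64)
    except ValueError:  # wrong number of parts or malformed base64
        return None
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):
        return None
    return json.loads(payload)


def admit_time(position: int, sale_opens_at: float) -> float:
    """Earliest time a ticket may enter the inventory tier at the metered rate."""
    return sale_opens_at + position / ADMIT_PER_SECOND


def may_enter(token: str, now: float, sale_opens_at: float) -> bool:
    """Admit only validly signed tickets whose turn has arrived."""
    claims = verify_token(token)
    return claims is not None and now >= admit_time(claims["pos"], sale_opens_at)
```

At 500 admissions per second, a queue of 50,000 users drains in roughly 100 seconds, which is the trade this design makes: a bounded wait in the queue instead of an unbounded spike on the database.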
Hold granularity: seat-level vs section-level
Seat-level holds are necessary for reserved seating but create O(seats) lock rows; for a 70k-seat stadium that's manageable in a relational DB with an index on (event_id, seat_id, status). Section-level counters (like airlines use for unassigned inventory) let you use an atomic Redis DECR for speed, provided a result below zero is treated as sold out and undone. The choice is driven by whether the product promises specific seats or just 'a seat in section B'.
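A small sketch of the section-level counter path. CounterStore is an in-memory stand-in whose decr/incr mirror the semantics of Redis DECR/INCR (atomic, returning the new value); the key format and reserve_general_admission are hypothetical. Since DECR will go below zero rather than fail, the caller has to treat a negative result as sold out and put the unit back.

```python
import threading


class CounterStore:
    """In-memory stand-in for Redis integer keys (assumption, not a Redis client)."""

    def __init__(self):
        self._values = {}
        self._lock = threading.Lock()

    def set(self, key: str, value: int) -> None:
        with self._lock:
            self._values[key] = int(value)

    def decr(self, key: str) -> int:
        """Atomically subtract 1 and return the new value, like DECR."""
        with self._lock:
            self._values[key] = self._values.get(key, 0) - 1
            return self._values[key]

    def incr(self, key: str) -> int:
        """Atomically add 1 and return the new value, like INCR."""
        with self._lock:
            self._values[key] = self._values.get(key, 0) + 1
            return self._values[key]


def reserve_general_admission(store: CounterStore, event_id: str, section: str) -> bool:
    """Take one unit from a section counter; False means the section is sold out."""
    key = f"remaining:{event_id}:{section}"
    if store.decr(key) < 0:
        store.incr(key)  # went below zero: undo the decrement and report sold out
        return False
    return True
```

Seat-level inventory cannot use this shortcut, because the counter says how many seats remain, not which ones.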
Idempotency through the payment wall
Between hold and confirm sits an external payment gateway that can time out, retry, or double-fire. Every payment initiation must carry an idempotency key (hold_id + attempt_number) so the gateway de-dupes; network retries must reuse the same key, and attempt_number advances only when the user starts a genuinely new attempt, such as after a declined card. Your own confirm endpoint must also be idempotent: a compare-and-set on a version column prevents double-confirmation if the client retries on a slow network.
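One way the confirm side might look, sketched with SQLite and a hypothetical holds table (hold_id, status, version, payment_id). The conditional UPDATE on the version column is what makes a retried confirm harmless; the return strings and helper names are illustrative only.

```python
import sqlite3


def make_holds_db() -> sqlite3.Connection:
    """Create an in-memory holds table (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """CREATE TABLE holds (
               hold_id TEXT PRIMARY KEY,
               status TEXT NOT NULL,      -- held | confirmed | expired
               version INTEGER NOT NULL,
               payment_id TEXT)"""
    )
    return db


def idempotency_key(hold_id: str, attempt_number: int) -> str:
    """Key sent to the gateway; retries of the same attempt must reuse it."""
    return f"{hold_id}:{attempt_number}"


def confirm_hold(db: sqlite3.Connection, hold_id: str, payment_id: str) -> str:
    """Idempotently confirm a hold.

    Returns 'confirmed', 'already_confirmed', 'conflict', or 'not_found'.
    """
    for _ in range(2):  # a second pass re-reads the row after losing a race
        row = db.execute(
            "SELECT status, version, payment_id FROM holds WHERE hold_id = ?",
            (hold_id,),
        ).fetchone()
        if row is None:
            return "not_found"
        status, version, existing_payment = row
        if status == "confirmed":
            # Same payment retried: safe no-op. Different payment: refuse.
            return "already_confirmed" if existing_payment == payment_id else "conflict"
        if status != "held":
            return "conflict"
        with db:
            cur = db.execute(
                """UPDATE holds
                      SET status = 'confirmed', payment_id = ?, version = version + 1
                    WHERE hold_id = ? AND status = 'held' AND version = ?""",
                (payment_id, hold_id, version),
            )
        if cur.rowcount == 1:
            return "confirmed"
    return "conflict"
```

A client that retries after a timeout gets 'already_confirmed' instead of a second state change, and a different payment arriving for an already confirmed hold is surfaced as a conflict to be refunded or investigated.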
Inventory sharding by event, not by seat
Sharding on seat_id spreads load but puts one hot event's seats across many shards, requiring distributed transactions for multi-seat purchases. Shard by event_id instead: all seats for an event live on one shard, making multi-seat cart operations a local transaction. A single megaevent can be vertically scaled (move to a larger shard node) before the sale opens — you know well in advance.
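A sketch of event-based routing, assuming a fixed shard count and a stable hash. NUM_SHARDS, the pinned override map, and both function names are made up for illustration; the override stands in for moving a known mega-event onto a larger node before the sale opens.

```python
import hashlib
from typing import Dict, Optional, Tuple

NUM_SHARDS = 16  # hypothetical shard count


def shard_for_event(event_id: str, pinned: Optional[Dict[str, int]] = None) -> int:
    """Map an event to a shard; pinned events go to a hand-picked node."""
    if pinned and event_id in pinned:
        return pinned[event_id]
    # sha256 is stable across processes, unlike Python's built-in hash() for str.
    digest = hashlib.sha256(event_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def locate_seat(
    event_id: str, seat_id: str, pinned: Optional[Dict[str, int]] = None
) -> Tuple[int, str]:
    """Return (shard, row key). The shard depends only on event_id."""
    return shard_for_event(event_id, pinned), f"{event_id}/{seat_id}"
```

Because seat_id never enters the shard computation, a cart with several seats for one event always resolves to a single shard and can commit as one local transaction.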
Expiry reclaim: the silent second stampede
When 10,000 holds expire simultaneously at T+10min, those seats all flip back to 'available' at the same moment. If 8,000 users are still in the waiting room, you've just re-triggered a mini-stampede. Stagger hold TTLs by ±60s using jitter on issue time; also batch-release reclaimed seats back into a drip queue rather than flipping them all available at once.
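Both mitigations can be sketched in a few lines. BASE_TTL, JITTER, hold_expiry, and DripQueue are hypothetical names, and the per-tick batch size is an arbitrary knob you would tune against what the inventory tier can absorb.

```python
import random
from collections import deque
from typing import Iterable, List

BASE_TTL = 600  # ~10 minute hold, in seconds
JITTER = 60     # spread expiries by up to 60 seconds either way


def hold_expiry(issued_at: float, rng: random.Random) -> float:
    """Expiry time with uniform jitter so holds do not all lapse together."""
    return issued_at + BASE_TTL + rng.uniform(-JITTER, JITTER)


class DripQueue:
    """Re-expose reclaimed seats a batch at a time instead of all at once."""

    def __init__(self, seats_per_tick: int):
        self.seats_per_tick = seats_per_tick
        self._pending = deque()

    def reclaim(self, seat_ids: Iterable) -> None:
        """Queue expired seats for gradual release."""
        self._pending.extend(seat_ids)

    def tick(self) -> List:
        """Return the next batch of seats to mark available (may be empty)."""
        n = min(self.seats_per_tick, len(self._pending))
        return [self._pending.popleft() for _ in range(n)]
```

Calling tick() on a fixed schedule turns one burst of 10,000 reclaimed seats into a steady trickle that the waiting room can hand out at its normal admission rate.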
What breaks at scale
Redis as the hold store becomes your single point of failure: a Redis failover during a peak sale is catastrophic if holds vanish with it. Run Redis Sentinel or Cluster for automatic failover and enable AOF persistence so holds survive a restart; because Redis replication is asynchronous, a failover can still lose the most recent writes, so the relational DB must remain the final arbiter at confirm time. A second failure mode: the payment gateway rate-limits you during a spike (Stripe/Braintree have per-account QPS caps); implement a payment-request queue with backpressure rather than letting upstream gateway errors cascade into double-charge retries.
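A minimal sketch of the payment-request queue with backpressure, assuming a bounded in-process queue and a worker that forwards a capped number of requests per rate-limit window. PaymentQueue, max_pending, and the charge callback are illustrative; a production system would more likely use a durable queue.

```python
import queue
from typing import Any, Callable, List


class PaymentQueue:
    """Bounded queue in front of the gateway; a full queue pushes back on callers."""

    def __init__(self, max_pending: int):
        self._q = queue.Queue(maxsize=max_pending)

    def submit(self, request: Any) -> bool:
        """Enqueue a payment request. False means 'busy, try again shortly'."""
        try:
            self._q.put_nowait(request)
            return True
        except queue.Full:
            # Backpressure: reject now instead of hammering the gateway.
            return False

    def drain(self, max_requests: int, charge: Callable[[Any], Any]) -> List[Any]:
        """Forward up to max_requests to the gateway; call once per rate window."""
        results = []
        for _ in range(max_requests):
            try:
                request = self._q.get_nowait()
            except queue.Empty:
                break
            results.append(charge(request))
        return results
```

A rejected submit leaves the seat hold untouched, so the client can retry within the hold TTL without any risk of a second charge.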
In production
Ticketmaster uses a virtual waiting room (implemented via a separate queue service that issues metered tokens) to prevent the DB from seeing the raw thundering herd. Seat holds are implemented as time-bounded reservations backed by Redis with TTL for the hold state and a relational DB for the authoritative confirmation — a pattern also used by Airbnb for listing holds. The real engineering challenge is not the lock itself but the coordinated release: when a hold expires mid-sale, reclaiming and re-exposing that inventory to still-waiting users without a secondary stampede requires careful queueing discipline.
Common mistakes
- Eventual consistency on inventory (double-sells)
- No hold timeout (seats stuck)
- Letting the full herd hit the DB at once