System Design Library

Ticketmaster / Booking

Sell event seats with no double-booking, under flash-sale spikes.


Requirements

Functional

  • Browse events/seats
  • Hold a seat (timeout)
  • Purchase
  • Release on timeout

Non-functional

  • No double-booking (strong consistency)
  • Handle thundering herd
  • Fair queueing

Scale

Flash sales: huge spike on one event

The approach

Reserve seats with a short-TTL hold (row lock or distributed lock); confirm via a transaction; a virtual waiting room/queue throttles the herd. Inventory is the consistency-critical core.

Key components

Waiting-room queue → app → inventory DB (transactions/locks) · cache for browse

Senior deep-dive

Inventory consistency is the only hard problem — the rest is standard CRUD with a nicer UI.

A short-lived hold (a seat row marked as held with an expiry of ~10 min, not a database lock kept open while the user pays) bridges discovery to payment; without it you either oversell or frustrate users who "see" seats that vanish. Virtual waiting rooms absorb the flash-sale thundering herd before it ever hits your database.

Seat-level locking is brutal at scale: shard by event, use optimistic locking with version columns, and release holds aggressively to avoid phantom inventory (seats that look sold out but are only tied up in abandoned holds).
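A minimal sketch of the hold step, assuming a single relational table and using SQLite only so the example runs locally. The table layout, the column names, and the try_hold helper are illustrative assumptions, not a prescribed schema. The point is that the claim is one conditional UPDATE, so two buyers racing for the same seat cannot both succeed.

```python
import sqlite3
import time

HOLD_TTL_SECONDS = 600  # ~10 minutes, as in the text


def make_seat_db():
    """Create an in-memory seats table (hypothetical schema for illustration)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """CREATE TABLE seats (
               event_id TEXT NOT NULL,
               seat_id TEXT NOT NULL,
               status TEXT NOT NULL DEFAULT 'available',
               held_by TEXT,
               hold_expires_at REAL,
               PRIMARY KEY (event_id, seat_id))"""
    )
    return db


def try_hold(db, event_id, seat_id, user_id, now=None):
    """Claim a seat if it is available or its previous hold has lapsed.

    Returns True when this caller now owns the hold, False otherwise.
    """
    now = time.time() if now is None else now
    cur = db.execute(
        """UPDATE seats
              SET status = 'held', held_by = ?, hold_expires_at = ?
            WHERE event_id = ? AND seat_id = ?
              AND (status = 'available'
                   OR (status = 'held' AND hold_expires_at <= ?))""",
        (user_id, now + HOLD_TTL_SECONDS, event_id, seat_id, now),
    )
    db.commit()
    # Exactly one row changes for the winner; losers match zero rows.
    return cur.rowcount == 1
```

Treating an expired hold as claimable inside the same statement means correctness does not depend on a background sweeper running on time; a sweeper can still exist purely to keep the browse view fresh.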

The waiting room is a rate limiter, not a UX feature

A waiting room's job is to cap the arrival rate into your inventory service to something it can actually handle, e.g. 500 req/s instead of 50,000. It should issue metered tokens (signed JWTs with a queue position and issue time) so the inventory tier never sees raw demand. Without it, even a perfectly sharded DB is hit by a wave of simultaneous cache misses on popular events.
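The metered-token idea can be sketched as follows. To stay dependency-free this uses a plain HMAC-signed payload from the standard library as a stand-in for a JWT; the secret, the token lifetime, the claim names, and the helper names are all assumptions made for illustration.

```python
import base64
import hashlib
import hmac
import json
import time
from collections import deque

SECRET = b"demo-secret"      # hypothetical signing key
TOKEN_TTL_SECONDS = 300      # assumed validity window for an admission token


def issue_token(user_id, queue_position, now=None):
    """Sign a small payload carrying the queue position and issue time."""
    now = time.time() if now is None else now
    payload = json.dumps(
        {"sub": user_id, "pos": queue_position, "iat": now}, sort_keys=True
    ).encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return (
        base64.urlsafe_b64encode(payload).decode()
        + "."
        + base64.urlsafe_b64encode(sig).decode()
    )


def verify_token(token, now=None):
    """Return the claims if the signature is valid and the token is fresh, else None."""
    now = time.time() if now is None else now
    try:
        payload_b64, sig_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        sig = base64.urlsafe_b64decode(sig_b64)
    except ValueError:  # wrong shape or bad base64 (binascii.Error is a ValueError)
        return None
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):
        return None
    claims = json.loads(payload)
    if now - claims["iat"] > TOKEN_TTL_SECONDS:
        return None
    return claims


def admit_next(waiting, batch_size, now=None):
    """Pop up to batch_size (position, user_id) pairs and issue their tokens."""
    tokens = []
    for _ in range(min(batch_size, len(waiting))):
        position, user_id = waiting.popleft()
        tokens.append(issue_token(user_id, position, now))
    return tokens
```

Calling admit_next once per tick with a fixed batch_size is what turns the queue into a rate limiter: the inventory tier only accepts requests that pass verify_token, so its load is bounded by the batch size rather than by demand.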

Hold granularity: seat-level vs section-level

Seat-level holds are necessary for reserved seating but create O(seats) hold rows; for a 70k-seat stadium that's manageable in a relational DB with an index on (event_id, seat_id, status). Section-level counters (like airlines use for unassigned inventory) let you use an atomic Redis DECR for speed; because DECR will happily go below zero, check the returned value and undo the decrement when it is negative. The choice is driven by whether the product promises specific seats or just 'a seat in section B'.
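A sketch of the section-level counter, written against any client that exposes decrby and incrby the way redis-py does. The FakeRedis class is an in-memory stand-in so the example runs without a server, and the key format and the reserve_in_section helper are assumptions for illustration.

```python
import threading


class FakeRedis:
    """In-memory stand-in with redis-py style set/decrby/incrby, for local runs only."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def set(self, key, value):
        with self._lock:
            self._data[key] = int(value)

    def decrby(self, key, amount=1):
        with self._lock:
            self._data[key] = self._data.get(key, 0) - amount
            return self._data[key]

    def incrby(self, key, amount=1):
        with self._lock:
            self._data[key] = self._data.get(key, 0) + amount
            return self._data[key]


def reserve_in_section(client, event_id, section, count):
    """Try to take `count` seats from a section counter without overselling."""
    key = f"inv:{event_id}:{section}"  # hypothetical key naming
    remaining = client.decrby(key, count)
    if remaining < 0:
        # The counter went negative, so this request oversold: give the seats back.
        client.incrby(key, count)
        return False
    return True
```

Between the decrement and the compensating increment the counter can briefly read lower than the true stock, which may reject a concurrent buyer but never oversells. If that window matters, the same check can be done in a single server-side script instead.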

Idempotency through the payment wall

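Between hold and confirm sits an external payment gateway that can time out, retry, or double-fire. Every payment initiation must carry an idempotency key (hold_id + attempt_number) so the gateway de-dupes; reuse the same key for network-level retries and bump attempt_number only when the user deliberately starts a new payment attempt, for example after a decline. Your own confirm endpoint must also be idempotent: an atomic compare-and-set on a version column (not a separate check followed by a set) prevents double-confirmation if the client retries on a slow network.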
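One way to sketch the idempotent confirm, again using SQLite purely so it runs locally. The holds table, its columns, the returned status strings, and both helper names are assumptions for illustration; the essential part is that the state change is guarded by the version read earlier.

```python
import sqlite3


def payment_idempotency_key(hold_id, attempt_number):
    """Stable key for one payment attempt; reuse it verbatim on network retries."""
    return f"{hold_id}:{attempt_number}"


def make_holds_db():
    """Create an in-memory holds table (hypothetical schema for illustration)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """CREATE TABLE holds (
               hold_id TEXT PRIMARY KEY,
               status TEXT NOT NULL,        -- 'held', 'confirmed' or 'expired'
               version INTEGER NOT NULL,
               payment_ref TEXT)"""
    )
    return db


def confirm_hold(db, hold_id, payment_ref):
    """Confirm a hold exactly once; repeating the same call is harmless."""
    row = db.execute(
        "SELECT status, version, payment_ref FROM holds WHERE hold_id = ?",
        (hold_id,),
    ).fetchone()
    if row is None:
        return "not_found"
    status, version, existing_ref = row
    if status == "confirmed":
        # A retry with the same payment is a no-op; a different one is an error.
        return "already_confirmed" if existing_ref == payment_ref else "conflict"
    if status != "held":
        return "not_holdable"
    cur = db.execute(
        """UPDATE holds
              SET status = 'confirmed', payment_ref = ?, version = version + 1
            WHERE hold_id = ? AND version = ? AND status = 'held'""",
        (payment_ref, hold_id, version),
    )
    db.commit()
    # Zero rows means another request changed the hold first; the caller re-reads.
    return "confirmed" if cur.rowcount == 1 else "retry"
```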

Inventory sharding by event, not by seat

Sharding on seat_id spreads load but puts one hot event's seats across many shards, requiring distributed transactions for multi-seat purchases. Shard by event_id instead: all seats for an event live on one shard, making multi-seat cart operations a local transaction. A single mega-event can be vertically scaled (moved to a larger shard node) before the sale opens, since the on-sale date is known well in advance.
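As a rough illustration of routing by event_id, the sketch below hashes the event to a shard and lets an operator pin a known mega-event to a dedicated node ahead of the sale. The override map and the function name are assumptions, and plain modulo hashing is used only for brevity.

```python
import hashlib


def shard_for_event(event_id, num_shards, overrides=None):
    """Map an event to a shard; every seat of the event follows the same route.

    `overrides` is a hypothetical operator-managed dict that pins specific
    events (for example an upcoming mega-event) to a chosen shard.
    """
    if overrides and event_id in overrides:
        return overrides[event_id]
    # Use a stable hash: Python's built-in hash() of a str varies per process.
    digest = hashlib.sha256(event_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

Because the seat identifier never enters the routing decision, a cart holding several seats for one event always resolves to a single shard and can commit in one local transaction.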

Expiry reclaim: the silent second stampede

When 10,000 holds expire simultaneously at T+10min, those seats all flip back to 'available' at once. If 8,000 users are still in the waiting room, you've just re-triggered a mini-stampede. Stagger hold expiry by adding ±60s of random jitter to each hold's TTL; also batch-release reclaimed seats back into a drip queue rather than flipping them all available at once.
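Both mitigations can be sketched in a few lines. The constants mirror the figures in the text, while the DripReleaser class, its batch size, and the idea of calling tick on a timer are assumptions for illustration.

```python
import random
from collections import deque

BASE_TTL_SECONDS = 600   # 10-minute hold
JITTER_SECONDS = 60      # spread expiries over roughly +/- 60 s


def jittered_ttl(rng=random):
    """Return a hold TTL with uniform jitter so expiries do not align."""
    return BASE_TTL_SECONDS + rng.uniform(-JITTER_SECONDS, JITTER_SECONDS)


class DripReleaser:
    """Buffer reclaimed seats and re-expose them a small batch at a time."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.pending = deque()

    def reclaim(self, seat_ids):
        """Queue seats whose holds expired instead of marking them available."""
        self.pending.extend(seat_ids)

    def tick(self):
        """Return the next batch of seats to mark available (may be empty)."""
        batch = []
        while self.pending and len(batch) < self.batch_size:
            batch.append(self.pending.popleft())
        return batch
```

A scheduler would call tick at a fixed interval and mark only the returned seats as available, so the release rate stays in step with the admission rate of the waiting room.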

What breaks at scale

Redis as the hold store becomes your single point of failure: an unplanned Redis failover during a peak sale can silently drop holds. Use Redis Sentinel or Cluster with AOF persistence so holds survive a primary failure; because Redis replication is asynchronous, the most recent writes can still be lost in a failover, so the relational DB must remain the final arbiter at confirm time. A second failure mode: the payment gateway rate-limits you during a spike (gateways such as Stripe and Braintree enforce API rate limits); implement a payment-request queue with backpressure rather than letting upstream gateway errors cascade into double-charge retries.
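The payment-request queue with backpressure might look like the sketch below. The PaymentQueue class, its capacity, and the per-tick drain rate are assumptions for illustration, and send stands for whatever function actually calls the gateway.

```python
import queue


class PaymentQueue:
    """Bounded buffer in front of the gateway; rejects work when full."""

    def __init__(self, max_pending):
        self._q = queue.Queue(maxsize=max_pending)

    def submit(self, request):
        """Enqueue a payment request; False tells the caller to back off."""
        try:
            self._q.put_nowait(request)
            return True
        except queue.Full:
            return False

    def drain(self, max_per_tick, send):
        """Forward at most max_per_tick requests to the gateway via `send`."""
        sent = 0
        while sent < max_per_tick:
            try:
                request = self._q.get_nowait()
            except queue.Empty:
                break
            send(request)
            sent += 1
        return sent
```

Setting max_per_tick below the gateway's documented limit keeps the service under the cap, and a False from submit lets the app show a "still processing" state instead of retrying blindly. Requests that wait in the queue still need a hold that outlives the wait, so the queue depth should be sized against the hold TTL.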

In production

Ticketmaster uses a virtual waiting room (implemented via a separate queue service that issues metered tokens) to prevent the DB from seeing the raw thundering herd. Seat holds are implemented as time-bounded reservations backed by Redis with TTL for the hold state and a relational DB for the authoritative confirmation — a pattern also used by Airbnb for listing holds. The real engineering challenge is not the lock itself but the coordinated release: when a hold expires mid-sale, reclaiming and re-exposing that inventory to still-waiting users without a secondary stampede requires careful queueing discipline.

Part of System Design Library on SystemLore.