System Design Library

Flight/Hotel Aggregator (Kayak)

Search many providers in parallel and merge results fast.

Requirements

Functional

  • Query many suppliers
  • Merge/dedupe/rank
  • Filters
  • Price freshness

Non-functional

  • Fast despite slow suppliers
  • Fresh prices

Scale

Many providers, high query volume

The approach

Scatter-gather: query all supplier APIs in parallel with timeouts; merge/dedupe/rank what returns in time; cache results briefly (prices are volatile); degrade gracefully when a supplier is slow.

Key components

Search → parallel supplier calls (timeout) → merge/rank → short cache

Senior deep-dive

Scatter-gather with a deadline is the architecture — query all suppliers in parallel, accept whatever responds within your timeout budget, and degrade gracefully on the rest.

Results are volatile and cannot be cached for long: fares and availability can change within minutes, so a cached result more than a minute or so old may already be wrong and can fail at checkout.

The hard part is normalization: each airline/hotel API returns prices, availability, and fare rules in different schemas, currencies, and date formats — the supplier adapter layer, not the search fan-out, is where most engineering lives.

Supplier fan-out: the search bus

A search request is broadcast to all registered supplier adapters concurrently via a message bus or async RPC. Each adapter translates the normalized query (origin, destination, dates, passengers) into the supplier's native protocol (XML over HTTP for NDC, SOAP/XML or EDIFACT for a legacy GDS). The aggregator collects responses up to a configurable deadline (~2.5s) and merges whatever arrived. Suppliers that consistently miss the deadline are deprioritized, or their results are served from a cache pre-warmed for popular routes.
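
The fan-out with a deadline can be sketched with Python's asyncio. This is a minimal illustration, not any aggregator's actual code: the adapter functions (`fast_supplier`, `slow_supplier`, `broken_supplier`), the dict-shaped offers, and the `scatter_gather` name are all hypothetical stand-ins for real supplier adapters.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical adapter type: takes the normalized query, returns a list of offers.
Adapter = Callable[[dict], Awaitable[list]]


async def scatter_gather(query, adapters, deadline_s=2.5):
    """Query every adapter concurrently and keep what finishes before the deadline."""
    if not adapters:
        return [], []
    tasks = {asyncio.create_task(fn(query)): name for name, fn in adapters.items()}
    done, pending = await asyncio.wait(list(tasks), timeout=deadline_s)

    # Stop waiting on suppliers that missed the deadline.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)

    offers, missed = [], [tasks[t] for t in pending]
    for task in done:
        if task.exception() is not None:
            missed.append(tasks[task])  # a failed supplier counts as missing
        else:
            offers.extend(task.result())
    return offers, sorted(missed)


# Stub adapters for illustration only.
async def fast_supplier(query):
    await asyncio.sleep(0.01)
    return [{"supplier": "fast", "price": 120.0}]


async def slow_supplier(query):
    await asyncio.sleep(1.0)
    return [{"supplier": "slow", "price": 99.0}]


async def broken_supplier(query):
    raise ConnectionError("supplier unavailable")
```

The returned `missed` list is what a deprioritization policy would consume: a supplier that keeps appearing there is a candidate for a shorter timeout or a pre-warmed cache.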

Result normalization: the unglamorous hard part

Every supplier returns results in its own schema: different fare basis codes, tax breakdowns, currency codes, baggage allowance representations, and connection rules. A canonical flight model defines the internal representation, and each adapter maps supplier output to it. Deduplication is needed because the same itinerary may come back from several suppliers at slightly different prices. A stable itinerary fingerprint (origin + destination + flight numbers + departure times) groups the duplicates so you can keep the cheapest. Without it, the same flight shows up several times in the results.
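
The fingerprint-and-keep-cheapest step might look like the sketch below. The `Offer` fields and the `price_usd` name are assumptions for illustration; the sketch presumes each adapter has already converted prices to a single currency.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    # Hypothetical canonical model; real ones carry far more fields.
    supplier: str
    origin: str
    destination: str
    flight_numbers: tuple   # one entry per segment, e.g. ("UA100", "UA200")
    departure_times: tuple  # ISO 8601 departure time per segment
    price_usd: float        # assumed already normalized to one currency


def fingerprint(offer):
    """Stable identity of an itinerary, independent of which supplier sold it."""
    return (offer.origin, offer.destination, offer.flight_numbers, offer.departure_times)


def dedupe_cheapest(offers):
    """Group offers by fingerprint and keep the cheapest one per itinerary."""
    best = {}
    for offer in offers:
        key = fingerprint(offer)
        if key not in best or offer.price_usd < best[key].price_usd:
            best[key] = offer
    return sorted(best.values(), key=lambda o: o.price_usd)
```

Tuples are used for the per-segment fields so the fingerprint is hashable and can serve directly as a dictionary key.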

Price volatility and cache invalidation

You can cache search results for 30–90 seconds: long enough to absorb re-queries on browser refresh, short enough that prices aren't dangerously stale. The cache key is (origin, destination, dates, passenger count, cabin class). On a cache miss, you go live to suppliers; on a cache hit, you show the cached price with a "prices may have changed" warning. The checkout flow always re-validates pricing live before issuing the booking: the displayed price is advisory, the confirmed price is authoritative.
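
A short-TTL cache keyed on the search parameters could be sketched as follows. This is an in-process illustration only; a production system would more likely use a shared store with native expiry. The class and function names, the query field names, and the injectable `clock` parameter are all assumptions made for the example.

```python
import time


def search_cache_key(query):
    """Build the cache key from the fields that define a search."""
    return (
        query["origin"].upper(),
        query["destination"].upper(),
        query["depart_date"],
        query.get("return_date"),  # None for one-way searches
        query["passengers"],
        query["cabin"],
    )


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_s seconds."""

    def __init__(self, ttl_s=60.0, clock=time.monotonic):
        self._ttl_s = ttl_s
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = {}

    def put(self, key, value):
        self._entries[key] = (self._clock(), value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at > self._ttl_s:
            del self._entries[key]  # stale: force a live supplier query
            return None
        return value
```

A `None` from `get` means "go live to suppliers"; any hit should still be shown with the staleness warning, since checkout re-validates the price regardless.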

Deep linking vs. in-house booking

Kayak's original model was deep-linking: clicking a result takes you to the airline's or OTA's own checkout page. This avoids the complexity of managing bookings but loses the user at the moment of purchase. Adding in-house booking requires integrating with GDS booking APIs (PNR creation, ticketing), handling PCI-compliant card processing, and managing post-booking changes and cancellations. Each of those is a multi-year integration project, so most metasearch sites stay deep-link or partner with a booking engine (for example, Booking.com for hotels).

Price prediction and alerts

"Prices are likely to rise: buy now" features require a historical price time series per route and date combination. This is a background data collection job: periodically poll suppliers for popular routes, store the price, and build a model over the series. The model can be simple (linear trend plus day-of-week and days-to-departure features); the hard part is sampling frequency, because prices sampled every hour miss intraday spikes. Price alert subscriptions need a matching engine: when a live search result for a watched route drops below the user's threshold, trigger a notification asynchronously.

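The alert matching engine described above can be sketched as an index from route to subscriptions. The `Alert` and `AlertIndex` names and the (origin, destination, date) route tuple are hypothetical; the sketch only returns the alerts that fired and leaves the asynchronous notification (for example, pushing to a queue) to the caller.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    user_id: str
    route: tuple      # (origin, destination, depart_date), an assumed shape
    threshold: float  # notify when the lowest price drops below this


class AlertIndex:
    """Index subscriptions by route so each search result checks only relevant alerts."""

    def __init__(self):
        self._by_route = defaultdict(list)

    def subscribe(self, alert):
        self._by_route[alert.route].append(alert)

    def match(self, route, lowest_price):
        """Return the alerts on this route whose threshold the price has dropped below."""
        return [a for a in self._by_route.get(route, []) if lowest_price < a.threshold]
```

Calling `match` once per completed search, with that search's lowest price, keeps the alert check off the user-facing request path as long as the resulting notifications are enqueued rather than sent inline.
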
What breaks at scale

Slow supplier tail latency is the most common user-visible failure: one airline API with a P99 of 5s holds every search that includes its routes until the deadline, or drops that airline's fares from the results. The fix is aggressive hedged requests: after 1.5s, re-issue the request to a backup supplier or GDS and use whichever responds first. Price discrepancy at checkout (price shown ≠ price charged) is the UX-destroying failure, and regulators now mandate disclosure; it happens because a search result holds no inventory. GDS outages are rare but total: if Sabre goes down, every search that routes through Sabre returns no results for the affected airlines, and there is no good fallback because those airlines often don't offer NDC at the same coverage level.
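
A hedged request can be sketched with asyncio as below. The `hedged_call` helper and the stub suppliers are hypothetical, and the sketch makes one extra assumption beyond the text: if the primary fails outright before the hedge delay, the backup is started immediately.

```python
import asyncio


async def hedged_call(primary, backup, hedge_after_s=1.5):
    """Start primary; if it is still running after hedge_after_s, race it against backup."""
    primary_task = asyncio.create_task(primary())
    done, _ = await asyncio.wait({primary_task}, timeout=hedge_after_s)
    if done and primary_task.exception() is None:
        return primary_task.result()  # primary answered in time

    # Primary is slow (or already failed): issue the backup request.
    backup_task = asyncio.create_task(backup())
    pending = {backup_task} if done else {primary_task, backup_task}
    try:
        while pending:
            done, pending = await asyncio.wait(
                pending, return_when=asyncio.FIRST_COMPLETED
            )
            for task in done:
                if task.exception() is None:
                    return task.result()  # first successful response wins
        raise RuntimeError("primary and backup both failed")
    finally:
        for task in pending:
            task.cancel()  # drop the loser so it does not hold a connection


# Stub suppliers for illustration only.
async def slow_primary():
    await asyncio.sleep(0.5)
    return "primary"


async def fast_primary():
    return "primary"


async def fast_backup():
    await asyncio.sleep(0.01)
    return "backup"
```

Hedging trades extra supplier load for lower tail latency, so the hedge delay is usually set near the primary's typical response time rather than its worst case.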

In production

Kayak, Google Flights, and Skyscanner all use a scatter-gather pattern backed by connections to the major GDSs (Sabre, Amadeus, Travelport) plus direct airline NDC APIs. The GDSs are legacy systems that speak EDIFACT and SOAP/XML, so a huge part of the engineering is the protocol translation layer that maps EDIFACT and OTA XML into a normalized internal flight model. The real challenge is deep-link pricing: the price shown in search results is not always the price at checkout, because airlines can change fares and surcharges by booking time. Google Flights partially solved this by striking direct data agreements with airlines rather than going through GDS intermediaries.

Part of System Design Library on SystemLore.