Google Search
Index the web and answer queries in <200ms with ranked results.
Open the interactive version → diagrams, practice & moreRequirements
Functional
- Crawl & index
- Query → ranked docs
- Snippets
Non-functional
- <200ms
- Massive corpus
- Fresh
Scale
Trillions of pages
The approach
Crawler → inverted index (term → posting list of docs), sharded by document; query fans out to all index shards in parallel, each returns top candidates, results merged & ranked (PageRank + signals); heavily cached.
Key components
Crawler → indexer → sharded inverted index · query aggregator → ranking · cache
Numbers that matter
- Google's index covers ~hundreds of billions of web pages; the inverted index for that corpus is estimated at ~100+ petabytes across all replicas.
- A web search query fans out to ~1,000–3,000 index shards in parallel — each returning candidates in ~5–20ms — before merging, ranking, and serving the SERP in <200ms total.
- PageRank and link graph computation runs as a batch job over ~60 billion links; iterative convergence takes O(days) on a MapReduce/Pregel cluster.
- Google serves ~8.5 billion searches/day (~100k QPS); caching popular queries at the serving layer absorbs ~30–50% of that load before it hits the index.
Senior deep-dive
The inverted index is the data structure; the hard part is ranking — any system can build a posting list, but computing a relevance score that accounts for PageRank, query-term proximity, freshness, and personalization in <200ms across a trillion documents is where the work lives.
Scatter-gather over thousands of shards in parallel is the latency model: each shard returns its local top-k, a merger re-ranks the global top-k — the bottleneck is the slowest shard, so tail latency management (hedged requests, straggler mitigation) is structural, not optional.
Index freshness is a pipeline problem: the crawler → parser → indexer pipeline has multi-hour lag for most content; a real-time index layer (for breaking news, Twitter-style freshness) sits in front of the main index and is merged at query time.
Inverted index: the structure everyone knows, the scale nobody estimates
An inverted index maps term → posting list (doc IDs + positions + scores). At web scale, a single term like 'weather' has posting lists with billions of entries. These lists are compressed with variable-byte or PForDelta encoding and stored on SSDs; decompression at query time is the CPU bottleneck. Posting lists are sorted by doc ID to enable fast intersection (AND queries) via merge-join — this is why document ID assignment (e.g., by crawl cluster) matters for compression ratios.
Scatter-gather: tail latency is the design constraint
Querying 2,000 shards in parallel means the P99 of the slowest shard determines SERP latency. Google uses hedged requests — after a timeout threshold, the query is re-sent to a second replica of the lagging shard, and whichever responds first is used. Result merging must handle variable shard response times: set a strict deadline, use whatever results have arrived, and fill the rest with cached results or lower-quality signals. This is why Google search still returns 10 results even when shards are slow.
Ranking: the system behind the data structure
BM25 / TF-IDF gives you term-level relevance; PageRank gives you authority; neither alone gives quality. Production ranking is a multi-stage pipeline: index shards do cheap BM25 scoring to return top-1000 candidates; a learning-to-rank model (a GBDT or neural ranker) rescores the merged top-k with hundreds of features (anchor text, click-through rate, freshness, user location). The ranker runs in a single service post-merge — it's too expensive to run per shard.
Real-time index: the freshness layer
The base index has a crawl-to-index lag of hours to days for most pages. For breaking news, Twitter posts, and rapidly changing content, Google maintains a real-time index (sometimes called 'Caffeine layer') that ingests fresh crawl results in minutes. At query time, results from the real-time index are merged with the base index results and ranked together. The freshness signal itself is a ranking feature — boosting recently-crawled results for time-sensitive queries without surfacing them for evergreen queries.
Query understanding: what the query actually means
Literal keyword matching is a minority of queries. Query processing includes spell correction, synonym expansion, entity recognition, and intent classification before the query hits the index. 'Best pizza near me' is a local intent query — it routes to a geo-aware index shard, not the main web index. Query rewriting (expanding 'NYC' to 'New York City', adding synonyms) happens in a pre-processing layer; this is where significant search quality gains have come from in the last decade — the index is less important than understanding what the user meant.
What breaks at scale
Index shard imbalance is the operational nightmare: a shard serving a popular term prefix gets 10× the QPS of others. Rebalancing shards requires re-assigning document ID ranges, which invalidates posting list compression and requires re-indexing — a multi-day operation. The second failure mode: ranking model updates that degrade quality for a small query tail can be invisible in aggregate metrics; you need per-query-type quality monitoring (navigational vs. informational vs. transactional) to catch regressions. Canary serving with A/B on 0.1% of traffic before full rollout is the only safe deployment strategy.
In production
Google's serving stack is built on Bigtable for posting lists, Colossus (GFS successor) for raw storage, and a custom serving binary that runs the scatter-gather query fan-out. Bing uses a similar inverted-index + scoring pipeline with Azure infrastructure. The real engineering challenge is index serving consistency under continuous crawl updates: you can't take the index offline to update it, so Google uses a tiered index (a small real-time layer + a large base layer), merging results at query time — keeping those two layers in sync without query-time latency blowup is a continuous systems challenge.
Common mistakes
- Sharding by term (hot terms, unbalanced)
- No query result caching
- Serial shard queries instead of parallel scatter-gather