Google Ads / Ad Serving
Run a realtime auction to pick the best ad for each impression in <100ms.
Requirements
Functional
- Targeting/eligibility
- Realtime auction (bid)
- Budget pacing
- Click/impression tracking
Non-functional
- <100ms
- Budget never overspent
- High QPS
Scale
Millions of auctions/sec
The approach
Per-impression: retrieve eligible ads (targeting index), predict CTR (ML), run an auction (e.g. second-price) under budget/pacing constraints, serve, then track clicks (see ad aggregator). Budgets tracked with fast counters.
Key components
Ad request → eligibility/targeting → CTR model → auction → serve · budget/pacing · tracking
Numbers that matter
- Google serves ~8.5 billion ad impressions per day across Search and Display; each impression triggers a real-time auction completing in <100ms end-to-end.
- A CTR prediction model inference on a feature vector of ~1,000 sparse features takes <1ms on a CPU with a quantized model; the model is typically retrained daily or more frequently using prior-day click logs.
- Advertiser budgets are paced using a throttle rate updated every ~5–10 minutes; budget is allocated as daily_budget / expected_daily_impressions — rate adjusts as actual traffic deviates from forecast.
- Google's auction selects among ~5–50 eligible ads per impression in Search; Display can have 100–500 candidates after targeting filtering from a universe of millions of active creatives.
Senior deep-dive
The entire system is a latency-budget allocation problem: 100ms end-to-end, of which targeting (~10ms), CTR prediction (~20ms), auction (~5ms), and serving/logging (~15ms) account for ~50ms. The remainder is presumably headroom for network hops and tail latency. Every component must know its budget and shed work rather than exceed it.
Second-price auction (Vickrey) aligns advertiser incentives but requires accurate CTR prediction: ads are ranked by expected value per impression, bid × predicted_CTR (multiplied by 1000 to express it as eCPM), so a bad model wastes impressions and undermines the auction's economic efficiency.
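The "shed work rather than exceed the budget" idea can be sketched as a deadline check inside a stage. This is a minimal illustration, not a production design: `STAGE_BUDGET_MS`, `score_within_budget`, and the injectable `now` clock are hypothetical names, and the millisecond figures are simply the ones quoted in the split above.

```python
import time
from typing import Callable, Iterable, List, Tuple

# Hypothetical per-stage budgets in milliseconds, copied from the split above.
STAGE_BUDGET_MS = {"targeting": 10, "ctr": 20, "auction": 5, "serving": 15}


def score_within_budget(
    candidates: Iterable[str],
    score: Callable[[str], float],
    budget_ms: float,
    now: Callable[[], float] = time.monotonic,
) -> List[Tuple[str, float]]:
    """Score candidates until the stage budget is spent, then shed the rest."""
    deadline = now() + budget_ms / 1000.0
    scored: List[Tuple[str, float]] = []
    for ad in candidates:
        if now() >= deadline:
            break  # shed the remaining candidates instead of running late
        scored.append((ad, score(ad)))
    return scored


# Usage: score as many candidates as fit into the CTR stage budget.
partial = score_within_budget(
    ["ad1", "ad2", "ad3"], lambda ad: 0.01, STAGE_BUDGET_MS["ctr"]
)
```

Passing the clock in as a parameter keeps the function testable without real sleeps. A real system would likely order candidates so that the most promising ones are scored first, since anything after the deadline is dropped.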
Budget pacing is harder than the auction: without it, a $10K daily budget is exhausted in the first hour of high traffic, and the advertiser gets no exposure for the rest of the day; throttled admission to the auction (token-bucket per campaign) is the mechanism.
Targeting: narrowing millions of ads to dozens
At impression time, the targeting index filters the full ad universe (millions of creatives) down to eligible candidates using inverted indexes on demographics, keywords, audiences, and placements. This must complete in ~10ms and is essentially an AND/OR query over precomputed audience segments. Keyword targeting (Search) uses an exact/broad/phrase match hierarchy evaluated against the query; audience targeting (Display) uses a user-segment bitmap — the user's segment membership is looked up and intersected with campaign targeting criteria.
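A toy version of the inverted-index lookup might look like the following. It assumes pure AND semantics (a campaign is eligible only if every attribute it requires is present on the request) and uses Python sets where a real system would more likely use compressed bitmaps. `TargetingIndex` and the attribute strings such as `geo:US` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


class TargetingIndex:
    """Inverted index: targeting attribute -> campaigns that require it."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[str]] = defaultdict(set)
        self._requirements: Dict[str, Set[str]] = {}
        self._untargeted: Set[str] = set()  # campaigns with no requirements

    def add_campaign(self, campaign_id: str, required: Iterable[str]) -> None:
        attrs = set(required)
        self._requirements[campaign_id] = attrs
        if not attrs:
            self._untargeted.add(campaign_id)
        for attr in attrs:
            self._postings[attr].add(campaign_id)

    def eligible(self, request_attrs: Iterable[str]) -> Set[str]:
        """Campaigns whose required attributes are all present on the request."""
        attrs = set(request_attrs)
        # Step 1: the postings lists give campaigns matching at least one attribute.
        candidates = set(self._untargeted)
        for attr in attrs:
            candidates |= self._postings.get(attr, set())
        # Step 2: keep only campaigns whose full requirement set is satisfied.
        return {c for c in candidates if self._requirements[c] <= attrs}


index = TargetingIndex()
index.add_campaign("c1", ["geo:US", "device:mobile"])
index.add_campaign("c2", ["geo:US", "segment:travel"])
matches = index.eligible(["geo:US", "device:mobile", "segment:sports"])
```

Here `matches` contains only `c1`, because `c2` also requires `segment:travel`. The postings lookup keeps the subset check from running over the full campaign universe.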
CTR prediction: the economic core
The effective bid is bid_price × predicted_CTR, the expected revenue per impression (multiplied by 1000, this is the eCPM). Because predicted CTR multiplies every bid, prediction errors translate directly into mis-ranked auctions, which makes this one of the most economically leveraged ML problems in an ads business. Features include query/ad relevance (semantic similarity), user context (recency, category affinity), ad quality signals (historical CTR for this creative), and contextual signals (device, time, location). Models are served in-process (local feature lookup plus a quantized neural net) to hit <1ms latency per inference.
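A small worked example shows why ranking quality depends on the model. It assumes cost-per-click bids and scales by 1000 to express value per thousand impressions. The ads, bids, and CTR values are made up for illustration, and `rank_by_ecpm` is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def rank_by_ecpm(bids: Dict[str, float], pctr: Dict[str, float]) -> List[Tuple[str, float]]:
    """Rank ads by eCPM = CPC bid * predicted CTR * 1000, highest first."""
    scored = [(ad, bids[ad] * pctr[ad] * 1000.0) for ad in bids]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


bids = {"ad_a": 2.00, "ad_b": 1.00}
true_ctr = {"ad_a": 0.010, "ad_b": 0.030}   # ad_b is worth more per impression
bad_model = {"ad_a": 0.020, "ad_b": 0.030}  # model overestimates ad_a by 2x

ideal = rank_by_ecpm(bids, true_ctr)[0][0]    # winner under perfect prediction
chosen = rank_by_ecpm(bids, bad_model)[0][0]  # winner under the bad model
```

With the true CTRs, `ad_b` has the higher expected value and should win. With the overestimate, `ad_a` wins instead, so the impression goes to the ad with the lower real expected revenue.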
Auction mechanics: second-price and quality score
Google's auction is not pure second-price: the Ad Rank formula is bid × Quality Score (QS), where QS incorporates expected CTR, ad relevance, and landing page quality. The winner pays the minimum bid needed to keep their Ad Rank above the next competitor's, which is a generalized second-price (GSP) mechanism rather than Vickrey-Clarke-Groves. This means an advertiser with a 2x better QS can match a competitor bidding 2x more and outrank any competitor bidding less than that, which aligns incentives toward relevance. The reserve price (minimum Ad Rank) prevents extremely low-quality ads from winning even with high bids.
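The "pay the minimum needed to stay ahead of the next competitor" rule can be sketched for a single ad slot as follows. This is a simplified model under stated assumptions: Ad Rank is exactly `max_cpc * quality_score`, there is one slot, and the small increment real systems add on top of the minimum is omitted. `Bid`, `run_auction`, and `reserve_rank` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Bid:
    advertiser: str
    max_cpc: float        # maximum cost per click the advertiser will pay
    quality_score: float  # hypothetical QS on an arbitrary positive scale


def ad_rank(bid: Bid) -> float:
    return bid.max_cpc * bid.quality_score


def run_auction(bids: List[Bid], reserve_rank: float = 0.0) -> Optional[Tuple[str, float]]:
    """Single-slot auction: return (winner, price per click), or None if no bid qualifies."""
    ranked = sorted(
        (b for b in bids if b.quality_score > 0 and ad_rank(b) >= reserve_rank),
        key=ad_rank,
        reverse=True,
    )
    if not ranked:
        return None
    winner = ranked[0]
    # Ad Rank the winner must hold: the runner-up's, or the reserve if unopposed.
    rank_to_beat = ad_rank(ranked[1]) if len(ranked) > 1 else reserve_rank
    # Smallest bid that still reaches rank_to_beat, capped at the winner's own max.
    price = min(winner.max_cpc, rank_to_beat / winner.quality_score)
    return winner.advertiser, round(price, 2)


result = run_auction([Bid("A", 2.00, 4.0), Bid("B", 3.00, 2.0)])
```

In this example A has Ad Rank 8 against B's 6, so A wins despite the lower bid and pays 6 / 4 = 1.50 per click rather than its 2.00 maximum.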
Budget pacing and throttling
Without pacing, a campaign with a $10K/day budget on a high-traffic morning could spend it all by 10am. The pacing service maintains a per-campaign token bucket refilled at a rate computed from daily_budget / forecasted_daily_impressions. Before entering an auction, the campaign checks if it has budget tokens — if not, it is excluded from that auction. The pacing rate is adjusted every 5–10 minutes using a feedback controller comparing actual spend rate to target. Over-pacing (spending too fast) tightens the rate; under-pacing loosens it.
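The token bucket and feedback loop could be sketched like this. Several simplifications are assumptions of this example, not details from the text: the bucket is refilled in currency per second rather than per forecasted impression, tokens are reserved at auction entry using an expected cost, and the controller is a plain proportional one with a hypothetical `gain`. `CampaignPacer` and its methods are illustrative names.

```python
class CampaignPacer:
    """Token bucket of spendable budget with a proportional rate controller."""

    def __init__(self, daily_budget: float, seconds_per_day: float = 86_400.0) -> None:
        self.target_rate = daily_budget / seconds_per_day  # currency per second
        self.refill_rate = self.target_rate
        self.tokens = 0.0
        self.spent_in_window = 0.0

    def refill(self, elapsed_s: float) -> None:
        """Add tokens for the time elapsed since the last refill."""
        self.tokens += self.refill_rate * elapsed_s

    def try_enter_auction(self, expected_cost: float) -> bool:
        """Admit the campaign only if the bucket covers the expected cost."""
        if self.tokens < expected_cost:
            return False  # throttled: sit this auction out
        self.tokens -= expected_cost
        return True

    def record_spend(self, amount: float) -> None:
        """Record money actually charged (for example on a billed click)."""
        self.spent_in_window += amount

    def adjust(self, window_s: float, gain: float = 0.5) -> None:
        """Run once per control interval: nudge the refill rate toward target spend."""
        actual_rate = self.spent_in_window / window_s
        error = (self.target_rate - actual_rate) / self.target_rate
        # Overspend gives a negative error (tighten); underspend loosens.
        self.refill_rate = max(0.0, self.refill_rate * (1.0 + gain * error))
        self.spent_in_window = 0.0


pacer = CampaignPacer(daily_budget=86_400.0)  # target: 1.0 currency unit per second
pacer.refill(elapsed_s=10.0)
admitted = pacer.try_enter_auction(expected_cost=4.0)
```

The controller exists because reserved tokens and real spend diverge: a campaign can enter many auctions and lose them, or clicks can cost more than expected, so the refill rate has to be corrected against observed spend.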
Click tracking, attribution, and fraud
A click on an ad goes through a redirect URL that logs the click event (timestamp, user, ad, placement) before forwarding to the advertiser's landing page. The click log feeds the billing pipeline and the CTR ground-truth labels for model retraining. Invalid click detection (bots, click farms) runs as a streaming filter over the click log, classifying clicks using device fingerprint, IP reputation, and behavioral signals; invalid clicks are refunded. This is a cat-and-mouse arms race — fraud patterns evolve faster than detection rules, so ML classifiers are retrained frequently.
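As a rough illustration of the streaming filter, the sketch below marks a click billable only if it passes three simple rules. The rules themselves (duplicate `click_id`, a blocked-IP set, rapid repeat clicks by the same user on the same ad) stand in for the ML classifiers described above and are assumptions of this example, as are the `Click` and `ClickFilter` names and the 30-second window.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class Click:
    click_id: str
    user_id: str
    ad_id: str
    ip: str
    ts: float  # event time in seconds


class ClickFilter:
    """Marks a click billable only if it passes simple validity rules."""

    def __init__(self, blocked_ips: Set[str], repeat_window_s: float = 30.0) -> None:
        self.blocked_ips = blocked_ips
        self.repeat_window_s = repeat_window_s
        # Unbounded in this sketch; a real filter would expire old entries.
        self._seen_ids: Set[str] = set()
        self._last_click: Dict[Tuple[str, str], float] = {}

    def is_billable(self, click: Click) -> bool:
        if click.click_id in self._seen_ids:
            return False  # the same event delivered twice
        self._seen_ids.add(click.click_id)
        if click.ip in self.blocked_ips:
            return False  # IP with bad reputation
        key = (click.user_id, click.ad_id)
        last = self._last_click.get(key)
        self._last_click[key] = click.ts
        if last is not None and click.ts - last < self.repeat_window_s:
            return False  # rapid repeat click on the same ad
        return True


click_filter = ClickFilter(blocked_ips={"10.0.0.9"})
first = Click("k1", "u1", "ad1", "1.2.3.4", 100.0)
billable = click_filter.is_billable(first)
```

Deduplicating on `click_id` before billing matters because log pipelines commonly redeliver events. The other two rules only hint at the device, IP, and behavioral signals a real classifier would combine.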
What breaks at scale
Budget exhaustion spikes: a breaking news event causes a massive traffic surge to news sites, and campaigns targeting those placements can exhaust daily budgets in minutes, so the pacing service must react within seconds. If the pacing update interval is too slow (e.g. 10 minutes), campaigns over-deliver significantly, requiring refunds to advertisers.
Auction poisoning via bid manipulation: if an advertiser can probe the auction by varying bids and observing clearing prices, they can reverse-engineer competitors' bids. Randomized reserve prices and limited auction transparency (not revealing exact clearing prices) mitigate this.
In production
Google's Smart Bidding and Meta's Advantage+ both use deep learning (transformers over user/ad/context embeddings) for CTR prediction, replacing the logistic regression that dominated ad systems through 2015. The real challenge is feature freshness: a user who just searched for "flights to Tokyo" is extremely valuable to airline advertisers for the next 10 minutes — the feature store must propagate that signal with <1 minute latency into the serving path. Criteo and The Trade Desk use retargeting pixel data to build user segments updated in near-real-time via Kafka pipelines into Redis feature stores. Budget pacing at Meta is discussed publicly: they use a global pacing service that maintains a per-campaign token bucket, updated by a feedback loop comparing actual spend rate to target — without this, a viral moment causes a campaign to overspend its daily budget in minutes.
Common mistakes
- Slow targeting retrieval (blows latency budget)
- Overspending budget (no distributed pacing)
- Billing on un-deduped clicks