Gmail / Email
Store, search, send and receive email at billions-of-mailboxes scale.
Requirements
Functional
- Send/receive (SMTP)
- Mailbox storage
- Search
- Spam filtering
- Labels/threads
Non-functional
- Durable
- Fast search
- Reliable delivery
Scale
Billions of mailboxes
The approach
Sharded mailbox storage (by user); inbound via SMTP → spam/virus pipeline → mailbox; outbound queued with retries; every message indexed for search; threads built from References/In-Reply-To headers.
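The "outbound queued with retries" step can be sketched as a time-ordered retry queue. This is a minimal single-process sketch under stated assumptions: the class and constant names are hypothetical, the backoff schedule (doubling from 60 s, capped at 4 hours) is illustrative, and the 5-day give-up window follows RFC 5321's guidance that a sender should keep retrying for at least 4-5 days before bouncing.

```python
import heapq
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical, illustrative retry policy.
BASE_DELAY_S = 60                  # first retry after 1 minute
MAX_DELAY_S = 4 * 3600             # never wait more than 4 hours between attempts
GIVE_UP_AFTER_S = 5 * 24 * 3600    # bounce after 5 days of failures


def next_retry_delay(attempt: int) -> int:
    """Exponential backoff: 60 s, 120 s, 240 s, ... capped at MAX_DELAY_S."""
    return min(BASE_DELAY_S * 2 ** attempt, MAX_DELAY_S)


@dataclass(order=True)
class OutboundMessage:
    next_attempt_at: float                        # the heap orders on this field only
    msg_id: str = field(compare=False)
    first_queued_at: float = field(compare=False)
    attempts: int = field(default=0, compare=False)


class OutboundQueue:
    def __init__(self) -> None:
        self._heap = []                # OutboundMessage items, earliest retry first
        self.delivered = []
        self.bounced = []              # a real system would send a bounce (DSN) to the sender

    def enqueue(self, msg_id: str, now: float) -> None:
        heapq.heappush(self._heap, OutboundMessage(now, msg_id, now))

    def run_due(self, now: float, try_send: Callable[[str], bool]) -> None:
        """Attempt every message whose next retry time has arrived."""
        while self._heap and self._heap[0].next_attempt_at <= now:
            msg = heapq.heappop(self._heap)
            if try_send(msg.msg_id):
                self.delivered.append(msg.msg_id)
            elif now - msg.first_queued_at >= GIVE_UP_AFTER_S:
                self.bounced.append(msg.msg_id)
            else:
                # Keep the message and reschedule it: delivery is not fire-and-forget.
                msg.next_attempt_at = now + next_retry_delay(msg.attempts)
                msg.attempts += 1
                heapq.heappush(self._heap, msg)
```

Because a failed message goes back onto the heap instead of being dropped, a transient remote outage delays mail rather than losing it.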
Key components
SMTP in → spam pipeline → mailbox store (sharded) + search index · outbound queue
Numbers that matter
- Gmail serves well over a billion mailboxes (Google reported 1.5 billion active users in 2018); each user's free 15 GB is shared across Gmail, Drive, and Google Photos, so every write needs per-user quota accounting.
- The inbound SMTP pipeline adds roughly 100-300 ms of processing latency for spam classification before delivery; borderline messages are delivered first and reclassified asynchronously.
- Thread detection uses the References and In-Reply-To headers; broken mail clients that drop or mangle these headers affect roughly 1-2% of threads, which then need heuristic subject-line fallback grouping.
- Gmail's search index ingests message bodies with custom tokenizers (email addresses, phone numbers, quote stripping) and targets <1 second search latency for a mailbox of millions of messages.
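The per-user quota accounting can be sketched as a check-then-record step on every write against one limit shared by all products. Assumptions: UserQuota, charge, and release are hypothetical names, and "15 GB" is read as 15 × 10^9 bytes for illustration. In a real deployment the check and the increment would have to be atomic (for example, a transactional counter), because writes from different products for the same user can race; this single-process sketch ignores concurrency.

```python
from dataclasses import dataclass, field

# Assumption: "15 GB" read as 15 * 10**9 bytes; the exact unit is illustrative.
FREE_TIER_BYTES = 15 * 10**9


class QuotaExceeded(Exception):
    pass


@dataclass
class UserQuota:
    """One quota shared by every product that stores data for this user."""
    limit_bytes: int = FREE_TIER_BYTES
    used_by_product: dict[str, int] = field(default_factory=dict)

    @property
    def used_bytes(self) -> int:
        return sum(self.used_by_product.values())

    def charge(self, product: str, size_bytes: int) -> None:
        """Check and record a write; reject it if combined usage would exceed the limit."""
        if self.used_bytes + size_bytes > self.limit_bytes:
            raise QuotaExceeded(f"{product}: write of {size_bytes} bytes exceeds quota")
        self.used_by_product[product] = self.used_by_product.get(product, 0) + size_bytes

    def release(self, product: str, size_bytes: int) -> None:
        """Credit space back when data is deleted."""
        current = self.used_by_product.get(product, 0)
        self.used_by_product[product] = max(0, current - size_bytes)
```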
Senior deep-dive
Mailbox sharding by user is the foundation: all of a user's mail lives on one shard set, so thread assembly, search, and IMAP are purely local operations with no cross-shard joins.
The inbound pipeline (SMTP → spam/virus → delivery) is a multi-stage asynchronous pipeline in which each stage can reject, quarantine, or transform the message. Because the stages are decoupled, each can be scaled and updated independently without affecting delivery throughput.
Search is powered by a per-user inverted index rather than one shared global index. This gives per-user isolation (no results leaking between mailboxes) and lets Gmail return results quickly because the index is co-located with the mailbox.
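The stage contract can be sketched as follows: each stage returns a verdict plus a possibly transformed message, and the runner stops at the first reject or quarantine. Assumptions: Verdict, Message, run_pipeline, and the three example stages are hypothetical placeholders, and here they run in-process and synchronously; in the design described above each stage would be its own service fed by a queue so it can be scaled and redeployed on its own.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Callable


class Verdict(Enum):
    CONTINUE = "continue"
    REJECT = "reject"          # refuse the message outright
    QUARANTINE = "quarantine"  # hold it away from the inbox


@dataclass(frozen=True)
class Message:
    sender: str
    body: str
    headers: tuple[tuple[str, str], ...] = ()


# A stage inspects a message and returns a verdict plus a (possibly transformed) message.
Stage = Callable[[Message], tuple[Verdict, Message]]


def run_pipeline(msg: Message, stages: list) -> tuple[str, Message]:
    for stage in stages:
        verdict, msg = stage(msg)
        if verdict is Verdict.REJECT:
            return "rejected", msg
        if verdict is Verdict.QUARANTINE:
            return "quarantined", msg
    return "delivered", msg


# Hypothetical example stages.
BLOCKED_SENDERS = {"bad@example.net"}


def reputation_stage(msg: Message) -> tuple[Verdict, Message]:
    return (Verdict.REJECT if msg.sender in BLOCKED_SENDERS else Verdict.CONTINUE), msg


def virus_stage(msg: Message) -> tuple[Verdict, Message]:
    # Crude stand-in for a real virus scanner (hypothetical check).
    return (Verdict.QUARANTINE if "EICAR" in msg.body else Verdict.CONTINUE), msg


def tagging_stage(msg: Message) -> tuple[Verdict, Message]:
    # Transform: record that the message passed scanning.
    return Verdict.CONTINUE, replace(msg, headers=msg.headers + (("X-Scanned", "yes"),))
```

Adding or replacing a stage only changes the list passed to run_pipeline; no other stage needs to know about it.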
Sharding strategy: user is the partition key
All of a user's messages, labels, and thread metadata are co-located on one shard. This makes thread assembly, label filtering, and IMAP folder traversal local — no distributed joins. The risk is a hot shard if a single user has a multi-GB mailbox; Gmail handles this with per-user data routing that can migrate heavy mailboxes to under-loaded shard groups.
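Using the user as the partition key, plus an escape hatch for migrated heavy mailboxes, can be sketched as a small router. Assumptions: ShardRouter, the shard-group count, and the override map are hypothetical. A stable cryptographic hash is used because Python's built-in hash() for strings is randomized per process and would route the same user differently after a restart.

```python
import hashlib

NUM_SHARD_GROUPS = 1024  # hypothetical


class ShardRouter:
    """Routes every operation for a user to exactly one shard group.

    Default placement is a stable hash of the user ID; an override map
    records users whose mailboxes were migrated off a hot shard group.
    """

    def __init__(self, num_groups: int = NUM_SHARD_GROUPS) -> None:
        self.num_groups = num_groups
        self.overrides: dict[str, int] = {}

    def default_group(self, user_id: str) -> int:
        digest = hashlib.sha256(user_id.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.num_groups

    def group_for(self, user_id: str) -> int:
        return self.overrides.get(user_id, self.default_group(user_id))

    def migrate(self, user_id: str, target_group: int) -> None:
        # In practice: copy the data, catch up on new writes, then flip this entry.
        self.overrides[user_id] = target_group
```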
Threading: harder than it looks
Gmail threads on References/In-Reply-To headers first, falling back to normalized subject for replies from broken clients. The thread ID is assigned at first-message ingestion; subsequent messages are linked by header lookup in the thread index. Mailing lists break threading because they munge headers — Gmail has special-cased dozens of list server behaviors. A thread can span thousands of messages (legal hold mailboxes, mailing list subscriptions), so thread-level operations need pagination.
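The header-first, subject-fallback rule can be sketched as follows. Assumptions: ThreadIndex and its fields are hypothetical, the prefix pattern covers only "Re:" and "Fwd:"/"Fw:" variants, and there is no mailing-list special-casing or pagination.

```python
import re
from dataclasses import dataclass, field
from itertools import count
from typing import Optional, Sequence

# Leading reply/forward prefixes such as "Re:", "RE: Re:", "Fwd:", "Fw:".
_REPLY_PREFIX = re.compile(r"^\s*((re|fwd?)\s*:\s*)+", re.IGNORECASE)


def normalize_subject(subject: str) -> str:
    return _REPLY_PREFIX.sub("", subject).strip().lower()


@dataclass
class ThreadIndex:
    by_message_id: dict[str, int] = field(default_factory=dict)  # Message-ID -> thread ID
    by_subject: dict[str, int] = field(default_factory=dict)     # normalized subject -> thread ID
    _next_id: count = field(default_factory=count)

    def assign(self, message_id: str, subject: str,
               in_reply_to: Optional[str] = None,
               references: Sequence[str] = ()) -> int:
        # 1. Header lookup: check In-Reply-To first, then References from newest
        #    to oldest; the first Message-ID already in the index wins.
        parents = list(references) + ([in_reply_to] if in_reply_to else [])
        thread_id = next((self.by_message_id[p] for p in reversed(parents)
                          if p in self.by_message_id), None)
        # 2. Fallback for clients that drop headers: replies only, matched on subject.
        norm = normalize_subject(subject)
        is_reply = norm != subject.strip().lower()
        if thread_id is None and is_reply:
            thread_id = self.by_subject.get(norm)
        # 3. Otherwise the message starts a new thread; its ID is fixed from now on.
        if thread_id is None:
            thread_id = next(self._next_id)
        self.by_message_id[message_id] = thread_id
        self.by_subject.setdefault(norm, thread_id)
        return thread_id
```

Restricting the subject fallback to messages that look like replies keeps two unrelated emails that happen to share a subject from being merged.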
Spam pipeline: the real-time ML challenge
Every inbound message runs through a multi-stage classifier cascade: IP/domain reputation (fastest, cheapest), heuristic rules, then ML models scoring content and sender history. Sender reputation is the highest-signal feature: a first-time sender from a new domain is far more likely to be spam. The pipeline must finish within a tight budget (under about 500 ms) to keep delivery fast and ingest throughput high; SMTP's own protocol timeouts are measured in minutes (RFC 5321), so the constraint is latency and throughput rather than the connection itself. That budget forces a tiered approach where uncertain messages are delivered optimistically and reclassified asynchronously.
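The cascade and its optimistic-delivery band can be sketched as below: cheap checks run first and short-circuit, and only the uncertain middle band is queued for asynchronous reclassification. Assumptions: the thresholds, the Inbound type, and the scoring callables are hypothetical, and rule and model scores are simply compared against thresholds rather than combined.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical thresholds on a 0..1 spam score.
SPAM_AT_OR_ABOVE = 0.95
INBOX_BELOW = 0.30


@dataclass
class Inbound:
    sender_ip: str
    sender_domain: str
    body: str


def classify(msg: Inbound,
             ip_blocklist: set[str],
             rule_score: Callable[[Inbound], float],
             ml_score: Callable[[Inbound], float],
             reclassify_queue: list) -> str:
    # Tier 1: IP reputation lookup (cheapest); can refuse during the SMTP session.
    if msg.sender_ip in ip_blocklist:
        return "reject"
    # Tier 2: heuristic rules; a confident hit skips the expensive model.
    if rule_score(msg) >= SPAM_AT_OR_ABOVE:
        return "spam"
    # Tier 3: ML model over content and sender history (most expensive).
    score = ml_score(msg)
    if score >= SPAM_AT_OR_ABOVE:
        return "spam"
    if score >= INBOX_BELOW:
        # Uncertain: deliver optimistically and reclassify asynchronously.
        reclassify_queue.append(msg)
    return "inbox"
```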
Labels as the data model: not folders
Gmail stores messages once and attaches a set of label IDs per user-message pair — a message can have INBOX, STARRED, and a custom label simultaneously. This is a many-to-many relationship stored in an index table, not copies in folders. IMAP projection maps labels to folder paths, but mutations via IMAP (move = remove INBOX label, add label) must be translated, and concurrent IMAP + web edits require careful conflict resolution.
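The many-to-many label model and the IMAP move translation can be sketched with two indexes, one from message to labels and one from label to messages, so a folder listing is an index lookup rather than a scan. Assumptions: LabelStore and its methods are hypothetical, each IMAP folder name is taken to equal its label name, and concurrent IMAP/web edits are not reconciled.

```python
from collections import defaultdict


class LabelStore:
    """Labels as a many-to-many index; the message body is stored once, elsewhere."""

    def __init__(self) -> None:
        self.by_message = defaultdict(set)  # (user, msg_id) -> {label, ...}
        self.by_label = defaultdict(set)    # (user, label) -> {msg_id, ...}

    def add(self, user: str, msg: str, label: str) -> None:
        self.by_message[(user, msg)].add(label)
        self.by_label[(user, label)].add(msg)

    def remove(self, user: str, msg: str, label: str) -> None:
        self.by_message[(user, msg)].discard(label)
        self.by_label[(user, label)].discard(msg)

    def labels_of(self, user: str, msg: str) -> set:
        return set(self.by_message[(user, msg)])

    def folder_view(self, user: str, label: str) -> list:
        """IMAP projection: a 'folder' is the set of messages carrying a label."""
        return sorted(self.by_label[(user, label)])

    def imap_move(self, user: str, msg: str, src_folder: str, dst_folder: str) -> None:
        """IMAP move becomes a label swap; no message data is copied."""
        self.remove(user, msg, src_folder)
        self.add(user, msg, dst_folder)
```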
Search: per-user inverted index
Each mailbox has its own inverted index co-located with mailbox data — query results are not mixed across users. The index strips quoted text, expands abbreviations, and normalizes email addresses. The hard part is index freshness: a message delivered seconds ago must be searchable immediately, so the index is updated synchronously at delivery time, adding latency to the ingest path.
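A toy per-user index illustrates both the isolation and the synchronous update at delivery. Assumptions: MailboxIndex, tokenize, and strip_quoted are hypothetical names, quoted text is detected only by a leading ">", address normalization is reduced to lowercasing and indexing each address both whole and by its parts, and abbreviation expansion is omitted.

```python
import re
from collections import defaultdict

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
WORD_RE = re.compile(r"\w+")


def strip_quoted(body: str) -> str:
    """Drop quoted reply lines (starting with '>') so old text is not re-indexed."""
    return "\n".join(line for line in body.splitlines() if not line.lstrip().startswith(">"))


def tokenize(text: str) -> set:
    text = text.lower()
    tokens = set(EMAIL_RE.findall(text))  # whole addresses as single tokens
    tokens.update(WORD_RE.findall(text))  # plus address parts and ordinary words
    return tokens


class MailboxIndex:
    """One index per mailbox: postings never mix users."""

    def __init__(self) -> None:
        self.postings = defaultdict(set)  # token -> {msg_id, ...}

    def index_message(self, msg_id: str, sender: str, body: str) -> None:
        # Called synchronously at delivery so the message is searchable immediately.
        for tok in tokenize(sender) | tokenize(strip_quoted(body)):
            self.postings[tok].add(msg_id)

    def search(self, query: str) -> set:
        """AND semantics: return messages containing every query token."""
        terms = tokenize(query)
        if not terms:
            return set()
        return set.intersection(*(self.postings.get(t, set()) for t in terms))


indexes = defaultdict(MailboxIndex)  # user_id -> that user's private index
```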
What breaks at scale
Attachment storms — a user who receives thousands of emails with large attachments can saturate their shard's disk I/O quota, degrading neighbors on the same physical host. IMAP clients doing bulk operations (mass-delete, folder sync on a 1M-message mailbox) generate write amplification that can overwhelm the label index. Spam model drift during major world events (elections, pandemics) means freshly trained models must be hot-swapped mid-stream without a delivery outage.
In production
Gmail sits on Google's shared storage stack: Colossus (the successor to GFS) holds mailbox data, and a Bigtable-backed metadata layer tracks message flags, labels, and thread membership. The hardest operational problem is spam classification at ingestion: Gmail runs both heuristic rules and ML models (including sender reputation, IP reputation, and content signals) in under a second per message, and false positives (legitimate mail in spam) are far more damaging to user trust than false negatives. IMAP compatibility is a perpetual tax: Gmail's label model must be projected onto IMAP folders, and clients that bulk-delete via IMAP cause enormous write amplification on the label index.
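The asymmetry between false positives and false negatives translates directly into a decision threshold. As a worked sketch (the cost ratio is hypothetical, and this standard expected-cost rule is one way to set a threshold, not necessarily Gmail's), let p be the model's estimated spam probability, C_FP the cost of filing legitimate mail as spam, and C_FN the cost of letting spam into the inbox:

```latex
\begin{aligned}
\mathbb{E}[\text{cost} \mid \text{spam folder}] &= (1-p)\,C_{FP}, \qquad
\mathbb{E}[\text{cost} \mid \text{inbox}] = p\,C_{FN} \\
\text{file as spam} &\iff (1-p)\,C_{FP} < p\,C_{FN} \iff p > \frac{C_{FP}}{C_{FP} + C_{FN}} \\
C_{FP} = 10\,C_{FN} &\;\Rightarrow\; p > \tfrac{10}{11} \approx 0.91
\end{aligned}
```

The more a false positive costs relative to a false negative, the closer the threshold moves to 1, which is why borderline mail tends to land in the inbox.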
Common mistakes
- No per-user sharding
- Treating delivery as fire-and-forget
- Linear scan instead of an index for search