System Design Library

IoT Telemetry Ingestion

Ingest sensor data from millions of devices reliably and store it cheaply.

Open the interactive version → diagrams, practice & more

Requirements

Functional

  • Device connect (MQTT)
  • Ingest telemetry
  • Store time-series
  • Commands to devices

Non-functional

  • Massive write volume
  • Lossy-OK for some metrics
  • Device auth

Scale

Millions of devices, high frequency

The approach

Devices connect via MQTT to a broker; telemetry buffered (Kafka) and written to a time-series store; downsampling for retention; a command channel pushes back to devices; per-device auth.

Key components

Devices ⇄ MQTT broker → Kafka → TSDB · command channel

Numbers that matter

Senior deep-dive

Ingestion is a fan-in funnel: millions of devices converge on a fleet of brokers; the back-pressure and protocol choice (MQTT over TCP, not HTTP) are dictated by device constraints and connection economics.

The time-series store and the command channel are separate concerns — reads are aggregations over time, writes are per-device streams; the command path (cloud → device) must survive the device being offline.

Downsampling is not optional at retention scale: storing raw 1-second samples from 1M devices indefinitely is economically infeasible — you must define a tiered retention policy (raw 24h → 1-min rollups 30d → 1-hour rollups forever) from day one.

Protocol choice: MQTT over persistent TCP

IoT devices use MQTT (Message Queuing Telemetry Transport) rather than HTTP because: (1) persistent TCP connection amortizes TLS handshake across thousands of messages, (2) the protocol overhead per message is 2–5 bytes vs HTTP's ~200-byte headers, (3) MQTT QoS levels (0=fire-and-forget, 1=at-least-once, 2=exactly-once) let devices trade delivery guarantee for battery/bandwidth. MQTT brokers (Mosquitto, EMQ X, AWS IoT Core) handle the fan-in of millions of connections and route messages to backend systems.

Ingestion pipeline: broker → Kafka → TSDB

The MQTT broker is stateless for data — it forwards messages to Kafka topics (partitioned by device_id for ordering within a device's stream). Kafka provides the durability buffer and decouples the broker from downstream consumers. Stream processors (Flink, Spark Streaming) consume from Kafka for real-time alerting; a separate batch consumer writes to the TSDB in bulk. Writing to the TSDB in individual-row inserts is fatal to performance — always use bulk batch writes (InfluxDB line protocol, Timestream batch API) to amortize write overhead.

Device shadow: offline command delivery

You cannot guarantee a device is online when you want to send it a command. The device shadow pattern decouples command time from execution time: you write the desired state ("firmware version = 2.3") to a cloud-side shadow store (DynamoDB, Redis). When the device connects, it reads the shadow, compares to its reported state, and executes any delta. The device then writes its reported state back, and the shadow stores both — "desired" and "reported" — so operators can see at a glance which devices have diverged from their target configuration.

Time-series storage and downsampling

Raw samples cannot be retained indefinitely. Define a tiered retention policy upfront: raw samples (1s resolution) for 24–48h, 1-minute rollups for 30 days, 1-hour rollups for 1 year, 1-day rollups forever. Rollup jobs run continuously, reading raw data and writing pre-aggregated summaries (min, max, mean, p95 per window). The Gorilla compression algorithm (delta-of-delta for timestamps, XOR for float values) achieves ~1–2 bytes/sample in hot storage. Old raw data is expired via a TTL policy — never DELETE rows individually, let the storage engine expire whole time-range blocks.

Device authentication and certificate management

Each device authenticates to the broker via mutual TLS — the device presents a client certificate signed by a root CA known to the broker. At 10M devices, issuing and rotating certificates is a PKI pipeline problem: a certificate rotation (e.g., after a CA compromise) requires pushing new certs to all devices over the command channel — if devices are offline, the rotation lags indefinitely. Device groups and fleet policies (AWS IoT Greengrass, Azure DPS) automate provisioning new devices at manufacturing time and revocation of compromised ones.

What breaks at scale

Reconnect storms are the canonical IoT failure: if the broker restarts (or a load balancer blips), millions of devices retry simultaneously with exponential backoff — but if the backoff jitter is insufficient, the retry waves remain correlated and overwhelm the broker repeatedly. The fix is full jitter (uniform random backoff over the full window, not additive). Kafka consumer lag grows when a downstream TSDB is slow — the Kafka topic acts as the backpressure buffer, but if lag grows faster than consumers drain it, you either increase consumer parallelism or accept that real-time alerts become delayed. The hardest long-term failure: device firmware diversity — 10-year-old devices running ancient firmware that speaks deprecated MQTT 3.1 (not 3.1.1 or 5.0) block protocol upgrades and force the broker to maintain backward compatibility indefinitely.

In production

AWS IoT Core, Azure IoT Hub, and Google Cloud IoT Core all use MQTT brokers as the device-facing front-end, backed by Kafka-equivalent message buses (AWS Kinesis, Azure Event Hubs) for durability and fan-out. Downstream, telemetry lands in time-series databases (InfluxDB, TimescaleDB, or managed services like AWS Timestream). The real challenge is device identity and auth at scale: each device must authenticate (usually X.509 mutual TLS or device-specific JWT) and have its identity provisioned before first connection — at 10M devices, the provisioning and certificate rotation pipeline is as complex as the data ingestion pipeline. Offline command delivery (sending a config update to a device that's currently offline) is solved with a device shadow / desired-state store: you write the desired state to the cloud and the device reads it on next connect.

Common mistakes

Related System Design Library

Part of System Design Library on SystemLore — system design interview prep with 148 deep topics, interactive diagrams, and a practice game. Practice this one →