System Design Library

Multiplayer Game Server

Authoritative realtime state for fast-paced multiplayer games.

Requirements

Functional

  • Player actions
  • Authoritative game state
  • State sync to clients
  • Matchmaking

Non-functional

  • Very low latency (<50ms)
  • Cheat-resistant
  • Tick-rate consistency

Scale

Many concurrent match instances

The approach

Authoritative server simulates the game at a fixed tick; clients send inputs (UDP), server resolves and broadcasts state deltas; client-side prediction + reconciliation hides latency; matchmaking spins up per-match instances.
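
The fixed tick is the piece everything else hangs off. Below is a minimal sketch of one way to schedule it, assuming a 60 Hz rate; the names run_ticks and step, and the injectable clock and sleep parameters, are illustrative and not taken from any particular engine.

```python
import time

TICK_RATE = 60                 # assumed ticks per second
TICK_DT = 1.0 / TICK_RATE      # fixed simulation step, in seconds


def run_ticks(step, duration, clock=time.monotonic, sleep=time.sleep):
    """Call step(tick_number, TICK_DT) at a fixed rate for `duration` seconds.

    Returns the number of ticks simulated. `clock` and `sleep` are
    injectable so the loop can be driven by a fake clock in tests.
    """
    start = clock()
    tick = 0
    while True:
        now = clock()
        if now - start >= duration:
            return tick
        # Derive the due time from the tick count rather than adding
        # TICK_DT repeatedly, so rounding error does not accumulate.
        due = start + tick * TICK_DT
        if now >= due:
            # If the server fell behind, this branch runs back-to-back
            # until the simulation has caught up with wall time.
            step(tick, TICK_DT)
            tick += 1
        else:
            sleep(due - now)
```

Every tick receives the same TICK_DT regardless of how long the previous one took, which keeps the simulation deterministic; a monotonic clock is used so wall-clock adjustments cannot make game time jump.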

Key components

Matchmaker → per-match authoritative server (UDP) · state-sync

Senior deep-dive

The authoritative server is the source of truth — clients are just input terminals. Send inputs, not intended state; the server resolves physics and broadcasts deltas.

Client-side prediction hides latency but creates divergence: when a correction arrives, the client resets to the server's state and replays the inputs the server has not yet acknowledged (reconciliation).

UDP is the usual choice for inputs in fast-paced games: a stale movement packet is worthless, so only the freshest one matters. Reliability lives at the application layer for the things that count (pickups, kills).
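
"Broadcasts deltas" can be made concrete with a small sketch. It assumes world state is a dictionary of entity id to field dictionary; make_delta and apply_delta are hypothetical helper names, and real engines use compact binary encodings instead of dictionaries.

```python
_MISSING = object()  # sentinel for "field did not exist before"


def make_delta(prev, curr):
    """Diff two snapshots shaped like {entity_id: {field: value}}.

    Returns (changed, removed): only fields whose value differs from
    `prev`, plus the ids of entities that disappeared.
    """
    changed = {}
    for eid, fields in curr.items():
        old = prev.get(eid, {})
        diff = {k: v for k, v in fields.items() if old.get(k, _MISSING) != v}
        if diff or eid not in prev:
            changed[eid] = diff
    removed = [eid for eid in prev if eid not in curr]
    return changed, removed


def apply_delta(state, changed, removed):
    """Client side: rebuild the new snapshot from the old one plus a delta."""
    gone = set(removed)
    new_state = {eid: dict(f) for eid, f in state.items() if eid not in gone}
    for eid, diff in changed.items():
        new_state.setdefault(eid, {}).update(diff)
    return new_state
```

A delta is only meaningful relative to a baseline the client actually has, so over an unreliable transport the server typically diffs against the last snapshot that client acknowledged.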

Inputs not positions: the canonical server model

Never trust client position. Clients send discrete input events (keys pressed, mouse delta, timestamp); the authoritative server applies them to its simulation and broadcasts world-state deltas. A server that accepts client-reported positions is trivially exploitable (teleporting, speed hacks). Client-side anti-cheat (VAC, EAC) instruments the client process because some cheats send only legal inputs: the server can validate physics, but it cannot tell an aimbot's inputs from a skilled player's, nor detect a wallhack that merely reads data the client already holds.
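
A minimal sketch of what "inputs, not positions" looks like on the server, assuming 2D movement at a fixed maximum speed. InputCmd, PlayerState, MAX_SPEED and apply_input are illustrative names and values, not part of any real engine API.

```python
import math
from dataclasses import dataclass

MAX_SPEED = 5.0        # assumed world units per second
TICK_DT = 1.0 / 60.0   # assumed fixed tick length


@dataclass(frozen=True)
class InputCmd:
    seq: int        # client-assigned, strictly increasing
    move_x: float   # desired direction; each axis expected in [-1, 1]
    move_y: float


@dataclass(frozen=True)
class PlayerState:
    x: float = 0.0
    y: float = 0.0
    last_seq: int = -1


def apply_input(state, cmd):
    """Advance one player by one tick using only the client's *intent*."""
    # Ignore duplicated or replayed commands.
    if cmd.seq <= state.last_seq:
        return state
    mx, my = cmd.move_x, cmd.move_y
    # Treat NaN/inf as "no movement" instead of letting it poison the state.
    if not (math.isfinite(mx) and math.isfinite(my)):
        mx = my = 0.0
    # Clamp the direction to unit length: a tampered client asking for
    # move_x=1000 still moves at MAX_SPEED, never faster.
    length = math.hypot(mx, my)
    if length > 1.0:
        mx, my = mx / length, my / length
    return PlayerState(
        x=state.x + mx * MAX_SPEED * TICK_DT,
        y=state.y + my * MAX_SPEED * TICK_DT,
        last_seq=cmd.seq,
    )
```

Because the client never supplies a position, the worst a modified client can do here is request full-speed movement, which is legal anyway.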

Client-side prediction and reconciliation

To hide 50–150ms RTT, the client speculatively simulates its own inputs immediately, displaying a predicted state. When the server's authoritative state for a tick arrives, the client compares it with what it predicted; on divergence it resets to the server state and replays the buffered inputs the server has not yet processed. The harder problem is other players: their inputs are unknown to the client, so their positions are interpolated between the last two received snapshots, which deliberately renders them slightly in the past to smooth motion.
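
The predict-then-replay cycle can be sketched in a few lines. This assumes one-dimensional movement and a simulate function that client and server share exactly; PredictingClient and its method names are hypothetical.

```python
from collections import deque

SPEED = 5.0        # assumed units per second
DT = 1.0 / 60.0    # assumed fixed tick length


def simulate(x, move):
    """Deterministic movement step, identical on client and server."""
    return x + move * SPEED * DT


class PredictingClient:
    def __init__(self):
        self.x = 0.0
        self.next_seq = 0
        self.pending = deque()   # (seq, move) not yet acknowledged

    def local_input(self, move):
        """Apply the input locally right away and queue it for the server."""
        seq = self.next_seq
        self.next_seq += 1
        self.pending.append((seq, move))
        self.x = simulate(self.x, move)      # prediction
        return seq, move                     # what would go on the wire

    def on_server_state(self, server_x, last_acked_seq):
        """Reconcile: adopt the server state, then replay unacked inputs."""
        while self.pending and self.pending[0][0] <= last_acked_seq:
            self.pending.popleft()
        x = server_x
        for _, move in self.pending:
            x = simulate(x, move)
        self.x = x
```

If the server agrees with the prediction, the replay lands on the same position and the player sees nothing; only a genuine disagreement (a collision the client did not know about, for example) produces a visible correction.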

UDP and custom reliability

TCP's head-of-line blocking is a poor fit for game inputs: one dropped packet stalls everything behind it until the retransmit arrives. UDP with a thin reliability layer (sequence numbers + selective ACK) lets the engine decide per message type: movement inputs and state snapshots are fire-and-forget (old ones are useless), while critical events (kill confirm, item pickup) get app-level retransmit. QUIC, with its unreliable datagram extension (RFC 9221), is gaining traction as a middle ground.
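
One common shape for the thin reliability layer is "latest sequence number plus a bitfield of recent ones" in every packet header. The sketch below assumes a 32-bit bitfield and ignores sequence wraparound; AckTracker and ReliableSender are illustrative names, and no actual socket I/O is shown.

```python
class AckTracker:
    """Receiver side: remembers which sequence numbers have arrived."""

    def __init__(self):
        self.latest = -1
        self.received = set()   # unbounded here; a real one would prune

    def on_packet(self, seq):
        self.received.add(seq)
        self.latest = max(self.latest, seq)

    def ack_header(self):
        """Return (latest, bits); bit i set means latest-1-i was received."""
        bits = 0
        for i in range(32):
            if (self.latest - 1 - i) in self.received:
                bits |= 1 << i
        return self.latest, bits


class ReliableSender:
    """Sender side: only messages flagged reliable are tracked for resend."""

    def __init__(self):
        self.next_seq = 0
        self.unacked = {}   # seq -> payload

    def send(self, payload, reliable):
        seq = self.next_seq
        self.next_seq += 1
        if reliable:
            self.unacked[seq] = payload
        return seq, payload

    def on_ack(self, latest, bits):
        acked = {latest}
        for i in range(32):
            if (bits >> i) & 1:
                acked.add(latest - 1 - i)
        for seq in acked:
            self.unacked.pop(seq, None)

    def to_retransmit(self):
        """Reliable payloads still unconfirmed; resend these after a timeout."""
        return sorted(self.unacked.items())
```

A lost movement packet is simply never mentioned again, while a lost kill event stays in unacked until some later ack covers it. Because each ack header repeats the status of the previous 32 packets, a lost ack packet rarely matters.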

Lag compensation: rewinding history to be fair

A 100ms-latency player fires at an enemy who has already moved server-side. Server-side rewind stores a ring buffer of the last ~500ms of entity positions and rolls hitboxes back to the world state the shooter saw when firing before hit-testing. This makes kills feel responsive but can surprise the receiver of a bullet: they can die after reaching cover, because the hit was resolved against a position they had already left. This tradeoff is deliberately tuned per game genre.
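
A minimal sketch of the rewind, assuming 2D point targets with a circular hit radius and a history keyed by server time in milliseconds. PositionHistory and resolve_shot are hypothetical names; real implementations also account for the client's interpolation delay and interpolate between stored frames.

```python
import math
from collections import deque


class PositionHistory:
    """Rolling window of recent world positions, oldest frame first."""

    def __init__(self, window_ms=500):
        self.window_ms = window_ms
        self.frames = deque()   # (server_time_ms, {entity_id: (x, y)})

    def record(self, now_ms, positions):
        self.frames.append((now_ms, dict(positions)))
        while self.frames and self.frames[0][0] < now_ms - self.window_ms:
            self.frames.popleft()

    def rewind(self, target_ms):
        """Latest frame at or before target_ms (oldest frame if too old)."""
        best = self.frames[0]
        for frame in self.frames:
            if frame[0] <= target_ms:
                best = frame
            else:
                break
        return best[1]


def resolve_shot(history, now_ms, shooter_latency_ms, aim_point, target_id,
                 hit_radius=0.5):
    """Hit-test against where the target was when the shooter fired."""
    # Cap the rewind so a client cannot claim arbitrarily high latency.
    rewind_ms = min(max(shooter_latency_ms, 0), history.window_ms)
    past = history.rewind(now_ms - rewind_ms)
    if target_id not in past:
        return False
    tx, ty = past[target_id]
    return math.hypot(aim_point[0] - tx, aim_point[1] - ty) <= hit_radius
```

The cap on rewind_ms is where the per-genre tuning mentioned above tends to live: a larger window favors high-ping shooters, a smaller one favors the player being shot at.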

Matchmaking, lobbies and server provisioning

Matchmaking is a multi-constraint optimization (ping, skill bracket, queue time), usually handled by relaxing the constraints the longer a player has waited in queue. Once a match is formed, a dedicated server process (often a container or a VM) is spun up in the nearest region; Agones on Kubernetes is common for this. Session stickiness is critical: all players in a match must hit the same process, so the matchmaker hands out a direct IP:port.
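
The "widen after waiting" idea can be sketched as a skill window that grows with queue time. All constants and names here (Ticket, skill_window, find_match) are assumptions for illustration, and ping is left out to keep the example short.

```python
from dataclasses import dataclass

BASE_SKILL_WINDOW = 50     # assumed starting tolerance, in rating points
WIDEN_PER_SECOND = 10      # assumed growth per second spent in queue
MAX_SKILL_WINDOW = 500     # assumed hard ceiling


@dataclass
class Ticket:
    player: str
    skill: int
    enqueued_at: float   # seconds, same clock as `now` below


def skill_window(ticket, now):
    waited = max(0.0, now - ticket.enqueued_at)
    return min(MAX_SKILL_WINDOW, BASE_SKILL_WINDOW + WIDEN_PER_SECOND * waited)


def compatible(a, b, now):
    # Both sides must tolerate the gap, so the tighter window wins.
    limit = min(skill_window(a, now), skill_window(b, now))
    return abs(a.skill - b.skill) <= limit


def find_match(queue, now, size=2):
    """Greedy grouping, oldest ticket first. Returns a list or None."""
    for anchor in sorted(queue, key=lambda t: t.enqueued_at):
        group = [anchor]
        for other in queue:
            if other is anchor:
                continue
            if all(compatible(other, member, now) for member in group):
                group.append(other)
                if len(group) == size:
                    return group
    return None
```

With these numbers, two players 200 rating points apart are refused at first and matched once both have waited 15 seconds. Greedy grouping is not optimal, but it is easy to reason about and cheap to run every few hundred milliseconds.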

What breaks at scale

Hot-shard problem: one viral streamer's server is one process on one box, and a single game world does not scale horizontally just by adding machines. The usual mitigation is spatial partitioning (split the world into zones, run each on a separate server, stitch at boundaries), but cross-zone interaction (a bullet crossing a zone line) requires tight coordination. Clock skew across the game server fleet corrupts replay files and anti-cheat forensics: keep wall clocks tightly synchronized (NTP, or PTP where sub-millisecond accuracy is required) and use monotonic clocks for game time.
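
A sketch of the zone bookkeeping, assuming a uniform square grid and a fixed border margin inside which neighboring zone servers must also be told about an entity. ZONE_SIZE, BORDER, zone_of and zones_to_notify are illustrative; production systems often use uneven or dynamically split zones.

```python
import math

ZONE_SIZE = 100.0   # assumed zone edge length, in world units
BORDER = 10.0       # assumed margin within which neighbors are informed


def zone_of(x, y):
    """Grid coordinates of the zone server that owns this point."""
    return (math.floor(x / ZONE_SIZE), math.floor(y / ZONE_SIZE))


def zones_to_notify(x, y):
    """Owning zone plus every neighbor whose edge is within BORDER."""
    zx, zy = zone_of(x, y)
    local_x = x - zx * ZONE_SIZE
    local_y = y - zy * ZONE_SIZE
    dxs = [0]
    if local_x < BORDER:
        dxs.append(-1)
    if local_x > ZONE_SIZE - BORDER:
        dxs.append(1)
    dys = [0]
    if local_y < BORDER:
        dys.append(-1)
    if local_y > ZONE_SIZE - BORDER:
        dys.append(1)
    return {(zx + dx, zy + dy) for dx in dxs for dy in dys}
```

An entity in the middle of a zone involves one server; one near a corner involves four. That fan-out at the seams is the coordination cost described above, and it is why zone boundaries are usually placed where players rarely fight.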

In production

Valve's engines (GoldSrc, then Source) popularized the lag-compensation model still used by CS2 and TF2: the server rewinds world state to what the shooter saw before resolving hits, so a player with 80ms ping does not have to lead targets to compensate. Unreal Engine's networking uses property replication: only changed actor properties are sent, not full snapshots. The real operational challenge is hot-spot machines: a single popular game world saturates one box. Match-based shooters such as Riot's Valorant largely sidestep this by running each match as its own server process and placing players by region. Epic's large-scale battle royales use relevancy (zone-of-interest) culling so each client only receives updates for entities that matter to it; without this, a 100-player lobby would need on the order of n² entity updates per tick (100 clients × 99 other players).
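
The effect of interest culling on update volume can be shown with a small sketch. It assumes a plain distance check in 2D; visible_entities and updates_per_tick are hypothetical helpers, and real engines use spatial grids and line-of-sight rules instead of a brute-force scan.

```python
def visible_entities(viewer_pos, entities, radius):
    """Ids of entities within `radius` of the viewer (viewer included)."""
    vx, vy = viewer_pos
    r2 = radius * radius
    return [
        eid for eid, (x, y) in entities.items()
        if (x - vx) ** 2 + (y - vy) ** 2 <= r2
    ]


def updates_per_tick(entities, radius=None):
    """Count (client, other entity) update pairs for one tick.

    radius=None models "send everything to everyone".
    """
    total = 0
    for eid, pos in entities.items():
        if radius is None:
            total += len(entities) - 1
        else:
            # Subtract 1 so a player is not counted as its own update.
            total += len(visible_entities(pos, entities, radius)) - 1
    return total


# 100 players standing in a line, 10 units apart.
players = {i: (i * 10, 0) for i in range(100)}
everyone = updates_per_tick(players)           # 100 * 99 = 9900
nearby = updates_per_tick(players, radius=25)  # 394 for this layout
```

In this contrived layout each player sees at most four neighbors, so culling cuts the per-tick fan-out from 9,900 pairs to 394. Real maps cluster players far more unevenly, but the cost still tracks local density instead of lobby size.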

Part of System Design Library on SystemLore.