Academy · Databases & Replication

Indexing

Finding one row in a billion by scanning them all is hopelessly slow.

Open the interactive version → diagrams, practice & more

The problem

Finding one row in a billion by scanning them all is hopelessly slow.

The idea

An index is a sorted data structure that lets the DB jump straight to matching rows.

How it works

A B-tree keeps keys sorted so the planner can seek a row or walk a range in O(log n). Composite indexes obey a left-prefix rule — an index on (a,b) serves filters on a and a+b, never b alone. A covering index includes every column a query touches so the engine answers from the index without visiting the heap. LSM-tree stores (Cassandra, RocksDB) invert the bargain: writes are cheap appends to a memtable, but a read may probe several SSTables, gated by a bloom filter.

The tradeoff

Every index is a second write on each insert/update plus storage, so over-indexing throttles writes and bloats the table. Low-selectivity columns (a boolean, a 3-value status) barely help — the planner skips them for a scan. Index for your real top queries and confirm with EXPLAIN, never by guessing.

In the wild

A missing or unusable index is the #1 cause of a "suddenly slow" query as a table grows past memory.

Interview deep dive

Flow

  1. Read the slow query's EXPLAIN ANALYZE plan.
  2. Add a composite or covering index ordered by selectivity.
  3. Confirm the planner switched to an index-only scan.
  4. Drop any index no query uses; watch write bloat.

Watch for

Interviewer trap

Name the index type and the exact query it serves, then say which writes it slows.

Related Academy

Part of Academy on SystemLore — system design interview prep with 148 deep topics, interactive diagrams, and a practice game. Practice this one →