Academy · Databases & Replication

Indexing

Finding one row in a billion by scanning them all is hopelessly slow.

Open the interactive version → diagrams, practice & more

The problem

Finding one row in a billion by scanning them all is hopelessly slow.

The idea

An index is a sorted data structure that lets the DB jump straight to matching rows.

How it works

A B-tree keeps keys sorted so the planner can seek a row or walk a range in O(log n). Composite indexes obey a left-prefix rule — an index on (a,b) serves filters on a and a+b, never b alone. A covering index includes every column a query touches so the engine answers from the index without visiting the heap. LSM-tree stores (Cassandra, RocksDB) invert the bargain: writes are cheap appends to a memtable, but a read may probe several SSTables, gated by a bloom filter.

The tradeoff

Every index is a second write on each insert/update plus storage, so over-indexing throttles writes and bloats the table. Low-selectivity columns (a boolean, a 3-value status) barely help — the planner skips them for a scan. Index for your real top queries and confirm with EXPLAIN, never by guessing.

In the wild

A missing or unusable index is the #1 cause of a "suddenly slow" query as a table grows past memory.

Interview deep dive

Flow

Read the slow query's EXPLAIN ANALYZE plan.
Add a composite or covering index ordered by selectivity.
Confirm the planner switched to an index-only scan.
Drop any index no query uses; watch write bloat.

Watch for

Left-prefix rule: an index on (a,b) won't serve a filter on b alone.
Random-key inserts cause B-tree page splits and write amplification.
A function on the column (lower(email)) skips the plain index.

Interviewer trap

Name the index type and the exact query it serves, then say which writes it slows.

Related Academy

Part of Academy on SystemLore — system design interview prep with 148 deep topics, interactive diagrams, and a practice game. Practice this one →