Indexing
Finding one row in a billion by scanning them all is hopelessly slow.
Open the interactive version → diagrams, practice & moreThe problem
Finding one row in a billion by scanning them all is hopelessly slow.
The idea
An index is a sorted data structure that lets the DB jump straight to matching rows.
How it works
A B-tree keeps keys sorted so the planner can seek a row or walk a range in O(log n). Composite indexes obey a left-prefix rule — an index on (a,b) serves filters on a and a+b, never b alone. A covering index includes every column a query touches so the engine answers from the index without visiting the heap. LSM-tree stores (Cassandra, RocksDB) invert the bargain: writes are cheap appends to a memtable, but a read may probe several SSTables, gated by a bloom filter.
The tradeoff
Every index is a second write on each insert/update plus storage, so over-indexing throttles writes and bloats the table. Low-selectivity columns (a boolean, a 3-value status) barely help — the planner skips them for a scan. Index for your real top queries and confirm with EXPLAIN, never by guessing.
In the wild
A missing or unusable index is the #1 cause of a "suddenly slow" query as a table grows past memory.
Interview deep dive
Flow
- Read the slow query's EXPLAIN ANALYZE plan.
- Add a composite or covering index ordered by selectivity.
- Confirm the planner switched to an index-only scan.
- Drop any index no query uses; watch write bloat.
Watch for
- Left-prefix rule: an index on (a,b) won't serve a filter on b alone.
- Random-key inserts cause B-tree page splits and write amplification.
- A function on the column (lower(email)) skips the plain index.
Interviewer trap
Name the index type and the exact query it serves, then say which writes it slows.