Comparisons

B-Tree vs LSM-Tree

B-trees update data in place and are read-optimized (the classic SQL index); LSM-trees batch writes into sorted files and compact them, optimizing for high write throughput.

2 min read·8 sections
Open the interactive version → diagrams, practice & more

Overview

These are the two dominant storage-engine designs, and the choice shapes a database's whole performance character. A B-tree keeps data in a balanced, in-place-updated tree — excellent for reads and range scans, which is why most SQL databases (Postgres, MySQL/InnoDB) use it. An LSM-tree instead buffers writes in memory, flushes them as immutable sorted files, and merges those files in the background (compaction). That turns random writes into fast sequential ones — great for write-heavy workloads (Cassandra, RocksDB, LevelDB) — but reads may check several files and compaction competes for I/O.

B-Tree vs LSM-Tree: key differences

B-TreeLSM-Tree
WritesIn-place; random I/OSequential appends; high throughput
ReadsFast, predictable (one tree)May check several files (uses Bloom filters)
Write amplificationLowerHigher (compaction rewrites data)
SpaceFragmentation over timeCompresses well; transient compaction overhead
Used byPostgres, MySQL/InnoDBCassandra, RocksDB, LevelDB, ScyllaDB

When to use B-Tree

Read-heavy and mixed workloads, frequent range queries, and transactional databases where predictable read latency matters most.

When to use LSM-Tree

Write-heavy workloads — time-series, event logs, high-ingest pipelines — where sustained write throughput matters more than the lowest possible read latency.

Verdict

Pick a B-tree engine (most SQL databases) for read-heavy, range-query, transactional workloads. Pick an LSM-tree engine when write volume is the bottleneck and you can tolerate compaction overhead and slightly higher read amplification. Bloom filters and good compaction tuning narrow the LSM read penalty considerably.

Common questions

Why are LSM-trees better for writes?

They convert many small random writes into large sequential disk writes (buffer in memory, flush sorted files), which disks and SSDs handle far faster, and they defer the merging work to background compaction.

Do LSM-trees have slower reads?

Potentially — a read may need to check the memtable plus several on-disk files. Bloom filters skip files that definitely lack the key, and compaction reduces file count, so in practice the gap is small for most workloads.

Part of Comparisons on SystemLore — system design explained with 148 deep topics, interactive diagrams, and a build-it-yourself game. Browse the glossary and "X vs Y" comparisons, or build this one →