B-Tree vs LSM-Tree — Key Differences & When to Use Each

Overview

These are the two dominant storage-engine designs, and the choice shapes a database's whole performance character. A B-tree keeps data in a balanced, in-place-updated tree — excellent for reads and range scans, which is why most SQL databases (Postgres, MySQL/InnoDB) use it. An LSM-tree instead buffers writes in memory, flushes them as immutable sorted files, and merges those files in the background (compaction). That turns random writes into fast sequential ones — great for write-heavy workloads (Cassandra, RocksDB, LevelDB) — but reads may check several files and compaction competes for I/O.

B-Tree vs LSM-Tree: key differences

	B-Tree	LSM-Tree
Writes	In-place; random I/O	Sequential appends; high throughput
Reads	Fast, predictable (one tree)	May check several files (uses Bloom filters)
Write amplification	Lower	Higher (compaction rewrites data)
Space	Fragmentation over time	Compresses well; transient compaction overhead
Used by	Postgres, MySQL/InnoDB	Cassandra, RocksDB, LevelDB, ScyllaDB

When to use B-Tree

Read-heavy and mixed workloads, frequent range queries, and transactional databases where predictable read latency matters most.

When to use LSM-Tree

Write-heavy workloads — time-series, event logs, high-ingest pipelines — where sustained write throughput matters more than the lowest possible read latency.

Verdict

Pick a B-tree engine (most SQL databases) for read-heavy, range-query, transactional workloads. Pick an LSM-tree engine when write volume is the bottleneck and you can tolerate compaction overhead and slightly higher read amplification. Bloom filters and good compaction tuning narrow the LSM read penalty considerably.

Common questions

Why are LSM-trees better for writes?

They convert many small random writes into large sequential disk writes (buffer in memory, flush sorted files), which disks and SSDs handle far faster, and they defer the merging work to background compaction.

Do LSM-trees have slower reads?

Potentially — a read may need to check the memtable plus several on-disk files. Bloom filters skip files that definitely lack the key, and compaction reduces file count, so in practice the gap is small for most workloads.

Part of Comparisons on SystemLore — system design explained with 148 deep topics, interactive diagrams, and a build-it-yourself game. Browse the glossary and "X vs Y" comparisons, or build this one →