Overview
These are the two dominant storage-engine designs, and the choice shapes a database's whole performance character. A B-tree keeps data in a balanced, in-place-updated tree — excellent for reads and range scans, which is why most SQL databases (Postgres, MySQL/InnoDB) use it. An LSM-tree instead buffers writes in memory, flushes them as immutable sorted files, and merges those files in the background (compaction). That turns random writes into fast sequential ones — great for write-heavy workloads (Cassandra, RocksDB, LevelDB) — but reads may check several files and compaction competes for I/O.
B-Tree vs LSM-Tree: key differences
| B-Tree | LSM-Tree | |
|---|---|---|
| Writes | In-place; random I/O | Sequential appends; high throughput |
| Reads | Fast, predictable (one tree) | May check several files (uses Bloom filters) |
| Write amplification | Lower | Higher (compaction rewrites data) |
| Space | Fragmentation over time | Compresses well; transient compaction overhead |
| Used by | Postgres, MySQL/InnoDB | Cassandra, RocksDB, LevelDB, ScyllaDB |
When to use B-Tree
Read-heavy and mixed workloads, frequent range queries, and transactional databases where predictable read latency matters most.
When to use LSM-Tree
Write-heavy workloads — time-series, event logs, high-ingest pipelines — where sustained write throughput matters more than the lowest possible read latency.
Verdict
Pick a B-tree engine (most SQL databases) for read-heavy, range-query, transactional workloads. Pick an LSM-tree engine when write volume is the bottleneck and you can tolerate compaction overhead and slightly higher read amplification. Bloom filters and good compaction tuning narrow the LSM read penalty considerably.
Common questions
Why are LSM-trees better for writes?
They convert many small random writes into large sequential disk writes (buffer in memory, flush sorted files), which disks and SSDs handle far faster, and they defer the merging work to background compaction.
Do LSM-trees have slower reads?
Potentially — a read may need to check the memtable plus several on-disk files. Bloom filters skip files that definitely lack the key, and compaction reduces file count, so in practice the gap is small for most workloads.