Glossary

SimHash

A hashing technique where similar documents get similar hashes.

Definition

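SimHash is a locality-sensitive hashing technique: it maps a document to a short, fixed-length fingerprint (commonly 64 bits) so that similar documents get fingerprints that differ in only a few bits. This is the opposite of a cryptographic hash, where a one-character change produces an unrelated output.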

How it works

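The document is split into features, such as words or short word sequences, and each feature is hashed. For every bit position, the feature hashes vote: a 1 bit adds to that position's counter and a 0 bit subtracts from it, optionally weighted by how important the feature is. The fingerprint has a 1 wherever the counter ends up positive. Two documents are then compared by the Hamming distance between their fingerprints, meaning the number of bit positions in which they differ. A small distance suggests near-duplicates, which lets a crawler detect near-duplicate pages (not just exact matches) cheaply.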
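The voting scheme can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the whitespace tokenizer, the equal weight per token, the use of MD5 as a stable token hash, and the names simhash and hamming_distance are all assumptions made for the example.

```python
import hashlib

BITS = 64  # fingerprint width assumed for this sketch


def token_hash(token: str) -> int:
    """Stable 64-bit hash of a token (MD5 is used for stability, not security)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def simhash(text: str) -> int:
    """Return a 64-bit SimHash fingerprint of whitespace-separated tokens."""
    votes = [0] * BITS
    for token in text.lower().split():
        h = token_hash(token)
        for i in range(BITS):
            # Each token votes +1 for a set bit and -1 for a clear bit.
            votes[i] += 1 if (h >> i) & 1 else -1

    fingerprint = 0
    for i, count in enumerate(votes):
        if count > 0:  # majority of tokens had this bit set
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")
```

Because every token carries equal weight and only vote totals matter, word order has no effect on the fingerprint in this sketch. Hashing overlapping word sequences (shingles) instead of single words is a common way to make the fingerprint sensitive to order.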

Common questions

What is SimHash used for in system design?

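It lets a crawler detect near-duplicate pages (not just exact matches) cheaply: instead of comparing full page contents, the crawler compares short fingerprints and treats pages whose fingerprints differ in only a few bits as near-duplicates.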
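As a rough sketch of that crawler check, the filter below takes fingerprints that have already been computed as integers and rejects any page within a few bits of one seen before. The class name NearDuplicateFilter, the default threshold of 3 bits, and the linear scan over stored fingerprints are assumptions for illustration. A real crawler would need an index rather than a scan, and a threshold tuned to its own data.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


class NearDuplicateFilter:
    """Remembers fingerprints and flags pages close to one already seen."""

    def __init__(self, max_distance=3):
        # Hypothetical threshold: fingerprints at most this many bits apart
        # are treated as near-duplicates.
        self.max_distance = max_distance
        self.seen = []

    def is_new(self, fingerprint: int) -> bool:
        """Return True and remember the fingerprint if no close match exists."""
        for other in self.seen:  # linear scan, fine for a small sketch only
            if hamming_distance(fingerprint, other) <= self.max_distance:
                return False
        self.seen.append(fingerprint)
        return True
```

A crawler would call is_new once per fetched page and skip indexing whenever it returns False.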

Part of Glossary on SystemLore.