Definition
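A locality-sensitive hashing technique in which similar documents get fingerprints that differ in only a few bits (a small Hamming distance). By contrast, with an ordinary hash a one-character edit produces an unrelated output.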
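To make "similar fingerprints" concrete, here is a minimal Python sketch of the classic SimHash construction. Several choices are assumptions made for illustration, not a prescribed configuration: the features (lowercased word counts), the 64-bit width, and BLAKE2b as the per-feature hash. The helper names `feature_hash`, `simhash`, and `hamming_distance` are hypothetical.

```python
import hashlib
import re
from collections import Counter

BITS = 64  # fingerprint width; 64 bits is an assumed, commonly used size


def feature_hash(token: str) -> int:
    """Hash one feature to a BITS-bit integer (BLAKE2b with a BITS/8-byte digest)."""
    digest = hashlib.blake2b(token.encode("utf-8"), digest_size=BITS // 8).digest()
    return int.from_bytes(digest, "big")


def simhash(text: str) -> int:
    """Compute a SimHash fingerprint from lowercased word counts (assumed features)."""
    features = Counter(re.findall(r"\w+", text.lower()))
    counters = [0] * BITS
    for token, weight in features.items():
        h = feature_hash(token)
        for i in range(BITS):
            # Vote +weight if bit i of the feature hash is 1, -weight if it is 0.
            counters[i] += weight if (h >> i) & 1 else -weight
    # Each fingerprint bit is 1 where the weighted vote came out positive.
    fingerprint = 0
    for i, total in enumerate(counters):
        if total > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")


original = simhash("the quick brown fox jumps over the lazy dog")
edited = simhash("the quick brown fox jumped over the lazy dog")
unrelated = simhash("completely unrelated text about distributed databases")
print(hamming_distance(original, edited), hamming_distance(original, unrelated))
```

A one-word edit changes only one feature and so flips few of the weighted votes. As a result, the edited pair typically lands at a much smaller Hamming distance than the unrelated pair. A cryptographic hash such as SHA-256 offers no such property.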
How it works
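Each document is split into features (for example, words or shingles), and each feature is hashed to a fixed-width value such as 64 bits. For every bit position, a counter adds the feature's weight if that bit is 1 and subtracts it if the bit is 0. The fingerprint's bit is then 1 wherever the final counter is positive. Because a small edit changes only a few features, near-duplicate pages end up with fingerprints that differ in only a few bits. This lets a crawler detect near-duplicate pages (not just exact matches) cheaply by comparing Hamming distances between fingerprints.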
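On the crawler side, near-duplicate detection reduces to one question: is a new page's fingerprint within a few bits of one already seen? The sketch below assumes each page has already been reduced to a SimHash fingerprint. Several details are hypothetical, chosen only for illustration: the class `NearDuplicateIndex`, its default threshold of 3 differing bits, and the toy 8-bit fingerprints in the demo.

```python
from typing import Dict, Optional


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")


class NearDuplicateIndex:
    """Hypothetical store of page fingerprints, checked by linear scan."""

    def __init__(self, max_distance: int = 3) -> None:
        self.max_distance = max_distance  # assumed threshold in differing bits
        self.fingerprints: Dict[str, int] = {}  # URL -> SimHash fingerprint

    def find_near_duplicate(self, fingerprint: int) -> Optional[str]:
        """Return the URL of a stored page within max_distance bits, if any."""
        for url, seen in self.fingerprints.items():
            if hamming_distance(fingerprint, seen) <= self.max_distance:
                return url
        return None

    def add_if_new(self, url: str, fingerprint: int) -> bool:
        """Store the page and return True, or return False if it is a near-duplicate."""
        if self.find_near_duplicate(fingerprint) is not None:
            return False
        self.fingerprints[url] = fingerprint
        return True


index = NearDuplicateIndex(max_distance=3)
print(index.add_if_new("https://example.com/a", 0b1011_0000))  # True: first page seen
print(index.add_if_new("https://example.com/b", 0b1011_0011))  # False: 2 bits from /a
print(index.add_if_new("https://example.com/c", 0b0100_1111))  # True: 8 bits from /a
```

The linear scan keeps the sketch short, but it compares every new page against every stored fingerprint. A crawler holding billions of fingerprints would need an index that narrows the candidate set before computing Hamming distances.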
Part of Glossary on SystemLore: system design explained with 148 deep topics, interactive diagrams, and a build-it-yourself game.