Remove duplicate documents/videos/images via popular algorithms such as SimHash, SpotSig, Shingling, etc.
-
Updated
Aug 28, 2023 - Python
Remove duplicate documents/videos/images via popular algorithms such as SimHash, SpotSig, Shingling, etc.
Implementation of algorithms for big data using python, numpy, pandas.
Code for Shingling
Implementing Locality Sensitive Hashing for DNA Sequences.
Add a description, image, and links to the shingling topic page so that developers can more easily learn about it.
To associate your repository with the shingling topic, visit your repo's landing page and select "manage topics."