Remove duplicate documents/videos/images via popular algorithms such as SimHash, SpotSig, Shingling, etc.
-
Updated
Aug 28, 2023 - Python
Remove duplicate documents/videos/images via popular algorithms such as SimHash, SpotSig, Shingling, etc.
Implementation of algorithms for big data using python, numpy, pandas.
A Java program to check Plagiarisms between multiple documents using the method of Shingling, MinHashing and Locality Sensitive Hashing.
Code for Shingling
Implementing Locality Sensitive Hashing for DNA Sequences.
Finding Similar Items: Textually Similar Documents
Data Mining Algorithms
Duplicate Detection on Hoaxy Dataset
Finding Similar Items: Textually Similar Documents
Add a description, image, and links to the shingling topic page so that developers can more easily learn about it.
To associate your repository with the shingling topic, visit your repo's landing page and select "manage topics."