Remove duplicate documents/videos/images via popular algorithms such as SimHash, SpotSig, Shingling, etc.
Updated Aug 28, 2023 - Python
Data Mining Algorithms
Duplicate Detection on Hoaxy Dataset
Implementing Locality Sensitive Hashing for DNA Sequences.
Code for Shingling
Finding Similar Items: Textually Similar Documents
Implementations of big data algorithms using Python, NumPy, and pandas.
A Java program that checks for plagiarism across multiple documents using shingling, MinHashing, and locality-sensitive hashing.
Finding Similar Items: Textually Similar Documents
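Several of the repositories above describe the same pipeline: shingle the documents, then compare them with MinHash. As a rough illustration only, not taken from any listed repository, the sketch below builds character shingles, computes exact Jaccard similarity, and estimates it from MinHash signatures. The function names, the shingle length k=5, and the use of salted BLAKE2b as a stand-in for random permutations are all assumptions made for this example.

```python
import hashlib


def shingles(text, k=5):
    """Return the set of k-character shingles of the text.

    The text is lowercased and runs of whitespace are collapsed first.
    """
    normalized = " ".join(text.lower().split())
    if len(normalized) < k:
        # Too short for a full shingle: treat the whole text as one shingle.
        return {normalized} if normalized else set()
    return {normalized[i:i + k] for i in range(len(normalized) - k + 1)}


def jaccard(a, b):
    """Exact Jaccard similarity |a & b| / |a | b| of two sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def minhash_signature(shingle_set, num_hashes=64):
    """Return a MinHash signature: one minimum per salted hash function.

    Each salt plays the role of one random permutation of the shingle space.
    """
    if not shingle_set:
        raise ValueError("cannot build a signature for an empty shingle set")
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "big")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big",
            )
            for s in shingle_set
        ))
    return signature


def estimate_similarity(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of matching signature slots."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)
```

A typical use would be to call shingles() on each document, build one signature per document with minhash_signature(), and compare signatures with estimate_similarity() instead of comparing the full shingle sets. Locality-sensitive hashing would then group signatures into bands so that only likely matches are compared, a step this sketch leaves out.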