deduplication
Entity resolution (also known as data matching, data linkage, or record linkage, among other terms) is the task of finding records in a dataset that refer to the same entity across different data sources (e.g., data files, books, websites, and databases). Entity resolution is necessary when joining data sets on entities that may or may not share a common identifier (e.g., database key, URI, national identification number); a shared identifier may be missing because of differences in record shape, storage location, or curator style or preference.
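As a rough illustration of matching records that lack a common identifier, the sketch below compares normalized names with Python's standard `difflib.SequenceMatcher`. The record ids, sample names, the helper names `normalize`, `similarity`, and `find_matches`, and the `0.85` threshold are all assumptions made for this example. None of them come from a repository listed under this topic, and real entity resolution tools typically use blocking and richer comparison models.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    kept = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_matches(records, threshold=0.85):
    """Return pairs of record ids whose names score at or above threshold.

    Compares every pair, so this is only suitable for small inputs.
    """
    matches = []
    for (id_a, name_a), (id_b, name_b) in combinations(records, 2):
        if similarity(name_a, name_b) >= threshold:
            matches.append((id_a, id_b))
    return matches


# Hypothetical records from three sources with no shared key.
records = [
    ("crm-1", "Acme Corp."),
    ("web-7", "ACME   corp"),
    ("db-42", "Globex Inc"),
]

print(find_matches(records))
```

Here the first two records normalize to the same string and are paired, while the third is left unmatched. The threshold trades missed matches against false ones and would need tuning on real data.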
Here are 419 public repositories matching this topic...
A UI application for File Deduplication using Hashing
Updated Jan 18, 2018 - Java
A workflow template for deduplication and record linkage using the Dedupe library
Updated Jun 30, 2020 - Jupyter Notebook
Super simple list-based password guesser.
Updated May 15, 2018 - C#
Efficient storage system for container images
Updated Nov 3, 2017 - Haskell
Project for helping brother in finding duplicates in his photos directory.
Updated Apr 26, 2021 - Java
A collection of image deduplication repositories
Updated Jan 15, 2019 - Python
A simple tool for cataloging/deduplication/other backup preparation tasks.
Updated Aug 21, 2019 - C
Big Data Analysis
Updated Feb 23, 2020 - Python
A tool to enrich any OCDM compliant Knowledge Graph, finding new identifiers and deduplicating entities.
Updated Apr 12, 2021 - Python
Implementation of text classification, duplicate question recognition and text deduplication in Python.
Updated Jan 23, 2021 - Python
DeDuplicationKit: Advanced File Storage Deduplication
Updated Mar 31, 2023 - C++
GraphQL wrapper for https://swapi.dev/ (The best Star Wars API)
Updated Dec 10, 2023 - TypeScript
System prototype for USENIX ATC 2015: "Convergent Dispersal Deduplication Datastore"
Updated Apr 1, 2019 - C++
Model for data deduplication assignment.
Updated Feb 21, 2018 - Python
A command line application that finds duplicate files and removes them. Duplicate files can also be replaced with symbolic links or hard links.
Updated Jan 24, 2023 - C++
Python script to analyze dedup usage in Btrfs
Updated Sep 5, 2019 - Python
Removes repeating pages with same page number in PDFs prepared for presentation purposes.
Updated Dec 16, 2021 - Python
A calculator for the storage and transmission of deduplicated data, with results presented in charts and tables
Updated Sep 26, 2023
Created by Halbert L. Dunn
Released 1946
- Followers: 38
- Organization: entity-resolution
- Wikipedia