Fair Entity Matching

A fairness suite for auditing Entity Matching approaches

Companion repository for the paper "Through the Fairness Lens: Experimental Analysis and Evaluation of Entity Matching".

Publication(s) to cite:

[1] Nima Shahbazi, Nikola Danevski, Fatemeh Nargesian, Abolfazl Asudeh, and Divesh Srivastava. "Through the Fairness Lens: Experimental Analysis and Evaluation of Entity Matching." Proceedings of the VLDB Endowment 16, no. 11 (2023): 3279-3292.

[VLDB Publication] https://dl.acm.org/doi/abs/10.14778/3611479.3611525
[Technical Report] techrep.pdf
[VLDB Slides] FEM-Slides-VLDB23.pdf

⚠️ Notice:

To reproduce the experimental results from the paper more easily, please see this repository.

Installation

Clone the repo.
Create a virtual environment, e.g., with venv or Conda.
Install any missing packages, e.g., with pip or Conda.
- The main packages are fairly standard (e.g., Pandas, NumPy, SciPy, Scikit-learn, Matplotlib, Py-EntityMatching).
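
As a quick check before installing anything, the following sketch lists which of the packages named above are not yet importable in the active environment. The import names (for example `sklearn` for Scikit-learn and `py_entitymatching` for Py-EntityMatching) are assumptions and may need adjusting to match what the repository actually imports.

```python
import importlib.util

# Import names are assumptions derived from the packages listed above;
# adjust them to match the repository's actual imports.
REQUIRED_MODULES = [
    "pandas",
    "numpy",
    "scipy",
    "sklearn",
    "matplotlib",
    "py_entitymatching",
]


def missing_modules(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules are importable.")
```

Anything reported as missing can then be installed with pip or Conda inside the virtual environment.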

Usage

Familiarize yourself with an example:

fairEM/run_example.py shows how to use our framework to examine the fairness of the models on the NoFlyCompas dataset.
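
To give a feel for what such an audit computes, here is a minimal, self-contained sketch of one possible group-wise check: the true positive rate per group and its difference from the overall rate. This is not the repository's API. The function names and the `(group, label, prediction)` record layout are hypothetical, and each candidate pair is assumed to already carry a single group label; how the paper assigns pairs to groups is not reproduced here.

```python
from collections import defaultdict


def true_positive_rate(labels, preds):
    """TPR = TP / (TP + FN); returns None when there are no positive labels."""
    positives = [(y, p) for y, p in zip(labels, preds) if y == 1]
    if not positives:
        return None
    return sum(1 for _, p in positives if p == 1) / len(positives)


def tpr_by_group(records):
    """records: iterable of (group, label, prediction) triples for candidate pairs."""
    by_group = defaultdict(lambda: ([], []))
    for group, label, pred in records:
        by_group[group][0].append(label)
        by_group[group][1].append(pred)
    return {g: true_positive_rate(ys, ps) for g, (ys, ps) in by_group.items()}


def tpr_disparity(records):
    """Overall TPR minus each group's TPR (positive means the group is served worse)."""
    records = list(records)
    overall = true_positive_rate(
        [y for _, y, _ in records], [p for _, _, p in records]
    )
    return {
        g: (None if tpr is None or overall is None else overall - tpr)
        for g, tpr in tpr_by_group(records).items()
    }
```

A group with a clearly positive disparity is one whose true matches the model misses more often than average, which is the kind of behavior the example script is meant to surface.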

Data for reproducing the results:

Train/Test/Valid/TableA/TableB data for all datasets, in the format accepted by each matcher: Link
- Please note that you do not need these data to reproduce the results. They are only needed if you want to (re-)train the matchers used in this study (or any other entity matching models).
Model Predictions: Link
- Please note that you need these data to recreate the results of our study. They include the predictions of the 13 matchers on 8 datasets, as well as the test sets, which are placed in the Deepmatcher folder for each dataset.

Reproducing the results:

Once the provided predictions and test data are placed in the locations specified in fairEM/experiments.py, the results (plots) of the study can be generated.
fairEM/threshold_experiments.py can be used to regenerate the heatmaps showing the effect of the matching threshold on the fairness and accuracy of the models.
fairEM/case_study_analysis.py can be used to inspect a model's behavior on specific cases such as TPs, FPs, FNs, and TNs.
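
The threshold and case-study analyses above can be illustrated with a small standalone sketch. It does not reproduce the repository's scripts: the function names, the input layout (parallel lists of scores, labels, and group names), and the choice of F1 and the TPR gap between groups as the accuracy and fairness summaries are all assumptions made for illustration.

```python
def f1_score(labels, preds):
    """F1 = 2TP / (2TP + FP + FN); 0.0 when the denominator is zero."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def group_tpr(labels, preds, groups, group):
    """True positive rate restricted to one group; None if it has no true matches."""
    hits = [p for y, p, g in zip(labels, preds, groups) if g == group and y == 1]
    return sum(hits) / len(hits) if hits else None


def sweep_thresholds(scores, labels, groups, thresholds):
    """Return one row per threshold: (threshold, F1, largest TPR gap across groups)."""
    rows = []
    for t in thresholds:
        preds = [1 if s >= t else 0 for s in scores]
        tprs = [group_tpr(labels, preds, groups, g) for g in sorted(set(groups))]
        tprs = [v for v in tprs if v is not None]
        gap = max(tprs) - min(tprs) if tprs else 0.0
        rows.append((t, f1_score(labels, preds), gap))
    return rows


def split_by_outcome(pairs):
    """pairs: iterable of (pair_id, label, prediction); group ids by TP/FP/FN/TN."""
    buckets = {"TP": [], "FP": [], "FN": [], "TN": []}
    for pair_id, label, pred in pairs:
        if label == 1 and pred == 1:
            buckets["TP"].append(pair_id)
        elif label == 0 and pred == 1:
            buckets["FP"].append(pair_id)
        elif label == 1 and pred == 0:
            buckets["FN"].append(pair_id)
        else:
            buckets["TN"].append(pair_id)
    return buckets
```

Each row returned by `sweep_thresholds` corresponds to one cell pair of an accuracy/fairness heatmap, and the buckets from `split_by_outcome` give the pair ids to look up when inspecting individual cases.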

Non-neural matchers:

Examples of rule specifications for the rule-based matcher, along with the settings used for the non-neural matchers, are provided in the utils/entitymatching examples directory.
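
For readers unfamiliar with rule-based matching, the sketch below shows the general idea in plain Python: a rule is a conjunction of per-attribute similarity thresholds. It is not the Py-EntityMatching API and does not reflect the rules used in the study. The attribute names, thresholds, and the use of the standard library's `difflib` as the similarity function are assumptions chosen only for illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive character-level similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Hypothetical rule: every (attribute, minimum similarity) condition must hold.
RULE = [("name", 0.8), ("city", 0.9)]


def rule_match(left, right, rule=RULE):
    """Return 1 when all conditions of the rule are satisfied, else 0."""
    return int(
        all(similarity(left[attr], right[attr]) >= threshold for attr, threshold in rule)
    )
```

For example, `rule_match({"name": "John Smith", "city": "Chicago"}, {"name": "john smith", "city": "chicago"})` returns 1, since both attributes are identical once case is ignored.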

Synthetic data generator:

The synthetic data generator/FacultyMatch and synthetic data generator/NoFlyCompas paths contain scripts for generating synthetic social data for entity matching. Users can run these scripts with a variety of settings, such as limiting the rate of non-matches in the output (i.e., manual blocking) or changing the number and type of perturbations.
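
The two settings mentioned above can be pictured with a toy generator. This is only a sketch under stated assumptions, not the repository's scripts: the function names and parameters are hypothetical, the only perturbation type shown is an adjacent-character swap, and `non_match_rate` is interpreted here as the fraction of all possible non-matching pairs that is kept.

```python
import random


def perturb(value, n_edits, rng):
    """Apply n_edits random adjacent-character swaps to a string."""
    chars = list(value)
    for _ in range(n_edits):
        if len(chars) < 2:
            break
        i = rng.randrange(len(chars) - 1)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def generate_pairs(names, n_edits=1, non_match_rate=0.5, seed=0):
    """Build labeled pairs from a list of distinct names.

    Each name yields one match (the name and a perturbed copy, label 1).
    Non-matches (label 0) are sampled from all pairs of different names,
    keeping roughly non_match_rate of them.
    """
    rng = random.Random(seed)
    pairs = [(name, perturb(name, n_edits, rng), 1) for name in names]
    non_matches = [(a, b, 0) for i, a in enumerate(names) for b in names[i + 1:]]
    keep = round(non_match_rate * len(non_matches))
    pairs.extend(rng.sample(non_matches, keep))
    return pairs
```

Fixing the seed makes the output repeatable, which helps when comparing matchers on the same generated data.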

Notice

This project is still under development, so please beware of potential bugs and other issues. Use it at your own risk in practice.

Contact

Feel free to contact the authors or open an issue if you run into any problems. We will try to respond as soon as possible.

License

This project is licensed under the MIT License; see the LICENSE.md file for details.
