This is the official repository for the paper Enhancing Utility in Differentially Private Recommendation Data Release with Exponential Mechanism currently under review.
The recommenders' training and evaluation procedures were developed on top of the reproducibility framework Elliot; we suggest referring to its official GitHub page and documentation.
This software has been executed on the operating system Ubuntu 20.04. Please have at least Python 3.9.0 installed on your system.
You can create the virtual environment with the requirements files included in the repository, as follows:
python3.9 -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
At data/, you may find all the files related to the datasets. Each dataset can be found in data/[DATASET_NAME]/data/dataset.tsv
The datasets used in the paper are Amazon Gift Card, Facebook Books, and Yahoo! Movies, referred to as gift, facebook_books, and yahoo_movies, respectively.
At config_templates/, you may find the Elliot configuration templates used for setting up the experiments. The configuration template used for all the experiments is training.py.
Here, we describe the steps to reproduce the results presented in the paper.
Run the data preprocessing step with the following:
python preprocessing.py
This step binarizes all the datasets and splits them into train and test sets. The results will be stored in data/[DATASET_NAME] for each dataset.
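The preprocessing step can be sketched roughly as follows. This is an illustration only: the column names, the all-positives binarization, and the 80/20 random split are our own assumptions, not taken from preprocessing.py.

```python
import pandas as pd

# Toy interactions standing in for data/[DATASET_NAME]/data/dataset.tsv;
# the column names and split ratio below are assumptions for illustration.
df = pd.DataFrame({
    "user":   [0, 0, 1, 1, 2, 2, 3, 3, 4, 4],
    "item":   [10, 11, 10, 12, 11, 13, 12, 14, 10, 13],
    "rating": [4.0, 2.5, 5.0, 3.0, 1.0, 4.5, 3.5, 2.0, 5.0, 4.0],
})

# Binarize: treat every observed interaction as implicit positive feedback.
df["rating"] = 1

# Random 80/20 train/test split (the strategy in preprocessing.py may differ).
test = df.sample(frac=0.2, random_state=42)
train = df.drop(test.index)
```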
From each binarized dataset, 500 randomized versions can be generated with the following:
python randomize_split_recommend.py --dataset [DATASET_NAME]
The perturbed datasets will be stored in the directory perturbed_dataset/[DATASET_NAME]_train/0.
For example, to run the script on the Amazon Gift Card dataset:
python randomize_split_recommend.py --dataset gift
Each perturbed dataset will then be split into train and validation sets, which will be stored in data/[DATASET_NAME]/generated_train/0.
Finally, the recommendation performance for each dataset will be stored in result_collection/[DATASET_NAME]_train/0/.
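The exact perturbation lives in randomize_split_recommend.py. As a rough sketch only, per-entry randomized response on a binary user-item matrix (function name and the ε-parameterization below are our own choices, one common formulation) looks like:

```python
import numpy as np

def randomized_response(interactions, eps, rng):
    # Keep each true bit with probability e^eps / (e^eps + 1);
    # otherwise flip it. This satisfies eps-local DP per matrix entry.
    p_keep = np.exp(eps) / (np.exp(eps) + 1.0)
    flips = rng.random(interactions.shape) >= p_keep
    return np.where(flips, 1 - interactions, interactions)

rng = np.random.default_rng(0)
mat = rng.integers(0, 2, size=(5, 4))       # toy binary feedback matrix
perturbed = randomized_response(mat, eps=2.0, rng=rng)
```

Repeating the call 500 times with fresh randomness yields the randomized dataset versions described above.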
We can run the selection module with the following:
python selection.py --dataset [DATASET_NAME]
where [DATASET_NAME] is the name of the dataset.
The results for each model and dataset will be stored in result_data/[DATASET_NAME]_train/0/[DATASET_NAME]_train_[MODEL_NAME]_nDCGRendle2020.tsv.
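As an illustration of the selection step (not selection.py itself), the standard exponential mechanism over candidate utility scores can be sketched as follows; the toy scores stand in for per-dataset nDCG values, and the sensitivity of 1 is an assumption:

```python
import numpy as np

def exponential_mechanism(scores, eps, sensitivity, rng):
    # Sample index i with probability proportional to
    # exp(eps * score_i / (2 * sensitivity)).
    scores = np.asarray(scores, dtype=float)
    logits = eps * scores / (2.0 * sensitivity)
    logits -= logits.max()                  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return rng.choice(len(scores), p=probs)

rng = np.random.default_rng(0)
ndcg = rng.uniform(0.0, 0.3, size=500)      # toy stand-in utility scores
chosen = exponential_mechanism(ndcg, eps=1.0, sensitivity=1.0, rng=rng)
```

Higher-utility candidates are exponentially more likely to be selected, while every candidate retains nonzero probability, which is what provides the privacy guarantee.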
Here, we describe the steps to reproduce the baselines presented in the paper.
To reproduce the recommendation performance for the original datasets, run:
python baseline.py --dataset [DATASET_NAME]
where [DATASET_NAME] is the name of the dataset.
The result will be stored in data/[DATASET_NAME]/baseline.
Run the Subsample Exponential Mechanism with:
python subsample.py --dataset [DATASET_NAME]
where [DATASET_NAME] is the name of the dataset.
The result will be stored in results_data/[DATASET_NAME]_train/0/aggregated_results.tsv.
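subsample.py implements this baseline; as a hedged sketch under our own assumptions (uniform subsampling of m candidates followed by the standard exponential mechanism, with sensitivity 1), one variant looks like:

```python
import numpy as np

def subsample_exponential(scores, eps, sensitivity, m, rng):
    # Draw a uniform subsample of m candidates, then run the
    # exponential mechanism on that subsample only.
    scores = np.asarray(scores, dtype=float)
    idx = rng.choice(len(scores), size=m, replace=False)
    logits = eps * scores[idx] / (2.0 * sensitivity)
    logits -= logits.max()                  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return idx[rng.choice(m, p=probs)]

rng = np.random.default_rng(0)
ndcg = rng.uniform(0.0, 0.3, size=500)      # toy stand-in utility scores
chosen = subsample_exponential(ndcg, eps=1.0, sensitivity=1.0, m=50, rng=rng)
```

Restricting the mechanism to a subsample reduces the number of candidates that must be scored, at the cost of possibly missing the global best candidate.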
To run One-Time RAPPOR, refer to Generate Datasets with Randomized Response.