Polisis_Benchmark

Reproducing state-of-the-art results

This repo is our effort to reproduce the Polisis results for privacy policy classification, as reported in the Polisis paper: https://arxiv.org/abs/1802.02561

Setup instructions

Set up a virtual environment using any tool (e.g., conda) and activate it: conda create -n privacy_policy python=3.6 && source activate privacy_policy
Install the dependencies from the requirements file: pip install -r requirements.txt
Install the NLTK tokenizer: python -m nltk.downloader punkt
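
After the setup steps, a quick sanity check can confirm that the environment is usable before launching a long run. The sketch below is not part of the repo: the helper name missing_modules is hypothetical, and the only package it checks is nltk, since that is the one dependency named in this README. Extend the list with whatever the requirements file actually contains.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be imported.

    Hypothetical helper for illustration; pass top-level names only
    (e.g. "nltk", not "nltk.tokenize").
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


# Report the interpreter version (the setup steps create a Python 3.6 environment).
print("Python %d.%d" % (sys.version_info.major, sys.version_info.minor))

# "nltk" is the only dependency this README names explicitly.
print("Missing modules:", missing_modules(["nltk"]))
```

An empty list means every listed module was found in the active environment.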

To run the experiment: python -u cnn_multi_label_classifier.py

Parameters can be found in args.py

Important Note: By default the code will use GloVe embeddings. Due to licensing restrictions, the in-domain embeddings can be provided only upon request.
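
For readers unfamiliar with the GloVe text format, each line holds a word followed by its space-separated vector components. The following is a minimal sketch of parsing that format, not the loading code this repo actually uses; the function name load_glove is hypothetical, and the file path in the closing comment assumes the standard file naming of the GloVe 6B download.

```python
import io


def load_glove(lines):
    """Parse GloVe text-format lines into a {word: [float, ...]} dict.

    Hypothetical helper for illustration; `lines` is any iterable of strings,
    such as an open file handle.
    """
    embeddings = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        embeddings[parts[0]] = [float(value) for value in parts[1:]]
    return embeddings


# Tiny in-memory sample standing in for a real embeddings file.
sample = io.StringIO("the 0.1 0.2 0.3\npolicy 0.4 0.5 0.6\n")
vectors = load_glove(sample)
print(len(vectors), "words,", len(vectors["policy"]), "dimensions")

# With a real file (path is an assumption):
# with open("glove.6B/glove.6B.100d.txt", encoding="utf-8") as handle:
#     vectors = load_glove(handle)
```

Plain Python lists keep the sketch dependency-free; a real pipeline would typically convert the vectors to an array type before building an embedding matrix.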
