Co-occurrence computation for SNLI

This repository contains the code for the 2017 paper "Social Bias in Elicited Natural Language Inferences" by Rachel Rudinger, Chandler May, and Benjamin Van Durme. Rachel Rudinger and Chandler May contributed to this code, which is released under the two-clause BSD license.

Prerequisites

Install dependencies with:

pip install -r requirements.txt

Download and unzip the SNLI data:

wget https://nlp.stanford.edu/projects/snli/snli_1.0.zip
unzip snli_1.0.zip
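
As a quick sanity check, each line of the .jsonl files is a single JSON object, so the standard library's json.tool module can pretty-print the first SNLI pair from the training split (the archive also unpacks the dev and test splits alongside it):

head -n 1 snli_1.0/snli_1.0_train.jsonl | python -m json.tool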

Computing counts

Compute counts for unigrams and bigrams, across all inference types, using 7 subprocesses (in addition to the main process) and filtering out hypothesis words that occur in the premise. Read SNLI pairs from snli_1.0/snli_1.0_train.jsonl and write counts to snli_stats/counts.pkl:

python snli_cooccur.py \
    between-prem-hypo \
    --max-ngram 2 \
    --num-proc 7 \
    --filter-hypo-by-prem \
    snli_1.0/snli_1.0_train.jsonl snli_stats/counts.pkl
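
For a smaller run, for example unigram counts only on the development split, the same subcommand can be used with different arguments (a sketch; it assumes --max-ngram 1 restricts counting to unigrams, and the output path is illustrative):

python snli_cooccur.py \
    between-prem-hypo \
    --max-ngram 1 \
    snli_1.0/snli_1.0_dev.jsonl snli_stats/dev-counts.pkl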

Run python snli_cooccur.py --help for more options.

Looping over all configurations

Alternatively, compute counts for all parameter configurations, in a loop:

bash snli_cooccur_loop.bash

To change the default input and output directories, or to change the Python interpreter used to run snli_cooccur.py, create a file named snli_cooccur_loop_include.bash with the following contents, modify them as desired, and then run snli_cooccur_loop.bash:

snli_dir=snli_1.0
output_dir=snli_stats
big_python=python
little_python=python

The little_python and big_python variables are the Python commands used for the unigram and unigram-and-bigram models, respectively; the latter have higher memory requirements. (Note that little_python and big_python can be set to job submission scripts that invoke a Python interpreter, in order to parallelize the computation on a grid, as in the sketch below.)
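
For example, an snli_cooccur_loop_include.bash for a grid environment might look like the following sketch, where submit-little-python.bash and submit-big-python.bash are hypothetical wrapper scripts (not part of this repository) that submit python with the given arguments as grid jobs with small and large memory requests, respectively:

snli_dir=snli_1.0
output_dir=snli_stats
# hypothetical wrappers that run python on their arguments as grid jobs
big_python='bash submit-big-python.bash'
little_python='bash submit-little-python.bash'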

Querying co-occurrences from counts

Query top-five co-occurrence lists ranked by PMI, restricting candidates to unigrams (that is, filtering out bigrams) and discarding co-occurrence candidates with a count of less than five. Run the queries in the YAML specification top-y.yaml, using counts from snli_stats/counts.pkl, and write output to snli_stats/pmi.txt:

python snli_query.py \
    -k 5 \
    --filter-to-unigrams \
    --top-y-score-func pmi \
    --min-count 5 \
    snli_stats/counts.pkl top-y top-y.yaml snli_stats/pmi.txt

Run python snli_query.py --help for more options.

Looping over all configurations

Alternatively, run queries for all parameter configurations, in a loop:

bash snli_query_loop.bash

To change the default input paths and output directory, the Python interpreter used to run snli_query.py, or other settings, create a file named snli_query_loop_include.bash with the following contents, modify them as desired, and then run snli_query_loop.bash:

min_count=5
output_dir=snli_stats
python=python
extra_args='-k 5 --filter-to-unigrams --top-y-score-func pmi'
query_type=top-y
query_path=top-y.yaml
output_ext=.txt
input_dir=snli_stats
input_paths=`find "$input_dir" -type f -name '*.pkl'`
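
For example, to run the queries against only the counts file produced by the single-command example above, input_paths can be overridden directly (a sketch; as the find-based default suggests, input_paths is assumed to hold a whitespace-separated list of counts files):

input_paths=snli_stats/counts.pkl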

Errata

In the definition of the likelihood ratio Λ(C') in the paper (last equation on the second page, or page 75 in the proceedings), the summations should be products. The code and results use the correct definition.
