CausalLLMs

Collection of experiments and utilities for testing causal reasoning in LLMs.

Perplexity calculator

Utility to quickly compute the perplexity score of a given sentence or list of sentences. The mean perplexity is printed in the terminal, and all results are saved in a JSON file as a collection of {sentence: "text", perplexity: score} records.
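For reference, perplexity is conventionally defined as the exponential of the mean negative log-likelihood per token. The sketch below assumes per-token natural-log probabilities have already been obtained from the model; `perplexity_from_logprobs` and `make_record` are hypothetical helpers, not functions of perplexity_calculator.py. It shows how one score and one {sentence, perplexity} record could be built.

```python
import json
import math


def perplexity_from_logprobs(token_logprobs: list[float]) -> float:
    """Perplexity = exp(-(1/N) * sum of log p(token_i | previous tokens))."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


def make_record(sentence: str, token_logprobs: list[float]) -> dict:
    # Hypothetical helper mirroring the {sentence, perplexity} layout of the output file.
    return {"sentence": sentence, "perplexity": perplexity_from_logprobs(token_logprobs)}


# Toy example: two tokens, each predicted with probability 0.5, give a perplexity of 2.
record = make_record("toy sentence", [math.log(0.5), math.log(0.5)])
print(json.dumps(record))
```

Intuitively, a perplexity of k means the model was, on average, as uncertain as if it were choosing uniformly among k tokens at each step.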

Usage

If there is only one sentence, it can be directly passed as an argument with -t or --text:

python perplexity_calculator.py -t "This is my sentence"

A list of sentences saved in a .txt file (one sentence per line) can be processed in parallel with the -f or --file argument:

python perplexity_calculator.py -f ./path/to/file.txt
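The file format above can be illustrated with a minimal loader. This is a sketch, not the script's code: `load_sentences` and `batched` are hypothetical names, skipping blank lines is an assumption, and grouping sentences into fixed-size batches is only one common way to score many sentences at once.

```python
from pathlib import Path


def load_sentences(path: str) -> list[str]:
    # One sentence per line; strip surrounding whitespace and drop empty lines.
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def batched(items: list, batch_size: int) -> list[list]:
    # Group sentences so a model could score several of them in one forward pass.
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
```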

Additional parameters are:

--model_id: the model to use, which must be a causal language model; default gpt2
--out-file: desired location of the JSON file; default ./out/perplexities/TIMESTAMP.json
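The options above could be declared with the standard-library argparse module roughly as follows. This is a hypothetical reconstruction for illustration, not the script's actual source; in particular, making -t and -f mutually exclusive and the exact timestamp format are assumptions.

```python
import argparse
from datetime import datetime


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Compute sentence perplexities.")
    # Assumption: exactly one input source is required.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("-t", "--text", help="a single sentence to score")
    source.add_argument("-f", "--file", help="a .txt file with one sentence per line")
    parser.add_argument("--model_id", default="gpt2", help="a causal language model")
    # Assumed timestamp format; the README only says TIMESTAMP.
    timestamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    parser.add_argument(
        "--out-file",
        default=f"./out/perplexities/{timestamp}.json",
        help="where to write the JSON results",
    )
    return parser
```

Note that argparse turns --out-file into the attribute out_file, while --model_id keeps its name, so both spellings work on the command line as written above.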

Examples

python perplexity_calculator.py -t "To be or not to be?"
>> Mean perplexity: 24.73

python perplexity_calculator.py -t "What is the square root of 343?"
>> Mean perplexity: 54.31

python perplexity_calculator.py -t "What is the 2nd root of 343?"
>> Mean perplexity: 135.41

python perplexity_calculator.py -t "lorem ipsum Wanna Bonjour cane?"
>> Mean perplexity: 438.12
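Because the printed mean hides per-sentence differences, it can be useful to load the JSON output afterwards. The sketch below assumes the file holds a list of objects with sentence and perplexity keys, as described above, and that the reported mean is the arithmetic mean over sentences; both are assumptions about the script's exact output, and `summarize` is a hypothetical name.

```python
import json
import statistics
from pathlib import Path


def summarize(path: str) -> tuple[float, list[dict]]:
    """Return the mean perplexity and the records sorted from least to most surprising."""
    records = json.loads(Path(path).read_text(encoding="utf-8"))
    mean = statistics.mean(r["perplexity"] for r in records)
    return mean, sorted(records, key=lambda r: r["perplexity"])
```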

Pile Word Frequency Counting

Utility to count the occurrences of a given list of strings in the Pile corpus. The script streams the corpus, so it does not store the entire Pile; nevertheless, you will need at least 20 GB of free disk space.
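The streaming idea can be sketched as processing one document at a time and keeping only running totals in memory. This assumes the Pile's JSON Lines layout, with the document in a "text" field; `count_occurrences` is a hypothetical name, and whether the real script matches case-insensitively, counts overlapping matches, or reads the compressed shards directly is not stated here. This sketch is case-sensitive and counts non-overlapping matches via str.count.

```python
import json
from typing import Iterable


def count_occurrences(lines: Iterable[str], targets: list[str]) -> dict[str, int]:
    """Count non-overlapping, case-sensitive occurrences of each target, one document at a time."""
    counts = dict.fromkeys(targets, 0)
    for line in lines:
        if not line.strip():
            continue
        text = json.loads(line)["text"]  # assumed JSON Lines record with a "text" field
        for target in targets:
            counts[target] += text.count(target)
    return counts
```

Since iterating over an open text file yields its lines, a file object can be passed directly as `lines`, which is what keeps memory use independent of the shard size.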

Usage

Save the strings you want to count in data/input.txt, then run the following shell script:

./run_pile_counting.sh

This script calls count_partial_pile.py on multiple JSON files, writing the partial results to script_output/, and then merges them into a single file by calling merge_json_counts.py.
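The merge step can be sketched as summing per-string counts across the partial outputs. This assumes each file in script_output/ is a JSON object mapping each search string to its count; the real layout may differ, and `merge_counts` is a hypothetical name rather than the API of merge_json_counts.py.

```python
import json
from collections import Counter
from pathlib import Path


def merge_counts(input_dir: str, output_path: str) -> dict[str, int]:
    total: Counter = Counter()
    for path in sorted(Path(input_dir).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            total.update(json.load(f))  # add this partial file's counts to the running total
    merged = dict(total)
    with open(output_path, "w", encoding="utf-8") as f:
        json.dump(merged, f, indent=2)
    return merged
```

Writing the merged file outside input_dir keeps it from being picked up by the glob on a later run.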

TheBlueHawk/CausalLLMs

Folders and files

Latest commit

History

Repository files navigation

CausalLLMs

Perplexity calculator

Usage

Examples

Pile Word Frequency Counting

Usage

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages