This repository contains analysis code for the paper:
Linking human and artificial neural representations of language.
Jon Gauthier and Roger P. Levy.
2019 Conference on Empirical Methods in Natural Language Processing.
This repository is open-source under the MIT License. If you would like to reuse our code or otherwise extend our work, please cite our paper:
@inproceedings{gauthier2019linking,
  title = {Linking human and artificial neural representations of language},
  author = {Gauthier, Jon and Levy, Roger P.},
  booktitle = {Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing},
  year = {2019}
}
We structure our data analysis pipeline, from model fine-tuning to representation analysis, using Nextflow. The entire pipeline is specified in the file `main.nf`.
Visualizations and statistical tests are done in Jupyter notebooks stored in the `notebooks` directory.
Running the full pipeline requires roughly the following hardware:

- ~2 TB of disk space (for storing brain images, model checkpoints, etc.)
- 8 GB of RAM or more
- 1 GPU with more than 4 GB of memory (for fine-tuning the BERT models)
We strongly suggest running this pipeline on a distributed computing cluster to save time. The full pipeline completes in several days on an MIT high-performance computing cluster.
If you don't have a GPU or this much disk space to spare but still wish to run the pipeline, please ping me and we can make special resource-saving arrangements.
There are only two software requirements:

- Nextflow is used to manage the data processing pipeline. Installing Nextflow is as simple as running the following command:

      wget -qO- https://get.nextflow.io | bash

  This installation script will put a `nextflow` binary in your working directory. The later commands in this README assume that this binary is on your `PATH` (one way to arrange this is sketched just after this list).

- Singularity retrieves and runs the software containers necessary for the pipeline. It is likely already available on your computing cluster. If not, please see the Singularity installation instructions.
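For example, one way to put the `nextflow` binary on your `PATH` after installation (the `~/.local/bin` location is only an illustration; any directory already on your `PATH` works):

```bash
# Install Nextflow into the current directory, then move the binary to a
# directory that is already on your PATH (~/.local/bin is only an example).
wget -qO- https://get.nextflow.io | bash
mkdir -p ~/.local/bin
mv nextflow ~/.local/bin/
export PATH="$HOME/.local/bin:$PATH"   # add to your shell profile to make this permanent
```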
The pipeline is otherwise fully automated, so all other dependencies (data, BERT, etc.) will be automatically retrieved.
Check out the repository at the `emnlp2019-final` tag and run the following command in the repository root:
nextflow run main.nf
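If you are starting from scratch, a full checkout-and-run sequence might look like the following sketch (the repository URL is a placeholder; substitute this repository's actual clone URL):

```bash
# Clone the repository and check out the tag used for the published results.
# <REPOSITORY_URL> is a placeholder -- substitute this repository's actual URL.
git clone <REPOSITORY_URL> emnlp2019
cd emnlp2019
git checkout emnlp2019-final

# Launch the full pipeline from the repository root.
nextflow run main.nf
```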
For technical configuration (e.g. customizing how this pipeline is deployed on a cluster), see the file `nextflow.config`. The pipeline is configured to run locally by default, but it can easily be farmed out across a computing cluster.
A configuration for the SLURM scheduler is given in `nextflow.slurm.config`. If your cluster uses a scheduler other than SLURM, adapting the pipeline may be as simple as changing a few settings in that file. See the Nextflow documentation on cluster computing for more information.
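For instance, assuming `nextflow.slurm.config` is meant to be supplied through Nextflow's `-c` flag (which layers an additional configuration file on top of `nextflow.config`), a cluster run could be launched as:

```bash
# Launch the pipeline with the SLURM-specific settings layered on top of the
# defaults in nextflow.config (assumes SLURM commands such as sbatch are on PATH).
nextflow run main.nf -c nextflow.slurm.config
```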
For model configuration (e.g. customizing hyperparameters), see the header of the main pipeline in `main.nf`. Each parameter, written as `params.X`, can be overridden with a command-line flag of the same name. For example, to run the whole pipeline with BERT models fine-tuned for 500 steps rather than 250 steps, execute:
nextflow run main.nf --finetune_steps 500
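Several parameters can be overridden in a single invocation. In the sketch below, `--finetune_steps` comes from the example above, while `--finetune_batch_size` is a hypothetical name used only for illustration; check the header of `main.nf` for the parameters that actually exist:

```bash
# Override multiple pipeline parameters in one invocation.
# --finetune_steps comes from the example above; --finetune_batch_size is a
# hypothetical name -- substitute any parameter declared as params.X in main.nf.
nextflow run main.nf --finetune_steps 500 --finetune_batch_size 16
```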
The `notebooks` directory contains Jupyter notebooks for producing the visualizations and statistical analyses in the paper (and much more):

- `quantitative_dynamic.ipynb` is used to produce the majority of the plots in the paper, studying brain decoding across fine-tuning time in different models.
- `structural-probes.ipynb` visualizes the structural probe results.
- `predictions.ipynb` produces, among many other things, the RSA analysis on model representations.
After the Nextflow pipeline completes, you can load and run these notebooks by starting a Jupyter session in the same directory where you launched the pipeline. The notebooks require TensorFlow and general Python data science tools to function. I recommend using my `tensorflow` Singularity image as follows:
singularity run library://jon/default/tensorflow:1.12.0-cpu jupyter lab
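If the notebook server runs on a remote cluster node, you may additionally need to make it reachable from your own machine. One possible invocation, assuming you connect through an SSH tunnel to the chosen port (the bind mount and port number are illustrative):

```bash
# Run JupyterLab inside the Singularity image, listening on all interfaces so it
# can be reached through an SSH tunnel. The bind mount and port are illustrative.
singularity run -B "$PWD" library://jon/default/tensorflow:1.12.0-cpu \
    jupyter lab --no-browser --ip=0.0.0.0 --port=8888
```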