SEDA

This repository contains the source code for SEDA. Follow the instructions below to reproduce our experiments. Our source code references the CircEvent implementation [1]: https://github.com/shichao-wang/CircEvent.git

Dataset

Our experiments are conducted on the New York Times (NYT) portion of the English Gigaword corpus, which you can access from the official website. The data split we use is provided by Granroth-Wilding and Clark [2]. Following Lee and Goldwasser [3], we annotate the raw documents with the Stanford CoreNLP toolkit. The CoreNLP configuration is listed in the corenlp.props file.
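To check which annotators a CoreNLP properties file enables before starting a long annotation run, a small parser can help. The sketch below is a minimal example and not part of this repository: `parse_properties` is a hypothetical helper, it handles only simple `key = value` lines, and the sample contents are an assumption, since the actual corenlp.props may differ.

```python
import re

# Java-style .properties files accept "=" or ":" between key and value.
SEPARATOR = re.compile(r"\s*[=:]\s*")


def parse_properties(text):
    """Parse simple key/value lines from a Java-style .properties file.

    Skips blank lines and "#" / "!" comments. Line continuations and
    escape sequences are not handled.
    """
    props = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        parts = SEPARATOR.split(line, maxsplit=1)
        props[parts[0]] = parts[1] if len(parts) == 2 else ""
    return props


# Hypothetical contents for illustration; the real corenlp.props may differ.
EXAMPLE = """
# annotators to run
annotators = tokenize, ssplit, pos, lemma
"""

props = parse_properties(EXAMPLE)
annotators = [name.strip() for name in props["annotators"].split(",")]
print(annotators)
```

To inspect the real file, pass its contents instead, for example `parse_properties(open("corenlp.props").read())`.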

Environment Setup

We conducted our experiments on a workstation with a Tesla V100 GPU. Our programs are tested under PyTorch 1.8.1 + CUDA 10.2.

Set up the Python environment. We encourage using conda to create the Python virtual environment: `conda create -n seda python==3.8 && conda activate seda`
Install the CUDA toolkit and PyTorch: `pip install torch==1.13.0+cu116 torchvision==0.14.0+cu116 --extra-index-url https://download.pytorch.org/whl/cu116 -i https://pypi.tuna.tsinghua.edu.cn/simple`
Install the pip packages: `pip install -r requirements.txt`
Install the circumst_event package: `pip install -e .`

Now the environment has been set up in the seda virtual environment.
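To confirm which package versions actually ended up in the environment, the following minimal sketch may be useful. It is not part of this repository: `installed_versions` is a hypothetical helper that only reads package metadata, so it does not import PyTorch or check for a GPU.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # torch and torchvision are the packages pinned in the install step.
    for name, version in installed_versions(["torch", "torchvision"]).items():
        print(f"{name}: {version or 'not installed'}")
```

Run it inside the activated seda environment and compare the printed versions against the ones you intended to install.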

Reproduce Steps

The source code is divided into two parts: data preprocessing and model training. The entry scripts are placed in the bin folder. Each step and its corresponding script are listed below.

Extract text from the Gigaword XML files: `1_extract_gigaword_nyt.py`
Annotate the text with CoreNLP: `2_corenlp_annotate.py`
Extract event chains from the annotated documents: `3_extract_event_chain.py`
Convert event chain words to ids: `4_index_event_chain.py`
Split the data into train, validation, and test sets: `5_split_dataset.py`
Train the circ model: `6_circ_train.py`
Evaluate the saved model and generate the quantitative analysis file: `7_circ_eval.py`
Draw figures of the changes in accuracy based on the quantitative analysis file: `8_mask_multiple_items_with_weights.py`
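The steps above run in numeric order, so a small driver can chain them. The sketch below is an assumption-laden example and not part of this repository: the command-line arguments each script expects are not documented here, so `extra_args` is a hypothetical mapping you would have to fill in yourself, and the driver defaults to a dry run that only prints the commands.

```python
import subprocess
import sys
from pathlib import Path

# Script names as listed above, in execution order.
SCRIPTS = [
    "1_extract_gigaword_nyt.py",
    "2_corenlp_annotate.py",
    "3_extract_event_chain.py",
    "4_index_event_chain.py",
    "5_split_dataset.py",
    "6_circ_train.py",
    "7_circ_eval.py",
    "8_mask_multiple_items_with_weights.py",
]


def build_commands(bin_dir="bin", extra_args=None):
    """Return one command (argv list) per script, in pipeline order.

    extra_args is a hypothetical mapping from script name to a list of
    command-line arguments; the real arguments are not documented here.
    """
    extra_args = extra_args or {}
    return [
        [sys.executable, str(Path(bin_dir) / name), *extra_args.get(name, [])]
        for name in SCRIPTS
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    run_pipeline(build_commands())
```

Stopping at the first failure matters here because every step consumes the output of the previous one. Check each script's own argument parsing before switching `dry_run` off.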

Reference

[1] Shichao Wang, Xiangrui Cai, Hongbin Wang, and Xiaojie Yuan. 2021. Incorporating Circumstances into Narrative Event Prediction. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 4840–4849.

[2] Mark Granroth-Wilding and Stephen Clark. 2016. What Happens Next? Event Prediction Using a Compositional Neural Network Model. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, pages 2727–2733, Phoenix, Arizona, February. AAAI Press.

[3] I-Ta Lee and Dan Goldwasser. 2019. Multi-Relational Script Learning for Discourse Relations. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4214–4226, Florence, Italy, July. Association for Computational Linguistics.
