nqg/data at master · xinyadu/nqg


Data

This is the data split described in the paper "Learning to Ask: Neural Question Generation for Reading Comprehension" by Du et al., ACL 2017.

The structure of this folder is:

data
├── processed
│   ├── src-{train, dev, test}.txt
│   ├── tgt-{train, dev, test}.txt
│   └── para-{train, dev, test}.txt
│  
├── raw
│   ├── train.json
│   ├── dev.json
│   └── test.json
│
├── doclist-train.txt
├── doclist-dev.txt
└── doclist-test.txt

Our split is done at the article level; the doclist-*.txt files contain the article titles in each split. We use the original SQuAD dev set as our dev set, and we split the original SQuAD training set into our training set and test set.
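
Because the split is made at the article level, no article title should appear in more than one doclist file. The following is a minimal sketch of such a check, assuming each doclist-*.txt file holds one article title per line (the file layout is an assumption, and `read_titles` and `split_overlaps` are hypothetical helper names, not part of this repository):

```python
from pathlib import Path


def read_titles(path):
    """Return the set of non-empty lines in a doclist file.

    Assumption: one article title per line.
    """
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def split_overlaps(data_dir):
    """Return the titles shared by each pair of splits.

    An empty set for every pair means the splits do not share articles.
    """
    data_dir = Path(data_dir)
    titles = {
        split: read_titles(data_dir / f"doclist-{split}.txt")
        for split in ("train", "dev", "test")
    }
    pairs = [("train", "dev"), ("train", "test"), ("dev", "test")]
    return {(a, b): titles[a] & titles[b] for a, b in pairs}
```

Calling `split_overlaps("data")` from the repository root would then return a dictionary keyed by split pair; any non-empty value lists the titles that occur in both splits.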

The processed folder includes the input sentence files (src-*.txt), the corresponding question files (tgt-*.txt), and the files of paragraphs that contain each input sentence (para-*.txt). The sentences, questions, and paragraphs are tokenized.
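
The three processed files for a split can be read together as (sentence, question, paragraph) triples. The sketch below assumes the files are line-aligned, so that line i of src-*.txt, tgt-*.txt, and para-*.txt refer to the same example, and that tokens are separated by spaces; both points are assumptions, and `load_split` is a hypothetical helper name:

```python
from pathlib import Path


def read_lines(path):
    """Read a text file into a list of lines without trailing newlines."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def load_split(processed_dir, split):
    """Return a list of (sentence, question, paragraph) triples.

    Assumption: the src/tgt/para files for a split are line-aligned.
    Raises ValueError if the three files differ in length.
    """
    processed_dir = Path(processed_dir)
    src = read_lines(processed_dir / f"src-{split}.txt")
    tgt = read_lines(processed_dir / f"tgt-{split}.txt")
    para = read_lines(processed_dir / f"para-{split}.txt")
    if not (len(src) == len(tgt) == len(para)):
        raise ValueError(
            f"line counts differ: src={len(src)}, tgt={len(tgt)}, para={len(para)}"
        )
    return list(zip(src, tgt, para))
```

For example, `load_split("data/processed", "dev")` would return the dev examples, and `sentence.split()` would recover the tokens of a sentence under the space-separated assumption.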

The raw folder includes the raw data files from the SQuAD dataset, split into train, dev, and test sets.
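
The raw files can be walked with the standard `json` module. The sketch below assumes they follow the SQuAD v1.1 layout (articles with a `title` and a list of `paragraphs`, each paragraph with a `context` and a list of `qas`); this layout is an assumption about these files, so the code accepts either a top-level object with a `data` key or a bare list of articles. `iter_questions` is a hypothetical helper name:

```python
import json


def iter_questions(path):
    """Yield (title, context, question) for every question in a raw file.

    Assumption: the file uses the SQuAD v1.1 field names
    (title, paragraphs, context, qas, question).
    """
    with open(path, encoding="utf-8") as f:
        obj = json.load(f)
    # SQuAD v1.1 wraps the articles in {"data": [...]}; also accept a bare list.
    articles = obj["data"] if isinstance(obj, dict) else obj
    for article in articles:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                yield article["title"], paragraph["context"], qa["question"]
```

If the field names differ from this assumption, the function raises a `KeyError` rather than silently skipping records, which makes a schema mismatch easy to spot.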

Licence

We re-distribute this data split under the CC BY-SA 4.0 license.
