- Given the data, we need to extract a stratified subset of sentences containing pronouns.
To do so, run:
python createtestset.py -i [INPUTFILE] -o [OUTPUTFILE] -t [COUNTOFPRONOUNS]
where [INPUTFILE] is the file containing all your data, [OUTPUTFILE] is the file where the extracted sentences are written, and [COUNTOFPRONOUNS] is the total number of pronouns you would like to have.
Make sure [INPUTFILE] contains tokenized sentences.
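
To illustrate what a stratified pronoun subset means, here is a minimal sketch. It is not the code in createtestset.py. The pronoun list, the function name `stratified_pronoun_subset`, and the rule of filing each sentence under its first pronoun are all assumptions made for this example. It expects tokenized input, meaning one sentence per string with tokens separated by spaces.

```python
import random
from collections import defaultdict

# Hypothetical pronoun inventory; the real script may use a different one.
PRONOUNS = {"he", "she", "it", "they"}


def stratified_pronoun_subset(sentences, total, seed=0):
    """Sample about `total` sentences, keeping each pronoun's share of the data."""
    rng = random.Random(seed)

    # Group sentences by the first pronoun they contain (a simplification).
    buckets = defaultdict(list)
    for sent in sentences:
        tokens = (tok.lower() for tok in sent.split())
        first = next((tok for tok in tokens if tok in PRONOUNS), None)
        if first is not None:
            buckets[first].append(sent)

    n_with_pronoun = sum(len(group) for group in buckets.values())
    if n_with_pronoun == 0:
        return []

    # Proportional allocation: each pronoun gets a quota matching its frequency.
    subset = []
    for pronoun in sorted(buckets):
        group = buckets[pronoun]
        quota = min(len(group), round(total * len(group) / n_with_pronoun))
        subset.extend(rng.sample(group, quota))
    return subset
```

Because the quotas are rounded per pronoun, the size of the result can differ slightly from the requested total. Sentences without any pronoun from the list are skipped.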