This codebase is forked from PyTorch's language modeling example, which trains a multi-layer RNN (Elman, GRU, or LSTM) on a language modeling task. By default, the training script uses the provided Penn Treebank (PTB) dataset. The trained model can then be used by the generate script to produce new text.
This fork adds support for pretrained word2vec and debiased word embeddings. Run the training script with the following arguments:
python main.py --data ./data/google --cuda --epochs 10 --emsize=300 --nhid=650 --dropout 0.5 --type=debiased --save model_deb_fn.pt
Here, --type can be debiased, word2vec, glove, or concept, and --data can be ./data/penn for the Penn Treebank dataset or ./data/google for the Google One Billion Words dataset.
The above hyperparameters reach a perplexity of 95 on the Google 1B dataset.
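For reference, the sketch below shows one way the pretrained 300-dimensional vectors selected by --type could be wired into the model's embedding layer. The actual loading code lives in main.py; the function name, the word2idx/vectors arguments, and the fallback-initialization scheme here are illustrative assumptions, not the script's real API.

# Illustrative sketch (not the exact main.py code): initialize the RNN's
# embedding layer from pretrained vectors (word2vec / debiased / glove / concept).
import numpy as np
import torch
import torch.nn as nn

def build_embedding(word2idx, vectors, emsize=300):
    """word2idx: dict mapping corpus words to integer indices.
    vectors: dict mapping words to pretrained numpy vectors.
    Words missing from the pretrained set fall back to small random vectors."""
    weight = np.random.uniform(-0.1, 0.1, (len(word2idx), emsize)).astype("float32")
    for word, idx in word2idx.items():
        if word in vectors:
            weight[idx] = vectors[word]
    emb = nn.Embedding(len(word2idx), emsize)
    emb.weight.data.copy_(torch.from_numpy(weight))
    # Optionally freeze the pretrained vectors instead of fine-tuning them:
    # emb.weight.requires_grad = False
    return emb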
To generate examples:
python generate.py --checkpoint model_deb_fn.pt --words=100 --cuda --token "boss expects me to" && cat generated.txt
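The --token flag seeds generation with a prompt. The sketch below shows how such prompt-conditioned sampling typically works with this kind of RNN language model; it assumes the PyTorch example's model interface (init_hidden, forward(input, hidden) returning logits and hidden state) and a corpus object with word2idx/idx2word dictionaries, so treat it as an approximation of generate.py rather than its actual code.

# Illustrative sketch of prompt-conditioned sampling with a trained RNN LM.
import torch

def sample(model, corpus, prompt, nwords=100, temperature=1.0, device="cpu"):
    model.eval()
    hidden = model.init_hidden(1)  # batch size of 1
    words = prompt.split()         # prompt words must exist in the vocabulary
    with torch.no_grad():
        # Feed the prompt tokens to warm up the hidden state.
        for w in words:
            inp = torch.tensor([[corpus.dictionary.word2idx[w]]], device=device)
            output, hidden = model(inp, hidden)
        # Sample the continuation one word at a time from exp(logits / temperature).
        for _ in range(nwords):
            weights = output.squeeze().div(temperature).exp()
            idx = torch.multinomial(weights, 1).item()
            words.append(corpus.dictionary.idx2word[idx])
            inp = torch.tensor([[idx]], device=device)
            output, hidden = model(inp, hidden)
    return " ".join(words)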
To generate both word2vec and debiased examples together:
./test_models_single.sh
To batch-generate word2vec and debiased samples using the input CSV file input_tabs.csv:
./test_models.sh
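Conceptually, the batch script loops over the prompts in input_tabs.csv and invokes generate.py once per prompt for each checkpoint. A rough Python equivalent is sketched below; the CSV column layout (prompt in the first column) and the word2vec checkpoint name model_w2v_fn.pt are assumptions for illustration.

# Illustrative batch driver: run generate.py for each prompt in a CSV,
# once with a word2vec checkpoint and once with the debiased one.
import csv
import subprocess

# Checkpoint paths are assumed; substitute your actual model files.
checkpoints = {"word2vec": "model_w2v_fn.pt", "debiased": "model_deb_fn.pt"}

with open("input_tabs.csv", newline="") as f:
    for row in csv.reader(f):
        prompt = row[0]  # assumes the prompt is in the first column
        for name, ckpt in checkpoints.items():
            subprocess.run(
                ["python", "generate.py", "--checkpoint", ckpt,
                 "--words=100", "--cuda", "--token", prompt],
                check=True,
            )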