Language Modelling Experiment with Word Vectors

This codebase is forked from PyTorch's language modelling example. It trains a multi-layer RNN (Elman, GRU, or LSTM) on a word-level language modelling task. By default, the training script uses the provided Penn Treebank (PTB) dataset. The trained model can then be used by the generate script to produce new text.

Experiments

Training

Training uses pretrained word2vec or debiased word embeddings. Run the training script with the following arguments:

python main.py --data ./data/google --cuda --epochs 10 --emsize 300 --nhid 650 --dropout 0.5 --type debiased --save model_deb_fn.pt

Here --type can be one of debiased, word2vec, glove, or concept, and --data can be ./data/penn for the Penn Treebank dataset or ./data/google for the Google One Billion Word dataset.

With these hyperparameters, the model reaches a perplexity of 95 on the Google One Billion Word dataset.
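
As a rough illustration of how the pretrained vectors can back the model's input layer, the sketch below builds an embedding matrix from a word-vector lookup and copies it into an nn.Embedding. This is a minimal sketch, not the project's actual loading code; the helper name and the vectors dict are assumptions.

import torch
import torch.nn as nn

def build_embedding(vocab, vectors, emsize=300):
    # vocab: list of words; vectors: dict mapping word -> length-emsize
    # array (e.g. parsed from a word2vec or debiased-embedding file).
    weight = torch.randn(len(vocab), emsize) * 0.1  # fallback for OOV words
    for i, word in enumerate(vocab):
        if word in vectors:
            weight[i] = torch.as_tensor(vectors[word], dtype=torch.float)
    embedding = nn.Embedding(len(vocab), emsize)
    embedding.weight.data.copy_(weight)
    # Keep the pretrained vectors fixed; drop this line to fine-tune them.
    embedding.weight.requires_grad = False
    return embedding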

Generate

To generate text from a trained checkpoint:

python generate.py --checkpoint model_deb_fn.pt --words 100 --cuda --token "boss expects me to" && cat generated.txt
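
Seeded generation presumably works by priming the RNN's hidden state with the --token words before sampling. Below is a minimal sketch of that loop, assuming a model and corpus dictionary shaped like the upstream PyTorch example; it is not the project's generate.py.

import torch

def generate_seeded(model, corpus, seed, words=100, temperature=1.0):
    # Prime the hidden state with the seed tokens, then sample continuations.
    model.eval()
    hidden = model.init_hidden(1)  # batch size 1
    out = seed.split()
    with torch.no_grad():
        for word in out:
            inp = torch.tensor([[corpus.dictionary.word2idx[word]]])
            output, hidden = model(inp, hidden)
        for _ in range(words):
            # Sample the next token from the (unnormalised) output weights.
            weights = output.squeeze().div(temperature).exp()
            idx = torch.multinomial(weights, 1)[0]
            out.append(corpus.dictionary.idx2word[idx])
            output, hidden = model(idx.view(1, 1), hidden)
    return " ".join(out)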

To generate examples from both the word2vec and debiased models together:

./test_models_single.sh

To batch-generate word2vec and debiased examples from the input CSV file input_tabs.csv:

./test_models.sh
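
test_models.sh presumably loops over the prompts in input_tabs.csv and calls generate.py once per prompt for each model. A rough Python equivalent is sketched below; the tab delimiter, single prompt column, and checkpoint file names are assumptions.

import csv
import subprocess

checkpoints = ["model_w2v_fn.pt", "model_deb_fn.pt"]  # assumed file names
with open("input_tabs.csv", newline="") as f:
    for row in csv.reader(f, delimiter="\t"):
        prompt = row[0]  # assumes the prompt sits in the first column
        for ckpt in checkpoints:
            subprocess.run(
                ["python", "generate.py", "--checkpoint", ckpt,
                 "--words", "100", "--cuda", "--token", prompt],
                check=True,
            )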

Acknowledgements

Forked from the PyTorch language modelling example.