Added unk vectors to wordvec downloads
Russell Stewart authored and committed on Dec 30, 2015
Commit 72b8433 (1 parent: f33bf82)
Showing 1 changed file with 4 additions and 4 deletions.
README.md: 8 changes (4 additions, 4 deletions)
@@ -15,10 +15,10 @@ Pre-trained word vectors are made available under the <a href="http://opendataco
and License</a>
<div class="entry">
<ul style="padding-left:0px; margin-top:0px; margin-bottom:0px">
-<li> <a href="http://dumps.wikimedia.org/enwiki/20140102/">Wikipedia 2014</a> + <a href="https://catalog.ldc.upenn.edu/LDC2011T07">Gigaword 5</a> (6B tokens, 400K vocab, uncased, 50d, 100d, 200d, &amp; 300d vectors, 822 MB download): <a href="http://nlp.stanford.edu/data/glove.6B.zip">glove.6B.zip</a> </li>
-<li> Common Crawl (42B tokens, 1.9M vocab, uncased, 300d vectors, 1.75 GB download): <a href="http://nlp.stanford.edu/data/glove.42B.300d.zip">glove.42B.300d.zip</a> </li>
-<li> Common Crawl (840B tokens, 2.2M vocab, cased, 300d vectors, 2.03 GB download): <a href="http://nlp.stanford.edu/data/glove.840B.300d.zip">glove.840B.300d.zip</a> </li>
-<li> Twitter (2B tweets, 27B tokens, 1.2M vocab, uncased, 25d, 50d, 100d, &amp; 200d vectors, 1.42 GB download): <a href="http://nlp.stanford.edu/data/glove.twitter.27B.zip">glove.twitter.27B.zip</a> Ruby <a href="preprocess-twitter.rb">script</a> for preprocessing Twitter data </li>
+<li> <a href="http://dumps.wikimedia.org/enwiki/20140102/">Wikipedia 2014</a> + <a href="https://catalog.ldc.upenn.edu/LDC2011T07">Gigaword 5</a> (6B tokens, 400K vocab, uncased, 50d, 100d, 200d, &amp; 300d vectors, 822 MB download): <a href="http://nlp.stanford.edu/data/wordvecs/glove.6B.zip">glove.6B.zip</a> </li>
+<li> Common Crawl (42B tokens, 1.9M vocab, uncased, 300d vectors, 1.75 GB download): <a href="http://nlp.stanford.edu/data/wordvecs/glove.42B.300d.zip">glove.42B.300d.zip</a> </li>
+<li> Common Crawl (840B tokens, 2.2M vocab, cased, 300d vectors, 2.03 GB download): <a href="http://nlp.stanford.edu/data/wordvecs/glove.840B.300d.zip">glove.840B.300d.zip</a> </li>
+<li> Twitter (2B tweets, 27B tokens, 1.2M vocab, uncased, 25d, 50d, 100d, &amp; 200d vectors, 1.42 GB download): <a href="http://nlp.stanford.edu/data/wordvecs/glove.twitter.27B.zip">glove.twitter.27B.zip</a> Ruby <a href="preprocess-twitter.rb">script</a> for preprocessing Twitter data </li>
</ul>
</div>
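This commit moves every archive from `/data/` to `/data/wordvecs/`, so download scripts that hard-code the old paths may need updating. The Python sketch below is not part of the GloVe repository. It builds a download URL under the new path and parses vectors straight out of a downloaded archive. It assumes that each text file inside an archive (for example, a hypothetical `glove.6B.50d.txt` inside `glove.6B.zip`) stores one entry per line: a token followed by `dim` space-separated floats. The helper names `archive_url`, `parse_glove`, and `load_from_zip` are illustrative only.

```python
import io
import zipfile
from typing import BinaryIO, Dict, Iterable, List, Union

# Post-commit location of the archives, taken from the "+" lines of this diff.
BASE_URL = "http://nlp.stanford.edu/data/wordvecs/"


def archive_url(name: str) -> str:
    """Return the download URL for an archive such as "glove.6B.zip"."""
    return BASE_URL + name


def parse_glove(lines: Iterable[str], dim: int) -> Dict[str, List[float]]:
    """Parse lines assumed to look like "<token> <f1> ... <f_dim>" into a dict.

    The last `dim` fields are treated as the vector and everything before
    them as the token, so a token that contains spaces still parses.
    """
    vectors: Dict[str, List[float]] = {}
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        parts = line.rstrip().split(" ")
        if len(parts) < dim + 1:
            raise ValueError(
                f"line {line_no}: expected at least {dim + 1} fields, got {len(parts)}"
            )
        word = " ".join(parts[:-dim])
        vectors[word] = [float(x) for x in parts[-dim:]]
    return vectors


def load_from_zip(
    source: Union[str, BinaryIO], member: str, dim: int
) -> Dict[str, List[float]]:
    """Read one text file (member name is an assumption) from a zip archive."""
    with zipfile.ZipFile(source) as zf:
        with zf.open(member) as raw:
            # Decode the zip member as UTF-8 text and parse it line by line.
            return parse_glove(io.TextIOWrapper(raw, encoding="utf-8"), dim)


if __name__ == "__main__":
    print(archive_url("glove.6B.zip"))
```

After downloading an archive, a call such as `load_from_zip("glove.6B.zip", "glove.6B.50d.txt", 50)` would return a token-to-vector dict, provided the assumed member name and dimension match the file. The parser takes the last `dim` fields as the vector rather than splitting off only the first token. This is a defensive choice in case a token contains spaces. Because the vectors are stored as plain Python lists, loading one of the multi-gigabyte archives this way may be slow and memory-hungry.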
