Update links to RaRe Technologies website to https
Avoids a redirect and protects the content.
pabs3 committed Mar 13, 2023
1 parent 0581678 commit b898dd0
Showing 9 changed files with 14 additions and 14 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -121,7 +121,7 @@ Adopters

| Company | Logo | Industry | Use of Gensim |
|---------|------|----------|---------------|
- | [RARE Technologies](http://rare-technologies.com) | ![rare](docs/src/readme_images/rare.png) | ML & NLP consulting | Creators of Gensim – this is us! |
+ | [RARE Technologies](https://rare-technologies.com/) | ![rare](docs/src/readme_images/rare.png) | ML & NLP consulting | Creators of Gensim – this is us! |
| [Amazon](http://www.amazon.com/) | ![amazon](docs/src/readme_images/amazon.png) | Retail | Document similarity. |
| [National Institutes of Health](https://github.com/NIHOPA/pipeline_word2vec) | ![nih](docs/src/readme_images/nih.png) | Health | Processing grants and publications with word2vec. |
| [Cisco Security](http://www.cisco.com/c/en/us/products/security/index.html) | ![cisco](docs/src/readme_images/cisco.png) | Security | Large-scale fraud detection. |
@@ -163,7 +163,7 @@ BibTeX entry:
[citing gensim in academic papers and theses]: https://scholar.google.com/citations?view_op=view_citation&hl=en&user=9vG_kV0AAAAJ&citation_for_view=9vG_kV0AAAAJ:NaGl4SEjCO4C

[design goals]: https://radimrehurek.com/gensim/intro.html#design-principles
- [RaRe Technologies]: http://rare-technologies.com/wp-content/uploads/2016/02/rare_image_only.png%20=10x20
+ [RaRe Technologies]: https://rare-technologies.com/wp-content/uploads/2016/02/rare_image_only.png%20=10x20
[rare\_tech]: //rare-technologies.com
[Talentpair]: https://avatars3.githubusercontent.com/u/8418395?v=3&s=100
[citing gensim in academic papers and theses]: https://scholar.google.cz/citations?view_op=view_citation&hl=en&user=9vG_kV0AAAAJ&citation_for_view=9vG_kV0AAAAJ:u-x6o8ySG0sC
4 changes: 2 additions & 2 deletions docs/notebooks/WMD_tutorial.ipynb
@@ -12,7 +12,7 @@
"\n",
"## Word Mover's Distance basics\n",
"\n",
"WMD is a method that allows us to assess the \"distance\" between two documents in a meaningful way, even when they have no words in common. It uses [word2vec](http:https://rare-technologies.com/word2vec-tutorial/) [4] vector embeddings of words. It been shown to outperform many of the state-of-the-art methods in *k*-nearest neighbors classification [3].\n",
"WMD is a method that allows us to assess the \"distance\" between two documents in a meaningful way, even when they have no words in common. It uses [word2vec](https:https://rare-technologies.com/word2vec-tutorial/) [4] vector embeddings of words. It been shown to outperform many of the state-of-the-art methods in *k*-nearest neighbors classification [3].\n",
"\n",
"WMD is illustrated below for two very similar sentences (illustration taken from [Vlad Niculae's blog](http:https://vene.ro/blog/word-movers-distance-in-python.html)). The sentences have no words in common, but by matching the relevant words, WMD is able to accurately measure the (dis)similarity between the two sentences. The method also uses the bag-of-words representation of the documents (simply put, the word's frequencies in the documents), noted as $d$ in the figure below. The intuition behind the method is that we find the minimum \"traveling distance\" between documents, in other words the most efficient way to \"move\" the distribution of document 1 to the distribution of document 2.\n",
"\n",
@@ -36,7 +36,7 @@
"\n",
"## Part 1: Computing the Word Mover's Distance\n",
"\n",
"To use WMD, we need some word embeddings first of all. You could train a word2vec (see tutorial [here](http:https://rare-technologies.com/word2vec-tutorial/)) model on some corpus, but we will start by downloading some pre-trained word2vec embeddings. Download the GoogleNews-vectors-negative300.bin.gz embeddings [here](https://code.google.com/archive/p/word2vec/) (warning: 1.5 GB, file is not needed for part 2). Training your own embeddings can be beneficial, but to simplify this tutorial, we will be using pre-trained embeddings at first.\n",
"To use WMD, we need some word embeddings first of all. You could train a word2vec (see tutorial [here](https:https://rare-technologies.com/word2vec-tutorial/)) model on some corpus, but we will start by downloading some pre-trained word2vec embeddings. Download the GoogleNews-vectors-negative300.bin.gz embeddings [here](https://code.google.com/archive/p/word2vec/) (warning: 1.5 GB, file is not needed for part 2). Training your own embeddings can be beneficial, but to simplify this tutorial, we will be using pre-trained embeddings at first.\n",
"\n",
"Let's take some sentences to compute the distance between."
]
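
For readers who want to run this part outside the notebook, here is a minimal sketch, assuming the GoogleNews embeddings linked above have been downloaded, a recent gensim (4.x) is installed, and the POT package that `wmdistance` relies on is available; the two sentences are the classic illustrative pair:

```python
from gensim.models import KeyedVectors

# Load the pre-trained embeddings downloaded above (~1.5 GB on disk).
model = KeyedVectors.load_word2vec_format(
    'GoogleNews-vectors-negative300.bin.gz', binary=True)

# The classic WMD example pair: apart from stopwords, no words in common.
sentence_1 = 'Obama speaks to the media in Illinois'.lower().split()
sentence_2 = 'The president greets the press in Chicago'.lower().split()

# Lower distance means more similar documents; identical documents give 0.0.
print(model.wmdistance(sentence_1, sentence_2))
```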
2 changes: 1 addition & 1 deletion docs/notebooks/Word2Vec_FastText_Comparison.ipynb
@@ -537,7 +537,7 @@
"3. In general, the performance of the models seems to get closer with the increasing corpus size. However, this might possibly be due to the size of the model staying constant at 100, and a larger model size for large corpora might result in higher performance gains.\n",
"4. The semantic accuracy for all models increases significantly with the increase in corpus size.\n",
"5. However, the increase in syntactic accuracy from the increase in corpus size for the n-gram FastText model is lower (in both relative and absolute terms). This could possibly indicate that advantages gained by incorporating morphological information could be less significant in case of larger corpus sizes (the corpuses used in the original paper seem to indicate this too)\n",
"6. Training times for gensim are slightly lower than the fastText no-ngram model, and significantly lower than the n-gram variant. This is quite impressive considering fastText is implemented in C++ and Gensim in Python (with calls to low-level BLAS routines for much of the heavy lifting). You could read [this post](http:https://rare-technologies.com/word2vec-in-python-part-two-optimizing/) for more details regarding word2vec optimisation in Gensim. Note that these times include importing any dependencies and serializing the models to disk, and not just the training times."
"6. Training times for gensim are slightly lower than the fastText no-ngram model, and significantly lower than the n-gram variant. This is quite impressive considering fastText is implemented in C++ and Gensim in Python (with calls to low-level BLAS routines for much of the heavy lifting). You could read [this post](https:https://rare-technologies.com/word2vec-in-python-part-two-optimizing/) for more details regarding word2vec optimisation in Gensim. Note that these times include importing any dependencies and serializing the models to disk, and not just the training times."
]
},
{
4 changes: 2 additions & 2 deletions docs/notebooks/ldaseqmodel.ipynb
@@ -45,7 +45,7 @@
"\n",
"While most traditional topic mining algorithms do not expect time-tagged data or take into account any prior ordering, Dynamic Topic Models (DTM) leverages the knowledge of different documents belonging to a different time-slice in an attempt to map how the words in a topic change over time.\n",
"\n",
"[This](http:https://rare-technologies.com/understanding-and-coding-dynamic-topic-models/) blog post is also useful in breaking down the ideas and theory behind DTM.\n",
"[This](https:https://rare-technologies.com/understanding-and-coding-dynamic-topic-models/) blog post is also useful in breaking down the ideas and theory behind DTM.\n",
"\n"
]
},
@@ -65,7 +65,7 @@
"\n",
"There is some clarity on how they built their code now - Variational Inference using Kalman Filters, as described in section 3 of the paper. The mathematical basis for the code is well described in the appendix of the paper. If the documentation is lacking or not clear, comments via Issues or PRs via the gensim repo would be useful in improving the quality.\n",
"\n",
"This project was part of the Google Summer of Code 2016 program: I have been regularly blogging about my progress with implementing this, which you can find [here](http:https://rare-technologies.com/author/bhargav/)."
"This project was part of the Google Summer of Code 2016 program: I have been regularly blogging about my progress with implementing this, which you can find [here](https:https://rare-technologies.com/author/bhargav/)."
]
},
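
As a minimal sketch of the API this notebook builds towards (the toy corpus, slice sizes and topic count below are invented purely for illustration):

```python
from gensim.corpora import Dictionary
from gensim.models import LdaSeqModel

# Toy documents: the first two belong to time slice 1, the last two to slice 2.
docs = [
    ['bank', 'river', 'water'], ['bank', 'money', 'loan'],
    ['river', 'water', 'flow'], ['money', 'loan', 'credit'],
]
dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(doc) for doc in docs]

# time_slice lists how many documents fall into each consecutive time slice.
ldaseq = LdaSeqModel(corpus=corpus, id2word=dictionary,
                     time_slice=[2, 2], num_topics=2)
print(ldaseq.print_topics(time=0))  # the topics as fitted for the first slice
```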
{
2 changes: 1 addition & 1 deletion docs/notebooks/soft_cosine_tutorial.ipynb
@@ -53,7 +53,7 @@
"source": [
"## Part 1: Computing the Soft Cosine Measure\n",
"\n",
"To use SCM, we need some word embeddings first of all. You could train a [word2vec][] (see tutorial [here](http:https://rare-technologies.com/word2vec-tutorial/)) model on some corpus, but we will use pre-trained word2vec embeddings.\n",
"To use SCM, we need some word embeddings first of all. You could train a [word2vec][] (see tutorial [here](https:https://rare-technologies.com/word2vec-tutorial/)) model on some corpus, but we will use pre-trained word2vec embeddings.\n",
"\n",
"[word2vec]: https://radimrehurek.com/gensim/models/word2vec.html\n",
"\n",
2 changes: 1 addition & 1 deletion docs/src/gallery/other/README.txt
@@ -6,7 +6,7 @@ Blog posts, tutorial videos, hackathons and other useful Gensim resources, from
- *Use FastText or Word2Vec?* Comparison of embedding quality and performance. `Jupyter Notebook <https://github.com/RaRe-Technologies/gensim/blob/ba1ce894a5192fc493a865c535202695bb3c0424/docs/notebooks/Word2Vec_FastText_Comparison.ipynb>`__
- Multiword phrases extracted from *How I Met Your Mother*. `Blog post by Mark Needham <http://www.markhneedham.com/blog/2015/02/12/pythongensim-creating-bigrams-over-how-i-met-your-mother-transcripts/>`__
- *Using Gensim LDA for hierarchical document clustering*. `Jupyter notebook by Brandon Rose <http://brandonrose.org/clustering#Latent-Dirichlet-Allocation>`__
- - *Evolution of Voldemort topic through the 7 Harry Potter books*. `Blog post <http://rare-technologies.com/understanding-and-coding-dynamic-topic-models/>`__
+ - *Evolution of Voldemort topic through the 7 Harry Potter books*. `Blog post <https://rare-technologies.com/understanding-and-coding-dynamic-topic-models/>`__
- *Movie plots by genre*: Document classification using various techniques: TF-IDF, word2vec averaging, Deep IR, Word Movers Distance and doc2vec. `Github repo <https://github.com/RaRe-Technologies/movie-plots-by-genre>`__
- *Word2vec: Faster than Google? Optimization lessons in Python*, talk by Radim Řehůřek at PyData Berlin 2014. `Youtube video <https://www.youtube.com/watch?v=vU4TlwZzTfU>`__
- *Word2vec & friends*, talk by Radim Řehůřek at MLMU.cz 7.1.2015. `Youtube video <https://www.youtube.com/watch?v=wTp3P2UnTfQ>`__
6 changes: 3 additions & 3 deletions docs/src/gallery/tutorials/run_lda.py
@@ -268,14 +268,14 @@ def extract_documents(url='https://cs.nyu.edu/~roweis/data/nips12raw_str602.tgz'
# Note that we use the "Umass" topic coherence measure here (see
# :py:func:`gensim.models.ldamodel.LdaModel.top_topics`); Gensim has recently
# obtained an implementation of the "AKSW" topic coherence measure (see
- # accompanying blog post, http://rare-technologies.com/what-is-topic-coherence/).
+ # accompanying blog post, https://rare-technologies.com/what-is-topic-coherence/).
#
# If you are familiar with the subject of the articles in this dataset, you can
# see that the topics below make a lot of sense. However, they are not without
# flaws. We can see that there is substantial overlap between some topics,
# others are hard to interpret, and most of them have at least some terms that
# seem out of place. If you were able to do better, feel free to share your
# methods on the blog at https://rare-technologies.com/lda-training-tips/ !
# methods on the blog at https:https://rare-technologies.com/lda-training-tips/ !
#

top_topics = model.top_topics(corpus)
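# One way to summarize these scores (illustrative, not part of this diff) is
# to average the per-topic coherence values that ``top_topics`` returns:
avg_topic_coherence = sum(t[1] for t in top_topics) / len(top_topics)
print('Average topic coherence: %.4f.' % avg_topic_coherence)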
@@ -299,7 +299,7 @@ def extract_documents(url='https://cs.nyu.edu/~roweis/data/nips12raw_str602.tgz'
# Where to go from here
# ---------------------
#
# * Check out a RaRe blog post on the AKSW topic coherence measure (https://rare-technologies.com/what-is-topic-coherence/).
# * Check out a RaRe blog post on the AKSW topic coherence measure (https:https://rare-technologies.com/what-is-topic-coherence/).
# * pyLDAvis (https://pyldavis.readthedocs.io/en/latest/index.html).
# * Read some more Gensim tutorials (https://github.com/RaRe-Technologies/gensim/blob/develop/tutorials.md#tutorials).
# * If you haven't already, read [1] and [2] (see references).
2 changes: 1 addition & 1 deletion docs/src/gallery/tutorials/run_wmd.py
@@ -17,7 +17,7 @@
#
# WMD enables us to assess the "distance" between two documents in a meaningful
# way even when they have no words in common. It uses `word2vec
- # <http://rare-technologies.com/word2vec-tutorial/>`_ [4] vector embeddings of
+ # <https://rare-technologies.com/word2vec-tutorial/>`_ [4] vector embeddings of
# words. It has been shown to outperform many of the state-of-the-art methods in
# k-nearest neighbors classification [3].
#
2 changes: 1 addition & 1 deletion docs/src/gallery/tutorials/run_word2vec.py
@@ -322,7 +322,7 @@ def __iter__(self):
# one core because of the `GIL
# <https://wiki.python.org/moin/GlobalInterpreterLock>`_ (and ``word2vec``
# training will be `miserably slow
- # <http://rare-technologies.com/word2vec-in-python-part-two-optimizing/>`_\ ).
+ # <https://rare-technologies.com/word2vec-in-python-part-two-optimizing/>`_\ ).
#
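#
# For example (a minimal sketch, assuming ``MyCorpus`` is the streaming corpus
# class this tutorial defines earlier):

import gensim.models

model = gensim.models.Word2Vec(sentences=MyCorpus(), workers=4)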

###############################################################################
