
Commit

Update Wikipedia URLs to https
pabs3 committed Mar 13, 2023
1 parent e50776e commit 1c9fc75
Showing 12 changed files with 26 additions and 26 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -169,8 +169,8 @@ BibTeX entry:
[citing gensim in academic papers and theses]: https://scholar.google.cz/citations?view_op=view_citation&hl=en&user=9vG_kV0AAAAJ&citation_for_view=9vG_kV0AAAAJ:u-x6o8ySG0sC

[documentation and Jupyter Notebook tutorials]: https://github.com/RaRe-Technologies/gensim/#documentation
-[Vector Space Model]: http://en.wikipedia.org/wiki/Vector_space_model
-[unsupervised document analysis]: http://en.wikipedia.org/wiki/Latent_semantic_indexing
+[Vector Space Model]: https://en.wikipedia.org/wiki/Vector_space_model
+[unsupervised document analysis]: https://en.wikipedia.org/wiki/Latent_semantic_indexing
[NumPy and Scipy]: https://scipy.org/install/
[ATLAS]: http://math-atlas.sourceforge.net/
[OpenBLAS]: http://xianyi.github.io/OpenBLAS/
6 changes: 3 additions & 3 deletions docs/notebooks/distributed.md
@@ -50,11 +50,11 @@ Available distributed algorithms
* [Distributed Latent Dirichlet Allocation][8]


-[1]: http://en.wikipedia.org/wiki/Distributed_computing
-[2]: http://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms
+[1]: https://en.wikipedia.org/wiki/Distributed_computing
+[2]: https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms
[3]: https://pypi.org/project/Pyro4/
[4]: https://radimrehurek.com/gensim/intro.html#design
[5]: https://radimrehurek.com/gensim/distributed.html#term-worker
-[6]: http://en.wikipedia.org/wiki/Broadcast_domain
+[6]: https://en.wikipedia.org/wiki/Broadcast_domain
[7]: https://radimrehurek.com/gensim/dist_lsi.html
[8]: https://radimrehurek.com/gensim/dist_lda.html
4 changes: 2 additions & 2 deletions docs/src/_index.rst.unused
@@ -21,8 +21,8 @@ against other documents, words or phrases.

.. note::
If the previous paragraphs left you confused, you can read more about the `Vector
-Space Model <http://en.wikipedia.org/wiki/Vector_space_model>`_ and `unsupervised
-document analysis <http://en.wikipedia.org/wiki/Latent_semantic_indexing>`_ on Wikipedia.
+Space Model <https://en.wikipedia.org/wiki/Vector_space_model>`_ and `unsupervised
+document analysis <https://en.wikipedia.org/wiki/Latent_semantic_indexing>`_ on Wikipedia.


.. _design:
2 changes: 1 addition & 1 deletion docs/src/dist_lsi.rst
@@ -57,7 +57,7 @@ ____________

So let's test our setup and run one computation of distributed LSA. Open a Python
shell on one of the five machines (again, this can be done on any computer
-in the same `broadcast domain <http://en.wikipedia.org/wiki/Broadcast_domain>`_,
+in the same `broadcast domain <https://en.wikipedia.org/wiki/Broadcast_domain>`_,
our choice is incidental) and try:

.. sourcecode:: pycon
6 changes: 3 additions & 3 deletions docs/src/distributed.rst
@@ -10,7 +10,7 @@ Why distributed computing?

Need to build semantic representation of a corpus that is millions of documents large and it's
taking forever? Have several idle machines at your disposal that you could use?
-`Distributed computing <http://en.wikipedia.org/wiki/Distributed_computing>`_ tries
+`Distributed computing <https://en.wikipedia.org/wiki/Distributed_computing>`_ tries
to accelerate computations by splitting a given task into several smaller subtasks,
passing them on to several computing nodes in parallel.

@@ -23,7 +23,7 @@ much communication going on), so the network is allowed to be of relatively high
The primary reason for using distributed computing is making things run faster. In `gensim`,
most of the time consuming stuff is done inside low-level routines for linear algebra, inside
NumPy, independent of any `gensim` code.
-**Installing a fast** `BLAS (Basic Linear Algebra) <http://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms>`_ **library
+**Installing a fast** `BLAS (Basic Linear Algebra) <https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms>`_ **library
for NumPy can improve performance up to 15 times!** So before you start buying those extra computers,
consider installing a fast, threaded BLAS that is optimized for your particular machine
(as opposed to a generic, binary-distributed library).
@@ -71,7 +71,7 @@ inside `gensim` will try to look for and enslave all available worker nodes.
Cluster
Several nodes which communicate over TCP/IP. Currently, network broadcasting
is used to discover and connect all communicating nodes, so the nodes must lie
-within the same `broadcast domain <http://en.wikipedia.org/wiki/Broadcast_domain>`_.
+within the same `broadcast domain <https://en.wikipedia.org/wiki/Broadcast_domain>`_.

Worker
A process which is created on each node. To remove a node from your cluster,
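The BLAS advice in this hunk is easy to check locally. A minimal sketch, assuming only a standard NumPy install (library names and timings will vary by machine):

```python
import time

import numpy as np

# Print the BLAS/LAPACK backend NumPy was built against -- a fast, threaded
# library (OpenBLAS, MKL, ATLAS) is what enables the speedups described above.
np.show_config()

# Rough sanity check: large matrix multiplication dispatches to BLAS (dgemm),
# so its wall time reflects the quality of the backend.
a = np.random.rand(2000, 2000)
t0 = time.time()
_ = a @ a
print(f"2000x2000 matmul took {time.time() - t0:.2f}s")
```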
4 changes: 2 additions & 2 deletions docs/src/gallery/core/run_corpora_and_vector_spaces.py
@@ -72,10 +72,10 @@
# by the features extracted from it, not by its "surface" string form: how you get to
# the features is up to you. Below I describe one common, general-purpose approach (called
# :dfn:`bag-of-words`), but keep in mind that different application domains call for
-# different features, and, as always, it's `garbage in, garbage out <http://en.wikipedia.org/wiki/Garbage_In,_Garbage_Out>`_...
+# different features, and, as always, it's `garbage in, garbage out <https://en.wikipedia.org/wiki/Garbage_In,_Garbage_Out>`_...
#
# To convert documents to vectors, we'll use a document representation called
-# `bag-of-words <http://en.wikipedia.org/wiki/Bag_of_words>`_. In this representation,
+# `bag-of-words <https://en.wikipedia.org/wiki/Bag_of_words>`_. In this representation,
# each document is represented by one vector where each vector element represents
# a question-answer pair, in the style of:
#
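The bag-of-words representation this hunk describes takes only a few lines of gensim. A minimal sketch, with toy documents invented for illustration:

```python
from gensim import corpora

# Two toy documents, already tokenized and lowercased
texts = [
    ["human", "computer", "interaction"],
    ["graph", "minors", "survey"],
]

# Assign an integer id to every unique token
dictionary = corpora.Dictionary(texts)

# Each document becomes a sparse vector of (token_id, token_count) pairs
bow_corpus = [dictionary.doc2bow(text) for text in texts]
print(bow_corpus)  # e.g. [[(0, 1), (1, 1), (2, 1)], [(3, 1), (4, 1), (5, 1)]]
```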
4 changes: 2 additions & 2 deletions docs/src/gallery/core/run_similarity_queries.py
@@ -96,10 +96,10 @@
print(vec_lsi)

###############################################################################
-# In addition, we will be considering `cosine similarity <http://en.wikipedia.org/wiki/Cosine_similarity>`_
+# In addition, we will be considering `cosine similarity <https://en.wikipedia.org/wiki/Cosine_similarity>`_
# to determine the similarity of two vectors. Cosine similarity is a standard measure
# in Vector Space Modeling, but wherever the vectors represent probability distributions,
-# `different similarity measures <http://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence#Symmetrised_divergence>`_
+# `different similarity measures <https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence#Symmetrised_divergence>`_
# may be more appropriate.
#
# Initializing query structures
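Cosine similarity, referenced in this hunk, reduces to a one-liner over dense vectors. A minimal NumPy sketch, with made-up 2-D LSI-style vectors:

```python
import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine of the angle between u and v: 1.0 = same direction, 0.0 = orthogonal."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Made-up 2-D LSI vectors for a query and a document
query = np.array([0.46, 0.99])
doc = np.array([0.41, 1.02])
print(cosine_similarity(query, doc))  # close to 1.0 -> very similar
```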
8 changes: 4 additions & 4 deletions docs/src/gallery/core/run_topics_and_transformations.py
@@ -130,7 +130,7 @@
corpus_lsi = lsi_model[corpus_tfidf] # create a double wrapper over the original corpus: bow->tfidf->fold-in-lsi

###############################################################################
-# Here we transformed our Tf-Idf corpus via `Latent Semantic Indexing <http://en.wikipedia.org/wiki/Latent_semantic_indexing>`_
+# Here we transformed our Tf-Idf corpus via `Latent Semantic Indexing <https://en.wikipedia.org/wiki/Latent_semantic_indexing>`_
# into a latent 2-D space (2-D because we set ``num_topics=2``). Now you're probably wondering: what do these two latent
# dimensions stand for? Let's inspect with :func:`models.LsiModel.print_topics`:

@@ -175,7 +175,7 @@
#
# Gensim implements several popular Vector Space Model algorithms:
#
-# * `Term Frequency * Inverse Document Frequency, Tf-Idf <http://en.wikipedia.org/wiki/Tf%E2%80%93idf>`_
+# * `Term Frequency * Inverse Document Frequency, Tf-Idf <https://en.wikipedia.org/wiki/Tf%E2%80%93idf>`_
# expects a bag-of-words (integer values) training corpus during initialization.
# During transformation, it will take a vector and return another vector of the
# same dimensionality, except that features which were rare in the training corpus
@@ -202,7 +202,7 @@
#
# model = models.OkapiBM25Model(corpus)
#
-# * `Latent Semantic Indexing, LSI (or sometimes LSA) <http://en.wikipedia.org/wiki/Latent_semantic_indexing>`_
+# * `Latent Semantic Indexing, LSI (or sometimes LSA) <https://en.wikipedia.org/wiki/Latent_semantic_indexing>`_
# transforms documents from either bag-of-words or (preferrably) TfIdf-weighted space into
# a latent space of a lower dimensionality. For the toy corpus above we used only
# 2 latent dimensions, but on real corpora, target dimensionality of 200--500 is recommended
@@ -247,7 +247,7 @@
#
# model = models.RpModel(tfidf_corpus, num_topics=500)
#
-# * `Latent Dirichlet Allocation, LDA <http://en.wikipedia.org/wiki/Latent_Dirichlet_allocation>`_
+# * `Latent Dirichlet Allocation, LDA <https://en.wikipedia.org/wiki/Latent_Dirichlet_allocation>`_
# is yet another transformation from bag-of-words counts into a topic space of lower
# dimensionality. LDA is a probabilistic extension of LSA (also called multinomial PCA),
# so LDA's topics can be interpreted as probability distributions over words. These distributions are,
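The bow -> tf-idf -> LSI chain walked through in these hunks can be reproduced end to end on a toy corpus. A minimal sketch (corpus invented for illustration):

```python
from gensim import corpora, models

# Toy tokenized corpus, invented for illustration
texts = [
    ["human", "computer", "interaction"],
    ["graph", "minors", "trees"],
    ["graph", "survey"],
]
dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]

tfidf = models.TfidfModel(corpus)  # bow -> tf-idf
corpus_tfidf = tfidf[corpus]

# tf-idf -> 2-D latent space, as in the ``num_topics=2`` example above
lsi = models.LsiModel(corpus_tfidf, id2word=dictionary, num_topics=2)
print(lsi.print_topics(2))
```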
4 changes: 2 additions & 2 deletions docs/src/intro.rst
@@ -24,8 +24,8 @@ Once these statistical patterns are found, any plain text documents (sentence, p

.. note::
If the previous paragraphs left you confused, you can read more about the `Vector
-Space Model <http://en.wikipedia.org/wiki/Vector_space_model>`_ and `unsupervised
-document analysis <http://en.wikipedia.org/wiki/Latent_semantic_indexing>`_ on Wikipedia.
+Space Model <https://en.wikipedia.org/wiki/Vector_space_model>`_ and `unsupervised
+document analysis <https://en.wikipedia.org/wiki/Latent_semantic_indexing>`_ on Wikipedia.

.. _design:

4 changes: 2 additions & 2 deletions docs/src/wiki.rst
@@ -221,7 +221,7 @@ into LDA topic distributions:
in your list appear to be meta topics, concerning the administration and
cleanup of Wikipedia. These show up because you didn't exclude templates
such as these, some of which are included in most articles for quality
-control: http://en.wikipedia.org/wiki/Wikipedia:Template_messages/Cleanup
+control: https://en.wikipedia.org/wiki/Wikipedia:Template_messages/Cleanup
The fourth and fifth topics clearly shows the influence of bots that import
massive databases of cities, countries, etc. and their statistics such as
@@ -232,7 +232,7 @@
So the top ten concepts are apparently dominated by Wikipedia robots and expanded
templates; this is a good reminder that LSA is a powerful tool for data analysis,
but no silver bullet. As always, it's `garbage in, garbage out
-<http://en.wikipedia.org/wiki/Garbage_In,_Garbage_Out>`_...
+<https://en.wikipedia.org/wiki/Garbage_In,_Garbage_Out>`_...
By the way, improvements to the Wiki markup parsing code are welcome :-)
.. [3] Hoffman, Blei, Bach. 2010. Online learning for Latent Dirichlet Allocation
2 changes: 1 addition & 1 deletion gensim/corpora/hashdictionary.py
@@ -4,7 +4,7 @@
# Copyright (C) 2012 Homer Strong, Radim Rehurek
# Licensed under the GNU LGPL v2.1 - https://www.gnu.org/licenses/old-licenses/lgpl-2.1.en.html

"""Implements the `"hashing trick" <http:https://en.wikipedia.org/wiki/Hashing-Trick>`_ -- a mapping between words
"""Implements the `"hashing trick" <https:https://en.wikipedia.org/wiki/Hashing-Trick>`_ -- a mapping between words
and their integer ids using a fixed, static mapping (hash function).
Notes
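For orientation, the hashing trick this module's docstring cites replaces the token-to-id lookup table with a hash function. A minimal sketch of gensim's HashDictionary:

```python
from gensim.corpora import HashDictionary

# No vocabulary is built or stored up front: each token's id is computed
# by hashing the token into a fixed id range.
hd = HashDictionary(id_range=32000)
bow = hd.doc2bow(["human", "computer", "interaction"])
print(bow)  # sparse (hashed_id, count) pairs; distinct tokens may collide
```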
4 changes: 2 additions & 2 deletions setup.py
@@ -193,8 +193,8 @@ def run(self):
If this feature list left you scratching your head, you can first read more about the `Vector
-Space Model <http://en.wikipedia.org/wiki/Vector_space_model>`_ and `unsupervised
-document analysis <http://en.wikipedia.org/wiki/Latent_semantic_indexing>`_ on Wikipedia.
+Space Model <https://en.wikipedia.org/wiki/Vector_space_model>`_ and `unsupervised
+document analysis <https://en.wikipedia.org/wiki/Latent_semantic_indexing>`_ on Wikipedia.
Installation
------------
