DOC Fix typos, via a Levenshtein-style corrector (scikit-learn#15923)
bwignall authored and TomDLT committed Dec 20, 2019
1 parent 2f86cdb commit 0e10b3a
Showing 33 changed files with 36 additions and 35 deletions.
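
The commit title credits a "Levenshtein-style corrector" for finding these typos. For context, here is a minimal, hypothetical Python sketch of that kind of tool; the corrector actually used is not part of this commit, and the vocabulary, threshold, and helper names below are illustrative only.

def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def suggest(token, vocabulary, max_dist=1):
    """Return known words within max_dist edits of token (hypothetical helper)."""
    if token in vocabulary:
        return []
    return sorted(w for w in vocabulary if levenshtein(token, w) <= max_dist)

print(suggest("dowloaded", {"download", "downloaded"}))  # ['downloaded']
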
2 changes: 1 addition & 1 deletion azure-pipelines.yml
@@ -74,7 +74,7 @@ jobs:
   # It runs tests requiring pandas and PyAMG.
   pylatest_pip_openblas_pandas:
     DISTRIB: 'conda-pip-latest'
-    # FIXME: pinned until SciPy wheels are available for Pyhon 3.8
+    # FIXME: pinned until SciPy wheels are available for Python 3.8
     PYTHON_VERSION: '3.8'
     PYTEST_VERSION: '4.6.2'
     COVERAGE: 'true'
2 changes: 1 addition & 1 deletion benchmarks/bench_plot_randomized_svd.py
@@ -104,7 +104,7 @@
 # in case the reconstructed (dense) matrix is too large
 MAX_MEMORY = np.int(2e9)
 
-# The following datasets can be dowloaded manually from:
+# The following datasets can be downloaded manually from:
 # CIFAR 10: https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
 # SVHN: http:https://ufldl.stanford.edu/housenumbers/train_32x32.mat
 CIFAR_FOLDER = "./cifar-10-batches-py/"
2 changes: 1 addition & 1 deletion benchmarks/bench_text_vectorizers.py
@@ -32,7 +32,7 @@ def f():
 text = fetch_20newsgroups(subset='train').data[:1000]
 
 print("="*80 + '\n#' + " Text vectorizers benchmark" + '\n' + '='*80 + '\n')
-print("Using a subset of the 20 newsrgoups dataset ({} documents)."
+print("Using a subset of the 20 newsgroups dataset ({} documents)."
       .format(len(text)))
 print("This benchmarks runs in ~1 min ...")
2 changes: 1 addition & 1 deletion build_tools/azure/install.sh
@@ -13,7 +13,7 @@ make_conda() {
 version_ge() {
     # The two version numbers are separated with a new line is piped to sort
     # -rV. The -V activates for version number sorting and -r sorts in
-    # decending order. If the first argument is the top element of the sort, it
+    # descending order. If the first argument is the top element of the sort, it
     # is greater than or equal to the second argument.
     test "$(printf "${1}\n${2}" | sort -rV | head -n 1)" == "$1"
 }
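
The corrected comment describes the trick: sort the two version strings in descending version order, and the first argument is greater than or equal to the second exactly when it comes out on top. A rough Python analogue of the same idea, as a sketch assuming purely numeric dot-separated versions (this is not the build script's own logic):

def version_ge(a, b):
    """True if version string a >= version string b (numeric parts only)."""
    key = lambda v: [int(part) for part in v.split(".")]
    # Descending version sort, mirroring `sort -rV | head -n 1`:
    # a >= b exactly when a is the top element (sorted() is stable on ties).
    return sorted([a, b], key=key, reverse=True)[0] == a

assert version_ge("3.8", "3.7.1")
assert not version_ge("3.6", "3.7")
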
2 changes: 1 addition & 1 deletion doc/developers/advanced_installation.rst
@@ -49,7 +49,7 @@ feature, code or documentation improvement).
    If you plan on submitting a pull-request, you should clone from your fork
    instead.
 
-#. Install a compiler with OpenMP_ support for your platform. See intructions
+#. Install a compiler with OpenMP_ support for your platform. See instructions
    for :ref:`compiler_windows`, :ref:`compiler_macos`, :ref:`compiler_linux`
    and :ref:`compiler_freebsd`.
2 changes: 1 addition & 1 deletion doc/developers/contributing.rst
@@ -377,7 +377,7 @@ complies with the following rules before marking a PR as ``[MRG]``. The
     methods available in scikit-learn.
 
 10. New features often need to be illustrated with narrative documentation in
-    the user guide, with small code snipets. If relevant, please also add
+    the user guide, with small code snippets. If relevant, please also add
     references in the literature, with PDF links when possible.
 
 11. The user guide should also include expected time and space complexity
2 changes: 1 addition & 1 deletion doc/developers/maintainer.rst
@@ -62,7 +62,7 @@ Making a release
 2. On the branch for releasing, update the version number in
    sklearn/__init__.py, the ``__version__`` variable by removing ``dev*`` only
    when ready to release.
-   On master, increment the verson in the same place (when branching for
+   On master, increment the version in the same place (when branching for
    release).
 
 3. Create the tag and push it::
2 changes: 1 addition & 1 deletion doc/glossary.rst
@@ -1161,7 +1161,7 @@ Methods
 
         TODO: `This gist
         <https://gist.github.com/jnothman/4807b1b0266613c20ba4d1f88d0f8cf5>`_
-        higlights the use of the different formats for multilabel.
+        highlights the use of the different formats for multilabel.
     multioutput classification
         A list of 2d arrays, corresponding to each multiclass decision
         function.
2 changes: 1 addition & 1 deletion doc/modules/clustering.rst
@@ -775,7 +775,7 @@ core sample, and is at least ``eps`` in distance from any core sample, is
 considered an outlier by the algorithm.
 
 While the parameter ``min_samples`` primarily controls how tolerant the
-algorithm is towards noise (on noisy and large data sets it may be desiable
+algorithm is towards noise (on noisy and large data sets it may be desirable
 to increase this parameter), the parameter ``eps`` is *crucial to choose
 appropriately* for the data set and distance function and usually cannot be
 left at the default value. It controls the local neighborhood of the points.
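
A small illustration of the interplay this passage describes; the data and parameter values are made up for demonstration:

import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0, 0.2, size=(50, 2)),   # dense blob near 0
               rng.normal(4, 0.2, size=(50, 2)),   # dense blob near 4
               rng.uniform(-2, 6, size=(5, 2))])   # scattered points

# eps sets the neighborhood radius; min_samples sets how many neighbors
# a point needs inside that radius to count as a core sample.
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)
print(np.unique(labels))  # e.g. [-1  0  1]: two clusters plus noise (-1)
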
2 changes: 1 addition & 1 deletion doc/themes/scikit-learn-modern/javascript.html
@@ -114,7 +114,7 @@
     prevScrollpos = lastScrollTop;
   };
 
-  /*** high preformance scroll event listener***/
+  /*** high performance scroll event listener***/
   var raf = window.requestAnimationFrame ||
       window.webkitRequestAnimationFrame ||
       window.mozRequestAnimationFrame ||
2 changes: 1 addition & 1 deletion examples/manifold/plot_t_sne_perplexity.py
@@ -6,7 +6,7 @@
 An illustration of t-SNE on the two concentric circles and the S-curve
 datasets for different perplexity values.
 
-We observe a tendency towards clearer shapes as the preplexity value increases.
+We observe a tendency towards clearer shapes as the perplexity value increases.
 
 The size, the distance and the shape of clusters may vary upon initialization,
 perplexity values and does not always convey a meaning.
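
The sweep the example performs can be sketched in a few lines; the toy dataset here is illustrative, not the example's exact setup:

from sklearn.datasets import make_circles
from sklearn.manifold import TSNE

X, _ = make_circles(n_samples=100, factor=0.5, noise=0.05, random_state=0)
for perplexity in (5, 30, 50):
    X_embedded = TSNE(n_components=2, perplexity=perplexity,
                      random_state=0).fit_transform(X)
    print(perplexity, X_embedded.shape)  # shape stays (100, 2); layout changes
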
2 changes: 1 addition & 1 deletion examples/model_selection/plot_roc.py
@@ -151,7 +151,7 @@
 # .........................................
 # The :func:`sklearn.metrics.roc_auc_score` function can be used for
 # multi-class classification. The multi-class One-vs-One scheme compares every
-# unique pairwise combination of classes. In this section, we calcuate the AUC
+# unique pairwise combination of classes. In this section, we calculate the AUC
 # using the OvR and OvO schemes. We report a macro average, and a
 # prevalence-weighted average.
 y_prob = classifier.predict_proba(X_test)
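
A self-contained sketch of the OvO/OvR averaging this comment describes; the dataset and classifier are stand-ins for the example's own:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_prob = clf.predict_proba(X_test)

# Macro average over every unique class pair (OvO) ...
print(roc_auc_score(y_test, y_prob, multi_class="ovo", average="macro"))
# ... and a prevalence-weighted average over one-vs-rest problems (OvR).
print(roc_auc_score(y_test, y_prob, multi_class="ovr", average="weighted"))
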
2 changes: 1 addition & 1 deletion examples/plot_changed_only_pprint_parameter.py
@@ -5,7 +5,7 @@
 This example illustrates the use of the print_changed_only global parameter.
 
-Setting print_changed_only to True will alterate the representation of
+Setting print_changed_only to True will alternate the representation of
 estimators to only show the parameters that have been set to non-default
 values. This can be used to have more compact representations.
 """
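
For reference, the global parameter this docstring documents is toggled through sklearn.set_config; a minimal demonstration:

from sklearn import set_config
from sklearn.linear_model import LogisticRegression

set_config(print_changed_only=True)
print(LogisticRegression(C=0.5))  # prints LogisticRegression(C=0.5):
                                  # only the non-default parameter is shown
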
2 changes: 1 addition & 1 deletion examples/plot_roc_curve_visualization_api.py
@@ -44,7 +44,7 @@
 # We train a random forest classifier and create a plot comparing it to the SVC
 # ROC curve. Notice how `svc_disp` uses
 # :func:`~sklearn.metrics.RocCurveDisplay.plot` to plot the SVC ROC curve
-# without recomputing the values of the roc curve itself. Futhermore, we
+# without recomputing the values of the roc curve itself. Furthermore, we
 # pass `alpha=0.8` to the plot functions to adjust the alpha values of the
 # curves.
 rfc = RandomForestClassifier(n_estimators=10, random_state=42)
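
The reuse pattern the comment describes, in brief; `svc_disp`, `rfc`, `X_test`, and `y_test` are assumed to exist as in the example:

import matplotlib.pyplot as plt
from sklearn.metrics import plot_roc_curve

ax = plt.gca()
rfc_disp = plot_roc_curve(rfc, X_test, y_test, ax=ax, alpha=0.8)
svc_disp.plot(ax=ax, alpha=0.8)  # replots the stored fpr/tpr; nothing recomputed
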
2 changes: 1 addition & 1 deletion sklearn/decomposition/_incremental_pca.py
@@ -270,7 +270,7 @@ def partial_fit(self, X, y=None, check_input=True):
             self.mean_ = .0
             self.var_ = .0
 
-        # Update stats - they are 0 if this is the fisrt step
+        # Update stats - they are 0 if this is the first step
         col_mean, col_var, n_total_samples = \
             _incremental_mean_and_var(
                 X, last_mean=self.mean_, last_variance=self.var_,
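
The `_incremental_mean_and_var` call above folds each batch into running statistics. A hedged, one-dimensional sketch of that style of pooled update (after Chan et al.; not the library's actual implementation):

import numpy as np

def update_mean_var(batch, last_mean, last_var, last_n):
    """Merge a new batch into running mean/variance via pooled sums of squares."""
    n_new = batch.shape[0]
    n_total = last_n + n_new
    new_mean = (last_n * last_mean + batch.sum()) / n_total
    m_last = last_var * last_n                   # old sum of squared deviations
    m_new = ((batch - batch.mean()) ** 2).sum()  # batch sum of squared deviations
    delta = batch.mean() - last_mean
    m_total = m_last + m_new + delta ** 2 * last_n * n_new / n_total
    return new_mean, m_total / n_total, n_total

x = np.random.RandomState(0).rand(100)
mean, var, n = 0.0, 0.0, 0   # stats start at 0, as in the comment above
for batch in np.array_split(x, 5):
    mean, var, n = update_mean_var(batch, mean, var, n)
print(np.allclose([mean, var], [x.mean(), x.var()]))  # True
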
2 changes: 1 addition & 1 deletion sklearn/decomposition/_lda.py
@@ -193,7 +193,7 @@ class LatentDirichletAllocation(TransformerMixin, BaseEstimator):
     evaluate_every : int, optional (default=0)
         How often to evaluate perplexity. Only used in `fit` method.
-        set it to 0 or negative number to not evalute perplexity in
+        set it to 0 or negative number to not evaluate perplexity in
         training at all. Evaluating perplexity can help you check convergence
         in training process, but it will also increase total training time.
         Evaluating perplexity in every iteration might increase training time
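
The corrected parameter in action, on a toy corpus chosen purely for illustration:

from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = ["machine learning is fun", "deep learning learns representations",
        "cooking recipes with garlic", "garlic butter recipes"]
X = CountVectorizer().fit_transform(docs)

# Per the docstring: 0 or a negative number skips perplexity evaluation;
# evaluate_every=1 checks it every iteration to monitor convergence.
lda = LatentDirichletAllocation(n_components=2, evaluate_every=1,
                                max_iter=5, random_state=0).fit(X)
print(lda.perplexity(X))
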
2 changes: 1 addition & 1 deletion sklearn/dummy.py
@@ -50,7 +50,7 @@ class DummyClassifier(MultiOutputMixin, ClassifierMixin, BaseEstimator):
         .. versionchanged:: 0.22
            The default value of `strategy` will change to "prior" in version
            0.24. Starting from version 0.22, a warning will be raised if
-           `strategy` is not explicity set.
+           `strategy` is not explicitly set.
 
         .. versionadded:: 0.17
            Dummy Classifier now supports prior fitting strategy using
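
Given the warning described above, code written against this release would set the strategy explicitly; a minimal illustration:

from sklearn.dummy import DummyClassifier

clf = DummyClassifier(strategy="prior")  # explicit, so no changed-default warning
clf.fit([[0], [1], [1]], [0, 1, 1])
print(clf.predict([[0]]))  # [1]: always the class that maximizes the prior
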
2 changes: 1 addition & 1 deletion sklearn/ensemble/_gb.py
@@ -604,7 +604,7 @@ def _make_estimator(self, append=True):
         raise NotImplementedError()
 
     def _raw_predict_init(self, X):
-        """Check input and compute raw predictions of the init estimtor."""
+        """Check input and compute raw predictions of the init estimator."""
         self._check_initialized()
         X = self.estimators_[0, 0]._validate_X_predict(X, check_input=True)
         if X.shape[1] != self.n_features_:
2 changes: 1 addition & 1 deletion sklearn/ensemble/_hist_gradient_boosting/loss.py
@@ -154,7 +154,7 @@ def update_gradients_and_hessians(self, gradients, hessians, y_true,
 
 
 class LeastAbsoluteDeviation(BaseLoss):
-    """Least asbolute deviation, for regression.
+    """Least absolute deviation, for regression.
 
     For a given sample x_i, the loss is defined as::
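
The docstring's per-sample definition reduces to an absolute difference; a hedged NumPy rendering (the class itself works through gradients and hessians, not this helper):

import numpy as np

def least_absolute_deviation(y_true, y_pred):
    """Mean of loss(x_i) = |y_true_i - y_pred_i| over all samples."""
    return np.abs(y_true - y_pred).mean()

print(least_absolute_deviation(np.array([1.0, 2.0]),
                               np.array([1.5, 2.0])))  # 0.25
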
2 changes: 1 addition & 1 deletion sklearn/ensemble/_stacking.py
@@ -63,7 +63,7 @@ def _concatenate_predictions(self, X, predictions):
         and `self.passthrough` is True, the output of `transform` will
         be sparse.
 
-        This helper is in charge of ensuring the preditions are 2D arrays and
+        This helper is in charge of ensuring the predictions are 2D arrays and
         it will drop one of the probability column when using probabilities
         in the binary case. Indeed, the p(y|c=0) = 1 - p(y|c=1)
         """
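
The binary-case column drop this docstring justifies (p(y|c=0) = 1 - p(y|c=1), so one column is redundant) can be shown directly; a toy sketch:

import numpy as np

proba = np.array([[0.9, 0.1],    # predict_proba output for a binary problem
                  [0.3, 0.7]])
stacked_column = proba[:, 1:]    # keep only p(y|c=1); the other is implied
print(stacked_column.ravel())    # [0.1 0.7]
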
2 changes: 1 addition & 1 deletion sklearn/ensemble/tests/test_weight_boosting.py
@@ -526,7 +526,7 @@ def test_adaboostregressor_sample_weight():
     X[-1] *= 10
     y[-1] = 10000
 
-    # random_state=0 ensure that the underlying boostrap will use the outlier
+    # random_state=0 ensure that the underlying bootstrap will use the outlier
     regr_no_outlier = AdaBoostRegressor(
         base_estimator=LinearRegression(), n_estimators=1, random_state=0
     )
2 changes: 1 addition & 1 deletion sklearn/externals/joblib/numpy_pickle.py
@@ -1,3 +1,3 @@
-# Import necessary to preserve backward compatibliity of pickles
+# Import necessary to preserve backward compatibility of pickles
 
 from joblib.numpy_pickle import *
2 changes: 1 addition & 1 deletion sklearn/metrics/_plot/precision_recall_curve.py
@@ -97,7 +97,7 @@ def plot(self, ax=None, name=None, **kwargs):
 def plot_precision_recall_curve(estimator, X, y,
                                 sample_weight=None, response_method="auto",
                                 name=None, ax=None, **kwargs):
-    """Plot Precision Recall Curve for binary classifers.
+    """Plot Precision Recall Curve for binary classifiers.
 
     Extra keyword arguments will be passed to matplotlib's `plot`.
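
Minimal usage of the function whose docstring is corrected above; the dataset and classifier are illustrative:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import plot_precision_recall_curve

X, y = make_classification(random_state=0)
clf = LogisticRegression().fit(X, y)
disp = plot_precision_recall_curve(clf, X, y)  # returns a display object
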
2 changes: 1 addition & 1 deletion sklearn/metrics/_regression.py
@@ -801,7 +801,7 @@ def mean_gamma_deviance(y_true, y_pred, sample_weight=None):
     Gamma deviance is equivalent to the Tweedie deviance with
     the power parameter `p=2`. It is invariant to scaling of
-    the target variable, and mesures relative errors.
+    the target variable, and measures relative errors.
 
     Read more in the :ref:`User Guide <mean_tweedie_deviance>`.
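
A quick check of the scale-invariance property the docstring states; the values are arbitrary:

import numpy as np
from sklearn.metrics import mean_gamma_deviance

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.5, 1.5, 3.5])
d1 = mean_gamma_deviance(y_true, y_pred)
d2 = mean_gamma_deviance(10 * y_true, 10 * y_pred)  # rescale the target
print(np.isclose(d1, d2))  # True: the deviance measures relative error
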
2 changes: 1 addition & 1 deletion sklearn/neighbors/_quad_tree.pyx
@@ -97,7 +97,7 @@ cdef class _QuadTree:
         return self._get_cell_ndarray()['is_leaf'][:self.cell_count]
 
     def build_tree(self, X):
-        """Build a tree from an arary of points X."""
+        """Build a tree from an array of points X."""
         cdef:
             int i
             DTYPE_t[3] pt
2 changes: 1 addition & 1 deletion sklearn/pipeline.py
@@ -307,7 +307,7 @@ def _fit(self, X, y=None, **fit_params):
                 cloned_transformer = clone(transformer)
             else:
                 cloned_transformer = clone(transformer)
-            # Fit or load from cache the current transfomer
+            # Fit or load from cache the current transformer
             X, fitted_transformer = fit_transform_one_cached(
                 cloned_transformer, X, y, None,
                 message_clsname='Pipeline',
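
The caching this comment refers to is driven by the pipeline's `memory` argument; a brief sketch with illustrative steps:

from tempfile import mkdtemp
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

X, y = make_classification(random_state=0)
pipe = Pipeline([('reduce', PCA(n_components=5)),
                 ('clf', LogisticRegression())],
                memory=mkdtemp())  # fitted transformers are cached here
pipe.fit(X, y)                     # a refit can load 'reduce' from the cache
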
2 changes: 1 addition & 1 deletion sklearn/svm/src/liblinear/liblinear_helper.c
@@ -172,7 +172,7 @@ struct problem * csr_set_problem (char *X, int double_precision_X,
 }
 
 
-/* Create a paramater struct with and return it */
+/* Create a parameter struct with and return it */
 struct parameter *set_parameter(int solver_type, double eps, double C,
                                 npy_intp nr_weight, char *weight_label,
                                 char *weight, int max_iter, unsigned seed,
4 changes: 2 additions & 2 deletions sklearn/svm/src/libsvm/svm.cpp
@@ -923,7 +923,7 @@ int Solver::select_working_set(int &out_i, int &out_j)
     // return i,j such that
     // i: maximizes -y_i * grad(f)_i, i in I_up(\alpha)
     // j: minimizes the decrease of obj value
-    //    (if quadratic coefficeint <= 0, replace it with tau)
+    //    (if quadratic coefficient <= 0, replace it with tau)
     //    -y_j*grad(f)_j < -y_i*grad(f)_i, j in I_low(\alpha)
 
     double Gmax = -INF;
@@ -1166,7 +1166,7 @@ int Solver_NU::select_working_set(int &out_i, int &out_j)
     // return i,j such that y_i = y_j and
     // i: maximizes -y_i * grad(f)_i, i in I_up(\alpha)
     // j: minimizes the decrease of obj value
-    //    (if quadratic coefficeint <= 0, replace it with tau)
+    //    (if quadratic coefficient <= 0, replace it with tau)
     //    -y_j*grad(f)_j < -y_i*grad(f)_i, j in I_low(\alpha)
 
     double Gmaxp = -INF;
4 changes: 2 additions & 2 deletions sklearn/svm/src/libsvm/svm.h
@@ -79,7 +79,7 @@ struct svm_model
     int *sv_ind;        /* index of support vectors */
 
     double *rho;        /* constants in decision functions (rho[k*(k-1)/2]) */
-    double *probA;      /* pariwise probability information */
+    double *probA;      /* pairwise probability information */
     double *probB;
 
     /* for classification only */
@@ -104,7 +104,7 @@ struct svm_csr_model
     int *sv_ind;        /* index of support vectors */
 
     double *rho;        /* constants in decision functions (rho[k*(k-1)/2]) */
-    double *probA;      /* pariwise probability information */
+    double *probA;      /* pairwise probability information */
     double *probB;
 
     /* for classification only */
3 changes: 2 additions & 1 deletion sklearn/tree/_classes.py
@@ -570,7 +570,8 @@ def feature_importances_(self):
         Returns
         -------
         feature_importances_ : ndarray of shape (n_features,)
-            Normalized total reduction of critera by feature (Gini importance).
+            Normalized total reduction of criteria by feature
+            (Gini importance).
         """
         check_is_fitted(self)
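
The attribute documented above in action; the dataset choice is incidental:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(random_state=0).fit(X, y)
print(tree.feature_importances_)        # one value per feature
print(tree.feature_importances_.sum())  # normalized: sums to 1.0
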
2 changes: 1 addition & 1 deletion sklearn/utils/_testing.py
@@ -450,7 +450,7 @@ def all_estimators(type_filter=None):
     -------
     estimators : list of tuples
         List of (name, class), where ``name`` is the class name as string
-        and ``class`` is the actuall type of the class.
+        and ``class`` is the actual type of the class.
     """
     def is_abstract(c):
         if not(hasattr(c, '__abstractmethods__')):
2 changes: 1 addition & 1 deletion sklearn/utils/deprecation.py
@@ -114,7 +114,7 @@ def _update_doc(self, olddoc):
 
 
 def _is_deprecated(func):
-    """Helper to check if func is wraped by our deprecated decorator"""
+    """Helper to check if func is wrapped by our deprecated decorator"""
     closures = getattr(func, '__closure__', [])
     if closures is None:
         closures = []
2 changes: 1 addition & 1 deletion sklearn/utils/graph_shortest_path.pyx
@@ -215,7 +215,7 @@ cdef np.ndarray dijkstra(dist_matrix,
                          graph, &heap, nodes)
     else:
         #use the csr -> csc sparse matrix conversion to quickly get
-        # both directions of neigbors
+        # both directions of neighbors
         dist_matrix_T = dist_matrix.T.tocsr()
 
         distances2 = np.asarray(dist_matrix_T.data,
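
The trick mentioned in the corrected comment, shown on a tiny directed graph; a hedged sketch, independent of the Cython code:

import numpy as np
from scipy.sparse import csr_matrix

graph = csr_matrix(np.array([[0, 1, 0],     # edge 0 -> 1 (weight 1)
                             [0, 0, 2],     # edge 1 -> 2 (weight 2)
                             [0, 0, 0]]))
outgoing = graph[1].nonzero()[1]            # CSR rows give outgoing neighbors: [2]
incoming = graph.T.tocsr()[1].nonzero()[1]  # transpose exposes incoming ones: [0]
print(outgoing, incoming)
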
