.. currentmodule:: sklearn

Version 0.20.0
==============

September, 2018

This release packs in a mountain of bug fixes, features and enhancements for the Scikit-learn library, and improvements to the documentation and examples. Thanks to our contributors!

.. warning::

    Version 0.20 is the last version of scikit-learn to support Python 2.7 and Python 3.4.
    Scikit-learn 0.21 will require Python 3.5 or higher.

Highlights
----------

We have tried to improve our support for common data-science use-cases including missing values, categorical variables, heterogeneous data, and features/targets with unusual distributions. Missing values in features, represented by NaNs, are now accepted in column-wise preprocessing such as scalers. Each feature is fitted disregarding NaNs, and data containing NaNs can be transformed. The new :mod:`impute` module provides estimators for learning despite missing data.
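The following is a minimal sketch of this behaviour, assuming scikit-learn 0.20 or later and NumPy. :class:`~preprocessing.StandardScaler` computes each column's mean while ignoring the NaN and leaves the NaN in place when transforming. :class:`~impute.SimpleImputer` then fills the NaN with the column mean. The toy array is illustrative only.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Toy data (illustrative): the second column has one missing value.
X = np.array([[1.0, np.nan],
              [3.0, 4.0],
              [5.0, 8.0]])

# The scaler fits each column while disregarding NaNs...
scaler = StandardScaler().fit(X)
print(scaler.mean_)              # column means: [3. 6.]

# ...and transform keeps the NaN where it was.
X_scaled = scaler.transform(X)
print(np.isnan(X_scaled[0, 1]))  # True

# SimpleImputer, from the new impute module, replaces NaNs with the column mean.
X_filled = SimpleImputer(strategy="mean").fit_transform(X)
print(X_filled[0, 1])            # 6.0
```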

:class:`~compose.ColumnTransformer` handles the case where different features or columns of a pandas.DataFrame need different preprocessing. String or pandas Categorical columns can now be encoded with :class:`~preprocessing.OneHotEncoder` or :class:`~preprocessing.OrdinalEncoder`.
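The sketch below assumes scikit-learn 0.20 or later. It applies different preprocessing to each of three columns in a small NumPy object array, which stands in for a pandas.DataFrame. With a DataFrame, columns could be selected by name instead of by position. The column layout and the explicit size ordering are hypothetical.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder, StandardScaler

# Hypothetical heterogeneous data: colour (string), size (string), price (float).
X = np.array([["red", "S", 1.0],
              ["blue", "M", 3.0],
              ["red", "L", 5.0]], dtype=object)

ct = ColumnTransformer([
    # One binary column per colour; categories are learned in sorted order.
    ("colour", OneHotEncoder(), [0]),
    # Sizes mapped to integers using an explicit category order.
    ("size", OrdinalEncoder(categories=[["S", "M", "L"]]), [1]),
    # The numeric column is standardised.
    ("price", StandardScaler(), [2]),
])

Xt = ct.fit_transform(X)
# The one-hot block is sparse; convert defensively in case the stacked result is too.
Xt = Xt.toarray() if hasattr(Xt, "toarray") else np.asarray(Xt)
print(Xt.shape)  # (3, 4)
```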

:class:`~compose.TransformedTargetRegressor` helps when the regression target needs to be transformed to be modeled. :class:`~preprocessing.PowerTransformer` and :class:`~preprocessing.KBinsDiscretizer` join :class:`~preprocessing.QuantileTransformer` as non-linear transformations.
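In this sketch, assuming scikit-learn 0.20 or later, :class:`~compose.TransformedTargetRegressor` fits a plain :class:`~linear_model.LinearRegression` on ``log(y)`` and maps predictions back with ``exp``. The exponential toy target is synthetic and chosen so that the relationship is exactly linear in log space.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression

# Synthetic target that is exponential in X (illustrative only).
X = np.arange(1, 6, dtype=float).reshape(-1, 1)
y = np.exp(0.5 * X.ravel() + 1.0)

# The wrapped regressor is fit on log(y); predictions are mapped back with exp.
reg = TransformedTargetRegressor(regressor=LinearRegression(),
                                 func=np.log, inverse_func=np.exp)
reg.fit(X, y)

print(reg.regressor_.coef_)  # slope learned in log space, close to 0.5
y_pred = reg.predict(X)      # predictions on the original scale of y
```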

Beyond this, we have added :term:`sample_weight` support to several estimators (including :class:`~cluster.KMeans`, :class:`~linear_model.BayesianRidge` and :class:`~neighbors.KernelDensity`) and improved stopping criteria in others (including :class:`~neural_network.MLPRegressor`, :class:`~ensemble.GradientBoostingRegressor` and :class:`~linear_model.SGDRegressor`).
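A small illustration of :term:`sample_weight` with :class:`~cluster.KMeans` follows, assuming scikit-learn 0.20 or later. With a single cluster, the fitted centre is simply the weighted mean of the points. The two-point data set and its weights are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two 1-D points; the second counts three times as much (illustrative weights).
X = np.array([[0.0], [1.0]])
weights = np.array([1.0, 3.0])

km = KMeans(n_clusters=1, n_init=1, random_state=0)
km.fit(X, sample_weight=weights)

# With one cluster the centre is the weighted mean: (0*1 + 1*3) / (1 + 3)
print(km.cluster_centers_)  # [[0.75]]
```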

This release is also the first to be accompanied by a :ref:`glossary` developed by `Joel Nothman`_. The glossary is a reference resource to help users and contributors become familiar with the terminology and conventions used in Scikit-learn.

Sorry if your contribution didn't make it into the highlights. There's a lot here...

Changed models
--------------

The following estimators and functions, when fit with the same data and parameters, may produce different models from the previous version. This often occurs due to changes in the modelling logic (bug fixes or enhancements), or in random sampling procedures.

Details are listed in the changelog below.

(While we are trying to better inform users by providing this information, we cannot assure that this list is complete.)

Known Major Bugs
----------------

* :issue:`11924`: :class:`linear_model.LogisticRegressionCV` with ``solver='lbfgs'`` and ``multi_class='multinomial'`` may be non-deterministic or otherwise broken on macOS. This appears to be the case on Travis CI servers but has not been confirmed on personal MacBooks. This issue was also present in previous releases.

Changelog
---------

Support for Python 3.3 has been officially dropped.

Multiple modules
................

Miscellaneous
.............

Changes to estimator checks
---------------------------

These changes mostly affect library developers.