Fix PowerTransformer leaves constant feature unchanged #26566

Merged

@jeremiedbb jeremiedbb commented Jun 12, 2023

When a feature is constant, PowerTransformer with method="yeo-johnson" sets a meaningless lambda that arbitrarily scales the feature. I think the feature should be left unchanged in that case, and lambda should be set to 1, because the Yeo-Johnson transformation with lambda=1 corresponds to the identity transformation.
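
To illustrate the identity claim, here is a minimal sketch of the textbook Yeo-Johnson transform for a single value. The function name `yeo_johnson` and the sample values are illustrative only; this is not scikit-learn's implementation.

```python
import math


def yeo_johnson(x: float, lmbda: float) -> float:
    """Textbook Yeo-Johnson transform of one value (illustrative helper)."""
    if x >= 0:
        if lmbda == 0:
            return math.log1p(x)
        return ((x + 1.0) ** lmbda - 1.0) / lmbda
    if lmbda == 2:
        return -math.log1p(-x)
    return -(((-x + 1.0) ** (2.0 - lmbda)) - 1.0) / (2.0 - lmbda)


# With lmbda = 1 both branches reduce to x, so every sample maps to itself.
samples = [-3.5, -1.0, 0.0, 0.25, 7.0]
transformed = [yeo_johnson(v, 1.0) for v in samples]
```

For x >= 0 the expression becomes (x + 1) - 1, and for x < 0 it becomes -((1 - x) - 1), so both sides return x.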

This also fixes a couple of failing tests in the scipy-dev job: scipy now raises an error when the optimisation of lambda fails (scipy/scipy#17704), which is the case for constant features.

(Note: this is irrelevant for method="box-cox" because it has never supported constant features, and scipy already raised an informative error message.)

Partial fix for #26154.
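
The behaviour proposed above can be sketched as a guard placed in front of the optimizer. This is only a sketch under stated assumptions: `fit_lambda`, `optimize_lambda` and `failing_optimizer` are hypothetical names, and the exact equality test for "constant" is a simplification (a real implementation would probably need a tolerance-aware check).

```python
from typing import Callable, Sequence


def fit_lambda(
    feature: Sequence[float],
    optimize_lambda: Callable[[Sequence[float]], float],
) -> float:
    """Return lambda for one feature, skipping the optimizer if it is constant."""
    # Drop NaNs first (NaN is the only value not equal to itself).
    values = [v for v in feature if v == v]
    if not values or max(values) == min(values):
        # Constant (or empty) feature: lambda = 1 is the Yeo-Johnson identity,
        # so the feature is left unchanged and the optimizer is never called.
        return 1.0
    return optimize_lambda(values)


def failing_optimizer(values: Sequence[float]) -> float:
    # Stand-in for an optimizer that cannot find a valid bracket.
    raise RuntimeError("no valid bracket")


lam = fit_lambda([0.0, 0.0, 0.0], failing_optimizer)
```

Because the constant case returns early, the failing optimizer is never reached and `lam` is 1.0.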

ogrisel commented Jun 13, 2023

For the sake of searchability, here is the error from the scipy-dev log for test_power_transformer_yeojohnson_any_input:

scipy.optimize._optimize.BracketError: The algorithm terminated without finding a valid bracket. Consider trying different initial points.
_______________ test_power_transformer_yeojohnson_any_input[X3] ________________
[gw0] linux -- Python 3.11.3 /usr/share/miniconda/envs/testvenv/bin/python

X = array([[0., 0., 0., ..., 0., 0., 0.],
            fb = fc
            fc = fw
    
        # three conditions for a valid bracket
        cond1 = (fb < fc and fb <= fa) or (fb < fa and fb <= fc)
        cond2 = (xa < xb < xc or xc < xb < xa)
        cond3 = np.isfinite(xa) and np.isfinite(xb) and np.isfinite(xc)
        msg = ("The algorithm terminated without finding a valid bracket. "
               "Consider trying different initial points.")
        if not (cond1 and cond2 and cond3):
            e = BracketError(msg)
            e.data = (xa, xb, xc, fa, fb, fc, funcalls)
>           raise e
E           scipy.optimize._optimize.BracketError: The algorithm terminated without finding a valid bracket. Consider trying different initial points.

_gold      = 1.618034
_verysmall_num = 1e-21
args       = ()
cond1      = False
cond2      = True
cond3      = True
e          = BracketError('The algorithm terminated without finding a valid bracket. Consider trying different initial points.')
fa         = inf
fb         = inf
fc         = inf
func       = <function PowerTransformer._yeo_johnson_optimize.<locals>._neg_log_likelihood at 0x7f8ec46f3560>
funcalls   = 3
grow_limit = 110.0
iter       = 0
maxiter    = 1000
msg        = 'The algorithm terminated without finding a valid bracket. Consider trying different initial points.'
xa         = -2
xb         = 2
xc         = 8.472135999999999

/usr/share/miniconda/envs/testvenv/lib/python3.11/site-packages/scipy/optimize/_optimize.py:3047: BracketError
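
The locals in the traceback above suggest why the bracket is rejected: all three objective values are inf, so neither strict inequality in cond1 can hold. A minimal sketch that re-evaluates the three conditions with the logged values (using `math.isfinite` in place of `np.isfinite` to stay dependency-free):

```python
import math

# Values copied from the traceback locals.
xa, xb, xc = -2, 2, 8.472135999999999
fa = fb = fc = math.inf

# Same three conditions as in the scipy snippet shown in the log.
cond1 = (fb < fc and fb <= fa) or (fb < fa and fb <= fc)
cond2 = xa < xb < xc or xc < xb < xa
cond3 = math.isfinite(xa) and math.isfinite(xb) and math.isfinite(xc)

valid_bracket = cond1 and cond2 and cond3
```

Only cond1 fails, which matches the logged `cond1 = False`, `cond2 = True`, `cond3 = True`.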

@ogrisel ogrisel left a comment

LGTM!

Review thread on doc/whats_new/v1.3.rst (outdated, resolved)

lesteve commented Jun 14, 2023

Merging, thanks a lot!

@lesteve lesteve merged commit e5df5fe into scikit-learn:main Jun 14, 2023
21 checks passed
REDVM pushed a commit to REDVM/scikit-learn that referenced this pull request Nov 16, 2023