
Expand docs to include cut and paste examples for different Regressors #150

Open
alex-s-gardner opened this issue Aug 31, 2023 · 3 comments

Comments

@alex-s-gardner

I think we could reduce the learning curve by including some cut-and-paste examples for the various regressors. It would also be good to include some discussion of when one regressor might be more appropriate than another.

@tlienart
Collaborator

tlienart commented Sep 1, 2023

I don't disagree with this, but would add that MLJLM was initially meant to be used mainly through MLJ by average users, and there are examples of the various common regressors in the MLJ tutorials.
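
For reference, the MLJ route looks roughly like this (untested sketch with throwaway random data):

using MLJ
# load the MLJ wrapper for the Huber regressor from MLJLinearModels
HuberRegressor = @load HuberRegressor pkg=MLJLinearModels
X = MLJ.table(randn(100, 3))   # MLJ models expect tabular features
y = randn(100)
model = HuberRegressor()
mach = machine(model, X, y)
fit!(mach)
ŷ = predict(mach, X)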

For the latter part (when model x is more appropriate), it's pretty tricky and debatable beyond fairly generic advice. I don't think you'll find an opinionated view on whether some model with regularisation is better than another without it, or vice versa. Typically people should get a sense of what problem they're facing (e.g. big outliers), then put several of the models they think might address it through hyperparameter tuning and pick the one they believe generalises better based on some metric.
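
That workflow looks something like this (rough, untested sketch that tunes the regularisation strength of a Huber model with MLJ):

using MLJ
HuberRegressor = @load HuberRegressor pkg=MLJLinearModels
X = MLJ.table(randn(200, 3))   # throwaway random data, just for illustration
y = randn(200)
model = HuberRegressor()
# search the penalty strength on a log scale and keep the best cross-validated model
r = range(model, :lambda, lower=1e-3, upper=10.0, scale=:log)
tuned = TunedModel(model=model, tuning=Grid(resolution=20),
                   resampling=CV(nfolds=5), range=r, measure=rms)
mach = machine(tuned, X, y)
fit!(mach)
report(mach).best_model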

The current philosophy has been to follow sklearn, with minimal documentation explaining the loss function and letting users figure out whether that matches what they need.

But as always, if someone would like to edit the docs to make them better for users who want a bit more detail, PRs are always welcome.

@alex-s-gardner
Author

alex-s-gardner commented Sep 1, 2023

What about something like this?

using MLJ
using MLJLinearModels
using Plots
using Random

# create data 
t = 1:0.01:10;
n = length(t);
gaussian_noise = randn(n) * 3;
outliers = rand((zeros(round(Int64, n/20))..., 6, -8, 100, -200, 178, -236, 77, -129, -50, -100, -45, -33, -114, -1929, -2000), n);

# measurements y (signal + Gaussian noise + outliers)
y = 10 .+ 10 * sin.(t) .+ 5 * t .+ gaussian_noise .+ outliers;

# design matrix (already includes an intercept column)
X = hcat(ones(length(t)), sin.(t), t);

# the intercept is in X, so don't fit one; don't scale penalties with the number of samples
scale_penalty = false
fit_intercept = false

begin
    scatter(t, y; 
        markerstrokecolor=:match, 
        markerstrokewidth=0, 
        label = "observations", 
        ylim = (-70, 70),
        legend = :outerbottom,
        color = :grey,
        size = (700, 900)
    )

    # Base LSQ model fit
    println("Base Julia Linear Least Squares")
    @time θ = X \ y;
    plot!(t, X * θ, label="Base Julia Linear Least Squares", linewidth=2)

    # MLJLinearModels regressors, fit with the direct (non-MLJ) API
    regressor = LinearRegression(fit_intercept=fit_intercept);
    println(typeof(regressor))
    @time θ = fit(regressor, X, y)
    plot!(t, X * θ, label=typeof(regressor), linewidth = 2)

    regressor = HuberRegression(scale_penalty_with_samples=scale_penalty, fit_intercept=fit_intercept);
    println(typeof(regressor))
    @time θ = fit(regressor, X, y)
    plot!(t, X * θ, label=typeof(regressor), linewidth = 2)

    regressor = RidgeRegression(scale_penalty_with_samples=scale_penalty, fit_intercept=fit_intercept);
    println(typeof(regressor))
    @time θ = fit(regressor, X, y)
    plot!(t, X * θ, label=typeof(regressor), linewidth = 2)

    regressor = LassoRegression(scale_penalty_with_samples=scale_penalty, fit_intercept=fit_intercept);
    println(typeof(regressor))
    @time θ = fit(regressor, X, y)
    plot!(t, X * θ, label=typeof(regressor), linewidth = 2)

    regressor = ElasticNetRegression(scale_penalty_with_samples=scale_penalty, fit_intercept=fit_intercept);
    println(typeof(regressor))
    @time θ = fit(regressor, X, y)
    plot!(t, X * θ, label=typeof(regressor), linewidth = 2)

    regressor = QuantileRegression(scale_penalty_with_samples=scale_penalty, fit_intercept=fit_intercept);
    println(typeof(regressor))
    @time θ = fit(regressor, X, y)
    plot!(t, X * θ, label=typeof(regressor), linewidth = 2)

    regressor = LADRegression(scale_penalty_with_samples=scale_penalty, fit_intercept=fit_intercept);
    println(typeof(regressor))
    @time θ = fit(regressor, X, y)
    plot!(t, X * θ, label=typeof(regressor), linewidth = 2)

    regressor = GeneralizedLinearRegression(scale_penalty_with_samples=scale_penalty, fit_intercept=fit_intercept);
    println(typeof(regressor))
    @time θ = fit(regressor, X, y)
    plot!(t, X * θ, label=typeof(regressor), linewidth = 2)

    regressor = RobustRegression(scale_penalty_with_samples=scale_penalty, fit_intercept=fit_intercept);
    println(typeof(regressor))
    @time θ = fit(regressor, X, y)
    plot!(t, X * θ, label=typeof(regressor), linewidth = 2)
end
Output:

Base Julia Linear Least Squares
  0.000168 seconds (36 allocations: 97.016 KiB)
GeneralizedLinearRegression{L2Loss, NoPenalty}
  0.000119 seconds (41 allocations: 118.719 KiB)
GeneralizedLinearRegression{RobustLoss{HuberRho{0.5}}, ScaledPenalty{L2Penalty}}
  0.001772 seconds (525 allocations: 699.094 KiB)
GeneralizedLinearRegression{L2Loss, ScaledPenalty{L2Penalty}}
  0.000100 seconds (8 allocations: 21.984 KiB)
GeneralizedLinearRegression{L2Loss, ScaledPenalty{L1Penalty}}
  0.003497 seconds (2.40 k allocations: 2.931 MiB)
GeneralizedLinearRegression{L2Loss, CompositePenalty}
  0.008676 seconds (4.13 k allocations: 4.338 MiB)
GeneralizedLinearRegression{RobustLoss{QuantileRho{0.5}}, ScaledPenalty{L2Penalty}}
  0.000732 seconds (323 allocations: 240.594 KiB)
GeneralizedLinearRegression{RobustLoss{QuantileRho{0.5}}, ScaledPenalty{L2Penalty}}
  0.000718 seconds (323 allocations: 240.594 KiB)
GeneralizedLinearRegression{L2Loss, NoPenalty}
  0.000143 seconds (41 allocations: 118.719 KiB)
GeneralizedLinearRegression{RobustLoss{HuberRho{0.1}}, ScaledPenalty{L2Penalty}}
  0.001428 seconds (493 allocations: 660.344 KiB)

[Screenshot (2023-09-01): plot of the noisy observations overlaid with the fitted line from each regressor]

@tlienart
Collaborator

tlienart commented Sep 4, 2023

I think that's very nice :) (some curves don't appear?). If you wanted to add a page of that sort to the docs, that would be great.

Small notes to add would be: (1) 2D data is quite different from nD data, so the intuition you build in 2D might not tell you what works best in nD; when in doubt, it's better to try. (2) Hyperparameter tuning is essential for most of these models (in fact, a nice small addition would be a visual representation of what happens to a curve, say the L1 regression, as the strength of the regulariser is increased).
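
For (2), something along these lines might do it (untested sketch reusing t, X, and y from the example above; it sweeps the Lasso penalty strength and overlays the fits):

using MLJLinearModels, Plots
scatter(t, y; color=:grey, markerstrokewidth=0, label="observations")
for λ in (0.1, 1.0, 10.0, 100.0)
    # larger lambda -> heavier L1 penalty -> coefficients shrink towards zero
    θ = fit(LassoRegression(lambda=λ, fit_intercept=false), X, y)
    plot!(t, X * θ, label="lasso, λ = $λ", linewidth=2)
end
current()   # show the accumulated plot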

But generally speaking, if this helped you then no doubt it'll help others, and it should be in the docs :)
