Support L2 regularization & cross validation for Classifier #135
This PR adds an `l2_penalty` parameter to `Classifier.fit` with a default value of 0.1. This is a change from the previous behavior, where there was no penalty by default. I initially tried to exactly imitate the behavior of scikit-learn's `C` inverse regularization parameter, but I couldn't quite figure out how they compute the final loss. Based on their code, it seems like they sum the BCE loss over the samples rather than taking the average, which changes the scale of everything and makes it dependent on the number of samples. But that didn't give exactly the same results either, so I gave up on imitating it exactly; the tests only check that when `l2_penalty` is set to 0.0, the results are approximately the same.
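To make the scaling issue concrete, here's a minimal sketch of the loss as I understand it, next to my reading of scikit-learn's objective. The names (`w`, `logits`, `labels`) are illustrative, not the PR's actual code:

```python
import torch
import torch.nn.functional as F

def penalized_loss(logits, labels, w, l2_penalty=0.1):
    """Sketch: mean BCE plus an L2 penalty on the weights (not the PR's exact code)."""
    bce = F.binary_cross_entropy_with_logits(logits, labels)  # averaged over samples
    return bce + l2_penalty * w.square().sum()

# scikit-learn, as far as I can tell, effectively sums the BCE over samples and
# penalizes with 1/(2*C), so its objective scales with the sample count n:
#   sum_i BCE_i + (1 / (2 * C)) * ||w||^2
# Dividing through by n, that would roughly correspond to l2_penalty ≈ 1 / (2 * C * n),
# though I couldn't get the results to match exactly.
```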
This PR also adds a relatively well-optimized `fit_cv` method that uses warm-starting to get at least a 2x speedup over a naive approach where you start from a zero initialization every time. I initially wanted to parallelize this code over the folds, but that seemed like it would be a real pain that would complicate the code substantially, and I'm not sure there would be a significant speed boost at the end of the day (at least not without rewriting PyTorch's LBFGS optimizer, which I don't want to do right now).
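For reference, here's a rough sketch of the warm-starting idea; the actual `fit_cv` signature and fold logic almost certainly differ:

```python
import torch
import torch.nn.functional as F

def fit_cv_sketch(x, y, penalties=(1.0, 0.1, 0.01), n_folds=5):
    """Hypothetical warm-started CV. x: (n, d) floats, y: (n,) float 0/1 labels."""
    n, d = x.shape
    fold_ids = torch.arange(n) % n_folds  # simple unshuffled fold assignment
    scores = {p: 0.0 for p in penalties}

    for k in range(n_folds):
        train, val = fold_ids != k, fold_ids == k
        w = torch.zeros(d, requires_grad=True)  # zero init once per fold...
        for p in penalties:  # ...then warm-start across the penalty grid, strong to weak
            opt = torch.optim.LBFGS([w], max_iter=100)

            def closure():
                opt.zero_grad()
                loss = F.binary_cross_entropy_with_logits(
                    x[train] @ w, y[train]
                ) + p * w.square().sum()
                loss.backward()
                return loss

            opt.step(closure)
            with torch.no_grad():
                val_loss = F.binary_cross_entropy_with_logits(x[val] @ w, y[val])
            scores[p] += val_loss.item() / n_folds

    return min(scores, key=scores.get)  # penalty with the lowest mean val loss
```

The point is just that each fold solves the penalty grid in sequence, so each LBFGS run starts from the previous penalty's solution instead of from zero.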
There is a question of whether we want to use `fit_cv` by default in `train.py`. I think we probably should, but it does get us back into the territory where `Classifier` takes up more compute than the actual VINC algorithm. At the moment the code does use `fit_cv` and doesn't actually give you an option to turn this off (which should probably be changed).