New model selector interface #55

leahmcguire · 2018-08-14T23:11:31Z

Related issues
Change the model selector interface to use the new, more flexible model selector class.

Describe the proposed solution
Change the model selector interface to use the new, more flexible model selector class.

Describe alternatives you've considered
Keeping both interfaces, but that would be confusing.


tovbinm · 2018-08-14T23:21:50Z

@leahmcguire this PR is quite large. Is there anything we can do here? Perhaps split it, or provide recommendations on how to review it.

leahmcguire · 2018-08-14T23:49:44Z

Not really - this is just all the tests and files that touched the ModelSelector interface

leahmcguire · 2018-08-14T23:51:10Z

I can walk you through the actual changes; they are surprisingly small.

codecov · 2018-08-15T16:34:04Z

Codecov Report

Merging #55 into master will decrease coverage by 0.4%.
The diff coverage is 92.03%.

@@            Coverage Diff            @@
##           master     #55      +/-   ##
=========================================
- Coverage    86.3%   85.9%   -0.41%     
=========================================
  Files         298     294       -4     
  Lines        9305    9521     +216     
  Branches      303     535     +232     
=========================================
+ Hits         8031    8179     +148     
- Misses       1274    1342      +68

Impacted Files	Coverage Δ
...s/sparkwrappers/specific/SparkModelConverter.scala	`93.33% <ø> (ø)`	⬆️
...op/stages/impl/selector/ModelSelectorSummary.scala	`92.13% <ø> (ø)`	⬆️
...es/sparkwrappers/specific/OpPredictorWrapper.scala	`100% <ø> (ø)`	⬆️
...m/salesforce/op/stages/OpPipelineStageReader.scala	`65.62% <ø> (ø)`	⬆️
...p/evaluators/OpBinaryClassificationEvaluator.scala	`81.57% <ø> (ø)`	⬆️
...la/com/salesforce/op/stages/OpPipelineStages.scala	`73% <ø> (ø)`	⬆️
...com/salesforce/op/utils/stages/FitStagesUtil.scala	`94.73% <ø> (ø)`	⬆️
...lesforce/op/evaluators/OpRegressionEvaluator.scala	`91.66% <ø> (ø)`	⬆️
...alesforce/op/stages/impl/tuning/DataBalancer.scala	`96.19% <ø> (ø)`	⬆️
.../salesforce/op/stages/impl/tuning/DataCutter.scala	`95.65% <ø> (ø)`	⬆️
... and 46 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 77a6adc...f4c5f43.

leahmcguire · 2018-08-15T18:35:04Z

...n/scala/com/salesforce/op/stages/impl/classification/BinaryClassificationModelSelector.scala

+ modelsAndParameters: Seq[(EstimatorType, Array[ParamMap])]
+ ): ModelSelector[ModelType, EstimatorType] = {
+ val modelStrings = modelTypesToUse.map(_.entryName)
+ val modelsToUse = if (modelsAndParameters == defaultModelsAndParams) {


remove check

leahmcguire · 2018-08-15T18:38:03Z

...n/scala/com/salesforce/op/stages/impl/classification/BinaryClassificationModelSelector.scala

- val cv = new OpCrossValidation[ProbClassifierModel, ProbClassifier](
+ parallelism: Int = ValidatorParamDefaults.Parallelism,
+ modelTypesToUse: Seq[_ <: BinaryClassificationModelsToTry] = modelNames,
+ modelsAndParameters: Seq[(EstimatorType, Array[ParamMap])] = defaultModelsAndParams


update docs

leahmcguire · 2018-08-15T18:43:09Z

...n/scala/com/salesforce/op/stages/impl/classification/BinaryClassificationModelSelector.scala

- ): BinaryClassificationModelSelector = {
- val ts = new OpTrainValidationSplit[ProbClassifierModel, ProbClassifier](
+ parallelism: Int = ValidatorParamDefaults.Parallelism,
+ modelTypesToUse: Seq[_ <: BinaryClassificationModelsToTry] = modelNames,


split methods

tovbinm · 2018-08-15T18:24:00Z

cli/src/main/scala/com/salesforce/op/cli/gen/templates/BinaryClassificationTemplate.scala

 .setInput(label, checkedFeatures)
 .getOutput()

 val evaluator =
 Evaluators.BinaryClassification()
- .setLabelCol(label).setPredictionCol(pred).setRawPredictionCol(raw)
+ .setLabelCol(label).setFullPredictionCol(pred)


let's rename setFullPredictionCol to setPredictionCol

tovbinm · 2018-08-15T18:33:06Z

...n/scala/com/salesforce/op/stages/impl/classification/BinaryClassificationModelSelector.scala



 /**
 * A factory for Binary Classification Model Selector
 */
 case object BinaryClassificationModelSelector {

+ private[op] val modelNames: Seq[_ <: BinaryClassificationModelsToTry] = Seq(MTT.OpLogisticRegression,


I think you can simply do Seq[BinaryClassificationModelsToTry] here and everywhere for other selectors
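A minimal sketch of why the simpler signature should suffice, assuming a stand-in enum (the names `ModelsToTry`, `LogReg`, `RandomForest`, and `CovarianceDemo` are hypothetical, not the PR's classes): Scala's `Seq` is covariant (`Seq[+A]`), so a sequence of any subtype already conforms to `Seq[ModelsToTry]`, and the existential `Seq[_ <: ModelsToTry]` gives callers nothing extra.

```scala
// Hypothetical stand-in for a model-type enum.
sealed trait ModelsToTry { def entryName: String = toString }
case object LogReg extends ModelsToTry
case object RandomForest extends ModelsToTry

object CovarianceDemo {
  // Existential form, as written in the diff.
  val withWildcard: Seq[_ <: ModelsToTry] = Seq(LogReg, RandomForest)

  // Because Seq is covariant (Seq[+A]), the existential form widens to the plain one.
  val widened: Seq[ModelsToTry] = withWildcard

  // The plain form suggested in review.
  val plain: Seq[ModelsToTry] = Seq(LogReg, RandomForest)

  // A method taking Seq[ModelsToTry] still accepts a Seq of a narrower subtype.
  def names(models: Seq[ModelsToTry]): Seq[String] = models.map(_.entryName)
  val onlyLogReg: Seq[LogReg.type] = Seq(LogReg)
}
```

Passing `CovarianceDemo.onlyLogReg` to `names` compiles without any wildcard in the parameter type.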

tovbinm · 2018-08-15T18:47:37Z

core/src/main/scala/com/salesforce/op/stages/impl/selector/ModelSelector.scala

+ case m => setDefault(sparkMlStage, Option(m))
+ }
+
+ lazy val recoveredStage: ModelType = getSparkMlStage() match {


private transient lazy val recoveredStage
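For readers unfamiliar with the requested pattern: a `@transient private lazy val` is skipped by Java serialization and recomputed on first access after the object is deserialized. This is an illustrative sketch only; `StageWithRecoveredModel`, `modelSpec`, and `SerDe` are hypothetical names, and a string stands in for the recovered model.

```scala
import java.io._

// Hypothetical stage: keeps a serializable description of its model and rebuilds
// the (assumed expensive or non-serializable) model object lazily.
class StageWithRecoveredModel(val modelSpec: String) extends Serializable {
  // Counts how many times recovery ran on this instance; also skipped by serialization.
  @transient private var recoveries: Int = 0

  // @transient: not written during serialization.
  // lazy: computed on first access, including the first access after deserialization.
  // private: an implementation detail, not part of the stage's public API.
  @transient private lazy val recoveredStage: String = {
    recoveries += 1
    s"model($modelSpec)"
  }

  def model: String = recoveredStage
  def recoveryCount: Int = recoveries
}

object SerDe {
  // Java-serializes an object to bytes and reads it back.
  def roundTrip[T <: Serializable](obj: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(obj) finally out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray)) {
      // Resolve classes with this code's class loader (helps in script/REPL environments).
      override protected def resolveClass(desc: ObjectStreamClass): Class[_] =
        Class.forName(desc.getName, false, getClass.getClassLoader)
    }
    try in.readObject().asInstanceOf[T] finally in.close()
  }
}
```

After a round trip the copy starts with no cached value and rebuilds it exactly once, on first use.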


michaelweilsalesforce · 2018-08-16T23:04:23Z

@leahmcguire This may not be related to this PR, but I'm trying to understand the type safety of the models to try.
Here is an example: what happens if a (witty) user decides to do something like this:
BinaryClassificationModelSelector.withCrossValidation(modelsAndParameters = Seq(new OpLinearRegression() -> Array.empty)... ?
In other words, it is a classification model selector trying a regression model. Will it break? Will it act as a kind of regression model selector?

Should we consider in the future enforcing a type 'Classification' for OpLogisticRegression, OpRandomForest,... and a type Regression for OpLinearRegression, ... ?

leahmcguire · 2018-08-17T00:40:46Z

It is not part of this PR, but you are correct, @michaelweilsalesforce: there is no compile-time type check for this right now. It will fail at runtime because the evaluator will not find a raw prediction / probability. To support users defining their own estimators, I had to relax the type checks: any estimator that takes a label and a feature vector and returns a prediction will try to run. The default models and the models that can be turned on by name are all of the correct type, so a user can only mess it up if they try :-P Good eye :-)
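One possible way to get the compile-time check asked about here is to tag estimators with a phantom problem-kind type parameter. This is a hypothetical sketch, not how the PR works: `ProblemKind`, `TaggedEstimator`, the estimator stand-ins, `ClassificationSelector`, and the use of `Map[String, Any]` in place of Spark's `ParamMap` are all assumptions.

```scala
// Hypothetical phantom types describing which kind of problem an estimator solves.
sealed trait ProblemKind
sealed trait Classification extends ProblemKind
sealed trait Regression extends ProblemKind

// Stand-in for an estimator; K is a phantom type parameter used only for type checking.
trait TaggedEstimator[K <: ProblemKind] { def name: String }

class LogisticRegressionEst extends TaggedEstimator[Classification] { val name = "OpLogisticRegression" }
class LinearRegressionEst extends TaggedEstimator[Regression] { val name = "OpLinearRegression" }

// Map[String, Any] stands in for Spark's ParamMap.
object ClassificationSelector {
  // Accepts only classification-tagged estimators, so a regression estimator
  // is a compile error here rather than a runtime evaluator failure.
  def withModels(models: Seq[(TaggedEstimator[Classification], Array[Map[String, Any]])]): Seq[String] =
    models.map { case (estimator, _) => estimator.name }
}
```

With this shape, `ClassificationSelector.withModels(Seq((new LinearRegressionEst, Array.empty[Map[String, Any]])))` would not compile, because `TaggedEstimator` is invariant in `K`. The trade-off, as the reply notes, is flexibility: every user-defined estimator would have to carry the tag.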

kinfaikan · 2018-08-17T00:48:01Z

core/src/main/scala/com/salesforce/op/evaluators/OpEvaluatorBase.scala

- final val predictionCol: Param[String] = new Param[String](this, "predictionCol", "prediction column name")
- setDefault(predictionCol, "prediction")
+trait OpHasPredictionValueCol[T <: FeatureType] extends Params {
+ final val predictionValueCol: Param[String] = new Param[String](this, "predictionCol", "prediction column name")


"predictionCol" -> "predictionValueCol"?

yes and fullPredictionCol -> predictionCol

kinfaikan · 2018-08-17T00:48:52Z

core/src/main/scala/com/salesforce/op/evaluators/OpEvaluatorBase.scala

-trait OpHasFullPredictionCol extends Params {
- final val fullPredictionCol: Param[String] = new Param[String](this, "fullPredictionCol", "prediction column name")
+trait OpHasPredictionCol extends Params {
+ final val predictionCol: Param[String] = new Param[String](this, "fullPredictionCol", "prediction column name")


Update name and doc of the param?
Add setDefault?

there is no default that can be set for this

kinfaikan · 2018-08-17T00:48:58Z

core/src/main/scala/com/salesforce/op/evaluators/OpEvaluatorBase.scala

- !(isSet(predictionCol) && data.schema.fieldNames.contains(getPredictionCol))) {
- val fullPredictionColName = getFullPredictionCol
+ if (isSet(predictionCol) &&
+ !(isSet(predictionValueCol) && data.schema.fieldNames.contains(getPredictionValueCol))) {


data.columns.contains(getPredictionValueCol)

kinfaikan · 2018-08-17T00:49:04Z

core/src/main/scala/com/salesforce/op/evaluators/OpEvaluatorBase.scala

- !(isSet(predictionCol) && data.schema.fieldNames.contains(getPredictionCol))) {
- val fullPredictionColName = getFullPredictionCol
+ if (isSet(predictionCol) &&
+ !(isSet(predictionValueCol) && data.schema.fieldNames.contains(getPredictionValueCol))) {


data.columns.contains(getPredictionValueCol)
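A small, Spark-free sketch of the column check being discussed and of splitting a full prediction into its parts. Everything here is hypothetical: `PredictionColumns`, the plain `Seq[String]` standing in for `data.columns`, and the key names `prediction`, `rawPrediction_<i>`, and `probability_<i>` (assumed to mirror how the full `Prediction` map is laid out).

```scala
// Hypothetical sketch: key names are assumptions for illustration.
object PredictionColumns {
  val PredictionKey = "prediction"
  val RawPrefix = "rawPrediction_"
  val ProbPrefix = "probability_"

  // Mirrors the condition in the diff: extract from the full prediction column only
  // when it is set and a usable prediction-value column is not already present.
  def needsExtraction(
    predictionColSet: Boolean,
    predictionValueColSet: Boolean,
    predictionValueCol: String,
    columns: Seq[String]
  ): Boolean = predictionColSet && !(predictionValueColSet && columns.contains(predictionValueCol))

  // Splits a full prediction map into (prediction, raw scores, probabilities),
  // ordering the indexed entries by their numeric suffix.
  def split(full: Map[String, Double]): (Double, Seq[Double], Seq[Double]) = {
    def indexed(prefix: String): Seq[Double] =
      full.toSeq
        .collect { case (k, v) if k.startsWith(prefix) => k.stripPrefix(prefix).toInt -> v }
        .sortBy(_._1)
        .map(_._2)
    (full(PredictionKey), indexed(RawPrefix), indexed(ProbPrefix))
  }
}
```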

kinfaikan · 2018-08-17T00:49:13Z

...n/scala/com/salesforce/op/stages/impl/classification/BinaryClassificationModelSelector.scala

+ * @param modelTypesToUse list of model types to run grid search on must from supported types in
+ * BinaryClassificationModelsToTry (OpLogisticRegression, OpRandomForestClassifier,
+ * OpGBTClassifier, OpLinearSVC, OpDecisionTreeClassifier, OpNaiveBayes)
+ * @param modelsAndParameters pass in an explicit list pairs of estimators and the accompanying hyper parameters to


hyper parameters -> hyperparameters

kinfaikan · 2018-08-17T00:49:19Z

...n/scala/com/salesforce/op/stages/impl/classification/BinaryClassificationModelSelector.scala

+ val modelStrings = modelTypesToUse.map(_.entryName)
+ val modelsToUse =
+ if (modelsAndParameters == defaultModelsAndParams || modelTypesToUse != modelNames) modelsAndParameters
+ .filter{ case (e, p) => modelStrings.contains(e.getClass.getSimpleName) }


To use a proper subset of the default models, one has to specify modelsAndParameters explicitly?

you can specify a subset of the model types using the modelTypesToUse parameter. To change the hyperparameters as well as the model types you have to specify the modelsAndParameters
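A simplified sketch of the selection rule in the diff, to make the reply concrete. Assumptions: `ModelFilterSketch`, `Est`, and `ParamGrid` are hypothetical stand-ins for the factory, the estimator, and `Array[ParamMap]`; the sketch matches on an explicit `typeName` instead of `getClass.getSimpleName`; and the `else` branch (an explicit list with the default type list is used unfiltered) is inferred, since the diff excerpt cuts off before it.

```scala
object ModelFilterSketch {
  type ParamGrid = Seq[Map[String, Double]] // stand-in for Array[ParamMap]
  final case class Est(typeName: String)    // stand-in for an estimator

  val modelNames: Seq[String] = Seq("OpLogisticRegression", "OpRandomForestClassifier", "OpGBTClassifier")

  val defaultModelsAndParams: Seq[(Est, ParamGrid)] = Seq(
    Est("OpLogisticRegression") -> Seq(Map("regParam" -> 0.01), Map("regParam" -> 0.1)),
    Est("OpRandomForestClassifier") -> Seq(Map("maxDepth" -> 5.0)),
    Est("OpGBTClassifier") -> Seq(Map("maxIter" -> 20.0))
  )

  def modelsToUse(
    modelTypesToUse: Seq[String] = modelNames,
    modelsAndParameters: Seq[(Est, ParamGrid)] = defaultModelsAndParams
  ): Seq[(Est, ParamGrid)] =
    // Filter by model type when running on the defaults, or when the caller narrowed the types.
    if (modelsAndParameters == defaultModelsAndParams || modelTypesToUse != modelNames)
      modelsAndParameters.filter { case (est, _) => modelTypesToUse.contains(est.typeName) }
    else
      // Assumed: an explicit list with the default type list is used as given.
      modelsAndParameters
}
```

Under these assumptions, narrowing `modelTypesToUse` alone keeps the default grids for the remaining models, while changing hyperparameters means passing `modelsAndParameters`.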

kinfaikan · 2018-08-17T00:49:27Z

...n/scala/com/salesforce/op/stages/impl/classification/BinaryClassificationModelSelector.scala

-) extends Stage1ClassificationModelSelector(validator, splitter, evaluators, uid, stage2uid, stage3uid)
+object BinaryClassificationModelsToTry extends Enum[BinaryClassificationModelsToTry] {
+ val values = findValues
+ case object OpLogisticRegression extends BinaryClassificationModelsToTry


Should we associate each BinaryClassificationModelsToTry with the corresponding estimator and default params?

case object OpLogisticRegression extends BinaryClassificationModelsToTry {
  val estimator = new OpLogisticRegression()
  val params = new ParamGridBuilder()
    .addGrid(estimator.fitIntercept, DefaultSelectorParams.FitIntercept)
    .addGrid(estimator.elasticNetParam, DefaultSelectorParams.ElasticNet)
    .addGrid(estimator.maxIter, DefaultSelectorParams.MaxIterLin)
    .addGrid(estimator.regParam, DefaultSelectorParams.Regularization)
    .addGrid(estimator.standardization, DefaultSelectorParams.Standardized)
    .addGrid(estimator.tol, DefaultSelectorParams.Tol)
    .build()
}

Also, we might let users define custom BinaryClassificationModelsToTry?

I can make the BinaryClassificationModelsToTry base class public. I could also add a sub-value holding the class each entry is associated with, rather than relying on the name...
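A plain-Scala sketch of storing the associated estimator class on each enum entry, plus a `Custom` entry for user-defined estimators. The PR's enum appears to use enumeratum (`Enum`, `findValues`); this sketch avoids that dependency, and `ModelsToTry`, `matches`, `Custom`, and the estimator classes are hypothetical.

```scala
// Hypothetical estimator stand-ins.
class LogisticRegressionEstimator
class RandomForestEstimator
class MyOwnEstimator

// Each entry carries the estimator class it refers to.
sealed abstract class ModelsToTry(val estimatorClass: Class[_]) {
  // Matches on the class itself rather than on its simple name.
  def matches(estimator: AnyRef): Boolean = estimatorClass.isInstance(estimator)
}

object ModelsToTry {
  case object LogisticRegression extends ModelsToTry(classOf[LogisticRegressionEstimator])
  case object RandomForest extends ModelsToTry(classOf[RandomForestEstimator])
  // Lets users register their own estimator class.
  final case class Custom(cls: Class[_]) extends ModelsToTry(cls)

  // Built-in entries only; Custom values are created by users.
  val values: Seq[ModelsToTry] = Seq(LogisticRegression, RandomForest)
}
```

Matching on `Class.isInstance` also accepts subclasses of an estimator, which name-based matching would miss.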

tovbinm · 2018-08-20T22:27:09Z

core/src/main/scala/com/salesforce/op/stages/impl/selector/ModelSelectorNames.scala

+
+ type ModelType = Model[_ <: Model[_]] with OpTransformer2[RealNN, OPVector, Prediction]
+ type EstimatorType = Estimator[_ <: Model[_]] with OpPipelineStage2[RealNN, OPVector, Prediction]
+


@leahmcguire totally minor - wanna add type ModelSelector = ModelSelector[ModelType, EstimatorType]?
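A sketch of the suggested alias with stand-in types (`Model`, `Estimator`, this `ModelSelector` class, and `AliasUsage` are hypothetical simplifications). One caveat worth checking: inside the same object, `type ModelSelector = ModelSelector[...]` would likely resolve its right-hand side to the alias being defined, which the compiler rejects as a cyclic reference, so the sketch uses a distinct name (a fully qualified class path would be another option).

```scala
// Hypothetical, simplified stand-ins for the real Spark/TransmogrifAI types.
trait Model
trait Estimator
class ModelSelector[M <: Model, E <: Estimator](val name: String)

object ModelSelectorNames {
  type ModelType = Model
  type EstimatorType = Estimator

  // A distinct name avoids the alias referring to itself.
  type ModelSelectorType = ModelSelector[ModelType, EstimatorType]
}

object AliasUsage {
  import ModelSelectorNames._

  // Call sites can use the short alias instead of repeating both type arguments.
  def describe(selector: ModelSelectorType): String = s"selector: ${selector.name}"
}
```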

tovbinm

lgtm!!

salesforce-cla · 2020-12-10T01:57:43Z

Thanks for the contribution! Unfortunately we can't verify the commit author(s): leahmcguire <l***@s***.com> Leah McGuire <l***@s***.com>. One possible solution is to add that email to your GitHub account. Alternatively you can change your commits to another email and force push the change. After getting your commits associated with your GitHub account, refresh the status of this Pull Request.

leahmcguire and others added 10 commits August 7, 2018 15:19

made case class to deal with model selector metadata

9151984

Merge branch 'master' into lm/metadata

571a167

deleted classes and started modifying interfaces

c2df9b7

minor

dff2139

updated interfaces

6ed8ec4

merged

126f785

all tests compile

9570ef5

trying to fix serialization

1f7bb14

fixed serialization and some tests

02087aa

fixed tests

4bd80fd

leahmcguire requested a review from tovbinm as a code owner August 14, 2018 23:11

leahmcguire and others added 4 commits August 14, 2018 16:58

merged

d448529

fixed last 2 tests

a6f2e32

Merge branch 'master' into lm/modelSelectorInterface

3f0ccbb

Merge branch 'master' into lm/modelSelectorInterface

4eeeaea

leahmcguire assigned Jauntbox and kinfaikan Aug 15, 2018

leahmcguire and others added 3 commits August 15, 2018 11:14

removed full default MS runs from test

d2860e2

fixed type

85744f8

Merge branch 'master' into lm/modelSelectorInterface

aedd70c

tovbinm changed the title ~~Lm/model selector interface~~ New model selector interface Aug 15, 2018

leahmcguire commented Aug 15, 2018


tovbinm reviewed Aug 15, 2018


test fix

0e322e4

Merge branch 'lm/modelSelectorInterface' of github.com:salesforce/TransmogrifAI into lm/modelSelectorInterface

8ffef89

leahmcguire assigned mweilsalesforce Aug 16, 2018

leahmcguire requested review from Jauntbox, kinfaikan and mweilsalesforce August 16, 2018 18:33

leahmcguire unassigned Jauntbox, kinfaikan and mweilsalesforce Aug 16, 2018

kinfaikan reviewed Aug 17, 2018


leahmcguire added 4 commits August 20, 2018 11:53

fixed evaluator param names and added custom to enums

99c32eb

made class private comstructor for enum

3332889

merged

d57c9ca

fixing typos

f4c5f43

tovbinm reviewed Aug 20, 2018


tovbinm approved these changes Aug 20, 2018


tovbinm merged commit c0d6ecb into master Aug 20, 2018

tovbinm deleted the lm/modelSelectorInterface branch August 20, 2018 22:31

ericwayman pushed a commit that referenced this pull request Feb 8, 2019

New model selector interface (#55)

d76823a

salesforce-cla bot added the cla:signed label Jun 28, 2020

salesforce-cla bot added cla:missing and removed cla:signed labels Dec 10, 2020
