
Model combiner #385

Merged
merged 19 commits into from
Sep 3, 2019
Conversation

@leahmcguire (Collaborator) commented Aug 19, 2019

Related issues
Want to do different feature engineering for different model types, or do model combination. Basically, this allows two different feature engineering flows to be passed into different model selectors (e.g. you want one kind of feature engineering for random forest and another for regression, but want grid search for each). The accuracy of the models can be compared, and the predictions will then be combined by taking the best model, an equal combination, or a weighted combination of the predictions.

Describe the proposed solution
Made a stage which can combine the results of two model selectors to produce a new score. The inputs are the two prediction outputs and the label. The output is a new prediction which is the combined result of the input predictions. The metadata is updated to reflect the combined prediction.
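The three combination modes described above can be sketched as follows. This is an illustrative, self-contained sketch with hypothetical names, not the actual TransmogrifAI API:

```scala
// Hypothetical sketch of the combination strategies described in the PR:
// best model, equal combination, or metric-weighted combination.
sealed trait Strategy
case object Best extends Strategy
case object Equal extends Strategy
case object Weighted extends Strategy

// metric1/metric2 stand in for the validation metrics of the two model
// selectors (assumed larger-is-better here for simplicity).
def combine(strategy: Strategy, pred1: Double, pred2: Double,
            metric1: Double, metric2: Double): Double = strategy match {
  case Best     => if (metric1 >= metric2) pred1 else pred2
  case Equal    => 0.5 * pred1 + 0.5 * pred2
  case Weighted =>
    val total = metric1 + metric2
    (metric1 / total) * pred1 + (metric2 / total) * pred2
}
```

The real stage additionally updates the output metadata to reflect the combined prediction.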

@tovbinm tovbinm changed the title Lm/model combiner Model combiner Aug 19, 2019
@codecov

codecov bot commented Aug 19, 2019

Codecov Report

Merging #385 into master will decrease coverage by 37.07%.
The diff coverage is 0.66%.


@@             Coverage Diff             @@
##           master     #385       +/-   ##
===========================================
- Coverage   86.82%   49.75%   -37.08%     
===========================================
  Files         336      337        +1     
  Lines       10962    11076      +114     
  Branches      346      588      +242     
===========================================
- Hits         9518     5511     -4007     
- Misses       1444     5565     +4121
Impacted Files Coverage Δ
...salesforce/op/evaluators/OpForecastEvaluator.scala 0% <ø> (-96.43%) ⬇️
...orce/op/stages/impl/tuning/OpCrossValidation.scala 0% <ø> (-97.68%) ⬇️
...ssification/MultiClassificationModelSelector.scala 0% <ø> (-97.57%) ⬇️
...op/evaluators/OpMultiClassificationEvaluator.scala 0% <ø> (-94.74%) ⬇️
...ages/impl/regression/RegressionModelSelector.scala 0% <ø> (-98.15%) ⬇️
...salesforce/op/stages/impl/tuning/OpValidator.scala 0% <ø> (-94.6%) ⬇️
...p/evaluators/OpBinaryClassificationEvaluator.scala 0% <ø> (-82.5%) ⬇️
...sification/BinaryClassificationModelSelector.scala 0% <ø> (-98.25%) ⬇️
...rce/op/stages/impl/preparators/SanityChecker.scala 0% <ø> (-91.6%) ⬇️
...m/salesforce/op/utils/spark/OpVectorMetadata.scala 85.45% <ø> (ø) ⬆️
... and 163 more

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b91ffe3...41c6654. Read the comment docs.

override def hashCode(): Int = super.hashCode()
}

object CombinationStrategy extends Enum[CombinationStrategy] {
Collaborator:

I think this enum should be defined in feature module and registered with the json in OpPipelineStageReadWriteFormats

@@ -313,10 +316,22 @@ case class Correlations
corrMeta.putString(SanityCheckerNames.CorrelationType, corrType.sparkName)
corrMeta.build()
}

private[op] def +(corr: Correlations): Correlations =
new Correlations(featuresIn ++ corr.featuresIn, values ++ corr.values, nanCorrs ++ corr.nanCorrs, corrType)
Contributor:

Would the Correlations have to have the same correlation type for this to make sense?

Collaborator Author:

Yes, but I am not sure it makes sense to error out on that... should I maybe log a warning?

Collaborator Author:

Added a custom correlation type to reflect when there are multiple correlation types.

if (m2e1.nonEmpty) {
(getMetricValue(summary1.trainEvaluation, eval1), m2e1, eval1)
} else if (m1e2.nonEmpty) {
(m1e2, getMetricValue(summary2.trainEvaluation, eval2))
Contributor:

Don't you need an eval2 at the end of the tuple here? How did the Scala compiler miss that actually??

Can you add some tests that cover some of these weird edge cases? I feel like there's a lot of new stuff being added.

Collaborator Author:

done


lazy val labelColName: String = in1.name

@transient private var evaluatorList: Seq[OpEvaluatorBase[_ <: EvaluationMetrics]] = Seq.empty
Contributor:

Why does this need to be transient?

Collaborator Author:

so that we can save the model

Collaborator:

@leahmcguire I think you need to write a custom serializer for SelectedCombinerModel that would serialize and deserialize the evaluators to make this work.

I wonder why OpEstimatorSpec base class did not catch it?!?

Collaborator Author:

Matthew, this is exactly how it works in the model selector - we don't need these in the serialized model as they are only called during training.

Collaborator Author:

So no, we do not need a custom serializer - evaluators do not need to be serialized.

}
}

object CombinationStrategy extends Enum[CombinationStrategy] {
Contributor:

Docs here too

Collaborator Author:

done

override def entryName: String = name.toLowerCase
}
def withName(name: String, isLargerBetter: Boolean): OpEvaluatorNames =
Try(super.withName(name)).getOrElse(Custom(name, name, isLargerBetter))
Collaborator:

Avoid Try and exceptions in simple program flows, use super.withNameOption instead
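The reviewer's option-based flow can be illustrated with a self-contained toy lookup (illustrative names, not the actual enumeratum-backed OpEvaluatorNames):

```scala
// Toy stand-in for an enum name lookup. An Option-based fallback never
// constructs or catches an exception, unlike Try(super.withName(name)).
val known = Map("auroc" -> "AUROC", "auprc" -> "AUPRC")

def withNameOption(name: String): Option[String] = known.get(name)

// Fall back to a "custom" entry when the name is unknown.
def withNameOrCustom(name: String): String =
  withNameOption(name).getOrElse(s"Custom($name)")
```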


def withNameInsensitive(name: String, isLargerBetter: Boolean): OpEvaluatorNames =
super.withNameInsensitiveOption(name).getOrElse(Custom(name, name, isLargerBetter))
override def withNameInsensitive(name: String): OpEvaluatorNames = withNameInsensitive(name, true)
Collaborator:

why do we want to have the default of true for isLargerBetter value? perhaps let's avoid specifying any defaults for withName and withNameInsensitive methods.

Instead it's better to have the method:

def withNameOrDefault(name: String, default: String => OpEvaluatorNames = name => Custom(name, name, isLargerBetter)): OpEvaluatorNames = {
    super.withNameInsensitiveOption(name).getOrElse(default(name))
}

Collaborator Author:

These are base methods @tovbinm, hence the override. If they want the fallback behavior your method provides they can use my definitions - but what happens when people call the base methods and there is no default?

@tovbinm (Collaborator) left a comment:

See some of my comments.

Another question: would you think we should add a shortcut for it?

perhaps?

val (pred1, pred2) = ...
val combinedPred = label.combinePredictions(pred1, pred2)

* @param operationName name of operation
* @param uid stage uid
*/
class SelectedCombiner
Collaborator:

SelectedModelCombiner or ModelCombiner perhaps? it's difficult to understand what this stage does otherwise.

isValid = (in: String) => CombinationStrategy.values.map(_.entryName).contains(in)
)
def setCombinationStrategy(value: CombinationStrategy): this.type = set(combinationStrategy, value.entryName)
def getCombinationStrategy(): CombinationStrategy = CombinationStrategy.namesToValuesMap($(combinationStrategy))
Collaborator:

CombinationStrategy.withNameInsensitive($(combinationStrategy))


def getMetricValue(metrics: EvaluationMetrics, name: EvalMetric) =
metrics.toMap.collectFirst{
case (k, v) if k.contains(name.humanFriendlyName) || k.contains(name.entryName) => v.asInstanceOf[Double]}
Collaborator:

metrics.toMap.collectFirst {
        case (k, v: Double) if k.contains(name.humanFriendlyName) || k.contains(name.entryName) => v 
}
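The reviewer's typed-pattern version can be exercised directly. The sketch below uses a plain Map standing in for `metrics.toMap` (the map and metric names here are illustrative):

```scala
// Matching on `v: Double` skips non-numeric entries instead of throwing
// a ClassCastException on asInstanceOf[Double].
val metrics: Map[String, Any] =
  Map("area under ROC" -> 0.91, "labels" -> Seq(0.0, 1.0))

def getMetricValue(metrics: Map[String, Any], name: String): Option[Double] =
  metrics.collectFirst { case (k, v: Double) if k.contains(name) => v }
```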

} else (None, None, eval1)
}

def makeMeta(model: SelectedCombinerModel): Unit = {
Collaborator:

can you please make these private methods on the stage instead of them being nested inside fit?

dataPrepResults = summary1.dataPrepResults.orElse(summary2.dataPrepResults),
evaluationMetric = metricName,
problemType = summary1.problemType,
bestModelUID = summary1.bestModelUID + " " + summary2.bestModelUID,
Collaborator:

is space a good separator here? perhaps _ or - instead?

Collaborator Author:

Space will be parsable to get them separated, while "-" or "_" would not.
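The point above can be shown with a tiny sketch (the UID values are made up; Spark stage UIDs have the form `prefix_suffix` and so already contain underscores):

```scala
// Two hypothetical stage UIDs joined with a space, then recovered by
// splitting on the space. Splitting on "_" would break the UIDs apart.
val bestModelUID = "logreg_a1b2c3d4" + " " + "rfc_e5f6a7b8"
val Array(uid1, uid2) = bestModelUID.split(" ")
```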


val strategy = getCombinationStrategy()
val (weight1, weight2) = strategy match {
case CombinationStrategy.Best =>
Collaborator:

this can be a method on CombinationStrategy itself. e.g. val (weight1, weight2) = getCombinationStrategy().computeWeights(metricValue1, metricValue2)

Collaborator Author:

It actually can't, unless I move the EvalMetric class to the features module.



@leahmcguire (Collaborator Author):
@tovbinm @Jauntbox please take a look

@leahmcguire leahmcguire merged commit 51037a8 into master Sep 3, 2019
@leahmcguire leahmcguire deleted the lm/modelCombiner branch September 3, 2019 18:04
gerashegalov added a commit that referenced this pull request Sep 11, 2019
Bug fixes:
- Ensure correct metrics despite model failures on some CV folds [#404](#404)
- Fix flaky `ModelInsight` tests [#395](#395)
- Avoid creating `SparseVector`s for LOCO [#377](#377)

New features / updates:
- Model combiner [#385](#385)
- Added new sample for HousingPrices [#365](#365)
- Test to verify that custom metrics appear in model insight metrics [#387](#387)
- Add `FeatureDistribution` to `SerializationFormat`s [#383](#383)
- Add metadata to `OpStandardScaler` to allow for descaling [#378](#378)
- Improve json serde error in `evalMetFromJson` [#380](#380)
- Track mean & standard deviation as metrics for numeric features and for text length of text features [#354](#354)
- Making model selectors robust to failing models [#372](#372)
- Use compact and compressed model json by default [#375](#375)
- Descale feature contribution for Linear Regression & Logistic Regression [#345](#345)

Dependency updates:   
- Update tika version [#382](#382)
@salesforce-cla

salesforce-cla bot commented Apr 4, 2021

Thanks for the contribution! Unfortunately we can't verify the commit author(s): leahmcguire <l***@s***.com> Leah McGuire <l***@s***.com>. One possible solution is to add that email to your GitHub account. Alternatively you can change your commits to another email and force push the change. After getting your commits associated with your GitHub account, refresh the status of this Pull Request.
