made case class to deal with model selector metadata #39

leahmcguire · 2018-08-07T22:22:26Z

Related issues
metadata is terrible

Describe the proposed solution
we should have concrete classes to deal with it instead of nasty nested maps

Describe alternatives you've considered
leave the nasty nested maps and deal with them

codecov · 2018-08-07T22:50:53Z

Codecov Report

Merging #39 into master will increase coverage by 0.68%.
The diff coverage is 93.44%.

@@            Coverage Diff             @@
##           master      #39      +/-   ##
==========================================
+ Coverage   83.49%   84.18%   +0.68%     
==========================================
  Files         296      298       +2     
  Lines        9380     9749     +369     
  Branches      344      559     +215     
==========================================
+ Hits         7832     8207     +375     
+ Misses       1548     1542       -6

Impacted Files	Coverage Δ
...p/evaluators/OpBinaryClassificationEvaluator.scala	`81.57% <ø> (ø)`	⬆️
...lesforce/op/evaluators/OpRegressionEvaluator.scala	`91.66% <ø> (ø)`	⬆️
...tages/impl/preparators/SanityCheckerMetadata.scala	`92.2% <ø> (ø)`	⬆️
...op/evaluators/OpMultiClassificationEvaluator.scala	`94.66% <ø> (ø)`	⬆️
...ce/op/stages/impl/selector/ModelSelectorBase.scala	`98.87% <100%> (-0.02%)`	⬇️
...cala/com/salesforce/op/evaluators/Evaluators.scala	`96.72% <100%> (+3.61%)`	⬆️
...m/salesforce/op/stages/OpPipelineStageParams.scala	`91.17% <100%> (+0.26%)`	⬆️
.../salesforce/op/stages/impl/tuning/DataCutter.scala	`95.65% <100%> (ø)`	⬆️
...salesforce/op/stages/impl/tuning/OpValidator.scala	`98.55% <100%> (-0.03%)`	⬇️
...sforce/op/stages/impl/selector/ModelSelector.scala	`93.87% <100%> (+1.87%)`	⬆️
... and 29 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9155e7c...7082038.

tovbinm · 2018-08-08T21:28:39Z

core/src/main/scala/com/salesforce/op/evaluators/OpEvaluatorBase.scala

+ BinaryClassEvalMetrics.withNameInsensitiveOption(name)
+ .orElse(MultiClassEvalMetrics.withNameInsensitiveOption(name))
+ .orElse(RegressionEvalMetrics.withNameInsensitiveOption(name))
+ .orElse(OpEvaluatorNames.withNameInsensitiveOption(name))


I think it's weird for evaluator names to be evaluation metrics. Perhaps let's just have evaluation metrics and drop OpEvaluatorNames completely. wdyt?

leahmcguire · 2018-08-09

This is not a change I introduced. It was always the case, because there are metrics that are grouped by the evaluator. If we want to restructure it, that should be in a separate PR.
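
For readers unfamiliar with the pattern in the diff above, here is a minimal sketch of a chained case-insensitive lookup. The `MetricLookup` object and its two name groups are hypothetical stand-ins for the enum companions in the diff, not the project's API; the sketch only shows how `orElse` falls through the groups in order.

```scala
object MetricLookup {
  // Hypothetical metric name groups standing in for the enum companions in the diff.
  private val binary = Seq("AuROC", "AuPR")
  private val regression = Seq("RootMeanSquaredError", "R2")

  // Case-insensitive search within one group.
  private def find(names: Seq[String], name: String): Option[String] =
    names.find(_.equalsIgnoreCase(name))

  // Try each group in order; the first match wins, None if no group knows the name.
  def withNameInsensitiveOption(name: String): Option[String] =
    find(binary, name).orElse(find(regression, name))
}
```

Because `orElse` takes its argument by name, later groups are only searched when the earlier ones return `None`.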

tovbinm · 2018-08-08T21:28:46Z

core/src/main/scala/com/salesforce/op/evaluators/OpEvaluatorBase.scala

+}
+
+object EvalMetric {
+


docs please

tovbinm · 2018-08-08T21:29:12Z

core/src/main/scala/com/salesforce/op/evaluators/OpEvaluatorBase.scala

+ *
+ * @param json encoded metrics
+ */
+ def fromJson(className: String, json: String): EvaluationMetrics = {


return type should be Try[EvaluationMetrics]
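
A minimal sketch of what a `Try`-returning parser could look like. The `Metrics` types and the `"name:value"` encoding below are hypothetical stand-ins, not the project's `EvaluationMetrics` or its JSON format; the point is only that a failed parse becomes a `Failure` value rather than a thrown exception.

```scala
import scala.util.Try

// Hypothetical stand-ins for the real metrics classes.
sealed trait Metrics
final case class SingleValue(name: String, value: Double) extends Metrics

object MetricsParser {
  // Parses "name:value"; any non-fatal exception raised inside the block
  // (bad shape, bad number) is captured as a Failure.
  def parse(encoded: String): Try[Metrics] = Try {
    encoded.split(":") match {
      case Array(name, value) => SingleValue(name, value.toDouble)
      case _ => throw new IllegalArgumentException(s"Could not extract metrics from $encoded")
    }
  }
}
```

Callers can then decide how to handle a bad payload, for example with `parse(s).toOption` or a pattern match on `Success` and `Failure`.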

tovbinm · 2018-08-08T21:30:43Z

core/src/main/scala/com/salesforce/op/evaluators/OpEvaluatorBase.scala

+ def error(c: Class[_]) = throw new IllegalArgumentException(
+ s"Could not extract metrics of type $c from ${json.mkString(",")}"
+ )
+ className match {


I think it's better to compare the full class instead of the simple class name, i.e.

    val metricsClass = ReflectionUtils.classForName(className).asInstanceOf[Class[_ <: EvaluationMetrics]]
    metricsClass match {
      case n if n == classOf[MultiMetrics] =>
      case n if n == classOf[BinaryClassificationMetrics] =>
      // etc
    }

leahmcguire · 2018-08-09

That is totally unnecessary. I am going to delete this companion object and make this a private method inside the metadata, so it is clearer how it is used and people are not tempted to reuse it.

tovbinm · 2018-08-08T21:32:53Z

core/src/main/scala/com/salesforce/op/ModelInsights.scala

@@ -521,15 +362,33 @@ case class Insights
 case object ModelInsights {
 @transient protected lazy val log = LoggerFactory.getLogger(this.getClass)

+ val SerFormats: Formats = Serialization.formats(FullTypeHints(List(


a bit more readable version:

    val SerializationFormats: Formats = {
      val typeHints = FullTypeHints(List(
        classOf[Continuous], classOf[Discrete],
        classOf[DataBalancerSummary], classOf[DataCutterSummary], classOf[DataSplitterSummary],
        classOf[SingleMetric], classOf[MultiMetrics], classOf[BinaryClassificationMetrics],
        classOf[ThresholdMetrics], classOf[MultiClassificationMetrics], classOf[RegressionMetrics]
      ))
      val evalMetricsSerializer = new CustomSerializer[EvalMetric](_ =>
        ( { case JString(s) => EvalMetric.withNameInsensitive(s) },
          { case x: EvalMetric => JString(x.entryName) }
        )
      )
      Serialization.formats(typeHints) +
        EnumEntrySerializer.json4s[ValidationType](ValidationType) +
        EnumEntrySerializer.json4s[ProblemType](ProblemType) +
        new SpecialDoubleSerializer +
        evalMetricsSerializer
    }

tovbinm · 2018-08-08T21:43:56Z

core/src/main/scala/com/salesforce/op/stages/impl/tuning/Splitter.scala

+
+private[op] object SplitterSummary {
+ val ClassName: String = "className"
+ def fromMap(map: Map[String, Any]): SplitterSummary = {


Below is a slightly modified and better version (pros: returns Try, uses class comparison, and handles the default case):

    def fromMap(map: Map[String, Any]): Try[SplitterSummary] = Try {
      val summaryClass = ReflectionUtils.classForName(map(ClassName).toString).asInstanceOf[Class[_ <: SplitterSummary]]
      summaryClass match {
        case s if s == classOf[DataSplitterSummary] => DataSplitterSummary()
        case s if s == classOf[DataBalancerSummary] => DataBalancerSummary(
          positiveLabels = map(ModelSelectorBaseNames.Positive).asInstanceOf[Long],
          negativeLabels = map(ModelSelectorBaseNames.Negative).asInstanceOf[Long],
          desiredFraction = map(ModelSelectorBaseNames.Desired).asInstanceOf[Double],
          upSamplingFraction = map(ModelSelectorBaseNames.UpSample).asInstanceOf[Double],
          downSamplingFraction = map(ModelSelectorBaseNames.DownSample).asInstanceOf[Double]
        )
        case s if s == classOf[DataCutterSummary] => DataCutterSummary(
          labelsKept = map(ModelSelectorBaseNames.LabelsKept).asInstanceOf[Array[Double]],
          labelsDropped = map(ModelSelectorBaseNames.LabelsDropped).asInstanceOf[Array[Double]]
        )
        case s => throw new IllegalArgumentException(s"Unrecognised splitter summary class: $s")
      }
    }

leahmcguire · 2018-08-09

Again, this does not have general use. It is only for metadata deserialization, which is why it is private. I will move it and make it a private method.

tovbinm · 2018-08-08T21:48:12Z

core/src/test/scala/com/salesforce/op/ModelInsightsTest.scala

+ }
+ }
+
+ implicit val doubleOptEquality = new Equality[Option[Double]] {


nice! how useful are these? do you think we should have them in a trait somewhere in com.salesforce.op.test in our test kit?

tovbinm · 2018-08-08T21:50:36Z

core/src/test/scala/com/salesforce/op/stages/impl/selector/ModelSelectorSummaryTest.scala

+ decoded.bestModelName shouldEqual summary.bestModelName
+ decoded.bestModelType shouldEqual summary.bestModelType
+ decoded.validationResults shouldEqual summary.validationResults
+ decoded.trainEvaluation.toJson() shouldEqual summary.trainEvaluation.toJson()


is this toJson() to avoid NaN issue?

kinfaikan · 2018-08-09

trainEvaluation doesn't contain NaN, no?
Does Scala know how to compare two complex case class objects?
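
For reference, a small sketch of how Scala compares case classes. Case classes get a compiler-generated structural `equals`, so nested case classes compare field by field; the catch is that a `Double` field holding `NaN` never equals another `NaN`. The `Eval` and `Summary` classes are hypothetical, not the PR's types.

```scala
// Hypothetical classes used only to illustrate the comparison semantics.
final case class Eval(auc: Double, error: Double)
final case class Summary(bestModel: String, eval: Eval)

object CaseClassEquality {
  val a = Summary("lr", Eval(0.9, 0.1))
  val b = Summary("lr", Eval(0.9, 0.1))
  // Structural equality: distinct instances with equal fields compare equal.
  val sameFields: Boolean = a == b

  val x = Summary("lr", Eval(Double.NaN, 0.1))
  val y = Summary("lr", Eval(Double.NaN, 0.1))
  // NaN != NaN under IEEE 754, so these two instances do not compare equal.
  val withNaN: Boolean = x == y
}
```

Comparing a serialized form, as the test does with `toJson()`, would sidestep the `NaN` case, assuming the serializer writes `NaN` the same way on both sides.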

tovbinm · 2018-08-08T21:50:49Z

core/src/test/scala/com/salesforce/op/stages/impl/tuning/OpValidatorTest.scala

@@ -107,7 +107,8 @@ class OpValidatorTest extends FlatSpec with TestSparkContext {
 assertFractions(Array(1 - p, p), train)
 assertFractions(Array(1 - p, p), validate)
 }
- balancer.get.metadataBuilder.build() should not be new MetadataBuilder().build()
+ println(balancer.get.summary)


remove println

tovbinm · 2018-08-09T20:41:26Z

core/src/test/scala/com/salesforce/op/stages/impl/tuning/OpValidatorTest.scala

@@ -45,7 +45,8 @@ import org.apache.spark.sql.types.MetadataBuilder
 import com.salesforce.op.utils.spark.RichDataset._

 @RunWith(classOf[JUnitRunner])
-class OpValidatorTest extends FlatSpec with TestSparkContext {
+class
+OpValidatorTest extends FlatSpec with TestSparkContext {


remove redundant end line

tovbinm

lgtm!!

kinfaikan · 2018-08-09T22:27:48Z

core/src/test/scala/com/salesforce/op/ModelInsightsTest.scala

+ def areEqual(a: Option[Double], b: Any): Boolean = b match {
+ case None => a.isEmpty
+ case s: Option[Double] => (a.exists(_.isNaN) && s.exists(_.isNaN)) ||
+ (a.nonEmpty && a.toSeq.zip(s.toSeq).forall{ case (n, m) => n == m })


Maybe (a.exists(_.isNaN) && s.exists(_.isNaN)) || (a == b)?
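
The suggested simplification can be sketched as a plain function over two `Option[Double]` values. This is an illustration without the `Equality` wrapper used in the test, both arguments are typed (the original takes `b: Any`), and `OptDoubleEq.areEqual` is a hypothetical name.

```scala
object OptDoubleEq {
  // Equal when both sides hold NaN; otherwise fall back to ordinary Option equality,
  // which already covers None == None and Some(n) == Some(m).
  def areEqual(a: Option[Double], b: Option[Double]): Boolean =
    (a.exists(_.isNaN) && b.exists(_.isNaN)) || (a == b)
}
```

The fallback `a == b` replaces both the explicit `None` case and the zip-and-compare branch in the original snippet.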

made case class to deal with model selector metadata

9151984

leahmcguire requested a review from tovbinm as a code owner August 7, 2018 22:22

leahmcguire requested a review from kinfaikan August 7, 2018 22:28

tovbinm and others added 3 commits August 8, 2018 09:05

Merge branch 'master' into lm/metadata

571a167

added error for metrics extraction

c1c7216

moving richparammap

9b56c15

tovbinm reviewed Aug 8, 2018

tovbinm and others added 4 commits August 8, 2018 15:12

Merge branch 'master' into lm/metadata

2d9f761

addressing comments

f4411a4

Merge branch 'lm/metadata' of github.com:salesforce/TransmogrifAI into lm/metadata

875a991

Merge branch 'master' into lm/metadata

8096a85

tovbinm reviewed Aug 9, 2018

leahmcguire added 6 commits August 9, 2018 13:54

made metric eval robust to extension

af415ed

Merge branch 'master' into lm/metadata

a40b135

Merge branch 'master' into lm/metadata

e6fd07a

fixed test

bf26e54

Merge branch 'lm/metadata' of github.com:salesforce/TransmogrifAI into lm/metadata

42498da

more test fixes

7082038

tovbinm approved these changes Aug 9, 2018

leahmcguire merged commit 8805218 into master Aug 9, 2018

leahmcguire deleted the lm/metadata branch August 9, 2018 21:59

kinfaikan reviewed Aug 9, 2018

ericwayman pushed a commit that referenced this pull request Feb 8, 2019

made case class to deal with model selector metadata (#39)

45164ff
