Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update confusion_matrix #545

Closed
wants to merge 3 commits into from
Closed

Conversation

swkasula
Copy link
Collaborator

@swkasula swkasula commented Mar 5, 2021

Related issues
De-serializing the ConfMatrices: Seq[Seq[Long]] resulting in ClassCastException as there was no @JsonDeserialize
annotation used for this attribute: https://github.com/salesforce/TransmogrifAI/pull/533/files

Describe the proposed solution
Fix is it update the ConfMatrices format from Seq[Seq[Long]] -> Seq[ConfusionMatrixPerThreshold] to de-serialize in the right format

@salesforce-cla
Copy link

salesforce-cla bot commented Mar 5, 2021

Thanks for the contribution! It looks like @swkasula is an internal user so signing the CLA is not required. However, we need to confirm this.

@codecov
Copy link

codecov bot commented Mar 5, 2021

Codecov Report

Merging #545 (012e20d) into master (975eb2f) will increase coverage by 86.78%.
The diff coverage is 100.00%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master     #545       +/-   ##
===========================================
+ Coverage        0   86.78%   +86.78%     
===========================================
  Files           0      347      +347     
  Lines           0    12026    +12026     
  Branches        0      403      +403     
===========================================
+ Hits            0    10437    +10437     
- Misses          0     1589     +1589     
Impacted Files Coverage Δ
...op/evaluators/OpMultiClassificationEvaluator.scala 95.80% <100.00%> (ø)
...e/op/stages/impl/feature/SmartTextVectorizer.scala 95.33% <0.00%> (ø)
...com/salesforce/op/local/OpWorkflowModelLocal.scala 100.00% <0.00%> (ø)
...force/op/stages/impl/feature/OpStringIndexer.scala 100.00% <0.00%> (ø)
.../stages/impl/feature/OpStringIndexerNoFilter.scala 100.00% <0.00%> (ø)
...com/salesforce/op/test/PassengerFeaturesTest.scala 90.00% <0.00%> (ø)
...ala/com/salesforce/op/readers/CSVAutoReaders.scala 100.00% <0.00%> (ø)
...a/com/salesforce/op/test/OpPipelineStageSpec.scala 97.95% <0.00%> (ø)
...la/com/salesforce/op/stages/OpPipelineStages.scala 63.88% <0.00%> (ø)
...ain/scala/com/salesforce/op/cli/SchemaSource.scala 87.93% <0.00%> (ø)
... and 338 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 975eb2f...012e20d. Read the comment docs.

@swkasula swkasula requested a review from feifjiang March 8, 2021 08:44
@swkasula swkasula changed the title [WIP] update confusion_matrix Update confusion_matrix Mar 8, 2021
@swkasula swkasula requested a review from crupley March 8, 2021 08:53
@swkasula swkasula marked this pull request as ready for review March 8, 2021 08:53
@tovbinm
Copy link
Collaborator

tovbinm commented Mar 8, 2021

Thanks, @swkasula

Please confirm the OSS agreement.

Comment on lines +604 to +606
Threshold: Double,
@JsonDeserialize(contentAs = classOf[java.lang.Long])
ConfusionMatrixCounts: Seq[Long]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor style issue: params are typically lower camel case:

Suggested change
Threshold: Double,
@JsonDeserialize(contentAs = classOf[java.lang.Long])
ConfusionMatrixCounts: Seq[Long]
threshold: Double,
@JsonDeserialize(contentAs = classOf[java.lang.Long])
confusionMatrixCounts: Seq[Long]

@@ -418,12 +418,12 @@ class OpMultiClassificationEvaluatorTest extends FlatSpec with TestSparkContext
outputMetrics.ConfMatrixThresholds shouldEqual testThresholds
outputMetrics.ConfMatrices.length shouldEqual testThresholds.length
// topK confusion matrix for p >= 0.4
outputMetrics.ConfMatrices(0) shouldEqual
outputMetrics.ConfMatrices(0).ConfusionMatrixCounts shouldEqual
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a line here verifying the threshold? Something like outputMetrics.ConfMatrices(0).Threshold shouldEqual....

Seq(
6L, 6L,
4L, 4L)
// topK confusion matrix for p >= 0.7
outputMetrics.ConfMatrices(1).toArray shouldEqual
outputMetrics.ConfMatrices(1).ConfusionMatrixCounts.toArray shouldEqual
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above. Otherwise lgtm

@swkasula
Copy link
Collaborator Author

closing this as I created a different PR: #549 to merge from the tested dev branch.

@swkasula swkasula closed this Mar 11, 2021
@swkasula swkasula mentioned this pull request Mar 11, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants