Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add confusion matrix #533

Merged
merged 8 commits into from
Dec 10, 2020
Merged

Add confusion matrix #533

merged 8 commits into from
Dec 10, 2020

Conversation

feifjiang
Copy link
Contributor

@feifjiang feifjiang commented Dec 4, 2020

This PR replaces #532 #530

@codecov
Copy link

codecov bot commented Dec 4, 2020

Codecov Report

Merging #533 (70b7cab) into master (13ad9cd) will increase coverage by 86.77%.
The diff coverage is 95.31%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master     #533       +/-   ##
===========================================
+ Coverage        0   86.77%   +86.77%     
===========================================
  Files           0      347      +347     
  Lines           0    12019    +12019     
  Branches        0      389      +389     
===========================================
+ Hits            0    10430    +10430     
- Misses          0     1589     +1589     
Impacted Files Coverage Δ
...op/evaluators/OpMultiClassificationEvaluator.scala 95.62% <95.31%> (ø)
...main/scala/org/apache/spark/ml/tree/RichNode.scala 0.00% <0.00%> (ø)
.../op/stages/impl/feature/NameEntityRecognizer.scala 100.00% <0.00%> (ø)
...salesforce/op/stages/impl/tuning/OpValidator.scala 94.59% <0.00%> (ø)
...org/apache/spark/ml/attribute/MetadataHelper.scala 100.00% <0.00%> (ø)
...ce/op/stages/impl/feature/TextLenTransformer.scala 100.00% <0.00%> (ø)
...cala/com/salesforce/op/test/TestSparkContext.scala 83.33% <0.00%> (ø)
...la/com/salesforce/op/stages/OpPipelineStages.scala 63.88% <0.00%> (ø)
.../com/salesforce/op/utils/spark/RichEvaluator.scala 100.00% <0.00%> (ø)
...ain/scala/com/salesforce/op/aggregators/Sets.scala 37.50% <0.00%> (ø)
... and 338 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 13ad9cd...70b7cab. Read the comment docs.

Copy link
Contributor

@gerashegalov gerashegalov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, question re confMatrixThresholds parameter upper bound

@@ -122,9 +120,9 @@ private[op] class OpMultiClassificationEvaluator
parent = this,
name = "confMatrixThresholds",
doc = "sequence of threshold values used for confusion matrix metrics",
isValid = _.forall(x => x >= 0.0 && x <= 1.0)
isValid = _.forall(x => x >= 0.0 && x < 1.0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is 1.0 not allowed?

Copy link
Contributor Author

@feifjiang feifjiang Dec 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For multinomial logistic regression, the scenario where top probability equals to 1.0 will never happen. For random forest it's quite unlikely, unless the record falls onto one and the only one leaf node of the same class in every tree. Therefore it doesn't really make sense to have confidence threshold of 1.0

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just wondering how much value is there to restrict it here?

Copy link
Contributor Author

@feifjiang feifjiang Dec 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gerashegalov are you concerned about too many values present in the confMatrixThresholds array? That's a valid concern, I can add a check for that

@@ -597,6 +600,13 @@ case class MisClassificationsPerCategory
MisClassifications: Map[Double, Long]
)

case class labelPredictionConfidence
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@feifjiang are you planning to use LabelPredictionConfidenceCt and labelPredictionConfidence somewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tovbinm sorry - forgot to remove it. Just updated the PR again.

@feifjiang
Copy link
Contributor Author

@tovbinm @gerashegalov Can either one of you merge my PR with master? I don't have write access.

@tovbinm tovbinm merged commit 91724f1 into salesforce:master Dec 10, 2020
@tovbinm
Copy link
Collaborator

tovbinm commented Dec 10, 2020

@feifjiang thank you for your contribution!

@feifjiang
Copy link
Contributor Author

@tovbinm Thank you for the comments and being so responsive!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants