Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add distributions calculated in RawFeatureFilter to ModelInsights #103

Merged
merged 9 commits into from
Aug 30, 2018

Conversation

Jauntbox
Copy link
Contributor

@Jauntbox Jauntbox commented Aug 30, 2018

Related issues
n/a

Describe the proposed solution
Add distributions calculated in RawFeatureFilter to ModelInsights. Note that currently, raw features that aren't explicitly blacklisted by RFF, but are not used because they are inputs to explicitly blacklisted features are not present as raw features in the model, nor in ModelInsights. However, the distributions of such features are still present in the model and accessible via getRawFeatureDistributions().

Describe alternatives you've considered
n/a

Additional context
n/a

@codecov
Copy link

codecov bot commented Aug 30, 2018

Codecov Report

Merging #103 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master    #103      +/-   ##
=========================================
+ Coverage   86.19%   86.2%   +<.01%     
=========================================
  Files         294     294              
  Lines        9550    9552       +2     
  Branches      319     327       +8     
=========================================
+ Hits         8232    8234       +2     
  Misses       1318    1318
Impacted Files Coverage Δ
...a/com/salesforce/op/filters/RawFeatureFilter.scala 93.26% <ø> (ø) ⬆️
.../src/main/scala/com/salesforce/op/OpWorkflow.scala 87.5% <ø> (ø) ⬆️
...main/scala/com/salesforce/op/OpWorkflowModel.scala 92.85% <100%> (ø) ⬆️
...c/main/scala/com/salesforce/op/ModelInsights.scala 93.85% <100%> (+0.05%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update be1748a...9707723. Read the comment docs.

@@ -474,4 +477,22 @@ class ModelInsightsTest extends FlatSpec with PassengerSparkFixtureTest {
f0InDer3.variance shouldBe Some(3.3)
}

it should "include raw feature distribution information calculated in RawFeatureFilter when it's included" in {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can omit when it's included. Perhaps include raw feature distribution info calculated in RawFeatureFilter`

val wfRawFeatureDistributions = modelWithRFF.getRawFeatureDistributions()
val wfDistributionsGrouped = wfRawFeatureDistributions.groupBy(_.name)

// Currently, raw features that aren't explicitly blacklisted, but are not used because they are inputs to
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this go to RawFeatureFilter docs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure I can move it there

)
}

it should "not include raw feature distribution information if RawFeatureFilter is not used" in {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps not include raw feature distribution info if RawFeatureFilter is not used

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm missing what's different here... you want "info" instead of "information"?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah ;) the point is - try to make the test name a bit shorter.

Copy link
Collaborator

@tovbinm tovbinm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

*/
case class FeatureInsights(featureName: String, featureType: String, derivedFeatures: Seq[Insights])
case class FeatureInsights(featureName: String, featureType: String, derivedFeatures: Seq[Insights],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you perhaps reformat?

case class FeatureInsights
(
   featureName: String,
   featureType: String,
   derivedFeatures: Seq[Insights],
   distributions: Seq[FeatureDistributionLike] = Seq.empty
)

*/
case class FeatureInsights(featureName: String, featureType: String, derivedFeatures: Seq[Insights])
case class FeatureInsights(featureName: String, featureType: String, derivedFeatures: Seq[Insights],
distributions: Seq[FeatureDistributionLike] = Seq.empty)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make type FeatureDistribution

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@leahmcguire why not the trait?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is the class everywhere else

@tovbinm tovbinm merged commit a8eaf4b into master Aug 30, 2018
@tovbinm tovbinm deleted the km/feature-dist branch August 30, 2018 21:53
@salesforce-cla
Copy link

salesforce-cla bot commented Nov 9, 2020

Thanks for the contribution! It looks like @leahmcguire is an internal user so signing the CLA is not required. However, we need to confirm this.

@salesforce-cla
Copy link

Thanks for the contribution! Unfortunately we can't verify the commit author(s): Leah McGuire <l***@s***.com>. One possible solution is to add that email to your GitHub account. Alternatively you can change your commits to another email and force push the change. After getting your commits associated with your GitHub account, refresh the status of this Pull Request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants