Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return scoring feature distributions from RawFeatureFilter #171

Merged
merged 14 commits into from
Nov 3, 2018

Conversation

tovbinm
Copy link
Collaborator

@tovbinm tovbinm commented Nov 2, 2018

Related issues
Scoring feature distributions are not exposed from RawFeatureFilter

Describe the proposed solution
Return training and scoring feature distributions in single collection from RawFeatureFilter, but add a type to distinguish between them.

  • Added type enum field for FeatureDistribution with Training and Scoring possible values.
  • Made sure OpWorkflow can read / write old feature distributions (without the type).
  • Added tests
  • Quite some avro logging which started with this commit Using MapReduce Api for Avro Read Write #150

Describe alternatives you've considered
Add an alternative collection with scoring feature distributions everywhere in RawFeatureFilter, OpWorkflow etc.

Additional context
This is an important piece of information that we don't want to loose ;)

@tovbinm tovbinm changed the title Mt/fd type Return scoring feature distributions from RawFeatureFilter Nov 2, 2018
@codecov
Copy link

codecov bot commented Nov 2, 2018

Codecov Report

Merging #171 into master will increase coverage by 0.02%.
The diff coverage is 91.52%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #171      +/-   ##
==========================================
+ Coverage   86.22%   86.25%   +0.02%     
==========================================
  Files         305      305              
  Lines        9937     9950      +13     
  Branches      345      531     +186     
==========================================
+ Hits         8568     8582      +14     
+ Misses       1369     1368       -1
Impacted Files Coverage Δ
...la/com/salesforce/op/features/types/OPVector.scala 100% <ø> (ø) ⬆️
...scala/com/salesforce/op/features/FeatureLike.scala 42.39% <0%> (+0.45%) ⬆️
...a/com/salesforce/op/filters/PreparedFeatures.scala 80.48% <100%> (ø) ⬆️
...om/salesforce/op/filters/FeatureDistribution.scala 98.33% <100%> (+0.33%) ⬆️
...rce/op/stages/OpPipelineStageReadWriteShared.scala 100% <100%> (ø) ⬆️
...a/com/salesforce/op/filters/RawFeatureFilter.scala 89.28% <100%> (+0.29%) ⬆️
...cala/com/salesforce/op/OpWorkflowModelReader.scala 91.3% <100%> (ø) ⬆️
.../main/scala/com/salesforce/op/OpWorkflowCore.scala 93.54% <100%> (+0.21%) ⬆️
...cala/com/salesforce/op/OpWorkflowModelWriter.scala 100% <100%> (ø) ⬆️
.../src/main/scala/com/salesforce/op/OpWorkflow.scala 87.5% <90%> (-0.09%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b54fbfb...f53ded2. Read the comment docs.

Copy link
Contributor

@Jauntbox Jauntbox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tovbinm tovbinm merged commit 6ad1169 into master Nov 3, 2018
@tovbinm tovbinm deleted the mt/fd-type branch November 3, 2018 17:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants