Release 3.3.0 #14

Merged
merged 5 commits into from
May 4, 2018

@tovbinm (Collaborator) commented May 4, 2018

  1. Json4s extension to serialize Joda time arguments with op stages
  2. Correctly produce json for OpWorkflowRunnerConfig
  3. Added SmartTextVectorizer and SmartTextMapVectorizer
  4. Update OP type hierarchy image
  5. Name update bug fix in SwThreeStageBinaryEstimator
  6. Added feature type conversion shortcuts for floats
  7. Smart bucketizer for numeric map values based on a Decision Tree classifier
  8. Allow serializing HashAlgorithm enum in a stage argument
  9. Update Model Insights to have features excluded by raw feature filter
  10. Redesign DataCutter for new Cross-Validation/Train-Validation-Split
  11. Allow setting log level in Sanity Checker
  12. Move reference data out of OPMapVectorizerModelArgs
  13. Added the map params needed for feature parity in the builder; increased defaultNumOfFeatures and maxNumOfFeatures for hashing
  14. Association rule confidence/support checks
    1. Added maxConfidences function to OpStatistics that calculates the max confidence per row of the contingency matrix, along with the support of that row. Refactored SanityCheckerSummary metadata so that everything coming from the same feature group (contingency matrix) is grouped together.
  15. Sanity checker summary metadata redesign
  16. Tests for indicator group collapsing in sanity checker
  17. Added LOCO record insights
  18. Redesign DataBalancer
    1. (Internal optimization) With the new Cross Validation, the same DataBalancer (i.e. with the same fractions) runs many times. Re-estimation is unnecessary, so the counts are not recomputed on every run.
  19. Model selector modified in order to have cross validation and train-split validation called on it rather than running them internally.
  20. Added ability to set output name on all stages
  21. Allow suppressing arg parse errors
  22. Modify workflow to run CV on all stages with label mixed in
    1. New Restriction: an OpWorkflow can contain at most one Model Selector; an error is thrown otherwise.
  23. Updated default for binary model selector evaluator
    1. New Default: area under the PR curve.
  24. Added FeatureBuilder.fromRow and FeatureLike.asRaw methods.
  25. Added constructor parameter stratify in cross-validation and train-validation split for stratification.
  26. Added ability to use raw feature filter in a workflow.
  27. Added RawFeatureFilter class
  28. Extend Cramer's V to work with MultiPickLists
    1. Added calculation of Cramer's V on MultiPickList fields, computed from the max of all the 2x2 Cramer's V values on each individual choice of the MultiPickList. Updated methods in OpStatistics to return chi squared statistic and p value.
  29. Added Prediction feature type
    1. Prediction is a new NonNullable feature type that inherits from RealMap. It requires at least a prediction: Double to be provided; otherwise an error is thrown at construction.
    2. Prediction can also contain the rawPrediction: Array[Double] and probability: Array[Double] values.
  30. Added UID.reset and UID.count + tests
  31. Modify OpParams to provide read locations for two readers in Assessor (RawFeatureFilter) stage
  32. Error on null/empty in RealNN + make OPVector nullable
    1. RealNN now throws an exception on null/empty values at construction
    2. OPVector is now a nullable type
    3. Removed OpNumeric.map and OpNumber.toDouble(default)
  33. Make param settings take priority over code settings and allow setting params that do not correspond to an underlying Spark param
  34. Drop indices transformer
  35. Bug fix in calculating max sibling correlation
  36. Added Date To Unit Circle Transformer
    1. Implements a transformer that maps a Date or DateTime field to a Cartesian coordinate representation of an extracted time period on the unit circle.
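
The unit circle mapping described in item 36 can be illustrated with a minimal standalone sketch. This is not the transformer shipped in this release: the object and method names below are hypothetical, the time period is fixed to hour of day, and UTC is assumed. The point is only that a cyclic value such as 23:00 ends up close to 00:00 once it is projected onto the circle.

```scala
import java.time.{Instant, ZoneOffset}

// Hypothetical helper for illustration only, not part of the OP API.
object UnitCircleSketch {

  /** Maps a value in [0, period) to (x, y) coordinates on the unit circle. */
  def toUnitCircle(value: Double, period: Double): (Double, Double) = {
    val angle = 2.0 * math.Pi * value / period
    (math.cos(angle), math.sin(angle))
  }

  /** Extracts the hour of day (UTC assumed) from epoch millis and projects it. */
  def hourOfDayOnCircle(epochMillis: Long): (Double, Double) = {
    val hour = Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC).getHour
    toUnitCircle(hour.toDouble, 24.0)
  }
}
```

Midnight maps to (1.0, 0.0) and 06:00 maps to approximately (0.0, 1.0), so hours that are adjacent on the clock stay adjacent in the feature space.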

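The confidence and support check from item 14 can be sketched as follows. This is an illustration under assumptions, not the OpStatistics implementation: it assumes each row of the contingency matrix holds the label counts for one feature value, takes confidence as the largest count in the row divided by the row total, and takes support as the row total divided by the grand total. The names AssociationRuleSketch and RowStats are hypothetical.

```scala
// Hypothetical sketch, not the OpStatistics.maxConfidences signature.
object AssociationRuleSketch {

  /** Max confidence and support for one row of a contingency matrix. */
  final case class RowStats(maxConfidence: Double, support: Double)

  /** Assumes rows are feature values and columns are label counts. */
  def maxConfidences(contingency: Array[Array[Double]]): Array[RowStats] = {
    val total = contingency.map(_.sum).sum
    contingency.map { row =>
      val rowTotal = row.sum
      // Guard against empty or all-zero rows to avoid dividing by zero.
      val confidence = if (rowTotal > 0.0) row.max / rowTotal else 0.0
      val support = if (total > 0.0) rowTotal / total else 0.0
      RowStats(confidence, support)
    }
  }
}
```

For the matrix Array(Array(8.0, 2.0), Array(5.0, 5.0)), the first row has confidence 0.8 and the second 0.5, each with support 0.5. A row with high confidence and non-trivial support suggests that the feature value nearly determines the label.
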
Migration Guide

  1. Use com.salesforce.op.utils.json.EnumEntrySerializer.json4s instead of EnumEntrySerializer.apply for creating JSON4S formats.
  2. Make sure to specify OpParams.alternateReaderParams when using the main constructor (can default to Map.empty).
  3. RealNN can now only be created from an actual Double/Long/Int value or with a default value/behavior provided:
RealNN(0.0) // ok
1.0.toRealNN // ok
Real(None).value.toRealNN(-1.0) // ok, but default value is a requirement now
Real(None).value.toRealNN(throw new RuntimeException("RealNN cannot be empty")) // ok
Real(None).value.toRealNN // NOT ok
  4. OP pipeline stage operationName was renamed to getOperationName

@tovbinm tovbinm requested a review from marcovivero May 4, 2018 04:41

@marcovivero marcovivero left a comment

Please review changes in OPCollectionHashingVectorizer.scala, I am seeing some discrepancies here between releases.

tovbinm commented May 4, 2018

@marcovivero good catch! fixed.

@marcovivero commented

Should we remove OPCollectionHashingVectorizer.scala.rej?

@marcovivero marcovivero left a comment

LGTM

@tovbinm tovbinm merged commit f6a8c40 into master May 4, 2018
@tovbinm tovbinm deleted the mt/3.3.0-release branch May 4, 2018 17:41
ericwayman pushed a commit that referenced this pull request Feb 8, 2019
Release 3.3.0
emitc2h added a commit that referenced this pull request Feb 24, 2022
@W-9035244 adding scala compat to jar publishing stage