0.7.0 release #481

nicodv · 2020-06-11T21:15:30Z

Bug fixes:

Fix flaky ModelInsight tests #407
Remove logging of tokens of text fields #420, #438, #447, #474
Add validation prepare call before model selection when no DAG is passed #424, #429
Fix Days.daysBetween int overflow #471
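
For readers unfamiliar with this class of bug, here is a minimal sketch of how an int overflow can creep into a day-difference calculation. It is an illustration only and not the change made in #471; the object and method names (`DaysBetweenSketch`, `daysBetweenNarrow`, `daysBetweenWide`) are hypothetical, and it uses plain millisecond arithmetic rather than the Joda-Time API.

```scala
object DaysBetweenSketch {
  val MillisPerDay: Long = 24L * 60 * 60 * 1000

  // Overflow-prone: the millisecond difference is narrowed to Int before
  // dividing, so any span longer than about 24.8 days wraps around.
  def daysBetweenNarrow(startMs: Long, endMs: Long): Int =
    ((endMs - startMs).toInt / MillisPerDay).toInt

  // Safer: stay in Long arithmetic and fail loudly (ArithmeticException)
  // if the final day count itself does not fit in an Int.
  def daysBetweenWide(startMs: Long, endMs: Long): Int =
    Math.toIntExact((endMs - startMs) / MillisPerDay)
}
```

For a 30-day span, `daysBetweenNarrow` returns a negative number while `daysBetweenWide` returns 30.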

New features / updates:

Downsample the number of training samples to maxTrainingSample for regression #413 and multi-class classification #414
Refactor InsightLOCOTest #412
Enable more loss types for OpLinearRegression #421
Add property-based tests for regression model selection #427
Add option to calculate LOCO for dates/texts by leaving out their entire vector #418
Add Chinese and Korean examples to TextTokenizerTest #442
Add support for ignoring text that looks like IDs in SmartTextVectorizer #448, #455
Add a unary estimator for detecting names in text fields and transforming to likely gender #445
Allow result features to be removed by raw feature filter #458
Metadata changes for sensitive feature information #457
Add MinVarianceFilter which checks that computed features have a minimum variance #463, #465
Allow TextStats length distribution to be token-based and refactor for testability #464
Use Spark job grouping to distinguish steps of the machine learning flow #467, #468, #470
Make categorical detection coverage-based in addition to unique-count-based #473
Remove duplicate features using sanity checker feature to feature correlations #476, #479
Lift the upper bound on number of hash features #477
Enable HTML stripping on text-like features #478
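
As a rough illustration of what a minimum-variance check on computed features involves, the sketch below drops columns whose variance falls below a threshold. It is not the MinVarianceFilter implementation from #463 and #465: the names `MinVarianceSketch`, `variance`, and `keepColumns` are hypothetical, and the use of population variance over in-memory sequences is an assumption made to keep the example free of Spark dependencies.

```scala
object MinVarianceSketch {
  // Population variance of one feature column (assumption: the real filter
  // may use a different estimator). An empty column is treated as variance 0.
  def variance(xs: Seq[Double]): Double =
    if (xs.isEmpty) 0.0
    else {
      val mean = xs.sum / xs.size
      xs.map(x => (x - mean) * (x - mean)).sum / xs.size
    }

  // Names of the columns whose variance is at least minVariance.
  def keepColumns(columns: Map[String, Seq[Double]], minVariance: Double): Set[String] =
    columns.collect {
      case (name, values) if variance(values) >= minVariance => name
    }.toSet
}
```

With a constant column and a column alternating between 0 and 1, a small positive threshold keeps only the second column.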

Dependency updates (#402, #466):

Update Apache Spark version to 2.4.5
Avro is a built-in data source in Spark 2.4, so no longer using the spark-avro package
Avro to 1.8.2
XGBoost to 0.90
MLeap to 0.14.0
json4s to 3.5.3
JUnit to 4.12
chill to 0.9.3
gradle-avro-plugin to 0.16.0

Miscellaneous:

Add ROADMAP.md #394

codecov · 2020-06-11T21:32:58Z

Codecov Report

Merging #481 into master will decrease coverage by 0.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master     #481      +/-   ##
==========================================
- Coverage   87.01%   87.00%   -0.01%     
==========================================
  Files         345      345              
  Lines       11680    11680              
  Branches      378      378              
==========================================
- Hits        10163    10162       -1     
- Misses       1517     1518       +1

Impacted Files	Coverage Δ
.../op/features/types/FeatureTypeSparkConverter.scala	98.24% <0.00%> (-0.88%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update e48831a...e292501.

leahmcguire

LGTM!

tovbinm · 2020-06-11T22:17:46Z

🥳 🥳 🥳 🥳 🥳

tovbinm and others added 30 commits September 3, 2019 22:10

Revert "Revert back to Spark 2.3 (#399)"

77e2229

This reverts commit 95a77b1.

Update to Spark 2.4.3 and XGBoost 0.90

485fcd5

special double serializer fix

12f5333

fix serialization

a4ee986

fix serialization

47703fa

docs

ea4b11f

fixed missing value for test

ab81e31

meta fix

c017869

Updated DecisionTreeNumericMapBucketizer test to deal with the change…

99ea7e1

… made to decision tree pruning in Spark 2.4. If nodes are split, but both child nodes lead to the same prediction then the split is pruned away. This updates the test so this doesn't happen for feature 'b'
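
The pruning rule described in this commit message can be sketched on a toy tree as follows. This is a simplified illustration, not Spark's implementation; the `Node`, `Leaf`, `Split`, and `PruneSketch` names are hypothetical.

```scala
sealed trait Node
final case class Leaf(prediction: Double) extends Node
final case class Split(feature: String, threshold: Double, left: Node, right: Node) extends Node

object PruneSketch {
  // Bottom-up pass: prune the children first, then collapse this split
  // if both children ended up as leaves with the same prediction.
  def prune(node: Node): Node = node match {
    case leaf: Leaf => leaf
    case Split(feature, threshold, left, right) =>
      (prune(left), prune(right)) match {
        case (Leaf(a), Leaf(b)) if a == b => Leaf(a)
        case (l, r) => Split(feature, threshold, l, r)
      }
  }
}
```

Under this rule a split on feature 'b' whose two leaves predict the same value disappears from the tree, which is why a test relying on that split has to use data where the two sides predict differently.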

fix params meta test

daa2672

Fixed failing xgboost test

7d3ebb7

ident

5661640

cleanup

8852d69

added dataframe reader and writer extensions

2ab8924

added const

7c8b988

cherrypick fixes

d98c8a9

added xgboost params + update models to use public predict method

cce0d8f

blarg

8afbae7

double ser test

e8770f6

update mleap and spark testing base

afffc56

Update README.md

ed43719

type fix

8804acd

Merge branch 'master' into ndv/spark2.4

a1461c7

bump minor version

b54d0f5

Merge branch 'master' into ndv/spark2.4

861f862

Merge branch 'master' into revert-399-mt/revert-spark-2.4

41f9dc4

Update Spark version in the README

cb4cb7b

bump version

abca58b

Merge remote-tracking branch 'origin/ndv/spark2.4' into ndv/spark2.4

ab8daf8

Update build.gradle

e58da6c

nicodv added 17 commits January 21, 2020 15:27

Merge branch 'master' into 0.7.0-release

3d2a67b

update release notes

c3067ca

update release notes

74d4ee4

Merge branch 'master' into 0.7.0-release

2569bd8

# Conflicts: # core/src/main/scala/com/salesforce/op/stages/sparkwrappers/specific/OpPredictionModel.scala # features/src/main/scala/org/apache/spark/ml/SparkDefaultParamsReadWrite.scala # gradle.properties

updates to changelog

496837c

Merge branch 'master' into 0.7.0-release

7fd49db

updates to changelog

323fbed

updates to changelog

2e9d226

updates to changelog

2b56541

Merge branch 'master' into 0.7.0-release

63cbc83

Merge branch 'master' into 0.7.0-release

d284b74

updates to changelog

76cb0e6

Merge branch 'master' into 0.7.0-release

ff021ee

updates to changelog

59db865

fix changelog

4480745

fix changelog

be6ad78

keep helloworld on 0.6.1 until release

e292501

nicodv added the release label Jun 11, 2020

nicodv requested review from gerashegalov, Jauntbox, leahmcguire, tovbinm and wsuchy as code owners June 11, 2020 21:15

salesforce-cla bot added the cla:signed label Jun 11, 2020

leahmcguire approved these changes Jun 11, 2020

tovbinm approved these changes Jun 11, 2020

nicodv merged commit 036d1fc into master Jun 11, 2020

nicodv deleted the 0.7.0-release branch June 11, 2020 22:35
