
0.4.0

@tovbinm tovbinm released this 23 Sep 06:35
· 299 commits to master since this release
62aed6e

New features and bug fixes:

  • Allow specifying the formula used to compute the text feature bin size for RawFeatureFilter (see the RawFeatureFilter.textBinsFormula argument) #99
  • Fixed metadata on Geolocation and GeolocationMap so that they keep the name of the column in descriptorValue. #100
  • Local scoring (aka Sparkless) using Aardpfark. This enables loading and scoring models locally, without a Spark context, using the Aardpfark (PFA for Spark) and Hadrian libraries instead. This allows orders of magnitude faster scoring times compared to Spark. #41
  • Add distributions calculated in RawFeatureFilter to ModelInsights #103
  • Added binary sequence transformer & estimator: BinarySequenceTransformer and BinarySequenceEstimator, plus the associated base traits #84
  • Added StringIndexerHandleInvalid.Keep option into OpStringIndexer (same as in underlying Spark estimator) #93
  • Allow numbers and underscores in feature names #92
  • Stable key order for map vectorizers #88
  • Keep raw feature distributions calculated in raw feature filter #76
  • Transmogrify to use smart text vectorizer for text types: Text, TextArea, TextMap and TextAreaMap #63
  • Transmogrify circular date representations for date feature types: Date, DateTime, DateMap and DateTimeMap #100
  • Improved test coverage for utils and other modules #50, #53, #67, #69, #70, #71, #72, #73
  • Match feature type map hierarchy with regular feature types #49
  • Redundant and deadlock-prone end listener removal #52
  • OS-neutral filesystem path creation #51
  • Make the Feature class public and hide its constructor instead #45
  • Specify categorical variables in metadata #120
  • Fix fill geo location vectorizer values #132
  • Adding feature importance for new model types #128
  • Adding binary classification bin score evaluator #119
  • Apply DateToUnitCircleTransformer logic in raw feature filter transformations #130
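
The circular date representation mentioned in the list above can be illustrated with a small standalone sketch. This is not the library's DateToUnitCircleTransformer; the object and method names (UnitCircleSketch, toUnitCircle, hourOfDay, distance) are hypothetical, and the hour-of-day period is just one possible choice. The idea is that a periodic value is mapped to (cos, sin) coordinates, so that 23:00 and 00:00 end up close together instead of at opposite ends of a numeric range.

```scala
// Illustrative sketch only: NOT the TransmogrifAI implementation.
// All names below are hypothetical.
import java.time.{Instant, ZoneOffset}

object UnitCircleSketch {
  /** Maps a value in [0, period) to (cos, sin) coordinates on the unit circle. */
  def toUnitCircle(value: Double, period: Double): (Double, Double) = {
    val angle = 2.0 * math.Pi * value / period
    (math.cos(angle), math.sin(angle))
  }

  /** Encodes the UTC hour of day of an epoch-millisecond timestamp. */
  def hourOfDay(epochMillis: Long): (Double, Double) = {
    val hour = Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC).getHour
    toUnitCircle(hour.toDouble, 24.0)
  }

  /** Euclidean distance between two encoded points. */
  def distance(a: (Double, Double), b: (Double, Double)): Double =
    math.hypot(a._1 - b._1, a._2 - b._2)
}
```

With this encoding, the distance between hours 23 and 0 is smaller than the distance between hours 0 and 12, which is the property a plain numeric hour feature lacks.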

Breaking changes:

  • Made case class to deal with model selector metadata #39
  • Made FileOutputCommitter the default and got rid of DirectMapreduceOutputCommitter and DirectOutputCommitter #86
  • Refactored OpVectorColumnMetadata to allow numeric column descriptors #89
  • Renaming JaccardDistance to JaccardSimilarity #80
  • New model selector interface #55. The breaking changes concern the return type and the way parameters are passed into model selectors. Starting with this version, model selectors return a single result feature of type Prediction (instead of a variable number of features, such as (pred, raw, prob)). Example:
val (pred, raw, prob) = MultiClassificationModelSelector() // won't compile anymore
val prediction = MultiClassificationModelSelector() // ok!

Another change is the way parameters are passed into model selectors. Example:

BinaryClassificationModelSelector
  .withCrossValidation()
  .setLogisticRegressionRegParam(0.05, 0.1) // won't compile anymore

Instead one should do:

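// Import paths assume the usual TransmogrifAI and Spark ML package layout
import com.salesforce.op.stages.impl.classification.{BinaryClassificationModelSelector, OpLogisticRegression}
import org.apache.spark.ml.tuning.ParamGridBuilder

val lr = new OpLogisticRegression()
val models = Seq(lr -> new ParamGridBuilder().addGrid(lr.regParam, Array(0.05, 0.1)).build())
BinaryClassificationModelSelector
  .withCrossValidation(modelsAndParameters = models)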

For more examples of how to use the new model selectors, please refer to our documentation and helloworld examples.

Dependency upgrades & misc:

  • CI/CD runtime improvements for CircleCI and TravisCI
  • Updated Gradle to 4.10
  • Updated scala-graph to 1.12.5
  • Updated scalafmt to 1.5.1
  • New transmogrifai-local subproject #41 introduces aardpfark and hadrian dependencies.