Releases: sql-machine-learning/sqlflow
Release v0.4.2
Major Features and Improvements
- Add three pre-made Runnables: extract_ts_features (extracts time series features using tsfresh), binning, and psi
- Model Meta Design: get the model metadata (such as the Docker image name for TO TRAIN, the model type, and so on) when generating prediction workflow step code
- Distinguish XGBoost model when generating prediction workflow code
- Support configuring HTTPS for JupyterHub
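The psi Runnable is listed above without a definition. Assuming it refers to the population stability index, a common drift metric computed over binned data, the calculation can be sketched as below. The function name `psi`, its parameters, and the `eps` clamp are illustrative assumptions, not SQLFlow's actual interface.

```python
import math


def psi(expected_counts, actual_counts, eps=1e-6):
    """Population stability index between two binned distributions.

    Hypothetical helper for illustration; both inputs are per-bin counts
    computed over the same bin edges.
    """
    if len(expected_counts) != len(actual_counts):
        raise ValueError("both distributions must use the same bins")
    e_total = sum(expected_counts)
    a_total = sum(actual_counts)
    total = 0.0
    for e, a in zip(expected_counts, actual_counts):
        # Clamp proportions away from zero so the logarithm is defined.
        e_pct = max(e / e_total, eps)
        a_pct = max(a / a_total, eps)
        total += (a_pct - e_pct) * math.log(a_pct / e_pct)
    return total
```

Identical distributions give a PSI of 0, and the value grows as the actual distribution drifts away from the expected one.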
Refactorization
- Implement the end-to-end workflow of XGBoost prediction and evaluation
- Implement predict and explain in Alisa submitter at runtime
- Unify the APIs of the local and PAI submitters
- Simplify HDFS parameters
Bug Fixes
- Fix importing the Titanic dataset into MaxCompute when the FLOAT data type is not enabled
- Fix generation of the Couler evaluate step in workflow mode.
- Fix paiio reading table bug when running TO EXPLAIN on PAI.
- Fix XGBoost data compatibility issue: accept various CSV formats such as "a,b,c" and "a, b, c", as well as strings containing "/"
- Fix explain issue when SHAP values are not listed
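The CSV compatibility fix above concerns cells that hold comma-separated values with or without spaces after the commas. A minimal sketch of whitespace-tolerant parsing is shown below. The helper name `parse_dense_csv` and its tolerance of a trailing comma are assumptions for illustration, not the code SQLFlow uses.

```python
def parse_dense_csv(cell, dtype=float):
    """Parse a CSV-formatted cell such as "1,2,3" or "1, 2, 3".

    Hypothetical helper: strips whitespace around each field and
    tolerates one trailing comma.
    """
    parts = [p.strip() for p in cell.split(",")]
    # Drop the empty field produced by a trailing comma or an empty cell.
    if parts and parts[-1] == "":
        parts.pop()
    return [dtype(p) for p in parts]
```

With this sketch, `parse_dense_csv("1,2,3")` and `parse_dense_csv("1, 2, 3")` return the same list of floats.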
Release v0.4.1
Major Features and Improvements
- The model zoo can be used in the playground now.
- The CLI supports downloading models from the model zoo to the local machine.
- Support the GCN model in the official models repo.
- CI has been moved to GitHub Actions; Travis CI has been disabled.
- The `TO RUN` syntax can use the file name instead of the absolute path.
- Non-linear optimization problems are supported by the BARON solver.
- The `CONSTRAINT` clause can be optional in the `TO MAXIMIZE|MINIMIZE` statement.
Refactorization
- End-to-end local XGBoost training can now run in workflow mode.
- Unify the DBMS APIs through the `Connection` and `ResultSet` interfaces on the Python side.
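To illustrate the idea of hiding different DBMS drivers behind two interfaces, here is a minimal sketch. Only the names `Connection` and `ResultSet` come from the release note; the method names (`query`, `close`, `column_names`, `rows`) and the SQLite backend are assumptions chosen to keep the example self-contained, and do not describe SQLFlow's actual classes.

```python
import sqlite3
from abc import ABC, abstractmethod


class ResultSet(ABC):
    """Hypothetical driver-independent view of a query result."""

    @abstractmethod
    def column_names(self):
        ...

    @abstractmethod
    def rows(self):
        ...


class Connection(ABC):
    """Hypothetical driver-independent database connection."""

    @abstractmethod
    def query(self, statement):
        ...

    @abstractmethod
    def close(self):
        ...


class SQLiteResultSet(ResultSet):
    def __init__(self, cursor):
        self._cursor = cursor

    def column_names(self):
        # The first element of each DB-API description entry is the name.
        return [d[0] for d in self._cursor.description]

    def rows(self):
        return iter(self._cursor)


class SQLiteConnection(Connection):
    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)

    def query(self, statement):
        return SQLiteResultSet(self._conn.execute(statement))

    def close(self):
        self._conn.close()
```

Code written against `Connection` and `ResultSet` does not need to know which driver is underneath; supporting another DBMS would mean adding another pair of subclasses.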
Bug Fixes
- Fix the bug that XGBoost training cannot have more than 255 feature columns.
- Fix the bug that the TiDB parser cannot parse the `LAG` function.
Release v0.4.0
Major Features and Improvements
- The parser can remove all comments now.
- Support linear programming using `pyomo` and `optflow`.
- Add Model zoo default model definitions in image `sqlflow/sqlflow`.
- Support custom train loop, predict sample, and evaluation loop in custom models.
- Move CI jobs from Travis to GitHub Actions, using a pre-configured environment to speed up builds and tests.
- Add SQLFlow Playground where users can get a quick experience of SQLFlow.
Refactorization
- WIP: refactoring `sqlflow_submitter` to `runtime`. The `runtime` library supports feature derivation, statement verification, and job submitters for various platforms; it executes the workflow step and then saves the model into the database.
- Remove `is_pai` conditions in the `runtime.tensorflow` package and move the corresponding code that runs on PAI to `runtime.pai`.
Bug Fixes
- Fix the size calculated in fillCSVFieldDesc always being 0 during feature derivation.
Release v0.3.0-rc.1
Major Features and Improvements
- Support `TO EVALUATE` clause to evaluate a model.
- SQLFlow model zoo: support publicly sharing model definitions and models.
- Support mathematical programming using SQL.
- Support feature columns in the XGBoost model, including training, evaluation, prediction, and explanation.
- Support incremental training for both TensorFlow and XGBoost models.
- Add logs to record runtime status.
- The command-line tool supports releasing/removing a model/repo.
- Support `SHOW TRAIN` statement to get the original SQL.
- Create the SQLFlow Playground as a quick-start environment.
Improvements
- Improve the user experience in workflow mode, including an improved workflow log structure and returning selected rows and diagnostic messages to the GUI system.
- Improve some diagnostic messages in workflow mode.
- Support passing all the selected columns into the prediction result table.
- Decompose the all-in-one Docker image into separate Docker images.
Release v0.2.0-rc.1
Major Features and Improvements
- Support parsing SQL programs and arbitrary SELECT statements in the extended syntax. #1126
- Support feature derivation. #705
- Support a highly available SQLFlow server by submitting SQL programs to Kubernetes clusters as workflows. #1066
- Enhanced REPL functionality.
- Support more training configurations:
  - Support configuring optimizers for TensorFlow Estimator models.
  - Support configuring optimizers and losses for custom Keras models.
  - Support configuring metrics for training TensorFlow Estimator models and Keras models.
- Support explaining TensorFlow BoostedTrees models.
- Support writing EXPLAIN results to a table.
Breaking changes:
- The syntax extension changed from appending TRAIN/PREDICT/ANALYZE to appending TO TRAIN/PREDICT/EXPLAIN. #998
- Removed the ALPS and ElasticDL code generators to adapt to the current intermediate representation implementation.
Release v0.1.0-rc.1
SQLFlow release v0.1.0-rc.1 is the first release candidate of SQLFlow.
The current version includes the following features:
- Database Support
  - MySQL
  - Hive: gohive
  - MaxCompute: gomaxcompute
- Machine Learning Systems and Models Support
  - TensorFlow pre-made estimators
  - Custom Keras Model: contribute_models.md
  - XGBoost models: #765
- Feature Columns Supported When Using TensorFlow or Keras Models:
  - numeric_column
  - bucket_column
  - cross_column
  - category_id_column
  - sequence_category_id_column
- Column Data Type Support:
  - FLOAT/INT/BIGINT
  - VARCHAR/TEXT
  - CSV formatted DENSE Tensor
  - CSV formatted SPARSE Tensor
- Support Standalone Deployment and Session support: #531
- Deploy on Kubernetes Cluster: #537
- Unsupervised Training with Clustering Model: #737
- Analyze the Machine Learning Model: analyzer_design.md