Tags: jdye64/dask-sql
Do not depend on pkg not specified in `setup.py` (dask-contrib#214)
* Added a test against missing dependencies
* Make sure distributed is installed automatically
* Make the joblib dependency optional
* Make the sklearn dependency optional
Remove mandatory dask_ml dependencies (dask-contrib#208)
* Remove mandatory dask_ml dependencies
* Increase coverage
Multiple schemas allowed (dask-contrib#205)
* ML model improvement: Adding "SHOW MODELS" and "DESCRIBE MODEL" (Author: rajagurunath <[email protected]>, Mon May 24 02:37:40 2021 +0530)
* fix typo
* ML model improvement: added EXPORT MODEL
* ML model improvement: refactoring for PR
* ML model improvement: Adding stmts in notebook
* ML model improvement: Adding stmts in notebook
* ML model improvement: also test the non-happy path
* ML model improvement: Added mlflow and <With> in sql for extra params
* ML model improvement: Added mlflow and <With> in sql for extra params
* Added test cases for EXPORT MODEL
* Added ML documentation about the following: 1. SHOW MODELS 2. DESCRIBE MODEL 3. EXPORT MODEL
* refactored based on PR
* Added support only for sklearn-compatible models
* excluded mlflow part from code coverage
* install mlflow in test cluster
* Added test for non-sklearn-compatible model
* Added: initial draft of referencing multiple schemas
* Added schema DDLs: 1. Create Schema 2. Use schema 3. Drop schema 4. Added testcases
* Use compound identifiers for models, tables, experiments, views
* Split the compound identifiers - without using the schema so far
* Added a schema_name parameter to most functions and actually use the schema
* Pass on the schemas to Java
* Some simplifications and tests
* Some cleanup, documentation and more tests (and fixed a bug in aggregation)
* Remove unneeded import

Co-authored-by: gurunath <[email protected]>
Co-authored-by: Gurunath LankupalliVenugopal <[email protected]>
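The schema and model statements introduced in this entry can be sketched in dask-sql's SQL dialect. The exact WITH parameters depend on the dask-sql version; the schema, model, and file names below are illustrative:

```sql
-- Create a new schema and make it the default (schema DDLs from dask-contrib#205)
CREATE SCHEMA my_schema;
USE SCHEMA my_schema;

-- List registered ML models and inspect one of them
SHOW MODELS;
DESCRIBE MODEL my_model;

-- Export a trained model; the WITH clause carries extra parameters
-- (e.g. the serialization format - mlflow support was added in this PR)
EXPORT MODEL my_model WITH (format = 'pickle', location = 'my_model.pkl');

-- Remove the schema again
DROP SCHEMA my_schema;
```

With compound identifiers, objects in a non-default schema can also be referenced explicitly, e.g. `my_schema.my_model`.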
Aggregate improvements and SQL compatibility (dask-contrib#134)
* A lot of refactoring of the groupby, mainly to include both distinct and null-grouping
* Test for non-dask aggregations
* All NaN data needs to go into the same partition (otherwise we cannot sort)
* Fix compatibility with SQL on null-joins
* Distinct is not needed, as it is optimized away by Calcite
* Implement IS NOT DISTINCT
* Describe new limitations and remove old ones
* Added compatibility test from fugue
* Added a test for sorting with multiple partitions and NaNs
* Stylefix
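The null-join compatibility above hinges on SQL's null-safe comparison: `IS NOT DISTINCT FROM` treats two NULLs as equal, whereas plain `=` yields NULL. A minimal pure-Python sketch of that semantics (the function name is illustrative, not dask-sql API):

```python
def is_not_distinct_from(a, b):
    """Null-safe equality as in SQL's IS NOT DISTINCT FROM:
    two NULLs (here: None) compare equal; one NULL never equals a value."""
    if a is None and b is None:
        return True
    if a is None or b is None:
        return False
    return a == b

# Plain SQL '=' evaluates NULL = NULL to NULL (not TRUE); the null-safe
# variant is what lets grouping and joins place all NULL keys together.
print(is_not_distinct_from(None, None))  # True
print(is_not_distinct_from(None, 1))     # False
print(is_not_distinct_from(1, 1))        # True
```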
0.3.0

Features:
* Allow for an sqlalchemy and a hive cursor input (dask-contrib#90)
* Allow registering the same function with multiple parameter combinations (dask-contrib#93)
* Additional datetime functions (dask-contrib#91)
* Server and CMD CLI script (dask-contrib#94)
* Split the SQL documentation into subpages and add a lot more documentation (dask-contrib#107)
* DROP TABLE and IF NOT EXISTS/REPLACE (dask-contrib#98)
* SQL Machine Learning syntax (dask-contrib#108)
* ANALYZE TABLE (dask-contrib#105)
* Random sample operators (dask-contrib#115)
* Read from Intake catalogs (dask-contrib#113)
* Adding fugue integration and tests (dask-contrib#116) and fsql (dask-contrib#118)

Bugfixes:
* Keep casing also with unquoted identifiers. Fixes dask-contrib#84. (dask-contrib#88)
* Scalar WHERE clauses (dask-contrib#89)
* Check for the correct java path on Windows (dask-contrib#86)
* Remove `# pragma once` where it is not needed anymore (dask-contrib#92)
* Refactor the hive input handling (dask-contrib#95)
* Limit the pandas version (dask-contrib#100)
* Handle the case of an undefined java version correctly (dask-contrib#101)
* Add datetime[ns, UTC] as an understood type (dask-contrib#103)
* Make sure to treat integers as integers (dask-contrib#109)
* On ORDER BY queries, show the column names of the SELECT query (dask-contrib#110)
* Always refer to a function with the name given by the user (dask-contrib#111)
* Do not fail on empty SQL commands (dask-contrib#114)
* Fix the random sample test (dask-contrib#117)
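The DDL additions from dask-contrib#98 follow standard SQL conventions; a sketch of the syntax as described in the entry, with the table name and WITH parameters illustrative rather than definitive:

```sql
-- Create only if the table does not exist yet; OR REPLACE overwrites it
CREATE TABLE IF NOT EXISTS my_table WITH (location = 'data.csv', format = 'csv');
CREATE OR REPLACE TABLE my_table WITH (location = 'data.csv', format = 'csv');

-- Drop the table, tolerating the case where it is already gone
DROP TABLE IF EXISTS my_table;
```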