
Releases: apache/incubator-gluten

v1.2.1-rc0

29 Nov 16:42
1a50a68
Pre-release

What's Changed

Full Changelog: v1.2.0...v1.2.1-rc0

v1.2.1-preview

25 Nov 08:05
abf0e1d
Pre-release

What's Changed

Full Changelog: v1.2.0...v1.2.1-preview

v1.2.0

03 Sep 09:51
c82af60

Release Notes - Gluten version 1.2.0

We are pleased to announce that Gluten v1.2.0 has been published as the first official Apache release.

Highlights (Velox backend only)

  • Support Spark 3.2.2, 3.3.1, 3.4.2, and 3.5.1, with all UTs passing (where the data type is supported)
  • Support 31 common Spark operators (based on Spark 3.2)
  • Support 266 common Spark functions (based on Spark 3.2)
  • Velox codebase updated to 2024/07/05
  • New RSS support: Apache Uniffle integration
  • New data lake support: Iceberg, Delta Lake
  • New file format support: CSV
  • Enhanced CI workflow
  • Refreshed documentation on the Gluten website (https://gluten.apache.org/)
  • Improved stability for spill, OOM, and other cases
  • More bug fixes

What's Changed


v1.2.0-rc3

21 Aug 09:34
c82af60
Pre-release

What's Changed

  • [CORE] Move all columnar rules to post-columnar transitions by @zhztheplayer in #4790
  • [GLUTEN-4398][FOLLOW] Mask PullOutPostProject and PullOutPreProject id by @zwangsheng in #4815
  • [GLUTEN-2956][VL] Support Spark NullType by @PHILO-HE in #2996
  • [CORE] Add logical link to rewritten spark plan by @ulysses-you in #4817
  • [GLUTEN-4803][UT] Add Golden Files for TPC-H Spark33 + Gluten Execution Plan by @zwangsheng in #4804
  • [VL] Allow replacing installed minio package by @PHILO-HE in #4825
  • [VL] Daily Update Velox Version (2024_03_01) by @GlutenPerfBot in #4821
  • [VL] Enable more tests of GlutenParquetIOSuite for Spark32/33/34 by @Yohahaha in #4823
  • [GLUTEN-1632][CH]Daily Update Clickhouse Version (20240302) by @lwz9103 in #4837
  • [GLUTEN-4039][VL] support map_keys and map_values by @konjac in #4826
  • [GLUTEN-4424][CORE] Upgrade spark version to 3.5.1 in Gluten by @JkSelf in #4822
  • [VL] Daily Update Velox Version (2024_03_04) by @GlutenPerfBot in #4841
  • [GLUTEN-4813] Replace resize/reserve to resize_extact/reserve_exact to reduce memory by @taiyang-li in #4824
  • [GLUTEN-1632][CH]Daily Update Clickhouse Version (20240305) by @lwz9103 in #4849
  • [VL] Fix boost installation issue and remove useless QueryCtx by @PHILO-HE in #4850
  • [VL] Enable "parquet v2 pages - delta encoding" test for Spark33/Spark34 by @Yohahaha in #4816
  • [CORE] Support FileSourceScanExec driver metrics for spark3.4/3.5 by @zhli1142015 in #4848
  • [GLUTEN-4772][VL] Support empty map/array literal by @WangGuangxin in #4771
  • [GLUTEN-4860][CELEBORN] Replace celeborn link by @kerwin-zk in #4861
  • [VL][CI] Fix CI failure related to Celeborn by @PHILO-HE in #4862
  • [CORE] Support In list option contains non-foldable expression by @ulysses-you in #4843
  • [VL] Daily Update Velox Version (2024_03_05) by @GlutenPerfBot in #4852
  • [VL] Enable more tests in GlutenParquetQuerySuite for Spark32/33/34 by @Yohahaha in #4854
  • [CORE] ColumnarShuffleExchangeExec should respect advisoryPartitionSize for Spark3.5 by @ulysses-you in #4865
  • [GLUTEN-4853][CORE] Only trim Alias when its child is semantically equal to resAttr by @liujiayi771 in #4857
  • [VL] minor change for delta ut by @zhli1142015 in #4869
  • [VL] Add libsodium.so to thirdparty lib for CentOS8 by @kerwin-zk in #4870
  • [VL] Updated documentation, refactoring and added more testcases for BNLJ by @Surbhi-Vijay in #4782
  • [VL] Daily Update Velox Version (2024_03_06) by @GlutenPerfBot in #4868
  • [MINOR] Remove ExtendedAnalysisException by @PHILO-HE in #4864
  • [GLUTEN-4831][VL] Support StructType in HashAggregate by @WangGuangxin in #4832
  • [VL] Support inline function by @marin-ma in #4847
  • [VL] Add flushable decimal sum test case by @liujiayi771 in #4871
  • [CORE] Add synchronized for ExplainUtils processPlan by @ulysses-you in #4876
  • [VL] Rewrite collect_set and collect_list aggregate function by @ulysses-you in #4805
  • [VL] Fix and use flattenVector by @marin-ma in #4783
  • [VL] Enable tests of ParquetPartitionDisconverySuite for Spark33/34 by @Yohahaha in #4881
  • [CORE] Minor adjustment to columnar rule list, and move all columnar sub-rules to one source folder by @zhztheplayer in #4863
  • [VL] Merge Partial and PartialMerge logic in generateMergeCompanionNode by @liujiayi771 in #4883
  • [CORE] Fix Spark-3.5 CI by @ulysses-you in #4886
  • [GLUTEN-4424][CORE] Follow up upgrading spark version to 3.5.1 by @JkSelf in #4845
  • Add .asf.yml by @yaooqinn in #4892
  • Update Vulnerability Handling Process by @yaooqinn in #4894
  • [VL] Daily Update Velox Version (2024_03_07) by @GlutenPerfBot in #4877
  • [GLUTEN-1632][CH]Daily Update Clickhouse Version (20240308) by @lwz9103 in #4890
  • [CORE] ColumnarBroadcastExchangeExec should set/cancel with job tag for Spark3.5 by @ulysses-you in #4882
  • [VL] Daily Update Velox Version (2024_03_08) by @GlutenPerfBot in #4895
  • [VL] Pass partition id to velox functions by @zhli1142015 in #4344
  • Add Incubation Standard Disclaimer by @yaooqinn in #4911
  • [GLUTEN-4835][CORE] Match metric names with Spark by @clee704 in #4834
  • [Gluten-4732][CH] delta-mergetree support update/delete/upsert/insert in a more native delta way by @binmahone in #4733
  • [GLUTEN-4898][CH]Bug fix to date diff by @KevinyhZou in #4900
  • [VL] Daily Update Velox Version(2024_03_11) by @GlutenPerfBot in #4908
  • [DOC] Update release & configuration doc by @PHILO-HE in #4910
  • [VL] Support lead window function by @ulysses-you in #4902
  • [VL] Fix protobuf configure arguments in get_velox.sh by @liujiayi771 in #4920
  • [Gluten-4918][CH]support CTAS for clickhouse table by @binmahone in #4919
  • [GLUTEN-4926][CELEBORN] CelebornShuffleManager should remove shuffleId from columnarShuffleIds after unregistering shuffle by @SteNicholas in #4927
  • [Gluten-4912][CH]Support Specifying columns in clickhouse tables to b… by @binmahone in #4925
  • [Gluten-4706] [CH][CORE] Add a mode to execute count distinct directly instead o… by @binmahone in #4708
  • [VL] Daily Update Velox Version (2024_03_12) by @GlutenPerfBot in #4923
  • [GLUTEN-4914][CH] Fix exceptions in ASTParser by @taiyang-li in #4916
  • [DOC] Minor fix for wrong gluten folder used in doc by @leoluan2009 in #4938
  • [VL] Refine log plan/split json into one line by @Yohahaha in #4934
  • [VL] Support posexplode function and code refactoring on GenerateExecTransformer by @marin-ma in #4901
  • [CORE] Prior to #4893, add vanilla Spark's original scan source code to keep git history by @zhztheplayer in #4931
  • [VL] Fix wrong plan equality due to case class inheritance by @zhztheplayer in #4893
  • [GLUTEN-3559][VL] enable more sql query tests for Spark34 by @zhouyuan in #4880
  • [VL] Daily Update Velox Version (2024_03_13) by @GlutenPerfBot in #4944
  • [VL]Bucket join support for Iceberg tables by @SinghAsDev in #4859
  • [GLUTEN-4827][UT] Add Golden Files for TPC-H Spark34 + Gluten Execution Plan by @zwangsheng in #4828
  • [VL] Verify unhex has been offloaded to native successfully by @Yohahaha in #4937
  • [VL] Support skewness aggregate function by @liujiayi771 in #4939
  • [GLUTEN-1632][CH]Daily Update Clickhouse Version (20240314) by @lwz9103 in #4948
  • [VL] parquet file metadata columns support in velox by @gaoyangxiaozhu in #3870
  • [VL] Daily Update Velox Version (2024_03_14) by @GlutenPerfBot in #4949
  • [VL] Untangle code of TransformPreOverrides by @zhztheplayer in ht...

v1.2.0-rc2

14 Aug 07:04
27e988d
Pre-release

What's Changed

  • [CORE] Move all columnar rules to post-columnar transitions by @zhztheplayer in #4790
  • [GLUTEN-4398][FOLLOW] Mask PullOutPostProject and PullOutPreProject id by @zwangsheng in #4815
  • [GLUTEN-2956][VL] Support Spark NullType by @PHILO-HE in #2996
  • [CORE] Add logical link to rewritten spark plan by @ulysses-you in #4817
  • [GLUTEN-4803][UT] Add Golden Files for TPC-H Spark33 + Gluten Execution Plan by @zwangsheng in #4804
  • [VL] Allow replacing installed minio package by @PHILO-HE in #4825
  • [VL] Daily Update Velox Version (2024_03_01) by @GlutenPerfBot in #4821
  • [VL] Enable more tests of GlutenParquetIOSuite for Spark32/33/34 by @Yohahaha in #4823
  • [GLUTEN-1632][CH]Daily Update Clickhouse Version (20240302) by @lwz9103 in #4837
  • [GLUTEN-4039][VL] support map_keys and map_values by @konjac in #4826
  • [GLUTEN-4424][CORE] Upgrade spark version to 3.5.1 in Gluten by @JkSelf in #4822
  • [VL] Daily Update Velox Version (2024_03_04) by @GlutenPerfBot in #4841
  • [GLUTEN-4813] Replace resize/reserve to resize_extact/reserve_exact to reduce memory by @taiyang-li in #4824
  • [GLUTEN-1632][CH]Daily Update Clickhouse Version (20240305) by @lwz9103 in #4849
  • [VL] Fix boost installation issue and remove useless QueryCtx by @PHILO-HE in #4850
  • [VL] Enable "parquet v2 pages - delta encoding" test for Spark33/Spark34 by @Yohahaha in #4816
  • [CORE] Support FileSourceScanExec driver metrics for spark3.4/3.5 by @zhli1142015 in #4848
  • [GLUTEN-4772][VL] Support empty map/array literal by @WangGuangxin in #4771
  • [GLUTEN-4860][CELEBORN] Replace celeborn link by @kerwin-zk in #4861
  • [VL][CI] Fix CI failure related to Celeborn by @PHILO-HE in #4862
  • [CORE] Support In list option contains non-foldable expression by @ulysses-you in #4843
  • [VL] Daily Update Velox Version (2024_03_05) by @GlutenPerfBot in #4852
  • [VL] Enable more tests in GlutenParquetQuerySuite for Spark32/33/34 by @Yohahaha in #4854
  • [CORE] ColumnarShuffleExchangeExec should respect advisoryPartitionSize for Spark3.5 by @ulysses-you in #4865
  • [GLUTEN-4853][CORE] Only trim Alias when its child is semantically equal to resAttr by @liujiayi771 in #4857
  • [VL] minor change for delta ut by @zhli1142015 in #4869
  • [VL] Add libsodium.so to thirdparty lib for CentOS8 by @kerwin-zk in #4870
  • [VL] Updated documentation, refactoring and added more testcases for BNLJ by @Surbhi-Vijay in #4782
  • [VL] Daily Update Velox Version (2024_03_06) by @GlutenPerfBot in #4868
  • [MINOR] Remove ExtendedAnalysisException by @PHILO-HE in #4864
  • [GLUTEN-4831][VL] Support StructType in HashAggregate by @WangGuangxin in #4832
  • [VL] Support inline function by @marin-ma in #4847
  • [VL] Add flushable decimal sum test case by @liujiayi771 in #4871
  • [CORE] Add synchronized for ExplainUtils processPlan by @ulysses-you in #4876
  • [VL] Rewrite collect_set and collect_list aggregate function by @ulysses-you in #4805
  • [VL] Fix and use flattenVector by @marin-ma in #4783
  • [VL] Enable tests of ParquetPartitionDisconverySuite for Spark33/34 by @Yohahaha in #4881
  • [CORE] Minor adjustment to columnar rule list, and move all columnar sub-rules to one source folder by @zhztheplayer in #4863
  • [VL] Merge Partial and PartialMerge logic in generateMergeCompanionNode by @liujiayi771 in #4883
  • [CORE] Fix Spark-3.5 CI by @ulysses-you in #4886
  • [GLUTEN-4424][CORE] Follow up upgrading spark version to 3.5.1 by @JkSelf in #4845
  • Add .asf.yml by @yaooqinn in #4892
  • Update Vulnerability Handling Process by @yaooqinn in #4894
  • [VL] Daily Update Velox Version (2024_03_07) by @GlutenPerfBot in #4877
  • [GLUTEN-1632][CH]Daily Update Clickhouse Version (20240308) by @lwz9103 in #4890
  • [CORE] ColumnarBroadcastExchangeExec should set/cancel with job tag for Spark3.5 by @ulysses-you in #4882
  • [VL] Daily Update Velox Version (2024_03_08) by @GlutenPerfBot in #4895
  • [VL] Pass partition id to velox functions by @zhli1142015 in #4344
  • Add Incubation Standard Disclaimer by @yaooqinn in #4911
  • [GLUTEN-4835][CORE] Match metric names with Spark by @clee704 in #4834
  • [Gluten-4732][CH] delta-mergetree support update/delete/upsert/insert in a more native delta way by @binmahone in #4733
  • [GLUTEN-4898][CH]Bug fix to date diff by @KevinyhZou in #4900
  • [VL] Daily Update Velox Version(2024_03_11) by @GlutenPerfBot in #4908
  • [DOC] Update release & configuration doc by @PHILO-HE in #4910
  • [VL] Support lead window function by @ulysses-you in #4902
  • [VL] Fix protobuf configure arguments in get_velox.sh by @liujiayi771 in #4920
  • [Gluten-4918][CH]support CTAS for clickhouse table by @binmahone in #4919
  • [GLUTEN-4926][CELEBORN] CelebornShuffleManager should remove shuffleId from columnarShuffleIds after unregistering shuffle by @SteNicholas in #4927
  • [Gluten-4912][CH]Support Specifying columns in clickhouse tables to b… by @binmahone in #4925
  • [Gluten-4706] [CH][CORE] Add a mode to execute count distinct directly instead o… by @binmahone in #4708
  • [VL] Daily Update Velox Version (2024_03_12) by @GlutenPerfBot in #4923
  • [GLUTEN-4914][CH] Fix exceptions in ASTParser by @taiyang-li in #4916
  • [DOC] Minor fix for wrong gluten folder used in doc by @leoluan2009 in #4938
  • [VL] Refine log plan/split json into one line by @Yohahaha in #4934
  • [VL] Support posexplode function and code refactoring on GenerateExecTransformer by @marin-ma in #4901
  • [CORE] Prior to #4893, add vanilla Spark's original scan source code to keep git history by @zhztheplayer in #4931
  • [VL] Fix wrong plan equality due to case class inheritance by @zhztheplayer in #4893
  • [GLUTEN-3559][VL] enable more sql query tests for Spark34 by @zhouyuan in #4880
  • [VL] Daily Update Velox Version (2024_03_13) by @GlutenPerfBot in #4944
  • [VL]Bucket join support for Iceberg tables by @SinghAsDev in #4859
  • [GLUTEN-4827][UT] Add Golden Files for TPC-H Spark34 + Gluten Execution Plan by @zwangsheng in #4828
  • [VL] Verify unhex has been offloaded to native successfully by @Yohahaha in #4937
  • [VL] Support skewness aggregate function by @liujiayi771 in #4939
  • [GLUTEN-1632][CH]Daily Update Clickhouse Version (20240314) by @lwz9103 in #4948
  • [VL] parquet file metadata columns support in velox by @gaoyangxiaozhu in #3870
  • [VL] Daily Update Velox Version (2024_03_14) by @GlutenPerfBot in #4949
  • [VL] Untangle code of TransformPreOverrides by @zhztheplayer in ht...

v1.2.0-rc1

25 Jul 23:40
c9f3d89
Pre-release

What's Changed

  • [CORE] Move all columnar rules to post-columnar transitions by @zhztheplayer in #4790
  • [GLUTEN-4398][FOLLOW] Mask PullOutPostProject and PullOutPreProject id by @zwangsheng in #4815
  • [GLUTEN-2956][VL] Support Spark NullType by @PHILO-HE in #2996
  • [CORE] Add logical link to rewritten spark plan by @ulysses-you in #4817
  • [GLUTEN-4803][UT] Add Golden Files for TPC-H Spark33 + Gluten Execution Plan by @zwangsheng in #4804
  • [VL] Allow replacing installed minio package by @PHILO-HE in #4825
  • [VL] Daily Update Velox Version (2024_03_01) by @GlutenPerfBot in #4821
  • [VL] Enable more tests of GlutenParquetIOSuite for Spark32/33/34 by @Yohahaha in #4823
  • [GLUTEN-1632][CH]Daily Update Clickhouse Version (20240302) by @lwz9103 in #4837
  • [GLUTEN-4039][VL] support map_keys and map_values by @konjac in #4826
  • [GLUTEN-4424][CORE] Upgrade spark version to 3.5.1 in Gluten by @JkSelf in #4822
  • [VL] Daily Update Velox Version (2024_03_04) by @GlutenPerfBot in #4841
  • [GLUTEN-4813] Replace resize/reserve to resize_extact/reserve_exact to reduce memory by @taiyang-li in #4824
  • [GLUTEN-1632][CH]Daily Update Clickhouse Version (20240305) by @lwz9103 in #4849
  • [VL] Fix boost installation issue and remove useless QueryCtx by @PHILO-HE in #4850
  • [VL] Enable "parquet v2 pages - delta encoding" test for Spark33/Spark34 by @Yohahaha in #4816
  • [CORE] Support FileSourceScanExec driver metrics for spark3.4/3.5 by @zhli1142015 in #4848
  • [GLUTEN-4772][VL] Support empty map/array literal by @WangGuangxin in #4771
  • [GLUTEN-4860][CELEBORN] Replace celeborn link by @kerwin-zk in #4861
  • [VL][CI] Fix CI failure related to Celeborn by @PHILO-HE in #4862
  • [CORE] Support In list option contains non-foldable expression by @ulysses-you in #4843
  • [VL] Daily Update Velox Version (2024_03_05) by @GlutenPerfBot in #4852
  • [VL] Enable more tests in GlutenParquetQuerySuite for Spark32/33/34 by @Yohahaha in #4854
  • [CORE] ColumnarShuffleExchangeExec should respect advisoryPartitionSize for Spark3.5 by @ulysses-you in #4865
  • [GLUTEN-4853][CORE] Only trim Alias when its child is semantically equal to resAttr by @liujiayi771 in #4857
  • [VL] minor change for delta ut by @zhli1142015 in #4869
  • [VL] Add libsodium.so to thirdparty lib for CentOS8 by @kerwin-zk in #4870
  • [VL] Updated documentation, refactoring and added more testcases for BNLJ by @Surbhi-Vijay in #4782
  • [VL] Daily Update Velox Version (2024_03_06) by @GlutenPerfBot in #4868
  • [MINOR] Remove ExtendedAnalysisException by @PHILO-HE in #4864
  • [GLUTEN-4831][VL] Support StructType in HashAggregate by @WangGuangxin in #4832
  • [VL] Support inline function by @marin-ma in #4847
  • [VL] Add flushable decimal sum test case by @liujiayi771 in #4871
  • [CORE] Add synchronized for ExplainUtils processPlan by @ulysses-you in #4876
  • [VL] Rewrite collect_set and collect_list aggregate function by @ulysses-you in #4805
  • [VL] Fix and use flattenVector by @marin-ma in #4783
  • [VL] Enable tests of ParquetPartitionDisconverySuite for Spark33/34 by @Yohahaha in #4881
  • [CORE] Minor adjustment to columnar rule list, and move all columnar sub-rules to one source folder by @zhztheplayer in #4863
  • [VL] Merge Partial and PartialMerge logic in generateMergeCompanionNode by @liujiayi771 in #4883
  • [CORE] Fix Spark-3.5 CI by @ulysses-you in #4886
  • [GLUTEN-4424][CORE] Follow up upgrading spark version to 3.5.1 by @JkSelf in #4845
  • Add .asf.yml by @yaooqinn in #4892
  • Update Vulnerability Handling Process by @yaooqinn in #4894
  • [VL] Daily Update Velox Version (2024_03_07) by @GlutenPerfBot in #4877
  • [GLUTEN-1632][CH]Daily Update Clickhouse Version (20240308) by @lwz9103 in #4890
  • [CORE] ColumnarBroadcastExchangeExec should set/cancel with job tag for Spark3.5 by @ulysses-you in #4882
  • [VL] Daily Update Velox Version (2024_03_08) by @GlutenPerfBot in #4895
  • [VL] Pass partition id to velox functions by @zhli1142015 in #4344
  • Add Incubation Standard Disclaimer by @yaooqinn in #4911
  • [GLUTEN-4835][CORE] Match metric names with Spark by @clee704 in #4834
  • [Gluten-4732][CH] delta-mergetree support update/delete/upsert/insert in a more native delta way by @binmahone in #4733
  • [GLUTEN-4898][CH]Bug fix to date diff by @KevinyhZou in #4900
  • [VL] Daily Update Velox Version(2024_03_11) by @GlutenPerfBot in #4908
  • [DOC] Update release & configuration doc by @PHILO-HE in #4910
  • [VL] Support lead window function by @ulysses-you in #4902
  • [VL] Fix protobuf configure arguments in get_velox.sh by @liujiayi771 in #4920
  • [Gluten-4918][CH]support CTAS for clickhouse table by @binmahone in #4919
  • [GLUTEN-4926][CELEBORN] CelebornShuffleManager should remove shuffleId from columnarShuffleIds after unregistering shuffle by @SteNicholas in #4927
  • [Gluten-4912][CH]Support Specifying columns in clickhouse tables to b… by @binmahone in #4925
  • [Gluten-4706] [CH][CORE] Add a mode to execute count distinct directly instead o… by @binmahone in #4708
  • [VL] Daily Update Velox Version (2024_03_12) by @GlutenPerfBot in #4923
  • [GLUTEN-4914][CH] Fix exceptions in ASTParser by @taiyang-li in #4916
  • [DOC] Minor fix for wrong gluten folder used in doc by @leoluan2009 in #4938
  • [VL] Refine log plan/split json into one line by @Yohahaha in #4934
  • [VL] Support posexplode function and code refactoring on GenerateExecTransformer by @marin-ma in #4901
  • [CORE] Prior to #4893, add vanilla Spark's original scan source code to keep git history by @zhztheplayer in #4931
  • [VL] Fix wrong plan equality due to case class inheritance by @zhztheplayer in #4893
  • [GLUTEN-3559][VL] enable more sql query tests for Spark34 by @zhouyuan in #4880
  • [VL] Daily Update Velox Version (2024_03_13) by @GlutenPerfBot in #4944
  • [VL]Bucket join support for Iceberg tables by @SinghAsDev in #4859
  • [GLUTEN-4827][UT] Add Golden Files for TPC-H Spark34 + Gluten Execution Plan by @zwangsheng in #4828
  • [VL] Verify unhex has been offloaded to native successfully by @Yohahaha in #4937
  • [VL] Support skewness aggregate function by @liujiayi771 in #4939
  • [GLUTEN-1632][CH]Daily Update Clickhouse Version (20240314) by @lwz9103 in #4948
  • [VL] parquet file metadata columns support in velox by @gaoyangxiaozhu in #3870
  • [VL] Daily Update Velox Version (2024_03_14) by @GlutenPerfBot in #4949
  • [VL] Untangle code of TransformPreOverrides by @zhztheplayer in ht...

v1.2.0-rc0

06 Jul 12:35
Pre-release

What's Changed

  • [CORE] Move all columnar rules to post-columnar transitions by @zhztheplayer in #4790
  • [GLUTEN-4398][FOLLOW] Mask PullOutPostProject and PullOutPreProject id by @zwangsheng in #4815
  • [GLUTEN-2956][VL] Support Spark NullType by @PHILO-HE in #2996
  • [CORE] Add logical link to rewritten spark plan by @ulysses-you in #4817
  • [GLUTEN-4803][UT] Add Golden Files for TPC-H Spark33 + Gluten Execution Plan by @zwangsheng in #4804
  • [VL] Allow replacing installed minio package by @PHILO-HE in #4825
  • [VL] Daily Update Velox Version (2024_03_01) by @GlutenPerfBot in #4821
  • [VL] Enable more tests of GlutenParquetIOSuite for Spark32/33/34 by @Yohahaha in #4823
  • [GLUTEN-1632][CH]Daily Update Clickhouse Version (20240302) by @lwz9103 in #4837
  • [GLUTEN-4039][VL] support map_keys and map_values by @konjac in #4826
  • [GLUTEN-4424][CORE] Upgrade spark version to 3.5.1 in Gluten by @JkSelf in #4822
  • [VL] Daily Update Velox Version (2024_03_04) by @GlutenPerfBot in #4841
  • [GLUTEN-4813] Replace resize/reserve to resize_extact/reserve_exact to reduce memory by @taiyang-li in #4824
  • [GLUTEN-1632][CH]Daily Update Clickhouse Version (20240305) by @lwz9103 in #4849
  • [VL] Fix boost installation issue and remove useless QueryCtx by @PHILO-HE in #4850
  • [VL] Enable "parquet v2 pages - delta encoding" test for Spark33/Spark34 by @Yohahaha in #4816
  • [CORE] Support FileSourceScanExec driver metrics for spark3.4/3.5 by @zhli1142015 in #4848
  • [GLUTEN-4772][VL] Support empty map/array literal by @WangGuangxin in #4771
  • [GLUTEN-4860][CELEBORN] Replace celeborn link by @kerwin-zk in #4861
  • [VL][CI] Fix CI failure related to Celeborn by @PHILO-HE in #4862
  • [CORE] Support In list option contains non-foldable expression by @ulysses-you in #4843
  • [VL] Daily Update Velox Version (2024_03_05) by @GlutenPerfBot in #4852
  • [VL] Enable more tests in GlutenParquetQuerySuite for Spark32/33/34 by @Yohahaha in #4854
  • [CORE] ColumnarShuffleExchangeExec should respect advisoryPartitionSize for Spark3.5 by @ulysses-you in #4865
  • [GLUTEN-4853][CORE] Only trim Alias when its child is semantically equal to resAttr by @liujiayi771 in #4857
  • [VL] minor change for delta ut by @zhli1142015 in #4869
  • [VL] Add libsodium.so to thirdparty lib for CentOS8 by @kerwin-zk in #4870
  • [VL] Updated documentation, refactoring and added more testcases for BNLJ by @Surbhi-Vijay in #4782
  • [VL] Daily Update Velox Version (2024_03_06) by @GlutenPerfBot in #4868
  • [MINOR] Remove ExtendedAnalysisException by @PHILO-HE in #4864
  • [GLUTEN-4831][VL] Support StructType in HashAggregate by @WangGuangxin in #4832
  • [VL] Support inline function by @marin-ma in #4847
  • [VL] Add flushable decimal sum test case by @liujiayi771 in #4871
  • [CORE] Add synchronized for ExplainUtils processPlan by @ulysses-you in #4876
  • [VL] Rewrite collect_set and collect_list aggregate function by @ulysses-you in #4805
  • [VL] Fix and use flattenVector by @marin-ma in #4783
  • [VL] Enable tests of ParquetPartitionDisconverySuite for Spark33/34 by @Yohahaha in #4881
  • [CORE] Minor adjustment to columnar rule list, and move all columnar sub-rules to one source folder by @zhztheplayer in #4863
  • [VL] Merge Partial and PartialMerge logic in generateMergeCompanionNode by @liujiayi771 in #4883
  • [CORE] Fix Spark-3.5 CI by @ulysses-you in #4886
  • [GLUTEN-4424][CORE] Follow up upgrading spark version to 3.5.1 by @JkSelf in #4845
  • Add .asf.yml by @yaooqinn in #4892
  • Update Vulnerability Handling Process by @yaooqinn in #4894
  • [VL] Daily Update Velox Version (2024_03_07) by @GlutenPerfBot in #4877
  • [GLUTEN-1632][CH]Daily Update Clickhouse Version (20240308) by @lwz9103 in #4890
  • [CORE] ColumnarBroadcastExchangeExec should set/cancel with job tag for Spark3.5 by @ulysses-you in #4882
  • [VL] Daily Update Velox Version (2024_03_08) by @GlutenPerfBot in #4895
  • [VL] Pass partition id to velox functions by @zhli1142015 in #4344
  • Add Incubation Standard Disclaimer by @yaooqinn in #4911
  • [GLUTEN-4835][CORE] Match metric names with Spark by @clee704 in #4834
  • [Gluten-4732][CH] delta-mergetree support update/delete/upsert/insert in a more native delta way by @binmahone in #4733
  • [GLUTEN-4898][CH]Bug fix to date diff by @KevinyhZou in #4900
  • [VL] Daily Update Velox Version(2024_03_11) by @GlutenPerfBot in #4908
  • [DOC] Update release & configuration doc by @PHILO-HE in #4910
  • [VL] Support lead window function by @ulysses-you in #4902
  • [VL] Fix protobuf configure arguments in get_velox.sh by @liujiayi771 in #4920
  • [Gluten-4918][CH]support CTAS for clickhouse table by @binmahone in #4919
  • [GLUTEN-4926][CELEBORN] CelebornShuffleManager should remove shuffleId from columnarShuffleIds after unregistering shuffle by @SteNicholas in #4927
  • [Gluten-4912][CH]Support Specifying columns in clickhouse tables to b… by @binmahone in #4925
  • [Gluten-4706] [CH][CORE] Add a mode to execute count distinct directly instead o… by @binmahone in #4708
  • [VL] Daily Update Velox Version (2024_03_12) by @GlutenPerfBot in #4923
  • [GLUTEN-4914][CH] Fix exceptions in ASTParser by @taiyang-li in #4916
  • [DOC] Minor fix for wrong gluten folder used in doc by @leoluan2009 in #4938
  • [VL] Refine log plan/split json into one line by @Yohahaha in #4934
  • [VL] Support posexplode function and code refactoring on GenerateExecTransformer by @marin-ma in #4901
  • [CORE] Prior to #4893, add vanilla Spark's original scan source code to keep git history by @zhztheplayer in #4931
  • [VL] Fix wrong plan equality due to case class inheritance by @zhztheplayer in #4893
  • [GLUTEN-3559][VL] enable more sql query tests for Spark34 by @zhouyuan in #4880
  • [VL] Daily Update Velox Version (2024_03_13) by @GlutenPerfBot in #4944
  • [VL]Bucket join support for Iceberg tables by @SinghAsDev in #4859
  • [GLUTEN-4827][UT] Add Golden Files for TPC-H Spark34 + Gluten Execution Plan by @zwangsheng in #4828
  • [VL] Verify unhex has been offloaded to native successfully by @Yohahaha in #4937
  • [VL] Support skewness aggregate function by @liujiayi771 in #4939
  • [GLUTEN-1632][CH]Daily Update Clickhouse Version (20240314) by @lwz9103 in #4948
  • [VL] parquet file metadata columns support in velox by @gaoyangxiaozhu in #3870
  • [VL] Daily Update Velox Version (2024_03_14) by @GlutenPerfBot in #4949
  • [VL] Untangle code of TransformPreOverrides by @zhztheplayer in ht...

v1.1.1

02 Mar 05:29
7999b61

Release Notes - Gluten - Version 1.1.1

We are pleased to announce that Gluten has been accepted as an Apache Incubating project. Additionally, we are excited to unveil the release of Gluten-1.1.1. This version marks the final release before our transition to Apache.

Highlights (Velox backend only)

  • Support Spark 3.2, 3.3, and 3.4 (API only)
  • Support 30 common Spark operators
  • Support 220 common Spark functions
  • Velox codebase updated to 2024/02/29
  • Refactor the data lake API to support Delta Lake scan and Iceberg COW table reads
  • Better S3 and GCS support
  • More stable spill support
  • Enhanced metrics for spill, shuffle, and other areas
  • Enhanced fallback support: coverage expanded for previously missing cases, with messages updated accordingly
  • Enhanced shuffle, including merge before compression, push-based shuffle, and more
  • More bug fixes

What's Changed

  • [GLUTEN-3855][VL] Fix ORC related failed UT by @chenxu14 in #3805
  • [VL] Support IsNull filter pushdown by @rui-mo in #3791
  • [VL] Update velox-backend-limitations.md by @FelixYBW in #3639
  • [GLUTEN-2169][VL] Enable GlutenEnsureRequirementsSuite in unit tests by @JkSelf in #3860
  • [CH] Fix exception of pb MessageToJsonString by @exmy in #3823
  • [GLUTTEN-3851][VL] Add remaining filter time metric by @zhli1142015 in #3852
  • [VL] Support ignoreNulls for NthValue window function by @PHILO-HE in #3857
  • [VL] Enable using static link for QAT by @marin-ma in #3863
  • [VL] Fix assertion failures when mixing use of partial aggregation spilling and flushing by @zhztheplayer in #3872
  • [GLUTEN-3796][VL][FOLLOW_UP] Correct test name match and move black list to exclude in VeloxTestSettings by @zwangsheng in #3874
  • [GLUTEN-3528][VL] Construct unique & non-overlapping partition/sort keys for window operator by @PHILO-HE in #3883
  • [GLUTEN-3879][CH] salt 1% of TPCH-1 data to NULL instead of 10% by @binmahone in #3880
  • [VL] Doc refresh by @zhouyuan in #3882
  • [GLUTEN-3865][CH] Refactor aggregating without keys by @lgbo-ustc in #3866
  • [GLUTEN-3722][CH] Improve shuffle writer by @taiyang-li in #3728
  • [VL] Map date_format to a Velox function name by @PHILO-HE in #3878
  • [VL]Daily Update Velox Version (20231129) by @yma11 in #3877
  • [CORE] Add InputIteratorTransformer to decouple ReadRel and iterator index by @ulysses-you in #3854
  • [GLUTEN-3732][VL] Use arrow result-returning variants FileWriter::Open API by @yangzhg in #3733
  • [CORE] Move validate methods from TransformerApi to ValidatorApi by @exmy in #3881
  • [GLUTEN-3824][CH]Bug fix hdfs path contains space by @KevinyhZou in #3825
  • [GLUTEN-1632][CH]Daily Update Clickhouse Version (20231201) by @lwz9103 in #3898
  • [VL] Break up spilling operation to two phases: shrink phase and spill phase by @zhztheplayer in #3895
  • [GLUTEN-1699][VL] Support loadLibFromJar on RedHat 7/8 by @ychris78 in #3893
  • [GLUTEN-3906] [VL] fix: fix package.sh failed for x86 by @lzjqsdd in #3907
  • [GLUTEN-3750][CH]Bug fix json parse error by @KevinyhZou in #3751
  • [GLUTEN-3902][VL] Add documentation to configure the Velox+GCS connector by @tigrux in #3902
  • [DOC] Revise Gluten document by @PHILO-HE in #3892
  • [VL]Daily Update Velox Version (20231203) by @yma11 in #3913
  • [VL] Minor improvements for CI stale bot by @zhztheplayer in #3888
  • [VL] Avoid reapplying code patches for external projects when ENABLE_EP_CACHE=ON by @zhztheplayer in #3916
  • [VL] minor change for fallback log by @zhli1142015 in #3919
  • [VL] Add sort merge join metrics by @ulysses-you in #3920
  • [GLUTEN-3378][CORE] Datasource V2 data lake read support by @liujiayi771 in #3843
  • [VL] ENABLE_EP_CACHE=ON still uses cached Velox build although the build arguments were changed by @zhztheplayer in #3926
  • [VL] Make bloom_filter_agg fall back when might_contain is not transformable by @zhli1142015 in #3917
  • [VL][CI] update docker build script by @zhouyuan in #3904
  • [GLUTEN-3917][FOLLOWUP] Add back SparkShimLoader import by @ulysses-you in #3940
  • [VL] Fix VeloxTPCHV1BhjSuite and VeloxTPCHV2Suite useV1SourceList by @liujiayi771 in #3930
  • [VL] Fix syntax error in stale.yml by @zhztheplayer in #3945
  • [GLUTEN-3854][CORE][FOLLOWUP] Add ColumnarInputAdapter back to recover UI graph by @ulysses-you in #3933
  • [GLUTEN-1632][CH]Daily Update Clickhouse Version (20231206) by @lwz9103 in #3938
  • [VL] Add output row metric for InputIteratorTransformer by @Yohahaha in #3939
  • [GLUTEN-3927][CH] Improve the performance of element_at by @taiyang-li in #3928
  • [GLUTEN-3908][CH] Improve shuffle split for clickhouse backend by remove ColumnNullable's memcmp by @KevinyhZou in #3909
  • [GLUTEN-3924][CORE] Match hive UDF name in case-insensitive mode during expression transformation by @taiyang-li in #3925
  • [GLUTEN-3958] Use getDeclaredConstructor().newInstance() in ScanTransformerFactory by @liujiayi771 in #3961
  • [GLUTEN-3944][CH]Fix gluten.jar with delta20 when use spark 3.3 by @lwz9103 in #3947
  • [VL] gluten-te: In dockerfiles, use symbolic link for /opt/velox by @zhztheplayer in #3946
  • [VL]Daily Update Velox Version (20231206) by @yma11 in #3954
  • Revert "[GLUTEN-3908][CH] Improve shuffle split for clickhouse backend by remove ColumnNullable's memcmp " by @baibaichen in #3965
  • [GLUTEN-3890][CH] Respect spill_threshold for all buffers in shuffle writer by @taiyang-li in #3891
  • [CORE] Fix wrong fallback cost by @ulysses-you in #3967
  • [GLUTEN-3922][CH] Fix incorrect shuffle hash id value when executing modulo by @zzcclp in #3923
  • [VL] quick fix for static build git conflict by @zhouyuan in #3971
  • [GLUTEN-3486][CH] Fix AQE cannot coalesce shuffle partitions by @exmy in #3941
  • [GLUTEN-3949][CH] Merge small blocks from upstream phase into a large one by @lgbo-ustc in #3952
  • [GLUTEN-3948][CH] Fix exception and diff of trunc function by @exmy in #3968
  • [GLUTEN-3979][CORE] Use exists() instead of map().exists() to improve code readability by @dcoliversun in #3980
  • [VL]Daily Update Velox Version (20231208) by @yma11 in #3973
  • Revert "[VL] Make bloom_filter_agg fall back when might_contain is not transformable (#3917)" by @loneylee in #3977
  • [GLUTEN-3580][VL] support read data from abfs with account key by @gaoyangxiaozhu in #3897
  • [GLUTEN-3991][CH] Fix the incorrect display name for the mergetree file format by @zzcclp in #3992
  • [VL] gluten-te: Enable BuildKit to support --cache-from by @zhztheplayer in #3964
  • [GLUTEN-3841][CH] Support spill in 2nd aggregate stage by @lgbo-ustc in #3772
  • [VL] Daily Update Velox Version (20231211) by @zhztheplayer in #3999
  • [VL] Fix StringToMap test failure by @PHILO-HE in #3995
  • [VL] Make bloom_filter_agg fall back when might_contain is not transformable by @zhli1142015 in #3994
  • [VL] Following #3996, fix CI error "Runtime factory already registered" by @zhztheplayer in #4001
  • [VL] Fix linking simdjson error when building benchmark by @PHILO-HE in #3960
  • [GLUTEN-4002][CH] Update InputIteratorTransformer metrics by @zzcclp in https://github.com/...

Gluten v1.1.0

30 Nov 10:12

Release Notes - Gluten - Version 1.1.0

We are excited to announce the release of Gluten-1.1.0.
This version is the culmination of work from 45 contributors, who delivered features and bug fixes in more than 800 commits since 1.0.0.

Highlights (Velox backend only)

  • 20% performance improvement in Decision Support Benchmarks compared to v1.0.0
  • Support Spark 3.2 and Spark 3.3
  • Support Spark 3.4 (experimental)
  • Pass all Velox UTs and Spark 3.2/3.3 SQL-related UTs
  • Support Ubuntu 20.04/22.04, CentOS 7/8, alinux 3, Anolis 7/8
  • Support File System: localfs, HDFS, S3, OSS(via s3a), GCS
  • Support File Format: Parquet, ORC
  • Support Data Lake: deltalake (experimental)
  • Support Data Types: Primitive Type, Decimal, Date, Timestamp, Array (partial), Map (partial), Struct (partial)
  • Support 28 common Spark Operators, detail here
  • Support 199 common Spark Functions, detail here
  • Support Dynamic Memory Pool and Spill
  • Support Velox UDF
  • Support Gluten UI to print fallback events in History Server
  • Support Hadoop HA and Kerberos
  • Velox code updated to 20231123 (commit-id: aff0cde)
  • Documentation improvements for supported features and configuration

Known Issues

  • Only static partition write is supported in Spark 3.2 and 3.3

New Features

#3722 [CH] improve mutex usage in shuffle writer
#2063 [CH] Spark sql config load dynamic by task
#3257 [VL] We may need more metrics collected by Velox
#3528 [VL] Construct unique partition/sort keys and removing overlapping sort key for window plan
#3381 [CH]Reuse last WholeStageTransformer instead of creating new one in FileFormatWriter
#2118 [CH] Support hive udtf
#2128 [CH]Support tablesample clause
#2163 [CH] support approx_percentile aggregate function
#2193 [CH] Support some array functions
#2207 [CH] Support function to_utc_timestamp/from_utc_timestamp
#2136 [CH] HiveTransform add metrics readBytes
#2439 [VL] array_aggregate support with lambda function
#2451 [CH] Support StaticInvoke function
#2460 Avoid force check Java thread in native side
#2465 Remove operator level fallback policy
#2472 [CH] Remove BasicScanExecTransformer#getInputFilePaths when CH support more general partition location parsing
#3187 [CH] Implement runtime native bloom filter
#2267 [CH] Support urldecoder which is used in reflect("java.net.URLDecoder", "decode", event.event_info['currenturl'], "UTF-8")
#2309 Implement Streaming Window in Velox backend to reduce the memory usage.
#2323 [CH] Build optimization
#2343 [VL] ShuffleWrite: Larger shuffle size than vanilla spark and long compression time
#2365 [CH] gluten should support setting max bytes for a partition for orc/parquet
#2390 [CH] Aligning the NULL and NaN compare semantics of Spark and CH
#2600 [CH] enhance S3 client caching
#2617 [VL][Spark 3.3+] support pushdown aggregate to native scan instead of fallback
#2619 [VL][Spark 3.3+] support match columns use fieldIds in native instead of fallback
#2667 [VL] Stacktrace-categorized memory allocation dumping for debugging
#2730 Request for documentation on how to write a backend for 3rd party engines
#2761 [DOC] A doc named index.md share same content with README.md
#2772 [VL] When performance degradation,What factors may affect the performance?
#2783 [VL]Run CI with DEBUG build mode to enhance stability
#2791 [VL] Support spark function: concat_ws
#2793 Code refactor: move some common code to a root module named common
#2807 Code cleanup: FunctionConfig may be useless
#2515 when we will support spark -gpu ,now we need spark -gpu feature to train big model
#2535 UnsupportedOperationException is abused
#2593 List parquet write semantic differences between Spark and Gluten
#2804 Handle timeZoneId for TimezoneAwareExpression
#2815 [VL] complex data type support in parquet scan
#2825 [VL] In Java, consolidate GlutenColumnarBatchSerializer and CelebornColumnarBatchSerializer
#2826 [VL] Use a dedicate class to maintain gluten native config
#2845 [VL] Separate each jni wrapper to different files
#2874 [VL] support spark.sql.decimalOperations.allowPrecisionLoss
#2877 [VL] Support read iceberg
#2905 [VL] Support percentile function
#2919 [VL] Support ORC format in HiveTableScanExecTransformer
#2956 [VL] Support NullType in Project
#2975 [VL] Track MemoryManager feature
#3015 [CH] ReusedExchange: Gluten does not touch it or does not support it
#3017 [VL] Allow users to set spill partitions/levels
#3033 [CH] Support aggregation spill for the second stage
#3049 [CORE] Statement level controls whether to use gluten
#3817 [CH] Optimize mergetree prewhere
#3704 [CH] support tuple subcolumn pruning for orc/parquet
#3784 DNM
#3144 [CH] Aggregation supports complicate type
#3715 [VL] Add support for GCS
#2106 [VL] CI: allow to benchmark TPCH performance on comment
#3702 [VL] Add sort based window support in velox backend
#2404 [VL] Enable Velox memory reclaimer for auto disk-spilling
#3082 [CORE] Support columnar CollectLimit
#3739 [VL] Add config to disable velox file handle cache
#3055 [VL] Use mixed memory (off-heap and on-heap) for native
#3077 [VL] EP: Centralized lifecycle management for C++ / JNI contextual objects
#3142 [VL] Tight Java-C++ object binding
#3075 [VL] Support static partition write in VL backend
#2533 Degrade Arrow version to 8.0 in VL backend.
#2629 Use Project + Unnest to implement Expand operator
#3132 Add streamingwindow support in velox backend
#3361 Support Spark 3.4 in Gluten.
#3425 [VL] Create Hdfs folder in Gluten side when writing hdfs file
#3541 [VL] Add minimal GHA CI job for debug build
[#3705](https://...

Gluten v1.0.0

14 Jul 03:07
bfe394b

Release Notes - Gluten - Version 1.0.0

Highlights (Velox backend only)

  • Support Spark 3.2 and Spark 3.3
  • Pass all Velox UTs and Spark 3.2 UTs, and part of the Spark 3.3 UTs
  • Support Ubuntu 20.04/22.04, CentOS 7/8, alinux 3, Anolis 7/8
  • Support FileSystem: localfs, HDFS, S3, OSS (via s3a)
  • Support data types: Primitive type, Decimal, Date, Timestamp
  • Support 20 operators, detail here
  • Support 164 functions, detail here
  • Support native Parquet write
  • Support native ORC read
  • Support Intel® In-memory Analytics Accelerator (IAA/IAX) hardware accelerator in Shuffle compression
  • Support cap-based spill (static memory allocation) for join/agg/sort operator (experimental feature)
  • Support static build method via vcpkg
  • Support local cache (experimental feature)
  • 2.71x speedup in Decision Support Benchmark1 (TPC-H Like) testing
  • 2.29x speedup in Decision Support Benchmark2 (TPC-DS Like) testing
  • Velox code updated to commit
  • Documentation improvements for supported features and configuration

Known Issues

  • Parquet write only supports the compression.codec, parquet.block.size and parquet.block.rows configurations
  • Velox backend does not support dynamic partition write and bucket write
  • Spill may throw OutOfMemoryException

New Features

Improvements
