[GLUTEN-6067][CH] [Part 2] Support CH backend with Spark3.5 - Prepare for supporting sink transform #6197

Merged
merged 6 commits into from
Jun 24, 2024

baibaichen
Contributor

What changes were proposed in this pull request?

This PR refactors the code for the following purposes:

  1. Call SparkShim::genExtendedColumnarPostRules, so that writes can fall back to vanilla Spark on Spark 3.5.
  2. Add NativeWriteCheck, and make the UTs fail on Spark 3.5.
  3. Refactor LocalExecutor, moving pipeline building to SerializedPlanParser, because we cannot add a sink transform in this class.
  4. Reduce duplicate code, i.e. in CHIteratorApi and SubstraitPlanPrinterUtil.
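
Item 3 can be pictured with a small, language-neutral sketch. This is not the Gluten code: `PlanParser`, `Executor`, `Pipeline`, and `OPS` are hypothetical stand-ins, written in Python only to illustrate the idea that once the parser owns pipeline construction, a sink can be attached there and the executor only has to run whatever it is given.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional

Transform = Callable[[Iterable[int]], Iterable[int]]

# Hypothetical operator table; a real plan would describe far richer operators.
OPS = {
    "drop_odd": lambda rows: (r for r in rows if r % 2 == 0),
    "double": lambda rows: (r * 2 for r in rows),
}


@dataclass
class Pipeline:
    transforms: List[Transform] = field(default_factory=list)
    sink: Optional[Callable[[int], None]] = None


class PlanParser:
    """Hypothetical parser: the only place where a pipeline is assembled."""

    def build_pipeline(self, plan: List[str],
                       sink: Optional[Callable[[int], None]] = None) -> Pipeline:
        # Because construction lives here, attaching a sink is a local change.
        return Pipeline([OPS[name] for name in plan], sink)


class Executor:
    """Hypothetical executor: runs a prebuilt pipeline, never builds one."""

    def execute(self, pipeline: Pipeline, rows: Iterable[int]) -> List[int]:
        for transform in pipeline.transforms:
            rows = transform(rows)
        if pipeline.sink is None:
            return list(rows)      # no sink: hand the rows back to the caller
        for row in rows:
            pipeline.sink(row)     # sink present: rows are written, not returned
        return []
```

With this split, adding a write path means passing a `sink` to `build_pipeline`; the executor itself does not change.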

(Fixes: #6067)

How was this patch tested?

Existing UTs

#6067

Run Gluten Clickhouse CI

@baibaichen baibaichen changed the title [GLUTEN-6067][CH] [Part 1] Support Native Writer for Spark 3.5 [GLUTEN-6067][CH] [Part 2] Support CH backend with Spark3.5 - Prepare for supporting sink transform Jun 24, 2024
Contributor

@zzcclp zzcclp left a comment

LGTM

@zzcclp zzcclp merged commit 1fbdbc4 into apache:main Jun 24, 2024
43 checks passed
@GlutenPerfBot
Contributor

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

| query | log/native_6197_time.csv | log/native_master_06_24_2024_f07e348f4_time.csv | difference (master - PR) | percentage (master / PR) |
|---|---|---|---|---|
| q1 | 34.29 | 35.39 | 1.108 | 103.23% |
| q2 | 26.88 | 23.65 | -3.237 | 87.96% |
| q3 | 38.57 | 40.35 | 1.776 | 104.60% |
| q4 | 35.48 | 32.68 | -2.803 | 92.10% |
| q5 | 71.43 | 70.64 | -0.786 | 98.90% |
| q6 | 6.31 | 9.08 | 2.765 | 143.83% |
| q7 | 85.33 | 80.58 | -4.748 | 94.44% |
| q8 | 89.87 | 87.90 | -1.967 | 97.81% |
| q9 | 119.46 | 125.50 | 6.045 | 105.06% |
| q10 | 45.54 | 48.85 | 3.312 | 107.27% |
| q11 | 23.56 | 20.45 | -3.109 | 86.80% |
| q12 | 23.44 | 26.51 | 3.068 | 113.09% |
| q13 | 38.96 | 38.63 | -0.335 | 99.14% |
| q14 | 19.73 | 22.31 | 2.579 | 113.07% |
| q15 | 31.23 | 31.83 | 0.608 | 101.95% |
| q16 | 14.67 | 14.13 | -0.546 | 96.28% |
| q17 | 106.57 | 103.74 | -2.832 | 97.34% |
| q18 | 147.02 | 144.47 | -2.550 | 98.27% |
| q19 | 13.81 | 13.92 | 0.115 | 100.83% |
| q20 | 27.30 | 29.16 | 1.864 | 106.83% |
| q21 | 264.07 | 264.29 | 0.221 | 100.08% |
| q22 | 16.54 | 12.24 | -4.306 | 73.97% |
| total | 1280.06 | 1276.30 | -3.759 | 99.71% |
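
The last two columns of the report appear to be derived from the first two: the difference is the master time minus this PR's time, and the percentage is the master time divided by this PR's time. The sketch below reproduces that arithmetic; the function name `compare` is made up for illustration and is not part of the perf bot.

```python
from typing import Tuple


def compare(pr_time: float, master_time: float) -> Tuple[float, float]:
    """Return (difference, percentage) the way the report seems to compute them."""
    difference = master_time - pr_time            # positive: the PR run was faster
    percentage = master_time / pr_time * 100.0    # above 100: the PR run was faster
    return difference, percentage


# q1 row, using the rounded timings shown in the report
diff, pct = compare(34.29, 35.39)
print(f"{diff:.3f} {pct:.2f}%")
```

Feeding in the rounded timings gives values close to, but not exactly matching, the reported ones (for q1 the report shows 1.108 and 103.23%), presumably because the bot computes them from unrounded measurements.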

@baibaichen baibaichen deleted the feature/spark35parquet branch June 25, 2024 01:20
deepashreeraghu pushed a commit to deepashreeraghu/incubator-gluten that referenced this pull request Jun 26, 2024
… for supporting sink transform (apache#6197)

[CH] [Part 2] Support CH backend with Spark3.5 - Prepare for supporting sink transform

* [Refactor] remove duplicate codes

* Add NativeWriteChecker

* [Prepare to commit] getExtendedColumnarPostRules from Spark shim
Development

Successfully merging this pull request may close these issues.

[CH] Support CH backend with Spark3.5
3 participants