-
Notifications
You must be signed in to change notification settings - Fork 428
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[VL][Core] Add InputFileReplaceFallback Rule #6139
[VL][Core] Add InputFileReplaceFallback Rule #6139
Conversation
Thanks for opening a pull request! Could you open an issue for this pull request on Github Issues? https://github.com/apache/incubator-gluten/issues Then could you also rename commit message and pull request title in the following format?
See also: |
Run Gluten Clickhouse CI |
...ends-velox/src/main/scala/org/apache/gluten/extension/InputFileNameReplaceFallbackRule.scala
Show resolved
Hide resolved
I think we'd follow a criteria when rewriting an expression / operator before entering actual offloading phase (e.g., using genExtendedColumnarValidationRules to do the replacement): to make it compatible with both native lib and vanilla Spark as long as possible and to avoid converting it back when the computation doesn't offload. Can you take a look at some current rules that already follow the criteria, e.g., the ones to offload collet_list, collect_set, hll, bloom-filter, dpp subquery, to see if we can do the same on the expressions used by |
A basic question for #6021 : Why didn't we do the replacement during scan offloading? If it's about the co-offloading for project and scan, what I can imagine is that we can add a individual offloading rule. E.g., |
Run Gluten Clickhouse CI |
@zhztheplayer , thank you! actually i have checked this part but didn't figure out a similar way for And you can check this discussion for why we pick current solution for |
For this question - And for this part - So if you have a good idea to share, I'd like do a following PR after this PR to refactor this part to merge 2 rules together. :) |
Run Gluten Clickhouse CI |
Which works for me but let's turn off the option by default and mark it as experimental? Until we got a more robust design. Thanks. |
BTW I'll take a closer look at the code asap. |
20c45d9
to
9e34524
Compare
Run Gluten Clickhouse CI |
Run Gluten Clickhouse CI |
thanks, done to mark it experimental and also turn off by default @FelixYBW FYI |
thanks, since without this change, user may encounter the failure issue if scan fallback with complex data type and project has |
Run Gluten Clickhouse CI |
@gaoyangxiaozhu, Spark 3.4 tests passed after re-triggered. Let's wait for CH CI. |
@gaoyangxiaozhu I opened #6157 for further work to improve this. I am assign that issue to you. And I suggest to have a clear plan for the new implementation before merging this. |
got! So I send seperate PR #6161 to turn off @jinchengchenghh / @zhztheplayer / @FelixYBW please help review. |
Run Gluten Clickhouse CI |
1 similar comment
Run Gluten Clickhouse CI |
What changes were proposed in this pull request?
The is a following up PR for
input_file_name
native support, itinputfileNameReplaceFallbackRule
to handle the case if scan fallback due to complex data type(Fixes: facebookincubator/velox#9957)
How was this patch tested?
(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)