-
Notifications
You must be signed in to change notification settings - Fork 71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for left semi and left anti joins #1182
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's a (hack-y) way of handling left anti joins that at least works with a very basic example. It only involves modifying the Python side and just slicing out the irrelevant righthand side columns. Not sure if this is the best way to do things or if it needs building upon to work for more complicated queries, but let me know what you think. Happy to help explore the Rust side more if this solution is insufficient.
@sarahyurick thanks for the suggestions! |
What's the current best practice for checking if a dask-sql dataframe is cpu or gpu backed without importing cudf? |
I think Line 49 in f8bf06c
|
Codecov Report
❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more. @@ Coverage Diff @@
## main #1182 +/- ##
==========================================
+ Coverage 81.72% 81.89% +0.17%
==========================================
Files 78 78
Lines 4519 4535 +16
Branches 831 837 +6
==========================================
+ Hits 3693 3714 +21
+ Misses 643 634 -9
- Partials 183 187 +4
... and 1 file with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ChrisJar! Some minor comments around type annotations / follow up work
if select_names: | ||
cc = dc.column_container | ||
|
||
select_names = select_names[: len(cc.columns)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know why getFieldList
and getFieldNames
are returning an extra field that we need to filter out here (and in fix_column_to_row_type
)? I want to believe there's something we could do on the Rust end to avoid needing these workarounds in the Python code, but okay with filing that as a follow up issue to address.
cc @jdye64 if you have any idea what could be going on here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it seems like it's treating left semi and left anti joins like left joins where the column merged on from both left and right tables is included in the result whereas in leftsemi and leftanti joins only the left hand column should be returned
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you mind filing an issue around this and linking it here so we could follow up on this later if possible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Co-authored-by: Charles Blackmon-Luca <[email protected]>
To-do:
leftanti