Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Row UDF scalar arguments #311

Merged
merged 5 commits into from
Nov 16, 2021

Conversation

brandon-b-miller
Copy link
Contributor

@brandon-b-miller brandon-b-miller commented Nov 11, 2021

Closes #279

@brandon-b-miller
Copy link
Contributor Author

This is ready for review - I thought some code might need to be written to disambiguate when one has a column with a name that could also be a constant, like having a column with name 42 and also passing 42 as a constant parameter. But it turns out we can just write \"42\" in the query to get the column with that name instead.

Copy link
Collaborator

@ayushdg ayushdg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the changes @brandon-b-miller !
Couple of minor comments in addition to the changes suggested:
Would it be possible to also update the docstring to include an example with scalar values?

Also I was able to catch the issues below while testing this with udfs that accept multiple column args, since in the single arg case the output would be correct. Might be worth adding that case to the tests as well.

scalar_args.append(operand)
df = column_args[0].to_frame()
for col in column_args[1:]:
df[col.name] = operand
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
df[col.name] = operand
df[col.name] = col

Comment on lines 215 to 217
df = column_args[0].to_frame()
for col in column_args[1:]:
df[col.name] = operand
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing this section was meant to be outside the loop.

@brandon-b-miller
Copy link
Contributor Author

Updated! :)

Copy link
Collaborator

@charlesbluca charlesbluca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the work here @brandon-b-miller 🙂 just small nitpicks, the code itself LGTM

assert_frame_equal(return_df.reset_index(drop=True), expectation)


# Test row UDFs with one args
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Test row UDFs with one args
# Test row UDFs with two args

@@ -63,6 +65,64 @@ def f(row):
assert_frame_equal(return_df.reset_index(drop=True), expectation)


# Test row UDFs with one args
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Test row UDFs with one args
# Test row UDFs with one arg

Copy link
Collaborator

@ayushdg ayushdg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small suggestion otherwise lgtm!

column_args.append(operand)
else:
scalar_args.append(operand)
df = column_args[0].to_frame()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
df = column_args[0].to_frame()
df = column_args[0].to_frame()

@codecov-commenter
Copy link

codecov-commenter commented Nov 16, 2021

Codecov Report

Merging #311 (0c5f740) into main (2194a75) will decrease coverage by 0.05%.
The diff coverage is 80.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #311      +/-   ##
==========================================
- Coverage   95.75%   95.69%   -0.06%     
==========================================
  Files          65       65              
  Lines        2802     2810       +8     
  Branches      418      421       +3     
==========================================
+ Hits         2683     2689       +6     
- Misses         77       78       +1     
- Partials       42       43       +1     
Impacted Files Coverage Δ
dask_sql/datacontainer.py 94.68% <80.00%> (+0.36%) ⬆️
dask_sql/server/responses.py 97.87% <0.00%> (-2.13%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2194a75...0c5f740. Read the comment docs.

Copy link
Collaborator

@charlesbluca charlesbluca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM; thanks for the work @brandon-b-miller 😄

@charlesbluca charlesbluca merged commit fda0f4f into dask-contrib:main Nov 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ENH] Support passing scalar args to UDFs
4 participants