Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEAT] Public Delta Lake writer #2329

Merged
merged 12 commits into from
Jun 4, 2024
Merged

[FEAT] Public Delta Lake writer #2329

merged 12 commits into from
Jun 4, 2024

Conversation

kevinzwang
Copy link
Member

@kevinzwang kevinzwang commented May 31, 2024

Follow-up on #2304 with additional cleanup, parameters, and testing of the Delta Lake writing functionality, now set to be a public API with this PR!

This PR also renames read_delta_lake to read_deltalake, providing a deprecation warning for the old name

Todo before merging:

@github-actions github-actions bot added documentation Improvements or additions to documentation enhancement New feature or request labels May 31, 2024
@kevinzwang
Copy link
Member Author

Thank you to @siddharth-gulia for the original work on this!

Copy link

codecov bot commented May 31, 2024

Codecov Report

Attention: Patch coverage is 83.75000% with 26 lines in your changes missing coverage. Please review.

Please upload report for BASE (main@55b0bc4). Learn more about missing BASE report.
Report is 1 commits behind head on main.

Current head be254ca differs from pull request most recent head 1b04a25

Please upload reports for the commit 1b04a25 to get more accurate results.

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #2329   +/-   ##
=======================================
  Coverage        ?   79.06%           
=======================================
  Files           ?      473           
  Lines           ?    55143           
  Branches        ?        0           
=======================================
  Hits            ?    43597           
  Misses          ?    11546           
  Partials        ?        0           
Files Coverage Δ
daft/__init__.py 22.22% <ø> (ø)
daft/api_annotations.py 89.23% <100.00%> (ø)
daft/context.py 76.22% <ø> (ø)
daft/delta_lake/delta_lake_scan.py 95.34% <100.00%> (ø)
daft/execution/execution_step.py 93.97% <100.00%> (ø)
daft/execution/physical_plan.py 95.04% <ø> (ø)
daft/execution/rust_physical_plan_shim.py 96.34% <ø> (ø)
daft/io/__init__.py 95.83% <100.00%> (ø)
daft/io/_lance.py 92.85% <100.00%> (ø)
daft/logical/builder.py 93.33% <100.00%> (ø)
... and 10 more

@kevinzwang
Copy link
Member Author

Oh one question I had is if it should be kept as write_delta? The read function is called read_delta_lake and they should probably match. I would guess that this was probably named write_delta to match Polars

@kevinzwang kevinzwang mentioned this pull request Jun 3, 2024
4 tasks
daft/dataframe/dataframe.py Outdated Show resolved Hide resolved
daft/dataframe/dataframe.py Outdated Show resolved Hide resolved
daft/execution/execution_step.py Show resolved Hide resolved
@kevinzwang kevinzwang requested a review from samster25 June 3, 2024 23:03
Copy link
Member

@samster25 samster25 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline, but main thing is to keep the io related changes in daft/io and rely on dynamic imports when io is invoked

@kevinzwang kevinzwang enabled auto-merge (squash) June 4, 2024 00:05
@kevinzwang kevinzwang merged commit 00eb8e1 into main Jun 4, 2024
42 checks passed
@kevinzwang kevinzwang deleted the kevin/write-delta branch June 4, 2024 00:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants