Add copy_if_not_exists support for AmazonS3 via DynamoDB Lock Support #4880

Closed
tustvold opened this issue Sep 29, 2023 · 2 comments · Fixed by #4918
Labels: enhancement, object-store

Comments

tustvold (Contributor) commented Sep 29, 2023

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

#4239 added copy_if_not_exists support for S3-compatible stores that support it, i.e. everything other than S3 😅.

Having native support for this, and potentially other conditional operations in the future (#4879), would reduce user friction and I think unlock a number of exciting use-cases by making support consistent across the backends.

Describe the solution you'd like

I would like to be able to configure a DynamoDB table that is used in a manner compatible with the AWS DynamoDB Lock Client.
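To make the idea concrete, here is a minimal in-memory sketch of how a conditional insert into a lock table could arbitrate a copy on a store that has no conditional operations of its own. Everything in it (`LockTable`, `Bucket`, `CopyError`, `copy_if_not_exists`) is a hypothetical stand-in written for illustration; it is not the object_store API or the AWS lock client protocol, and it ignores lock expiry and crash recovery entirely.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a DynamoDB table. `try_insert` succeeds only if
/// the key is absent, similar in spirit to a put guarded by an
/// `attribute_not_exists` condition.
#[derive(Default)]
struct LockTable {
    rows: HashMap<String, String>,
}

impl LockTable {
    fn try_insert(&mut self, key: &str, owner: &str) -> bool {
        if self.rows.contains_key(key) {
            return false;
        }
        self.rows.insert(key.to_string(), owner.to_string());
        true
    }

    fn remove(&mut self, key: &str) {
        self.rows.remove(key);
    }
}

/// Hypothetical stand-in for a bucket without conditional operations.
#[derive(Default)]
struct Bucket {
    objects: HashMap<String, Vec<u8>>,
}

#[derive(Debug, PartialEq)]
enum CopyError {
    NotFound,
    AlreadyExists,
    LockHeld,
}

/// Copies `from` to `to` only if `to` does not exist, using the lock table
/// to serialize writers that target the same destination.
fn copy_if_not_exists(
    bucket: &mut Bucket,
    locks: &mut LockTable,
    from: &str,
    to: &str,
    owner: &str,
) -> Result<(), CopyError> {
    // 1. Take the lock on the destination key via a conditional insert.
    if !locks.try_insert(to, owner) {
        return Err(CopyError::LockHeld);
    }

    // 2. With the lock held, check the destination and perform the copy.
    let result = if bucket.objects.contains_key(to) {
        Err(CopyError::AlreadyExists)
    } else {
        let data = bucket.objects.get(from).cloned();
        match data {
            Some(bytes) => {
                bucket.objects.insert(to.to_string(), bytes);
                Ok(())
            }
            None => Err(CopyError::NotFound),
        }
    };

    // 3. Release the lock whether or not the copy happened.
    locks.remove(to);
    result
}
```

In this sketch a second writer racing for the same destination either sees `LockHeld` while the first copy is in flight or `AlreadyExists` once it has finished. A real implementation would also have to decide what happens when the lock holder dies between steps 1 and 3.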

Describe alternatives you've considered

There is a Rust port of the AWS lock client, but it depends on rusoto, and doesn't appear to have very strong fencing guarantees.

Additional context

FYI @wjones127

tustvold added the enhancement and object-store labels Sep 29, 2023
tustvold self-assigned this Sep 29, 2023
wjones127 (Member) commented:

FWIW, the DynamoDB lock protocol hasn't become the preferred protocol for Delta Lake or Lance, mostly because of the dangers of lease-based locks. Spark Delta Lake went with a slightly different protocol, described in this doc. In Lance we did something similar. delta-rs still uses dynamodb-lock, but will eventually move to match the protocol used by Spark Delta Lake.

That being said, offering something decent out-of-the-box for S3 might be nice.
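The danger of lease-based locks mentioned above can be sketched with a logical clock. The types here (`Lease`, `LeaseTable`) are hypothetical and exist only to illustrate the hazard; they are not taken from the DynamoDB lock client, Delta Lake, or Lance.

```rust
/// Hypothetical lease record. Times are logical ticks, not wall-clock time.
#[derive(Debug, Clone, PartialEq)]
struct Lease {
    owner: String,
    generation: u64,
    expires_at: u64,
}

#[derive(Default)]
struct LeaseTable {
    current: Option<Lease>,
    next_generation: u64,
}

impl LeaseTable {
    /// Grants the lease if it is free or the previous lease has expired.
    fn acquire(&mut self, owner: &str, now: u64, ttl: u64) -> Option<Lease> {
        if let Some(held) = &self.current {
            if now < held.expires_at {
                return None;
            }
        }
        self.next_generation += 1;
        let lease = Lease {
            owner: owner.to_string(),
            generation: self.next_generation,
            expires_at: now + ttl,
        };
        self.current = Some(lease.clone());
        Some(lease)
    }

    /// A fencing check: only the most recently granted generation passes.
    fn is_current(&self, lease: &Lease) -> bool {
        self.current.as_ref().map(|c| c.generation) == Some(lease.generation)
    }
}
```

If writer A acquires at tick 0 with a TTL of 10 and then stalls, writer B can acquire at tick 11. A still holds its `Lease` value and will go on to write unless the write path itself runs something like `is_current`. A store with no conditional writes cannot perform that check, which is where the weak fencing guarantees come from.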

tustvold (Contributor, Author) commented:

Hmm... Yeah, I spent some time bashing at this, and it isn't easy to provide strong guarantees with this approach 😅

As Iceberg is also doing something similar, using DynamoDB to store the latest metadata file, I think let's not do this for now.

tustvold closed this as not planned Sep 29, 2023
tustvold reopened this Oct 11, 2023
tustvold added a commit to tustvold/arrow-rs that referenced this issue Oct 11, 2023
tustvold added a commit to tustvold/arrow-rs that referenced this issue Oct 11, 2023
alamb changed the title from "DynamoDB Lock Support" to "Add copy_if_not_exists support for AmazonS3 via DynamoDB Lock Support" Oct 16, 2023
tustvold added a commit to tustvold/arrow-rs that referenced this issue Oct 27, 2023
tustvold added a commit to tustvold/arrow-rs that referenced this issue Oct 30, 2023
tustvold added a commit that referenced this issue Dec 26, 2023
…4918)

* Implement DynamoDBLock (#4880)

* Cleanup error handling

* Clippy

* Localstack support

* Clippy

* Handle integration test concurrency

* More docs

* Disable request timeout

* Fix merge conflicts

* Reduce test concurrency

* Increase timeouts