Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lakectl presign large presigned download using multipart #7284

Merged
merged 14 commits into from
Jan 17, 2024

Conversation

nopcoder
Copy link
Contributor

Currently supports only single large file download using presign.

  • Enhanced parallel downloads: Break large file into multiple segments for simultaneous downloading, optimizing speed and efficiency.
  • Temporary filenames for reliability: During downloads, use temporary names with unique identifiers to ensure file integrity and prevent overwrites.

Close #7283

@nopcoder nopcoder added the area/lakectl Issues related to lakeFS' command line interface (lakectl) label Jan 14, 2024
@nopcoder nopcoder self-assigned this Jan 14, 2024
@nopcoder nopcoder requested a review from a team January 14, 2024 22:25
@nopcoder nopcoder added the include-changelog PR description should be included in next release changelog label Jan 14, 2024
Copy link

E2E Test Results - DynamoDB Local - Local Block Adapter

10 passed

@nopcoder nopcoder removed the request for review from a team January 15, 2024 11:42
@nopcoder nopcoder marked this pull request as draft January 15, 2024 11:42
Copy link

github-actions bot commented Jan 17, 2024

♻️ PR Preview 0974161 has been successfully destroyed since this PR has been closed.

🤖 By surge-preview

@nopcoder nopcoder requested a review from a team January 17, 2024 08:55
@nopcoder nopcoder marked this pull request as ready for review January 17, 2024 08:56
)

const (
DefaultDownloadPartSize int64 = 1024 * 1024 * 8 // 8MB
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nopcoder Don't you think we should add some way for the user to control it?
We can add another task for it if you don't think we should do it here

@nopcoder nopcoder requested a review from idanovo January 17, 2024 10:23
Copy link
Contributor

@idanovo idanovo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the concurrency can also be controlled by the user like we do in other APIs/commands.
You can open another issue for it if you prefer to do that in another PR

Comment on lines +82 to +91
// fallback to download if missing size
if statResp.JSON200 == nil || statResp.JSON200.SizeBytes == nil {
return d.downloadObject(ctx, src, dst)
}

// check if the object is small enough to download in one request
sizeBytes := *statResp.JSON200.SizeBytes
if sizeBytes < d.PartSize {
return d.downloadObject(ctx, src, dst)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// fallback to download if missing size
if statResp.JSON200 == nil || statResp.JSON200.SizeBytes == nil {
return d.downloadObject(ctx, src, dst)
}
// check if the object is small enough to download in one request
sizeBytes := *statResp.JSON200.SizeBytes
if sizeBytes < d.PartSize {
return d.downloadObject(ctx, src, dst)
}
// fallback to download if missing size
if statResp.JSON200 == nil || statResp.JSON200.SizeBytes == nil || *statResp.JSON200.SizeBytes < d.PartSize {
return d.downloadObject(ctx, src, dst)
}

And maybe add a log to indicate why we fallback to download

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Don't think I need to log this decision
  • Kept the two if to make the code clear

const (
MinDownloadPartSize int64 = 1024 * 64 // 64KB
DefaultDownloadPartSize int64 = 1024 * 1024 * 8 // 8MB
DefaultDownloadConcurrency = 10
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that one can be controlled by the user as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nopcoder nopcoder merged commit b26921a into master Jan 17, 2024
36 checks passed
@nopcoder nopcoder deleted the feature/lakectl-large-obj-dl branch January 17, 2024 13:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/lakectl Issues related to lakeFS' command line interface (lakectl) include-changelog PR description should be included in next release changelog
Projects
None yet
Development

Successfully merging this pull request may close these issues.

lakectl fs download - enhance speed for large file (presigned)
2 participants