Use requests stream and shutil.copyfileobj to constrain memory usage during resource copy #236

Merged

whargrove
Contributor

Link to Relevant Issue

This pull request resolves #235

Description of Changes

  • When copying resources from a remote origin over HTTP(S), stream the response body and copy it in chunks into the destination file instead of loading the entire file into memory before writing.
  • Before the change, running resource_copy("uri") in a Python REPL used unbounded RSS (growing up to the size of the file). After the change, the same call in a Python REPL uses ~300M RSS.
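
For readers who want to see the shape of the approach, here is a minimal sketch of streaming a download with requests and shutil.copyfileobj. It is not the code merged in this PR: the function names copy_stream and stream_resource_copy, the 1 MiB chunk size, and the 30 second timeout are all assumptions made for illustration.

```python
import shutil
from pathlib import Path
from typing import BinaryIO

# Hypothetical chunk size for this sketch; the PR does not state one.
CHUNK_SIZE = 1024 * 1024  # 1 MiB


def copy_stream(src: BinaryIO, dst: Path, chunk_size: int = CHUNK_SIZE) -> Path:
    """Copy a readable binary stream to dst in fixed-size chunks."""
    with open(dst, "wb") as out:
        # copyfileobj reads at most chunk_size bytes at a time, so the copy
        # buffer stays bounded regardless of how large the source is.
        shutil.copyfileobj(src, out, length=chunk_size)
    return dst


def stream_resource_copy(uri: str, dst: Path, timeout: float = 30.0) -> Path:
    """Download uri to dst without holding the whole body in memory."""
    # Third-party dependency, imported here so copy_stream stays stdlib-only.
    import requests

    # stream=True defers reading the body until it is pulled from response.raw.
    with requests.get(uri, stream=True, timeout=timeout) as response:
        response.raise_for_status()
        # Let urllib3 undo gzip/deflate content encoding while reading raw.
        response.raw.decode_content = True
        return copy_stream(response.raw, dst)
```

The non-streaming alternative reads response.content, which materializes the full body in memory first; with stream=True the body is only read as copyfileobj requests each chunk.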


codecov bot commented May 13, 2023

Codecov Report

Merging #236 (0964eb4) into main (6c1c1ff) will increase coverage by 0.45%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #236      +/-   ##
==========================================
+ Coverage   71.67%   72.12%   +0.45%     
==========================================
  Files          50       50              
  Lines        3329     3376      +47     
==========================================
+ Hits         2386     2435      +49     
+ Misses        943      941       -2     
| Impacted Files | Coverage Δ |
|---|---|
| cdp_backend/tests/utils/test_file_utils.py | 100.00% <100.00%> (ø) |
| cdp_backend/utils/file_utils.py | 90.47% <100.00%> (+2.18%) ⬆️ |

Member

@evamaxfield evamaxfield left a comment

Looks good to me! Thanks! I am out right now so I will merge this now but can release later today in like 6 hours.

@evamaxfield evamaxfield merged commit 703d7f1 into CouncilDataProject:main May 14, 2023
@whargrove whargrove deleted the bug/stream-resource-copy branch May 14, 2023 23:41
Successfully merging this pull request may close these issues.

Inefficient usage of requests.get for very large videos causes event gather to fail