azurestore: Buffer upload data on disk instead of in-memory #1070

Merged
1 commit merged into main on Feb 21, 2024

Conversation

Acconut
Member

@Acconut Acconut commented Jan 23, 2024

Closes #1031

Instructions for testing this out locally:

  1. npm install -g azurite
  2. brew install azure-cli or similar depending on OS
  3. Start Azurite: azurite --location ./azurite/ --debug ./azurite/debug.log
  4. Create container: az storage container create --name mycontainer --connection-string "DefaultEndpointsProtocol=http;AccountName=devstoreaccount1;AccountKey=Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==;BlobEndpoint=http://127.0.0.1:10000/devstoreaccount1;"
  5. Start tusd: AZURE_STORAGE_ACCOUNT=devstoreaccount1 AZURE_STORAGE_KEY=Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw== go run cmd/tusd/main.go -azure-endpoint http://127.0.0.1:10000/devstoreaccount1 -azure-storage mycontainer
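
If it helps, here is a minimal Go sketch (not part of this PR) for pushing a test upload through the locally running tusd; it assumes tusd's default listen address :8080 and base path /files/, and a tus client or plain curl works just as well:

```go
// upload_sketch.go: send a small tus upload (create + PATCH) to a local tusd.
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

func main() {
	data := bytes.Repeat([]byte("hello tus "), 1000)

	// Create the upload (tus creation extension).
	req, _ := http.NewRequest("POST", "http://127.0.0.1:8080/files/", nil)
	req.Header.Set("Tus-Resumable", "1.0.0")
	req.Header.Set("Upload-Length", fmt.Sprint(len(data)))
	res, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	res.Body.Close()
	uploadURL := res.Header.Get("Location") // URL of the new upload resource

	// Transfer the data in a single PATCH request.
	req, _ = http.NewRequest("PATCH", uploadURL, bytes.NewReader(data))
	req.Header.Set("Tus-Resumable", "1.0.0")
	req.Header.Set("Upload-Offset", "0")
	req.Header.Set("Content-Type", "application/offset+octet-stream")
	res, err = http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	res.Body.Close()
	fmt.Println("status:", res.Status, "new offset:", res.Header.Get("Upload-Offset"))
}
```

Afterwards, the blob should show up when listing the container, e.g. with az storage blob list --container-name mycontainer --connection-string "..." (same connection string as above).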

@quality-leftovers

quality-leftovers commented Feb 12, 2024

I'm wondering if writing to disk always makes sense, since some Azure hosting services might only allow mounting files from an Azure storage account (not sure, I need to check; I just got this wrong for a Container Apps environment).

Looking at the s3store_part_producer, it seems to solve this already, since it supports an in-memory option for the part producer. There also seems to be nothing S3-specific about the producer. Wouldn't it be possible to reuse the part producer here?

@Acconut
Member Author

Acconut commented Feb 13, 2024

> I'm wondering if writing to disk always makes sense, since some Azure hosting services might only allow mounting files from an Azure storage account (not sure, I need to check; I just got this wrong for a Container Apps environment).

It would seem odd to me that no temporary directory is provided for storing intermediate results. I would expect some temporary storage to always be available.

> Looking at the s3store_part_producer, it seems to solve this already, since it supports an in-memory option for the part producer. There also seems to be nothing S3-specific about the producer. Wouldn't it be possible to reuse the part producer here?

I am not sure about this. For S3, we split a PATCH request into multiple parts of a fixed size (by default 50MB), which are uploaded and then discarded as they come in. There is a limit on how many parts can be buffered concurrently (by default 20), so there is an upper limit on the memory consumption.
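
Roughly, the effect can be sketched like this (illustrative only, not the actual s3store code; all names and the demo body are made up):

```go
// boundedparts.go: illustrative sketch of bounded part buffering. The request
// body is read in fixed-size parts and at most maxBufferedParts of them exist
// at any time, so resource usage stays bounded no matter how large the body is.
package main

import (
	"bytes"
	"fmt"
	"io"
	"sync"
)

const (
	partSize         = 5 * 1024 * 1024 // 50MB in the real store, smaller for the demo
	maxBufferedParts = 20
)

func main() {
	var body io.Reader = bytes.NewReader(make([]byte, 64*1024*1024)) // stand-in for a PATCH body

	sem := make(chan struct{}, maxBufferedParts) // counting semaphore
	var wg sync.WaitGroup

	for num := 1; ; num++ {
		sem <- struct{}{} // wait until fewer than maxBufferedParts are buffered

		part := make([]byte, partSize) // file-backed in the real store, in-memory here
		n, err := io.ReadFull(body, part)
		if n == 0 {
			<-sem
			break
		}

		wg.Add(1)
		go func(num int, p []byte) {
			defer wg.Done()
			defer func() { <-sem }() // part discarded, slot becomes free again
			// A real store would upload the part here; we only report its size.
			fmt.Printf("uploaded part %d (%d bytes)\n", num, len(p))
		}(num, part[:n])

		if err != nil { // io.ErrUnexpectedEOF: short final part, we are done
			break
		}
	}
	wg.Wait()
}
```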

For the azurestore, there is no such chunking; instead, the entire PATCH request is read into temporary storage before being uploaded. If we want to reuse the part producer from S3, the azurestore would have to be adjusted so that it also uploads in smaller parts, otherwise we end up with the same unbounded memory usage that this PR tries to fix.
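
Conceptually, what this PR does for the azurestore is closer to the following sketch (simplified, not the exact implementation; bufferToTempFile is a made-up name):

```go
package sketch // illustrative only, not the actual azurestore code

import (
	"io"
	"os"
)

// bufferToTempFile (hypothetical name) copies the PATCH body into a temporary
// file, so memory usage stays roughly constant regardless of the request size.
// The caller uploads the file's contents to Azure and removes it afterwards.
func bufferToTempFile(body io.Reader) (*os.File, int64, error) {
	file, err := os.CreateTemp("", "tusd-azure-buffer-*")
	if err != nil {
		return nil, 0, err
	}
	cleanup := func() {
		file.Close()
		os.Remove(file.Name())
	}

	// Stream the body to disk; io.Copy only uses a small, fixed-size buffer.
	size, err := io.Copy(file, body)
	if err != nil {
		cleanup()
		return nil, 0, err
	}

	// Rewind so the subsequent upload to Azure can read the data back.
	if _, err := file.Seek(0, io.SeekStart); err != nil {
		cleanup()
		return nil, 0, err
	}
	return file, size, nil
}
```

The memory savings come from streaming the body to disk in small chunks instead of accumulating it in a byte slice; the upload to Azure then reads from the file.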

Does that make sense?

@quality-leftovers

True, it is unusual to have no local storage, but there are VMs (Dasv5 and Easv5) without any temp disk. It is possible to attach "remote" storage disks though, so it should be OK.

Regarding the parts generator, that is also true. I was just wondering why the S3 store has such sophisticated logic while the azurestore is a bit dumb :P.

@Acconut
Member Author

Acconut commented Feb 15, 2024

> Regarding the parts generator, that is also true. I was just wondering why the S3 store has such sophisticated logic while the azurestore is a bit dumb :P.

Well, the S3 API has a few restrictions (a minimum size for upload parts and a maximum number of total parts) that make this kind of logic necessary. I would prefer it if we didn't have to do much of this :)

Would you be interested in testing this PR? Maybe you can run it locally and see if it works well for you? I don't use Azure normally, so a second pair of eyes would help catch issues.

@quality-leftovers

I did some testing, running the container both locally and as an Azure Container App, and was unable to reproduce any high memory usage or out-of-memory problems. The existing application / test suite also worked "as normal".

@Acconut Acconut marked this pull request as ready for review February 21, 2024 11:35
@Acconut
Member Author

Acconut commented Feb 21, 2024

Ok, thank you very much! Then let's get this released!

@Acconut Acconut merged commit 809a8c8 into main Feb 21, 2024
12 checks passed

Successfully merging this pull request may close these issues.

High memory usage when uploading to Azure Storage