azurestore: Buffer upload data on disk instead of in-memory #1070
Conversation
I'm wondering if writing to disk always makes sense, since some Azure hosting services might only allow mounting files from an Azure storage account (not sure, need to check; I just got this wrong for the Container Apps environment). Looking at the …
It would seem odd to me that no temporary directory is provided for storing intermediate results. I would expect some temporary storage to always be available.
I am not sure about this. For S3, we split a PATCH request into multiple parts of a fixed size (by default 50MB), which are uploaded and then discarded as they come in. There is a limit on how many parts can be buffered concurrently (by default 20), so there is an upper limit on the memory consumption. For the azurestore, there is no such chunking; instead, the entire PATCH request is read into temporary storage before being uploaded. If we want to reuse the part generator from S3, the azurestore must be adjusted so that it also uploads in smaller parts, or otherwise we will have the same unbounded memory usage that this PR tries to fix. Does that make sense?
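For illustration, here is a rough sketch of what spooling a PATCH body to a temporary file (instead of holding it in one in-memory buffer) could look like. This is not the actual azurestore code; the `writeChunk` helper and its signature are hypothetical.

```go
package azurebuffer

import (
	"io"
	"os"
)

// writeChunk drains src (e.g. the body of a PATCH request) into a temporary
// file, so memory usage stays roughly constant regardless of the request size.
// The caller would upload the file's contents to Azure and remove it afterwards.
// Note: writeChunk is an illustrative name, not a function from tusd.
func writeChunk(src io.Reader) (path string, size int64, err error) {
	f, err := os.CreateTemp("", "tusd-azure-buffer-*")
	if err != nil {
		return "", 0, err
	}
	defer f.Close()

	// io.Copy streams through a small internal buffer instead of reading
	// the whole request into one large byte slice.
	size, err = io.Copy(f, src)
	if err != nil {
		os.Remove(f.Name())
		return "", 0, err
	}
	return f.Name(), size, nil
}
```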
True, it is unusual to have no local storage, but there are VMs (Dasv5 and Easv5) without any temp disk. However, it is possible to attach "remote" storage disks, so it should be OK. Regarding the parts generator: also true. I was just wondering why the S3 store has such sophisticated logic while the azurestore is a bit dumb :P.
Well, the S3 API has a few restrictions (minimal size of upload parts and maximal number of total parts) that make this kind of logic necessary. I would prefer it if we didn't have to do much of this :) Would you be interested in testing this PR? Maybe you can run it locally and see if it works well for you? I don't use Azure normally, so a second pair of eyes would help catch issues.
I did some testing running the container both locally and as an Azure Container App and was unable to reproduce any high memory or out-of-memory problems. The existing application / test suite also worked "as normal".
Ok, thank you very much! Then let's get this released!
Closes #1031
Instructions for testing this out locally:

Install Azurite and the Azure CLI (or similar, depending on your OS):

```
npm install -g azurite
brew install azure-cli
```

Start Azurite:

```
azurite --location ./azurite/ --debug ./azurite/debug.log
```

Create the test container:

```
az storage container create --name mycontainer --connection-string "DefaultEndpointsProtocol=http;AccountName=devstoreaccount1;AccountKey=Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==;BlobEndpoint=http://127.0.0.1:10000/devstoreaccount1;"
```

Run tusd against the local Azurite instance:

```
AZURE_STORAGE_ACCOUNT=devstoreaccount1 AZURE_STORAGE_KEY=Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw== go run cmd/tusd/main.go -azure-endpoint http://127.0.0.1:10000/devstoreaccount1 -azure-storage mycontainer
```
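To exercise the upload path end-to-end against the local setup above, a small client like the following sketch can be used. It is only an illustration: it assumes tusd is listening on its default address and base path (http://localhost:8080/files/), which may differ in your setup, and it speaks the tus protocol directly via net/http instead of using a dedicated tus client library.

```go
package main

import (
	"bytes"
	"fmt"
	"log"
	"net/http"
	"strconv"
)

func main() {
	data := bytes.Repeat([]byte("hello tus "), 1024) // ~10 KiB of test data

	// Step 1: create the upload (tus "creation" extension).
	req, err := http.NewRequest("POST", "http://localhost:8080/files/", nil)
	if err != nil {
		log.Fatal(err)
	}
	req.Header.Set("Tus-Resumable", "1.0.0")
	req.Header.Set("Upload-Length", strconv.Itoa(len(data)))

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	resp.Body.Close()

	// tusd normally returns an absolute URL in Location; adjust if yours is relative.
	uploadURL := resp.Header.Get("Location")
	fmt.Println("created upload at", uploadURL)

	// Step 2: send the whole body in a single PATCH request, which is the
	// path that is now buffered on disk instead of in memory.
	req, err = http.NewRequest("PATCH", uploadURL, bytes.NewReader(data))
	if err != nil {
		log.Fatal(err)
	}
	req.Header.Set("Tus-Resumable", "1.0.0")
	req.Header.Set("Upload-Offset", "0")
	req.Header.Set("Content-Type", "application/offset+octet-stream")

	resp, err = http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	resp.Body.Close()
	fmt.Println("PATCH finished with status", resp.Status)
}
```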