Integrate with object storage #8
Source code:

- Implement method for downloading objects from object storage, with support for both AWS S3 and Backblaze B2
- Add AWS session token support (aioaws PR)
- Implement method for uploading objects to object storage, with support for both AWS S3 and Backblaze B2
- Add support for Backblaze B2 non-S3-compatible uploads
- TODO: Finish implementing AWS session token support (aioaws PR)
- TODO: Fix S3 upload policy conditions for bucket domains (aioaws PR)

Tests:

- Add pytest parametrized fixture to provide `S3Config`s for testing
- Add integration tests that run on actual cloud infrastructure, with support for both AWS S3 and Backblaze B2
- Configure separate integration testing flag for Codecov
- Add AWS login step to GitHub Actions workflow, which retrieves temporary credentials using OpenID Connect (OIDC) for testing

Docs:

- Add preliminary object storage documentation
Reverts commit 2bb9514, and swaps aioaws for just httpx
https://black.readthedocs.io/en/stable/change_log.html pallets/click#2225 psf/black#2964 Black was raising an error when used with pre-commit: `ImportError: cannot import name '_unicodefun' from 'click'`. Black 22.3.0 resolves the `ImportError`.
The object storage integration was initially implemented with aioaws, but fastenv had to augment aioaws in several ways. fastenv had to:

- Allow config attributes to be `str | None`, instead of only `str`
- Add separate config attributes for bucket host and bucket name, to prevent errors when parsing bucket names that contain `.`
- Add AWS session token support to both the config and client classes
- Add bucket encryption support to the client
- Add a download method to the client
- Override the type signature on the client upload method
- Add Backblaze B2 support to the client

It was faster and simpler to implement an object storage client here, instead of depending on aioaws. The object storage client was enabled by implementing AWS Signature Version 4. With this approach, the only additional dependency required to connect to object storage is HTTPX.

Source code:

- Refactor the config class with dataclasses
- Implement AWS Signature Version 4 on the client class
- Include AWS session token support in methods on the client class
- Implement method for downloading objects from object storage, with support for both AWS S3 and Backblaze B2
- Implement method for uploading objects to object storage, with support for both AWS S3 and Backblaze B2
- Implement bucket encryption support for managed keys (SSE-B2/SSE-S3)

Tests:

- Add pytest parametrized fixture to provide configs for testing
- Add freezegun to manipulate datetimes for testing
- Add AWS login to GitHub Actions workflow, which retrieves temporary credentials using OpenID Connect (OIDC) for integration testing
- Add integration tests that run on actual cloud infrastructure, with support for both AWS S3 and Backblaze B2
- Add extended timeouts and retries to the HTTPX clients in the integration tests, to account for potential network issues
- Add pytest `--durations=0` flag to measure integration test durations
- Configure separate integration testing flag for Codecov

Docs:

- Provide a basic tutorial on how to use the object storage integration
- Include a deep dive on the AWS Signature Version 4 implementation
- Compare various object storage platforms
- Update the contributing guidelines with info on how to run integration tests locally
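freezegun is useful here because AWS Signature Version 4 signatures embed the request datetime, so tests must pin the clock to get reproducible signatures. As a minimal sketch of the timestamp formats SigV4 uses (the helper name is hypothetical, not fastenv's API):

```python
import datetime


def sigv4_datestamps(now):
    """Return the (x-amz-date, datestamp) pair used in AWS Signature Version 4.

    Hypothetical helper: `x-amz-date` uses ISO 8601 basic format in UTC,
    and the 8-character datestamp scopes the derived signing key to one day.
    """
    x_amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    return x_amz_date, x_amz_date[:8]


frozen = datetime.datetime(2022, 1, 1, tzinfo=datetime.timezone.utc)
print(sigv4_datestamps(frozen))  # ('20220101T000000Z', '20220101')
```

Freezing the clock to a known datetime (as freezegun does) makes the expected signature deterministic in tests.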
This commit will move `fastenv/cloud.py` to a new `fastenv.cloud` subpackage at `fastenv/cloud/object_storage.py`. The module and objects now have names that more specifically represent what they do, and the package will be able to accommodate additional cloud integrations in the future without jamming them all into `cloud.py`.
* 'Refactored by Sourcery' * Reject Sourcery `use-dictionary-union` refactoring The dictionary union operator was introduced in Python 3.9. This project still supports Python 3.8. Attempting to use the union operator on Python 3.8 will result in errors, as seen in the failed mypy check on this PR. https://docs.python.org/3/whatsnew/3.9.html https://docs.sourcery.ai/refactorings/use-dictionary-union/ Co-authored-by: Brendon Smith <[email protected]>
Incomplete URL substring sanitization (High): 'backblazeb2.com' may be at an arbitrary position in the sanitized URL.
Annotations are used in one of the code blocks in the cloud object storage docs. They were duplicated unnecessarily in an additional code block. This commit will remove the duplicate annotations. https://squidfunk.github.io/mkdocs-material/reference/annotations/
Sourcery Code Quality Report: ❌ Merging this PR will decrease code quality in the affected files by 5.33%.
#8 Presigned URL expiration time must be between one second and one week. https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-query-string-auth.html AWS does not specify a maximum expiration time for presigned POSTs. https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-UsingHTTPPOST.html
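The one-second-to-one-week constraint on presigned URL expiration could be validated with a sketch like this (the helper name is hypothetical, not fastenv's API):

```python
ONE_WEEK_IN_SECONDS = 604_800  # AWS SigV4 presigned URL maximum


def validate_expiration(expires: int) -> int:
    """Hypothetical validator for presigned URL expiration times.

    AWS SigV4 presigned URLs must expire between one second
    and one week (604800 seconds) after they are generated.
    """
    if not 1 <= expires <= ONE_WEEK_IN_SECONDS:
        raise ValueError(
            "Expiration time must be between one second and one week."
        )
    return expires
```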
The `fastenv.cloud.object_storage.ObjectStorageConfig` class is used to configure fastenv for connecting to object storage buckets. The class accepts a `bucket_host` in "virtual-hosted-style," like `<BUCKET_NAME>.s3.<REGION>.amazonaws.com` for AWS S3 or `<BUCKET_NAME>.s3.<REGION>.backblazeb2.com` for Backblaze B2. Object storage buckets commonly have HTTPS endpoints, and therefore some users may prepend the scheme ("https://") to their `bucket_host`. The scheme should be removed if present because it is added automatically when generating instances of `httpx.URL()`. This commit will update `fastenv.cloud.object_storage.ObjectStorageConfig` to remove the scheme. #8 https://docs.aws.amazon.com/AmazonS3/latest/userguide/VirtualHosting.html
The `fastenv.cloud.object_storage.ObjectStorageConfig` class is used to configure fastenv for connecting to object storage buckets. The class accepts a `bucket_host` in "virtual-hosted-style," like `<BUCKET_NAME>.s3.<REGION>.amazonaws.com` for AWS S3 or `<BUCKET_NAME>.s3.<REGION>.backblazeb2.com` for Backblaze B2. It is common to append trailing slashes ("/") to URLs. These should be removed during configuration because some of the configuration logic uses the Python string method `str.endswith()` assuming there is no slash at the end. This commit will update `fastenv.cloud.object_storage.ObjectStorageConfig` to remove trailing slashes from `bucket_host`. #8 https://docs.aws.amazon.com/AmazonS3/latest/userguide/VirtualHosting.html
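The two normalizations described in these commits (scheme removal and trailing-slash removal) could be sketched as follows; the helper name is hypothetical, not fastenv's actual API:

```python
def normalize_bucket_host(bucket_host: str) -> str:
    """Hypothetical sketch of `bucket_host` normalization.

    Remove the URL scheme (httpx.URL() adds it automatically)
    and trailing slashes (which would break `str.endswith()` checks).
    """
    for scheme in ("https://", "http://"):
        if bucket_host.startswith(scheme):
            bucket_host = bucket_host[len(scheme):]
    return bucket_host.rstrip("/")


print(normalize_bucket_host("https://mybucket.s3.us-west-2.amazonaws.com/"))
# mybucket.s3.us-west-2.amazonaws.com
```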
Dotenv files are commonly kept in cloud object storage. fastenv provides an object storage client for downloading and uploading dotenv files. S3-compatible object storage allows uploads with either `POST` or `PUT`. This commit will implement uploads with `PUT`.

The new `method` argument to `fastenv.ObjectStorageClient.upload()` will accept either `POST` or `PUT`. `POST` was previously the default (#8). `PUT` will now be the default. `PUT` uploads are more widely supported and standardized. Backblaze B2 does not currently support single-part uploads with `POST` to their S3 API (the B2 native API must be used instead), and Cloudflare R2 does not support uploads with `POST` at all. https://www.backblaze.com/apidocs/b2-upload-file https://developers.cloudflare.com/r2/api/s3/presigned-urls/#supported-http-methods

Files will be opened in binary mode and attached with the `content` argument (`httpx_client.put(content=content)`) as suggested in the HTTPX docs (https://www.python-httpx.org/compatibility/).

Unlike downloads with `GET`, presigned `PUT` URL query parameters do not necessarily contain all the required information. Additional information may need to be supplied in request headers. In addition to supplying header keys and values with HTTP requests, header keys will be signed into the URL in the `X-Amz-SignedHeaders` query string parameter (alphabetically sorted, semicolon-separated, lowercased). https://docs.aws.amazon.com/IAM/latest/UserGuide/create-signed-request.html

These request headers can specify:

- Object encryption. Encryption information can be specified with headers including `X-Amz-Server-Side-Encryption`. Note that, although similar headers like `X-Amz-Algorithm` are included as query string parameters in presigned URLs, `X-Amz-Server-Side-Encryption` is not. If `X-Amz-Server-Side-Encryption` is included in query string parameters, it may be silently ignored by the object storage platform. AWS S3 and Cloudflare R2 now automatically encrypt all objects, but Backblaze B2 will only automatically encrypt objects if the bucket has default encryption enabled. https://docs.aws.amazon.com/AmazonS3/latest/userguide/default-encryption-faq.html https://docs.aws.amazon.com/AmazonS3/latest/userguide/serv-side-encryption.html https://www.backblaze.com/docs/cloud-storage-server-side-encryption
- Object integrity checks. The `Content-MD5` header defined by RFC 1864 can supply a base64-encoded MD5 checksum. After upload, the object storage platform server will calculate a checksum for the object in the same manner. If the client and server checksums are the same, all expected information was successfully sent to the server. If the checksums are different, this may mean that object information was lost in transit, and an error will be reported. Note that, although Backblaze B2 accepts and processes the `Content-MD5` header, it will report a SHA1 checksum to align with uploads to the B2 native API. https://docs.aws.amazon.com/AmazonS3/latest/userguide/checking-object-integrity.html https://www.backblaze.com/docs/en/cloud-storage-file-information
- Object metadata. Headers like `Content-Disposition`, `Content-Length`, and `Content-Type` can be supplied in request headers. https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingMetadata.html
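The `Content-MD5` checksum and the `X-Amz-SignedHeaders` formatting described above could be sketched with the standard library like this (helper names are hypothetical, not fastenv's API):

```python
import base64
import hashlib


def content_md5(content: bytes) -> str:
    """Base64-encoded MD5 checksum for the Content-MD5 header (RFC 1864)."""
    return base64.b64encode(hashlib.md5(content).digest()).decode()


def signed_headers_param(headers: dict) -> str:
    """Format header keys for the X-Amz-SignedHeaders query string parameter:
    lowercased, alphabetically sorted, and semicolon-separated."""
    return ";".join(sorted(key.lower() for key in headers))


headers = {
    "Content-MD5": content_md5(b"KEY=value\n"),
    "Content-Type": "text/plain",
}
print(signed_headers_param(headers))  # content-md5;content-type
```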
Dotenv files are commonly kept in cloud object storage. fastenv provides an object storage client for downloading and uploading dotenv files. https://fastenv.bws.bio/cloud-object-storage

The fastenv object storage client currently supports AWS S3 and Backblaze B2. There are many other object storage platforms with "S3-compatible" APIs, including Cloudflare R2 (https://developers.cloudflare.com/r2/). This commit will implement support for Cloudflare R2.

- Handle Cloudflare R2 bucket hosts
  - The fastenv object storage client works with "virtual-hosted-style" URLs, as this is the preferred format for AWS S3
  - Cloudflare R2 "virtual-hosted-style" URLs, like `https://<BUCKET>.<ACCOUNT_ID>.r2.cloudflarestorage.com`, were implemented on 2022-05-16
- Handle Cloudflare R2 bucket region `auto` (https://developers.cloudflare.com/r2/api/s3/api/)
- Add Cloudflare R2 credentials to tests
- Skip tests that do uploads with `POST` (Cloudflare R2 only supports uploads with `PUT`: https://developers.cloudflare.com/r2/api/s3/presigned-urls/)
- Update object storage docs with info on Cloudflare R2

#8 #25
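The bucket host handling for the three platforms could be sketched as follows. This is a naive sketch with a hypothetical helper name: it assumes the bucket name contains no dots (the PR notes that dotted bucket names require separate config attributes), and it reports region `auto` for R2 hosts, which carry no region segment.

```python
def parse_bucket_host(bucket_host: str):
    """Hypothetical sketch: split a virtual-hosted-style bucket host
    into (bucket_name, region).

    Cloudflare R2 hosts (`<BUCKET>.<ACCOUNT_ID>.r2.cloudflarestorage.com`)
    have no region segment, so the region is reported as "auto".
    AWS S3 and Backblaze B2 hosts look like `<BUCKET_NAME>.s3.<REGION>.<domain>`.
    Assumes the bucket name itself contains no dots.
    """
    if bucket_host.endswith(".r2.cloudflarestorage.com"):
        return bucket_host.split(".")[0], "auto"
    bucket_name, _, region = bucket_host.split(".")[:3]
    return bucket_name, region


print(parse_bucket_host("mybucket.s3.us-west-004.backblazeb2.com"))
# ('mybucket', 'us-west-004')
```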
Description
Dotenv files are commonly kept in cloud object storage, but environment variable management packages typically don't integrate with object storage clients. Additional logic is therefore required to download the files from object storage prior to loading environment variables.
This PR will provide an integration with S3-compatible object storage. AWS S3 and Backblaze B2 will be directly supported and tested.
Changes
Credential management
This PR will provide a configuration class to manage credentials and other information related to cloud object storage buckets.
The configuration class accepts a `bucket_host` in "virtual-hosted-style," like `<BUCKET_NAME>.s3.<REGION>.amazonaws.com` for AWS S3 or `<BUCKET_NAME>.s3.<REGION>.backblazeb2.com` for Backblaze B2. Credentials can be detected from the environment variables `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `AWS_SESSION_TOKEN`, and the region from either `AWS_S3_REGION`, `AWS_REGION`, or `AWS_DEFAULT_REGION`, in that order.

AWS Signature Version 4
AWS Signature Version 4 is the secret sauce that allows requests to flow through AWS services. This PR will include a from-scratch implementation of AWS Signature Version 4, using only the Python standard library.
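The core of SigV4 signing can be sketched with the standard library alone. This is a rough sketch with hypothetical helper names (the full implementation also builds a canonical request and string to sign, per the AWS docs): the signing key is derived by chaining HMAC-SHA256 over the credential scope, then used to sign the string to sign.

```python
import hashlib
import hmac


def derive_signing_key(secret_key, datestamp, region, service="s3"):
    """Derive the SigV4 signing key by chaining HMAC-SHA256 over the
    credential scope: date (YYYYMMDD), region, service, 'aws4_request'."""
    key = ("AWS4" + secret_key).encode()
    for scope_part in (datestamp, region, service, "aws4_request"):
        key = hmac.new(key, scope_part.encode(), hashlib.sha256).digest()
    return key


def calculate_signature(signing_key, string_to_sign):
    """Hex-encode the HMAC-SHA256 of the SigV4 string to sign."""
    return hmac.new(
        signing_key, string_to_sign.encode(), hashlib.sha256
    ).hexdigest()
```

The derived key is scoped to a single day, region, and service, which is why requests carry both an `X-Amz-Date` timestamp and a credential scope.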
Object storage operations
With the from-scratch implementation of AWS Signature Version 4 included here, the only additional dependency required to connect to object storage is HTTPX.
Downloads
Downloads with `GET` can be authenticated by including AWS Signature Version 4 information either with query parameters or request headers. The code provided by this PR will use query parameters to generate presigned URLs. The advantage of presigned URLs with query parameters is that URLs can be used on their own.

Uploads
Uploads with `POST` work differently than downloads with `GET`. Instead of using query parameters like presigned URLs, uploads to AWS S3 with `POST` provide authentication information in form fields. An advantage of this approach is that it can also be used for browser-based uploads. This PR will implement uploads with `POST`. The docs will describe the `POST` upload process.

General
- Add AWS session token support (`AWS_SESSION_TOKEN`/`X-Amz-Security-Token`)

Tests
- Add pytest `--durations` flag to measure integration test durations

Docs
Related
- `Authorization` header
- `POST`