
Integrate with object storage #8

Merged
merged 9 commits into develop from object-storage on Apr 9, 2022

Conversation

br3ndonland
Owner

Description

Dotenv files are commonly kept in cloud object storage, but environment variable management packages typically don't integrate with object storage clients. Additional logic is therefore required to download the files from object storage prior to loading environment variables.

This PR will provide an integration with S3-compatible object storage. AWS S3 and Backblaze B2 will be directly supported and tested.

Changes

Credential management

This PR will provide a configuration class to manage credentials and other information related to cloud object storage buckets.

  • Buckets can be specified in "virtual-hosted-style", like <BUCKET_NAME>.s3.<REGION>.amazonaws.com for AWS S3 or <BUCKET_NAME>.s3.<REGION>.backblazeb2.com for Backblaze B2.
  • If credentials are not provided as arguments, this class will auto-detect configuration from the default AWS environment variables AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN, and the region from either AWS_S3_REGION, AWS_REGION, or AWS_DEFAULT_REGION, in that order.
  • Boto3 detects credentials from several other locations, including credential files and instance metadata endpoints. These other locations are not currently supported.

AWS Signature Version 4

AWS Signature Version 4 is the authentication scheme AWS uses to verify and authorize requests to its services. This PR will include a from-scratch implementation of AWS Signature Version 4, using only the Python standard library.
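The core of a standard-library Signature Version 4 implementation is the signing key derivation, a chain of HMAC-SHA256 calls documented by AWS. A minimal sketch (the function name is illustrative, not fastenv's actual API):

```python
import hashlib
import hmac


def derive_signing_key(
    secret_key: str, date: str, region: str, service: str = "s3"
) -> bytes:
    """Derive the AWS Signature Version 4 signing key.

    `date` is in YYYYMMDD format. The key is built by chaining HMAC-SHA256:
    secret key -> date -> region -> service -> "aws4_request".
    """

    def _sign(key: bytes, message: str) -> bytes:
        return hmac.new(key, message.encode(), hashlib.sha256).digest()

    key_date = _sign(f"AWS4{secret_key}".encode(), date)
    key_region = _sign(key_date, region)
    key_service = _sign(key_region, service)
    return _sign(key_service, "aws4_request")
```

The resulting key then signs a "string to sign" derived from the canonical request, producing the signature that goes into query parameters or headers.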

Object storage operations

With the from-scratch implementation of AWS Signature Version 4 included here, the only additional dependency required to connect to object storage is HTTPX.

Downloads

  • Downloads with GET can be authenticated by including AWS Signature Version 4 information either with query parameters or request headers. The code provided by this PR will use query parameters to generate presigned URLs. The advantage of presigned URLs with query parameters is that the URLs can be used on their own, without additional headers.
  • The download method will generate a presigned URL, use it to download file contents, and either save the contents to a file or return the contents as a string.
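A rough standard-library sketch of query-parameter presigning for GET, following the AWS query-string authentication format. The function name and simplified parameter handling are assumptions for illustration, not fastenv's actual implementation:

```python
import datetime
import hashlib
import hmac
import urllib.parse
from typing import Optional


def presign_get_url(
    bucket_host: str,
    bucket_path: str,
    access_key: str,
    secret_key: str,
    region: str,
    expires: int = 3600,
    now: Optional[datetime.datetime] = None,
) -> str:
    """Generate a SigV4 presigned GET URL (query-parameter authentication)."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date = now.strftime("%Y%m%d")
    scope = f"{date}/{region}/s3/aws4_request"
    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires),
        "X-Amz-SignedHeaders": "host",
    }
    # Query parameters are sorted and percent-encoded ("%20", not "+").
    query = urllib.parse.urlencode(sorted(params.items()), quote_via=urllib.parse.quote)
    # Canonical request: method, path, query, canonical headers,
    # signed headers, and payload hash, newline-separated.
    canonical_request = "\n".join(
        ["GET", f"/{bucket_path}", query, f"host:{bucket_host}\n", "host", "UNSIGNED-PAYLOAD"]
    )
    string_to_sign = "\n".join(
        ["AWS4-HMAC-SHA256", amz_date, scope,
         hashlib.sha256(canonical_request.encode()).hexdigest()]
    )

    def _sign(key: bytes, message: str) -> bytes:
        return hmac.new(key, message.encode(), hashlib.sha256).digest()

    signing_key = _sign(
        _sign(_sign(_sign(f"AWS4{secret_key}".encode(), date), region), "s3"),
        "aws4_request",
    )
    signature = hmac.new(signing_key, string_to_sign.encode(), hashlib.sha256).hexdigest()
    return f"https://{bucket_host}/{bucket_path}?{query}&X-Amz-Signature={signature}"
```

The resulting URL carries all the authentication information in its query string, so it can be fetched with a plain HTTP GET (for example, `httpx.get(url)`) until it expires.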

Uploads

  • Uploads with POST work differently than downloads with GET. Instead of using query parameters like presigned URLs, uploads to AWS S3 with POST provide authentication information in form fields. An advantage of this approach is that it can also be used for browser-based uploads. This PR will implement uploads with POST.
  • Backblaze B2 uses its own upload process, which differs from the S3 POST flow for good reasons (it helps Backblaze keep costs low). This PR will also include an implementation of the Backblaze B2 POST upload process.
  • The upload method will upload source contents to an object storage bucket, selecting the appropriate upload strategy based on the cloud platform being used.
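The POST approach above works by listing upload conditions in a JSON policy document, which is base64-encoded, signed, and submitted alongside the file as form fields. A sketch of the unsigned fields, with field names per the AWS docs (the helper itself is illustrative, not fastenv's API):

```python
import base64
import json


def build_post_form_fields(
    bucket_name: str, key: str, credential: str, amz_date: str, expiration: str
) -> dict:
    """Build the (unsigned) form fields for an S3 POST upload.

    The policy document lists conditions the form fields must satisfy.
    It is base64-encoded and included as the `policy` form field.
    """
    policy = {
        "expiration": expiration,  # e.g. "2022-04-09T00:05:00Z"
        "conditions": [
            {"bucket": bucket_name},
            {"key": key},
            {"x-amz-algorithm": "AWS4-HMAC-SHA256"},
            {"x-amz-credential": credential},
            {"x-amz-date": amz_date},
        ],
    }
    encoded_policy = base64.b64encode(json.dumps(policy).encode()).decode()
    return {
        "key": key,
        "x-amz-algorithm": "AWS4-HMAC-SHA256",
        "x-amz-credential": credential,
        "x-amz-date": amz_date,
        "policy": encoded_policy,
        # An "x-amz-signature" field would be added here: the hex
        # HMAC-SHA256 of `encoded_policy` with the SigV4 signing key.
    }
```

Because the credentials travel in form fields rather than query parameters, the same policy-and-fields bundle can be handed to a browser form for browser-based uploads.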

General

Tests

  • Add pytest parametrized fixture to provide configs for testing
  • Add freezegun to manipulate datetimes for testing
  • Add AWS login to GitHub Actions workflow, which retrieves temporary credentials using OpenID Connect (OIDC) for integration testing
  • Add integration tests that run on actual cloud infrastructure, with support for both AWS S3 and Backblaze B2
  • Add extended timeouts and retries to the HTTPX clients in the integration tests, to account for potential network issues
  • Add pytest --durations flag to measure integration test durations
  • Configure separate integration testing flag for Codecov

Docs

  • Provide a basic tutorial on how to use the object storage integration
  • Include a deep dive on the AWS Signature Version 4 implementation
  • Compare various object storage platforms
  • Update the contributing guidelines with info on how to run integration tests locally

Related

Source code:

- Implement method for downloading objects from object storage, with
  support for both AWS S3 and Backblaze B2
  - Add AWS session token support  (aioaws PR)
- Implement method for uploading objects to object storage, with
  support for both AWS S3 and Backblaze B2
  - Add support for Backblaze B2 non-S3-compatible uploads
  - TODO: Finish implementing AWS session token support  (aioaws PR)
  - TODO: Fix S3 upload policy conditions for bucket domains (aioaws PR)

Tests:

- Add pytest parametrized fixture to provide `S3Config`s for testing
- Add integration tests that run on actual cloud infrastructure, with
  support for both AWS S3 and Backblaze B2
- Configure separate integration testing flag for Codecov
- Add AWS login step to GitHub Actions workflow, which retrieves
  temporary credentials using OpenID Connect (OIDC) for testing

Docs:

- Add preliminary object storage documentation

Black was raising an error when used with pre-commit:
`ImportError: cannot import name '_unicodefun' from 'click'`.
Black 22.3.0 resolves the `ImportError`.
https://black.readthedocs.io/en/stable/change_log.html
pallets/click#2225
psf/black#2964

Reverts commit 2bb9514, and swaps aioaws for just httpx.

The object storage integration was initially implemented with aioaws,
but fastenv had to augment aioaws in several ways. fastenv had to:

- Allow config attributes to be `str | None`, instead of only `str`
- Add separate config attributes for bucket host and bucket name,
  to prevent errors when parsing bucket names that contain `.`
- Add AWS session token support to both the config and client classes
- Add bucket encryption support to the client
- Add a download method to the client
- Override the type signature on the client upload method
- Add Backblaze B2 support to the client

It was faster and simpler to implement an object storage client here,
instead of depending on aioaws. The object storage client was enabled by
implementing AWS Signature Version 4. With this approach, the only
additional dependency required to connect to object storage is HTTPX.

Source code:

- Refactor the config class with dataclasses
- Implement AWS Signature Version 4 on the client class
- Include AWS session token support in methods on the client class
- Implement method for downloading objects from object storage, with
  support for both AWS S3 and Backblaze B2
- Implement method for uploading objects to object storage,
  with support for both AWS S3 and Backblaze B2
- Implement bucket encryption support for managed keys (SSE-B2/SSE-S3)

Tests:

- Add pytest parametrized fixture to provide configs for testing
- Add freezegun to manipulate datetimes for testing
- Add AWS login to GitHub Actions workflow, which retrieves temporary
  credentials using OpenID Connect (OIDC) for integration testing
- Add integration tests that run on actual cloud infrastructure,
  with support for both AWS S3 and Backblaze B2
- Add extended timeouts and retries to the HTTPX clients in the
  integration tests, to account for potential network issues
- Add pytest `--durations=0` flag to measure integration test durations
- Configure separate integration testing flag for Codecov

Docs:

- Provide a basic tutorial on how to use the object storage integration
- Include a deep dive on the AWS Signature Version 4 implementation
- Compare various object storage platforms
- Update the contributing guidelines with info on how to run integration
  tests locally
This commit will move `fastenv/cloud.py` to a new `fastenv.cloud`
subpackage at `fastenv/cloud/object_storage.py`. The module and objects
now have names that more specifically represent what they do, and the
package will be able to accommodate additional cloud integrations in the
future without jamming them all into `cloud.py`.
@vercel

vercel bot commented Apr 9, 2022

This pull request is being automatically deployed with Vercel (learn more).
To see the status of your deployment, click below or on the icon next to each commit.

🔍 Inspect: https://vercel.com/br3ndonland/fastenv/6V9V6SuJ6kgYu6jDnVUGpCQ1ZWSo
✅ Preview: https://fastenv-git-object-storage-br3ndonland.vercel.app

br3ndonland added a commit that referenced this pull request Apr 9, 2022
* 'Refactored by Sourcery'

* Reject Sourcery `use-dictionary-union` refactoring

The dictionary union operator was introduced in Python 3.9.
This project still supports Python 3.8.
Attempting to use the union operator on Python 3.8 will
result in errors, as seen in the failed mypy check on this PR.

https://docs.python.org/3/whatsnew/3.9.html
https://docs.sourcery.ai/refactorings/use-dictionary-union/

Co-authored-by: Brendon Smith <[email protected]>
Incomplete URL substring sanitization (High):
'backblazeb2.com' may be at an arbitrary position in the sanitized URL.

Annotations are used in one of the code blocks in the cloud object
storage docs. They were duplicated unnecessarily in an additional
code block. This commit will remove the duplicate annotations.
https://squidfunk.github.io/mkdocs-material/reference/annotations/
@sourcery-ai
Contributor

sourcery-ai bot commented Apr 9, 2022

Sourcery Code Quality Report

❌  Merging this PR will decrease code quality in the affected files by 5.33%.

| Quality metrics | Before | After | Change |
| --- | --- | --- | --- |
| Complexity | 0.23 ⭐ | 1.02 ⭐ | 0.79 👎 |
| Method Length | 27.64 ⭐ | 36.00 ⭐ | 8.36 👎 |
| Working memory | 5.21 ⭐ | 6.07 ⭐ | 0.86 👎 |
| Quality | 89.92% | 84.59% | -5.33% 👎 |

| Other metrics | Before | After | Change |
| --- | --- | --- | --- |
| Lines | 310 | 610 | 300 |

| Changed files | Quality Before | Quality After | Quality Change |
| --- | --- | --- | --- |
| `fastenv/__init__.py` | 93.71% ⭐ | 91.22% ⭐ | -2.49% 👎 |
| `tests/conftest.py` | 89.86% ⭐ | 84.43% ⭐ | -5.43% 👎 |

Here are some functions in these files that still need a tune-up:

File Function Complexity Length Working Memory Quality Recommendation

Legend and Explanation

The emojis denote the absolute quality of the code:

  • ⭐ excellent
  • 🙂 good
  • 😞 poor
  • ⛔ very poor

The 👍 and 👎 indicate whether the quality has improved or gotten worse with this pull request.


Please see our documentation here for details on how these metrics are calculated.

We are actively working on this report - lots more documentation and extra metrics to come!

Help us improve this quality report!

@br3ndonland br3ndonland merged commit b06add0 into develop Apr 9, 2022
@br3ndonland br3ndonland deleted the object-storage branch April 9, 2022 21:24
br3ndonland added a commit that referenced this pull request Apr 11, 2022
#8

Presigned URL expiration time must be between one second and one week.
https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-query-string-auth.html

AWS does not specify a maximum expiration time for presigned POSTs.
https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-UsingHTTPPOST.html
br3ndonland added a commit that referenced this pull request Jan 14, 2024
The `fastenv.cloud.object_storage.ObjectStorageConfig` class is used to
configure fastenv for connecting to object storage buckets. The class
accepts a `bucket_host` in "virtual-hosted-style," like
`<BUCKET_NAME>.s3.<REGION>.amazonaws.com` for AWS S3 or
`<BUCKET_NAME>.s3.<REGION>.backblazeb2.com` for Backblaze B2.

Object storage buckets commonly have HTTPS endpoints, and therefore some
users may prepend the scheme ("https://") to their `bucket_host`. The
scheme should be removed if present because it is added automatically
when generating instances of `httpx.URL()`.

This commit will update
`fastenv.cloud.object_storage.ObjectStorageConfig` to remove the scheme.

#8
https://docs.aws.amazon.com/AmazonS3/latest/userguide/VirtualHosting.html
br3ndonland added a commit that referenced this pull request Jan 15, 2024
The `fastenv.cloud.object_storage.ObjectStorageConfig` class is used to
configure fastenv for connecting to object storage buckets. The class
accepts a `bucket_host` in "virtual-hosted-style," like
`<BUCKET_NAME>.s3.<REGION>.amazonaws.com` for AWS S3 or
`<BUCKET_NAME>.s3.<REGION>.backblazeb2.com` for Backblaze B2.

It is common to append trailing slashes ("/") to URLs. These should be
removed during configuration because some of the configuration logic
uses the Python string method `string.endswith()` assuming there is no
slash at the end.

This commit will update
`fastenv.cloud.object_storage.ObjectStorageConfig`
to remove trailing slashes from `bucket_host`.

#8
https://docs.aws.amazon.com/AmazonS3/latest/userguide/VirtualHosting.html
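The `bucket_host` normalization described in the two commits above (stripping the scheme and any trailing slashes) might look roughly like the following. This is a hypothetical helper, not the actual `ObjectStorageConfig` code:

```python
def normalize_bucket_host(bucket_host: str) -> str:
    """Strip the URL scheme and trailing slashes from a bucket host.

    The scheme is added automatically when generating `httpx.URL()`
    instances, and trailing slashes would break configuration logic
    that calls `str.endswith()` on the bucket host.
    """
    for scheme in ("https://", "http://"):
        if bucket_host.startswith(scheme):
            bucket_host = bucket_host[len(scheme):]
    return bucket_host.rstrip("/")
```

An already-clean host passes through unchanged, so normalization is safe to apply unconditionally during configuration.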
br3ndonland added a commit that referenced this pull request Jan 28, 2024
Dotenv files are commonly kept in cloud object storage. fastenv provides
an object storage client for downloading and uploading dotenv files.

S3-compatible object storage allows uploads with either `POST` or `PUT`.
This commit will implement uploads with `PUT`.

The new `method` argument to `fastenv.ObjectStorageClient.upload()` will
accept either `POST` or `PUT`. `POST` was previously the default.

`PUT` will now be the default. `PUT` uploads are more widely supported
and standardized. Backblaze B2 does not currently support single-part
uploads with `POST` to their S3 API (the B2-native API must be used
instead), and Cloudflare R2 does not support uploads with `POST` at all.
https://www.backblaze.com/apidocs/b2-upload-file
https://developers.cloudflare.com/r2/api/s3/presigned-urls/#supported-http-methods

#8

Files will be opened in binary mode and attached with the `content`
argument (`httpx_client.put(content=content)`) as suggested in the
HTTPX docs (https://www.python-httpx.org/compatibility/).

Unlike downloads with `GET`, presigned `PUT` URL query parameters do not
necessarily contain all the required information. Additional information
may need to be supplied in request headers. In addition to supplying
header keys and values with HTTP requests, header keys will be signed
into the URL in the `X-Amz-SignedHeaders` query string parameter
(alphabetically-sorted, semicolon-separated, lowercased).
https://docs.aws.amazon.com/IAM/latest/UserGuide/create-signed-request.html
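The `X-Amz-SignedHeaders` formatting described above (lowercased, alphabetically sorted, semicolon-separated header names) can be sketched with a small hypothetical helper:

```python
def signed_headers_param(headers: dict) -> str:
    """Format header names for the `X-Amz-SignedHeaders` query string
    parameter: lowercased, alphabetically sorted, semicolon-separated.
    """
    return ";".join(sorted(name.lower() for name in headers))
```

Only the header names go into the query string; the header values are still supplied with the HTTP request itself.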

These request headers can specify:

- Object encryption. Encryption information can be specified with
  headers including `X-Amz-Server-Side-Encryption`. Note that, although
  similar headers like `X-Amz-Algorithm` are included as query string
  parameters in presigned URLs, `X-Amz-Server-Side-Encryption` is not.
  If `X-Amz-Server-Side-Encryption` is included in query string
  parameters, it may be silently ignored by the object storage platform.
  AWS S3 and Cloudflare R2 now automatically encrypt all objects, but
  Backblaze B2 will only automatically encrypt objects if the bucket
  has default encryption enabled.
  https://docs.aws.amazon.com/AmazonS3/latest/userguide/default-encryption-faq.html
  https://docs.aws.amazon.com/AmazonS3/latest/userguide/serv-side-encryption.html
  https://www.backblaze.com/docs/cloud-storage-server-side-encryption
- Object integrity checks. The `Content-MD5` header defined by RFC 1864
  can supply a base64-encoded MD5 checksum. After upload, the object
  storage platform server will calculate a checksum for the object in
  the same manner. If the client and server checksums are the same, all
  expected information was successfully sent to the server. If the
  checksums are different, this may mean that object information was
  lost in transit, and an error will be reported. Note that, although
  Backblaze B2 accepts and processes the `Content-MD5` header, it will
  report a SHA1 checksum to align with uploads to the B2-native API.
  https://docs.aws.amazon.com/AmazonS3/latest/userguide/checking-object-integrity.html
  https://www.backblaze.com/docs/en/cloud-storage-file-information
- Object metadata. Headers like `Content-Disposition`, `Content-Length`,
  and `Content-Type` can be supplied in request headers.
  https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingMetadata.html
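The `Content-MD5` value described in the integrity-check item above is the base64-encoded (not hex) MD5 digest of the request body, per RFC 1864. A minimal sketch (the helper name is illustrative):

```python
import base64
import hashlib


def content_md5_header(content: bytes) -> str:
    """Compute the base64-encoded MD5 digest of the request body
    for the `Content-MD5` header (RFC 1864)."""
    return base64.b64encode(hashlib.md5(content).digest()).decode()
```

The server recomputes the same digest after upload and rejects the request if the checksums differ, catching information lost in transit.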
br3ndonland added a commit that referenced this pull request Jan 29, 2024
Dotenv files are commonly kept in cloud object storage. fastenv provides
an object storage client for downloading and uploading dotenv files.
https://fastenv.bws.bio/cloud-object-storage

The fastenv object storage client currently supports AWS S3 and
Backblaze B2. There are many other object storage platforms with
"S3-compatible" APIs, including Cloudflare R2. This commit will
implement support for Cloudflare R2.
https://developers.cloudflare.com/r2/

- Handle Cloudflare R2 bucket hosts
  - The fastenv object storage client works with "virtual-hosted-style"
    URLs, as this is the preferred format for AWS S3
  - Cloudflare R2 "virtual-hosted-style" URLs, like
    `https://<BUCKET>.<ACCOUNT_ID>.r2.cloudflarestorage.com`, were
    implemented on 2022-05-16.
- Handle Cloudflare R2 bucket region `auto`
  https://developers.cloudflare.com/r2/api/s3/api/
- Add Cloudflare R2 credentials to tests
- Skip tests that do uploads with `POST` (Cloudflare R2 only supports
  uploads with `PUT`)
  https://developers.cloudflare.com/r2/api/s3/presigned-urls/
- Update object storage docs with info on Cloudflare R2

#8
#25
[Cloudflare R2](https://developers.cloudflare.com/r2/)