Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rfc(decision): Batch multiple files together into single large file to improve network throughput #98
base: main
Are you sure you want to change the base?
rfc(decision): Batch multiple files together into single large file to improve network throughput #98
Changes from 1 commit
912301d
7dfe76c
f82f9e6
721c28c
4ff4376
4ec773a
86c1819
9f9d123
8b25e3b
d34401e
8af3d50
812a110
f5fe8e8
78a0d6e
82f222d
a7ad1d6
d63a30a
354431d
15430b4
71f1715
3759346
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While the cost is mostly due to write, at scale is this cost problematic? Like is it a blocker for reaching higher scale or are we simply looking for a more efficient option ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not believe cost will prevent us from reaching greater scale. Write cost scale linearly. Pricing does not quite scale linearly but if you're happy with how much GCS costs at low levels of demand you will be happy at peak demand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since encryption is used here entirely to make deletion of individual parts quicker, could you please expand on:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is necessary to support GDPR, project, and point deletes. I consider it a high priority. The alternative is rotating the file with the offending parts removed.
The operations is no particular order:
Repeat for every delete operation. Deletes must be single-threaded per file to prevent concurrent access. You can use kafka and partition on filename.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
KMS ?
Are you talking about GCP KMS ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Google Key Management Service. Subject to change.