rfc(decision): Batch multiple files together into single large file to improve network throughput #98

Open
wants to merge 21 commits into
base: main
Use clearer language
cmanallen committed May 25, 2023
commit 8b25e3b64e588b2ec4877826cdedf2022c36ae43
15 changes: 7 additions & 8 deletions text/0098-store-multiple-replay-segments-in-a-single-blob.md
@@ -59,19 +59,18 @@ The response bytes will be decompressed, merged into a single payload, and retur

 # Drawbacks

-- Deleting recording data from a GDPR request, project deletion, or a user delete request will require downloading the file, overwriting the bytes within the deleted range with null bytes (`\x00`) before re-uploading the file.
-- This will reset the retention period.
-- This is an expensive operation and depending on the size of the project being deleted a very time consuming operation.
+1. Deleting data becomes tricky. See "Unresolved Questions".
cmanallen marked this conversation as resolved.

 # Unresolved Questions

-1. Can we keep the data in GCS but make it inaccessible?
+1. Can we keep deleted data in GCS but make it inaccessible?
mdtro marked this conversation as resolved.

-- User and project deletes could leave their data orphaned in GCS.
+- User and project deletes:
 - We would remove all capability to access it making it functionally deleted.
-- GDPR deletes will likely require overwriting the range but if they're limited in scope that should be acceptable.
-- Single replays, small projects, or if the mechanism is infrequently used should make this a valid deletion mechanism.
-- The data could be encrypted, with its key stored on the metadata row, making it unreadable upon delete.
+- GDPR deletes:
+- Would this require downloading the file, over-writing the subsequence of bytes, and re-uploading a new file?
+- Single replays, small projects, or if the mechanism is infrequently used could make this a valid deletion mechanism.
+- The data could be encrypted, with some encryption key stored on the metadata row, making the byte sequence unreadable upon row delete.

 2. What datastore should we use to store the byte range information?

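The overwrite step discussed in this change (download the file, replace the deleted byte range with null bytes, re-upload) might look roughly like the sketch below. It only covers the in-memory overwrite; the download and re-upload against GCS are left out. The function name `zero_byte_range`, the sample segment contents, and the inclusive `start`/`end` convention are all assumptions made for illustration, not something the RFC specifies.

```python
def zero_byte_range(blob: bytes, start: int, end: int) -> bytes:
    """Return a copy of `blob` with bytes start..end (inclusive) set to null bytes.

    Hypothetical helper: the RFC does not define whether stored ranges are
    inclusive or exclusive; inclusive is assumed here.
    """
    if not 0 <= start <= end < len(blob):
        raise ValueError("byte range is outside the blob")
    scrubbed = bytearray(blob)
    # Overwrite in place so every other segment keeps its original offset.
    scrubbed[start:end + 1] = b"\x00" * (end - start + 1)
    return bytes(scrubbed)


# Three made-up segments packed into one batched file.
segments = [b"segment-a", b"segment-b", b"segment-c"]
blob = b"".join(segments)

# "segment-b" occupies bytes 9 through 17 of the batched file.
scrubbed = zero_byte_range(blob, 9, 17)
```

Because the overwrite keeps the file length unchanged, the byte ranges recorded for the remaining segments stay valid after the new file is uploaded.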