rfc(decision): Batch multiple files together into single large file to improve network throughput #98

Initial commit
cmanallen committed May 24, 2023
commit 7dfe76c17874505a95f9c9821838aea53fd794ac
63 changes: 53 additions & 10 deletions text/0098-store-multiple-replay-segments-in-a-single-blob.md

# Summary

Recording data is sent in segments. Currently each segment is written to its own file. Writing files is the most expensive component of our GCS usage; it is also the most time-consuming step in our processing pipeline. By merging many segment files into a single file we can minimize our costs and maximize our Kafka consumer's throughput.

# Motivation

1. Minimize costs.
2. Improve throughput.

An ancillary goal is to provide the front-end with more precise slices of data. For example, if the client needs all of the network requests for a given replay it can fetch those offsets without fetching the remaining recording data.

# Background

This document exists to inform all relevant stakeholders of our proposal and seek feedback prior to implementation.

# Supporting Data

Google Cloud Storage prices writing and storing data as two separate categories. Writing a file costs $0.005 per 1,000 files. Storing that file costs $0.02 per gigabyte per month. For the average file this works out to $0.000000012 for storage and $0.00000005 for the write.

In practical terms, this means 75% of our spend is allocated to writing new files.

> **Reviewer:** While the cost is mostly due to writes, is this cost problematic at scale? Is it a blocker for reaching higher scale, or are we simply looking for a more efficient option?
>
> **Author (cmanallen):** I do not believe cost will prevent us from reaching greater scale. Write costs scale linearly. Pricing does not scale quite linearly, but if you're happy with how much GCS costs at low levels of demand you will be happy at peak demand.

# Options Considered

First, a new table called "recording_byte_range" is created with the following structure:

| replay_id | segment_id | filename | start | stop |
| --------- | ---------- | -------- | ----- | ----- |
| A | 0 | file.txt | 0 | 6241 |
| B | 0 | file.txt | 6242 | 8213 |
| A | 1 | file.txt | 8213 | 12457 |

This table will need to support, at a minimum, one write per segment. Currently, we receive ~350 segments per second at peak load.
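
To make the table's shape concrete, here is a minimal sketch as a Django model. The field types, the uniqueness constraint, and the model name are illustrative assumptions, not part of the proposal:

```python
# Hypothetical model for the "recording_byte_range" table described above.
# Column names mirror the RFC; everything else is an assumption.
from django.db import models


class RecordingByteRange(models.Model):
    replay_id = models.UUIDField(db_index=True)
    segment_id = models.IntegerField()
    filename = models.TextField()
    start = models.BigIntegerField()  # byte offset where this segment begins
    stop = models.BigIntegerField()   # byte offset where this segment ends

    class Meta:
        db_table = "recording_byte_range"
        unique_together = (("replay_id", "segment_id"),)
```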

Second, the Session Replay recording consumer will no longer _commit_ blob data to GCS for each segment. Instead, it will buffer many segments and flush them together as a single blob to GCS. In this step it will also bulk insert the byte-range rows into the database (see the sketch after the diagram below).

```mermaid
flowchart TD
A[Message Received] --> B[Message Processing]
B --> C[Message Buffer]
C --> D{Has the insert trigger been reached?}
D -- No --> A
D -- Yes --> E[Write Single File to GCS]
E --> F[Bulk Insert Byte Ranges]
```
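
As a rough illustration of the buffer-and-flush step, here is a sketch of the consumer-side logic. The helpers `upload_blob` and `bulk_insert_byte_ranges` are hypothetical, and the size-based flush trigger is an assumption; the real trigger could equally be count- or time-based:

```python
import uuid
from dataclasses import dataclass, field

FLUSH_THRESHOLD_BYTES = 10 * 1024 * 1024  # assumed trigger size


@dataclass
class SegmentBuffer:
    parts: list = field(default_factory=list)  # (replay_id, segment_id, payload)
    size: int = 0

    def push(self, replay_id: str, segment_id: int, payload: bytes) -> None:
        self.parts.append((replay_id, segment_id, payload))
        self.size += len(payload)
        if self.size >= FLUSH_THRESHOLD_BYTES:
            self.flush()

    def flush(self) -> None:
        if not self.parts:
            return
        filename = f"{uuid.uuid4().hex}.blob"
        rows, chunks, offset = [], [], 0
        for replay_id, segment_id, payload in self.parts:
            # One row per segment: (replay_id, segment_id, filename, start, stop).
            rows.append((replay_id, segment_id, filename, offset, offset + len(payload)))
            chunks.append(payload)
            offset += len(payload)
        upload_blob(filename, b"".join(chunks))  # hypothetical: one GCS write for many segments
        bulk_insert_byte_ranges(rows)            # hypothetical: one bulk DB insert
        self.parts, self.size = [], 0
```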

Third, when a client requests recording data we will look it up in the "recording_byte_range" table. From its response, we will issue as many fetch requests as there are rows. These requests may target a single file or many files. Each file will be fetched with an HTTP `Range` header that instructs the service provider to respond with only a subset of the bytes: specifically, the bytes that relate to our replay.

The response bytes will be decompressed, merged into a single payload, and returned to the user as they are now.
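
A minimal sketch of that read path, assuming the `google-cloud-storage` client and zlib-compressed segments (both assumptions for illustration):

```python
import zlib
from google.cloud import storage


def fetch_recording(bucket: storage.Bucket, rows) -> bytes:
    """rows: (filename, start, stop) tuples, ordered by segment_id."""
    segments = []
    for filename, start, stop in rows:
        blob = bucket.blob(filename)
        # Ranged read: only this segment's bytes cross the network. `end` is
        # inclusive in this client, hence the -1 (assuming an exclusive stop).
        data = blob.download_as_bytes(start=start, end=stop - 1)
        segments.append(zlib.decompress(data))
    # Merge into a single payload, as the endpoint returns today.
    return b"".join(segments)
```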

# Drawbacks

- Deleting recording data for a GDPR request, a project deletion, or a user delete request will require downloading the file and overwriting the bytes within the deleted range with null bytes (`\x00`) before re-uploading it (a sketch follows this list).
  - This will reset the retention period.
  - This is an expensive operation and, depending on the size of the project being deleted, a very time-consuming one.
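
For illustration, a sketch of that overwrite flow with the `google-cloud-storage` client; note that it requires a full download and re-upload of the blob:

```python
from google.cloud import storage


def zero_byte_range(bucket: storage.Bucket, filename: str, start: int, stop: int) -> None:
    blob = bucket.blob(filename)
    data = bytearray(blob.download_as_bytes())   # full-file download
    data[start:stop] = b"\x00" * (stop - start)  # blank the deleted range
    blob.upload_from_string(bytes(data))         # full-file re-upload; resets retention
```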

# Unresolved questions

1. Can we keep the data in GCS but make it inaccessible?

- User and project deletes could leave their data orphaned in GCS.
  - We would remove all capability to access it, making it functionally deleted.
- GDPR deletes will likely require overwriting the range, but if they're limited in scope that should be acceptable.
  - For single replays, small projects, or an infrequently used mechanism, this should be a valid deletion strategy.
- The data could be encrypted, with its key stored on the metadata row, making it unreadable upon delete (a sketch follows).
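
A minimal sketch of that encryption option (sometimes called crypto-shredding), using the `cryptography` package purely for illustration:

```python
from cryptography.fernet import Fernet

# Write path: generate a per-segment key and store it on the metadata row.
key = Fernet.generate_key()
ciphertext = Fernet(key).encrypt(b"recording segment bytes")

# Read path: load the key from the row and decrypt the fetched bytes.
plaintext = Fernet(key).decrypt(ciphertext)

# Delete path: drop the key column. The ciphertext remains in GCS but is
# unreadable, which makes the data functionally deleted.
```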

2. What datastore should we use to store the byte range information?

- Cassandra, Postgres, AlloyDB?
- Postgres likely won't be able to keep up long-term.
  - Especially if we write multiple byte ranges per segment.
- Cassandra could be a good choice, but it's not clear what operational burden this imposes on SnS and Ops.
- AlloyDB seems popular among the SnS team and could be a good choice.
  - It can likely interface with the Django ORM, though that's not clear to me at the time of writing.
- Whatever database we use must support deletes.