rfc(decision): Batch multiple files together into single large file to improve network throughput #98

Open · wants to merge 21 commits into main

Changes from 1 commit
Add high-level overview
cmanallen committed Jun 8, 2023
commit 82f222da818902458342a709868acaa74153b228
22 changes: 21 additions & 1 deletion text/0098-store-multiple-replay-segments-in-a-single-blob.md
@@ -10,7 +10,7 @@ Recording data is sent in segments. Each segment is written to its own file. Wri
# Motivation

1. Minimize costs.
2. Improve throughput.
2. Increase write throughput.
3. Enable new features in a cost-effective manner.

# Background
@@ -23,6 +23,26 @@ Google Cloud Storage lists the costs for writing and storing data as two separat

In practical terms, this means 75% of our spend is allocated to writing new files.
cmanallen marked this conversation as resolved.


While the cost is mostly due to writes, is this cost problematic at scale? Is it a blocker for reaching higher scale, or are we simply looking for a more efficient option?

Member Author


I do not believe cost will prevent us from reaching greater scale. Write costs scale linearly. Pricing does not scale quite linearly, but if you're happy with how much GCS costs at low levels of demand, you will be happy at peak demand.


# High Level Overview

1. We will store multiple "parts" per file.
- A "part" is a distinct blob of binary data.
- It exists as a subset of bytes within a larger set of bytes (referred to as a "file").
- A "part" could refer to a replay segment or to a sourcemap or anything that requires storage in a blob storage service.
2. Each "part" within a file will be encrypted.


Since encryption is used here entirely to make deletion of individual parts quicker, could you please expand on:

  • whether we are confident that this is a use case worth optimizing for (deleting individual parts quickly)
  • what the process would look like if we simply accepted rewriting files entirely in order to remove a part.

Member Author


> whether we are confident that this is a use case worth optimizing for (deleting individual parts quickly)

It is necessary to support GDPR, project, and point deletes. I consider it a high priority. The alternative is rotating the file with the offending parts removed.

> what the process would look like if we simply accepted rewriting files entirely in order to remove a part.

The operations, in no particular order:

  1. Download the file
  2. Get the active byte ranges from the database.
  3. Remove all byte ranges from the file not found in the set of returned byte ranges.
  4. Delete all references to the file in the database.
  5. Upload the new file.
  6. Insert new offset rows.
  7. Delete the old file.

Repeat for every delete operation. Deletes must be single-threaded per file to prevent concurrent access; you can use Kafka and partition on filename.
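
As an illustration, here is a minimal sketch of that rewrite loop under the process above. Every helper name (`download_blob`, `fetch_active_ranges`, `delete_file_references`, `upload_blob`, `insert_offset_rows`, `delete_blob`) is a hypothetical stand-in for the real GCS and database calls:

```python
# Hypothetical sketch of the rewrite-based delete described above.
def rewrite_file_without_deleted_parts(filename: str) -> None:
    old_bytes = download_blob(filename)              # 1. Download the file.
    active_ranges = fetch_active_ranges(filename)    # 2. Active byte ranges from the database.

    # 3. Keep only the bytes covered by an active range, recording the new offsets.
    new_bytes = bytearray()
    new_offsets = []
    for part_id, start, stop in active_ranges:
        new_start = len(new_bytes)
        new_bytes += old_bytes[start:stop]
        new_offsets.append((part_id, new_start, len(new_bytes)))

    delete_file_references(filename)                 # 4. Delete all references to the old file.
    new_filename = upload_blob(bytes(new_bytes))     # 5. Upload the new file.
    insert_offset_rows(new_filename, new_offsets)    # 6. Insert new offset rows.
    delete_blob(filename)                            # 7. Delete the old file.
```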

- Encryption provides instantaneous deletes (by deleting the row containing the encryption key) and eliminates the need to strip deleted byte sub-sequences out of a blob (see the sketch after this list).
- We will use envelope encryption to protect the contents of every file.
- https://cloud.google.com/kms/docs/envelope-encryption
- Related, contiguous byte ranges will be encrypted independently of the rest of the file.
- We will use KMS to manage our key-encryption-keys.

@fpacifici (Jul 12, 2023)


KMS?
Are you talking about GCP KMS?

Member Author


Google Key Management Service. Subject to change.

- Data-encryption-keys will be generated locally and will be unique.
3. Parts will be tracked in a metadata table on one or more AlloyDB instances.
- A full table schema is provided in the **Proposal** section.
- AlloyDB was chosen because it's a managed database with strong point-query performance.
- The metadata table will contain the key used to decrypt the byte range.
4. On read, parts will be fetched without fetching the full file.
- More details are provided in the **Technical Details** section.
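
To make the overview concrete, here is a minimal sketch of the write, read, and delete paths it implies. All helper names (`save_part_row`, `load_part_row`, `delete_part_row`, `kms_wrap_key`, `kms_unwrap_key`, `download_byte_range`) are hypothetical stand-ins for the metadata table and the blob-storage/KMS clients, and AES-GCM from the `cryptography` package is only one possible choice of local cipher:

```python
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM


def write_part(buffer: bytearray, filename: str, part_id: str, data: bytes) -> None:
    # Envelope encryption: a unique, locally generated data-encryption key (DEK)
    # encrypts the part; the KMS-managed key-encryption key wraps the DEK.
    dek = AESGCM.generate_key(bit_length=256)
    nonce = os.urandom(12)
    ciphertext = nonce + AESGCM(dek).encrypt(nonce, data, None)

    start = len(buffer)  # byte offset of this part within the larger file
    buffer += ciphertext
    save_part_row(       # one row per part in the metadata table
        part_id=part_id,
        filename=filename,
        start=start,
        stop=len(buffer),
        wrapped_dek=kms_wrap_key(dek),
    )


def read_part(part_id: str) -> bytes:
    # Ranged read: fetch only this part's bytes, never the whole file.
    row = load_part_row(part_id)
    ciphertext = download_byte_range(row.filename, row.start, row.stop)
    dek = kms_unwrap_key(row.wrapped_dek)
    return AESGCM(dek).decrypt(ciphertext[:12], ciphertext[12:], None)


def delete_part(part_id: str) -> None:
    # Deleting the row destroys the only copy of the wrapped DEK, so the part's
    # bytes in blob storage become unreadable without rewriting the file.
    delete_part_row(part_id)
```

Note that on delete the blob itself never changes; only the metadata row does.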

# Proposal

First, a new table called "file_part_byte_range" with the following structure is created: