- Start Date: 2023-07-13
- RFC Type: informational
- RFC PR: https://github.com/getsentry/rfcs/pull/108
- RFC Status: draft

# Summary

One of the systems that Sentry internally operates today is an abstract concept referred
to as "file store". It consists of Postgres-level infrastructure to refer to blobs and
a Go service, also called "file store", which acts as a stateful proxy in front of GCS to
deal with latency spikes, write throughput, and caching.

This RFC summarizes the issues with the current approach and the changed requirements
for this system, and proposes a path forward.

# Motivation

Various issues have occurred over the years with this system, and some of the decisions
made to mitigate them have, over time, resulted in new requirements for filestore and in
alternative implementations. Replay, for instance, operates a separate infrastructure that
goes straight to GCS, but is running into write throughput issues that the file store Go
service solves. On the other hand, race conditions and complex blob bookkeeping in Sentry
itself prevent expiring debug files and source maps after a period of time.

The motivation of this RFC is to summarize the current state of affairs and the work
streams that are currently planned or in motion, in order to come to a better conclusion
about what should be done with the internal abstractions and how they should be used.

# Background

The primary internal abstraction in Sentry today is the `filestore` service, which itself
is built on top of Django's `files` system. At this level, "files" have names and are
stored in a specific GCS bucket (or an alternative backend). On top of that, the `files`
models are built. There, each file is composed of blobs, and each blob is stored
(deduplicated) just once in the `filestore` backend.

For this purpose, each blob is given a unique filename (a UUID). Blobs are deduplicated
by content hash and stored only once. This poses a challenge for the system: the deletion
of blobs has to be driven by the application itself, since auto-expiration is no longer
possible.
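
For orientation, the relationships between these models look roughly like the following.
This is a simplified, illustrative sketch using plain Django fields; the real Sentry models
carry more fields and use Sentry-specific field helpers:

```python
from django.db import models


class FileBlob(models.Model):
    path = models.TextField(null=True)  # UUID-based object name in the GCS bucket
    checksum = models.CharField(max_length=40, unique=True)  # deduplication key
    size = models.PositiveIntegerField(null=True)


class File(models.Model):
    name = models.TextField()
    # A file is stitched together from deduplicated blobs via an index table.
    blobs = models.ManyToManyField(FileBlob, through="FileBlobIndex")


class FileBlobIndex(models.Model):
    file = models.ForeignKey(File, on_delete=models.CASCADE)
    blob = models.ForeignKey(FileBlob, on_delete=models.CASCADE)
    offset = models.PositiveIntegerField()  # position of the blob within the file


class FileBlobOwner(models.Model):
    # Tracks which organizations still reference a blob so that it is only
    # deleted once the last reference is gone.
    blob = models.ForeignKey(FileBlob, on_delete=models.CASCADE)
    organization_id = models.BigIntegerField(db_index=True)
```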

# Supporting Data

We currently store petabytes of file assets that we would like to delete.

# Possible Changes

These are some possible changes that could improve the system:

## Removal of Blob Deduplication

Today it's not possible for us to use GCS-side expiration. That's because without
knowledge of blob usage from the database it's not safe to delete blobs. This
can be resolved by removing deduplication: blobs would then be written more than once.
This works on the `filestore` level, but it does not work on the `FileBlob` level.
However, `FileBlob` itself is rather well abstracted away from most users. A new model
could be added to replace the old one. One area where `FileBlob` leaks out is the
data export system, which would need to be considered.

`FileBlobOwner` itself could be fully removed, and the same goes for `FileBlobIndex`:
once deduplication is removed, the owner information is no longer needed, and the index
information can be stored on the blob itself.

```python
# Imports shown for completeness; the exact import paths within Sentry may differ.
from django.db.models import CharField, DateTimeField, TextField
from django.utils import timezone

from sentry.db.models import BoundedBigIntegerField, BoundedPositiveIntegerField, Model


class FileBlob2(Model):
    organization_id = BoundedBigIntegerField(db_index=True)
    path = TextField(null=True)
    offset = BoundedPositiveIntegerField()
    size = BoundedPositiveIntegerField()
    checksum = CharField(max_length=40, unique=True)
    timestamp = DateTimeField(default=timezone.now, db_index=True)
```

## TTL Awareness

The abstractions in place today do not have any support for storage classes. Once blobs
are no longer deduplicated, however, it becomes possible to fully rely on GCS to clean up
on its own. Because certain operations go through our filestore proxy service, it would be
preferable if the policies were encoded into the URL in one form or another.
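
As a rough sketch of what that could look like (the policy names, path layout, and helper
below are illustrative assumptions rather than a settled design), the retention policy
could be encoded as a path prefix that both a GCS lifecycle rule and the filestore proxy
can act on:

```python
from enum import Enum
from uuid import uuid4


class RetentionPolicy(Enum):
    # Hypothetical policy names; the actual set would be defined per product area.
    PERMANENT = "permanent"
    NINETY_DAYS = "90d"
    THIRTY_DAYS = "30d"


def blob_path(organization_id: int, policy: RetentionPolicy) -> str:
    """Build a blob path whose prefix encodes the retention policy.

    A GCS lifecycle rule matching e.g. the "30d/" prefix can then expire objects
    without any database bookkeeping, and the filestore proxy can read the policy
    directly from the URL it is asked to serve.
    """
    return f"{policy.value}/{organization_id}/{uuid4().hex}"
```

For example, `blob_path(42, RetentionPolicy.THIRTY_DAYS)` would yield something like
`30d/42/<uuid>`, so a single lifecycle rule per prefix covers all organizations.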

## Assemble Staging Area

The chunk upload today depends on the ability to place blobs, one by one, somewhere. Once
blobs are stored as regular objects in GCS, there is no significant reason to slice them up
into small pieces, as range requests are possible. This means that the assembly of files
needs to be reconsidered.

The easiest solution here would be to allow chunks to be uploaded to a per-org staging area
where they linger for up to two hours per blob. That gives plenty of time to use these blobs
for assembly. A cleanup job (or a TTL policy if they are placed in GCS) would then collect
the leftovers automatically. This also decouples external blob sizes from internal blob
storage, which gives us the ability to change blob sizes as we see fit.
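
A minimal sketch of what assembly from such a staging area could look like, assuming GCS
object composition is used (the staging prefix, function name, and TTL handling are
assumptions; the real implementation might instead stream and re-chunk through the
filestore proxy):

```python
from google.cloud import storage

STAGING_PREFIX = "staging"  # hypothetical per-org staging prefix, cleaned up by a ~2h TTL


def assemble_file(
    bucket: storage.Bucket,
    organization_id: int,
    chunk_checksums: list[str],
    target_path: str,
) -> None:
    """Assemble a final object from chunks previously uploaded to the staging area."""
    chunks = [
        bucket.blob(f"{STAGING_PREFIX}/{organization_id}/{checksum}")
        for checksum in chunk_checksums
    ]
    # GCS composes up to 32 source objects per call; larger files would need
    # iterative composition or a different assembly strategy.
    bucket.blob(target_path).compose(chunks)
```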

# Unresolved questions

TBD