Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rfc(informational): Combined Replay Envelope Item #82

Merged
merged 9 commits into from
Jun 9, 2023

Conversation

JoshFerge
Copy link
Member

@JoshFerge JoshFerge commented Mar 24, 2023

@JoshFerge JoshFerge force-pushed the rfc/combined-replay-envelope-item branch from 22415f0 to 050fac5 Compare March 24, 2023 20:16
@JoshFerge JoshFerge marked this pull request as ready for review April 11, 2023 19:06
@JoshFerge JoshFerge requested a review from a team April 11, 2023 19:06
@JoshFerge JoshFerge requested a review from a team April 20, 2023 20:05
- ReplayEvent contains indexable metadata that is stored in our clickhouse table.
- ReplayRecording contains a large, usually compressed blob that is written to our Filestore (GCS as of now)

These are always sent in the same envelope by the sdk. We'd like to combine these two envelope item types into one.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are some very old SDKs from alpha still out there that sent two separate requests. As far as I'm aware none of these SDKs are paying customers and they may have stopped ingesting totally.

I think its worthwhile documenting what happens in the event of a missing envelope item. I assume we should 400 or otherwise reject in some meaningful way. Something that tells the client to stop sending.

text/0082-combined-replay-envelope-item.md Outdated Show resolved Hide resolved

### Replay Recording Consumer Changes

1. Create a new ReplayRecording consumer that can run along-side the existing consumer, as there will be a change-over period
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need a new consumer. We can switch on the type field in the kafka payload. Right now the struct name is converted to snake-case and applied as the "type" value of the serialized object. We would update our consumer to accept the new type and then process accordingly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it. 👍🏼


# Unresolved questions

- Is there a better format for the combined envelope item type? If we split on \n, this means the replay_event should never have a newline in it. I believe this is acceptable, but is there a better format for sending a combined JSON / binary piece of data?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caught this at the very end. I'm leaving my previous feedback in place :P

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tags could have a new-line character.

# Unresolved questions

- Is there a better format for the combined envelope item type? If we split on \n, this means the replay_event should never have a newline in it. I believe this is acceptable, but is there a better format for sending a combined JSON / binary piece of data?
- If rate limits are applied before processing, it seems like we'll need to add a new rate limit for this combined item. This should be okay, anything else to think about here?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand. If rate-limits are applied before processing then this merged type doesn't exist yet.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, this would only be the case when people are on new SDKs that send the proposed new combined envelope format

- The combined Payload will have the format of

```
ReplayEventJSON\nCompressedReplayRecording
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of making this a newline-separated string, could we also send this as a JSON like this:

{
  event: ReplayEventJSON,
  recording: CompressedReplayRecording
}

Not sure about implications for parsing this, but from an SDK perspective this seems a bit nicer/safer than string-concatting this. But not a hard feeling, just a thought.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a wait, I guess we can't put the compressed stuff into JSON 🤔 but now that I think about this, not sure if we can concat the UInt8Array that is the compressed payload onto a string as well 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this merged type is generated on the client then you will need to serialize with a binary format like msgpack.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, I see. I guess we can do that, convert the payload into UInt8Array if it isn't compressed, then combine them together into a single UInt8Array.

One downside to note here is that it may make debugging harder - I currently tend to disable compression when debugging issues, in order to see what data is sent. This would not be possible anymore (as the payload would then always be UInt8Array). Not a blocker, just to be aware of this downside!

@JoshFerge
Copy link
Member Author

JoshFerge commented May 25, 2023

update: in the RFC I proposed a combined format using a newline. I'd like to propose an alternative format of which we transmit it on the backend.

The Replays Recording Consumer expects MessagePack bytes in the payload field. I propose that in our processor, we combine the ReplayEvent and ReplayRecording into a single messagepack payload, and change the content type to ContentType::MessagePack.

The messagepack payload will be a key value object like:

{
  replay_event: {...}
  replay_recording: _binary_data_ or uncompressed data
  replay_recording_headers {}
}

In our recordings consumer, we'll parse the message pack which will then give us a dict with these values.

Depending on whether or not we can specify the entire kafka message be compressed or not, we can set the replay recording to the uncompressed value, or as a byte array.

We can also take this opportunity to put the parsed recording headers in the item in relay. This will remove another complication downstream in the consumer.

We could either emit these events onto a new topic with a new consumer, or have the existing consumer use a field to know whether or not the payload is a combined one or not. I think it would be easy to add a field onto the kafka message, like

"combined_event_and_recording": True

I will re-evaulate whether or not its worth it to make SDK changes to combine the envelope client side, but server side I feel good about this solution.

@cmanallen
Copy link
Member

cmanallen commented May 26, 2023

@JoshFerge

I think it would be easy to add a field onto the kafka message, like

Agree. Though I think a version field is better. Since we don't have one currently we can conditionally check if the version field exists and if it does deserialize with the new msgpack format. In the future, we can increment the version number to handle new payload types.

@JoshFerge
Copy link
Member Author

@cmanallen great, i'll get started on this today.

Copy link
Member

@bruno-garcia bruno-garcia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Let's do this! Left a note on future considerations regarding SDK changes but doesn't affect anything agreed here


### Replay Recording Consumer Changes

1. Look at the version of the payload to determine if its the new event, and if so, handle additional work for the ReplayEvent, and load the SDK contextual data into the ReplayRecording for events emitted from it.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
1. Look at the version of the payload to determine if its the new event, and if so, handle additional work for the ReplayEvent, and load the SDK contextual data into the ReplayRecording for events emitted from it.
1. Look at the version of the payload to determine if it's the new event, and if so, handle additional work for the ReplayEvent, and load the SDK contextual data into the ReplayRecording for events emitted from it.

These are always sent in the same envelope by the sdk. We'd like to combine these two envelope item types into one.

We'd first write a compatibility layer in relay that takes the old format and converts them into a single item type.
We'd then modify the sentry-sdk to send combined item types going forward
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to consider compatibility with Self Hosted (and Single Tenant) before we change the SDK

@JoshFerge JoshFerge merged commit a1e97e7 into main Jun 9, 2023
2 checks passed
@JoshFerge JoshFerge deleted the rfc/combined-replay-envelope-item branch June 9, 2023 17:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants