Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(replay): combined envelope items #2170

Closed
wants to merge 15 commits into from

Conversation

JoshFerge
Copy link
Member

@JoshFerge JoshFerge commented May 31, 2023

Implements the Combined Replay Envelope item as specified here: getsentry/rfcs#82

  • New ItemType
  • New processing function that takes the two items, combines them in a MsgPack, and re-inserts the item into the envelope
    • This just uses msgpack keys and bytes as the values. Attempting to serialize the nested key/values of the messages proved problematic as Annotated cannot be easily serialized to messagepack.
  • New logic in store to handle this new ItemType. It will emit to a topic already in use.

Other Changes:

  • ReplayEvent can no longer "create an event" as this was putting it in a separate envelope for processing. The Envelope containing a replay should always contain an event_id, so this shouldn't be a problem.

Open Questions:

I'm not quite sure how to think about processing vs non-processing relays and this feature. This is something I could use ingests help on, and I have a commented out test for confirming this behavior once we know what we want. e.g.

  • Should this only run in processing mode? If not, is there anything else we need to add here?

@JoshFerge JoshFerge force-pushed the jferg/combine-replay-env-items branch from 98494cf to e775838 Compare May 31, 2023 22:53
Copy link
Member

@cmanallen cmanallen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know you explicitly asked for no review which is why this isn't a review! It just looks like one :P

These are just general comments from looking at the code. No questions. No calls for action. Do your thing and let me know if you want me to help at all.

relay-server/src/actors/store.rs Show resolved Hide resolved
relay-server/src/actors/processor.rs Outdated Show resolved Hide resolved
@bruno-garcia
Copy link
Member

Taking note that we'll rely on this change to have the data we need in order to create Dead Click events:

@JoshFerge JoshFerge marked this pull request as ready for review June 7, 2023 01:22
@JoshFerge JoshFerge requested a review from a team June 7, 2023 01:22
@iker-barriocanal
Copy link
Contributor

Hi @JoshFerge! I see getsentry/rfcs#82 is not merged yet so I assume there's still work pending, such as this question. Could we first agree on the RFC before jumping into the implementation?

@bruno-garcia
Copy link
Member

bruno-garcia commented Jun 7, 2023

Hi @JoshFerge! I see getsentry/rfcs#82 is not merged yet so I assume there's still work pending, such as this question. Could we first agree on the RFC before jumping into the implementation?

Iker the comment is about the sdk change and won't affect this PR. Can we get a hand here to help clarify Josh's questions?
We'll merge the rfc in parallel.

Specifically:

Open Questions:

I'm not quite sure how to think about processing vs non-processing relays and this feature. This is something I could use ingests help on, and I have a commented out test for confirming this behavior once we know what we want. e.g.

Should this only run in processing mode? If not, is there anything else we need to add here?

@cmanallen
Copy link
Member

@JoshFerge Updated the pull. We're now splitting the headers and the body. I've also added error handling around serialization failure. There are comments which should explain the intent of the changes. My thought is if the envelope combining process fails we should default to the old style (e.g. two, un-merged envelope items).

@JoshFerge
Copy link
Member Author

Hi @JoshFerge! I see getsentry/rfcs#82 is not merged yet so I assume there's still work pending, such as this question. Could we first agree on the RFC before jumping into the implementation?

Sorry, I wasnt sure how to update RFCs based off of a discussion, but see this comment from me last week, guiding this implementation: getsentry/rfcs#82 (comment). i'll go ahead and update the main Doc.

Copy link
Contributor

@iker-barriocanal iker-barriocanal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About processing vs non-processing relays (docs), the general approach should be to not target processing actions to processing relays unless there's a reason not to. Non-processing relays include our internal PoP Relays, but also external relays run by customers (including sidecars).

I have only skimmed through the PR, leaving a few questions.

Comment on lines +66 to +68
// The Combined Replay Envelope isn't generated on the client so its size does not need
// to be checked.
ItemType::CombinedReplayEventAndRecording => (),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if envelopes are generated in Relays, size requirements may vary and we don't have control over external relays, which could be running an older version and sending bigger envelopes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gotcha, will add a size here.

@@ -743,13 +752,14 @@ impl Item {
ItemType::RawSecurity => true,
ItemType::UnrealReport => true,
ItemType::UserReport => true,
ItemType::ReplayEvent => true,
ItemType::ReplayEvent => false,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ReplayEvent can no longer "create an event" as this was putting it in a separate envelope for processing. The Envelope containing a replay should always contain an event_id, so this shouldn't be a problem.

Should we expect older SDKs to continue sending ReplayEvents? This change seems like a breaking change for users not upgrading their SDKs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe this is a breaking change, as ReplayEvents and ReplayRecordings are always in the same envelope, which has an event_id envelope header.

relay-dynamic-config/src/feature.rs Show resolved Hide resolved
Copy link
Member

@jjbayer jjbayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unclear on whether or not Relay should be able to handle combined items coming from SDKs:

We'd then modify the sentry-sdk to send combined item types going forward

(rfc)

Envelopes will always arrive in an uncombined state.

Primary concern is: in this PR's implementation combined items will bypass data scrubbing, because they are not considered by process_replays. So if we expect SDKs to send combined items, we have to split and re-join these combined items in relay to do the scrubbing.

Should this only run in processing mode? If not, is there anything else we need to add here?

I think it makes sense to run this only in processing mode for now, for two reasons:

  1. In general, it is easier to iterate on experimental changes if they are not performed by external relays, which might run outdated versions.
  2. More specifically, if we change the data scrubbing logic for replays, items combined by outdated external relays would bypass this updated logic (external relays play a role similar to SDKs in this regard).

To do so, wrap the call to process_replays_combined_item in a if_processing!({ condition.

relay-server/src/envelope.rs Show resolved Hide resolved
fn process_replays_combine_items(
&self,
state: &mut ProcessEnvelopeState,
) -> Result<(), ProcessingError> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this function does not fail, no need to return a Result.

Suggested change
) -> Result<(), ProcessingError> {
)

relay-server/src/actors/processor.rs Outdated Show resolved Hide resolved
relay-server/src/actors/processor.rs Outdated Show resolved Hide resolved
relay-server/src/actors/processor.rs Outdated Show resolved Hide resolved
relay-server/src/actors/processor.rs Outdated Show resolved Hide resolved
relay-dynamic-config/src/feature.rs Show resolved Hide resolved
Comment on lines 1300 to 1302
let (headers, body) =
split_headers_from_body(replay_recording_item.payload().into())
.expect("This is pre-validated and should not fail.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of performing this split twice (once in process_replays and once here), I would unite the two functions and serialize either back into a recording item or into a message pack item:

flowchart TD
    A[event, recording] -->|split| B[event, header, body]
    B -->|scrub| C[event, header, body]
    C --> D{combine?}
    D -->|yes| E[MsgPack: event, header, body]
    D -->|no| F[event, recording]
Loading

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jjbayer Whats the best way to emit outcomes for dropped items? The only way I've found to do it is with retain_items but it seems like that method won't work for this use case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could manually call the function that retain_items calls:

fn track_outcome(&self, outcome: Outcome, category: DataCategory, quantity: usize) {
self.outcome_aggregator.send(TrackOutcome {
timestamp: self.received_at(),
scoping: self.context.scoping,
outcome,
event_id: self.envelope.event_id(),
remote_addr: self.meta().remote_addr(),
category,
// Quantities are usually `usize` which lets us go all the way to 64-bit on our
// machines, but the protocol and data store can only do 32-bit.
quantity: quantity as u32,
});
}

Copy link
Member

@cmanallen cmanallen Jun 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jjbayer So having done this change (e2782b8) I feel it was not worth it. I'd rather eat the cost of reading <20 bytes. In the original code, there may have been some memory concerns where we drop the merged value in exchange for the split values. That could have had implications (since we're doing it twice) but I leave it to you to decide if its worth it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jjbayer I am out for a little while. Josh Ferge will be re-taking over the PR so please direct feedback items to him. Thank you.

@cmanallen
Copy link
Member

cmanallen commented Jun 12, 2023

So if we expect SDKs to send combined items, we have to split and re-join these combined items in relay to do the scrubbing.

Is this true? We call process_replays prior to process_replays_combine_items. process_replays should mutate the envelope items prior to combination?

@cmanallen Prior to combination, yes. But if the SDK sends a CombinedReplayEventAndRecording, process_replays will ignore it because it is neither ReplayEvent nor ReplayRecording, right?

@cmanallen
Copy link
Member

Prior to combination, yes. But if the SDK sends a CombinedReplayEventAndRecording, process_replays will ignore it because it is neither ReplayEvent nor ReplayRecording, right?

@jjbayer The SDK won't send that type so no worries there.

Copy link
Member

@jjbayer jjbayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So having done this change (e2782b8) I feel it was not worth it. I'd rather eat the cost of reading <20 bytes.

@cmanallen I see your point that it does not improve readability, even though the data flow looks cleaner to me. Feel free to revert that change, as long as we don't .unwrap() the second split result.

Still pressing "Request changes" because it looks like process_replays and process_replays_combine_items contain duplicate code right now (e.g. process_recording).


// If combined envelope processing is not enabled default to the original workflow.
if !project_state.has_feature(Feature::SessionReplayCombinedEnvelopeItems) {
return self.process_replays(state);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we're calling process_replays twice in this case? Should we remove the call to process_replays here:

self.process_replays(state)?;
self.process_replays_combine_items(state)?;

@TBS1996
Copy link
Contributor

TBS1996 commented Aug 16, 2023

@JoshFerge Do you still want to work on this? If not we will close it.

@cmanallen
Copy link
Member

@TBS1996 Sorry we've let this sit for a little bit. Josh and I will sync and see if we need to pursue this still. We'll close it out if so otherwise we'll get it back up to date.

@jan-auer
Copy link
Member

Please feel free to reopen this, once you plan to pursue it. Thanks!

@jan-auer jan-auer closed this Sep 15, 2023
@jan-auer jan-auer deleted the jferg/combine-replay-env-items branch September 15, 2023 11:00
cmanallen added a commit that referenced this pull request Feb 15, 2024
Reopening #2170

- We originally closed it because there wasn't a clear use case. Now
there is. getsentry/sentry#60826


I've resolved merge conflicts, tests pass, and I believe comments from
the previous PR have been addressed.

---------

Co-authored-by: Joris Bayer <[email protected]>
Co-authored-by: Colton Allen <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants