rfc(decision): Mobile - Tracing Without Performance V2 #136

philipphofmann · 2024-06-04T08:45:42Z

This RFC aims to find a strategy to update the `traceId` so that traces don’t reference hundreds of unrelated
events.

kahest

Left some first thoughts. In general I don't see a lot of different options here. IMO it's mostly a combination of screen awareness, foreground/background handling, and maybe some timeouts, so pretty much exactly your option 1.

kahest · 2024-06-04T11:50:13Z

text/0136-mobile-tracing-without-performance-v-2.md

+1. For single-screen applications such as social networks, the lifetime of a trace could still be
+long, and multiple unrelated events could be mapped to one trace.


IIUC this is in line with the JS implementation, and something we can accept for now. If we get clear pointers that we need to address this (and how), we can iterate.

A manual API to renew the traceId could help with these edge cases.

fyi, we have TracingUtils.startNewTrace in Java (should probably be exposed through the static API though)

kahest · 2024-06-04T12:23:03Z

text/0136-mobile-tracing-without-performance-v-2.md

+3. It doesn’t work well for declarative UI frameworks as Jetpack Compose and SwiftUI for which the
+SDKs can’t reliably automatically detect when apps load a new screen.


For JPC there might be ways to make the 80% case work with some KCP magic, I guess (@markushi @romtsn wdyt?). But SwiftUI is a closed box, and we'll need to mitigate this by providing clear instructions on where to, e.g., wrap something with an SDK method and/or use a manual API.

bitsandfoxes · 2024-06-04T17:04:46Z

What about additionally tying it to foreground/background behaviour like sessions? It doesn't have to be only one rule, right? It could be a set of rules and conditions that lead to the creation of a new traceId.
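A minimal sketch of what such a rule set could look like, written in Python for brevity rather than in any SDK's language. `TraceLifecycle` and its methods are hypothetical names, not an existing Sentry API; the 30-second threshold is taken from the RFC text quoted further down in this thread.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional

# Assumption: same threshold the RFC text uses for background time.
BACKGROUND_THRESHOLD_SECONDS = 30.0


def new_trace_id() -> str:
    # 32 lowercase hex characters.
    return uuid.uuid4().hex


@dataclass
class TraceLifecycle:
    """Hypothetical holder that renews the trace ID when any rule fires."""

    trace_id: str = field(default_factory=new_trace_id)
    current_screen: Optional[str] = None
    backgrounded_at: Optional[float] = None

    def renew(self) -> str:
        # Manual escape hatch, e.g. for single-screen apps.
        self.trace_id = new_trace_id()
        return self.trace_id

    def on_screen_loaded(self, screen: str) -> None:
        # Rule 1: a different screen starts a new trace.
        if screen != self.current_screen:
            self.current_screen = screen
            self.renew()

    def on_background(self, now: float) -> None:
        self.backgrounded_at = now

    def on_foreground(self, now: float) -> None:
        # Rule 2: renew only if the app was in the background
        # for longer than the threshold.
        if (
            self.backgrounded_at is not None
            and now - self.backgrounded_at > BACKGROUND_THRESHOLD_SECONDS
        ):
            self.renew()
        self.backgrounded_at = None
```

Each rule is independent, so further conditions (timeouts, for instance) could be added as extra methods that call `renew()`.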

markushi · 2024-06-07T07:37:17Z

text/0136-mobile-tracing-without-performance-v-2.md

+3. It doesn’t work well for declarative UI frameworks as Jetpack Compose and SwiftUI for which the
+SDKs can’t reliably automatically detect when apps load a new screen.


I think the easiest way to get started is to provide a wrapper, ~SentryScreen() { <Composable> }. In a later version we could have a @SentryScreen method annotation that uses KCP to auto-wrap it.

buenaflor · 2024-06-07T11:09:05Z

text/0136-mobile-tracing-without-performance-v-2.md

+### Pros <a name="option-1-pros"></a>
+
+1. Similar to [JavaScript]((https://github.com/getsentry/sentry-javascript/issues/11599)) updating
+it based on routes, so it should be easy to implement for React-Native.


For Flutter, at the very least, we would need users to add our SentryNavigatorObserver and then set a route name, e.g.:

    navigatorObservers: [ SentryNavigatorObserver() ],

    MaterialPageRoute(
      settings: const RouteSettings(name: 'AutoCloseScreen'),
      builder: (context) => const AutoCloseScreen(),
    ),

Would the default behaviour remain as it is right now if a user didn't use the navigator observer on Flutter?

What's the current default behavior in Flutter, @buenaflor?

There is a similar issue with React Native. We don't have the screen/route information without the performance instrumentation (ReactNavigation or ReactNativeNavigation).

We could update the auto instrumentation to work without performance (without creating spans), or get some signal of change from native.

Or we could have a public API to renew the traceId.

Lms24 · 2024-06-10T10:14:07Z

text/0136-mobile-tracing-without-performance-v-2.md

+3. It doesn’t work well for declarative UI frameworks as Jetpack Compose and SwiftUI for which the
+SDKs can’t reliably automatically detect when apps load a new screen.
+
+# Drawbacks


We were recently made aware that another implication of long-running traces is that they potentially increase transaction/span quota usage. This is because in JS we inherit the sampling decision for the trace in subsequent transactions.

For example:

1. Pageload transaction is sampled by rolling the dice
2. PropagationContext stores positive sampling decision
3. Interaction transaction is started
4. Interaction transaction is sampled because the propagation context already holds a positive sampling decision
5. Repeat for every started transaction until next pageload or navigation

So either we accept this and move on for now by continuing with this behaviour or we break trace consistency by again rolling the dice for new root spans/transactions.
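The inheritance described in the steps above can be sketched as follows. This is an illustrative Python model, not the JS SDK's code; `PropagationContext` here is a simplified stand-in and `start_transaction` is a hypothetical helper.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class PropagationContext:
    """Simplified stand-in for the SDK's propagation context."""

    trace_id: str
    sampled: Optional[bool] = None  # None: no decision made yet


def start_transaction(
    ctx: PropagationContext, sample_rate: float, rng: random.Random
) -> bool:
    """Return the sampling decision for a new root transaction.

    The dice are rolled only while the context holds no decision;
    afterwards every transaction in the trace inherits the stored one.
    """
    if ctx.sampled is not None:
        return ctx.sampled
    ctx.sampled = rng.random() < sample_rate
    return ctx.sampled
```

With this model, one positive roll for the pageload means every later transaction in the same trace is sampled too, regardless of `sample_rate`. A negative roll is inherited in the same way.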

@Lms24, please clarify why this wasn't a problem before. I don't understand how this proposed change here will cause this.

I might be missing a bit of context around how Tracing without Performance and the PropagationContext are implemented in mobile SDKs.

If the proposed change is purely intended for TwP scenarios and does not affect the overall trace lifetime in case a root span/transaction is started ("tracing with performance") I think we're good. That is because for TwP, we defer the sampling decision to the downstream service (i.e. send sentry-trace headers without a sampled flag).

In JS, however, we changed the trace lifetime not just for TwP but in general, leading to scenarios like the one above. To further illustrate why this is problematic, I'm going to adjust the example above a bit:

1. Initial pageload transaction is sampled by rolling the dice.
2. PropagationContext stores positive sampling decision.
3. The application, still on the same page but after the pageload span ended, makes an HTTP request to a downstream service and propagates the sentry-trace header with the positive sampling decision, forcing the downstream service to positively sample its transaction.
4. Repeat step 3 many times (e.g. an application auto-refreshing some state every 5s) and you have a lot of sampled transactions because one initial transaction was sampled positively in the FE.

So even without an active transaction, we'd still propagate a forced sampled flag to downstream services.

Does this make sense?

Thanks for the detailed explanation. Yes, that makes sense, but I guess in the long run you should have a roughly equal number of transactions. It shouldn't matter if you roll the dice once for 10 transactions or every time for each transaction. If you roll the dice often enough, an equal number of transactions should be captured.

Not necessarily, unfortunately. This would only hold if the sample rates on client and server were the same. If users have lower sample rates on the server, they would send significantly fewer server-side transactions with the previous implementation.

I tried verifying this with a small script: https://gist.github.com/Lms24/9a631295aef58cf22fb8f5307953335c
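The argument can also be checked with a small simulation. The following is an independent Python sketch and not the content of the linked gist; the function name, parameters, and the model (one client-side decision per pageload, a fixed number of downstream requests per page) are assumptions made for illustration.

```python
import random


def simulate_server_transactions(
    pageloads: int,
    requests_per_page: int,
    client_rate: float,
    server_rate: float,
    inherit: bool,
    seed: int = 0,
) -> int:
    """Count sampled server-side transactions.

    inherit=True:  the server is forced to follow the client's decision
                   for the whole trace (propagated sampled flag).
    inherit=False: the server rolls its own dice for every request
                   (no sampled flag propagated, as with TwP).
    """
    rng = random.Random(seed)
    sampled_total = 0
    for _ in range(pageloads):
        client_sampled = rng.random() < client_rate
        for _ in range(requests_per_page):
            if inherit:
                sampled = client_sampled
            else:
                sampled = rng.random() < server_rate
            sampled_total += int(sampled)
    return sampled_total
```

In expectation, the inherited case yields `client_rate * pageloads * requests_per_page` server transactions and the independent case yields `server_rate * pageloads * requests_per_page`, so the two only match when both rates are equal.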

When a span starts, SDKs should use the traceID on the PropagationContext

We should say explicitly to use only the traceId, but not the sampling decision, of the PropagationContext.
Regardless of client/server side sampling, I think we would break sampling in general, as it doesn't apply to the PropagationContext. The sampler function is particularly problematic imho.
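A rough Python sketch of the suggested behaviour: a new root span reuses the trace ID from the PropagationContext but always computes its own sampling decision. `start_root_span` and the simplified `Span` type are hypothetical; `traces_sample_rate` and `traces_sampler` only mirror the SDK options of the same name, with the sampler assumed to return a rate between 0 and 1.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PropagationContext:
    trace_id: str
    sampled: Optional[bool] = None  # deliberately ignored below


@dataclass
class Span:
    trace_id: str
    sampled: bool


def start_root_span(
    ctx: PropagationContext,
    rng: random.Random,
    traces_sample_rate: float = 0.0,
    traces_sampler: Optional[Callable[[dict], float]] = None,
    sampling_context: Optional[dict] = None,
) -> Span:
    """Reuse the context's trace ID, but sample the span independently."""
    if traces_sampler is not None:
        # The sampler sees the span's own context, so per-span rules
        # are not overridden by an earlier decision.
        rate = traces_sampler(sampling_context or {})
    else:
        rate = traces_sample_rate
    return Span(trace_id=ctx.trace_id, sampled=rng.random() < rate)
```

The trade-off is the one raised earlier in the thread: spans still share a trace ID, but a trace may end up only partially sampled.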

stefanosiano · 2024-06-12T10:12:49Z

text/0136-mobile-tracing-without-performance-v-2.md

+mobile SDKs use for determining the end of a session, mobile SDKs renew the `traceId` of the
+PropagationContext. If the app stays in the background for shorter or equal to 30 seconds,
+mobile SDKs must not renew the `traceId` of the PropagationContext when the app moves again to
+the foreground. When a span starts, SDKs should use the traceID on the PropagationContext, but


Wouldn't this break transaction/spans sample rates?
The sampled state of the PropagationContext would overwrite the calculated sampling decision of the span done through tracesSampleRate - or even worse through tracesSampler

or are we just saying to use the traceId of the PropagationContext, retaining the usual sample decision?

continuing below

philipphofmann · 2024-06-13T13:48:55Z

The RFC is on hold for now. We found some problems with the suggested approach in JS and need to revisit it. Once we have decided how to continue, we will either reopen this RFC or open a new one.

rfc(decision): Mobile - Tracing Without Performance V2

313a0a0

philipphofmann force-pushed the rfc/mobile-tracing-without-performance-v-2 branch from f06832d to 313a0a0 Compare June 4, 2024 08:45

philipphofmann added 2 commits June 4, 2024 11:19

first version

27c44c4

add cons

a68b563

philipphofmann requested a review from a team June 4, 2024 09:29

kahest reviewed Jun 4, 2024

philipphofmann and others added 2 commits June 5, 2024 15:06

Update text/0136-mobile-tracing-without-performance-v-2.md

e3be25a

Co-authored-by: Karl Heinz Struggl <[email protected]>

add session logic

3a70e7b

markushi reviewed Jun 7, 2024

romtsn reviewed Jun 7, 2024

buenaflor reviewed Jun 7, 2024

philipphofmann and others added 4 commits June 10, 2024 08:21

Update text/0136-mobile-tracing-without-performance-v-2.md

9c252cc

Co-authored-by: Markus Hintersteiner <[email protected]>

Update text/0136-mobile-tracing-without-performance-v-2.md

fddeeaa

Co-authored-by: Roman Zavarnitsyn <[email protected]>

spans across screens

764196a

add route to the title

cd840eb

philipphofmann marked this pull request as ready for review June 10, 2024 06:38

Lms24 reviewed Jun 10, 2024

stefanosiano reviewed Jun 12, 2024

philipphofmann closed this Jun 13, 2024
