Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initialize native crash monitoring in a background thread #540

Merged
merged 4 commits into from
Mar 14, 2024

Conversation

bidetofevil
Copy link
Collaborator

@bidetofevil bidetofevil commented Mar 11, 2024

Goal

Initialize crash monitoring on a background thread. This turns out to be super expensive, so doing it on a background thread, while delaying the signal handler install by ~500ms, dramatically improves SDK startup.

Given this is a bit controversial, I'm putting this up now, but we can talk about whether this is worth it. I personally think so.

@bidetofevil bidetofevil force-pushed the hho/init-native-cash-monitor-async branch from 47b39aa to d37085b Compare March 11, 2024 06:54
@bidetofevil bidetofevil marked this pull request as ready for review March 11, 2024 07:36
@bidetofevil bidetofevil requested a review from a team as a code owner March 11, 2024 07:36
Copy link

codecov bot commented Mar 11, 2024

Codecov Report

Attention: Patch coverage is 69.44444% with 22 lines in your changes are missing coverage. Please review.

Project coverage is 80.12%. Comparing base (8210d0d) to head (0c8dd3a).

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #540      +/-   ##
==========================================
- Coverage   80.20%   80.12%   -0.08%     
==========================================
  Files         413      413              
  Lines       11000    11041      +41     
  Branches     1618     1624       +6     
==========================================
+ Hits         8822     8847      +25     
- Misses       1532     1543      +11     
- Partials      646      651       +5     
Files Coverage Δ
...roid/embracesdk/config/behavior/SdkModeBehavior.kt 80.95% <100.00%> (+2.00%) ⬆️
...e/android/embracesdk/config/remote/RemoteConfig.kt 100.00% <100.00%> (ø)
.../io/embrace/android/embracesdk/ndk/NativeModule.kt 100.00% <100.00%> (ø)
...dk/session/orchestrator/SessionOrchestratorImpl.kt 93.69% <100.00%> (ø)
...ce/android/embracesdk/worker/WorkerThreadModule.kt 100.00% <100.00%> (ø)
...oid/embracesdk/injection/ModuleInitBootstrapper.kt 94.59% <94.11%> (-0.06%) ⬇️
...mbrace/android/embracesdk/ndk/EmbraceNdkService.kt 49.04% <86.95%> (+0.48%) ⬆️
...bracesdk/capture/startup/AppStartupTraceEmitter.kt 73.61% <28.00%> (-6.80%) ⬇️

... and 1 file with indirect coverage changes

Copy link
Contributor

@fractalwrench fractalwrench left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is something we can ship by default. Perhaps it could be included behind an opt-in flag if a customer explicitly wants to make these trade-offs, but I think there's going to be an expectation from most customers that Embrace should capture crashes as soon as possible in the lifecycle.

I'm also pretty concerned about installing on a background thread as we know from ANR monitoring that the amount of CPU time they get can be pretty low in thread-heavy environments.

Base automatically changed from hho/defer-api-service-load to master March 11, 2024 15:15
Copy link
Collaborator Author

Chatted with Jamie off line - I'll write a doc to make the case and likely put this behind a local feature flag

Copy link
Collaborator Author

Added the ability to attach arbitrary intervals to the startup trace which we will use to track the time it takes for a scheduled background job to run when the NDK service is initialized, in addition to the amount of time app startup waits for the current background spans service init to finish.

Copy link
Collaborator Author

Also put the deferral behind a feature flag that doesn't exist yet that will default to false. This ensures this feature will not be enabled in production unless the feature flag is enabled.

@bidetofevil bidetofevil force-pushed the hho/init-native-cash-monitor-async branch from 4c57400 to cf24beb Compare March 14, 2024 05:09
@@ -320,8 +328,18 @@ internal class ModuleInitBootstrapper(
}

postInit(NativeModule::class) {
val ndkService = nativeModule.ndkService
val initWorkerTaskQueueTime = initModule.clock.now()
workerThreadModule.backgroundWorker(WorkerName.SERVICE_INIT).submit {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Schedule this to run right after the NDK service initializes, hoping to approximate how a background task on this thread will be scheduled in prod. Looking at the difference between the queue and run times should tell us what to expect

val initWorkerTaskQueueTime = initModule.clock.now()
workerThreadModule.backgroundWorker(WorkerName.SERVICE_INIT).submit {
val taskRuntimeMs = initModule.clock.now()
dataCaptureServiceModule.appStartupTraceEmitter.addTrackedInterval(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will log this interval as a child span of the currently-private startup trace

@@ -55,53 +59,12 @@ import java.util.concurrent.ExecutorService
internal class EmbraceNdkServiceTest {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not resetting the mocks in this class made it so that certain settings were dependent on state set outside of the test, which isn't great. Rewrote this so things are reset at each iteration so tests will also start from a clean state so will run deterministically irrespective of execution order.

appFramework = Embrace.AppFramework.UNITY
initializeService()
assertTrue(activityService.listeners.contains(embraceNdkService))
assertEquals(1, activityService.listeners.size)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making embraceNdkService a spy means the object references won't match. Changed to verify that a listener exists

@@ -499,13 +512,12 @@ internal class EmbraceNdkServiceTest {

appFramework = Embrace.AppFramework.UNITY
initializeService()
val mockedNdkService = spyk(embraceNdkService, recordPrivateCalls = true)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous test is a bit problematic as the invocation could be on either the mock or the actual object the mock wraps. Changing the only reference to the mock removes that potential gap.

every { sharedObjectLoader.loadEmbraceNative() } returns false
every { sharedObjectLoader.loadEmbraceNative() } returns true
initializeService()
verify(exactly = 0) { embraceNdkService["installSignals"]() }
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous implementation of this test didn't actually verify that no calls were made to the mock because they were invoked before the mock was even made, so it was always passing. irrespective what the object instantiation did.

Changing this up to actually verified these aren't called.

enableNdk(true)
blockableExecutorService.blockingMode = true
initializeService()
assertNativeSignalHandlerInstalled()
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verifying the method call on the delegate because there's no good way to inspect the side effects otherwise. Ah well.

@bidetofevil bidetofevil force-pushed the hho/init-native-cash-monitor-async branch from cf24beb to 37473db Compare March 14, 2024 05:24
@bidetofevil bidetofevil force-pushed the hho/init-native-cash-monitor-async branch from 37473db to 0c8dd3a Compare March 14, 2024 05:27
@fractalwrench fractalwrench self-requested a review March 14, 2024 06:13
Copy link
Contributor

@fractalwrench fractalwrench left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Collaborator Author

bidetofevil commented Mar 14, 2024

Merge activity

@bidetofevil bidetofevil merged commit aa3be55 into master Mar 14, 2024
5 checks passed
@bidetofevil bidetofevil deleted the hho/init-native-cash-monitor-async branch March 14, 2024 06:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants