
feat: Add SIGTERM support #3895

Merged: 9 commits into getsentry:main, May 8, 2024

Conversation

@naftaly (Contributor) commented Apr 24, 2024

📜 Description

Added support for catching the SIGTERM signal.

(Two screenshots attached, taken 2024-04-24 at 12:45 PM, showing the change.)

💡 Motivation and Context

All Apple OSes send SIGTERM (like any good Unix-based system) when they want to request a graceful termination of an application; if the app doesn't quit in those circumstances, it receives a SIGKILL. Catching SIGTERM often gives us the opportunity to get a stack trace in cases where we know Apple wants the app terminated but don't know why (watchdog events). This is just one more piece of information that can be added to the puzzle of figuring out unexplained app terminations.
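For background, here is a minimal sketch of how a process can observe SIGTERM with a POSIX sigaction handler and then re-raise the signal so termination still proceeds. The handler name and the write-then-re-raise strategy are illustrative assumptions, not the SentryCrash implementation:

```c
#include <signal.h>
#include <string.h>
#include <unistd.h>

// Hypothetical handler name; only async-signal-safe calls belong in here.
static void handle_sigterm(int signum, siginfo_t *info, void *context)
{
    (void)info;
    (void)context;

    static const char msg[] = "received SIGTERM, shutting down\n";
    write(STDERR_FILENO, msg, sizeof(msg) - 1);

    // Restore the default disposition and re-raise so the process still
    // terminates the way the OS asked it to.
    signal(signum, SIG_DFL);
    raise(signum);
}

int main(void)
{
    struct sigaction action;
    memset(&action, 0, sizeof(action));
    action.sa_flags = SA_SIGINFO;
    action.sa_sigaction = handle_sigterm;
    sigemptyset(&action.sa_mask);
    sigaction(SIGTERM, &action, NULL);

    pause(); // block until a signal arrives
    return 0;
}
```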

💚 How did you test it?

I ran the tests and also tested on a real app.

📝 Checklist

You have to check all boxes before merging:

  • I reviewed the submitted code.
  • I added tests to verify the changes.
  • No new PII added or SDK only sends newly added PII if sendDefaultPII is enabled.
  • I updated the docs if needed.
  • Review from the native team if needed.
  • No breaking change or entry added to the changelog.
  • No breaking change for hybrid SDKs or communicated to hybrid SDKs.

🔮 Next steps

@armcknight (Member) left a comment:

Thanks for the PR @naftaly! I think it's fine if we leave in the SIGTRAP logic; it looks like that was a bug that has since been fixed upstream. Would you mind making that small change?

codecov bot commented Apr 25, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 90.604%. Comparing base (1267cb0) to head (4932e35).

Additional details and impacted files


@@              Coverage Diff              @@
##              main     #3895       +/-   ##
=============================================
- Coverage   90.815%   90.604%   -0.211%     
=============================================
  Files          590       589        -1     
  Lines        45946     45854       -92     
  Branches     16380     16273      -107     
=============================================
- Hits         41726     41546      -180     
- Misses        4040      4217      +177     
+ Partials       180        91       -89     
| Files | Coverage | Δ |
|---|---|---|
| Sources/SentryCrash/Recording/SentryCrashDoctor.m | 55.597% <100.000%> | (+2.012%) ⬆️ |
| ...entryCrash/Recording/Tools/SentryCrashSignalInfo.c | 100.000% <ø> | (ø) |
| ...ntryTests/SentryCrash/SentryCrashDoctorTests.swift | 100.000% <100.000%> | (ø) |

... and 42 files with indirect coverage changes


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 1267cb0...4932e35.

@armcknight (Member) left a comment:

The code changes look good to me; we might just want to push this diff up in one of our own branches to get all the other tests running.

@naftaly (Contributor, Author) commented May 6, 2024

@armcknight is there anything you need me to do to get this in?

@armcknight mentioned this pull request May 6, 2024

@armcknight (Member) commented:
I'm just running the other tests really quick in #3946, and then we can merge this! Thanks @naftaly for your patience.

@armcknight (Member) left a comment:

I'm happy with the results of the tests in #3946! 🙏🏻

@philipphofmann (Member) left a comment:

Just some missing tests. Apart from that LGTM. Thanks for doing this @naftaly 💯

@philipphofmann (Member) left a comment:

LGTM, thanks for the test @naftaly 👏

@philipphofmann changed the title from "Add SIGTERM support" to "feat: Add SIGTERM support" May 8, 2024
@philipphofmann merged commit ad404de into getsentry:main May 8, 2024
63 of 68 checks passed
philipphofmann added a commit that referenced this pull request May 10, 2024
Add support for catching sigterm signals.

Co-authored-by: Philipp Hofmann <[email protected]>
threema-matteo pushed a commit to threema-ch/sentry-cocoa that referenced this pull request May 21, 2024
Add support for catching sigterm signals.

Co-authored-by: Philipp Hofmann <[email protected]>
@eric commented May 21, 2024

What is this telling me when I get this on iOS and tvOS? Is this a good or a bad thing?

What would Sentry have been reporting otherwise? The Watchdog event?

@naftaly (Contributor, Author) commented May 21, 2024

> What is this telling me when I get this on iOS and tvOS? Is this a good or a bad thing?

It’s the OS trying to tell you that it wants apps to exit gracefully. I’ve seen it happen in many different scenarios. Some of the interesting ones are when the device is shut down, when an app is about to be terminated so it can be updated, and when something changes in traits that might require the OS to restart apps.

What’s interesting is that previously this would all go unnoticed by us, the developers. If these happen when the user can see or perceive them, then we have an issue that needs to be resolved. If they happen in the background, then we just hope that state restoration is well implemented and users don’t notice.

> What would Sentry have been reporting otherwise? The Watchdog event?

Likely nothing; it would have gone unnoticed.

@eric commented May 21, 2024

We've always had a large number of WatchdogTermination events that didn't correspond to memory issues. Could it have been that these were actually receiving SIGTERM and it was not handled/reported and thus turned into the WatchdogTermination event?

@naftaly (Contributor, Author) commented May 21, 2024

> We've always had a large number of WatchdogTermination events that didn't correspond to memory issues. Could it have been that these were actually receiving SIGTERM and it was not handled/reported and thus turned into the WatchdogTermination event?

I guess it’s possible, since the Watchdog code in Sentry (if I’m not mistaken) simply returns at the end of its heuristics and calls it an OOM/Watchdog event. So if the actual signal isn’t caught, then that might be the default.

FWIW, I’ve also proposed a PR to add real OOM recognition, so that could help with other false positives. It might end up in KSCrash first though, which is what Sentry uses for crash reporting.
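For illustration only, an exclusion-style heuristic of that shape could look roughly like the sketch below; the names and flags are invented for the example and are not Sentry's actual watchdog code:

```c
#include <stdbool.h>

typedef enum {
    PREVIOUS_EXIT_CRASH,
    PREVIOUS_EXIT_USER_QUIT,
    PREVIOUS_EXIT_OS_UPDATE,
    PREVIOUS_EXIT_WATCHDOG_OR_OOM, // fallback when nothing else explains the exit
} previous_exit_t;

// Hypothetical exclusion heuristic: rule out the exits we can positively
// detect; whatever is left over gets attributed to a watchdog/OOM termination.
// An uncaught SIGTERM would also fall through to the fallback case.
static previous_exit_t
classify_previous_exit(bool had_crash_report, bool user_quit, bool os_updated)
{
    if (had_crash_report) {
        return PREVIOUS_EXIT_CRASH;
    }
    if (user_quit) {
        return PREVIOUS_EXIT_USER_QUIT;
    }
    if (os_updated) {
        return PREVIOUS_EXIT_OS_UPDATE;
    }
    return PREVIOUS_EXIT_WATCHDOG_OR_OOM;
}
```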

@naftaly (Contributor, Author) commented May 21, 2024

@eric if you’re getting SIGTERMs now, it would be interesting to know what type of event is going down (if any). That would help understand exactly what is happening, or will happen, for your app.

@eric commented May 21, 2024

I just received the first SIGTERM on our TestFlight beta today, and so wasn't sure what to do with it... it didn't really feel actionable.

@eric commented May 21, 2024

(Screenshot attached, taken 2024-05-21 at 15:55.)

We do currently send memory usage with all of our breadcrumbs to try to give us a better idea of where we have actual memory issues. It always seemed silly to me that Sentry couldn't at least sample the memory usage once per second and store it somewhere to give a more informed message related to the Watchdog events.
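As a rough sketch of that idea, assuming the Mach task_vm_info API and a plain once-per-second polling loop (the helper name is hypothetical, not a Sentry API):

```c
#include <mach/mach.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

// Returns the process's current physical memory footprint in bytes,
// or 0 if the query fails.
static uint64_t current_footprint_bytes(void)
{
    task_vm_info_data_t info;
    mach_msg_type_number_t count = TASK_VM_INFO_COUNT;
    kern_return_t kr = task_info(mach_task_self(), TASK_VM_INFO,
                                 (task_info_t)&info, &count);
    return kr == KERN_SUCCESS ? info.phys_footprint : 0;
}

int main(void)
{
    // Sample roughly once per second. A real integration would persist the
    // most recent value so it could be attached to a later watchdog report.
    for (int i = 0; i < 10; i++) {
        printf("footprint: %.1f MB\n",
               current_footprint_bytes() / (1024.0 * 1024.0));
        sleep(1);
    }
    return 0;
}
```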

@naftaly (Contributor, Author) commented May 21, 2024

> I just received the first SIGTERM on our Testflight beta today, and so wasn't sure what to do with it... it didn't really feel actionable.

If it’s background and it feels unactionable, then I would not prioritize it, but still keep in mind it’s there and happening. If it’s foreground, then I usually start adding breadcrumbs to make it actionable if the stack isn’t enough to give you an idea of what is going on. These issues are definitely going to be harder than your run-of-the-mill exception, since they happen based on what happened in the past vs. a mistake made at that point in time. If you can share the stack (of all threads), I would be happy to take a look and see if I spot anything.

@eric commented May 21, 2024

It doesn't seem like much of anything is happening here: https://fancy-bits-llc.sentry.io/issues/5386894502/

What is the signal that is sent when the OS decides that it's time for your app to stop running in the background when it needs to get more memory? Is SIGTERM ever going to be sent during normal operation?

@naftaly (Contributor, Author) commented May 21, 2024

> It doesn't seem like much of anything is happening here: https://fancy-bits-llc.sentry.io/issues/5386894502/

I don’t have access. Maybe copy it to a gist or something.

> What is the signal that is sent when the OS decides that it's time for your app to stop running in the background when it needs to get more memory?

That would be SIGKILL; we can’t catch it. Sometimes, if we’re really lucky, a SIGTERM comes before it, but I’ve not seen it often.

> Is SIGTERM ever going to be sent during normal operation?

Depends what normal operation is, but yes, it can happen at any time. It’s a vestige of Unix, where it’s meant to tell apps to do their cleanup and exit, or otherwise be “exited”.

@philipphofmann (Member) commented:

> We do currently send memory usage with all of our breadcrumbs to try to give us a better idea of where we have actual memory issues.

We have that planned here: #3518 (comment), but we haven't gotten to it yet, sorry.

> It always seemed silly to me that Sentry couldn't at least sample the memory usage once per second and store it somewhere to give a more informed message related to the Watchdog events.

Yes, that makes sense. I created an issue. Thanks for the input 👏, @eric: #4003
