mac: Produce coredumps on segfault #46157

Merged
merged 1 commit into from
Jul 25, 2022

Conversation

Keno
Member

@Keno Keno commented Jul 24, 2022

This changes the Mach exception server to ignore fatal SIGSEGVs,
letting regular kernel processing handle them (by performing POSIX
signal delivery and then dumping core), rather than
quitting the process directly. There's probably some way to
induce the kernel to dump core directly from the
exception server, but I think it'll be less confusing all around
to just have segfaults take the standard path.

Hoping this will help debug #46152.

@DilumAluthge
Member

Want to stick a ccall(:raise, Cint, (Cint,), 11) somewhere in one of the test sets? (We'll remove it before merging this PR.)
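
The `ccall` above invokes the C library's `raise` with signal number 11, which is SIGSEGV on macOS and Linux. A rough C equivalent for trying this outside the test suite might look like the sketch below. The function name `raise_in_child` and the `want_core` switch are hypothetical and not part of this PR; the sketch only shows how a parent process can observe the terminating signal and the kernel's core-dump flag.

```c
#include <assert.h>
#include <signal.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that calls raise(signo) with the default disposition.
 * If want_core is nonzero the child first lifts its soft core limit to the
 * hard limit; otherwise it sets the soft limit to 0.
 * Returns the terminating signal (0 if the child exited normally, -1 on
 * error) and stores the core-dump flag reported by the kernel in *core_dumped. */
int raise_in_child(int signo, int want_core, int *core_dumped)
{
    assert(core_dumped != NULL);
    *core_dumped = 0;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct rlimit rl;
        if (getrlimit(RLIMIT_CORE, &rl) == 0) {
            rl.rlim_cur = want_core ? rl.rlim_max : 0;
            setrlimit(RLIMIT_CORE, &rl);
        }
        signal(signo, SIG_DFL);
        raise(signo);
        _exit(0); /* only reached if the signal did not terminate the child */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (!WIFSIGNALED(status))
        return 0;
#ifdef WCOREDUMP
    /* WCOREDUMP is a common extension (macOS, glibc), not strictly POSIX. */
    *core_dumped = WCOREDUMP(status) ? 1 : 0;
#endif
    return WTERMSIG(status);
}
```

With `want_core` set to 1, the flag tells you whether the kernel reports a dump for the child. It does not tell you where the file went, which on macOS is governed by `kern.corefile`.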

@Keno
Member Author

Keno commented Jul 24, 2022

I've verified that this produces core dumps locally. That doesn't mean it works on Buildkite, but I'd rather merge this and then run that experiment in a separate PR.

@DilumAluthge
Member

FWIW, this doesn't seem to work on Buildkite yet: no artifacts were uploaded for https://buildkite.com/julialang/julia-master/builds/14263#018232a5-5eb0-4f3e-adb3-2a1ec111f781.

But I agree: let's merge this PR first, and then work separately on getting this working on Buildkite.

@vtjnash
Sponsor Member

vtjnash commented Jul 25, 2022

Will this break the default stacktrace printer?

@DilumAluthge
Member

> I've verified that this produces core dumps locally.

@Keno:

  1. On your machine, what is the output of sysctl -n kern.corefile?
  2. In what directory were the coredump files created?

@DilumAluthge DilumAluthge added the system:mac Affects only macOS label Jul 25, 2022
@Keno
Member Author

Keno commented Jul 25, 2022

> Will this break the default stacktrace printer?

No. The POSIX signal handling will print the stack trace:

               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.9.0-DEV.1040 (2022-07-23)
 _/ |\__'_|_|_|\__'_|  |  Commit cee90db76a* (1 day old master)
|__/                   |

julia> unsafe_load(Ptr{Cint}(C_NULL))

[13573] signal (11): Segmentation fault: 11
in expression starting at REPL[1]:1
unsafe_load at ./pointer.jl:110 [inlined]
unsafe_load at ./pointer.jl:110
unknown function (ip: 0x10099001f)
Allocations: 2875 (Pool: 2864; Big: 11); GC: 0
zsh: segmentation fault (core dumped)  ~/julia/julia

@Keno
Member Author

Keno commented Jul 25, 2022

  • On your machine, what is the output of sysctl -n kern.corefile?

% sysctl -n kern.corefile
/cores/core.%P

  • In what directory were the coredump files created?

/cores
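
To make the `/cores/core.%P` pattern concrete: on macOS, `%P` in `kern.corefile` stands for the process ID. The C sketch below expands only that one specifier and copies everything else verbatim; `expand_corefile` is a hypothetical helper for illustration, not a system API and not how the kernel implements the expansion. The PID in the usage note is the one shown in the segfault output earlier in this thread, used purely as an example value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Expands "%P" in a kern.corefile-style pattern to the given PID.
 * Any other character (including other % specifiers) is copied unchanged.
 * Returns 0 on success, -1 if `out` is too small. */
int expand_corefile(const char *pattern, long pid, char *out, size_t outsz)
{
    assert(pattern != NULL && out != NULL && outsz > 0);

    size_t used = 0;
    out[0] = '\0';
    for (const char *p = pattern; *p != '\0'; p++) {
        int n;
        if (strncmp(p, "%P", 2) == 0) {
            n = snprintf(out + used, outsz - used, "%ld", pid);
            p++; /* skip the 'P' as well */
        } else {
            n = snprintf(out + used, outsz - used, "%c", *p);
        }
        if (n < 0 || (size_t)n >= outsz - used)
            return -1; /* output buffer too small */
        used += (size_t)n;
    }
    return 0;
}
```

For example, expanding `/cores/core.%P` with PID 13573 into a 64-byte buffer yields `/cores/core.13573`.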

@Keno Keno merged commit 1addb84 into master Jul 25, 2022
@Keno Keno deleted the kf/segvcoredump branch July 25, 2022 19:45
Keno added a commit that referenced this pull request Jul 31, 2022
Similar in spirit to #46157. We'd like to collect crash dumps
when Julia crashes on CI, but currently we just cleanly exit
the process in this case. By continuing exception handling into
the global crash handler, the OS's crash reporter gets invoked
and can dump out a minidump for us. In the future, we may want
to add our own crash reporter, but this should hopefully help
debug crashes for the moment.
DilumAluthge pushed a commit that referenced this pull request Jul 31, 2022
ffucci pushed a commit to ffucci/julia that referenced this pull request Aug 11, 2022
pcjentsch pushed a commit to pcjentsch/julia that referenced this pull request Aug 18, 2022