runtime/traceback: segmentation violation failures from unwinding crash #64030

Open
vlasisPit opened this issue Nov 9, 2023 · 9 comments
Labels: compiler/runtime (Issues related to the Go compiler and/or runtime), NeedsInvestigation (Someone must examine and confirm this is a valid issue and not a duplicate of an existing one)

@vlasisPit

vlasisPit commented Nov 9, 2023

What version of Go are you using (go version)?

$ go version
go version go1.21.3 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/home/rest/.cache/go-build'
GOENV='/home/rest/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/local/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/local/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.21.3'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/dev/null'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build2287139495=/tmp/go-build -gno-record-gcc-switches'	

What did you do?

We build a Go application that also makes some cgo calls, and we very frequently see SIGSEGV: segmentation violation errors produced by a crash in the stack unwinder. We have not been able to reproduce the crash on demand: it occurs without any specific request triggering it, and it occurs often (more than 100 times in a single day). We have checked the core dumps, but nothing in them indicates the cause. The core dumps also show that no cgo calls were in progress at the time of the crash.

We see the following stack trace:

SIGSEGV: segmentation violation  
PC=0x46df85 m=22 sigcode=1	

goroutine 0 [idle]:	
runtime.(*unwinder).next(0x7fa986bfc060)	
	runtime/traceback.go:463 +0x105 fp=0x7fa986bfbeb0 sp=0x7fa986bfbe38 pc=0x46df85	
runtime.scanstack(0xc0013341a0, 0x100000081?)	
	runtime/mgcmark.go:802 +0x272 fp=0x7fa986bfc1e8 sp=0x7fa986bfbeb0 pc=0x42e392	
runtime.markroot.func1()	
	runtime/mgcmark.go:240 +0xb5 fp=0x7fa986bfc238 sp=0x7fa986bfc1e8 pc=0x42d215
runtime.markroot(0xc00007a140, 0x3fe, 0x1)		
	runtime/mgcmark.go:214 +0x1a8 fp=0x7fa986bfc2e0 sp=0x7fa986bfc238 pc=0x42cea8	
runtime.gcDrain(0xc00007a140, 0x7)	
	runtime/mgcmark.go:1069 +0x37d fp=0x7fa986bfc340 sp=0x7fa986bfc2e0 pc=0x42edfd	
runtime.gcBgMarkWorker.func2()	
	runtime/mgc.go:1385 +0x6f fp=0x7fa986bfc390 sp=0x7fa986bfc340 pc=0x42b52f		
traceback: unexpected SPWRITE function runtime.systemstack		
runtime.systemstack()	
	runtime/asm_amd64.s:509 +0x4a fp=0x7fa986bfc3a0 sp=0x7fa986bfc390 pc=0x47bc4a	

We also downgraded to go1.21.0 linux/amd64, but we see the same crashes, with a slightly different stack trace:

SIGSEGV: segmentation violation
PC=0x46dea5 m=23 sigcode=1	

goroutine 0 [idle]:	
runtime.(*unwinder).next(0x7f89011f3060)	
	runtime/traceback.go:453 +0x105 fp=0x7f89011f2eb0 sp=0x7f89011f2e38 pc=0x46dea5	
runtime.scanstack(0xc0031d44e0, 0x2a?)	
	runtime/mgcmark.go:802 +0x272 fp=0x7f89011f31e8 sp=0x7f89011f2eb0 pc=0x42e272		
runtime.markroot.func1()	
	runtime/mgcmark.go:240 +0xb5 fp=0x7f89011f3238 sp=0x7f89011f31e8 pc=0x42d0f5	
runtime.markroot(0xc00006e640, 0x1bb, 0x1)	
	runtime/mgcmark.go:214 +0x1a8 fp=0x7f89011f32e0 sp=0x7f89011f3238 pc=0x42cd88	
runtime.gcDrain(0xc00006e640, 0x3)	
	runtime/mgcmark.go:1069 +0x37d fp=0x7f89011f3340 sp=0x7f89011f32e0 pc=0x42ecdd	
runtime.gcBgMarkWorker.func2()	
	runtime/mgc.go:1366 +0xa5 fp=0x7f89011f3390 sp=0x7f89011f3340 pc=0x42b445	

What did you expect to see?

No crashes

What did you see instead?

segmentation violation errors

mauri870 added the NeedsInvestigation and compiler/runtime labels Nov 9, 2023
@mauri870
Member

mauri870 commented Nov 9, 2023

I suspect gp or gp.m might be nil here, which would cause the segfault, given that we don't check for nil in (*unwinder).next:

if doPrint && gp.m.incgo && f.funcID == abi.FuncID_sigpanic {

If this issue is related to the new unwinder, you could try Go 1.20.10, which does not have it.
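
When comparing toolchains like this, it is easy to lose track of which Go version a deployed binary was really built with. A minimal sketch, assuming it is acceptable to log the version at startup; the helper name toolchainVersions is made up for illustration, while runtime.Version and debug.ReadBuildInfo are standard library calls:

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// toolchainVersions is a hypothetical helper. It returns the Go version
// reported by the runtime and the one recorded in the embedded build info
// (empty if the binary carries no build info).
func toolchainVersions() (runtimeVersion, buildVersion string) {
	runtimeVersion = runtime.Version()
	if info, ok := debug.ReadBuildInfo(); ok {
		buildVersion = info.GoVersion
	}
	return runtimeVersion, buildVersion
}

func main() {
	rt, build := toolchainVersions()
	fmt.Println("runtime version:   ", rt)
	fmt.Println("build info version:", build)
}
```

Running `go version` against the binary file reports the same information without changing the program.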

Would be helpful to get some more insights on what your program does with cgo, to try and reproduce the issue.

/cc @golang/runtime

@prattmic
Member

prattmic commented Nov 9, 2023

I agree it would be interesting to know if the same crashes occur with the old unwinder. Thanks @mauri870.

Also, if you have a core dump, could you report what the values are of everything in unwinder? https://cs.opensource.google/go/go/+/master:src/runtime/traceback.go;drc=1cc19e5ba0a008df7baeb78e076e43f9d8e0abf2;bpv=1;bpt=1;l=94

@cherrymui
Member

Interesting. In both cases it is scanning a goroutine stack. If gp or gp.m is nil, things would be very wrong...
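
For illustration only: if gp.m were nil, reading a field through it would not fault at address 0 but at the field's offset inside the struct, which is why nil dereferences typically show up as small fault addresses (a later crash log in this thread reports addr=0x118). The sketch below uses hypothetical fakeG and fakeM types with an arbitrary padding chosen to match that number. It does not reproduce the real runtime.g or runtime.m layout, and it does not show that gp.m was actually nil.

```go
package main

import (
	"fmt"
	"unsafe"
)

// fakeM is a hypothetical stand-in, NOT the real runtime.m. The padding is
// chosen only so that the next field sits at offset 0x118.
type fakeM struct {
	_     [0x118]byte
	incgo bool
}

// fakeG is a hypothetical stand-in for a goroutine descriptor.
type fakeG struct {
	m *fakeM
}

// faultAddr returns the address a load of g.m.incgo would touch when g.m is
// nil: a nil base (0) plus the field offset.
func faultAddr() uintptr {
	return unsafe.Offsetof(fakeM{}.incgo)
}

// readIncgo dereferences g.m and converts the resulting runtime panic into
// an error so the program can keep running.
func readIncgo(g *fakeG) (v bool, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return g.m.incgo, nil
}

func main() {
	fmt.Printf("expected fault address: %#x\n", faultAddr())
	_, err := readIncgo(&fakeG{}) // m is nil here
	fmt.Println(err)
}
```

In ordinary Go code this fault turns into a recoverable panic, as above. The traces in this issue are on goroutine 0, where the same signal is fatal.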

@ihnorton

ihnorton commented Nov 9, 2023

> could you report what the values are of everything in unwinder

(dlv) p u
(*runtime.unwinder)(0xc000f58fa0)
*runtime.unwinder {
        frame: runtime.stkframe {
                fn: (*runtime.funcInfo)(0xc000f58fa0),
                pc: 7432264,
                continpc: 7432264,
                lr: 281470681751457,
                sp: 824665366456,
                fp: 824665366496,
                varp: 824665366480,
                argp: 824665366496,},
        g: 824634780928,
        cgoCtxt: -1,
        calleeFuncID: FuncIDWrapper (21),
        flags: unwindPrintErrors (1),
        cache: runtime.pcvalueCache {
                entries: [2][8]runtime.pcvalueCacheEnt [
                        [
                                (*runtime.pcvalueCacheEnt)(0xc000f59000),
                                (*runtime.pcvalueCacheEnt)(0xc000f59010),
                                (*runtime.pcvalueCacheEnt)(0xc000f59020),
                                (*runtime.pcvalueCacheEnt)(0xc000f59030),
                                (*runtime.pcvalueCacheEnt)(0xc000f59040),
                                (*runtime.pcvalueCacheEnt)(0xc000f59050),
                                (*runtime.pcvalueCacheEnt)(0xc000f59060),
                                (*runtime.pcvalueCacheEnt)(0xc000f59070),
                        ],
                        [
                                (*runtime.pcvalueCacheEnt)(0xc000f59080),
                                (*runtime.pcvalueCacheEnt)(0xc000f59090),
                                (*runtime.pcvalueCacheEnt)(0xc000f590a0),
                                (*runtime.pcvalueCacheEnt)(0xc000f590b0),
                                (*runtime.pcvalueCacheEnt)(0xc000f590c0),
                                (*runtime.pcvalueCacheEnt)(0xc000f590d0),
                                (*runtime.pcvalueCacheEnt)(0xc000f590e0),
                                (*runtime.pcvalueCacheEnt)(0xc000f590f0),
                        ],
                ],},}
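
Delve prints the uintptr fields in decimal, which makes them hard to compare with the hexadecimal pc/sp/fp values in the crash log. A small sketch that reformats the values from the dump above; the field type and toHex helper are names invented for this example:

```go
package main

import "fmt"

// field pairs a name from the dlv dump with the decimal value dlv printed.
type field struct {
	name  string
	value uint64
}

// toHex formats a value the way addresses appear in the crash log.
func toHex(v uint64) string {
	return fmt.Sprintf("%#x", v)
}

func main() {
	// Values copied from the `p u` output above.
	fields := []field{
		{"pc", 7432264},
		{"lr", 281470681751457},
		{"sp", 824665366456},
		{"fp", 824665366496},
		{"varp", 824665366480},
		{"argp", 824665366496},
		{"g", 824634780928},
	}
	for _, f := range fields {
		fmt.Printf("%-5s %s\n", f.name, toHex(f.value))
	}
}
```

With these inputs, lr comes out as 0xffff00001fa1, which does not resemble the 0x4xxxxx return addresses in the backtrace. Whether that reflects a corrupted stack or something else is not established in this thread.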

Corresponding backtrace (different core dump than the initial report):

(dlv) bt
 0  0x000000000046df85 in runtime.(*unwinder).next
    at runtime/traceback.go:461
 1  0x000000000046f825 in runtime.traceback2
    at runtime/traceback.go:1023
 2  0x000000000046f5e5 in runtime.traceback1.func1
    at runtime/traceback.go:932
 3  0x000000000046f452 in runtime.traceback1
    at runtime/traceback.go:947
 4  0x000000c000102b60 in ???
    at ?:-1
 5  0x0000000000470d65 in runtime.tracebackHexdump
    at runtime/traceback.go:1282
 6  0x0000000000470c5b in runtime.tracebackothers.func1
    at runtime/traceback.go:1241
 7  0x0000000000470c80 in runtime.traceback
    at runtime/traceback.go:813
 8  0x0000000000470c80 in runtime.tracebackothers.func1
    at runtime/traceback.go:1244
 9  0x000000c001dbe000 in ???
    at ?:-1
10  0x000000000045d66e in runtime.(*sigctxt).regs
    at runtime/signal_linux_amd64.go:20
11  0x000000000045d66e in runtime.(*sigctxt).rip
    at runtime/signal_linux_amd64.go:42
12  0x000000000045d66e in runtime.(*sigctxt).sigpc
    at runtime/signal_amd64.go:41
13  0x000000000045d66e in runtime.sigtrampgo
    at runtime/signal_unix.go:443
14  0x000000000046df85 in runtime.(*unwinder).next
    at runtime/traceback.go:461
15  0x000000000042e392 in runtime.(*stackScanState).buildIndex
    at runtime/mgcstack.go:305
16  0x000000000042e392 in runtime.scanstack
    at runtime/mgcmark.go:838
17  0x000000000042d215 in runtime.markrootBlock
    at runtime/mgcmark.go:264
18  0x000000c000102d00 in ???
    at ?:-1
19  0x000000000042cea8 in runtime.markroot
    at runtime/mgcmark.go:200
20  0x000000000042edfd in runtime.gcDrainN
    at runtime/mgcmark.go:1165
21  0x000000c0002376c0 in ???
    at ?:-1
22  0x000000000042b565 in runtime.gcMark
    at runtime/mgc.go:1451
23  0x000000c001dbe000 in ???
    at ?:-1
24  0x000000000042b1f2 in runtime.releasem
    at runtime/runtime1.go:581
25  0x000000000042b1f2 in runtime.gcBgMarkWorker
    at runtime/mgc.go:1422
26  0x000000000047dbc1 in runtime.gcWriteBarrier6
    at runtime/asm_amd64.s:1782

@ihnorton

ihnorton commented Nov 9, 2023

> Would be helpful to get some more insights on what your program does with cgo, to try and reproduce the issue.

The CGo usage is via the wrappers in this repository, (e.g.). Unfortunately, we don't have a reduced example yet.

We are currently testing the downgrade to 1.20.10 to see if that helps.

@vser1

vser1 commented Nov 16, 2023

I encountered the exact same stack on go1.21.3 linux/amd64. I rolled back to go1.20.11 and got a similar segfault.
We might need to roll back to 1.19.5, which was our previous production version (the one we upgraded from to 1.21).

mknyszek added this to the Backlog milestone Nov 29, 2023
@vser1

vser1 commented Jan 5, 2024

Also see #64781

@cherrymui
Member

Is this still happening with a more recent version of Go? If so, could you share more information about the program, such as which stack it is scanning? Could you include a full crash log if possible? Thanks.

@shaj13

shaj13 commented Jun 24, 2024

@cherrymui
Our application occasionally crashes with this panic after upgrading to Go 1.22 (specifically, built with go1.22.1 linux/amd64). Here is a snippet of the panic message and the backtrace from Delve (dlv). Let me know what other information is needed.

PANIC

2024-06-19T17:08:59.101284384Z SIGSEGV: segmentation violation
2024-06-19T17:08:59.101305416Z PC=0x468865 m=5 sigcode=1 addr=0x118
2024-06-19T17:08:59.101308921Z 
2024-06-19T17:08:59.101313011Z goroutine 0 gp=0xc000103180 m=5 mp=0xc000100808 [idle]:
2024-06-19T17:08:59.101316543Z runtime.(*unwinder).next(0xc000157d28)
2024-06-19T17:08:59.101320330Z 	/usr/local/go/src/runtime/traceback.go:457 +0x105 fp=0xc000157ce8 sp=0xc000157c70 pc=0x468865
2024-06-19T17:08:59.101323626Z runtime.scanstack(0xc001225c00, 0xc000072168)
2024-06-19T17:08:59.101326778Z 	/usr/local/go/src/runtime/mgcmark.go:899 +0x271 fp=0xc000157e18 sp=0xc000157ce8 pc=0x423ed1
2024-06-19T17:08:59.101330416Z runtime.markroot.func1()
2024-06-19T17:08:59.101333574Z 	/usr/local/go/src/runtime/mgcmark.go:241 +0xb5 fp=0xc000157e68 sp=0xc000157e18 pc=0x422b95
2024-06-19T17:08:59.101336817Z runtime.markroot(0xc000072168, 0x3d6, 0x1)
2024-06-19T17:08:59.101340034Z 	/usr/local/go/src/runtime/mgcmark.go:215 +0x1a8 fp=0xc000157f10 sp=0xc000157e68 pc=0x422828
2024-06-19T17:08:59.101343201Z runtime.gcDrain(0xc000072168, 0x7)
2024-06-19T17:08:59.101346283Z 	/usr/local/go/src/runtime/mgcmark.go:1200 +0x3d4 fp=0xc000157f78 sp=0xc000157f10 pc=0x4249f4
2024-06-19T17:08:59.101349526Z runtime.gcDrainMarkWorkerIdle(...)
2024-06-19T17:08:59.101352636Z 	/usr/local/go/src/runtime/mgcmark.go:1114
2024-06-19T17:08:59.101355693Z runtime.gcBgMarkWorker.func2()
2024-06-19T17:08:59.101358906Z 	/usr/local/go/src/runtime/mgc.go:1406 +0x6f fp=0xc000157fc8 sp=0xc000157f78 pc=0x420e4f
2024-06-19T17:08:59.101362075Z runtime.systemstack(0x0)
2024-06-19T17:08:59.101365136Z 	/usr/local/go/src/runtime/asm_amd64.s:509 +0x4a fp=0xc000157fd8 sp=0xc000157fc8 pc=0x475e8a

BT

(dlv) bt
 0  0x0000000000468865 in runtime.(*unwinder).next
    at /snap/go/10630/src/runtime/traceback.go:513
 1  0x000000000046a145 in runtime.guintptr.ptr
    at /snap/go/10630/src/runtime/runtime2.go:266
 2  0x000000000046a145 in runtime.traceback2
    at /snap/go/10630/src/runtime/traceback.go:978
 3  0x0000000000469ee6 in runtime.traceback1.func1
    at /snap/go/10630/src/runtime/traceback.go:914
 4  0x0000000000469d4f in runtime.traceback1
    at /snap/go/10630/src/runtime/traceback.go:914
 5  0x000000000046b7c5 in runtime.tracebackothers.func1
    at /snap/go/10630/src/runtime/traceback.go:1255
 6  0x0000000000440ea9 in runtime.forEachGRace
    at /snap/go/10630/src/runtime/proc.go:673
 7  0x00000000000003b2 in ???
    at ?:-1
 8  0x000000000046b6bb in runtime.tracebackothers
    at /snap/go/10630/src/runtime/traceback.go:1245
 9  0x00000000004556a5 in runtime.sighandler
    at /snap/go/10630/src/runtime/signal_unix.go:743
10  0x0000000000454e4e in runtime.sigtrampgo
    at /snap/go/10630/src/runtime/signal_unix.go:482
11  0x0000000000468865 in runtime.(*unwinder).next
    at /snap/go/10630/src/runtime/traceback.go:512
12  0x0000000000423ed1 in runtime.scanstack
    at /snap/go/10630/src/runtime/mgcmark.go:839
13  0x0000000000422b95 in runtime.casGToWaiting
    at /snap/go/10630/src/runtime/proc.go:1207
14  0x0000000000422b95 in runtime.markroot.func1
    at /snap/go/10630/src/runtime/mgcmark.go:223
15  0x0000000000422828 in runtime.markroot
    at /snap/go/10630/src/runtime/mgcmark.go:215
16  0x00000000004249f4 in runtime.gcDrain
    at /snap/go/10630/src/runtime/mgcmark.go:1195
17  0x0000000000420e4f in runtime.casGToWaiting
    at /snap/go/10630/src/runtime/proc.go:1207
18  0x0000000000420e4f in runtime.gcBgMarkWorker.func2
    at /snap/go/10630/src/runtime/mgc.go:1382
19  0x0000000000475e8a in runtime.systemstack
    at /snap/go/10630/src/runtime/asm_amd64.s:509
20  0x0000000000475e28 in runtime.systemstack_switch
    at /snap/go/10630/src/runtime/asm_amd64.s:474
21  0x0000000000420b12 in runtime.nanotime
    at /snap/go/10630/src/runtime/time_nofake.go:19
22  0x0000000000420b12 in runtime.gcBgMarkWorker
    at /snap/go/10630/src/runtime/mgc.go:1357
23  0x0000000000477cc1 in runtime.goexit
    at /snap/go/10630/src/runtime/asm_amd64.s:1695

P U

(dlv) p u 
(*runtime.unwinder)(0xc00014ef90)
*runtime.unwinder {
	frame: runtime.stkframe {
		fn: (*runtime.funcInfo)(0xc00014ef90),
		pc: 4457134,
		continpc: 4457134,
		lr: 2164653622083375972,
		sp: 824713172096,
		fp: 824713172128,
		varp: 824713172112,
		argp: 824713172128,},
	g: 824652749824,
	cgoCtxt: -1,
	calleeFuncID: FuncIDNormal (0),
	flags: unwindPrintErrors (1),}

LIST

> runtime.(*unwinder).next() /snap/go/10630/src/runtime/traceback.go:513 (PC: 0x468865)
Warning: debugging optimized function
   508:				frame.lr = x
   509:			}
   510:		}
   511:	
   512:		u.resolveInternal(false, false)
=> 513:	}
   514:	
   515:	// finishInternal is an unwinder-internal helper called after the stack has been
   516:	// exhausted. It sets the unwinder to an invalid state and checks that it
   517:	// successfully unwound the entire stack.
   518:	func (u *unwinder) finishInternal() {
