proposal: runtime: allow access to stopTheWorld/startTheWorld via go:linkname #68167

fjl · 2024-06-25T11:54:44Z

Proposal Details

My module github.com/fjl/memsize walks a data structure and computes its total referred memory size in bytes, as well as producing a report of per-type memory usage. This is meant to be used for debugging memory leaks in live programs in a similar manner to taking a profile over HTTP with pprof, and to my knowledge there exists no other comparable tool.

memsize requires access to runtime.stopTheWorld and runtime.startTheWorld in order to safely walk the object graph without racing with the program that is being debugged. Up until Go version 1.22, I was able to call these functions via go:linkname. Go 1.23 disallows this, and as per #67401 only specifically approved entry points can be linkname'd.

Please allow access to runtime.stopTheWorld and runtime.startTheWorld by adding a go:linkname directive for Go 1.23.

Alternatively, we could consider adding an exposed API, perhaps in package runtime/debug, to perform a similar task. I could envision something like:

func WithWorldStopped(reason string, f func())

Perhaps access to stopTheWorld/startTheWorld could be granted temporarily during the Go 1.23 cycle while we work out a replacement API.

xref fjl/memsize#4
xref #67401

The text was updated successfully, but these errors were encountered:

fjl · 2024-06-25T12:01:00Z

BTW, I wasn't sure which issue category was appropriate here. Originally went with Bug, but it didn't really fit either. Proposal seems kind of strongly-worded, I really just want to get my request in.

seankhliao · 2024-06-25T12:22:31Z

given it is for debugging use, I didn't see an issue with requiring the checklinkname flag during a build.

fjl · 2024-06-25T13:22:01Z

memsize is meant to be integrated into production executables, i.e. you typically set up the memsize web interface handler on the pprof endpoint and then access it the same way you'd access that. It's not like you'd specificallly build a debugging executable to use pprof. Most people just have it always enabled so they can go in and debug their service when it is in a state of failure.

ianlancetaylor · 2024-06-25T13:36:06Z

CC @golang/runtime

navytux · 2024-06-26T17:28:36Z

For the reference: go123's xruntime also hit this regression when trying to add support for go1.23: there stopTheWorld is used to serialize modification of program tracepoints wrt the program running itself:

https://lab.nexedi.com/kirr/go123/-/blob/8299741f/tracing/internal/xruntime/runtime.go#L24-38
https://lab.nexedi.com/kirr/go123/-/blob/8299741f/tracing/tracing.go#L211-225
https://pkg.go.dev/lab.nexedi.com/kirr/[email protected]/tracing

lab.nexedi.com/kirr/go123/tracing was working ok since 2017 and it would be good to preserve support for that.

Thanks beforehand,
Kirill

Generate g for today's state of Go 1.23 (go1.23rc1-0-g7dff7439dc). Compared to Go1.22 many new fields and several new types were introduced as shown by the diff below. Regenerated files stay without changes for Go1.22 and previous releases. Support for Go1.23 remains incomplete because currently there is no way to access runtime.stopTheWorld via go:linkname : golang/go#68167 (comment) ---- 8< ---- diff --git a/zruntime_g_go1.22.go b/zruntime_g_go1.23.go index 910382b..4aa6799 100644 --- a/zruntime_g_go1.22.go +++ b/zruntime_g_go1.23.go @@ -1,7 +1,7 @@ // Code generated by g_typedef; DO NOT EDIT. -//go:build go1.22 && !go1.23 -// +build go1.22,!go1.23 +//go:build go1.23 && !go1.24 +// +build go1.23,!go1.24 package xruntime @@ -26,6 +26,7 @@ type g struct { sched gobuf syscallsp uintptr // if status==Gsyscall, syscallsp = sched.sp to use during gc syscallpc uintptr // if status==Gsyscall, syscallpc = sched.pc to use during gc + syscallbp uintptr // if status==Gsyscall, syscallbp = sched.bp to use in fpTraceback stktopsp uintptr // expected sp at top of stack, to check in traceback // param is a generic pointer parameter field used to pass // values in particular contexts where other storage for the @@ -95,14 +96,15 @@ type g struct { cgoCtxt []uintptr // cgo traceback context labels unsafe.Pointer // profiler labels timer *timer // cached timer for time.Sleep + sleepWhen int64 // when to sleep until selectDone atomic.Uint32 // are we participating in a select and did someone win the race? - coroarg *coro // argument during coroutine transfers - // goroutineProfiled indicates the status of this goroutine's stack for the // current in-progress goroutine profile goroutineProfiled goroutineProfileStateHolder + coroarg *coro // argument during coroutine transfers + // Per-G tracer state. trace gTraceState @@ -182,27 +184,51 @@ type funcval struct { fn uintptr } type timer struct { - // If this timer is on a heap, which P's heap it is on. - // puintptr rather than *p to match uintptr in the versions - // of this struct defined in other packages. - pp puintptr + // mu protects reads and writes to all fields, with exceptions noted below. + mu mutex + + astate uint8 // atomic copy of state bits at last unlock + state uint8 // state bits + isChan bool // timer has a channel; immutable; can be read without lock + blocked uint32 // number of goroutines blocked on timer's channel // Timer wakes up at when, and then at when+period, ... (period > 0 only) - // each time calling f(arg, now) in the timer goroutine, so f must be + // each time calling f(arg, seq, delay) in the timer goroutine, so f must be // a well-behaved function and not block. // - // when must be positive on an active timer. + // The arg and seq are client-specified opaque arguments passed back to f. + // When used from netpoll, arg and seq have meanings defined by netpoll + // and are completely opaque to this code; in that context, seq is a sequence + // number to recognize and squech stale function invocations. + // When used from package time, arg is a channel (for After, NewTicker) + // or the function to call (for AfterFunc) and seq is unused (0). + // + // Package time does not know about seq, but if this is a channel timer (t.isChan == true), + // this file uses t.seq as a sequence number to recognize and squelch + // sends that correspond to an earlier (stale) timer configuration, + // similar to its use in netpoll. In this usage (that is, when t.isChan == true), + // writes to seq are protected by both t.mu and t.sendLock, + // so reads are allowed when holding either of the two mutexes. + // + // The delay argument is nanotime() - t.when, meaning the delay in ns between + // when the timer should have gone off and now. Normally that amount is + // small enough not to matter, but for channel timers that are fed lazily, + // the delay can be arbitrarily long; package time subtracts it out to make + // it look like the send happened earlier than it actually did. + // (No one looked at the channel since then, or the send would have + // not happened so late, so no one can tell the difference.) when int64 period int64 - f func(interface{}, uintptr) + f func(arg interface{}, seq uintptr, delay int64) arg interface{} seq uintptr - // What to set the when field to in timerModifiedXX status. - nextwhen int64 + // If non-nil, the timers containing t. + ts *timers - // The status field holds one of the values below. - status atomic.Uint32 + // sendLock protects sends on the timer's channel. + // Not used for async (pre-Go 1.23) behavior when debug.asynctimerchan.Load() != 0. + sendLock mutex } type guintptr uintptr type puintptr uintptr @@ -221,6 +247,11 @@ type traceTime uint64 type coro struct { gp guintptr f func(*coro) + + // State for validating thread-lock interactions. + mp *m + lockedExt uint32 // mp's external LockOSThread counter at coro creation time. + lockedInt uint32 // mp's internal lockOSThread counter at coro creation time. } type traceSchedResourceState struct { // statusTraced indicates whether a status event was traced for this resource @@ -240,7 +271,18 @@ type traceSchedResourceState struct { // GoStatus and GoCreate events to omit a sequence number (implicitly 0). seq [2]uint64 } +type mutex struct { + // Empty struct if lock ranking is disabled, otherwise includes the lock rank + lockRankStruct + // Futex-based impl treats it as uint32 key, + // while sema-based impl as M* waitm. + // Used to be a union, but unions break precise GC. + key uintptr +} +type lockRankStruct struct { +} type uintreg uint // FIXME wrong on amd64p32 type m struct{} // FIXME stub type sudog struct{} // FIXME stub type timersBucket struct{} // FIXME stub +type timers struct{} // FIXME stub

rsc · 2024-06-27T02:46:41Z

stopTheWorld is very sensitive and difficult to use correctly.
We are very unlikely to support calling it even using linkname.
This is the kind of dangerous, difficult-to-keep-working use that
the new linkname restrictions are meant to catch.

aktau · 2024-06-27T08:28:20Z

Would an enhanced viewcore be an alternative? The desire to improve this is mentioned in #43930 (comment) (cc @mknyszek). If I'm not mistaken, the tracking issue is either #57447 or #45631.

Enhanced viewecore (gocore) would not support live debugging, but in my experience this is often just a nice-to-have. Most deployments have multiple concurrent tasks, managed by a cluster scheduler (Kubernetes, ...). Tasks go up and down for various reasons (machine reboot, crash, lack of resources, ...) and the job in its entirety should be tolerant to tasks disappearing. Hence, sending a signal to crash and produce a core dump (or heap dump) would be a decent way to identify the exact nature of leaks, for us.

Until this type of functionality is available, we're relegated to reading the code from the allocation stack of the leaked object (likely identified by diffing two heap profiles). This is (far) less productive for identifying leaks than the proposal or an improved core viewer.

fjl · 2024-06-27T08:35:33Z

Enhanced viewcore would be nice, but it's kind of orthogonal and doesn't cover some use cases. I often use memsize by to take snapshot reports over time, checking for trends in the number of reachable objects. Can't do that if the program has to crash to get the heap.

navytux · 2024-06-27T08:50:43Z

@rsc, thanks for feedback.

While I completely understand the desire of Go team to close access to private
functionality, my reading of #67401 is that
the access will not be closed to what is already used from outside to prevent
breaking existing applications and libraries. In other words for practical
reason the set of symbols, that is already used from outside, will continue to
be provided via go:linkname, and it is only new symbols, and symbols that were
not used previously, that become inaccessible via go:linkname to prevent
growing the set of private API that is exported in reality.

I agree that in ideal world the set of private API that is exported in reality
should be empty, and that reaching that empty set is the long-term goal.

However here we are not starting from scratch. Over the years people used to
have go:linkname working and started to depend on functionality, for which
there is usually no public equivalent. For example in my tracing library with
Go1.23 dropping access to stopTheWorld I'll have a hard time to find out what
to do.

So I suggest to please reconsider providing access to stopTheWorld via go:linkname.

It has been only a few days since go1.23rc1 and it is already two projects
speaking here due to breakage. I'm sure there will be more over time.

Kirill

rsc · 2024-06-27T11:36:59Z

@fjl I don't fully understand why pprof isn't enough to catch memory leaks. Usually when there's a memory leak it shows up as the dominant memory source in a pprof heap profile.

fjl · 2024-06-27T11:43:54Z

It's more for finding memory leaks where a large object graph is kept alive by a forgotten reference stuck in a slice backing array or channel buffer. AFAIK pprof heap profiles track new allocations of objects by location and type, which is useful, but it doesn't really give an absolute indication of what's still live. You can have lots of allocation activity which is all quickly reclaimed, and that would show up the same way, right?

aktau · 2024-06-27T11:50:36Z

Enhanced viewcore would be nice, but it's kind of orthogonal and doesn't cover some use cases. I often use memsize by to take snapshot reports over time, checking for trends in the number of reachable objects. Can't do that if the program has to crash to get the heap.

I don't think that's true. You could use a combination of heap profiles and viewcore to find it:

Take heap profile at time X
Take heap profile at time X+10 (say, with 2x RSS)
Diff both profiles.
Observe largest remaining stack(s)
Kill a process that's big in a way that leaves a core/heapdump.
Using enhanced viewcore(1), find what's keeping the objects identified in step 4 alive.

It's more for finding memory leaks where a large object graph is kept alive by a forgotten reference stuck in a slice backing array or channel buffer. AFAIK pprof heap profiles track new allocations of objects by location and type, which is useful, but it doesn't really give an absolute indication of what's still live. You can have lots of allocation activity which is all quickly reclaimed, and that would show up the same way, right?

No, the heap profile contains four views:

inuse_space, inuse_objects: objects live as of the end of the last completed GC
alloc_space, alloc_objects: objects allocated, as of the end of the last completed GC, since start of process (this is what you were referring to)

Using inuse_space and inuse_objects, you could fulfill step 4 above.

rsc · 2024-06-27T11:52:15Z

The pprof profile samples one allocation per half megabyte and records (number of bytes, number objects) x (allocated, freed). Based on those numbers, there are four profiles you can access: -inuse_space, -inuse_objects, -alloc_space, -alloc_objects. The inuse ones show only live data [allocated minus freed] as of the last GC (useful for identifying leaks), while the alloc ones show total allocation ever (useful for identifying sources of garbage). At this moment I believe the default that pprof shows is -inuse_space but I may be wrong about that.

fjl · 2024-06-27T11:59:39Z

Yeah, it's fine, leak debugging could probably be done in a different way. I'm not really here to argue my tool is the best. Just brought up because it got broken by Go 1.23rc and I think I won't be the only one missing access to STW functionality in runtime. It's an important facility.

Do you think there is any chance we can find a way to expose access to STW, possibly with the new API I suggested?

dominikh · 2024-06-27T12:22:41Z

Can't do that if the program has to crash to get the heap.

It doesn't. You can use tools such as gdb or gcore to get a coredump of a running process.

fjl · 2024-06-27T12:33:30Z

Yeah, that's certainly a possibility.

Regarding viewcore, it's a bit funny, I wrote memsize originally as a workaround because the heapdump viewer tool (https://github.com/randall77/heapdump14) got broken by runtime changes in Go 1.6 (#16410 (comment)). heapdump14 itself was an attempt to fix an even earlier tool for changes made by Go 1.4 (removal of full type information in dumps). Seems like my workaround was good enough for 6 whole releases :).

aclements · 2024-07-02T18:26:29Z

With memsize, what's the problem if you just don't stop the world? (Our best guess is that this prevents races that shear interface values, but wanted to ask.)

aclements · 2024-07-02T18:28:42Z

Our policy with exposing linknames has been to look at transitive imports as a proxy for "importance". It seems like memsize doesn't meet this bar, but I can also imagine it's the type of package you import temporarily when debugging, so it wouldn't show up in an ecosystem analysis anyway.

ianlancetaylor · 2024-07-02T18:39:26Z

Note that if you only import the package while debugging, it's possible to use the go build or go test option -ldflags=-checklinkname=0 while debugging.

fjl · 2024-07-03T07:50:16Z

With memsize, what's the problem if you just don't stop the world? (Our best guess is that this prevents races that shear interface values, but wanted to ask.)

It would just race with everything (including map updates, slice growth, interfaces...), and would give an inconsistent view. The idea was to have a tool that reports on the memory usage of a snapshot of a data structure. And it can also be used to compare such snapshots.

We use it in go-ethereum by registering some of the 'root objects' that hold references to most things in the system. So when it looks like there is a memory-related problem, we can go to the debugging HTTP endpoint, click the button, and check the object counts. There is no special 'debug build', it's just good to have this capability built in.

If this proposal turns out rejected, we'll disable this and move on. It's not the end of the world for me :). But I do think STW is a useful primitive for debugging tools, and the runtime should either expose it in some way, or at least not prevent access to it.

DemiMarie · 2024-07-06T18:32:05Z

Enhanced viewecore (gocore) would not support live debugging, but in my experience this is often just a nice-to-have. Most deployments have multiple concurrent tasks, managed by a cluster scheduler (Kubernetes, ...). Tasks go up and down for various reasons (machine reboot, crash, lack of resources, ...) and the job in its entirety should be tolerant to tasks disappearing. Hence, sending a signal to crash and produce a core dump (or heap dump) would be a decent way to identify the exact nature of leaks, for us.

I don’t think it is reasonable to assume that everryone is using a cluster scheduler, or even that most people are. For small-scale deployments, it adds pointless complexity.

aktau · 2024-07-08T07:45:39Z

I don’t think it is reasonable to assume that everryone is using a cluster scheduler, or even that most people are. For small-scale deployments, it adds pointless complexity.

I shouldn't have said cluster scheduler. What I meant is that we're talking about debugging a task that has a memory leak. It's going to fail at some point. Making it fail in a way such that it produces a core dump may not be all that different. Additionally, while I haven't done this myself, @dominikh mentions it's possible to extract a core from a running process in
#68167 (comment).

We use it in go-ethereum by registering some of the 'root objects' that hold references to most things in the system. So when it looks like there is a memory-related problem, we can go to the debugging HTTP endpoint, click the button, and check the object counts. There is no special 'debug build', it's just good to have this capability built in.

It does sound useful, one can view objects by retention stack instead of allocation stack (which normal heap profiles do). It reminds me of the difference between mutex profiles and blocking profiles. It'd be nice to have a "retention profile". Perhaps it could be a feature request? However I don't see a situation where a core/heap viewer couldn't yield the same information. Importantly though, such a viewer doesn't currently exist, so it's not a real alternative yet. Perhaps -ldflags=-checklinkname=0 isn't a bad workaround for tool/servers you are running until viewcore is fixed.

rsc · 2024-07-25T09:26:33Z

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

rsc · 2024-07-31T19:16:51Z

I wrote on June 26 that:

stopTheWorld is very sensitive and difficult to use correctly.
We are very unlikely to support calling it even using linkname.
This is the kind of dangerous, difficult-to-keep-working use that
the new linkname restrictions are meant to catch.

The discussion so far has not convinced me otherwise. If you have a memory leak, pprof still seems like the right first step. That's always been enough for programs I've debugged. And if you really need to reach for memsize, you can import it and then build with -ldflags=-checklinkname=0.

The downsides of exposing stop-the-world to user code just seem too risky.

fjl added the Proposal label Jun 25, 2024

gopherbot added this to the Proposal milestone Jun 25, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

proposal: runtime: allow access to stopTheWorld/startTheWorld via go:linkname #68167

proposal: runtime: allow access to stopTheWorld/startTheWorld via go:linkname #68167

fjl commented Jun 25, 2024 •

edited

Loading

fjl commented Jun 25, 2024 •

edited

Loading

seankhliao commented Jun 25, 2024

fjl commented Jun 25, 2024 •

edited

Loading

ianlancetaylor commented Jun 25, 2024

navytux commented Jun 26, 2024 •

edited

Loading

rsc commented Jun 27, 2024

aktau commented Jun 27, 2024

fjl commented Jun 27, 2024

navytux commented Jun 27, 2024

rsc commented Jun 27, 2024

fjl commented Jun 27, 2024

aktau commented Jun 27, 2024

rsc commented Jun 27, 2024 •

edited

Loading

fjl commented Jun 27, 2024

dominikh commented Jun 27, 2024

fjl commented Jun 27, 2024 •

edited

Loading

aclements commented Jul 2, 2024

aclements commented Jul 2, 2024

ianlancetaylor commented Jul 2, 2024

fjl commented Jul 3, 2024 •

edited

Loading

DemiMarie commented Jul 6, 2024

aktau commented Jul 8, 2024

rsc commented Jul 25, 2024

rsc commented Jul 31, 2024

proposal: runtime: allow access to stopTheWorld/startTheWorld via go:linkname #68167

proposal: runtime: allow access to stopTheWorld/startTheWorld via go:linkname #68167

Comments

fjl commented Jun 25, 2024 • edited Loading

Proposal Details

fjl commented Jun 25, 2024 • edited Loading

seankhliao commented Jun 25, 2024

fjl commented Jun 25, 2024 • edited Loading

ianlancetaylor commented Jun 25, 2024

navytux commented Jun 26, 2024 • edited Loading

rsc commented Jun 27, 2024

aktau commented Jun 27, 2024

fjl commented Jun 27, 2024

navytux commented Jun 27, 2024

rsc commented Jun 27, 2024

fjl commented Jun 27, 2024

aktau commented Jun 27, 2024

rsc commented Jun 27, 2024 • edited Loading

fjl commented Jun 27, 2024

dominikh commented Jun 27, 2024

fjl commented Jun 27, 2024 • edited Loading

aclements commented Jul 2, 2024

aclements commented Jul 2, 2024

ianlancetaylor commented Jul 2, 2024

fjl commented Jul 3, 2024 • edited Loading

DemiMarie commented Jul 6, 2024

aktau commented Jul 8, 2024

rsc commented Jul 25, 2024

rsc commented Jul 31, 2024

fjl commented Jun 25, 2024 •

edited

Loading

fjl commented Jun 25, 2024 •

edited

Loading

fjl commented Jun 25, 2024 •

edited

Loading

navytux commented Jun 26, 2024 •

edited

Loading

rsc commented Jun 27, 2024 •

edited

Loading

fjl commented Jun 27, 2024 •

edited

Loading

fjl commented Jul 3, 2024 •

edited

Loading