Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: runtime: allow access to stopTheWorld/startTheWorld via go:linkname #68167

Open
fjl opened this issue Jun 25, 2024 · 24 comments
Open
Labels
Milestone

Comments

@fjl
Copy link

fjl commented Jun 25, 2024

Proposal Details

My module github.com/fjl/memsize walks a data structure and computes its total referred memory size in bytes, as well as producing a report of per-type memory usage. This is meant to be used for debugging memory leaks in live programs in a similar manner to taking a profile over HTTP with pprof, and to my knowledge there exists no other comparable tool.

memsize requires access to runtime.stopTheWorld and runtime.startTheWorld in order to safely walk the object graph without racing with the program that is being debugged. Up until Go version 1.22, I was able to call these functions via go:linkname. Go 1.23 disallows this, and as per #67401 only specifically approved entry points can be linkname'd.

Please allow access to runtime.stopTheWorld and runtime.startTheWorld by adding a go:linkname directive for Go 1.23.

Alternatively, we could consider adding an exposed API, perhaps in package runtime/debug, to perform a similar task. I could envision something like:

func WithWorldStopped(reason string, f func())

Perhaps access to stopTheWorld/startTheWorld could be granted temporarily during the Go 1.23 cycle while we work out a replacement API.

xref fjl/memsize#4
xref #67401

@fjl fjl added the Proposal label Jun 25, 2024
@gopherbot gopherbot added this to the Proposal milestone Jun 25, 2024
@fjl
Copy link
Author

fjl commented Jun 25, 2024

BTW, I wasn't sure which issue category was appropriate here. Originally went with Bug, but it didn't really fit either. Proposal seems kind of strongly-worded, I really just want to get my request in.

@seankhliao
Copy link
Member

given it is for debugging use, I didn't see an issue with requiring the checklinkname flag during a build.

@fjl
Copy link
Author

fjl commented Jun 25, 2024

memsize is meant to be integrated into production executables, i.e. you typically set up the memsize web interface handler on the pprof endpoint and then access it the same way you'd access that. It's not like you'd specificallly build a debugging executable to use pprof. Most people just have it always enabled so they can go in and debug their service when it is in a state of failure.

@ianlancetaylor
Copy link
Contributor

CC @golang/runtime

@navytux
Copy link
Contributor

navytux commented Jun 26, 2024

For the reference: go123's xruntime also hit this regression when trying to add support for go1.23: there stopTheWorld is used to serialize modification of program tracepoints wrt the program running itself:

https://lab.nexedi.com/kirr/go123/-/blob/8299741f/tracing/internal/xruntime/runtime.go#L24-38
https://lab.nexedi.com/kirr/go123/-/blob/8299741f/tracing/tracing.go#L211-225
https://pkg.go.dev/lab.nexedi.com/kirr/[email protected]/tracing

lab.nexedi.com/kirr/go123/tracing was working ok since 2017 and it would be good to preserve support for that.

Thanks beforehand,
Kirill

navytux added a commit to navytux/go123 that referenced this issue Jun 26, 2024
Generate g for today's state of Go 1.23 (go1.23rc1-0-g7dff7439dc).
Compared to Go1.22 many new fields and several new types were
introduced as shown by the diff below.

Regenerated files stay without changes for Go1.22 and previous releases.

Support for Go1.23 remains incomplete because currently there is no way to
access runtime.stopTheWorld via go:linkname :

golang/go#68167 (comment)

---- 8< ----
diff --git a/zruntime_g_go1.22.go b/zruntime_g_go1.23.go
index 910382b..4aa6799 100644
--- a/zruntime_g_go1.22.go
+++ b/zruntime_g_go1.23.go
@@ -1,7 +1,7 @@
 // Code generated by g_typedef; DO NOT EDIT.

-//go:build go1.22 && !go1.23
-// +build go1.22,!go1.23
+//go:build go1.23 && !go1.24
+// +build go1.23,!go1.24

 package xruntime

@@ -26,6 +26,7 @@ type g struct {
 	sched     gobuf
 	syscallsp uintptr // if status==Gsyscall, syscallsp = sched.sp to use during gc
 	syscallpc uintptr // if status==Gsyscall, syscallpc = sched.pc to use during gc
+	syscallbp uintptr // if status==Gsyscall, syscallbp = sched.bp to use in fpTraceback
 	stktopsp  uintptr // expected sp at top of stack, to check in traceback
 	// param is a generic pointer parameter field used to pass
 	// values in particular contexts where other storage for the
@@ -95,14 +96,15 @@ type g struct {
 	cgoCtxt       []uintptr      // cgo traceback context
 	labels        unsafe.Pointer // profiler labels
 	timer         *timer         // cached timer for time.Sleep
+	sleepWhen     int64          // when to sleep until
 	selectDone    atomic.Uint32  // are we participating in a select and did someone win the race?

-	coroarg *coro // argument during coroutine transfers
-
 	// goroutineProfiled indicates the status of this goroutine's stack for the
 	// current in-progress goroutine profile
 	goroutineProfiled goroutineProfileStateHolder

+	coroarg *coro // argument during coroutine transfers
+
 	// Per-G tracer state.
 	trace gTraceState

@@ -182,27 +184,51 @@ type funcval struct {
 	fn uintptr
 }
 type timer struct {
-	// If this timer is on a heap, which P's heap it is on.
-	// puintptr rather than *p to match uintptr in the versions
-	// of this struct defined in other packages.
-	pp puintptr
+	// mu protects reads and writes to all fields, with exceptions noted below.
+	mu mutex
+
+	astate  uint8  // atomic copy of state bits at last unlock
+	state   uint8  // state bits
+	isChan  bool   // timer has a channel; immutable; can be read without lock
+	blocked uint32 // number of goroutines blocked on timer's channel

 	// Timer wakes up at when, and then at when+period, ... (period > 0 only)
-	// each time calling f(arg, now) in the timer goroutine, so f must be
+	// each time calling f(arg, seq, delay) in the timer goroutine, so f must be
 	// a well-behaved function and not block.
 	//
-	// when must be positive on an active timer.
+	// The arg and seq are client-specified opaque arguments passed back to f.
+	// When used from netpoll, arg and seq have meanings defined by netpoll
+	// and are completely opaque to this code; in that context, seq is a sequence
+	// number to recognize and squech stale function invocations.
+	// When used from package time, arg is a channel (for After, NewTicker)
+	// or the function to call (for AfterFunc) and seq is unused (0).
+	//
+	// Package time does not know about seq, but if this is a channel timer (t.isChan == true),
+	// this file uses t.seq as a sequence number to recognize and squelch
+	// sends that correspond to an earlier (stale) timer configuration,
+	// similar to its use in netpoll. In this usage (that is, when t.isChan == true),
+	// writes to seq are protected by both t.mu and t.sendLock,
+	// so reads are allowed when holding either of the two mutexes.
+	//
+	// The delay argument is nanotime() - t.when, meaning the delay in ns between
+	// when the timer should have gone off and now. Normally that amount is
+	// small enough not to matter, but for channel timers that are fed lazily,
+	// the delay can be arbitrarily long; package time subtracts it out to make
+	// it look like the send happened earlier than it actually did.
+	// (No one looked at the channel since then, or the send would have
+	// not happened so late, so no one can tell the difference.)
 	when   int64
 	period int64
-	f      func(interface{}, uintptr)
+	f      func(arg interface{}, seq uintptr, delay int64)
 	arg    interface{}
 	seq    uintptr

-	// What to set the when field to in timerModifiedXX status.
-	nextwhen int64
+	// If non-nil, the timers containing t.
+	ts *timers

-	// The status field holds one of the values below.
-	status atomic.Uint32
+	// sendLock protects sends on the timer's channel.
+	// Not used for async (pre-Go 1.23) behavior when debug.asynctimerchan.Load() != 0.
+	sendLock mutex
 }
 type guintptr uintptr
 type puintptr uintptr
@@ -221,6 +247,11 @@ type traceTime uint64
 type coro struct {
 	gp guintptr
 	f  func(*coro)
+
+	// State for validating thread-lock interactions.
+	mp        *m
+	lockedExt uint32 // mp's external LockOSThread counter at coro creation time.
+	lockedInt uint32 // mp's internal lockOSThread counter at coro creation time.
 }
 type traceSchedResourceState struct {
 	// statusTraced indicates whether a status event was traced for this resource
@@ -240,7 +271,18 @@ type traceSchedResourceState struct {
 	// GoStatus and GoCreate events to omit a sequence number (implicitly 0).
 	seq [2]uint64
 }
+type mutex struct {
+	// Empty struct if lock ranking is disabled, otherwise includes the lock rank
+	lockRankStruct
+	// Futex-based impl treats it as uint32 key,
+	// while sema-based impl as M* waitm.
+	// Used to be a union, but unions break precise GC.
+	key uintptr
+}
+type lockRankStruct struct {
+}
 type uintreg uint          // FIXME wrong on amd64p32
 type m struct{}            // FIXME stub
 type sudog struct{}        // FIXME stub
 type timersBucket struct{} // FIXME stub
+type timers struct{}       // FIXME stub
@rsc
Copy link
Contributor

rsc commented Jun 27, 2024

stopTheWorld is very sensitive and difficult to use correctly.
We are very unlikely to support calling it even using linkname.
This is the kind of dangerous, difficult-to-keep-working use that
the new linkname restrictions are meant to catch.

@aktau
Copy link
Contributor

aktau commented Jun 27, 2024

Would an enhanced viewcore be an alternative? The desire to improve this is mentioned in #43930 (comment) (cc @mknyszek). If I'm not mistaken, the tracking issue is either #57447 or #45631.

Enhanced viewecore (gocore) would not support live debugging, but in my experience this is often just a nice-to-have. Most deployments have multiple concurrent tasks, managed by a cluster scheduler (Kubernetes, ...). Tasks go up and down for various reasons (machine reboot, crash, lack of resources, ...) and the job in its entirety should be tolerant to tasks disappearing. Hence, sending a signal to crash and produce a core dump (or heap dump) would be a decent way to identify the exact nature of leaks, for us.

Until this type of functionality is available, we're relegated to reading the code from the allocation stack of the leaked object (likely identified by diffing two heap profiles). This is (far) less productive for identifying leaks than the proposal or an improved core viewer.

@fjl
Copy link
Author

fjl commented Jun 27, 2024

Enhanced viewcore would be nice, but it's kind of orthogonal and doesn't cover some use cases. I often use memsize by to take snapshot reports over time, checking for trends in the number of reachable objects. Can't do that if the program has to crash to get the heap.

@navytux
Copy link
Contributor

navytux commented Jun 27, 2024

@rsc, thanks for feedback.

While I completely understand the desire of Go team to close access to private
functionality, my reading of #67401 is that
the access will not be closed to what is already used from outside to prevent
breaking existing applications and libraries. In other words for practical
reason the set of symbols, that is already used from outside, will continue to
be provided via go:linkname, and it is only new symbols, and symbols that were
not used previously, that become inaccessible via go:linkname to prevent
growing the set of private API that is exported in reality.

I agree that in ideal world the set of private API that is exported in reality
should be empty, and that reaching that empty set is the long-term goal.

However here we are not starting from scratch. Over the years people used to
have go:linkname working and started to depend on functionality, for which
there is usually no public equivalent. For example in my tracing library with
Go1.23 dropping access to stopTheWorld I'll have a hard time to find out what
to do.

So I suggest to please reconsider providing access to stopTheWorld via go:linkname.

It has been only a few days since go1.23rc1 and it is already two projects
speaking here due to breakage. I'm sure there will be more over time.

Kirill

@rsc
Copy link
Contributor

rsc commented Jun 27, 2024

@fjl I don't fully understand why pprof isn't enough to catch memory leaks. Usually when there's a memory leak it shows up as the dominant memory source in a pprof heap profile.

@fjl
Copy link
Author

fjl commented Jun 27, 2024

It's more for finding memory leaks where a large object graph is kept alive by a forgotten reference stuck in a slice backing array or channel buffer. AFAIK pprof heap profiles track new allocations of objects by location and type, which is useful, but it doesn't really give an absolute indication of what's still live. You can have lots of allocation activity which is all quickly reclaimed, and that would show up the same way, right?

@aktau
Copy link
Contributor

aktau commented Jun 27, 2024

Enhanced viewcore would be nice, but it's kind of orthogonal and doesn't cover some use cases. I often use memsize by to take snapshot reports over time, checking for trends in the number of reachable objects. Can't do that if the program has to crash to get the heap.

I don't think that's true. You could use a combination of heap profiles and viewcore to find it:

  1. Take heap profile at time X
  2. Take heap profile at time X+10 (say, with 2x RSS)
  3. Diff both profiles.
  4. Observe largest remaining stack(s)
  5. Kill a process that's big in a way that leaves a core/heapdump.
  6. Using enhanced viewcore(1), find what's keeping the objects identified in step 4 alive.

It's more for finding memory leaks where a large object graph is kept alive by a forgotten reference stuck in a slice backing array or channel buffer. AFAIK pprof heap profiles track new allocations of objects by location and type, which is useful, but it doesn't really give an absolute indication of what's still live. You can have lots of allocation activity which is all quickly reclaimed, and that would show up the same way, right?

No, the heap profile contains four views:

  • inuse_space, inuse_objects: objects live as of the end of the last completed GC
  • alloc_space, alloc_objects: objects allocated, as of the end of the last completed GC, since start of process (this is what you were referring to)

Using inuse_space and inuse_objects, you could fulfill step 4 above.

@rsc
Copy link
Contributor

rsc commented Jun 27, 2024

The pprof profile samples one allocation per half megabyte and records (number of bytes, number objects) x (allocated, freed). Based on those numbers, there are four profiles you can access: -inuse_space, -inuse_objects, -alloc_space, -alloc_objects. The inuse ones show only live data [allocated minus freed] as of the last GC (useful for identifying leaks), while the alloc ones show total allocation ever (useful for identifying sources of garbage). At this moment I believe the default that pprof shows is -inuse_space but I may be wrong about that.

@fjl
Copy link
Author

fjl commented Jun 27, 2024

Yeah, it's fine, leak debugging could probably be done in a different way. I'm not really here to argue my tool is the best. Just brought up because it got broken by Go 1.23rc and I think I won't be the only one missing access to STW functionality in runtime. It's an important facility.

Do you think there is any chance we can find a way to expose access to STW, possibly with the new API I suggested?

@dominikh
Copy link
Member

Can't do that if the program has to crash to get the heap.

It doesn't. You can use tools such as gdb or gcore to get a coredump of a running process.

@fjl
Copy link
Author

fjl commented Jun 27, 2024

Yeah, that's certainly a possibility.

Regarding viewcore, it's a bit funny, I wrote memsize originally as a workaround because the heapdump viewer tool (https://github.com/randall77/heapdump14) got broken by runtime changes in Go 1.6 (#16410 (comment)). heapdump14 itself was an attempt to fix an even earlier tool for changes made by Go 1.4 (removal of full type information in dumps). Seems like my workaround was good enough for 6 whole releases :).

@aclements
Copy link
Member

With memsize, what's the problem if you just don't stop the world? (Our best guess is that this prevents races that shear interface values, but wanted to ask.)

@aclements
Copy link
Member

Our policy with exposing linknames has been to look at transitive imports as a proxy for "importance". It seems like memsize doesn't meet this bar, but I can also imagine it's the type of package you import temporarily when debugging, so it wouldn't show up in an ecosystem analysis anyway.

@ianlancetaylor
Copy link
Contributor

Note that if you only import the package while debugging, it's possible to use the go build or go test option -ldflags=-checklinkname=0 while debugging.

@fjl
Copy link
Author

fjl commented Jul 3, 2024

With memsize, what's the problem if you just don't stop the world? (Our best guess is that this prevents races that shear interface values, but wanted to ask.)

It would just race with everything (including map updates, slice growth, interfaces...), and would give an inconsistent view. The idea was to have a tool that reports on the memory usage of a snapshot of a data structure. And it can also be used to compare such snapshots.

We use it in go-ethereum by registering some of the 'root objects' that hold references to most things in the system. So when it looks like there is a memory-related problem, we can go to the debugging HTTP endpoint, click the button, and check the object counts. There is no special 'debug build', it's just good to have this capability built in.

If this proposal turns out rejected, we'll disable this and move on. It's not the end of the world for me :). But I do think STW is a useful primitive for debugging tools, and the runtime should either expose it in some way, or at least not prevent access to it.

@DemiMarie
Copy link

Enhanced viewecore (gocore) would not support live debugging, but in my experience this is often just a nice-to-have. Most deployments have multiple concurrent tasks, managed by a cluster scheduler (Kubernetes, ...). Tasks go up and down for various reasons (machine reboot, crash, lack of resources, ...) and the job in its entirety should be tolerant to tasks disappearing. Hence, sending a signal to crash and produce a core dump (or heap dump) would be a decent way to identify the exact nature of leaks, for us.

I don’t think it is reasonable to assume that everryone is using a cluster scheduler, or even that most people are. For small-scale deployments, it adds pointless complexity.

@aktau
Copy link
Contributor

aktau commented Jul 8, 2024

I don’t think it is reasonable to assume that everryone is using a cluster scheduler, or even that most people are. For small-scale deployments, it adds pointless complexity.

I shouldn't have said cluster scheduler. What I meant is that we're talking about debugging a task that has a memory leak. It's going to fail at some point. Making it fail in a way such that it produces a core dump may not be all that different. Additionally, while I haven't done this myself, @dominikh mentions it's possible to extract a core from a running process in
#68167 (comment).

We use it in go-ethereum by registering some of the 'root objects' that hold references to most things in the system. So when it looks like there is a memory-related problem, we can go to the debugging HTTP endpoint, click the button, and check the object counts. There is no special 'debug build', it's just good to have this capability built in.

It does sound useful, one can view objects by retention stack instead of allocation stack (which normal heap profiles do). It reminds me of the difference between mutex profiles and blocking profiles. It'd be nice to have a "retention profile". Perhaps it could be a feature request? However I don't see a situation where a core/heap viewer couldn't yield the same information. Importantly though, such a viewer doesn't currently exist, so it's not a real alternative yet. Perhaps -ldflags=-checklinkname=0 isn't a bad workaround for tool/servers you are running until viewcore is fixed.

@rsc
Copy link
Contributor

rsc commented Jul 25, 2024

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc
Copy link
Contributor

rsc commented Jul 31, 2024

I wrote on June 26 that:

stopTheWorld is very sensitive and difficult to use correctly.
We are very unlikely to support calling it even using linkname.
This is the kind of dangerous, difficult-to-keep-working use that
the new linkname restrictions are meant to catch.

The discussion so far has not convinced me otherwise. If you have a memory leak, pprof still seems like the right first step. That's always been enough for programs I've debugged. And if you really need to reach for memsize, you can import it and then build with -ldflags=-checklinkname=0.

The downsides of exposing stop-the-world to user code just seem too risky.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Active
Development

No branches or pull requests

10 participants