Count GC's time waiting for a safepoint as GC time #53441

Open
mbauman wants to merge 3 commits into master

Conversation

@mbauman commented Feb 22, 2024

This is a very simple change that will hopefully help folks identify things like https://discourse.julialang.org/t/inconsistent-cpu-utilisation-in-threads-loops/110512 a little faster.

Before:

julia> @time @sync begin
           Threads.@spawn @ccall sleep(1::Cuint)::Cuint
           sleep(.5)
           Threads.@spawn GC.gc()
       end
  1.078393 seconds (927 allocations: 65.078 KiB, 5.85% gc time, 1.19% compilation time)

After:

julia> @time @sync begin
           Threads.@spawn @ccall sleep(1::Cuint)::Cuint
           sleep(.5)
           Threads.@spawn GC.gc()
       end
  1.040344 seconds (1.35 k allocations: 65.266 KiB, 50.85% gc time: 94% of which was waiting for safepoints, 1.09% compilation time)

base/timing.jl Outdated
@@ -57,7 +57,7 @@ function GC_Diff(new::GC_Num, old::GC_Num)
     new.poolalloc - old.poolalloc,
     new.bigalloc - old.bigalloc,
     new.freecall - old.freecall,
-    new.total_time - old.total_time,
+    (new.total_time - old.total_time) + (new.total_time_to_safepoint - old.total_time_to_safepoint),
A Sponsor Member commented:

I would prefer introducing a new field in GC_Diff that is the safepoint time, and break the top-level report into two sections. As an example when I run a scaling test, I want to see that total_time stays constant (e.g. the work of GC scales well with the number of threads and GC workers). If gc time starts creeping up I might use that as a decision point to use more GC threads.

On the other hand, if the latency to start GC is high (time to safepoint), I can use that as a decision point to look with a profiler at whether I spend too much time in certain ccalls.

Munging them together yields a simple "GC is slow", but no actionable information.
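For illustration, a minimal sketch of that suggestion, assuming a hypothetical field name time_to_safepoint and eliding most of GC_Diff's counters; GC_Num and its total_time_to_safepoint counter are the ones visible in the base/timing.jl diff above:

```julia
# Sketch: track GC work time and time-to-safepoint as separate fields
# instead of folding both into total_time. Not the PR's actual code.
struct GC_Diff
    allocd::Int64             # bytes allocated
    total_time::Int64         # ns of actual GC work (mark + sweep)
    time_to_safepoint::Int64  # ns spent waiting for all threads to stop
    # ... remaining counters elided
end

# GC_Num is the raw counter struct from base/timing.jl
function GC_Diff(new::GC_Num, old::GC_Num)
    GC_Diff(new.allocd - old.allocd,
            new.total_time - old.total_time,
            new.total_time_to_safepoint - old.total_time_to_safepoint)
end
```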

@mbauman (Author) replied:

Yeah, that's fair. In my view, "GC is slow" is very actionable information — and throwing more information at people isn't always helpful. How would you want this to appear? Perhaps we take inspiration from compilation time?

2.39% compilation time: 12% of which was recompilation
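For concreteness, a sketch of that format applied to GC, with illustrative thresholds; gc_time_string and its arguments are hypothetical, not the PR's actual implementation:

```julia
using Printf

# Report GC time like the compilation/recompilation line: one headline
# percentage, with the safepoint wait called out only when it matters.
function gc_time_string(gc_ns, safepoint_ns, elapsed_ns)
    total_gc = gc_ns + safepoint_ns
    gc_pct   = 100 * total_gc / elapsed_ns
    wait_pct = 100 * safepoint_ns / total_gc
    str = @sprintf("%.2f%% gc time", gc_pct)
    if wait_pct > 0.5  # elide the caveat when the wait is negligible
        str *= @sprintf(": %.0f%% of which was waiting for safepoints", wait_pct)
    end
    return str
end

# gc_time_string(3.0e7, 5.0e8, 1.04e9)
# "50.96% gc time: 94% of which was waiting for safepoints"
```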

@KristofferC commented:

How many people that see this message will know what "waiting for safe point" means? It's quite a lot of text to add.

@vchuravy commented:

So instead of "waiting for safe point", I would call it latency. Or we could just print it as (30%|10%) GC, e.g. the total is the sum of those two values.

Another way of thinking about TTSP is that it is like the GC lock.

@adienes commented Feb 22, 2024

the information certainly seems useful and 👍 to the functionality, but I'm just noting that #52883 also recently lengthened the message in the parenthetical

maybe at this point @time contains enough information that it could be appropriate to print over multiple lines?

@mbauman commented Feb 22, 2024

My only goal here is to match what "gc time" means to me (and I'd wager the vast majority of Julia users) — and that's the total time spent inside GC.gc(), sleeping or not.

I really don't want yet another way of interpreting a particular category's breakdown. So in my view we only have two choices, and I'm happy with either:

  • Match what we do with compilation/recompilation: X% category time: Y% of which was caveat
  • Only print the total time spent inside GC.gc(), but have the details available in @timed's output

It only prints this extra info if more than 0.5% of the GC time was spent waiting to start. I could make the caveat shorter with just "waiting", but I chose "waiting for safepoint" because I figured it'd make for good Google fodder.
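For reference on option 2, the detailed counters are already reachable via @timed, whose gcstats field is a Base.GC_Diff; the commented-out safepoint field is the hypothetical addition under discussion, not an existing API:

```julia
julia> stats = @timed [zeros(1000) for _ in 1:10_000];

julia> stats.gctime              # GC seconds, the number @time summarizes

julia> stats.gcstats.total_time  # nanoseconds of GC, from Base.GC_Diff

julia> # stats.gcstats.time_to_safepoint  # hypothetical home for the breakdown
```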

@mbauman added the GC (Garbage collector) and observability (metrics, timing, understandability, reflection, logging, ...) labels on Feb 22, 2024
@raminammour commented Feb 22, 2024

I am not an expert, but as people want more and more information going into @time (compilation, GC, ...), which has been very useful, good heuristics on when to print the fine-grained info may be worthwhile: say, in this case, only print the "waiting" time (+1 to omitting "for safepoint") if it is more than 10% (for example) of total time.
It may be arbitrary, but it preserves the concise output in most cases, and only alerts the user that something "else" is happening when it becomes important. The full info can always go in @timed for the gurus :)

Edit: ignore above, I just saw in the code that @mbauman already only shows this if it is > 50%

Thanks for doing this, I learned something new reading this!

@vchuravy commented Feb 22, 2024

One challenge with using total=TTSP+GC is that TTSP and useful time overlap.

So you could get a measurement where, with two threads, the total is equivalent to 100% (one thread immediately requests GC and the other ignores that request until the end).

Right now GC % is a measurement of something automatic, but useful. Mixing in TTSP makes the time reported less useful for me, which is why I likened it to the recently added lock measurement.

Put another way, currently GC time is a measurement of work and how efficiently that work is being processed, whereas TTSP is a measurement of work not occurring.

I should say that I am all in favor of surfacing this information, since I strongly believe this is an issue we as a community are not aware of enough.

@oscardssmith commented:

The counterpoint to this is that we really do need to make it so that TTSP is always ~0. Right now ccall is the main case where this doesn't happen, but we already basically know what we need to do to fix it.
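As an illustration of today's workaround, a sketch that splits a long foreign call into chunks with explicit safepoints in between; chunk_of_work is a hypothetical C function, not a real API:

```julia
# A thread stuck in one long ccall can't reach a safepoint, so a GC
# requested by another thread stalls (high TTSP). Chunking the foreign
# work and calling GC.safepoint() in between bounds that wait.
function long_foreign_work(n_chunks)
    for _ in 1:n_chunks
        @ccall chunk_of_work()::Cvoid  # hypothetical: does 1/n_chunks of the work
        GC.safepoint()                 # lets a pending collection run here
    end
end
```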

@mbauman commented Feb 22, 2024

> Put another way, currently GC time is a measurement of work and how efficiently that work is being processed, whereas TTSP is a measurement of work not occurring.

Ok. But it's the combined number that is the measurement of time my work is not happening. That's why it's what's useful to me — and I'd wager most folks.

@vchuravy commented:

But the action to take in either case is drastically different. In one case I might need to add some strategic GC.safepoint calls or annotate a ccall as GC-safe, with the goal of reducing latency; in the other case I might need to preallocate, reduce heap usage in general...

Maybe we report TTSP in the lock section? The more I think about it, the more I like the idea. GC is waiting on a lock to acquire exclusive access to the heap.

@d-netto commented Feb 23, 2024

Want to note that there seems to be some history behind the interpretation "GC time = mark time + sweep time".

Note that GCBenchmarks displays/interprets GC time as "mark time + sweep time" (it doesn't count TTSP), and pretty much all dashboards we have internally at RAI do the same.

Having GC time defined in two different ways in GCBenchmarks (which is arguably our main open-source tool for GC performance analysis now) and @time seems inconsistent.

@mbauman commented Feb 23, 2024

I'm just confused how to concretely — and succinctly and understandably — say that this is due to the GC without saying it's GC time. I suppose it could be in its own section:

1.040344 seconds (1.35 k allocations: 65.266 KiB, 5.85% gc time, 49.85% time waiting for gc, 1.09% compilation time)

One additional challenge is in determining the "floor" for elision — this time is always nonzero.

I did also think about using the locks section, but there are two issues there: the existing metric is different (time vs. count), and even if we started counting counts, I'd still want to separate out GC locks vs. ReentrantLocks, which means we'd still need to add a new descriptor.

base/timing.jl (outdated review thread, resolved)
@vchuravy commented:

At the core of the issue for me are units. Currently @time measures real time, the time a human would experience. In programming we often refer to CPU time, which is a useful concept in concurrent programming.

Take as an example the program below: it uses 10 units of real time and 20 units of CPU time. If we look at the time G (the GC) takes to run, it is 2 units of real time and 4 units of CPU time. In both cases GC takes 20% of the program execution.

T1: ----RxxxGG
T2: --------GG

Now what about TTSP? In the ASCII diagram above, R is the moment we requested GC and x is the time spent waiting on the other mutator. If we use the real-time measurement, it would seem like we are spending 4 units, or 40%, of our time waiting; but using CPU time we can see that this is not the case, and TTSP takes 20% of our CPU time, a much more accurate representation of the cost of getting to the safepoint.

This problem gets even worse the more threads we add:

T1: ----RxxxGG
T2: --------GG
T3: --------GG
T4: --------GG
T5: --------GG

Still 10 units of real time, but 50 units of CPU time. GC takes 20%, but TTSP takes 40% of real time and only 8% of CPU time.
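A quick arithmetic check of the five-thread diagram, using the unit counts stated above:

```julia
real_time = 10                     # columns in each row of the diagram
nthreads  = 5
cpu_time  = real_time * nthreads   # 50 units

gc_real   = 2                      # the two G columns
gc_cpu    = gc_real * nthreads     # GC runs on every thread: 10 units
ttsp_real = 4                      # R plus xxx on T1
ttsp_cpu  = 4                      # only T1 is waiting

gc_cpu / cpu_time       # 0.20 -> GC is 20% of CPU time (and 2/10 of real time)
ttsp_real / real_time   # 0.40 -> TTSP looks like 40% in real time
ttsp_cpu / cpu_time     # 0.08 -> but is only 8% of CPU time
```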

> But it's the combined number that is the measurement of time my work is not happening. That's why it's what's useful to me — and I'd wager most folks.

I don't disagree (besides being cute and stating that GC is still doing your work, but work you procrastinated on).

But I wager that you actually care about the CPU-time ratio and not the real-time ratio, and phrasing a multi-core measurement in real time is not useful. It doesn't convey how big of a problem this is. I want to make sure that the information we present is actionable.

I don't think we currently track CPU time accurately throughout the system, but that is a separate issue.

@mbauman commented Feb 23, 2024

So what's your suggestion on how to present this information? In my mind, the overarching goal of @time's parenthetical is to summarize some overheads that you might not have explicitly asked for and might not expect — and which might be highly variable. I put time-to-safepoint in that bin. And I still say it fits the category of a GC-related overhead.

I hear you on the units, but I must confess I don't have a solid understanding of what the percentages are supposed to mean in the context of multithreading. If you had pressed me (before I looked at the implementation yesterday), I might have guessed they have CPU time in the numerator and wall time in the denominator, but maybe that's just because compilation time sometimes reports >100%.

I still haven't quite grokked how the GC tracks multiple threads waiting for a safepoint — but it seems that it only counts the time from the first/worst one? That is, this diagram:

T1: --RxxxxxxxxxGG
T2: ----RxxxxxxxGG
T3: ------RxxxxxGG
T4: ---------RxxGG

The GC stats, as they currently exist, seem like they do the real-world timing; it's just the worst-case-thread one. (Edit: I'm guessing that's because the other threads start waiting on behalf of the first thread's collection before they can even try to start their own, and then we end up with 4 synchronized collections back-to-back, but the effect is that this number is more real-time-like than I expected):

julia> @time @sync begin
           Threads.@spawn (@ccall sleep(1::Cuint)::Cuint; GC.gc())
           Threads.@spawn (@ccall sleep(2::Cuint)::Cuint; GC.gc())
           Threads.@spawn (@ccall sleep(3::Cuint)::Cuint; GC.gc())
           Threads.@spawn (@ccall sleep(4::Cuint)::Cuint; GC.gc())
       end
  4.050298 seconds (3.43 k allocations: 163.219 KiB, 75.02% gc time: 99% of which was waiting for safepoints, 0.73% compilation time)

@vchuravy commented Feb 23, 2024

> The GC stats, as they currently exist, seem like they do the real-world timing; it's just the worst-case-thread one.

Yes, we currently only track the time the thread requesting GC spends waiting for all the other threads to reach a safepoint, so that it gains exclusive access to the heap and thus the object graph.

The real-time vs. CPU-time distinction is probably also the reason why compile time can exceed total time.

@mbauman commented Feb 23, 2024

> So what's your suggestion on how to present this information?

:)

Under the assumption that folks are trying to get all threads evenly chugging along, I think this PR is capturing the situation quite well. It's that worst-case thread that you end up waiting for:

julia> @time @sync begin
           Threads.@spawn (@ccall sleep(1::Cuint)::Cuint; GC.gc(); @ccall sleep(4::Cuint)::Cuint)
           Threads.@spawn (@ccall sleep(2::Cuint)::Cuint; GC.safepoint(); @ccall sleep(3::Cuint)::Cuint)
           Threads.@spawn (@ccall sleep(3::Cuint)::Cuint; GC.safepoint(); @ccall sleep(2::Cuint)::Cuint)
           Threads.@spawn (@ccall sleep(4::Cuint)::Cuint; GC.safepoint(); @ccall sleep(1::Cuint)::Cuint)
       end
  8.071955 seconds (3.96 k allocations: 190.641 KiB, 37.86% gc time: 98% of which was waiting for safepoints, 0.33% compilation time)
Task (done) @0x000000010a344650

julia> 8.071955 * (1-.3786)
5.015912837
