
fix use of jl_n_threads in gc; scale default collect interval #32633

Merged 1 commit into master on Jul 22, 2019

Conversation

@JeffBezanson (Member) commented Jul 19, 2019

The first commit fixes a bug where `jl_gc_init` saw `jl_n_threads == 0` because it hadn't been set yet. The second commit scales the default collect interval by the number of threads. Otherwise, adding more threads translates almost directly into spending a higher % of runtime in GC.

This makes the collect interval fully thread-local, allowing it to naturally scale (if allocation happens on n threads, the effective interval is n times bigger). This also removes the dependence of gc_init on the number of threads (which wasn't working, since jl_n_threads wasn't initialized yet).

@JeffBezanson added labels on Jul 19, 2019: performance (Must go faster), domain:multithreading (Base.Threads and related functionality), GC (Garbage collector)
src/gc.c Outdated
@@ -3029,6 +3029,9 @@ void jl_gc_init(void)
arraylist_new(&finalizer_list_marked, 0);
arraylist_new(&to_finalize, 0);

assert(jl_n_threads > 0);
default_collect_interval = default_collect_interval * jl_n_threads;
Review comment (Member):
Is linear scaling advisable? Now we have a much longer interval in serial sections of the code, leading to generally higher memory usage. Maybe some form of sublinear scaling would make sense.

JeffBezanson (Author) replied:

It just depends how much time you'd like to spend in GC :) So far I've found that anything less than this means more threads => a greater % of time in GC, and it's quite depressing when you're trying to get something to scale.

@chethega (Contributor) commented:

If I understand this issue right, the collect_interval describes the amount of memory slack (how much memory are we willing to waste on maybe-dead objects). Only one thread can run the gc, and typically all other cores idle in this time (unless we can feed them with something like BLAS). If we move up from e.g. 1 to 8 threads without changing the slack, then the amount and percentage of core-cycles spent on gc will remain the same, but these will represent a larger fraction of real-time. In effect, the cost of gc has gone up 8x, because we additionally burn all the possible work that the other 7 cores could have done in this time.

So you propose to simply linearly scale up the amount of slack by the number of cores; then the effective cost of gc stays the same.

Did I understand this right?

This looks weird to me: If we can afford the larger slack, then we should use it regardless of the number of threads. If we cannot afford the larger slack, then a larger number of threads does not make it more affordable.

In other words, maybe we want a sensible way for users to tell julia how much they value memory against real time against core-time. Especially since extra memory slack is often almost free until it suddenly becomes almost unaffordable (if the system has N gig of ram and is not doing anything else, then there is not a lot of downside to using up this memory; but we cannot use more memory than we have, and at some threshold the kernel will start doing stuff like swapping or evicting stuff from very useful caches).

@vtjnash (Member) commented Jul 21, 2019

I think there are a couple of mitigating factors. When the threads are in use, that also implies that other processes on the system aren't using that time/memory. But more significantly, reclaiming space makes allocation much cheaper, while GC work scales with the amount of memory still in use. With more threads, it may take more work to reach the same GC operational ratio.

@JeffBezanson (Author) commented:

Yes good points. I'm going to redo this; I think increasing the interval itself is not quite right, since that also causes code that uses only one thread (out of many available) to use more memory, which shouldn't be strictly necessary. I think the thing to do is really simple: just make the allocation counters thread-local, and GC as soon as any one thread hits the interval. That way single-thread code behaves the same as before, but if more threads are running and allocating then the effective interval naturally scales with the number of threads.

src/gc.c Outdated
gc_num.allocd = -(int64_t)gc_num.interval;
size_t allocd = 0;
for (int i = 0; i < jl_n_threads; i++) {
jl_ptls_t ptls2 = jl_all_tls_states[i];
Review comment (Member):

You aren't synchronized with the other threads here; it's invalid to write through jl_all_tls_states.

@@ -2622,8 +2602,7 @@ JL_DLLEXPORT int64_t jl_gc_total_bytes(void)
jl_gc_num_t num = gc_num;
combine_thread_gc_counts(&num);
// Sync this logic with `base/util.jl:GC_Diff`
Review comment (Member):
todo?

Commit message:
This allows it to expand naturally with the number of threads, giving much better scalability in GC-heavy workloads. This way gc_init doesn't need to know nthreads.

fixes #32472
@JeffBezanson JeffBezanson merged commit 684973e into master Jul 22, 2019
@JeffBezanson JeffBezanson deleted the jb/gcnthreads branch July 22, 2019 19:30
4 participants