Skip to content

Commit

Permalink
Implement parallel marking (#48600)
Browse files Browse the repository at this point in the history
Using a work-stealing queue after Chase and Lev, optimized for
weak memory models by Le et al.

Default number of GC threads is half the number of compute threads.

Co-authored-by: Gabriel Baraldi <[email protected]>
Co-authored-by: Valentin Churavy <[email protected]>
  • Loading branch information
3 people committed Apr 28, 2023
1 parent 4158640 commit abeecee
Show file tree
Hide file tree
Showing 30 changed files with 722 additions and 156 deletions.
4 changes: 4 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -17,11 +17,15 @@ Language changes

Compiler/Runtime improvements
-----------------------------

* The `@pure` macro is now deprecated. Use `Base.@assume_effects :foldable` instead ([#48682]).
* The mark phase of the Garbage Collector is now multi-threaded ([#48600]).

Command-line option changes
---------------------------

* New option `--gcthreads` to set how many threads will be used by the Garbage Collector ([#48600]).
The default is set to `N/2` where `N` is the amount of worker threads (`--threads`) used by Julia.

Multi-threading changes
-----------------------
Expand Down
1 change: 1 addition & 0 deletions base/options.jl
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@ struct JLOptions
cpu_target::Ptr{UInt8}
nthreadpools::Int16
nthreads::Int16
ngcthreads::Int16
nthreads_per_pool::Ptr{Int16}
nprocs::Int32
machine_file::Ptr{UInt8}
Expand Down
7 changes: 7 additions & 0 deletions base/threadingconstructs.jl
Original file line number Diff line number Diff line change
Expand Up @@ -99,6 +99,13 @@ function threadpooltids(pool::Symbol)
end
end

"""
Threads.ngcthreads() -> Int
Returns the number of GC threads currently configured.
"""
ngcthreads() = Int(unsafe_load(cglobal(:jl_n_gcthreads, Cint))) + 1

function threading_run(fun, static)
ccall(:jl_enter_threaded_region, Cvoid, ())
n = threadpoolsize()
Expand Down
5 changes: 5 additions & 0 deletions doc/man/julia.1
Original file line number Diff line number Diff line change
Expand Up @@ -118,6 +118,11 @@ supported (Linux and Windows). If this is not supported (macOS) or
process affinity is not configured, it uses the number of CPU
threads.

.TP
--gcthreads <n>
Enable n GC threads; If unspecified is set to half of the
compute worker threads.

.TP
-p, --procs {N|auto}
Integer value N launches N additional local worker processes `auto` launches as many workers
Expand Down
1 change: 1 addition & 0 deletions doc/src/base/multi-threading.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@ Base.Threads.nthreads
Base.Threads.threadpool
Base.Threads.nthreadpools
Base.Threads.threadpoolsize
Base.Threads.ngcthreads
```

See also [Multi-Threading](@ref man-multithreading).
Expand Down
1 change: 1 addition & 0 deletions doc/src/manual/command-line-interface.md
Original file line number Diff line number Diff line change
Expand Up @@ -107,6 +107,7 @@ The following is a complete list of command-line switches available when launchi
|`-E`, `--print <expr>` |Evaluate `<expr>` and display the result|
|`-L`, `--load <file>` |Load `<file>` immediately on all processors|
|`-t`, `--threads {N\|auto}` |Enable N threads; `auto` tries to infer a useful default number of threads to use but the exact behavior might change in the future. Currently, `auto` uses the number of CPUs assigned to this julia process based on the OS-specific affinity assignment interface, if supported (Linux and Windows). If this is not supported (macOS) or process affinity is not configured, it uses the number of CPU threads.|
| `--gcthreads {N}` |Enable N GC threads; If unspecified is set to half of the compute worker threads.|
|`-p`, `--procs {N\|auto}` |Integer value N launches N additional local worker processes; `auto` launches as many workers as the number of local CPU threads (logical cores)|
|`--machine-file <file>` |Run processes on hosts listed in `<file>`|
|`-i` |Interactive mode; REPL runs and `isinteractive()` is true|
Expand Down
8 changes: 8 additions & 0 deletions doc/src/manual/environment-variables.md
Original file line number Diff line number Diff line change
Expand Up @@ -316,6 +316,14 @@ then spinning threads never sleep. Otherwise, `$JULIA_THREAD_SLEEP_THRESHOLD` is
interpreted as an unsigned 64-bit integer (`uint64_t`) and gives, in
nanoseconds, the amount of time after which spinning threads should sleep.

### [`JULIA_NUM_GC_THREADS`](@id env-gc-threads)

Sets the number of threads used by Garbage Collection. If unspecified is set to
half of the number of worker threads.

!!! compat "Julia 1.10"
The environment variable was added in 1.10

### [`JULIA_IMAGE_THREADS`](@id env-image-threads)

An unsigned 32-bit integer that sets the number of threads used by image
Expand Down
9 changes: 9 additions & 0 deletions doc/src/manual/multi-threading.md
Original file line number Diff line number Diff line change
Expand Up @@ -72,6 +72,15 @@ julia> Threads.threadid()
three processes have 2 threads enabled. For more fine grained control over worker
threads use [`addprocs`](@ref) and pass `-t`/`--threads` as `exeflags`.

### Multiple GC Threads

The Garbage Collector (GC) can use multiple threads. The amount used is either half the number
of compute worker threads or configured by either the `--gcthreads` command line argument or by using the
[`JULIA_NUM_GC_THREADS`](@ref env-gc-threads) environment variable.

!!! compat "Julia 1.10"
The `--gcthreads` command line argument requires at least Julia 1.10.

## [Threadpools](@id man-threadpools)

When a program's threads are busy with many tasks to run, tasks may experience
Expand Down
2 changes: 1 addition & 1 deletion src/Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -99,7 +99,7 @@ ifeq ($(USE_SYSTEM_LIBUV),0)
UV_HEADERS += uv.h
UV_HEADERS += uv/*.h
endif
PUBLIC_HEADERS := $(BUILDDIR)/julia_version.h $(wildcard $(SRCDIR)/support/*.h) $(addprefix $(SRCDIR)/,julia.h julia_assert.h julia_threads.h julia_fasttls.h julia_locks.h julia_atomics.h jloptions.h)
PUBLIC_HEADERS := $(BUILDDIR)/julia_version.h $(wildcard $(SRCDIR)/support/*.h) $(addprefix $(SRCDIR)/,work-stealing-queue.h julia.h julia_assert.h julia_threads.h julia_fasttls.h julia_locks.h julia_atomics.h jloptions.h)
ifeq ($(OS),WINNT)
PUBLIC_HEADERS += $(addprefix $(SRCDIR)/,win32_ucontext.h)
endif
Expand Down
64 changes: 30 additions & 34 deletions src/gc-debug.c
Original file line number Diff line number Diff line change
Expand Up @@ -198,12 +198,21 @@ static void restore(void)

static void gc_verify_track(jl_ptls_t ptls)
{
// `gc_verify_track` is limited to single-threaded GC
if (jl_n_gcthreads != 0)
return;
do {
jl_gc_markqueue_t mq;
mq.current = mq.start = ptls->mark_queue.start;
mq.end = ptls->mark_queue.end;
mq.current_chunk = mq.chunk_start = ptls->mark_queue.chunk_start;
mq.chunk_end = ptls->mark_queue.chunk_end;
jl_gc_markqueue_t *mq2 = &ptls->mark_queue;
ws_queue_t *cq = &mq.chunk_queue;
ws_queue_t *q = &mq.ptr_queue;
jl_atomic_store_relaxed(&cq->top, 0);
jl_atomic_store_relaxed(&cq->bottom, 0);
jl_atomic_store_relaxed(&cq->array, jl_atomic_load_relaxed(&mq2->chunk_queue.array));
jl_atomic_store_relaxed(&q->top, 0);
jl_atomic_store_relaxed(&q->bottom, 0);
jl_atomic_store_relaxed(&q->array, jl_atomic_load_relaxed(&mq2->ptr_queue.array));
arraylist_new(&mq.reclaim_set, 32);
arraylist_push(&lostval_parents_done, lostval);
jl_safe_printf("Now looking for %p =======\n", lostval);
clear_mark(GC_CLEAN);
Expand All @@ -214,7 +223,7 @@ static void gc_verify_track(jl_ptls_t ptls)
gc_mark_finlist(&mq, &ptls2->finalizers, 0);
}
gc_mark_finlist(&mq, &finalizer_list_marked, 0);
gc_mark_loop_(ptls, &mq);
gc_mark_loop_serial_(ptls, &mq);
if (lostval_parents.len == 0) {
jl_safe_printf("Could not find the missing link. We missed a toplevel root. This is odd.\n");
break;
Expand Down Expand Up @@ -248,11 +257,22 @@ static void gc_verify_track(jl_ptls_t ptls)

void gc_verify(jl_ptls_t ptls)
{
// `gc_verify` is limited to single-threaded GC
if (jl_n_gcthreads != 0) {
jl_safe_printf("Warn. GC verify disabled in multi-threaded GC\n");
return;
}
jl_gc_markqueue_t mq;
mq.current = mq.start = ptls->mark_queue.start;
mq.end = ptls->mark_queue.end;
mq.current_chunk = mq.chunk_start = ptls->mark_queue.chunk_start;
mq.chunk_end = ptls->mark_queue.chunk_end;
jl_gc_markqueue_t *mq2 = &ptls->mark_queue;
ws_queue_t *cq = &mq.chunk_queue;
ws_queue_t *q = &mq.ptr_queue;
jl_atomic_store_relaxed(&cq->top, 0);
jl_atomic_store_relaxed(&cq->bottom, 0);
jl_atomic_store_relaxed(&cq->array, jl_atomic_load_relaxed(&mq2->chunk_queue.array));
jl_atomic_store_relaxed(&q->top, 0);
jl_atomic_store_relaxed(&q->bottom, 0);
jl_atomic_store_relaxed(&q->array, jl_atomic_load_relaxed(&mq2->ptr_queue.array));
arraylist_new(&mq.reclaim_set, 32);
lostval = NULL;
lostval_parents.len = 0;
lostval_parents_done.len = 0;
Expand All @@ -265,7 +285,7 @@ void gc_verify(jl_ptls_t ptls)
gc_mark_finlist(&mq, &ptls2->finalizers, 0);
}
gc_mark_finlist(&mq, &finalizer_list_marked, 0);
gc_mark_loop_(ptls, &mq);
gc_mark_loop_serial_(ptls, &mq);
int clean_len = bits_save[GC_CLEAN].len;
for(int i = 0; i < clean_len + bits_save[GC_OLD].len; i++) {
jl_taggedvalue_t *v = (jl_taggedvalue_t*)bits_save[i >= clean_len ? GC_OLD : GC_CLEAN].items[i >= clean_len ? i - clean_len : i];
Expand Down Expand Up @@ -1268,30 +1288,6 @@ int gc_slot_to_arrayidx(void *obj, void *_slot) JL_NOTSAFEPOINT
return (slot - start) / elsize;
}

// Print a backtrace from the `mq->start` of the mark queue up to `mq->current`
// `offset` will be added to `mq->current` for convenience in the debugger.
NOINLINE void gc_mark_loop_unwind(jl_ptls_t ptls, jl_gc_markqueue_t *mq, int offset)
{
jl_jmp_buf *old_buf = jl_get_safe_restore();
jl_jmp_buf buf;
jl_set_safe_restore(&buf);
if (jl_setjmp(buf, 0) != 0) {
jl_safe_printf("\n!!! ERROR when unwinding gc mark loop -- ABORTING !!!\n");
jl_set_safe_restore(old_buf);
return;
}
jl_value_t **start = mq->start;
jl_value_t **end = mq->current + offset;
for (; start < end; start++) {
jl_value_t *obj = *start;
jl_taggedvalue_t *o = jl_astaggedvalue(obj);
jl_safe_printf("Queued object: %p :: (tag: %zu) (bits: %zu)\n", obj,
(uintptr_t)o->header, ((uintptr_t)o->header & 3));
jl_((void*)(jl_datatype_t *)(o->header & ~(uintptr_t)0xf));
}
jl_set_safe_restore(old_buf);
}

static int gc_logging_enabled = 0;

JL_DLLEXPORT void jl_enable_gc_logging(int enable) {
Expand Down
Loading

0 comments on commit abeecee

Please sign in to comment.