Compiler: enable parallel codegen with MT #14748

Draft

ysbaddaden wants to merge 9 commits into crystal-lang:master from ysbaddaden:feature/compiler-mt-codegen-2

Contributor

ysbaddaden commented Jun 25, 2024

Implements parallel codegen of object files when MT is enabled in the compiler (-Dpreview_mt).

It only affects codegen for compilations with more than one compilation unit (module), that is when none of --single-module, --release or --cross-compile is specified. This behavior is identical to the fork-based codegen.

Advantages:

- allows parallel codegen on Windows (untested);
- no need to fork many processes;
- no repeated GC collections in each forked process;
- a simple Channel to distribute work efficiently (no need for IPC).

The main points are increased portability and simpler logic, despite having to take care of LLVM thread safety quirks (see comments).
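As a rough illustration of the Channel-based distribution mentioned above, here is a minimal sketch. It is not the code from this PR: `Unit`, `process_in_parallel` and the `"#{name}.o"` placeholder are hypothetical names standing in for compilation units and the real optimize + object file step.

```crystal
# Hypothetical stand-in for a compilation unit; not the compiler's class.
record Unit, name : String

# Distributes `units` to `n_workers` fibers through a Channel and returns
# the produced object names (completion order is not guaranteed).
def process_in_parallel(units : Array(Unit), n_workers : Int32) : Array(String)
  work = Channel(Unit).new(n_workers)
  done = Channel(String).new(units.size)

  n_workers.times do
    spawn do
      # `receive?` returns nil once the channel is closed and drained.
      while unit = work.receive?
        # Placeholder for the expensive part (optimize + emit object file).
        done.send("#{unit.name}.o")
      end
    end
  end

  # The main fiber feeds the workers, then closes the channel so they exit.
  units.each { |unit| work.send(unit) }
  work.close

  Array(String).new(units.size) { done.receive }
end

units = %w[main string array].map { |name| Unit.new(name) }
objects = process_in_parallel(units, 2)
puts objects.sort.join(", ")
```

Without -Dpreview_mt the fibers all run on one thread, so the sketch still works but only concurrently, not in parallel.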

Issues:

1. The threads arg actually denotes the number of fibers, not threads, which is confusing and problematic: increasing threads but not CRYSTAL_WORKERS leads to more fibers than threads, with several fibers scheduled on the same thread, which brings no improvement.

   In fact CRYSTAL_WORKERS defaults to 4, whereas threads defaulted to 8. With this patch threads defaults to CRYSTAL_WORKERS, so MT can end up being slower if we don't specify CRYSTAL_WORKERS=8.

2. This is still not as efficient as it could be. The main fiber (that feeds the worker fibers) can get blocked by a worker fiber doing codegen, leading the other workers to starve. This is easily noticeable when compiling with -O1 for example.

Both issues will be fixable with RFC 2 where we can start an explicit context to run the worker fibers or start N isolated contexts (maybe a better idea). Until then, one should increase CRYSTAL_WORKERS.
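The default described in the first issue can be sketched as below. This is an illustration only, not the compiler's code: `worker_count` is a hypothetical helper, and clamping the result to at least 1 is an assumption added for the example. The fallback of 4 is the CRYSTAL_WORKERS default mentioned above.

```crystal
# Hypothetical helper: an explicit threads value wins, otherwise use
# CRYSTAL_WORKERS, otherwise fall back to 4.
def worker_count(requested : Int32?, env_value : String? = ENV["CRYSTAL_WORKERS"]?) : Int32
  count = requested || env_value.try(&.to_i?) || 4
  Math.max(count, 1) # assumption: never go below one worker
end

puts worker_count(nil, "8") # CRYSTAL_WORKERS=8, no explicit value
puts worker_count(nil, nil) # nothing set: falls back to 4
puts worker_count(2, "8")   # explicit value takes precedence
```

Note that requesting more fibers than CRYSTAL_WORKERS, as in a call like `worker_count(16, "4")`, is exactly the mismatch described above: the extra fibers share the same threads.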

Supersedes #14227 and doesn't segfault (so far) with LLVM 12 or LLVM 18.1 🤞

TODO:

- wait for Compiler: refactor codegen #14760
- cleanup
- rename the method as mt_parallel(units, n_threads)
- figure out thread safety of LLVM legacy pass manager (it's thread unsafe 💥)
- consider increasing the channel size (until we can use ExecutionContext)
- consider a CRYSTAL_CONFIG_WORKERS to configure the default number of workers at compile time instead of the hardcoded 4 (in a distinct PR)

ysbaddaden added topic:multithreading topic:compiler:codegen labels

ysbaddaden self-assigned this

This was referenced Jun 25, 2024

Compiler: parallel codegen with MT #14227

Closed

Enqueue doesn't interrupt event loop run (race condition) ysbaddaden/execution_context#18

Closed

Member

straight-shoota commented Jun 25, 2024

This looks great. But it also seems to be a mix of different changes. Could we extract the independent refactorings (such as extracting sequential_codegen and fork_codegen, memoization of some methods) to their own PRs?

ysbaddaden mentioned this pull request

Compiler: refactor codegen #14760

Merged

ysbaddaden marked this pull request as draft

June 28, 2024 12:49

ysbaddaden added 6 commits

July 2, 2024 11:33


Compiler: extract sequential_codegen and fork_codegen methods

eb89b9b


Extract parallel_codegen method

eddb6a2


Merge codegen_many_units into codegen

dbf3f10


Stop collecting reused compilation units

b17db55

We only need to update unit.reused_compilation_unit? from forked process
then have print_codegen_stats count & filter the units.


Add LLVM::Module.parse(memory_buffer, context)

95a604a


Add Crystal::Compiler#mt_codegen

ac91f7c

When compiled with -Dpreview_mt the compiler will take advantage of the
MT environment to codegen the compilation units in parallel, avoiding
fork (that's not supported with MT) and allowing parallel codegen on
Windows.

ysbaddaden force-pushed the feature/compiler-mt-codegen-2 branch from 3a08d9e to ac91f7c Compare

July 2, 2024 11:26

Contributor Author

ysbaddaden commented Jul 2, 2024

Rebased on top of #14760.


fixup

7f8b821

Sija reviewed

src/compiler/crystal/compiler.cr

Comment on lines +550 to +559

+ # We generate the bitcode in the main thread because LLVM contexts
+ # must be unique per compilation unit, but we share different contexts
+ # across many modules (or rely on the global context); trying to
+ # codegen in parallel would segfault!
+ #
+ # Luckily generating the bitcode is quick and once the bitcode is
+ # generated we don't need the global LLVM contexts anymore but can
+ # parse the bitcode in an isolated context and we can parallelize the
+ # slowest part: the optimization pass & compiling the object file.
+ unit.generate_bitcode

Contributor

Sija Jul 2, 2024

Docs ❤️

Contributor Author

ysbaddaden Jul 2, 2024

Perfect example: behavior is so odd, document it, otherwise it will be incomprehensible.
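The two-phase approach from the code comment under review (serialize on the main thread, then work on isolated copies in parallel) can be sketched without LLVM. Everything here is a hypothetical stand-in: `SharedContext` plays the role of an LLVM context shared across modules, `IsolatedContext` one owned by a single worker, and the byte strings are fake bitcode.

```crystal
# Hypothetical stand-in for a context shared by several modules.
class SharedContext
  def serialize(name : String) : Bytes
    "bitcode:#{name}".to_slice
  end
end

# Hypothetical stand-in for a context owned by exactly one worker.
class IsolatedContext
  def compile(bitcode : Bytes) : String
    String.new(bitcode).sub("bitcode:", "object:")
  end
end

shared = SharedContext.new
names = %w[a b c]

# Phase 1 (sequential, main fiber): the shared context is only touched here.
buffers = names.map { |name| shared.serialize(name) }

# Phase 2 (concurrent): each fiber creates its own context and only reads
# the bytes handed to it, so no state is shared between workers.
results = Channel(String).new(buffers.size)
buffers.each do |bitcode|
  spawn { results.send(IsolatedContext.new.compile(bitcode)) }
end

objects = Array(String).new(buffers.size) { results.receive }
puts objects.sort.join(", ")
```

The sketch spawns one fiber per unit for brevity; a bounded pool of workers would be the more realistic shape.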


Review suggestions from Sija

a4457a5

ysbaddaden commented

src/compiler/crystal/compiler.cr

+ private def compile_to_object
+ temporary_object_name = self.temporary_object_name
+ target_machine = compiler.create_target_machine
+ compiler.optimize llvm_mod, target_machine unless compiler.optimization_mode.o0?

Contributor Author

ysbaddaden Jul 2, 2024

The line above may be what fixed the LLVM segfault seen in the previous attempt: the target machine is now local to the compilation unit. As @ggiraldez found in #14496 (comment), the target machine itself is thread unsafe, not just the LLVM context.


Fix: thread safety of LLVM legacy pass manager

5303d59

straight-shoota pushed a commit that referenced this pull request


Compiler: refactor codegen (#14760)

b954dd7

Refactors `Crystal::Compiler`:

1. extracts `#sequential_codegen`, `#parallel_codegen` and `#fork_codegen` methods;
2. merges `#codegen_many_units` into `#codegen` directly;
3. stops collecting reused units: `#fork_codegen` now updates `CompilationUnit#reused_compilation_unit?` state as reported by the forked processes, and `#print_codegen_stats` now counts & filters the reused units.

Prerequisite for #14748 that will introduce `#mt_codegen`.
