rustdoc: Poor CPU usage on Windows during thousands of doctests #63638

alexcrichton · 2019-08-16T15:52:31Z

When running the doctests for the core crate on my 28-core system the system unfortunately never really gets above 40% cpu usage, yet the doctests being compiled are embarassingly parallel and should result in full usage of the cores available to execute.

For now this can be reproduced with ./x.py test src/libcore --doc and watching CPU usage via python src/ci/cpu-usage-over-time.py

One suspicion I have for this is that rustdoc has always use in-process compilation for all the doctests, but there may be an excessive amount of synchronization in LLVM or rustc which prevents all CPU cores from being used. I'm not sure if this is the cause though.

The text was updated successfully, but these errors were encountered:

Mark-Simulacrum · 2019-08-16T19:57:57Z

I've explored replacing the in-proceess rustc functionality for doctests with writing out doctests and spawning rustc (though not fully fleshed out, e.g. linker arguments and such are not passed down). With my patch, and cpu-usage-over-time reports ~10% idle; libcore doctests finish in 80 seconds. Without, cpu-usage reports ~40% idle, and libcore doctests complete in 140 seconds.

This seems like a viable strategy (nearly 2x improvement in runtime on 8 core/16 thread machine). I can work on fleshing out the patch if we think this is a viable approach, initial work is up here: Mark-Simulacrum@d8fee19.

I also tried to do a little investigating into where the slowdown is, but wasn't able to conclude much. It does seem like we're seeing a lot of threads blocked on acquiring jobserver tokens, and there are seemingly dozens of threads being spawned per second, and there seems to be an abnormally high futex syscall count (based on strace -fc) but no firm conclusions that I could reach. It is perhaps worth noting that increasing jobserver tokens (e.g., via -j128) does lead to more simultaneous attempts to compile rustc but no similar gains in performance as with out-of-process rustc.

andjo403 · 2019-08-16T22:32:39Z

something that I noticed when I was looing in to this two years ago was that if I decreased the RUST_TEST_THREADS to 6 for my 32 thread machine I got a 2x runtime improvement.

at that time (before measureme) I created som chrome trace files for this, links to the log files can be found on the https://andjo403.github.io/rs_tracing/ page (I can only open the log files in chrome).
I think that it is interesting to see that alot of threads was created for 42 tests at least 547 threads was created. but also the difference between 6 and 32 test threads where eg. expansion avg duration of 199 vs 924 ms

but maybe is better to make new traces with measureme if it is possible to do in rustdoc or maybe this type of traces do not giva any actionable information.

andjo403 · 2019-08-18T20:51:16Z

created some new traces with hacking the rustc self-profile flag to get it working in rustdoc you can find the traces in https://andjo403.github.io/rustdoc_selfprofile/
tested rustdoc from master(e910be8) and also tested to add the rustc out of process fix that Mark-Simulacrum linked.

the out of process looks super nice I got way more than 2x (5,4s to 0,7s) in my small tests with my 32 threads cpu. and 100% cpu usage.

with the out of process fix the part of the compilation that varies the most depending on how many concurrent test that is running is the "running_linker" part.

also tried to add compiler.arg("-Ccodegen-units=1"); that gave me 4% faster builds.

with the out of process solution shall rustdoc start a jobbserver as it before was only one process it was ok the session created an jobbserver with 32 tokens but now that will be done for each test (rustc call).
it is a bit strange to have bout the jobbservers in rustc and the concurrency from libtest maybe the libtest shall use the jobbserver also.

alexcrichton · 2019-08-19T13:31:19Z

Nice work @Mark-Simulacrum and thanks for helping to confirm @andjo403!

Sounds like an out-of-process solution would fix the lion's share of issues here perhaps, and so long as it jumps back up to close to 100% cpu usage I think that's safe to say it'd close this issue!

It's a good point that the jobserver and/or limiting parallelism isn't done very well today. I think though that clamping the codegen units to 1 is probably a good idea since doctests are almost always very small and don't really benefit from parallelism anyway. Additionally clamping to one codegen unit should avoid the need to deal with the jobserver for now since parallelism only comes from codegen units. Eventually with a parallel rustc we'll need to handle the jobserver, but it's safe to postpone that!

Mark-Simulacrum · 2019-08-19T17:54:18Z

I've marked this issue as E-mentor since I think I can provide sufficient guidance for someone interested in solving this (I probably won't get around to it for some time). What we want to do here is take the patch in Mark-Simulacrum@d8fee19 and finish threading through all of the options to rustc via command line arguments. There's quite a few, but I largely suspect this should be a long but relatively simple task. We'll probably discover a regression here or there once we land this for esoteric configurations where we don't quite match up with argument parsing and what we were passing in directly, but we can deal with that as they arise I suspect.

Once the options are threaded through I think that patch will be mostly done -- we'll want to perhaps land the codegen units change in a separate patch.

One outstanding question that will need some thought is identifying the rustc binary to run; right now the patch looks at std::env::current_exe() for a rustc binary in the same directory, but that might not be tenable to land; I couldn't come up with any good alternatives though. Maybe @alexcrichton has thoughts (I don't think we have a good solution in general for this, perhaps we could look at RUSTC in the environment, or take a command line argument, but both of those seem worse since they're not (as) backwards compatible).

Happy to provide more guidance here if it'd be helpful!

@andjo403 Would you perhaps be interested in taking this patch on? If not, no worries!

andjo403 · 2019-08-19T21:21:09Z

I can to try implement this.
@Mark-Simulacrum do you have an idea for how to thread trough all the options?
my fist taught was that as the rustdoc command line args is parsed by the rustc code maybe it is possible to filter out the rustc options from the args to rustdoc.
but as I do not know how to do that maybe change the run_test to take a rustdoc::config::options instead of all the parameters that it take now as it feels strange to try take some of the session parameters and convert to command line args.

Mark-Simulacrum · 2019-08-19T21:37:26Z

I would start off by just interpreting the session parameters (they should be close to 1-1 and mostly lossless); I think there's possibly potential for cleanup but I could see the config::options being more complicated to re-stringify into rustc args. Whatever works best though when you get into the weeds is fine by me though!

andjo403 · 2019-08-21T21:23:32Z

@alexcrichton @Mark-Simulacrum
have something that works now I think andjo403@9d35469
I copied the strings that was sent in to rustdoc for the codegen options, libs and external,
was the only way that I was able to make the codegen options work as before when new options is added.

some strange things that I found was the debugging options (-Z ) was not used in the generation of the executable, so have not added that now and the rustdoc linker option overwrites the linker codegen option (-C linker) so still do that.

andjo403 · 2019-08-21T21:27:45Z

and also maybe needs to print the stdout and stderr in some other way then eprintln!("{:?}", output);
the printout is I think hard to read
"/home/andjo403/rust/build/x86_64-unknown-linux-gnu/stage2/bin/rustc" "--cfg" "stage2" "--cfg" "dox" "--cfg" "rustdoc" "--cfg" "doctest" "--sysroot" "/home/andjo403/rust/build/x86_64-unknown-linux-gnu/stage2" "--edition" "2018" "/tmp/rustdoctestQ2MzFE/input.rs" "-o" "/tmp/rustdoctestQ2MzFE/rust_out" Output { status: ExitStatus(ExitStatus(256)), stdout: "", stderr: "error[E0463]: can\'t find crate for rustdoc\n --> /tmp/rustdoctestQ2MzFE/input.rs:6:1\n |\n6 | extern crate rustdoc;\n | ^^^^^^^^^^^^^^^^^^^^^ can\'t find crate\n\nerror: aborting due to previous error\n\nFor more information about this error, try rustc --explain E0463.\n" } Couldn't compile the test.

andjo403 · 2019-08-22T20:31:52Z

now all the test in src/test/rustdoc passes for andjo403@2661f79

Mark-Simulacrum · 2019-08-22T23:45:53Z

Could you post a PR with your changes? That'll make it easier to keep feedback central and such.

RalfJung · 2019-08-25T18:36:32Z

Out-of-process rustdoc would also help a lot with rust-lang/miri#584. Would be great if a design could be picked that lets the environment override the compiler used for building the test, so that we can run them in Miri instead. :)

@Mark-Simulacrum

rustdoc use -Ccodegen-units=1 by default for test compile as the test is small we do not want split up in multiple codegen units and also as there is multiple test running at the same time this will reduce the number of concurrent threads tested the test time with `./x.py test src/libcore --doc` for my 16 core 32 thread cpu i get about 6% faster execution and my 2 core 4 thread cpu I get about 10% faster execution cc #63638 r? @Mark-Simulacrum

@Mark-Simulacrum

…lacrum rustdoc use -Ccodegen-units=1 by default for test compile as the test is small we do not want split up in multiple codegen units and also as there is multiple test running at the same time this will reduce the number of concurrent threads tested the test time with `./x.py test src/libcore --doc` for my 16 core 32 thread cpu i get about 6% faster execution and my 2 core 4 thread cpu I get about 10% faster execution cc rust-lang#63638 r? @Mark-Simulacrum

andjo403 · 2019-09-01T07:08:25Z

maybe shall open this again as the windows version still do not use 100% cpu and execution time is the same or a small regression.

@Mark-Simulacrum

…lacrum rustdoc use -Ccodegen-units=1 by default for test compile as the test is small we do not want split up in multiple codegen units and also as there is multiple test running at the same time this will reduce the number of concurrent threads tested the test time with `./x.py test src/libcore --doc` for my 16 core 32 thread cpu i get about 6% faster execution and my 2 core 4 thread cpu I get about 10% faster execution cc rust-lang#63638 r? @Mark-Simulacrum

Mark-Simulacrum · 2019-09-01T13:26:39Z

I suspect that windows is blocked on the emitter lock, but don't know. I don't personally think it's too worthwhile to reopen this unless we can get similar stats as with Linux (e.g., is it using 90%+ on Windows? Far less?)

andjo403 · 2019-09-01T14:12:57Z

when I use the src/ci/cpu-usage-over-time.py script to measure the cpu usage for windows with my 16 core cpu the most of the values of the usage is between 40-50 %. both before and after the out of process rustc PR. so think it is as bade as it was for Linux.

Something that I was thinking of is that it creates a new dir for every test is that not slow to do that on windows? maybe change the out files to have other names instead of using dirs.

Mark-Simulacrum · 2019-09-01T14:33:48Z

Interesting. We can reopen in that case, though I wouldn't personally know where to or how to look at performance bottlenecks on Windows.

alexcrichton added the T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. label Aug 16, 2019

jonas-schievink added C-bug Category: This is a bug. I-compiletime Issue: Problems and improvements with respect to compile times. labels Aug 18, 2019

Mark-Simulacrum added E-medium Call for participation: Medium difficulty. Experience needed to fix: Intermediate. E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion. labels Aug 19, 2019

andjo403 mentioned this issue Aug 22, 2019

RustDoc: remove the redundant unstable option "--linker" #63816

Closed

andjo403 mentioned this issue Aug 23, 2019

Run doctests via out-of-process rustc #63827

Merged

bors closed this as completed in 0f41401 Aug 30, 2019

andjo403 mentioned this issue Aug 31, 2019

rustdoc use -Ccodegen-units=1 by default for test compile #64032

Merged

Mark-Simulacrum reopened this Sep 1, 2019

Mark-Simulacrum added O-windows Operating system: Windows and removed E-medium Call for participation: Medium difficulty. Experience needed to fix: Intermediate. E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion. labels Sep 1, 2019

jyn514 changed the title ~~rustdoc: Poor CPU usage during thousands of doctests~~ rustdoc: Poor CPU usage on Windows during thousands of doctests Dec 15, 2020

camelid added the A-doctests Area: Documentation tests, run by rustdoc label Feb 5, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

rustdoc: Poor CPU usage on Windows during thousands of doctests #63638

rustdoc: Poor CPU usage on Windows during thousands of doctests #63638

alexcrichton commented Aug 16, 2019

Mark-Simulacrum commented Aug 16, 2019

andjo403 commented Aug 16, 2019

andjo403 commented Aug 18, 2019

alexcrichton commented Aug 19, 2019

Mark-Simulacrum commented Aug 19, 2019

andjo403 commented Aug 19, 2019

Mark-Simulacrum commented Aug 19, 2019

andjo403 commented Aug 21, 2019

andjo403 commented Aug 21, 2019

andjo403 commented Aug 22, 2019

Mark-Simulacrum commented Aug 22, 2019

RalfJung commented Aug 25, 2019

andjo403 commented Sep 1, 2019

Mark-Simulacrum commented Sep 1, 2019

andjo403 commented Sep 1, 2019

Mark-Simulacrum commented Sep 1, 2019

rustdoc: Poor CPU usage on Windows during thousands of doctests #63638

rustdoc: Poor CPU usage on Windows during thousands of doctests #63638

Comments

alexcrichton commented Aug 16, 2019

Mark-Simulacrum commented Aug 16, 2019

andjo403 commented Aug 16, 2019

andjo403 commented Aug 18, 2019

alexcrichton commented Aug 19, 2019

Mark-Simulacrum commented Aug 19, 2019

andjo403 commented Aug 19, 2019

Mark-Simulacrum commented Aug 19, 2019

andjo403 commented Aug 21, 2019

andjo403 commented Aug 21, 2019

andjo403 commented Aug 22, 2019

Mark-Simulacrum commented Aug 22, 2019

RalfJung commented Aug 25, 2019

andjo403 commented Sep 1, 2019

Mark-Simulacrum commented Sep 1, 2019

andjo403 commented Sep 1, 2019

Mark-Simulacrum commented Sep 1, 2019