Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rustdoc: Poor CPU usage on Windows during thousands of doctests #63638

Open
alexcrichton opened this issue Aug 16, 2019 · 16 comments · Fixed by #63827
Open

rustdoc: Poor CPU usage on Windows during thousands of doctests #63638

alexcrichton opened this issue Aug 16, 2019 · 16 comments · Fixed by #63827
Labels
A-doctests Area: Documentation tests, run by rustdoc C-bug Category: This is a bug. I-compiletime Issue: Problems and improvements with respect to compile times. O-windows Operating system: Windows T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue.

Comments

@alexcrichton
Copy link
Member

When running the doctests for the core crate on my 28-core system the system unfortunately never really gets above 40% cpu usage, yet the doctests being compiled are embarassingly parallel and should result in full usage of the cores available to execute.

For now this can be reproduced with ./x.py test src/libcore --doc and watching CPU usage via python src/ci/cpu-usage-over-time.py

One suspicion I have for this is that rustdoc has always use in-process compilation for all the doctests, but there may be an excessive amount of synchronization in LLVM or rustc which prevents all CPU cores from being used. I'm not sure if this is the cause though.

@alexcrichton alexcrichton added the T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. label Aug 16, 2019
@Mark-Simulacrum
Copy link
Member

I've explored replacing the in-proceess rustc functionality for doctests with writing out doctests and spawning rustc (though not fully fleshed out, e.g. linker arguments and such are not passed down). With my patch, and cpu-usage-over-time reports ~10% idle; libcore doctests finish in 80 seconds. Without, cpu-usage reports ~40% idle, and libcore doctests complete in 140 seconds.

This seems like a viable strategy (nearly 2x improvement in runtime on 8 core/16 thread machine). I can work on fleshing out the patch if we think this is a viable approach, initial work is up here: Mark-Simulacrum@d8fee19.

I also tried to do a little investigating into where the slowdown is, but wasn't able to conclude much. It does seem like we're seeing a lot of threads blocked on acquiring jobserver tokens, and there are seemingly dozens of threads being spawned per second, and there seems to be an abnormally high futex syscall count (based on strace -fc) but no firm conclusions that I could reach. It is perhaps worth noting that increasing jobserver tokens (e.g., via -j128) does lead to more simultaneous attempts to compile rustc but no similar gains in performance as with out-of-process rustc.

@andjo403
Copy link
Contributor

something that I noticed when I was looing in to this two years ago was that if I decreased the RUST_TEST_THREADS to 6 for my 32 thread machine I got a 2x runtime improvement.

at that time (before measureme) I created som chrome trace files for this, links to the log files can be found on the https://andjo403.github.io/rs_tracing/ page (I can only open the log files in chrome).
I think that it is interesting to see that alot of threads was created for 42 tests at least 547 threads was created. but also the difference between 6 and 32 test threads where eg. expansion avg duration of 199 vs 924 ms

but maybe is better to make new traces with measureme if it is possible to do in rustdoc or maybe this type of traces do not giva any actionable information.

@andjo403
Copy link
Contributor

created some new traces with hacking the rustc self-profile flag to get it working in rustdoc you can find the traces in https://andjo403.github.io/rustdoc_selfprofile/
tested rustdoc from master(e910be8) and also tested to add the rustc out of process fix that Mark-Simulacrum linked.

the out of process looks super nice I got way more than 2x (5,4s to 0,7s) in my small tests with my 32 threads cpu. and 100% cpu usage.

with the out of process fix the part of the compilation that varies the most depending on how many concurrent test that is running is the "running_linker" part.

also tried to add compiler.arg("-Ccodegen-units=1"); that gave me 4% faster builds.

with the out of process solution shall rustdoc start a jobbserver as it before was only one process it was ok the session created an jobbserver with 32 tokens but now that will be done for each test (rustc call).
it is a bit strange to have bout the jobbservers in rustc and the concurrency from libtest maybe the libtest shall use the jobbserver also.

@jonas-schievink jonas-schievink added C-bug Category: This is a bug. I-compiletime Issue: Problems and improvements with respect to compile times. labels Aug 18, 2019
@alexcrichton
Copy link
Member Author

Nice work @Mark-Simulacrum and thanks for helping to confirm @andjo403!

Sounds like an out-of-process solution would fix the lion's share of issues here perhaps, and so long as it jumps back up to close to 100% cpu usage I think that's safe to say it'd close this issue!

It's a good point that the jobserver and/or limiting parallelism isn't done very well today. I think though that clamping the codegen units to 1 is probably a good idea since doctests are almost always very small and don't really benefit from parallelism anyway. Additionally clamping to one codegen unit should avoid the need to deal with the jobserver for now since parallelism only comes from codegen units. Eventually with a parallel rustc we'll need to handle the jobserver, but it's safe to postpone that!

@Mark-Simulacrum Mark-Simulacrum added E-medium Call for participation: Medium difficulty. Experience needed to fix: Intermediate. E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion. labels Aug 19, 2019
@Mark-Simulacrum
Copy link
Member

I've marked this issue as E-mentor since I think I can provide sufficient guidance for someone interested in solving this (I probably won't get around to it for some time). What we want to do here is take the patch in Mark-Simulacrum@d8fee19 and finish threading through all of the options to rustc via command line arguments. There's quite a few, but I largely suspect this should be a long but relatively simple task. We'll probably discover a regression here or there once we land this for esoteric configurations where we don't quite match up with argument parsing and what we were passing in directly, but we can deal with that as they arise I suspect.

Once the options are threaded through I think that patch will be mostly done -- we'll want to perhaps land the codegen units change in a separate patch.

One outstanding question that will need some thought is identifying the rustc binary to run; right now the patch looks at std::env::current_exe() for a rustc binary in the same directory, but that might not be tenable to land; I couldn't come up with any good alternatives though. Maybe @alexcrichton has thoughts (I don't think we have a good solution in general for this, perhaps we could look at RUSTC in the environment, or take a command line argument, but both of those seem worse since they're not (as) backwards compatible).

Happy to provide more guidance here if it'd be helpful!

@andjo403 Would you perhaps be interested in taking this patch on? If not, no worries!

@andjo403
Copy link
Contributor

I can to try implement this.
@Mark-Simulacrum do you have an idea for how to thread trough all the options?
my fist taught was that as the rustdoc command line args is parsed by the rustc code maybe it is possible to filter out the rustc options from the args to rustdoc.
but as I do not know how to do that maybe change the run_test to take a rustdoc::config::options instead of all the parameters that it take now as it feels strange to try take some of the session parameters and convert to command line args.

@Mark-Simulacrum
Copy link
Member

I would start off by just interpreting the session parameters (they should be close to 1-1 and mostly lossless); I think there's possibly potential for cleanup but I could see the config::options being more complicated to re-stringify into rustc args. Whatever works best though when you get into the weeds is fine by me though!

@andjo403
Copy link
Contributor

@alexcrichton @Mark-Simulacrum
have something that works now I think andjo403@9d35469
I copied the strings that was sent in to rustdoc for the codegen options, libs and external,
was the only way that I was able to make the codegen options work as before when new options is added.

some strange things that I found was the debugging options (-Z ) was not used in the generation of the executable, so have not added that now and the rustdoc linker option overwrites the linker codegen option (-C linker) so still do that.

@andjo403
Copy link
Contributor

and also maybe needs to print the stdout and stderr in some other way then eprintln!("{:?}", output);
the printout is I think hard to read
"/home/andjo403/rust/build/x86_64-unknown-linux-gnu/stage2/bin/rustc" "--cfg" "stage2" "--cfg" "dox" "--cfg" "rustdoc" "--cfg" "doctest" "--sysroot" "/home/andjo403/rust/build/x86_64-unknown-linux-gnu/stage2" "--edition" "2018" "/tmp/rustdoctestQ2MzFE/input.rs" "-o" "/tmp/rustdoctestQ2MzFE/rust_out" Output { status: ExitStatus(ExitStatus(256)), stdout: "", stderr: "error[E0463]: can\'t find crate for rustdoc\n --> /tmp/rustdoctestQ2MzFE/input.rs:6:1\n |\n6 | extern crate rustdoc;\n | ^^^^^^^^^^^^^^^^^^^^^ can\'t find crate\n\nerror: aborting due to previous error\n\nFor more information about this error, try rustc --explain E0463.\n" } Couldn't compile the test.

@andjo403
Copy link
Contributor

now all the test in src/test/rustdoc passes for andjo403@2661f79

@Mark-Simulacrum
Copy link
Member

Could you post a PR with your changes? That'll make it easier to keep feedback central and such.

@RalfJung
Copy link
Member

Out-of-process rustdoc would also help a lot with rust-lang/miri#584. Would be great if a design could be picked that lets the environment override the compiler used for building the test, so that we can run them in Miri instead. :)

@bors bors closed this as completed in 0f41401 Aug 30, 2019
bors added a commit that referenced this issue Aug 31, 2019
rustdoc use -Ccodegen-units=1 by default for test compile

as the test is small we do not want split up in multiple codegen units
and also as there is multiple test running at the same time this
will reduce the number of concurrent threads

tested the test time with `./x.py test src/libcore --doc`
for my 16 core 32 thread cpu i get about 6% faster execution
and my 2 core 4 thread cpu I get about 10% faster execution

cc #63638
 r? @Mark-Simulacrum
Centril added a commit to Centril/rust that referenced this issue Sep 1, 2019
…lacrum

rustdoc use -Ccodegen-units=1 by default for test compile

as the test is small we do not want split up in multiple codegen units
and also as there is multiple test running at the same time this
will reduce the number of concurrent threads

tested the test time with `./x.py test src/libcore --doc`
for my 16 core 32 thread cpu i get about 6% faster execution
and my 2 core 4 thread cpu I get about 10% faster execution

cc rust-lang#63638
 r? @Mark-Simulacrum
@andjo403
Copy link
Contributor

andjo403 commented Sep 1, 2019

maybe shall open this again as the windows version still do not use 100% cpu and execution time is the same or a small regression.

Centril added a commit to Centril/rust that referenced this issue Sep 1, 2019
…lacrum

rustdoc use -Ccodegen-units=1 by default for test compile

as the test is small we do not want split up in multiple codegen units
and also as there is multiple test running at the same time this
will reduce the number of concurrent threads

tested the test time with `./x.py test src/libcore --doc`
for my 16 core 32 thread cpu i get about 6% faster execution
and my 2 core 4 thread cpu I get about 10% faster execution

cc rust-lang#63638
 r? @Mark-Simulacrum
@Mark-Simulacrum
Copy link
Member

I suspect that windows is blocked on the emitter lock, but don't know. I don't personally think it's too worthwhile to reopen this unless we can get similar stats as with Linux (e.g., is it using 90%+ on Windows? Far less?)

@andjo403
Copy link
Contributor

andjo403 commented Sep 1, 2019

when I use the src/ci/cpu-usage-over-time.py script to measure the cpu usage for windows with my 16 core cpu the most of the values of the usage is between 40-50 %. both before and after the out of process rustc PR. so think it is as bade as it was for Linux.

Something that I was thinking of is that it creates a new dir for every test is that not slow to do that on windows? maybe change the out files to have other names instead of using dirs.

@Mark-Simulacrum
Copy link
Member

Interesting. We can reopen in that case, though I wouldn't personally know where to or how to look at performance bottlenecks on Windows.

@Mark-Simulacrum Mark-Simulacrum added O-windows Operating system: Windows and removed E-medium Call for participation: Medium difficulty. Experience needed to fix: Intermediate. E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion. labels Sep 1, 2019
@jyn514 jyn514 changed the title rustdoc: Poor CPU usage during thousands of doctests rustdoc: Poor CPU usage on Windows during thousands of doctests Dec 15, 2020
@camelid camelid added the A-doctests Area: Documentation tests, run by rustdoc label Feb 5, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-doctests Area: Documentation tests, run by rustdoc C-bug Category: This is a bug. I-compiletime Issue: Problems and improvements with respect to compile times. O-windows Operating system: Windows T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue.
Projects
Status: No status
Development

Successfully merging a pull request may close this issue.

6 participants