compile_commands.json contains entries for headers! And it takes a little while to generate! ~20s on the Hedron main repo at the time of writing. Could we speed it up? #5

cpsauer · 2021-11-17T07:15:23Z

Status: The slowness is from having clang preprocess every source file in the whole project to figure out which headers it uses (in refresh.template.py). We'll need to do this for clangd until it does this for us in its index. Clangd issue. Once this is fixed, things should be much faster and we should consider regenerating automatically on BUILD file save. Without the preprocessing, refresh.py only takes 8s or so, single threaded.

Suggested Workaround: You might be able to refresh compile_commands.json slightly less often, making the current runtime okay. If you're adding files, clangd should make pretty decent guesses at completions, using commands from nearby files. And if you're deleting files, there's not a problem. So you may not need to rerun refresh.py on every change to BUILD files. Instead, maybe refresh becomes something you run every so often when you can spare the time, making the current runtime okay.

If that doesn't work for you, consider trying to only index the part of the codebase you care about using the refresh_compile_commands rule (see README). Usually the runtime is only a real problem on huge repos--and there you're probably only editing a small subset of the codebase.

Finally, there are now options to exclude indexing for external workspaces. See docs at the top of the refesh_compile_command.

Otherwise: If you'd like to explore ways it could be made faster or help, read on!

The text was updated successfully, but these errors were encountered:

cpsauer · 2022-02-06T08:27:37Z

See also #24 for more discussion about the same root issue.

matta · 2022-02-07T02:55:55Z

I saw #24 and it triggered some memories.

Bazel requires headers to be declared with a hdrs attribute, so in theory every header, if it is "#includable" at all, appears explicitly in the build graph. I believe that this has been strictly enforced for about seven years or so (e.g. https://groups.google.com/g/bazel-discuss/c/LzlzSuqxhFU).

The discussion on clangd/clangd#123 about the clangd heuristic is in the context of clangd working as best it can regardless of the build system used. Most build systems work at the include path level, and don't require individual header files to be declared to the build system. Some sort of "auto discovery" certainly makes sense in that context.

I'm not sure why bazel-compile-commands-extractor must resort to preprocessing source code, given the strictness of the Bazel build. I would think it at least theoretically possible to discover plausible compile commands for all headers in the build using the Bazel build graph alone.

Clangd can improve its heuristics, but it still won't get it right in all cases. It seems like inferring "correctly representative" compile commands for header files, using explicitly declared information in the Bazel build graph, would I imagine still be worth doing because it will be more likely to be correct.

Sometimes with these things, the devil is in the details. It is possible to express some distinctly non-standard builds with Bazel, which can break Bazel's "strict headers checking" logic, etc. So perhaps the heuristics clangd uses are still relevant. I am curious to understand why preprocessing is currently preferable to processing the build graph, though.

P.S. the bazel-discuss post I link above hints that Bazel doesn't do any "include scanning" itself. I believe it was at least always the team's intent to never ship "include scanning" outside Google, because it was considered a design mistake that in practice produced unsound builds. Internally, Google's Blaze did (and may still do) "include scanning" in some cases. But, perhaps relevant here, this was never full blown preprocessing. Blaze had/has a very dummed down scanner that looks for things like #include "blarg/blah.h" and includes the blarg/blah.h file in the build if it happens to be available.

cpsauer · 2022-02-07T07:20:43Z

Thanks for seeing what's new and chipping in, @matta :)

If you have any ideas on how to get 'em out of bazel faster, I'm definitely excited about it!

Some history, if useful: In our previous, extra_action approach (more in #24 and ImplementationReadme) bazel did list the headers for us, but that was after a build, much more slowly, and with a variety of other problems. Maybe it, too, was parsing a dependency (.d) file outputted from the build, as described in the discussion you linked, and like we're doing currently. When we switched to the current, much better aquery-based approach, I was bummed to not see the headers directly listed in the inputs field. But maybe they're findable by traversing the "middlemen." (I am, however, seeing that at least the system SDK headers aren't directly listed anywhere). Any chance you (or anyone reading) know what's going on with middlemen enough to navigate with confidence?

If curious about the browsing the middlemen, try
bazel aquery "mnemonic('(Objc|Cpp)Compile|Middleman',deps(<your target>))" | less

Just to steer on a few things and to try to answer some questions:
(1) I think Clangd can definitely nail finding included headers as well as anyone. It's got the benefit of clang's C++ compiler frontend and is already running the full header inclusion search process as (a small) part of helping you edit. That is, clangd, like clang, can determine exactly which headers are used by a given source file given the compile command. No heuristics needed; they're able to directly invoke the relevant part of the compiler to get an authoritative answer. IMO doing this proactively instead of reactively is a pretty natural build-in for them, given they've already got that ability, plus background indexing and caching--and a need for this across build systems.
(2) Until then, we've been directly invoking clang/gcc to run an equivalently authoritative process up front, invoking header search and dependency generation but stopping after preprocessing, forgoing compilation. (see --dependencies in refresh.template.py)
(3) I'm confident that Bazel knows somewhere which headers (and directories) could be included. But what we'd really like is to know which ones are actually included, to keep the compile_commands specific and small. It seems to me like we need the compiler[preprocessor] to do this correctly, easily, and robustly. Now, maybe we should be willing to give up this specificity for speed. That is just quickly emit entries for all headers a file could include regardless of whether it does, based on Bazel's understanding. We'd want to be careful that we don't miss something! But I think Bazel is less likely to be easily correct here than clang(d), rather than more.

Thoughts? Things I might be missing?

Still super eager to hear about your experience using this with Emacs! But that's a separate issue, I suppose.

matta · 2022-02-07T19:25:32Z

Including the system headers, and other headers from installed source (not under bazel management) is an interesting sub-problem. So is the situation where one header is compiled in potentially multiple ways in one build.

Unfortunately, my Bazel experience predates aquery and cquery (there was just "query"). I don't even know what a Bazel aspect is, really. Perhaps best to ignore me? :-) I'm vaguely aware that a lot of the mobile build stuff has made the logic much more complicated than it used to be (e.g. a target can change how its dependencies are built). Frankly my head kinda explodes whenever I have to think about it.

I do think it would be nice if Bazel was a kind of "source of truth" at finding the "best" .cc file to pair with an .h file for the purposes of compile_commands.json. In my build's case bazel-compile-commands-extractor usually picks a reasonable "nearby" .cc file, but sometimes it doesn't. I'll file an issue about the case I'm seeing.

P.S. Emacs isn't really an interesting special case here. It is using clangd much like VS Code would. It just needs a bit more explicit setup to get a "project" defined and working, instead of the click-to-install-clangd-and-auto-configure approach that VS Code takes.

cpsauer · 2022-02-08T05:24:34Z

No, no. Good stuff--and thanks for thinking with me. Turns out that aquery stuff is pretty handy, though! Enables this plugin. And unfortunately, the multiply included header and system headers cases are more the common case than the unusual.

[I'll give a better reply back to your separate issue. But briefly, it's just picking an (arbitrary) source file that includes the header (possible transitively) during compilation, so it can get a valid compile command that runs on that header.]

And on the P.S.: Great to hear emacs isn't too different. Easy enough that people should just be able to figure it out? You're our trailblazer here :) So if there are tips people should know, I would so love it if you'd add it to the README. Even just a note saying "Emacs: It works!"

cpsauer · 2022-04-07T02:16:45Z

I thought of another way we could potentially speed this up if people needed it--at least on macOS/Linux:

We could check to see if Bazel has cached .d files listing the dependencies and use those instead of preoprocessing everything to regenerate them.

If all the files listed in the .d file have older last-modified dates than the .d file itself, this should be safe. We'd want to check that bazel isn't 0-timestamping generated files, though.

We could also write .d files when needed, saving work between runs.

Maybe there's a good way of doing the equivalent on Windows, too, maybe using /sourceDependencies

parallel to #5

cpsauer · 2022-04-07T02:24:18Z

I've added this to the other ideas in the OPTIMNOTE section of https://github.com/hedronvision/bazel-compile-commands-extractor/blob/main/refresh.template.py

cpsauer · 2022-05-20T03:50:12Z

^ If someone is reads this an is interested in helping speed this up, I'd love a PR taking a crack at the caching described above #5 (comment).

cpsauer · 2022-06-02T04:34:26Z

Closing this down because I just landed code that makes header finding way faster. We now have caching for header file finding and for clang/gcc, piggyback on Bazel's dependency cache--in addition to the options to control header finding merged previously!

That should be a huge improvement in speed. Don't think this can be much faster, and it's now quite good.

(Though if you're a windows user reading this: We might be able to make piggyback Bazel's cache with MSVC too. See #49 (comment) if you're interested in helping.)

…d faster

wolfd · 2024-04-29T18:08:52Z

For what it's worth, I tried the aquery method that was suggested in this thread and it works great for us at Skydio. We open-sourced it, and I certainly wouldn't mind if it served as inspiration to this project.

https://github.com/danny-skydio/bazel-compile-commands

parallel to hedronvision/bazel-compile-commands-extractor#5

cpsauer mentioned this issue Feb 6, 2022

Improve compile_commands refresh time. #24

Closed

cpsauer added a commit that referenced this issue Feb 8, 2022

Notes on getting headers via aquery from #5. Close but no cigar.

066b29f

cpsauer added a commit that referenced this issue Apr 7, 2022

Add additional OPTIMNOTE

488dda3

parallel to #5

cpsauer mentioned this issue Apr 25, 2022

Feature Request / Discussion - Allow multiple compile_commands.json / settings the target path to the generated compile_commands.json #48

Closed

cpsauer mentioned this issue May 26, 2022

Why it is compiling my code? And generation time is very long. #53

Closed

cpsauer closed this as completed Jun 2, 2022

cpsauer added a commit that referenced this issue Jun 2, 2022

Remove reference to #5, now closed, since header finding is cached an…

c7c5142

…d faster

tiponada4l added a commit to tiponada4l/bazel-compile-commands-extractor that referenced this issue Aug 10, 2024

Add additional OPTIMNOTE

b976c32

parallel to hedronvision/bazel-compile-commands-extractor#5

Trabochuk added a commit to Trabochuk/bazel-compile-commands-extractor that referenced this issue Aug 10, 2024

Add additional OPTIMNOTE

712d699

parallel to hedronvision/bazel-compile-commands-extractor#5

ngiloq6 added a commit to ngiloq6/bazel-compile-commands-extractor that referenced this issue Aug 17, 2024

Add additional OPTIMNOTE

26adf1f

parallel to hedronvision/bazel-compile-commands-extractor#5

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

compile_commands.json contains entries for headers! And it takes a little while to generate! ~20s on the Hedron main repo at the time of writing. Could we speed it up? #5

compile_commands.json contains entries for headers! And it takes a little while to generate! ~20s on the Hedron main repo at the time of writing. Could we speed it up? #5

cpsauer commented Nov 17, 2021 •

edited

Loading

cpsauer commented Feb 6, 2022 •

edited

Loading

matta commented Feb 7, 2022

cpsauer commented Feb 7, 2022 •

edited

Loading

matta commented Feb 7, 2022

cpsauer commented Feb 8, 2022

cpsauer commented Apr 7, 2022 •

edited

Loading

cpsauer commented Apr 7, 2022

cpsauer commented May 20, 2022

cpsauer commented Jun 2, 2022 •

edited

Loading

wolfd commented Apr 29, 2024

compile_commands.json contains entries for headers! And it takes a little while to generate! ~20s on the Hedron main repo at the time of writing. Could we speed it up? #5

compile_commands.json contains entries for headers! And it takes a little while to generate! ~20s on the Hedron main repo at the time of writing. Could we speed it up? #5

Comments

cpsauer commented Nov 17, 2021 • edited Loading

cpsauer commented Feb 6, 2022 • edited Loading

matta commented Feb 7, 2022

cpsauer commented Feb 7, 2022 • edited Loading

matta commented Feb 7, 2022

cpsauer commented Feb 8, 2022

cpsauer commented Apr 7, 2022 • edited Loading

cpsauer commented Apr 7, 2022

cpsauer commented May 20, 2022

cpsauer commented Jun 2, 2022 • edited Loading

wolfd commented Apr 29, 2024

cpsauer commented Nov 17, 2021 •

edited

Loading

cpsauer commented Feb 6, 2022 •

edited

Loading

cpsauer commented Feb 7, 2022 •

edited

Loading

cpsauer commented Apr 7, 2022 •

edited

Loading

cpsauer commented Jun 2, 2022 •

edited

Loading