Use parsed files to improve header compile commands #123

sam-mccall · 2019-08-09T21:35:17Z

Currently the dominant way we get compile commands for open headers is filename based matching (in JSONCompilationDatabase).

However we have two sets of better information:

the files that have been open (headers are often opened after some file that includes them)
the background index (most headers have been indexed by some TU)
we could make use of these to get the TU, and then transfer its compile command.

ilya-biryukov · 2019-08-12T12:27:47Z

Great idea! We could potentially get a better "switch header-source" for free while implementing this feature too.

knopp · 2020-05-02T00:54:56Z

Is there any progress on this? I believe 2) is roughly how Xcode indexes headers, except that it also reflects additional preprocessor state at the point of include.

so having mysource.cpp

#define A
#include "myheader.h"

Will cause -DA added to myheader.h compilation flags alongside mysource.cpp flags. This leads to very good result.

Currently our strategy for getting header compile flags is something like: A) look for flags for the header in compile_commands.json This basically never works, build systems don't generate this info. B) try to match to an impl file in compile_commands.json and use its flags This only (mostly) works if the headers are in the same project. C) give up and use fallback flags This kind of works for stdlib in the default configuration, and otherwise doesn't. Obviously there are big gaps here. This patch inserts a new attempt between A and B: if the header is transitively included by any open file (whether same project or not), then we use its compile command. This doesn't make any attempt to solve some related problems: - parsing non-self-contained header files in context (importing PP state) - using the compile flags of non-opened candidate files found in the index Fixes clangd/clangd#123 Fixes clangd/clangd#695 See clangd/clangd#519 Differential Revision: https://reviews.llvm.org/D97351

HighCommander4 · 2021-05-10T01:07:29Z

It looks to me like the patch that landed here is a partial solution (it picks the right source file if that source file has been opened), not a complete solution (which I would envision as using the right source file even if it hasn't been opened, using include-graph information stored in the index).

Should we re-open, or file a new issue to track the complete solution? This is something that users continue to run into, e.g. clangd/vscode-clangd#187.

kadircet · 2021-05-10T09:23:23Z

we've avoided doing that so far mainly because storing reverse include graphs are hard.

How many candidates do we want to store per header?

The current solution stores only one and it works well because most of the time the user would like to see the header in context of the most recent file they were working in and existing solution guarantees that in most cases.

When we go with using the info from background-index, back at the day we've concluded that we can't store a single candidate. Because it might be coming from a different part of the code-base (similar to picking a bad candidate heuristically) and clangd might end up building the header in a not-so-useful state (probably still better than completely broken). Even if you stored multiple candidates, how are you gonna pick between them? (e.g. one has -std=c++20 other has -std=c++17).

Also the candidate might just be invalid because someone deleted the file, or it no longer depends on the header (and static index can go stale). There's no way to easily verify the latter without at least running the preprocessor on the file again.
Anyway, we need fallback candidates now (as they might be invalid too). If we decide to store all, it becomes O(N^2) storage. Even if you cleverly intern filenames and such, in codebases like chromium there are ~300K source files, this is GBs worth of storage.

Maybe some middle ground could be found by still storing a single candidate in static (background) index and using it whenever there's no information coming from dynamic index. Only cases I can think of are:

User directly opening a header file, without opening any cc file depending on it
There are indeed no files depending on a particular header and compile commands don't have an entry for it.

There isn't really much we can do for the second case, it just won't work with a static index approach too.

As for the first case, i got a feeling like it should be rare in practice to justify all the extra complication. (As we already have a heuristic CDB for this case, and are only considering the cases in which heuristics pick a bad file. That possibility still exists with a static-index based approach but becomes less likely as explained)

Are there any cases I am missing here in which the dynamic index based solution doesn't really cover?

HighCommander4 · 2021-05-10T17:37:06Z

I think the first case is likely to be common for projects which are mostly headers, e.g. a header-only / template library. In such projects, the only source files are often test files, and users don't necessarily open test files before opening the header files.

I also think a single candidate based on the include graph is still better, in that we know that e.g. it provides the flags necessary to resolve includes in the header being opened, than heuristically chosen candidates, which may not.

Preferring information from the dynamic index and falling back on a single candidate stored in the static index seems reasonable to me.

kadircet · 2021-05-10T19:45:05Z

I also think a single candidate based on the include graph is still better, in that we know that e.g. it provides the flags necessary to resolve includes in the header being opened, than heuristically chosen candidates, which may not.

Sorry if i wasn't explicit enough. I was also trying to make this point, while adding that:

It still won't be foolproof, some things that come to mind:
- the candidate we pick might vanish between indexing and time to use
- go stale e.g. not depend on the particular header anymore or got deleted
- we picked the configuration user wasn't interested in
I don't think it happens common enough in practice to justify the complications in static-index + CDB. The two possible designs i can see are:
either changing the SymbolIndex interface to support such queries (well doesn't sound like something a "symbol" index should be doing at the very least)
signalling either TUScheduler/CDB from "indexer" component. (this one definitely messy)

So it sounds to me like we'll improve the world slightly at the cost of lots of complexity. Hence i was saying that it didn't sound that feasible. But maybe there are better alternatives that I don't see ATM, either foolproof (which i don't think we can at the face of multiple configurations even if we figure out all the rest) or that can be implemented locally without changing lots of interfaces. So we are open to suggestions.

HighCommander4 · 2021-05-11T07:02:41Z

Your points about the tradeoffs are well taken.

My concrete suggestion for the time being, is that we reopen this issue (or file a new one) to acknowledge that there is a remaining gap here, so that we have a place where we can point users that run into this (such as in clangd/vscode-clangd#187), and document workarounds (such as using a tool like CompDB to generate header entries in the CDB).

Even if the resolution ends up being "it's not worth to spend more time enhancing clangd's command inference, let's just accept that header-only library projects need to use the CompDB approach", that's worth documenting / putting in an FAQ or something.

kadircet · 2021-05-11T08:49:06Z

Yes. Since we currently lack the roadmap, it is hard to convey the situation on our side (but i am working on it!). In the long term we would like to ensure that the "open" issues are only for the cases where no action was taken or there's a consensus around taking some action but it isn't complete yet. Then have all the other cases in which we made a decision to take no action documented in a different place ("closed" issues are also a candidate) by explaining the reasons why, so that it is visible to everyone (including us, history is hard to keep track of!). Hopefully we'll get there soon.

In the meantime feel free to open an issue and summarize these discussions if you have some time, but otherwise this particular issue is also on my to-do list to mention (like most other open issues) once we figure out details about how we want to proceed.

GavinRay97 · 2021-06-01T12:03:18Z

Just leaving a note here, without much to contribute to the discussion, that CompDB is fantastic but for some projects fails & is unusable.

IE when used when building JUCE apps (even ones that have a single .cpp file), the resulting header compilation commands make the compile_commands.json so large that clangd cannot process them anymore =(

I am sure there are a non-insignificant number of projects where this is the case unfortunately.

HighCommander4 · 2021-07-20T07:28:01Z

I don't think it happens common enough in practice to justify the complications in static-index + CDB.

@kadircet, since we had this discussion ~9 weeks ago, 7 different users (that I'm aware of) have reported running into this problem in various forums (Discourse, Discord, or here in the bug tracker). I've collected them:

2021-05-21: Source file chosen to infer compile commands for a header sometimes has the wrong language #519 (comment)
2021-06-01: Use parsed files to improve header compile commands #123 (comment)
2021-06-09: https://llvm.discourse.group/t/clangd-compilation-unit-incorrectly-inferred-for-a-header-file/3620
2021-06-14: Source file chosen to infer compile commands for a header sometimes has the wrong language #519 (comment)
2021-07-01: https://discord.com/channels/636084430946959380/649134148723802113/860276527496298516
2021-07-10: https://discord.com/channels/636084430946959380/649134148723802113/863578622899781682
2021-07-20: File is not parsed with C++17 support #823

In my opinion, this is a serious enough recurring issue that it's worth trying to solve via the static index even though it's not foolproof as discussed, and even if it entails some internal complexity such as expanding the interface of SymbolIndex.

kadircet · 2021-07-29T15:51:34Z

Thanks for the detailed update Nathan (and sorry for late reply on my side),

We were discussing this offline with the team and came to the conclusion that even though this is going to add some complexity and solutions that we can come up with using the current infra is still going to be limited, the impact is probably justified.

The interaction still needs to be designed carefully though as it will need to touch lots of different components. First we need to figure out how to extract the data needed in SymbolCollector (hopefully consuming IncludeGraph should suffice). Then how to store this new information in index shards and how to build serving structures on top of it in dex/mem. Afterwards we need to make it available through SymbolIndex and define the behaviour for Merge/File and various other index implementations (and probably an error case too). Finally it is going to touch TUScheduler so that we can pivot to a different file if currently active files fallback fails (+ maybe just make existing logic part of dynamic index already and have a single fallback rather than 2 different fallbacks).

It is not something we can prioritize today, or in the near term but definitely something we support. I am planning to come up with a roadmap in the following weeks, hopefully this item will also have its place in the agenda so that someone can pick it up.

koenlek · 2021-09-09T00:29:17Z

Could we re-open this ticket? Or is there a better ticket to subscribe to for getting clangd to infer "includes in headers" based on the compile commands?

HighCommander4 · 2021-09-09T01:37:11Z

I think it makes sense to re-open this one, as I think there is a lot of relevant discussion here.

cpsauer · 2022-02-06T07:32:13Z

Chiming in from the bazel plugin side to echo the need (beyond the backlinks, above):

It'd be so nice if clangd proactively traversed the include graph, applying flags to headers from the sources that included them as part of background indexing.

Can verify that this is a case we and our users would hit all the time! And that much of work (runtime, and file size) spent to get a compile_commands.json to work well with clangd involves working around this issue, compdb-style.

Currently our strategy for getting header compile flags is something like: A) look for flags for the header in compile_commands.json This basically never works, build systems don't generate this info. B) try to match to an impl file in compile_commands.json and use its flags This only (mostly) works if the headers are in the same project. C) give up and use fallback flags This kind of works for stdlib in the default configuration, and otherwise doesn't. Obviously there are big gaps here. This patch inserts a new attempt between A and B: if the header is transitively included by any open file (whether same project or not), then we use its compile command. This doesn't make any attempt to solve some related problems: - parsing non-self-contained header files in context (importing PP state) - using the compile flags of non-opened candidate files found in the index Fixes clangd/clangd#123 Fixes clangd/clangd#695 See clangd/clangd#519 Differential Revision: https://reviews.llvm.org/D97351

MrDiver · 2023-12-28T20:23:27Z

Any updates ?

using https://github.com/Sarcasm/compdb to work around issue clangd/clangd#123

sam-mccall assigned usx95 May 12, 2020

HighCommander4 mentioned this issue Feb 18, 2021

Clangd cannot find a header file #695

Closed

sam-mccall closed this as completed in llvm/llvm-project@588db1c Mar 1, 2021

HighCommander4 mentioned this issue May 10, 2021

Include paths not correctly detected in VSCode clangd/vscode-clangd#187

Open

HighCommander4 mentioned this issue May 21, 2021

Source file chosen to infer compile commands for a header sometimes has the wrong language #519

Open

HighCommander4 mentioned this issue Jun 13, 2021

Create FAQ llvm/clangd-www#33

Closed

HighCommander4 mentioned this issue Jul 20, 2021

File is not parsed with C++17 support #823

Closed

HighCommander4 reopened this Sep 9, 2021

This was referenced Sep 22, 2021

Expand documentation about compile command inference for header files HighCommander4/clangd-www#1

Closed

Expand documentation about compile command inference for header files llvm/clangd-www#44

Merged

HighCommander4 mentioned this issue Nov 18, 2021

File is parsed correctly or not depending on the existence of another file in the compilation database clangd/vscode-clangd#252

Closed

sam-mccall mentioned this issue Jan 24, 2022

Adjust include paths when renaming/moving/deleting a file #994

Open

cpsauer mentioned this issue Feb 6, 2022

Improve compile_commands refresh time. hedronvision/bazel-compile-commands-extractor#24

Closed

matta mentioned this issue Feb 8, 2022

The extractor sometimes picks a non-package-local .cc file's command for header files hedronvision/bazel-compile-commands-extractor#26

Closed

HighCommander4 mentioned this issue Apr 10, 2022

clangd is inferring the wrong source file from a unity build source file #1098

Closed

cpsauer mentioned this issue Apr 18, 2022

Add attributes to control header and external source inclusion hedronvision/bazel-compile-commands-extractor#44

Merged

cpsauer mentioned this issue May 26, 2022

Feature Request / Discussion - Allow multiple compile_commands.json / settings the target path to the generated compile_commands.json hedronvision/bazel-compile-commands-extractor#48

Closed

HighCommander4 mentioned this issue Jun 13, 2022

--query-driver + extension-less headers #1173

Closed

cpsauer mentioned this issue Aug 25, 2022

Spurious Standard Library Inclusion Errors with GCC, --query-driver, and .h header commands #1263

Closed

HighCommander4 mentioned this issue Nov 30, 2022

[clangd] Wrong compilation flags are chosen for some header files llvm/llvm-project#59254

Closed

cpsauer mentioned this issue Dec 2, 2022

Improve support of proprietary compilers hedronvision/bazel-compile-commands-extractor#90

Closed

cpsauer mentioned this issue Feb 28, 2023

Same compile command repeated many times. hedronvision/bazel-compile-commands-extractor#106

Closed

HighCommander4 mentioned this issue Mar 22, 2023

Multi project (workspaceFolders) support #1549

Open

jk000 mentioned this issue May 31, 2023

enable enumeration & selection of compilation unit for given header file #1651

Closed

abhiagg92 mentioned this issue Oct 2, 2023

Clangd: Eigen/Core not found error colmap/colmap#2210

Closed

HighCommander4 mentioned this issue Nov 23, 2023

[clangd] Add includes from source to non-self-contained headers llvm/llvm-project#72479

Closed

AdelKS added a commit to AdelKS/ZeCalculator that referenced this issue Mar 24, 2024

vscode: make clangd properly work on the database

10b5faf

using https://github.com/Sarcasm/compdb to work around issue clangd/clangd#123

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use parsed files to improve header compile commands #123

Use parsed files to improve header compile commands #123

sam-mccall commented Aug 9, 2019

ilya-biryukov commented Aug 12, 2019 •

edited

Loading

knopp commented May 2, 2020

HighCommander4 commented May 10, 2021 •

edited

Loading

kadircet commented May 10, 2021

HighCommander4 commented May 10, 2021

kadircet commented May 10, 2021

HighCommander4 commented May 11, 2021

kadircet commented May 11, 2021

GavinRay97 commented Jun 1, 2021

HighCommander4 commented Jul 20, 2021

kadircet commented Jul 29, 2021

koenlek commented Sep 9, 2021

HighCommander4 commented Sep 9, 2021

cpsauer commented Feb 6, 2022 •

edited

Loading

MrDiver commented Dec 28, 2023

Use parsed files to improve header compile commands #123

Use parsed files to improve header compile commands #123

Comments

sam-mccall commented Aug 9, 2019

ilya-biryukov commented Aug 12, 2019 • edited Loading

knopp commented May 2, 2020

HighCommander4 commented May 10, 2021 • edited Loading

kadircet commented May 10, 2021

HighCommander4 commented May 10, 2021

kadircet commented May 10, 2021

HighCommander4 commented May 11, 2021

kadircet commented May 11, 2021

GavinRay97 commented Jun 1, 2021

HighCommander4 commented Jul 20, 2021

kadircet commented Jul 29, 2021

koenlek commented Sep 9, 2021

HighCommander4 commented Sep 9, 2021

cpsauer commented Feb 6, 2022 • edited Loading

MrDiver commented Dec 28, 2023

ilya-biryukov commented Aug 12, 2019 •

edited

Loading

HighCommander4 commented May 10, 2021 •

edited

Loading

cpsauer commented Feb 6, 2022 •

edited

Loading