Improve correctness of our Clang tooling infrastructure. #392

chandlerc · 2021-03-14T04:55:10Z

This restructures the compile_flags.txt to use the downloaded libc++ system
headers and avoid needing a virtual include directory to be built. It still
needs some Bazel build to complete before working in order to have the libc++
system headers downloaded and the symlink to the Bazel tree created.

One (very) tricky part of making this work is to work around bugs in Clang's
tooling layer that incorrectly handle .. path components after traversing
symlinks. To avoid this, we add custom symlinks (bazel-execroot and
bazel-clang-toolchain) that hide the relevant traversal of the Bazel layout to
find build artifacts and the downloaded toolchain. These symlinks will be broken
until a build with Bazel downloads the toolchain and creates the basic output
tree structure.
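
To illustrate the class of problem described above: a tool that collapses `..` textually reaches a different directory than the filesystem does once a symlink has been traversed. The sketch below is not Clang's code. It builds a hypothetical layout in a temporary directory (the names `ws`, `out`, `execroot`, and `link` are made up for the example) and compares Python's lexical `os.path.normpath` with the symlink-aware `os.path.realpath`. It assumes a POSIX system where creating symlinks is permitted.

```python
import os
import tempfile


def resolve_both_ways(path):
    """Return (lexical, physical) resolutions of `path`."""
    lexical = os.path.normpath(path)   # collapses ".." purely as text
    physical = os.path.realpath(path)  # follows symlinks, then applies ".."
    return lexical, physical


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        tmp = os.path.realpath(tmp)
        # Hypothetical layout: <tmp>/out/execroot is a real directory and
        # <tmp>/ws/link is a symlink pointing at it.
        os.makedirs(os.path.join(tmp, "out", "execroot"))
        os.makedirs(os.path.join(tmp, "ws"))
        os.symlink(
            os.path.join(tmp, "out", "execroot"),
            os.path.join(tmp, "ws", "link"),
        )
        lexical, physical = resolve_both_ways(
            os.path.join(tmp, "ws", "link", "..")
        )
        return tmp, lexical, physical
```

Here the lexical answer is `<tmp>/ws` while the physical answer is `<tmp>/out`. Putting the whole traversal behind a single symlink means no `..` is left in the path for a tool to collapse incorrectly.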

It also adds a create_compdb.py script. Running this script improves the
tooling fidelity by taking a few steps:

1. It queries Bazel to find all the relevant files and adds them to a
   compile_commands.json database that allows clangd and other tools to
   index the entire project for improved cross-references, etc.
2. It builds all the generated files with Bazel so that they can be included
   successfully. This is very fast in my testing, taking only tens of seconds.
   It is also very likely to be cached effectively.
3. It translates the arguments from compile_flags.txt to make them
   persistently use the built generated files include paths so that nothing
   breaks even as different targets are built potentially with different
   configurations.
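
As a rough illustration of the first and third steps, the sketch below turns the contents of a compile_flags.txt (one argument per line) into compile_commands.json entries with the standard `directory`, `file`, and `arguments` keys. It is a minimal sketch and not the code in create_compdb.py: the helper names, the default `clang++` compiler, and the sample paths are assumptions made for the example.

```python
import json


def read_flags(compile_flags_text):
    """Parse compile_flags.txt content: one argument per non-empty line."""
    return [
        line.strip()
        for line in compile_flags_text.splitlines()
        if line.strip()
    ]


def make_entries(directory, files, flags, compiler="clang++"):
    """Build one compilation-database entry per source file.

    `compiler` is a placeholder for argv[0]; it is an assumption of this
    sketch, not something taken from the real script.
    """
    return [
        {
            "directory": directory,
            "file": path,
            "arguments": [compiler] + flags + [path],
        }
        for path in files
    ]


def to_json(entries):
    """Serialize entries in the layout expected for compile_commands.json."""
    return json.dumps(entries, indent=2)
```

A caller would write the result of `to_json(...)` to compile_commands.json in the workspace root, with `directory` set to that root.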

There are still some limitations.

- It still requires running Bazel before anything works, even if a fast run.
- It will require re-running if new generated files are added and needed but not
  built.
- It assumes that the standard Bazel symlink names are used and available.

Much of the Python here was written by @geoffromer in #384 -- I've adapted it
here after discussing to try to fill in some of the blanks and use a slightly
different approach to querying Bazel. I use the normal bazel query rather than
bazel aquery. This, for example, allows the index to reliably cover header
files in header-only libraries more directly (rather than relying on transitive
inclusion). It also seems a bit simpler to parse, but that is a pretty minor
difference.
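
One reason `bazel query` output is simple to parse is that source files come back as plain labels. The sketch below maps a main-workspace label to a workspace-relative path. The function name is hypothetical, and it deliberately rejects external-repository labels, which the real script has to map separately.

```python
def label_to_path(label):
    """Map a label such as //pkg/sub:file.cpp to pkg/sub/file.cpp.

    Only main-workspace labels (starting with "//") are handled; anything
    else raises ValueError.
    """
    if not label.startswith("//"):
        raise ValueError("not a main-workspace label: " + label)
    package, _, name = label[2:].partition(":")
    if not name:
        # "//pkg/sub" is shorthand for "//pkg/sub:sub".
        name = package.rsplit("/", 1)[-1]
    return f"{package}/{name}" if package else name
```

Because header files show up as labels like any other source file, a header-only library is covered without relying on some other translation unit including it.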

fowles · 2021-03-14T05:38:03Z

scripts/create_compdb.py

+# fail in case there are build errors in the client, and just warn the user
+# that they may be missing generated files.
+print("Building the generated files so that tools can find them...")
+subprocess.run(["bazelisk", "build", "--keep_going"] + generated_file_labels)


Should this subprocess use the bazel var from above instead of "bazelisk"?

Good catch, I missed this one, thanks!
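
A minimal sketch of the fix discussed here: resolve the Bazel binary once and reuse it for every invocation instead of hard-coding "bazelisk" at one call site. How the real script defines its `bazel` variable is not shown in this thread, so `find_bazel`, its candidate order, and `build_command` are assumptions for illustration.

```python
import shutil


def find_bazel(which=shutil.which, candidates=("bazelisk", "bazel")):
    """Return the first launcher found on PATH (candidate order is assumed)."""
    for name in candidates:
        path = which(name)
        if path:
            return path
    raise RuntimeError("neither bazelisk nor bazel was found on PATH")


def build_command(bazel, labels):
    """Assemble the build invocation using the shared `bazel` value."""
    return [bazel, "build", "--keep_going"] + list(labels)
```

The resulting list can be passed to `subprocess.run`, so that querying and building always go through the same binary.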

mmdriley

If we have a proper compile_commands.json now, do we still need compile_flags.txt? (especially since compile_flags.txt now requires that a Bazel build has run so it can consume headers through the bazel-<workspacename> symlink)

mmdriley · 2021-03-15T21:42:09Z

scripts/create_compdb.py

+# Filter into the Carbon source files that we'll find directly in the
+# workspace, and LLVM source files that need to be mapped through the merged
+# LLVM tree in Bazel's execution root.
+pwd = os.environ["PWD"] + "/"


why os.environ["PWD"] here vs. os.getcwd() before?

Because I brought this over from my shell script experimentation as I'm bad at Python. =D Good catch; now just using the directory that we've already computed.

mmdriley · 2021-03-15T21:43:41Z

scripts/create_compdb.py

+import subprocess
+import sys
+
+directory = os.getcwd()


What is the assumed working directory? The root of the source tree?

@mattgodbolt 's comment here might be relevant -- better to make script-relative and avoid $PWD entirely? #384 (review)

This is tempting, but I was worried about how we would correctly recognize and remove the working directory from paths if it were to be based on $PWD rather than getcwd() (which differ in interesting cases involving symlinks).

That said, some experimentation seems to show that Bazel does not use $PWD (the way it should IMO) and instead uses getcwd() which couldn't care less. So we could rewrite this if folks want. I've done so, although I'm not sure how much of an improvement this is.
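
The "interesting cases involving symlinks" can be reproduced with a short sketch. After changing into a directory through a symlink, `os.getcwd()` reports the physical path, whereas a shell's `$PWD` would normally keep the logical path through the link. The layout and names below are hypothetical, and a POSIX system that allows symlinks is assumed.

```python
import os
import tempfile


def cwd_after_chdir(path):
    """chdir into `path`, report what os.getcwd() returns, then restore."""
    saved = os.getcwd()
    try:
        os.chdir(path)
        # Note: os.chdir does not update os.environ["PWD"].
        return os.getcwd()
    finally:
        os.chdir(saved)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        tmp = os.path.realpath(tmp)
        real = os.path.join(tmp, "real")
        link = os.path.join(tmp, "link")
        os.mkdir(real)
        os.symlink(real, link)
        return cwd_after_chdir(link), real, link
```

In this sketch the reported directory is `real`, not `link`, so a prefix computed from one form will not strip paths that were produced using the other.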

mmdriley · 2021-03-15T22:47:15Z

bazel-clang-toolchain

@@ -0,0 +1 @@
+bazel-carbon-lang/../../external/bootstrap_clang_toolchain


worth remembering, somewhere, that this will break for anyone who clones the repo into a folder not named carbon-lang, e.g. people who clone their forked copy as carbon-lang-mmdriley.

Doh. I had thought carbon-lang comes from the WORKSPACE name, and not the directory name... Seems not.

I can rewrite all of these paths to use more stable bazel-out now that I know the trick of a symlink if you'd rather?

I created and added a bazel-execroot symlink that does what bazel-carbon-lang was doing but with a stable name. The trick of using a symlink to hide the ..s from ClangD continues to work.

I also re-pointed this through bazel-out to make it more stable.
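
To make the failure mode concrete: as this thread found, the convenience symlink is named after the directory the repository was cloned into, not after the WORKSPACE name. The hypothetical helper below computes that name, which shows why a checked-in path starting with bazel-carbon-lang cannot be relied on.

```python
import os


def convenience_symlink_name(workspace_dir):
    """Name of the bazel-<directory> symlink for a given checkout path."""
    # normpath drops any trailing slash so basename is never empty.
    return "bazel-" + os.path.basename(os.path.normpath(workspace_dir))
```

A checkout in a directory named carbon-lang-mmdriley would get bazel-carbon-lang-mmdriley, so a symlink target hard-coded as bazel-carbon-lang/... would dangle there.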

mmdriley

I believe this works, but it's frankly terrifying. I don't think a lot of people on the project will be able to debug this if/when it breaks.

It's sad that Bazel has decided not to support this, though I think their answer would be for us to use aspects, which is neither simpler nor more approachable.

But this is useful and better than what we have today, and it seems as robust as possible given the approach. LGTM.

chandlerc

PTAL, I think I addressed everything.

chandlerc · 2021-03-16T01:44:43Z

bazel-clang-toolchain

@@ -0,0 +1 @@
+bazel-carbon-lang/../../external/bootstrap_clang_toolchain


Doh. I had thought carbon-lang comes from the WORKSPACE name, and not the directory name... Seems not.

I can rewrite all of these paths to use more stable bazel-out now that I know the trick of a symlink if you'd rather?

chandlerc · 2021-03-17T06:48:21Z

scripts/create_compdb.py

+import subprocess
+import sys
+
+directory = os.getcwd()


This is tempting, but I was worried about how we would correctly recognize and remove the working directory from paths if it were to be based on $PWD rather than getcwd() (which differ in interesting cases involving symlinks).

That said, some experimentation seems to show that Bazel does not use $PWD (the way it should IMO) and instead uses getcwd() which couldn't care less. So we could rewrite this if folks want. I've done so, although I'm not sure how much of an improvement this is.

chandlerc · 2021-03-17T06:51:09Z

scripts/create_compdb.py

+# Filter into the Carbon source files that we'll find directly in the
+# workspace, and LLVM source files that need to be mapped through the merged
+# LLVM tree in Bazel's execution root.
+pwd = os.environ["PWD"] + "/"


Because I brought this over from my shell script experimentation as I'm bad at Python. =D Good catch; now just using the directory that we've already computed.

chandlerc · 2021-03-17T09:33:30Z

bazel-clang-toolchain

@@ -0,0 +1 @@
+bazel-carbon-lang/../../external/bootstrap_clang_toolchain


I created and added a bazel-execroot symlink that does what bazel-carbon-lang was doing but with a stable name. The trick of using a symlink to hide the ..s from ClangD continues to work.

I also re-pointed this through bazel-out to make it more stable.

dabrahams

LGTM

chandlerc · 2021-03-17T18:09:34Z

If we have a proper compile_commands.json now, do we still need compile_flags.txt? (especially since compile_flags.txt now requires that a Bazel build has run so it can consume headers through the bazel-<workspacename> symlink)

I forgot to reply here, sorry.

One thing that is nice about keeping compile_flags.txt is that if you just build stuff, most things just work without ever running this script.

chandlerc · 2021-03-18T09:18:36Z

Thanks for the reviews. Now that it's been updated to work with Python 3.6, landing. Please don't hesitate to ask for follow-up fixes anywhere.

@geoffromer

Co-authored-by: Geoffrey Romer

geoffromer and others added 10 commits March 13, 2021 04:09

Initial clangd support

631ce18

Tidy up compile_flags.txt

cf31a87

Further compile_flags cleanup.

1ea1f10

Undo flags edits and add my horrible prototype.

6222f7e

Move the script to a more general name.

9019cd9

Make things largely work, including w/ libc++

51cde97

fix formatting

3bfb40d

fix formatting and flake8

56ff737

translate -bin paths to use a persistently available set of generated…

537ef76

… headers. also rewrite the googletest include to not rely on virtual include dirs created only during a build.

format

87dfa5c

chandlerc requested a review from a team as a code owner March 14, 2021 04:55

google-cla bot added the "cla: yes" label (PR meets CLA requirements according to bot) Mar 14, 2021

chandlerc requested a review from geoffromer March 14, 2021 04:57

fowles approved these changes Mar 14, 2021

mmdriley reviewed Mar 15, 2021

mmdriley approved these changes Mar 15, 2021

geoffromer reviewed Mar 16, 2021

geoffromer mentioned this pull request Mar 16, 2021

Experimental clangd support #384

Closed

chandlerc added 6 commits March 17, 2021 06:45

Merge branch 'trunk' into clangd

3a7bb03

Major update based on code review comments.

9b1402d

format

2c315ee

flake8

7dc1325

fix the other symlink

bd361c5

update precommit

4c0d349

chandlerc commented Mar 17, 2021

dabrahams approved these changes Mar 17, 2021

geoffromer approved these changes Mar 17, 2021

Update script to work with Python 3.6

0ab6514

chandlerc merged commit 61fb25f into carbon-language:trunk Mar 18, 2021

chandlerc deleted the clangd branch March 18, 2021 09:29
