Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve correctness of our Clang tooling infrastructure. #392

Merged
merged 17 commits into from
Mar 18, 2021
Merged

Improve correctness of our Clang tooling infrastructure. #392

merged 17 commits into from
Mar 18, 2021

Conversation

chandlerc
Copy link
Contributor

@chandlerc chandlerc commented Mar 14, 2021

This restructures the compile_flags.txt to use the downloaded libc++ system
headers and avoid needing a virtual include directory to be built. It still
needs some Bazel build to complete before working in order to have the libc++
system headers downloaded and the symlink to the Bazel tree created.

One (very) tricky part of making this work is to work around bugs in Clang's
tooling layer that incorrectly handle .. path components after traversing
symlinks. To avoid this, we add a custom symlinks (bazel-execroot and
bazel-clang-toolchain) that hide the relevant traversal of the Bazel layout to
find build artifacts and the downloaded toolchain. These symlinks will be broken
until a build with Bazel downloads the toolchain and creates the basic output
tree structure.

It also adds a create_compdb.py script. Running this script improves the
tooling fidelity by taking a few steps:

  1. It queries Bazel to find all the relevant files and adds them to a
    compile_commands.json database that allows clangd and other tools to
    index the entire project for improved cross-references, etc.
  2. It builds all the generated files with Bazel so that they can be included
    successfully. This is very fast in my testing, taking only 10s of seconds. It
    is also very likely to be cached effectively.
  3. It translates the arguments from compile_flags.txt to make them
    persistently use the built generated files include paths so that nothing
    breaks even as different targets are built potentially with different
    configurations.

There are still some limitations.

  • It still requires running Bazel before anything works, even if a fast run.
  • It will require re-running if new generated files are added and needed but not
    built.
  • It assumes that the standard Bazel symlink names are used and available.

Much of the Python here was written by @geoffromer in #384 -- I've adapted it
here after discussing to try to fill in some of the blanks and use a slightly
different approach to querying Bazel. I use the normal bazel query rather than
bazel aquery. This, for example, allows the index to reliably cover header
files in header-only libraries more directly (rather than relying on transitive
inclusion). It also seems a bit simpler too parse, but that is a pretty minor
difference.

@chandlerc chandlerc requested a review from a team as a code owner March 14, 2021 04:55
@google-cla google-cla bot added the cla: yes PR meets CLA requirements according to bot. label Mar 14, 2021
# fail in case there are build errors in the client, and just warn the user
# that they may be missing generated files.
print("Building the generated files so that tools can find them...")
subprocess.run(["bazelisk", "build", "--keep_going"] + generated_file_labels)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this subprocess be the bazel var from above instead of "bazelisk"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I missed this one, thanks!

Copy link
Contributor

@mmdriley mmdriley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have a proper compiler_commands.json now, do we still need compile_flags.txt? (especially since compile_flags.txt now requires a Bazel build has run so it can consume headers through the bazel-<workspacename> symlink)

# Filter into the Carbon source files that we'll find directly in the
# workspace, and LLVM source files that need to be mapped through the merged
# LLVM tree in Bazel's execution root.
pwd = os.environ["PWD"] + "/"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why os.environ["PWD"] here vs. os.getcwd() before?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I brought this over from my shell script experimentation as I'm bad at Python. =D Good catch, and just using directory that we've already computed.

import subprocess
import sys

directory = os.getcwd()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the assumed working directory? The root of the source tree?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattgodbolt 's comment here might be relevant -- better to make script-relative and avoid $PWD entirely? #384 (review)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is tempting, but I was worried about how we would correctly recognize and remove the working directory from paths if it were to be based on $PWD rather than getcwd() (which differ in interesting cases involving symlinks).

That said, some experimentation seems to show that Bazel does not use $PWD (the way it should IMO) and instead uses getcwd() which couldn't care less. So we could rewrite this if folks want. I've done so, although I'm not sure how much of an improvement this is.

@@ -0,0 +1 @@
bazel-carbon-lang/../../external/bootstrap_clang_toolchain
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

worth remembering, somewhere, that this will break for anyone who clones the repo into a folder not named carbon-lang, e.g. people who clone their forked copy as carbon-lang-mmdriley.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doh. I had thought carbon-lang comes from the WORKSPACE name, and not the directory name... Seems not.

I can rewrite all of these paths to use more stable bazel-out now that I know the trick of a symlink if you'd rather?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created a bazel-execroot symlink and added it that does what bazel-carbon-lang was doing but with a stable name. The trick of using a symlink to hide the ..s from ClangD continues to work.

I also re-pointed this through bazel-out to make it more stable.

Copy link
Contributor

@mmdriley mmdriley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this works, but it's frankly terrifying. I don't think a lot of people on the project will be able to debug this if/when it breaks.

It's sad that Bazel has decided not to support this, though I think their answer would be for us to use aspects, which is neither simpler nor more approachable.

But this is useful and better than we have today, and it seems as robust as possible given the approach. LGTM.

scripts/create_compdb.py Outdated Show resolved Hide resolved
scripts/create_compdb.py Show resolved Hide resolved
scripts/create_compdb.py Outdated Show resolved Hide resolved
Copy link
Contributor Author

@chandlerc chandlerc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PTAL, I think addressed everything.

@@ -0,0 +1 @@
bazel-carbon-lang/../../external/bootstrap_clang_toolchain
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doh. I had thought carbon-lang comes from the WORKSPACE name, and not the directory name... Seems not.

I can rewrite all of these paths to use more stable bazel-out now that I know the trick of a symlink if you'd rather?

import subprocess
import sys

directory = os.getcwd()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is tempting, but I was worried about how we would correctly recognize and remove the working directory from paths if it were to be based on $PWD rather than getcwd() (which differ in interesting cases involving symlinks).

That said, some experimentation seems to show that Bazel does not use $PWD (the way it should IMO) and instead uses getcwd() which couldn't care less. So we could rewrite this if folks want. I've done so, although I'm not sure how much of an improvement this is.

# Filter into the Carbon source files that we'll find directly in the
# workspace, and LLVM source files that need to be mapped through the merged
# LLVM tree in Bazel's execution root.
pwd = os.environ["PWD"] + "/"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I brought this over from my shell script experimentation as I'm bad at Python. =D Good catch, and just using directory that we've already computed.

scripts/create_compdb.py Outdated Show resolved Hide resolved
scripts/create_compdb.py Show resolved Hide resolved
scripts/create_compdb.py Outdated Show resolved Hide resolved
scripts/create_compdb.py Outdated Show resolved Hide resolved
@@ -0,0 +1 @@
bazel-carbon-lang/../../external/bootstrap_clang_toolchain
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created a bazel-execroot symlink and added it that does what bazel-carbon-lang was doing but with a stable name. The trick of using a symlink to hide the ..s from ClangD continues to work.

I also re-pointed this through bazel-out to make it more stable.

Copy link

@dabrahams dabrahams left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@chandlerc
Copy link
Contributor Author

If we have a proper compiler_commands.json now, do we still need compile_flags.txt? (especially since compile_flags.txt now requires a Bazel build has run so it can consume headers through the bazel-<workspacename> symlink)

I forgot to reply here, sorry.

One thing that is nice about keeping compiler_flags.txt is that if you just build stuff, most things just work without ever running this script.

@chandlerc
Copy link
Contributor Author

Thanks for reviews. Now that it's been updated to work with Python 3.6, landing. Please don't hesitate to ask for follow-up fixes anywhere.

@chandlerc chandlerc merged commit 61fb25f into carbon-language:trunk Mar 18, 2021
@chandlerc chandlerc deleted the clangd branch March 18, 2021 09:29
chandlerc added a commit that referenced this pull request Jun 28, 2022
This restructures the `compile_flags.txt` to use the downloaded libc++ system
headers and avoid needing a virtual include directory to be built. It still
needs _some_ Bazel build to complete before working in order to have the libc++
system headers downloaded and the symlink to the Bazel tree created.

One (very) tricky part of making this work is to work around bugs in Clang's
tooling layer that incorrectly handle `..` path components after traversing
symlinks. To avoid this, we add a custom symlinks (`bazel-execroot` and
`bazel-clang-toolchain`) that hide the relevant traversal of the Bazel layout to
find build artifacts and the downloaded toolchain. These symlinks will be broken
until a build with Bazel downloads the toolchain and creates the basic output
tree structure.

It also adds a `create_compdb.py` script. Running this script improves the
tooling fidelity by taking a few steps:

1. It queries Bazel to find all the relevant files and adds them to a
   `compile_commands.json` database that allows `clangd` and other tools to
   index the entire project for improved cross-references, etc.
2. It builds all the generated files with Bazel so that they can be included
   successfully. This is very fast in my testing, taking only 10s of seconds. It
   is also very likely to be cached effectively.
3. It translates the arguments from `compile_flags.txt` to make them
   persistently use the built generated files include paths so that nothing
   breaks even as different targets are built potentially with different
   configurations.

There are still some limitations.

- It still requires running Bazel before anything works, even if a fast run.
- It will require re-running if new generated files are added and needed but not
  built.
- It assumes that the standard Bazel symlink names are used and available.

Much of the Python here was written by @geoffromer in #384 -- I've adapted it
here after discussing to try to fill in some of the blanks and use a slightly
different approach to querying Bazel. I use the normal `bazel query` rather than
`bazel aquery`. This, for example, allows the index to reliably cover header
files in header-only libraries more directly (rather than relying on transitive
inclusion). It also seems a bit simpler too parse, but that is a pretty minor
difference.

Co-authored-by: Geoffrey Romer <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes PR meets CLA requirements according to bot.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants