-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve correctness of our Clang tooling infrastructure. #392
Conversation
… headers. also rewrite the googletest include to not rely on virtual include dirs created only during a build.
scripts/create_compdb.py
Outdated
# fail in case there are build errors in the client, and just warn the user | ||
# that they may be missing generated files. | ||
print("Building the generated files so that tools can find them...") | ||
subprocess.run(["bazelisk", "build", "--keep_going"] + generated_file_labels) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this subprocess be the bazel
var from above instead of "bazelisk"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, I missed this one, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we have a proper compiler_commands.json
now, do we still need compile_flags.txt
? (especially since compile_flags.txt
now requires a Bazel build has run so it can consume headers through the bazel-<workspacename>
symlink)
scripts/create_compdb.py
Outdated
# Filter into the Carbon source files that we'll find directly in the | ||
# workspace, and LLVM source files that need to be mapped through the merged | ||
# LLVM tree in Bazel's execution root. | ||
pwd = os.environ["PWD"] + "/" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why os.environ["PWD"]
here vs. os.getcwd()
before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because I brought this over from my shell script experimentation as I'm bad at Python. =D Good catch, and just using directory
that we've already computed.
scripts/create_compdb.py
Outdated
import subprocess | ||
import sys | ||
|
||
directory = os.getcwd() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the assumed working directory? The root of the source tree?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mattgodbolt 's comment here might be relevant -- better to make script-relative and avoid $PWD
entirely? #384 (review)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is tempting, but I was worried about how we would correctly recognize and remove the working directory from paths if it were to be based on $PWD
rather than getcwd()
(which differ in interesting cases involving symlinks).
That said, some experimentation seems to show that Bazel does not use $PWD
(the way it should IMO) and instead uses getcwd()
which couldn't care less. So we could rewrite this if folks want. I've done so, although I'm not sure how much of an improvement this is.
bazel-clang-toolchain
Outdated
@@ -0,0 +1 @@ | |||
bazel-carbon-lang/../../external/bootstrap_clang_toolchain |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
worth remembering, somewhere, that this will break for anyone who clones the repo into a folder not named carbon-lang
, e.g. people who clone their forked copy as carbon-lang-mmdriley
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doh. I had thought carbon-lang
comes from the WORKSPACE
name, and not the directory name... Seems not.
I can rewrite all of these paths to use more stable bazel-out
now that I know the trick of a symlink if you'd rather?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created a bazel-execroot
symlink and added it that does what bazel-carbon-lang
was doing but with a stable name. The trick of using a symlink to hide the ..
s from ClangD continues to work.
I also re-pointed this through bazel-out
to make it more stable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this works, but it's frankly terrifying. I don't think a lot of people on the project will be able to debug this if/when it breaks.
It's sad that Bazel has decided not to support this, though I think their answer would be for us to use aspects, which is neither simpler nor more approachable.
But this is useful and better than we have today, and it seems as robust as possible given the approach. LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PTAL, I think addressed everything.
bazel-clang-toolchain
Outdated
@@ -0,0 +1 @@ | |||
bazel-carbon-lang/../../external/bootstrap_clang_toolchain |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doh. I had thought carbon-lang
comes from the WORKSPACE
name, and not the directory name... Seems not.
I can rewrite all of these paths to use more stable bazel-out
now that I know the trick of a symlink if you'd rather?
scripts/create_compdb.py
Outdated
import subprocess | ||
import sys | ||
|
||
directory = os.getcwd() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is tempting, but I was worried about how we would correctly recognize and remove the working directory from paths if it were to be based on $PWD
rather than getcwd()
(which differ in interesting cases involving symlinks).
That said, some experimentation seems to show that Bazel does not use $PWD
(the way it should IMO) and instead uses getcwd()
which couldn't care less. So we could rewrite this if folks want. I've done so, although I'm not sure how much of an improvement this is.
scripts/create_compdb.py
Outdated
# Filter into the Carbon source files that we'll find directly in the | ||
# workspace, and LLVM source files that need to be mapped through the merged | ||
# LLVM tree in Bazel's execution root. | ||
pwd = os.environ["PWD"] + "/" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because I brought this over from my shell script experimentation as I'm bad at Python. =D Good catch, and just using directory
that we've already computed.
bazel-clang-toolchain
Outdated
@@ -0,0 +1 @@ | |||
bazel-carbon-lang/../../external/bootstrap_clang_toolchain |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created a bazel-execroot
symlink and added it that does what bazel-carbon-lang
was doing but with a stable name. The trick of using a symlink to hide the ..
s from ClangD continues to work.
I also re-pointed this through bazel-out
to make it more stable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I forgot to reply here, sorry. One thing that is nice about keeping |
Thanks for reviews. Now that it's been updated to work with Python 3.6, landing. Please don't hesitate to ask for follow-up fixes anywhere. |
This restructures the `compile_flags.txt` to use the downloaded libc++ system headers and avoid needing a virtual include directory to be built. It still needs _some_ Bazel build to complete before working in order to have the libc++ system headers downloaded and the symlink to the Bazel tree created. One (very) tricky part of making this work is to work around bugs in Clang's tooling layer that incorrectly handle `..` path components after traversing symlinks. To avoid this, we add a custom symlinks (`bazel-execroot` and `bazel-clang-toolchain`) that hide the relevant traversal of the Bazel layout to find build artifacts and the downloaded toolchain. These symlinks will be broken until a build with Bazel downloads the toolchain and creates the basic output tree structure. It also adds a `create_compdb.py` script. Running this script improves the tooling fidelity by taking a few steps: 1. It queries Bazel to find all the relevant files and adds them to a `compile_commands.json` database that allows `clangd` and other tools to index the entire project for improved cross-references, etc. 2. It builds all the generated files with Bazel so that they can be included successfully. This is very fast in my testing, taking only 10s of seconds. It is also very likely to be cached effectively. 3. It translates the arguments from `compile_flags.txt` to make them persistently use the built generated files include paths so that nothing breaks even as different targets are built potentially with different configurations. There are still some limitations. - It still requires running Bazel before anything works, even if a fast run. - It will require re-running if new generated files are added and needed but not built. - It assumes that the standard Bazel symlink names are used and available. Much of the Python here was written by @geoffromer in #384 -- I've adapted it here after discussing to try to fill in some of the blanks and use a slightly different approach to querying Bazel. I use the normal `bazel query` rather than `bazel aquery`. This, for example, allows the index to reliably cover header files in header-only libraries more directly (rather than relying on transitive inclusion). It also seems a bit simpler too parse, but that is a pretty minor difference. Co-authored-by: Geoffrey Romer <[email protected]>
This restructures the
compile_flags.txt
to use the downloaded libc++ systemheaders and avoid needing a virtual include directory to be built. It still
needs some Bazel build to complete before working in order to have the libc++
system headers downloaded and the symlink to the Bazel tree created.
One (very) tricky part of making this work is to work around bugs in Clang's
tooling layer that incorrectly handle
..
path components after traversingsymlinks. To avoid this, we add a custom symlinks (
bazel-execroot
andbazel-clang-toolchain
) that hide the relevant traversal of the Bazel layout tofind build artifacts and the downloaded toolchain. These symlinks will be broken
until a build with Bazel downloads the toolchain and creates the basic output
tree structure.
It also adds a
create_compdb.py
script. Running this script improves thetooling fidelity by taking a few steps:
compile_commands.json
database that allowsclangd
and other tools toindex the entire project for improved cross-references, etc.
successfully. This is very fast in my testing, taking only 10s of seconds. It
is also very likely to be cached effectively.
compile_flags.txt
to make thempersistently use the built generated files include paths so that nothing
breaks even as different targets are built potentially with different
configurations.
There are still some limitations.
built.
Much of the Python here was written by @geoffromer in #384 -- I've adapted it
here after discussing to try to fill in some of the blanks and use a slightly
different approach to querying Bazel. I use the normal
bazel query
rather thanbazel aquery
. This, for example, allows the index to reliably cover headerfiles in header-only libraries more directly (rather than relying on transitive
inclusion). It also seems a bit simpler too parse, but that is a pretty minor
difference.