Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How do target configuration constraints work? #184

Open
zjturner opened this issue Apr 19, 2023 · 13 comments
Open

How do target configuration constraints work? #184

zjturner opened this issue Apr 19, 2023 · 13 comments

Comments

@zjturner
Copy link
Contributor

I found this code from grepping the repository:

        kwargs["link_style"] = select({
            "DEFAULT": link_style,
            "ovr_config//os:macos": "static",
        })

How does parsing of this "ovr_config//os:macos" string work? That also looks a lot like one of those target patterns, but that clearly is not a target. And it's not the same thing as @prelude//toolchains:cxx.bzl either, because it doesn't have an @ at the beginning, and doesn't refer to a file.

I grepped for ovr_config and found this:

[repository_aliases]
ovr_config = prelude

I guess it's coming from here. But the exact syntax isn't documented. And what is os? What other selectors are there that I could use besides os? It looks like this pattern might be a "target configuration constraint", which I figured out by looking at this piece of code from somewhere else:

def _tp2_constraint(project, version):
    """
    Return the target configuration constraint to use for the given project
    version
    """

    return "ovr_config//third-party/{}/constraints:{}".format(project, version)

but if you type "constraint" into the search box of the documentation page you get nothing.

And how do these map to values from .buckconfig? Concrete example: Let's say I have this in my .buckconfig:

[foo]
bar = baz

How can I do the following two things:

  1. Write a select() statement that passes --compiler-flag-1 if foo.bar == baz, pass --compiler-flag-2 if foo.bar == buzz, and generate a build failure if foo.bar is anything else?
  2. Ignore .buckconfig for the purposes of obtaining a value of foo.bar, and have the user write buck2 build <command-line-argument-that-forces-foo.bar-to-buzz>?
@stepancheg
Copy link
Contributor

How does parsing of this "ovr_config//os:macos" string work? That also looks a lot like one of those target patterns, but that clearly is not a target

It is a target.

Try buck2 uquery ovr_config//os:macos.

what is os?

Directory in ovr_config cell.

What other selectors are there that I could use besides os

Simplified, config_setting rule defines select key.

But yes, documentation could be better.

@zjturner
Copy link
Contributor Author

Thank you! Would you be able to also provide answers to the two questions at the end of the OP?

@stepancheg
Copy link
Contributor

Would you be able to also provide answers to the two questions at the end of the OP?

Can you please give more concrete examples of what flags do you want to switch?

@zjturner
Copy link
Contributor Author

Hmm, is that actually important? I chose a hypothetical non-existant flag --compiler-flag-1 on purpose because I'm assuming that the answer should be the same for any arbitrary flag. If the setting in buck config is foo, pass flag 1. If it's bar, pass flag 2. Does it actually m atter what flag 1 and flag 2 are?

@stepancheg
Copy link
Contributor

Hmm, is that actually important?

Yes, it is. Depending on your use case I can suggest one solution or another, or describe our plans to address the issue.

Basically we don't support "flags". Most of the current issue can be addressed by adding a custom configuration with platform rule. But these are not composable. There's workaround with buckconfigs and config_setting depending on that buckconfig, which is partly deprecated, but we don't have a replacement yet.

@cjhopman
Copy link
Contributor

Write a select() statement that passes --compiler-flag-1 if foo.bar == baz, pass --compiler-flag-2 if foo.bar == buzz, and generate a build failure if foo.bar is anything else?

There's a lot that goes into this. A lot of it is described here: https://buck2.build/docs/rule_authors/configurations/

Here's one way to define a platform and a select such that the select() will resolve as you ask when in the configuration defined by that platform:

constraint_setting(
  name = "foo-bar"
)

constraint_value(
  name = "foo-bar-baz",
  constraint_setting = ":foo-bar",
)

constraint_value(
  name = "foo-bar-buzz",
  constraint_setting = ":foo-bar",
)

foo_config = read_config("foo", "bar", None)

platform(
  name = "my-platform",
  constraint_values = [":foo-bar-baz"] if foo_config == "baz" else ([":foo-bar-buzz"] if foo_config = "buzz") else []
)

my_rule(
  name = "use",
  flags = select(
     ":foo-bar-baz": ["--compiler-flag-1"],
     ":foo-bar-buzz": ["--compiler-flag-2"],
   )
)

Now, that example doesn't cause the :use rule to be built in the :my-platform configuration, there's a handful of different ways to do that. But if it is built in that configuration, that select() would be resolved in the way you asked.

@dtolnay
Copy link

dtolnay commented Apr 20, 2023

How can I do the following two things:

  1. Write a select() statement that passes --compiler-flag-1 if foo.bar == baz, pass --compiler-flag-2 if foo.bar == buzz, and generate a build failure if foo.bar is anything else?
  2. Ignore .buckconfig for the purposes of obtaining a value of foo.bar, and have the user write buck2 build <command-line-argument-that-forces-foo.bar-to-buzz>?

It's hard to be sure whether you are asking this in order to understand better how platforms/configurations/selects work, vs. to accomplish these two specific pieces of functionality.

Mostly likely it's the former, in which case cjhopman's answer has good guidance.

But just in case it's the latter, select and constraint may not necessarily be the thing you're after.

  1.  foo_bar = read_config("foo", "bar", None)
    
     my_rule(
         name = "...",
         flags = [
             "--compiler-flag-1" if foo_bar == "baz"
             else "--compiler-flag-2" if foo_bar == "buzz"
             else fail("...")
         ],
     )
  2. buck2 build --config foo.bar='...'

    foo_bar = read_config("foo", "bar", None)
    
    my_rule(
        name = "...",
        flags = foo_bar.split(" ") if foo_bar else [],
    )

@zjturner
Copy link
Contributor Author

zjturner commented Apr 20, 2023

Yes, it is. Depending on your use case I can suggest one solution or another, or describe our plans to address the issue.

Basically we don't support "flags". Most of the current issue can be addressed by adding a custom configuration with platform rule. But these are not composable. There's workaround with buckconfigs and config_setting depending on that buckconfig, which is partly deprecated, but we don't have a replacement yet.

This is quite a fundamental divergence from what I'm accustomed to, and it makes me question whether buck2 is even compatible with our builds at all.

For some context, we currently use CMake. In CMake you can pass -DKEY=VALUE on the command line, and you can pass that over and over again with different keys and different values. And you can have any combination of these. So our entire build has been constructed with this assumption in mind, and it sounds like you're saying this simply isn't possible.

I probably can't list every possible way in which we rely on this, but here's a few:

  1. We allow the user to specify which santizer(s) they want to build with. -DUSE_SANITIZER=[address|undefined|thread].
  2. We allow the user to specify which build configuration they want. -DBUILD_TYPE=[debug|optimized|production]
  3. We allow the user to specify which product they're building (a single codebase can build several different products). -DPRODUCT=[client|server|secret_project].
  4. We allow the user to specify which kinds of hardening measures they want baked into the application. -DHARDENING=[none|minimal|production].

Number 3 is especially problematic, because part of the semantics of that one involve passing a define into every compilation unit, including shared code. This means that a build of two different products passes different flags to the same source files so the compiled code can't be shared between products, even though much of the source code is the same.

There are at least 20 more of these flags that we can pass to the build, and any combination of flags can be mixed and matched. The semantics of these flags are extremely diverse. They can mean anything from:

  1. Add a flag to the linker for a single target
  2. Add a define to every compilation unit
  3. Perform some additional packaging / installation steps at the end of the build
  4. Conditionally compile (or don't compile) some source files based on the value of the flag.
  5. Omit some build targets entirely, so that the targets don't even exist.

And more.

And of course, the semantics can also be compound, so that a single build flag can do a combination of the above semantics, including various things that aren't listed.

@thoughtpolice
Copy link
Contributor

thoughtpolice commented Apr 20, 2023

Hey Zach, good to see you around these parts; been a while! I'll try to give you a good answer to help get you moving along since I saw some other questions of yours...

Write a select() statement that passes --compiler-flag-1 if foo.bar == baz, pass --compiler-flag-2 if foo.bar == buzz, and generate a build failure if foo.bar is anything else?

Well, the statement itself is pretty simple, in theory, though I haven't tested this with the prelude:

cxx_binary(
  cflags = select({
    "cell//path/to:target": [ "-O2" ],
    "cell//path/to/other:target": [ "-O1" ],
  }) + default_cflags,
)

In the above example, cell//path/to:target is a configuration setting. But we might have an X/Y problem here. This probably isn't a good example so we should start over from the beginning, I think.

  • In something like make, you say make foo.exe to build the foo.exe binary. You call foo.exe a target. In Make, a target is, generally speaking a file. You might even say something like make foo/bar/baz.exe to build that file in a non-recursive Make build system. (Nit: sometimes they aren't files, though; it might be a .PHONY rule, which isn't a file but just a list of commands to run every time you ask.)
  • When you ask Buck to build something, you say something like buck2 build cell//foo/bar:baz. Now, Buck does build targets, but targets are not necessarily files. They are just names of things.
  • The syntax cell//path/to:target is a target pattern. It's a pattern because you can actually match multiple targets at once (more on that in a second)
    • The first part, the cell cell// is just an arbitrary name. It points to a directory in the repository. This is listed in the .buckconfig file, in the [repositories] section. You can basically create as many cells as you want — but there are two mandatory cells, one to refer to the root (often root = .) and one named prelude// you have to specify, at minimum.
    • You can use cells to group targets and code. It's purely logical to help namespace things.
  • The path/to/ part refers to a path within the cell. Just expand the name of the cell to get the path. If cell = foo/bar in the root .buckconfig, then cell//baz/ translates to foo/bar/baz/ when expanded, relative to the root of the repository. Very simple.
    • This path always is a directory, never a file.
    • It is expected this path contains a BUCK file to describe targets.
    • This path is often called the package of the target.
  • Finally, the :target part is, well, the target. But broadly speaking, the target is the name field of something listed in a BUCK file. Note it has a semicolon in front. When writing buck build path/to:target, the semicolon separates the two, but you also use the colon elsewhere (more in a second.)

For example if the following BUCK file exists at the path foo/bar/baz/BUCK:

cxx_binary(name = "test_cxx", ...)
rust_binary(name = "test_rs", ...)

We'd say that cxx_binary and rust_binary are rules. I prefer to think of a rule as something even simpler: it's just a function with arguments! And every time you call that rule, with some arguments, you create an instance of the rule. Every instance of a rule has a name — the name field of a rule instance gives it an externally visible name that can be mentioned in a target pattern. Again: the name of the target that you refer to in buck build is always listed in the name field of a rule. It's a special field that you cannot get rid of. Rule = function. Creating instance of rule = application of function.

Then there are two targets above: foo/bar/baz:test_cxx and foo/bar/baz:test_rs. So you could run buck build on those paths. Remember how I said target patterns are patterns? You can build them both using the ... pattern: buck build frob//path/to:... would build every target listed in the BUCK file in cell frob//, under path path/to. So it would build both of these binaries.

And there are lots of other target patterns you can use, try a bunch:

  • cell//path/to/a/package:name builds a target
  • cell//path/to/a:... builds every target in the package recursively
  • cell//path/to/a: builds every
  • cell//... builds every package and every target in the cell
  • //path/to/a/package:name — implicit cell resolves to root//
  • //... builds everything the default cell
  • ... builds everything in the default cell, even shorter

And some other various, more non-obvious ones, like : for example (all targets in the package in $CWD)


Before you can talk about configurations, you need to talk about providers. This is a very, very, very important part of the design of Buck, and a lot more things made sense once I realized this.

I mentioned make before as a point of contrast. I did so for a reason, because it creates big contrast. To sum up:

  • In make, a target (ignoring .PHONY targets!!!) always refers to a file.
  • In buck, a target doesn't refer to a file, but, in reality, it is a data structure returned from a rule.
    • That's why I said rules are functions. They take arguments, and return values! When you call a function, you expect it to return a value, don't you?
    • This data structure is actually a list.
    • And every element of that list is called a provider, which is a data structure containing information the instance of the rule
    • It's a list, because a single rule can return a lot of different pieces of information about what it should do.
      • For instance, a rule called cxx_library may have two things it outputs: the compiled library (in whatever form you choose), and a big ball of generated HTML doxygen documentation.
      • Therefore this rule would return two providers, one for the linkable library, and one for the doxygen output. A provider just looks like a normal "structure" or "tuple with named fields" in any language. The closest equivalent is probably a Python dataclass i.e. only named fields in a class.
      • Buck understands that dependencies between A < B (rule instance B depends on A) are actually, concretely, a piece of code in B needing a value from a provider returned by A. If cxx_library returns both a DoxygenHtmlProvider object and a CxxLinkableInfoProvider object, then something that depends on cxx_library can consume these objects any way they wish. They are just normal values in a programming language (called Starlark.)
      • This has a huge benefit: if you depend on the output of a cxx_library and only consume the CxxLinkableInfoProvider object, Buck will never build the doxygen docs. After all, you didn't need the DoxygenHtmlProvider, so why compute it?

It is very, very important to understand this. It is key to Buck's modularity. You can never simply look at a target pattern foo//bar:baz and understand what baz is without reading the BUCK file or running uquery or something. It isn't a file output! The string ovr_config//os:macos you brought up is absolutely a target, but it is not a file. It is a data structure. Big difference.

Providers — data structures — are the only way a rule can provide information to downstream rules that depend on it. In Make, a rule can only create a single file, which can be consumed by downstream rules, in contrast.

Providers are very important to understanding how the action graph is created after analysis, and are key to Buck's modularity — they're why you can have a python_binary depend on an ocaml_library which depends on a cxx_library and it all works out magically. But this is about as good of a summary as you need. I strongly suggest reading this: https://buck2.build/docs/rule_authors/writing_rules/ it's sparse but will give you an idea of what a rule looks like inside, and you can see how rules consume and create providers.


So now we know that ovr_config//os:macos is actually a target, and it can return data structures. What does this mean in practice for things like select()? In Buck, you have constraint settings, constraint values, and config settings.

  • There are constraint settings, like cpu, which can take on several possible constraint values, such as x86_64 and aarch64. You can also have a constraint like os which has win, lin, and mac.
    • These are represented by the rules constraint_setting and constraint_value, so they return providers.
    • Constraint settings use the ConstraintSettingInfo provider object, while constraint values use ConstraintValueInfo provider objects.
  • A configuration setting lists off constraint values that need to be satisfied. For example, the x86_64 configuration may specify that it needs the constraint cpu == x86_64. That example seems redundant, since it's one config for one constraint (you still need both, even in that case.) But it naturally extends to more constraints; the x86_64-linux configuration has constraint cpu == x86_64 and os == linux both at the same time, for instance.
    • Again, this is the config_setting rule, and it yields a ConfigurationInfo provider object.
  • You can also attach a configuration setting (containing N constraint values) to a platform, and a platform has a bunch of other metadata, too. Once you have a platform, you can attach that information to a target, and when building the target, all the constraints/configurations are applied.

My understanding of platforms and their ins-and-outs is poor still. This is also getting too long. But the upshot is that you can use select() on constraint or configuration settings, like in the example from @cjhopman or my original example above.

As a side note: actually, I think select() as a function only works on targets that return ConfigurationInfo objects. So why does it work on constraint values, AKA ConstraintValueInfo objects, like in @cjhopman's example? Read the source code to constraint_value_impl to find out why ;)

Ignore .buckconfig for the purposes of obtaining a value of foo.bar, and have the user write buck2 build <command-line-argument-that-forces-foo.bar-to-buzz>?

There is --config flag you can pass to buck2 build that will do this, e.g. buck2 build -c foo.bar=buzz and I believe that overrides it. I can't tell you if it needs to restart the buck2d daemon or anything, some config values require that.

See also #142 and the "configuration-at syntax" RFC linked within. This would allow you to conveniently set a specific configuration platform using syntax on the command line, which is basically what you want. In this case you would simply set a configuration by building a target like buck build //foo/bar:baz@config//platforms:linux-x86_64-O3. Then the rule implementing //foo/bar:baz would be able to select() the right things. But I don't think any of the features are there yet for that RFC. The example from @cjhopman is about as good as it gets.


This is way way way too long. But personally, in this case I'd probably structure my solution differently until some of those features get added — I'd just avoid configurations and build every version of the thing you want, all with different target names. This takes longer to build the whole package, but is more flexible right now. It also helps you ensure every version of a build is working; e.g. a change doesn't break ASAN builds or something. That's actually a big problem with the approach to using $(CFLAGS) like Make does — you can easily test something with make CFLAGS=-fsanitize=address and it works, check it in, and another build breaks (imagine you didn't #ifdef some sanitizer logic appropriately.) Or vice versa; the normal build breaks ASAN. In this case, all your builds would be integrated every time.

So instead, have foobar target be the default, and also foobar-O3 and foobar-O3-LTO and also foobar-fuzz-asan and so on and so forth... You can generate all these rules with code, by the way. Something like this in defs.bzl:

def cxx_multi_binaries(name, **kwargs):
    default_way = {
        "cflags": [],
        "ldflags": [],
    }
    other_ways = {
        "O3": {
            "cflags": [ "-O3" ],
            "ldflags": [ "-O3" ],
        },
        "LTO": {
            "cflags": [ "-O3", "-flto" ],
            "ldflags": [ "-fuse-ld=lld", "-flto" ],
        },
    }

    # generate a target named 'foobar-XYZ' for every 'way' above
    for (way_name, way_opts) in other_ways .items():
        cxx_binary(
            name = "{}-{}".format(name, way_name)
            cflags = way_opts.cflags,
            ld_flags = way_opts.ldflags,
            *kwargs,
        )

    # and also a default binary
    cxx_binary(name = name, cflags = default_way["cflags"], ldflags = default_way["ldflags", **kwargs)

Then, in the BUCK file:

load(":defs.bzl", "cxx_multi_binaries")

cxx_multi_binaries(
    name = "test",
    srcs = glob(["*.cxx"])
)

And then you'll have targets named :test, :test-O3, :test-LTO, etc. You'd just specify whatever target you wanted if you wanted a specific one. This is also a key advantage of Buck: you can just write Starlark functions to generate rules in whatever magical way you want! (Functions that purely call and instantiate rules are often called "macros", confusingly...) And these functions abstract everything away. The caller of cxx_multi_binaries does not have to have any knowledge of how flags are propagated or interpreted, for instance. An engineer can completely change its implementation without impacting callers, as long as some basic functional rules about APIs are respected (i.e. don't rename stuff like crazy.)

This is a long post but hopefully it'll help. Restructuring a build for Buck is fundamentally a lot of work. But it gives you a lot more power than CMake once you lean into it. Hope you're doing well, too!

@zjturner
Copy link
Contributor Author

Hey Zach, good to see you around these parts; been a while! I'll try to give you a good answer to help get you moving along since I saw some other questions of yours...

Haha, noticed you too in other places. Good to see you :)

Anyway, this is a very long answer, so thank you! Will take me some time to read / digest. That said, from a cursory skim, it sounds like this is exactly the kind of advice and guidance that should find a way into the documentation (or examples) in some form. Perhaps I'll take it as an exercise to the reader to produce such an example and make a PR updating the examples folder with new examples.

@thoughtpolice
Copy link
Contributor

thoughtpolice commented Apr 20, 2023

I'll also note something about the efficiency of my proposed solution. It's true that by representing each build as an individual target, rather than modifying CFLAGS, you have added more targets to build and test. If it takes 30 minutes to build a project, then adding the O3 and LTO way like my example will, in the worst case, increase build time by 200% relative to base. At best, it will increase build time by 0%, assuming perfect parallelism and each build is run independently. Except that:

  • You were already doing this anyway if you're integrating your builds properly, so there's no loss in reality. In practice, for an array of reasons from false positives to compiler bugs to actual legit issues or whatever, you need to run a separate CI job that tests with make CFLAGS=-O3 and make CFLAGS=-fsanitize=address on every commit.
  • In actuality, because these flags often imply rebuilds due to new flags, the buck solution is often better IMO because it can:
    • Take advantage of more parallelism, by building both foobar and e.g. foobar-O3 at once. Let's say you call buck build :... to build all the targets. If your foobar build has a part where parallelism decreases due to linking steps, buck can immediately begin building objects in the foobar-O3 target during that time. With make CFLAGS=-O3, it's all or nothing. You have to wait until the build completes fully and then run make CFLAGS=<something-else> afterwords.
    • Helps buck analyze every possible target output that can be built, which helps rule out some errors (e.g. invalid dependencies or code paths.)
  • Finally, assuming you have a remote cache built in, you won't spend lots of time rebuilding in the common, fast case, because you'll fetch objects from the cache. Basically ccache on steroids and needs no extra integration.
    • But even if you don't have a remote cache, the local cache is still better than what CMake offers by default. For example if you compile a single object file with -O3 and -fsanitize=address, those both exist in buck-out/ under unique names. The input cflags are calculated as being an input to the object file. So you can build the -O3 version, the ASAN version, then the -O3 version again, and the final call is instant. In contrast you need to explicitly use separate build directories with cmake or ccache to achieve this.

I call the above solution the "cartesian product" approach because rather than modifying the set of build parameters in place, you instead effectively represent the set of parameters to the build as a tuple (A, B, C, ...), calculate the cartesian product of that tuple A * B * C * ..., and then run the build once for-each one of these. Excluding various invalid scenarios. I've been using this approach for a while and I find it is often much better because in practice there aren't actually a billion parameters, normally there's about 5 or so. And in practice you integrate all of these "build variants" anyway on most commits. And even if you don't want that — just don't build it! Just select the specific target you want; build foobar-O3 and skip foobar-asan. Buck even has cquery and BXL which allows you to programmatically calculate these targets based on various conditions! This isn't appropriate for everything, but I've found it quite handy.

@zjturner
Copy link
Contributor Author

I like the cartesian product idea, we tried to do this with CMake at one point but gave up because CMake did not make this easy. My biggest concern is that now engineers who don't care about build systems and who just want to click a button and have things work suddenly have to start remembering this cartesian product.

Do you think it makes sense to say that developer only options used for testing and experimentation are better off being done via read_config (see @dtolnay's comment above), and configurations that are tested on CI could be done via a cartesian product approach?

@thoughtpolice
Copy link
Contributor

thoughtpolice commented Apr 21, 2023

Yeah, one issue is that people have to pick out what version of the build they want, in the cartesian approach. So if you want ASAN+UBSAN+LTO+Fuzzing, you might have to look for a target-with_really_long_target_name or whatever. It does become annoying for people who just want to test a single build and aren't necessarily using every variant.

I missed his comment originally but, the approach from @dtolnay is instead closer to what people would expect to do with make, so it has less "sticker shock." It's about as simple as it can be, I think? Since you also need to pick up ldflags and maybe some other settings, you could always put them into a .bzl file and then load() those values from the file, too. That also has an advantage that you don't have to keep repeating buck build -c ... every time:

dev_options = {
  "cflags": [ ... ],
  "ldflags": [ ... ],
  "defines": [ ... ],
}
load(":devopts.bzl", "dev_options")

cxx_binary(
  cflags = dev_options.cflags
  # and similar for ldflags, etc
)

In this example dev_options.bzl would just override whatever is defined by default, or augment it, etc. An advantage over putting this here instead of passing buck build -c cc.flags=$SOMETHING ..., or putting it in .buckconfig, is that this file isn't global! There are other tradeoffs with this variant — for example you could accidentally commit changes to devopts.bzl when it should always look like dev_options = {} in the repository; only a working copy being hacked on should be different. You could auto-reject commits that make changes to that file, for instance. Or use fail() inside the file somewhere, or something, if another thing isn't true e.g. if read_config("ci_system","in_ci", false) and dev_options != {}: fail("devopts.bzl can't be used in CI") or whatever.

Ultimately an exact 1-to-1 replica of the make-based interface created by CMake is just a pre-existing choice to adhere too, but it does have the huge advantage of being familiar. The lesson here though is your Starlark program is a program, so you have a bit of flexibility with how you arrange files and integrate them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants