Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can configs and constraints be restricted to targets selectively? #385

Open
zjturner opened this issue Aug 14, 2023 · 12 comments
Open

Can configs and constraints be restricted to targets selectively? #385

zjturner opened this issue Aug 14, 2023 · 12 comments

Comments

@zjturner
Copy link
Contributor

We have an audio processing library internally, and this audio library only supports windows and mac. If building on Linux, the audio library should not be present. We'd like to represent this with a target_compatible_with, but the semantics of that are an AND, not an OR (in other words, we can't say target_compatible_with = linux | windows.

At the same time, I don't know how to selectively attach a set of constraint_values, or config_settings, to a target. I can stick them on the platform object when creating the top-level ExecutionPlatformInfo, but this is global to all targets.

A transition rule seems like it will work, however the documentation states that when dependencies are resolved, they are resolved with the transformed configuration. This would suggest that if I have A and B where B depends on A, and I apply the transition rule to B, then A will be built with the transformed configuration. This isn't what I want though, I want only B to have the transformed configuration because otherwise A will have configuration info that isn't relevant and is polluting the configuration hash.

Is this possible?

@cjhopman
Copy link
Contributor

you can express OR in target_compatible_with. It's something like this:

target_compatible_with = select({
  "linux": [],
  "windows": [],
  "DEFAULT": ["//some/never/set:constraint"],
})

@zjturner
Copy link
Contributor Author

Ok, that solves the immediate problem, but for educational purposes is there any way to say that I want a target B which depends on target A, and B gets some target-specific constraints added to its configuration that A doesn't know about?

Intuitively, it seems like distinct targets have different needs, so you would need a way to have domain specific constraints attached to targets but not their dependents or ancestors.

@zjturner
Copy link
Contributor Author

I wonder if this could be solved by having the transition impl function be allowed to return either a single PlatformInfo, or a 2-tuple of PlatformInfos. If it returns a single PlatformInfo, it behaves as it does today. If it returns a 2-tuple, then first item in the tuple is the configuration to be used for instantiating the specified target, and the second item in the tuple is the configuration to be used for instantiating dependent targets.

Unless there's already a way and I'm missing something.

@ndmitchell
Copy link
Contributor

To expand a bit on what @zjturner is saying, I think this quickly ends up as the configuration trimming problem that we discussed a fair bit at the very start of the Buck2 project. Imagine you have a messaging library that depends on a core library. Imagine the messaging library has 3 different settings (m1, m2, m3), and the core library has 3 different settings (c1, c2, c3). You could imagine:

  • Specify all the settings as constraints. But then they go into the constraints and you end up with 9 different flavours of everything, even things which have no dependencies on the messaging or core library. Nothing will share the cache between the 9 flavours.
  • Specify them as .buckconfig and transform them to constraints with a transition as you depend on the messaging or core libraries. This is more complex but you end up only changing the targets below a transition. But it does mean you get 3 copies of the messaging library and 9 of the core library.

How did we end up doing the Java/native split without duplicating all Java libraries on two platforms? We had the idea that you might observe what configuration was actually used, and then somehow trim the configuration to just the used subset, and then add sharing. The ideas were fairly complex, and even though I think I believed them at the time, within a month or two of our initial discussions I couldn't understand them.

@zjturner
Copy link
Contributor Author

zjturner commented Aug 28, 2023

Maybe it's impossible due to the design, but why can't there just be constraints passed directly into the instantiation of the rule?

cxx_library(
    name = "foo",
    ...
    # Put these constraints only on this target
    constraints = [],

    # Put these constraints on this target and all targets which depend on this target.
    exported_constraints = [
    ]
)

Transition rules are a bit of a pain to use, but if the above doesn't work, then you could imagine the same thing with the transition rule solution. All I really need is a way to make the transitioned constraints not be transitively inherited by the downstream targets.

The documentation describes a transition rule as "applying a function to the configuration and using the output for the target and all dependent targets". All that has to happen is for there to be a way to opt into semantics of "apply a function to the configuration and using the output for the target, but not any dependent targets".

@cjhopman
Copy link
Contributor

cjhopman commented Aug 28, 2023

For exported constraints, that certainly should be handled by transitions, and if it's difficult to use transitions in that way it just points to improvements needed in the transitions api rather than an entirely new category of behavior on targets. But using it for that is particularly risky. If "foo" there depends on "bar" and "baz" also depends on "bar" but not via a dep on "foo", they'd actually get different instances of "bar" because "baz" wouldn't have the constraints that foo added. In the way you specified it here, it sounds like it'd apply to all deps including random tools used for generated code (say, appearing in srcs) and for toolchain deps.

I would pretty much discourage trying to go that route at all though, I think I'd need to see a pretty fleshed out use case to be convinced that it doesn't actually have a bunch of bad behavior that would make it not work at all in practice. Adding exported_constraints to all deps in a particular attribute (or set of attributes) has less risk, but will probably still be a challenge in practice.

The self constraints thing is more interesting. I could certainly see it being useful to have sort of target-local or package-local constraints. I think there should be some consideration if the usecase for that would be better served by extending the package values api (i.e. should we add select over package values? add target-level equivalent of package values?) or by extending the (already quite complex) configurations apis.

@zjturner
Copy link
Contributor Author

I actually don't understand the package system at all, so I guess that's something for me to investigate. I also realized shortly after posting that putting the constraints directly on the target instantiation probably wouldn't work, because the whole point of a constraint is to be able to select() the value of other attributes based on the value of the constraint. But if the constraint is itself determined by a select(), then you have an ordering / multi-pass issue, and if the constraint value is not determined by a a select() then it's a constant, and you didn't need it anyway.

Ultimately self-target-only constraints is the use case though, so I'd be happy with anything that moves closer to that.

@ndmitchell
Copy link
Contributor

The PACKAGE files are basically BUCK files which apply recursively - so if you have foo/BUCK, foo/bar/BUCK and foo/bar/baz/BUCK, and also foo/PACKAGE and foo/bar/baz/PACKAGE then the first PACKAGE file would apply to all three BUCK files, and the last PACKAGE file to only the last BUCK file. In a PACKAGE file you can't define targets, but you can set bits of configuration.

I'm starting to wonder if just using .buckconfig values directly and skipping select might solve this problem. If you have [messaging-library] foo = bar in your .buckconfig, and your BUCK file for the messaging library does a read_config("messaging-library", "foo") then it can use that information directly. The thing you lose compared to select configuration is you can't have part of the messaging library built at foo=bar for one user and part built at foo=baz for another user. But if you don't want that, just using .buckconfig might be feasible? I wonder if PACKAGE specifying local constraints isn't just equivalent to .buckconfig?

@zjturner
Copy link
Contributor Author

How does buckconfig interact with a remote cache?

my understanding is that the target configuration hash doesn’t change when you use buckconfig, so your remote cache is going to end up with whatever version was specified in the user’s buckconfig who built it last.

Maybe it’s not a problem in practice, but at that point it seems like it’s effectively a constant, so why even have a buckconfig setting for it?

@ndmitchell
Copy link
Contributor

The remote cache isn't poisoned by buckconfig. The buckconfig doesn't change the output path, but if you change the -D C++ defines or similar those will be reflected in the command line, and then you'll get a different action digest. There is deliberately no way to poison the remote cache.

The remote cache is indexed by digest, which is the full command line and the hash of all inputs. The target configuration hash changes the output path, and the output path always ends up in the command line somewhere (otherwise you wouldn't know where to put the output), so target configuration changing does mean the digest changes. But that's more a side effect than how we track it.

@zjturner
Copy link
Contributor Author

Ok, so IIUC if Alice builds messaging-library with [messaging-library] foo = bar and Bob builds messaging-library with [messaging-library] foo = baz, the output paths for Alice and Bob will be the same on disk, but Alice will correctly get the cache entry for foo = bar and Bob will correctly get the cache entry for foo = baz? Furthermore, you're saying that the target configuration hash is part of the cache key, but the actual command line is another part, and so if a buckconfig value changes the command line, it will still be keyed differently.

If I understood correctly, then what do you mean exactly by having "part" of the messaging library built differently? Are you referring to a situation like this?

cxx_library(
    name = "foo",
    sources = ["a.cpp"] + select({
        "config//foo:bar": ("b.cpp", "-O0"),
        "DEFAULT": ("b.cpp", "-O1")
    })
)

So in this case, two different users can share the artifact for a.cpp regardless of the value of config//foo:bar? But if this were done with a .buckconfig, they would not? Or did you mean something else?

@ndmitchell
Copy link
Contributor

Furthermore, you're saying that the target configuration hash is part of the cache key, but the actual command line is another part, and so if a buckconfig value changes the command line, it will still be keyed differently.

The target configuration hash isn't part of the cache key, only the command line and input files are. But, the target configuration hash always shows up in the command line as -o output-path or whatever, so in practice the target configuration hash ends up being part of the command line which is part of the cache key. But conclusion is correct - if a buckconfig value changes the command line or inputs, it will still be keyed differently.

As to your example, if you have different configuration that the select works on, the the output path of a.cpp will be different (as it contains the target configuration hash in the output path), so you won't share a.o or b.o. The values of b.o will be meaningfully different, the values of a.o won't be meaningfully different.

If you change to a .buckconfig you get:

cxx_library(
    name = "foo",
    sources = ["a.cpp"] + [("b.cpp", "-O1" if read_config("options", "should_i_optimise_b") else "-O0")]
)

Now you will share a.cpp because it has the same command line, output path etc. You won't share b.cpp because they have different command lines but the same location. There are some downsides to this change:

  1. If you flip the value of should_i_optimise_b regularly you end up invalidating some of the build graph.
  2. You can't have a build where say the messaging library depends on optimised b and the video library depends on unoptimised b. There is a single value for should_i_optimise_b across the whole build.

But those downsides don't seem a massive problem for you. Internally we use .buckconfig for release vs debug, and then configurations for things like arm vs x86 since a Java binary might want to depend on both versions of native code. We think we should move to configurations for release/debug, but Buck1 compatibility means that will be a long time coming.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants