Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

buildGoModule: find build/test targets with go list #284568

Draft
wants to merge 9 commits into
base: staging
Choose a base branch
from

Conversation

katexochen
Copy link
Contributor

@katexochen katexochen commented Jan 28, 2024

Description of problem

buildGoModule uses the internal getGoDirs function to find targets to build/test:

getGoDirs() {
local type;
type="$1"
if [ -n "$subPackages" ]; then
echo "$subPackages" | sed "s,\(^\| \),\1./,g"
else
find . -type f -name \*$type.go -exec dirname {} \; | grep -v "/vendor/" | sort --unique | grep -v "$exclude"
fi
}

No/not all relevant tests executed when subPackage is set

If the subPackages attribute is set, we will call go test on the specified subpackages.

This problematic as a common pattern in Go is to place the main packages of binaries in a cmd directory, often without test files. The most common pattern to test Go could is likely executing go test ./... in the module root directory to recursively tests all packages of the module. When we only test the main packages, we are likely missing many relevant tests for that binary.

Using find to find targets for Go does not respect Go conventions

When using find to discover packages to build, Go conventions will be ignored. Most Go commands will stop to recurse into submodules, and the dependency derivation will only contain the dependencies of the main module.

As a result, such packages must currently be explicitly excluded. In the following example, vendored/toml is a submodule:

# Fix this build error:
# main module (github.com/glauth/glauth/v2) does not contain package github.com/glauth/glauth/v2/vendored/toml
excludedPackages = [ "vendored/toml" ];

Test performance

Right now we test each package in separate as we invoke the go test command for each package. This is slow and really bad for build time in cases where there are many packages but each package has only a few or no fast tests. While packages from different tests can run in parallel, tests within a package are executed sequentially by default. When we pass all the packages to the go command and just invoke it once, Go has the chance to parallelize test execution.

Description of changes

  • Use go list to discover packages, both for builds and tests.
  • For tests, we use a more sophisticated go list expression to identify the packages that our built targets actually depend on. This is an optimization over running tests for the full module which will save us some time when building a binary with only some local dependencies in a bigger Go module.

Open TODOs

  • Fix failing tests (~300-400). Mostly need to exclude tests with network etc.
  • Re-enable excludedPackages
  • Pass flags to go list
  • Add documentation
  • Migrate most packages from excludedPackage to subPackages (followup PR) and document that subPackages is prefered

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 24.05 Release Notes (or backporting 23.05 and 23.11 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Add a 👍 reaction to pull requests you find important.

@kalbasit
Copy link
Member

kalbasit commented Feb 1, 2024

I like the idea, it should speed up Go builds. Thank you!

@katexochen
Copy link
Contributor Author

I've implemented excludedPackages and flags for go list commands and started fixing the many test failures. An early feedback on the general design of the changes to the builder would be much appreciated. Putting this PR from draft to ready-for-review so people know they can start reviewing, even there are still fixes for many packages missing (currently at 'd' in alphabetical order).

One point I thought about is whether we should move most of the functions we define in the builder into the configure flag. Currently, the checkPhase depends on buildPhase, so checkPhase will always fail when buildPhase if overridden. Maybe even into an external file, coding this amount of bash inline isn't that fun.

@katexochen katexochen marked this pull request as ready for review February 11, 2024 22:31
@kirillrdy kirillrdy self-requested a review February 28, 2024 10:33
@Ma27
Copy link
Member

Ma27 commented Mar 2, 2024

One point I thought about is whether we should move most of the functions we define in the builder into the configure flag

You mean configurePhase?

I think it might be preferable to put this into a setup hook and add this via nativeBuildINputs automatically. The hook can then define its own phase. Rust and Python subsystems are essentially doing that these days. This also has the benefit that Go builds become a little more composable: when having a derivation with multiple build steps, simply add the Go hook for the relevant parts[1].

Also, @Mic92 has brought up the valid concern about too long argument lists which I agree with. I think testing whether this would happen with e.g. kubernetes is a good indication on whether we need to take care of that or not. See also my suggestion about how to do that in #285362 (comment).

When these two things are done - I'm happy to assist if needed - I'm happy to help finding and fixing regressions, this is a very nice change that I'd really love to see in 24.05.

[1] Sure, there's still the topic on how to get dependencies then, but it's still a step forward.

Ma27 added a commit to Ma27/nixpkgs that referenced this pull request Mar 3, 2024
The build itself is pretty quick now:

    buildPhase completed in 2 minutes 46 seconds

in contrast to

    buildPhase completed in 5 minutes 22 seconds

before on the same machine (Hetzner AX41 NVMe with
AMD Ryzen 5 3600 6-Core Processor and 64GiB RAM).

Downside of that is that no tests are now executed, but I'm inclined to
make that sacrifice for now. Especially considering that a fix for that
is on the horizon with NixOS#284568.
Ma27 added a commit to Ma27/nixpkgs that referenced this pull request Mar 3, 2024
The build itself is pretty quick now:

    buildPhase completed in 2 minutes 46 seconds

in contrast to

    buildPhase completed in 5 minutes 22 seconds

before on the same machine (Hetzner AX41 NVMe with
AMD Ryzen 5 3600 6-Core Processor and 64GiB RAM).

Downside of that is that no tests are now executed, but I'm inclined to
make that sacrifice for now. Especially considering that a fix for that
is on the horizon with NixOS#284568.

(cherry picked from commit f220ea4)
@Ma27
Copy link
Member

Ma27 commented Mar 3, 2024

FWIW given what I've learned in #292997 I'm wondering if there's any case where you actually want subPackages to be set automatically.

Ma27 added a commit to Ma27/nixpkgs that referenced this pull request Mar 4, 2024
The build itself is pretty quick now:

    buildPhase completed in 2 minutes 46 seconds

in contrast to

    buildPhase completed in 5 minutes 22 seconds

before on the same machine (Hetzner AX41 NVMe with
AMD Ryzen 5 3600 6-Core Processor and 64GiB RAM).

Downside of that is that no tests are now executed, but I'm inclined to
make that sacrifice for now. Especially considering that a fix for that
is on the horizon with NixOS#284568.

(cherry picked from commit f220ea4)
Ma27 added a commit to Ma27/nixpkgs that referenced this pull request Mar 15, 2024
The build itself is pretty quick now:

    buildPhase completed in 2 minutes 46 seconds

in contrast to

    buildPhase completed in 5 minutes 22 seconds

before on the same machine (Hetzner AX41 NVMe with
AMD Ryzen 5 3600 6-Core Processor and 64GiB RAM).

Downside of that is that no tests are now executed, but I'm inclined to
make that sacrifice for now. Especially considering that a fix for that
is on the horizon with NixOS#284568.
Ma27 added a commit to Ma27/nixpkgs that referenced this pull request Mar 17, 2024
The build itself is pretty quick now:

    buildPhase completed in 2 minutes 46 seconds

in contrast to

    buildPhase completed in 5 minutes 22 seconds

before on the same machine (Hetzner AX41 NVMe with
AMD Ryzen 5 3600 6-Core Processor and 64GiB RAM).

Downside of that is that no tests are now executed, but I'm inclined to
make that sacrifice for now. Especially considering that a fix for that
is on the horizon with NixOS#284568.

(cherry picked from commit f220ea4)
@katexochen
Copy link
Contributor Author

@Ma27 Thanks for your feedback, and sorry for reaching back late.

When these two things are done - I'm happy to assist if needed - I'm happy to help finding and fixing regressions, this is a very nice change that I'd really love to see in 24.05.

I won't/didn't find much time for working on nixpkgs this month as I'm moving. Also, the breakage that his PR causes is quite huge. I thought I could handle this with some support, but I didn't even to get my previous PRs on the builder merged that had far less impact, so I'm not that optimistic anymore. I don't think there is much chance to get it into 24.05, and I'm not even sure I'll continue work on this at all.

I think it might be preferable to put this into a setup hook and add this via nativeBuildINputs automatically. The hook can then define its own phase. Rust and Python subsystems are essentially doing that these days. This also has the benefit that Go builds become a little more composable: when having a derivation with multiple build steps, simply add the Go hook for the relevant parts[1].

I had similar thoughts while working on this PR. I'd love to work on a design for a build hook or buildGoModule2, but I'd likely draft that out of tree. If you'd like to collaborate on that let me know.

@Ma27
Copy link
Member

Ma27 commented Mar 19, 2024

No worries! I don't really mind if this will be delayed to 24.11 (or even 25.05 though I don't hope it'll take that long) as long as the situation improves eventually.

but I didn't even to get my previous PRs on the builder merged that had far less impact, so I'm not that optimistic anymore

Is that a prerequisite for this? Would you mind linking to it?

I've not been too involved with Go stuff, but after being diving into how our current builder works (after being too often too annoyed by how long Grafana used to take for a build), I'm rather interested in getting these improvements out.

My next todo for this will be to take a closer look what we're doing here. Is there anything important (short of fixing the fallout) do be done? Otherwise we might want to request a Hydra jobset so we can compare against staging-next which packages regressed

I had similar thoughts while working on this PR. I'd love to work on a design for a build hook or buildGoModule2, but I'd likely draft that out of tree. If you'd like to collaborate on that let me know.

I'd certainly be interested.
But considering what you wrote here, I think we may want to finish this first.

Ma27 added a commit to Ma27/nixpkgs that referenced this pull request Mar 23, 2024
The build itself is pretty quick now:

    buildPhase completed in 2 minutes 46 seconds

in contrast to

    buildPhase completed in 5 minutes 22 seconds

before on the same machine (Hetzner AX41 NVMe with
AMD Ryzen 5 3600 6-Core Processor and 64GiB RAM).

Downside of that is that no tests are now executed, but I'm inclined to
make that sacrifice for now. Especially considering that a fix for that
is on the horizon with NixOS#284568.

(cherry picked from commit f220ea4)
Ma27 added a commit to Ma27/nixpkgs that referenced this pull request Apr 6, 2024
The build itself is pretty quick now:

    buildPhase completed in 2 minutes 46 seconds

in contrast to

    buildPhase completed in 5 minutes 22 seconds

before on the same machine (Hetzner AX41 NVMe with
AMD Ryzen 5 3600 6-Core Processor and 64GiB RAM).

Downside of that is that no tests are now executed, but I'm inclined to
make that sacrifice for now. Especially considering that a fix for that
is on the horizon with NixOS#284568.

(cherry picked from commit f220ea4)
Ma27 added a commit to Ma27/nixpkgs that referenced this pull request Apr 14, 2024
The build itself is pretty quick now:

    buildPhase completed in 2 minutes 46 seconds

in contrast to

    buildPhase completed in 5 minutes 22 seconds

before on the same machine (Hetzner AX41 NVMe with
AMD Ryzen 5 3600 6-Core Processor and 64GiB RAM).

Downside of that is that no tests are now executed, but I'm inclined to
make that sacrifice for now. Especially considering that a fix for that
is on the horizon with NixOS#284568.

(cherry picked from commit f220ea4)
Ma27 added a commit to Ma27/nixpkgs that referenced this pull request Apr 20, 2024
The build itself is pretty quick now:

    buildPhase completed in 2 minutes 46 seconds

in contrast to

    buildPhase completed in 5 minutes 22 seconds

before on the same machine (Hetzner AX41 NVMe with
AMD Ryzen 5 3600 6-Core Processor and 64GiB RAM).

Downside of that is that no tests are now executed, but I'm inclined to
make that sacrifice for now. Especially considering that a fix for that
is on the horizon with NixOS#284568.

(cherry picked from commit f220ea4)
Comment on lines +30 to +45
checkFlags =
let
skippedTests = [
# Tests panic for unknown reason.
"TestPostWorkflowHookRunner_Run"
"TestPreWorkflowHookRunner_Run"
"TestNewServer"
"TestCommandRunnerVCSClientInitialized"
# Tests require Terraform, which is unfree.
"TestGitHubWorkflow"
"TestSimpleWorkflow_terraformLockFile"
"TestGitHubWorkflowWithPolicyCheck"
"TestDefaultProjectCommandBuilder_TerraformVersion"
];
in
[ "-skip=^${lib.concatStringsSep "$|^" skippedTests}$" ];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably add something like skipTestNames or so to automatically populate this like pytestCheckHook is doing

Comment on lines -307 to 364
lib.warnIf (buildFlags != "" || buildFlagsArray != "")
lib.warnIf
(buildFlags != "" || buildFlagsArray != "")
"Use the `ldflags` and/or `tags` attributes instead of `buildFlags`/`buildFlagsArray`"
lib.warnIf (builtins.elem "-buildid=" ldflags) "`-buildid=` is set by default as ldflag by buildGoModule"
lib.warnIf
(builtins.elem "-buildid=" ldflags) "`-buildid=` is set by default as ldflag by buildGoModule"
package
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please undo the formatting, to have less noise

go list \
-f '{{ .Dir }}' \
"''${flags[@]}" \
"''${subPkgs[@]}" | \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we fallback this to ./... to avoid the duplication?

Ma27 added a commit to Ma27/nixpkgs that referenced this pull request May 4, 2024
The build itself is pretty quick now:

    buildPhase completed in 2 minutes 46 seconds

in contrast to

    buildPhase completed in 5 minutes 22 seconds

before on the same machine (Hetzner AX41 NVMe with
AMD Ryzen 5 3600 6-Core Processor and 64GiB RAM).

Downside of that is that no tests are now executed, but I'm inclined to
make that sacrifice for now. Especially considering that a fix for that
is on the horizon with NixOS#284568.

(cherry picked from commit f220ea4)
Ma27 added a commit to Ma27/nixpkgs that referenced this pull request May 10, 2024
The build itself is pretty quick now:

    buildPhase completed in 2 minutes 46 seconds

in contrast to

    buildPhase completed in 5 minutes 22 seconds

before on the same machine (Hetzner AX41 NVMe with
AMD Ryzen 5 3600 6-Core Processor and 64GiB RAM).

Downside of that is that no tests are now executed, but I'm inclined to
make that sacrifice for now. Especially considering that a fix for that
is on the horizon with NixOS#284568.

(cherry picked from commit f220ea4)
Ma27 added a commit to Ma27/nixpkgs that referenced this pull request May 18, 2024
The build itself is pretty quick now:

    buildPhase completed in 2 minutes 46 seconds

in contrast to

    buildPhase completed in 5 minutes 22 seconds

before on the same machine (Hetzner AX41 NVMe with
AMD Ryzen 5 3600 6-Core Processor and 64GiB RAM).

Downside of that is that no tests are now executed, but I'm inclined to
make that sacrifice for now. Especially considering that a fix for that
is on the horizon with NixOS#284568.

(cherry picked from commit f220ea4)
@RaghavSood RaghavSood mentioned this pull request May 20, 2024
13 tasks
Copy link
Contributor

@malt3 malt3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested those changes in some Go packages (downstream in flakes and the ones I maintain in nixpkgs). It should speed up builds / tests and I think this should be implemented (after being rebased).

Ma27 added a commit to Ma27/nixpkgs that referenced this pull request May 26, 2024
The build itself is pretty quick now:

    buildPhase completed in 2 minutes 46 seconds

in contrast to

    buildPhase completed in 5 minutes 22 seconds

before on the same machine (Hetzner AX41 NVMe with
AMD Ryzen 5 3600 6-Core Processor and 64GiB RAM).

Downside of that is that no tests are now executed, but I'm inclined to
make that sacrifice for now. Especially considering that a fix for that
is on the horizon with NixOS#284568.

(cherry picked from commit f220ea4)
@katexochen katexochen marked this pull request as draft June 14, 2024 18:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants