Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Earthly build issues and feedback #5750

Open
10 tasks done
negz opened this issue May 29, 2024 · 24 comments
Open
10 tasks done

Earthly build issues and feedback #5750

negz opened this issue May 29, 2024 · 24 comments
Assignees
Labels
Milestone

Comments

@negz
Copy link
Member

negz commented May 29, 2024

What happened?

In #5711 we decided to try using https://earthly.dev for a release (i.e. until we ship v1.17).

I expect we'll probably run into issues immediately after cutting over the build system to a new tool, so I'm opening this to track any bugs we find in the new build setup that need to be fixed. General non-bug feedback is also welcome. 😄

@negz negz added bug Something isn't working build labels May 29, 2024
@jbw976 jbw976 added this to the v1.17 milestone May 29, 2024
@negz
Copy link
Member Author

negz commented May 29, 2024

Something that didn't occur to me until today - do we want to back port this change to our currently maintained release branches? If not, we'll be working with two build systems.

Perhaps that's okay while we run the experiment? We could update the release branches if and when we commit to Earthly.

@negz negz removed the bug Something isn't working label May 29, 2024
@negz negz changed the title Earthly build issues Earthly build issues and feedback May 29, 2024
@negz negz self-assigned this May 29, 2024
@negz
Copy link
Member Author

negz commented May 29, 2024

I've seen earthly/earthly#3736 once or twice today (but not before).

It seems like we can run with --disable-remote-registry-proxy to workaround this if it happens a lot. It's unclear to me what the downside of running without the remote registry proxy is.

@negz
Copy link
Member Author

negz commented May 29, 2024

Here's an example of a PR with a full remote registry cache hit.

Screenshot 2024-05-28 at 8 20 00 PM

You can see that the lint, check-diff and unit-test jobs all complete in less than a minute. This is because none of the build inputs changed between the latest master build and this PR. Earthly knows there's no point re-running them, so it skips them just like a docker build would with unchanged inputs.

On the other hand, the codeql, e2e, and publish-artifacts jobs don't benefit much in this run.

I'm pretty confident the publish-artifacts job doesn't benefit from cache because one of its inputs is CROSSPLANE_VERSION, which includes the current git commit. So any new git commit means the build inputs change.

ARG GOFLAGS="-ldflags=-X=github.com/crossplane/crossplane/internal/version.version=${CROSSPLANE_VERSION}"

It's not clear to me why the codeql and e2e test jobs run, though. It looks like the build step of the e2e tests is cached, but not the step that actually runs the tests. I would have expected them both to be fully cached.

Edit: I figured out why CodeQL and E2E tests weren't being cached. It was also due to CROSSPLANE_VERSION. See #5755.

@negz
Copy link
Member Author

negz commented May 29, 2024

Just realized we need to teach Renovate about earthly - #5753 (comment)

@negz
Copy link
Member Author

negz commented May 29, 2024

The E2E tests should be writing e2e-tests.xml for Buildpulse when they fail, but they don't.

SAVE ARTIFACT --if-exists e2e-tests.xml AS LOCAL _output/tests/e2e-tests.xml

I think this error has something to do with it:

unlazy force execution: process "/bin/sh -c EARTHLY_DOCKERD_DATA_ROOT=\"/var/earthly/dind/944cc3b666f2b62bc87a5159517f6f8d1619489218402a5931707a886c2d7dea\" EARTHLY_DOCKERD_CACHE_DATA=\"false\" EARTHLY_DOCKER_LOAD_FILES=\"\" EARTHLY_IMAGES_WITH_DIGESTS=\"crossplane-e2e/crossplane:latest@sha256:2427e9fcd06f0f5efde8c7157db825d08402e2239ce69d457272f9c96f3bd182\" EARTHLY_START_COMPOSE=\"false\" EARTHLY_COMPOSE_FILES=\"\" EARTHLY_COMPOSE_SERVICES=\"\" FLAGS='-test.failfast -fail-fast --test-suite ssa-claims' GOLANG_VERSION=1.22.3 GOPATH=/go GOTOOLCHAIN=local GO_VERSION=1.22.3 PATH=/go/bin:/usr/local/go/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin /var/earthly/dockerd-wrapper.sh execute /usr/bin/earth_debugger /bin/sh -c 'gotestsum --no-color=false --format testname --junitfile e2e-tests.xml --raw-command go tool test2json -t -p E2E ./e2e -test.v ${FLAGS}'" did not complete successfully: exit code: 1

I notice this happens every time an E2E test fails, but not when other things (e.g. earthly +lint) fail. I imagine it's something to do with WITH DOCKER.

Edit: Opened earthly/earthly#4173 for clarification.

@negz
Copy link
Member Author

negz commented May 29, 2024

ARG GOFLAGS="-ldflags=-X=github.com/crossplane/crossplane/internal/version.version=${CROSSPLANE_VERSION}"

If we could somehow inject this version after the binary was built it would help a lot with build caching.

No idea how to do that though. Maybe patch the binary? For the OCI image we could just set an env var or inject a file, but that doesn't help with the binaries, especially crank which is run outside a container.

@adamgordonbell
Copy link

If we could somehow inject this version after the binary was built it would help a lot with build caching.

Not quite what you are asking for, but you can move the arg down below the copy and it should save you redoing the COPY when the version changes but not the files. That may never happen, but the general principle of pushing the args down to right before they are used might be helpful in general.

@negz
Copy link
Member Author

negz commented May 29, 2024

@adamgordonbell Thanks! Unfortunately the bulk of the time is definitely spent in the RUN go build step, since we compile every PR for all supported platforms. Since we want to compile the version into the binaries, and we want to derive the version from the git commit it seems like a rebuild when the git commit changes is probably unavoidable.

@jbw976 I do wonder whether we really need to compile for every architecture on every PR, though. I presume it's only going to catch fairly rare cases where some change affects only certain platforms.

That said, the publish artifacts job is only really going to hold up the build in cases where nothing has changed and every other job (except fuzz testing 😒) is a no-op. For any PR that actually touches Go code the E2E tests are going to take longer. So maybe not worth optimizing for.

@negz
Copy link
Member Author

negz commented May 29, 2024

I've seen actions/runner-images#7897 maybe 5-6 times since we made the switch yesterday. Anecdotally mostly on E2E tests, but also once on codeql and once or twice on publish-artifacts. It seems like it might be a symptom of using too many compute resources for the runner. Unclear whether we're computing harder (more parallelism?) with the switch to Earthly, or if it's just coincidental timing and something's going on with GitHub Actions.

@negz
Copy link
Member Author

negz commented May 30, 2024

Renovate can't run earthly

I setup our Renovate GitHub Action to use https://github.com/earthly/actions-setup, but noticed Renovate was complaining it couldn't run earthly. Turns out our Renovate action is running renovate inside a Docker container.

It's also not going to find an Earthfile on the release branches. 🙃

@mergenci
Copy link
Member

As a feedback, I wanted to note that Earthly had once switched from the original MPL-2.0 to BSL, before reversing course (see license history). The design doc mentions the latter, but not the former. I wanted to let everyone know, in case it might affect our decision.

@negz
Copy link
Member Author

negz commented May 30, 2024

@chlunde
Copy link
Contributor

chlunde commented May 30, 2024

Not an issue, just surprising: https://docs.earthly.dev/docs/earthfile#options-4 - the _output binary crank and crossplane gets timestamp Apr 16 2020

@negz
Copy link
Member Author

negz commented May 30, 2024

@chlunde Thanks! I'm not opposed to setting that flag. I wonder if there are any downsides?

@jbw976
Copy link
Member

jbw976 commented May 30, 2024

Was just looking into installing latest from master for #5151 (comment), and made a couple observations @negz - let me know if you'd like me to dig into any of them!

❯ helm install crossplane \
--namespace crossplane-system \
--create-namespace crossplane-master/crossplane \
--devel --debug

install.go:178: [debug] Original chart version: ""
install.go:180: [debug] setting version to >0.0.0-0
Error: INSTALLATION FAILED: failed to fetch https://charts.crossplane.io/crossplane-1.17.0-rc.0.175.g93610dc7.tgz : 404 Not Found
helm.go:84: [debug] failed to fetch https://charts.crossplane.io/crossplane-1.17.0-rc.0.175.g93610dc7.tgz : 404 Not Found
  • https://marketplace.upbound.io/account/crossplane/crossplane has what looks to be PR builds being published to that repo. Not sure if that is new behavior, but in my mind at least I thought we were only publishing builds from master to there.
    • e.g. v1.17.0-rc.0.177.g8122f120 corresponds to 8122f120, which says "This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository."

@negz
Copy link
Member Author

negz commented May 30, 2024

Thanks @jbw976!

instructions to install from master channel with helm in https://docs.crossplane.io/latest/software/install/#install-pre-release-crossplane-versions are failing

https://charts.crossplane.io/crossplane-1.17.0-rc.0.175.g93610dc7.tgz looks like an incorrect URL, right? I think it should be https://charts.crossplane.io/master/crossplane-1.17.0-rc.0.175.g93610dc7.tgz.

I'm guessing something is wrong with this target:

crossplane/Earthfile

Lines 410 to 432 in 93610dc

# ci-promote-build-artifacts is used by CI to promote binary artifacts and Helm
# charts to a channel. In practice, this means copying them from one S3
# directory to another.
ci-promote-build-artifacts:
ARG --required CROSSPLANE_VERSION
ARG --required CHANNEL
ARG HELM_REPO_URL=https://charts.crossplane.io
ARG EARTHLY_GIT_BRANCH
ARG BUCKET_RELEASES=crossplane.releases
ARG BUCKET_CHARTS=crossplane.charts
ARG PRERELEASE=false
ARG AWS_DEFAULT_REGION
FROM amazon/aws-cli:2.15.57
COPY +helm-setup/helm /usr/local/bin/helm
RUN --secret=AWS_ACCESS_KEY_ID --secret=AWS_SECRET_ACCESS_KEY aws s3 sync --only-show-errors s3:https://${BUCKET_CHARTS}/${CHANNEL} repo
RUN --secret=AWS_ACCESS_KEY_ID --secret=AWS_SECRET_ACCESS_KEY aws s3 sync --only-show-errors s3:https://${BUCKET_RELEASES}/build/${EARTHLY_GIT_BRANCH}/${CROSSPLANE_VERSION}/charts repo
RUN helm repo index --url ${HELM_REPO_URL} repo
RUN --push --secret=AWS_ACCESS_KEY_ID --secret=AWS_SECRET_ACCESS_KEY aws s3 sync --delete --only-show-errors repo s3:https://${BUCKET_CHARTS}/${CHANNEL}
RUN --push --secret=AWS_ACCESS_KEY_ID --secret=AWS_SECRET_ACCESS_KEY aws s3 cp --only-show-errors --cache-control "private, max-age=0, no-transform" repo/index.yaml s3:https://${BUCKET_CHARTS}/${CHANNEL}/index.yaml
RUN --push --secret=AWS_ACCESS_KEY_ID --secret=AWS_SECRET_ACCESS_KEY aws s3 sync --delete --only-show-errors s3:https://${BUCKET_RELEASES}/build/${EARTHLY_GIT_BRANCH}/${CROSSPLANE_VERSION} s3:https://${BUCKET_RELEASES}/${CHANNEL}/${CROSSPLANE_VERSION}
IF [ "${PRERELEASE}" = "false" ]
RUN --push --secret=AWS_ACCESS_KEY_ID --secret=AWS_SECRET_ACCESS_KEY aws s3 sync --delete --only-show-errors s3:https://${BUCKET_RELEASES}/build/${EARTHLY_GIT_BRANCH}/${CROSSPLANE_VERSION} s3:https://${BUCKET_RELEASES}/${CHANNEL}/current
END

Maybe the --url flag for helm repo index should be ${HELM_REPO_URL}/${CHANNEL}?

https://marketplace.upbound.io/account/crossplane/crossplane has what looks to be PR builds being published to that repo.

Yeah it's definitely not supposed to be publishing PR builds there, but it seems like it is - e.g. https://github.com/crossplane/crossplane/actions/runs/9299206909/job/25592661840?pr=5764. The "Configure Earthly to Push Artifacts" step should be skipped for PRs. Technically I gated it on the required credentials existing, since I thought no PRs had access to those. I guess some do, though. 🤔

- name: Configure Earthly to Push Artifacts
if: env.DOCKER_USR != '' && env.UPBOUND_MARKETPLACE_PUSH_ROBOT_USR != '' && env.AWS_USR != ''
run: echo "EARTHLY_PUSH=true" >> $GITHUB_ENV

@negz
Copy link
Member Author

negz commented May 30, 2024

#5765 should fix the Helm issue.

$ helm repo update

Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "crossplane-stable" chart repository
...Successfully got an update from the "crossplane-master" chart repository
...Successfully got an update from the "bitnami" chart repository
Update Complete. ⎈Happy Helming!⎈
$ helm install crossplane \
--namespace crossplane-system \
--create-namespace crossplane-master/crossplane \
--devel

NAME: crossplane
LAST DEPLOYED: Thu May 30 18:29:20 2024
NAMESPACE: crossplane-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Release: crossplane

Chart Name: crossplane
Chart Description: Crossplane is an open source Kubernetes add-on that enables platform teams to assemble infrastructure from multiple vendors, and expose higher level self-service APIs for application teams to consume.
Chart Version: 1.17.0-rc.0.195.g11333cf6
Chart Application Version: 1.17.0-rc.0.195.g11333cf6

Kube Version: v1.27.3

@negz
Copy link
Member Author

negz commented Jun 4, 2024

One small downside. Because we're building in a container, and because Earthly (and Docker) won't let you COPY from a parent directory, you can't use local replace statements very easily in go.mod.

For example when developing outside of a container you might use:

replace sig.k8s.io/e2e-framework => ../../kubernetes-sigs/e2e-framework

In theory you could COPY that directory into e.g. vendor/sig.k8s.io/e2e-framework and replace to that, but you can't because you can't copy from above the Earthfile directory.

@jbw976
Copy link
Member

jbw976 commented Jun 6, 2024

As mentioned in #5779 (comment), I was surprised that a plain earthly +build did not output the crossplane/crank binaries to the _output directory.

The _output dir is populated with these binaries if earthly +multiplatform-build or earthly +go-build is run, but not with a plain earthly +build.

A common workflow I've used in the past when working on the Crossplane CLI was just plain make build and then running the _output/<platform>/crank binary directly to test the changes. It would be nice if earthly +build also saved these binary artifacts to the _output directory just like make build used to.

@tampakrap
Copy link
Contributor

As mentioned in #5779 (comment), I was surprised that a plain earthly +build did not output the crossplane/crank binaries to the _output directory.

The _output dir is populated with these binaries if earthly +multiplatform-build or earthly +go-build is run, but not with a plain earthly +build.

A common workflow I've used in the past when working on the Crossplane CLI was just plain make build and then running the _output/<platform>/crank binary directly to test the changes. It would be nice if earthly +build also saved these binary artifacts to the _output directory just like make build used to.

Here is the PR to address this #5782

@jbw976
Copy link
Member

jbw976 commented Jun 18, 2024

I observed something today while testing #5786 that I don't understand yet - so I'm not sure it's a problem but was wondering what others thought.

Basically, the crossplane image digest changes across builds if a earthly prune is run. I'm wondering if that also means that two builds for an identical source commit hash/tag could also result in different image digests.

See the Image digest section of #5786 (review) for more details.

@jbw976
Copy link
Member

jbw976 commented Jun 18, 2024

I also wonder if we're seeing earthly/earthly#2823 and it is affecting us too:

❯ docker images --digests build-69fd693a/crossplane
REPOSITORY                  TAG                                      DIGEST    IMAGE ID       CREATED          SIZE
build-69fd693a/crossplane   master                                   <none>    898d135985cb   14 minutes ago   77.4MB
build-69fd693a/crossplane   master_linux_arm64                       <none>    898d135985cb   14 minutes ago   77.4MB
build-69fd693a/crossplane   v0.0.0-1718156806-69fd693a               <none>    898d135985cb   14 minutes ago   77.4MB
build-69fd693a/crossplane   v0.0.0-1718156806-69fd693a_linux_arm64   <none>    898d135985cb   14 minutes ago   77.4MB

@tampakrap
Copy link
Contributor

I also wonder if we're seeing earthly/earthly#2823 and it is affecting us too

2823 seems to be fixed by 2853 (probably not closed yet by accident). I believe the issue that we are seeing with the missing digests is earthly/earthly#2851 which is still open

@tampakrap
Copy link
Contributor

@jbw976 I tried the release-1.15 branch today and actually it has the same behavior: no digest is shown, plus the image ID changes after every build (after cleaning up the cache dirs):

➜ make build
...
11:31:43 [ OK ] docker build build-d52750be/crossplane-amd64

➜ docker images --digests build-d52750be/crossplane-amd64
REPOSITORY                        TAG       DIGEST    IMAGE ID       CREATED          SIZE
build-d52750be/crossplane-amd64   latest    <none>    946742876039   34 seconds ago   54.4MB

➜ rm -rf .cache .work _output
➜ docker rmi 946742876039
➜ docker system prune

➜ make build
...
11:35:10 [ OK ] docker build build-d52750be/crossplane-amd6

➜ docker images --digests build-d52750be/crossplane-amd64
REPOSITORY                        TAG       DIGEST    IMAGE ID       CREATED         SIZE
build-d52750be/crossplane-amd64   latest    <none>    e7eb5731e4e0   3 minutes ago   54.4MB

I understand that this is not ideal, but it doesn't seem like a regression of earthly

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: In Progress
Development

No branches or pull requests

6 participants