build, run: record hash or digest in image history for sources used in --mount #5691

Draft
wants to merge 1 commit into
base: main


flouthoc
Collaborator

When using --mount=type=bind or --mount=type=cache, the hash or digest of the source given in these flags should be added to the image history so that buildah can bust the cache if the files on the host, or the image being used as the source, have changed.

Closes

What type of PR is this?

/kind api-change
/kind bug
/kind cleanup
/kind deprecation
/kind design
/kind documentation
/kind failing-test
/kind feature
/kind flake
/kind other

What this PR does / why we need it:

How to verify it

Which issue(s) this PR fixes:

Special notes for your reviewer:

Does this PR introduce a user-facing change?

build, run: record hash or digest in image history for sources used in `--mount`

When using `--mount=type=bind` or `--mount=type=cache`, the hash or
digest of the source in these flags should be added to the image history
so that buildah can bust the cache if the files on the host, or the
image being used as the source, have changed.

Signed-off-by: flouthoc <[email protected]>
Contributor

openshift-ci bot commented Aug 18, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: flouthoc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@flouthoc
Collaborator Author

Need to add tests, will undraft then.

@flouthoc
Collaborator Author

Will rebase after #5693.

@nalind
Member

nalind commented Sep 17, 2024

I'm surprised that we'd care about the contents of caches. I'd be inclined to archive the contents of a directory (and create a single-entry archive for a non-directory) to account for different permissions/ownership/datestamps/xattrs and to safely handle soft and hard links.

@flouthoc
Collaborator Author

> I'm surprised that we'd care about the contents of caches. I'd be inclined to archive the contents of a directory (and create a single-entry archive for a non-directory)

If I'm understanding this correctly, did you mean that instead of calling computeDirectoryHash or computeFileHash separately and recursively, I should just create a temporary archive of the context dir and write its hash in the history? And on every run I can repeat the same process to check whether the hash matches what is written in the history? The idea sounds good to me.
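
For illustration only, the "check on every run" idea could be sketched as follows. This is not buildah's code: the `history_entry` and `cache_hit` names and the `# mount:` suffix format are hypothetical, and buildah's real history format may differ. The point is only that once the digest of each mount source is part of the recorded step, a changed source no longer matches the cached step.

```python
def history_entry(instruction, mount_digests):
    """Build the string recorded for one RUN step.

    Hypothetical format: the instruction text followed by one
    '# mount:<digest>' suffix per mount source, in sorted order.
    """
    suffix = "".join(f" # mount:{digest}" for digest in sorted(mount_digests))
    return instruction + suffix


def cache_hit(recorded_history, step_index, instruction, mount_digests):
    """Return True if the recorded step matches the step about to be run."""
    if step_index >= len(recorded_history):
        return False
    return recorded_history[step_index] == history_entry(instruction, mount_digests)
```

With this shape, the same instruction text with a different source digest is treated as a cache miss, which is the behavior the PR description asks for.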

@nalind
Member

nalind commented Sep 17, 2024

> > I'm surprised that we'd care about the contents of caches. I'd be inclined to archive the contents of a directory (and create a single-entry archive for a non-directory)

> If I'm understanding this correctly, did you mean that instead of calling computeDirectoryHash or computeFileHash separately and recursively, I should just create a temporary archive of the context dir and write its hash in the history?

I wouldn't expect the archive to be written anywhere, but the digest of an archive is something we already use as a way of describing contents, when handling COPY and ADD instructions. I don't know yet about doing this over the entire build context or additional build context if only a portion of it is being used at that point (i.e., if "src" is set to a subdirectory).
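
A minimal sketch of the idea, assuming nothing about buildah's actual implementation: the source is streamed as a tar archive straight into a hash, so nothing is written to disk, and the tar headers carry permissions, ownership, timestamps, and link information along with the file contents. The `archive_digest` name and the choice of SHA-256 are assumptions made for this example.

```python
import hashlib
import os
import tarfile


class _HashWriter:
    """Write-only sink that feeds every byte into a SHA-256 hash."""

    def __init__(self):
        self._hash = hashlib.sha256()

    def write(self, data):
        self._hash.update(data)
        return len(data)

    def hexdigest(self):
        return self._hash.hexdigest()


def archive_digest(path):
    """Digest of a tar stream of `path`; the archive itself is never stored."""
    sink = _HashWriter()
    with tarfile.open(fileobj=sink, mode="w|", format=tarfile.PAX_FORMAT) as tar:
        if os.path.isdir(path):
            for root, dirs, files in os.walk(path):
                dirs.sort()  # visit subdirectories in a stable order
                for name in sorted(dirs + files):
                    full = os.path.join(root, name)
                    # recursive=False: os.walk already handles the descent
                    tar.add(full, arcname=os.path.relpath(full, path), recursive=False)
        else:
            # A non-directory becomes a single-entry archive.
            tar.add(path, arcname=os.path.basename(path), recursive=False)
    return "sha256:" + sink.hexdigest()
```

Because entries are added in sorted order and symbolic links are archived as links rather than followed, two runs over an unchanged tree give the same digest, while a change to contents or to metadata such as the file mode gives a different one.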

@flouthoc
Collaborator Author

> > > I'm surprised that we'd care about the contents of caches. I'd be inclined to archive the contents of a directory (and create a single-entry archive for a non-directory)

> > If I'm understanding this correctly, did you mean that instead of calling computeDirectoryHash or computeFileHash separately and recursively, I should just create a temporary archive of the context dir and write its hash in the history?

> I wouldn't expect the archive to be written anywhere, but the digest of an archive is something we already use as a way of describing contents, when handling COPY and ADD instructions. I don't know yet about doing this over the entire build context or additional build context if only a portion of it is being used at that point (i.e., if "src" is set to a subdirectory).

This sounds good to me, I will amend the PR.

@sanmai-NL

@flouthoc

I'm seeing the following behavior with podman build. Will this change fix it too?

ARG PATH_1=mydirectory
ARG SELINUXRELABEL=,z
ARG DISTRO=PATH_1
RUN --mount=type=bind,source=${PATH_1:?},target=/tmp/${PATH_1:?}${SELINUXRELABEL:?} \
  echo Nop

Subsequently, /tmp/mydirectory ends up in an image layer rather than being ephemeral and only mounted at build time. This bloats the image and leaks information.
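
One way to confirm a report like this is to inspect a layer archive directly. The sketch below assumes the layer has already been extracted as a tar file (for example from the output of `podman save`); the `entries_under` helper is hypothetical and only lists archive members below a given path. The mount point directory itself is ignored, since an empty directory at the target leaks no content.

```python
import tarfile


def entries_under(layer_tar, target):
    """List member names in a layer tarball that sit below `target`."""
    prefix = target.strip("/")
    found = []
    # "r:*" lets tarfile detect gzip/bzip2/xz compression transparently.
    with tarfile.open(layer_tar, mode="r:*") as tar:
        for member in tar:
            name = member.name
            if name.startswith("./"):
                name = name[2:]
            name = name.strip("/")
            if name.startswith(prefix + "/"):
                found.append(name)
    return sorted(found)
```

A non-empty result for `entries_under("layer.tar", "/tmp/mydirectory")` would indicate that files from the bind-mounted source were committed to that layer.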

@flouthoc
Collaborator Author

flouthoc commented Oct 3, 2024

> RUN --mount=type=bind,source=${PATH_1:?},target=/tmp/${PATH_1:?}${SELINUXRELABEL:?}

@sanmai-NL Are the contents of PATH_1 ending up in your final built image as well? A layer is created so that it can be used later, but I don't think the contents of RUN --mount are part of your image.

@sanmai-NL

sanmai-NL commented Oct 4, 2024

Thanks for your response. Yes indeed, they do end up in there, which I find surprising. And which is the reason for my comment.

Some factors that may cause this, in case the problem isn't general:
This happens when running podman build from a containerized Podman. Maybe ID mapping also causes some sort of copy operation instead of a bind mount.

@sanmai-NL

> > RUN --mount=type=bind,source=${PATH_1:?},target=/tmp/${PATH_1:?}${SELINUXRELABEL:?}

> @sanmai-NL Are the contents of PATH_1 ending up in your final built image as well? A layer is created so that it can be used later, but I don't think the contents of RUN --mount are part of your image.

Re-reading your comment... So you consider it by design that this ends up in a layer? But this way, information leaks and the image bloats.

@flouthoc
Collaborator Author

> > > RUN --mount=type=bind,source=${PATH_1:?},target=/tmp/${PATH_1:?}${SELINUXRELABEL:?}

> > @sanmai-NL Are the contents of PATH_1 ending up in your final built image as well? A layer is created so that it can be used later, but I don't think the contents of RUN --mount are part of your image.

> Re-reading your comment... So you consider it by design that this ends up in a layer? But this way, information leaks and the image bloats.

@sanmai-NL What you are describing is a different bug. Would it be possible to create a small reproducer and open a new issue?
