Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

APT package restore for QEMU packages isn't working #44

Closed
crdoconnor opened this issue Aug 4, 2022 · 25 comments
Closed

APT package restore for QEMU packages isn't working #44

crdoconnor opened this issue Aug 4, 2022 · 25 comments
Assignees
Labels
bug Something isn't working

Comments

@crdoconnor
Copy link

crdoconnor commented Aug 4, 2022

Example here:

Workflow: https://github.com/crdoconnor/apt-cache-bug/blob/main/.github/workflows/example.yml

Actions: https://github.com/crdoconnor/apt-cache-bug/actions

There are 3 runs of the github actions. First is an expected failure. Second is an expected pass (apt-get install fixes the issue).

Third is an unexpected failure - the cache restore should restore everything as before, but it doesn't quite seem to. It might be missing some files, I guess?

Anyway, very grateful for any help you could provide.

@awalsh128
Copy link
Owner

Hi @crdoconnor, could you provide the run logs for the action?

@crdoconnor
Copy link
Author

crdoconnor commented Aug 5, 2022

logs_1.zip
logs_2.zip
logs_3.zip

If you fork the repo and tweak the README 2 times you should also be able to see the same effect.

@awalsh128 awalsh128 transferred this issue from awalsh128/cache-apt-pkgs-action-ci Aug 5, 2022
@awalsh128
Copy link
Owner

Sorry I didn't realize you set up that repo just to demonstrate the issue. Very helpful.

It looks like the OS is complaining that the uname binary is for a different arch. I am guessing this may be because I am not capturing the arch with the cached package information. It is probably caching a different arch version which causes the exec format error when the container calls uname.

I am not as familiar with podman and QEMU so would appreciate your input here. My initial guess is that it is caching a non-ARM version of one of the QEMU components. I will debug against the repo you provided in the meantime.

@awalsh128
Copy link
Owner

awalsh128 commented Aug 6, 2022

I've added the regression test to the CI (awalsh128/cache-apt-pkgs-action-ci@b969f31) just for dev branch as of now and can repro it.

https://github.com/awalsh128/cache-apt-pkgs-action-ci/runs/7707354683?check_suite_focus=true

You can delete your repo if you want.

@awalsh128
Copy link
Owner

Below are the affected packages

@awalsh128 awalsh128 changed the title APT package restore isn't working APT package restore for QEMU packages isn't working Aug 13, 2022
@awalsh128
Copy link
Owner

Iterative installation after install shows that qemu-user-static fixes issue as shown in the log 6_Run debug script after action execution..txt. Need to examine the file structure of the post install and post cache cases.

@awalsh128
Copy link
Owner

This package allows for multi-arch binary support in an emulator container. The package files themselves seems to match the restored files. I am guessing the other unknown would be any make artifacts aside from the files themselves.

Not a Debian package dev expert by any means but may see if I can understand a bit better. In the meantime I would suggest installing qemu-user-static independently.

@crdoconnor
Copy link
Author

crdoconnor commented Sep 3, 2022 via email

@awalsh128
Copy link
Owner

The Debian Policy Manual goes over how the control data is executed. postinst is probably what we want to look at. To extract the control data from a downloaded package.

dpkg -e /var/cache/apt/archives/<pkg>*.deb /tmp/<action>/<pkg>/control
cat /tmp/<action>/<pkg>/control/postinst

The one for this particular package is. Where the arg in this case is configure as noted in the policy manual.

#!/bin/sh
set -e
binfmt_update() {
  test configure = "$1" || return 0
  command -v update-binfmts >/dev/null || return 0
  # do not touch binfmts inside a container
  # should this be done by update-binfmts instead?
  if command -v systemd-detect-virt >/dev/null; then
     systemd-detect-virt --quiet --container && return 0
  fi
  grep -zqs ^container= /proc/1/environ && return 0
  for fmt in  alpha armeb cris hexagon hppa i386 m68k microblaze mips mipsel mipsn32 mipsn32el mips64 mips64el ppc ppc64 ppc64le riscv32 riscv64 s390x sh4 sh4eb sparc sparc32plus sparc64 x86_64 xtensa xtensaeb; do
    update-binfmts --import qemu-$fmt
  done
}
binfmt_update "$1"

I tried running this in a post restore test (log) but it encountered some errors. I am assuming there is something else I am missing because the post install didn't have errors on the initial install. It is interesting that it involves update-binfmts because this has interaction with the Linux kernel, and could hint at why a file restore isn't working.

I'm a bit uncomfortable with the idea of caching and running the post-install scripts because this would cut into the runtime performance of the action and its intended goal. I do want to solve this issue though and how we can most gracefully address it. I still need to replicate a fix though.

If you want to help, you can clone this entire repo, modify:
https://github.com/awalsh128/cache-apt-pkgs-action-ci/blob/master/devtools/debug/restore_post_action.sh
... and then run the "Debug action" workflow.

When I have some more time I will keep digging.

@awalsh128
Copy link
Owner

Actually I realized the script wouldn't be the same on my box because it is a different version.

Did this instead and it worked (run log).

#!/bin/bash -x
set -e
apt-get install --download-only qemu-user-static
mkdir /tmp/qus-ctrl
dpkg -e /var/cache/apt/archives/qemu-user-static_1%3a4.2-3ubuntu6.23_amd64.deb /tmp/qus-ctrl
sh -x /tmp/qus-ctrl/postinst configure
podman run --rm -it docker.io/arm64v8/alpine:3.14 uname -m

I need to give this issue some thought. Packages like these have some interaction with the kernel, which makes them a special case. I'd almost like to just put them on an explicit special handling list but we need something that scales well. Will think about this some more.

@awalsh128 awalsh128 added the bug Something isn't working label Sep 4, 2022
@awalsh128 awalsh128 self-assigned this Sep 4, 2022
@crdoconnor
Copy link
Author

crdoconnor commented Sep 4, 2022

My understanding from this is that postinst is just... skipped when restoring the packages? Is that correct?

If that's true then that would break a whole range of debian packages not just this one. It's pretty common to have a postinst, I think. It doesn't seem ungraceful to just try and run it every time for every package if there is one even if it does slow it down. It would probably enable a whole lot of packages to be useable with this action.

It raises the even thornier question of what to do about preinst.

Thank you for digging into this issue and for creating this action and debugging all of its thorny, dark corners :)

@crdoconnor
Copy link
Author

If you want to help, you can clone this entire repo, modify

I'm assuming since you've managed to replicate a fix you no longer want this?

Let me know if I misinterpreted though, or there's another way I can help.

@awalsh128
Copy link
Owner

awalsh128 commented Sep 6, 2022

That's right, you don't have to now that it has been replicated.

This action is a time saver but it is based on the assumption that restoring files is good enough, which as we know isn't totally true. A lot of the control data files work around this idea (chmod, other file generation, etc). So the foundation of the action is not entirely sound but is established assuming that any issues will be outliers. Some things to note:

  • All stats below were done on my WSL VM, with specs lower than GitHub runners.
  • Installed packages are downloaded to /var/cache/apt/archives.
  • Extracting control data from a Debian package on my VM was 7ms, extracting 1207 packages was 5s.
  • Analyzing my own cache, only 172 out of 1207 packages had postinst and preinst was 64.
  • It should be considered that of these *inst scripts, most are I/O based and not something dealing with a registry or the kernel.

Sorry, I didn't mean to imply running the postinst would be ungraceful. Only as a one off for this situation. I'd like to avoid preinst for now until it gets flagged as an issue, especially given the small usage and technical problems posed. As far as postinst, I'll run some benchmarks to see how it performs. Like all of my tests in this comment, they aren't exactly reliable given how fast and loose I have been playing with them. I'll need to hunt out some meatier packages with control data. In the meantime I'll see if I can introduce this as an experimental feature controlled with an input flag.

Happy to do the digging. Always like a good challenge and glad you find the action useful.

@awalsh128
Copy link
Owner

Hacking on a draft in dev and may have something ready for preview. I am not sure how broken it is but you can always switch your action to @dev and specify the execute_postinst input as true to test drive (which may run off a cliff and explode). Jokes aside, nothing should cause an issue with the cache since this is all restore behavior being experimented with.

@awalsh128
Copy link
Owner

After working with the experimental version, I think I have hit a hard spots. In my case I am testing the restore of xdot. It contains the following postinst script.

#!/bin/sh
set -e

# Automatically added by dh_python3:
if which py3compile >/dev/null 2>&1; then
        py3compile -p xdot 
fi
if which pypy3compile >/dev/null 2>&1; then
        pypy3compile -p xdot  || true
fi

# End automatically added section

Unfortunately py3compile performs a dpkg lookup, giving me the error (verbose enabled).

+ echo 21:42:20 '- Executing /tmp/cache-apt-pkgs-action/restore/var/lib/dpkg/info/xdot.postinst...'
21:42:20 - Executing /tmp/cache-apt-pkgs-action/restore/var/lib/dpkg/info/xdot.postinst...
+ sh -x /tmp/cache-apt-pkgs-action/restore/var/lib/dpkg/info/xdot.postinst
+ set -e
+ which py3compile
+ py3compile -p xdot
dpkg-query: package 'xdot' is not installed
Use dpkg --contents (= dpkg-deb --contents) to list archive files contents.
Traceback (most recent call last):
  File "/usr/bin/py3compile", line 290, in <module>
    main()
  File "/usr/bin/py3compile", line 269, in main
    compile(files, versions,
  File "/usr/bin/py3compile", line 154, in compile
    for fn, versions_to_compile in filter_files(files, e_patterns, versions):
  File "/usr/bin/py3compile", line 106, in filter_files
    for fn in files:
  File "/usr/share/python3/debpython/files.py", line 71, in filter_public
    for fn in files:
  File "/usr/share/python3/debpython/files.py", line 53, in from_package
    raise Exception("cannot get content of %s" % package_name)
Exception: cannot get content of xdot

Going to think about this some more. We could just ignore failures or try to reflect the package state in Debian's APT. I am very reluctant to do anything with the latter since we will be reverse engineering APT at that point.

@crdoconnor
Copy link
Author

crdoconnor commented Sep 13, 2022

Sorry I haven't tried out the dev branch yet, I was busy. I will though!

We could just ignore failures or try to reflect the package state in Debian's APT. I am very reluctant to do anything with the latter since we will be reverse engineering APT at that point.

It kind of feels like that's a rabbit hole you dove into when you first started this project :)

Caching the APT database as well probably would fix this but I have a feeling once you've dealt with that one another package problem will rear its head.

I was thinking about this last night and it actually kind of occurred to me that the cleanest approach might be to simply just cache all the downloaded deb files themselves and install from scratch each time without needing a network. This would slow things down quite a bit but it would be guaranteed to work.

I actually originally came looking for you when I had a CI pipeline I wanted to finish in a hurry and the apt-get step took 4 minutes because downloads were slow that day rather than the usual 45 seconds. If it was consistently 25-30 seconds I'd still call that a win.

@awalsh128
Copy link
Owner

Yeah the whole endeavor is based an assumption that largely holds but not always as we have seen. apt-fast does a good job with download and installation if you are looking to optimize that process. As far as this action goes, I am going to lean towards keeping it at the cost of these cases like yours but we may still be able to find an approximation that could work. Understand if you want to just move forward.

@crdoconnor
Copy link
Author

Hmm, @dev still seems to be netting me an exec format error when I tried it on a real workflow.

@crdoconnor
Copy link
Author

crdoconnor commented Sep 18, 2022

apt-fast looks like something that will download packages faster by opening multiple connections, which is not quite what I was looking for. Moreover it would require an additional download step to install apt-fast.

I was more thinking of something that would "apt-get install [package] -d", which would download to /var/cache/apt/archives, then cache the /var/cache/apt/archives folder and then on restore, would do apt-get install, bypassing the download step.

@crdoconnor
Copy link
Author

crdoconnor commented Oct 11, 2022 via email

@awalsh128
Copy link
Owner

Hey, sorry I haven't responded for awhile. I should be able to take a look this week. Thanks for sharing your results.

@awalsh128
Copy link
Owner

It doesn't seem to be picking up on the presence of the postinst script in the package when in the restore phase. Probably an incorrect evaluation. Will need to look further.

@awalsh128
Copy link
Owner

I was able to debug some of the install script behavior to work correctly but the regression is still failing. The execute script feature is in dev. Let me know if you see the same results on your side.

uses: awalsh128/cache-apt-pkgs-action@dev
with:
    packages: yourpackage
    version: 1
    execute_install_scripts: true

WARNING: The outputs are broken on the action. This shouldn't matter though if you don't use them.

@awalsh128
Copy link
Owner

Here are my latest findings. TL;DR is that we may not have a solution from this action.

#57 (comment)

awalsh128 added a commit that referenced this issue Nov 24, 2022
* Fix permission denied error.

* Fix permission denied error. (#51)

* Remove compression from file caching. (#53)

* Draft of postinst support from issue #44.

* Remove bad option.

* Removed extraneous line.

* Cover no packages edge case when writing manifest.

* Fix postinst bugs and add docs to lib.

* Made cache directory variable and more refinements to postinst.

* Update deprecated option.

https://github.blog/changelog/2022-10-11-github-actions-deprecating-save-state-and-set-output-commands/

* Rollback accidental commit of new postinst feature.

* Minor edit ands full install script execution FR.

* Fix execute_install_scripts message to show the right param name.

* Fix param check.

* Minor fix to doc.

* Upload action logs for debugging.

* Make artifact names unique.

* Add debug option.

* Update description.

* Debug package list issue.

* Rollback 76128c6

* Revert outputs set behavior to see if it fixes outputs issue in dev.

* Restore updated outputs behavior. So strange it is working when I revert.

* Fix bugs in install script execution.

* Add error suppression on file testing.

* Debug feature.

* Link to the issue that started the postinst troubleshooting.

* Describe action version usage.

* Fix package outputs command.
awalsh128 added a commit that referenced this issue Nov 24, 2022
…ed fix. (#65)

* Execute installation scripts and debug mode features. (#64)

* Provide the ability to call Debian package manager installation scripts (i.e. `*.[preinst, postinst]`).
* Introduce a debug mode that runs the scripts in verbose mode and uploads the logs for retrieval.
* Updated README to reflect new features and provided more info on how to use the action versions.

* Dev (#66)

* Fix permission denied error.

* Fix permission denied error. (#51)

* Remove compression from file caching. (#53)

* Draft of postinst support from issue #44.

* Remove bad option.

* Removed extraneous line.

* Cover no packages edge case when writing manifest.

* Fix postinst bugs and add docs to lib.

* Made cache directory variable and more refinements to postinst.

* Update deprecated option.

https://github.blog/changelog/2022-10-11-github-actions-deprecating-save-state-and-set-output-commands/

* Rollback accidental commit of new postinst feature.

* Minor edit ands full install script execution FR.

* Fix execute_install_scripts message to show the right param name.

* Fix param check.

* Minor fix to doc.

* Upload action logs for debugging.

* Make artifact names unique.

* Add debug option.

* Update description.

* Debug package list issue.

* Rollback 76128c6

* Revert outputs set behavior to see if it fixes outputs issue in dev.

* Restore updated outputs behavior. So strange it is working when I revert.

* Fix bugs in install script execution.

* Add error suppression on file testing.

* Debug feature.

* Link to the issue that started the postinst troubleshooting.

* Describe action version usage.

* Fix package outputs command.
awalsh128 added a commit that referenced this issue Nov 24, 2022
* Fix permission denied error.

* Fix permission denied error. (#51)

* Remove compression from file caching. (#53)

* Draft of postinst support from issue #44.

* Remove bad option.

* Removed extraneous line.

* Cover no packages edge case when writing manifest.

* Fix postinst bugs and add docs to lib.

* Made cache directory variable and more refinements to postinst.

* Update deprecated option.

https://github.blog/changelog/2022-10-11-github-actions-deprecating-save-state-and-set-output-commands/

* Rollback accidental commit of new postinst feature.

* Minor edit ands full install script execution FR.

* Fix execute_install_scripts message to show the right param name.

* Fix param check.

* Minor fix to doc.

* Upload action logs for debugging.

* Make artifact names unique.

* Add debug option.

* Update description.

* Debug package list issue.

* Rollback 76128c6

* Revert outputs set behavior to see if it fixes outputs issue in dev.

* Restore updated outputs behavior. So strange it is working when I revert.

* Fix bugs in install script execution.

* Add error suppression on file testing.

* Debug feature.

* Link to the issue that started the postinst troubleshooting.

* Describe action version usage.

* Fix package outputs command.

* Execute installation scripts and debug mode features. (#64)

* Provide the ability to call Debian package manager installation scripts (i.e. `*.[preinst, postinst]`).
* Introduce a debug mode that runs the scripts in verbose mode and uploads the logs for retrieval.
* Updated README to reflect new features and provided more info on how to use the action versions.

* Restore execute bit for script.

* Fix execute bit on script.
@awalsh128
Copy link
Owner

Published the latest version of the action. Going to close this out.

https://github.com/awalsh128/cache-apt-pkgs-action/releases/tag/v1.2.0

@awalsh128 awalsh128 closed this as not planned Won't fix, can't repro, duplicate, stale Nov 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants