Docker-compose up returns resin-init error #46

Open
puccaso opened this issue Dec 29, 2021 · 8 comments

Comments

@puccaso

puccaso commented Dec 29, 2021

Hello.

I am running the container inside Ubuntu, and although the image seems to be up, the system gets stuck in a loop between the supervisor starting and libcontainer starting.

_1  | [  OK  ] Started DNS forwarder and DHCP server.
os_1  |          Starting Balena Application Container Engine...
os_1  |          Starting Resin proxy configuration service...
os_1  |          Starting Hostname Service...
os_1  | [FAILED] Failed to start Resin init service.
os_1  | See 'systemctl status resin-init.service' for details.
os_1  | [  OK  ] Started Hostname Service.
os_1  |          Starting Network Manager Script Dispatcher Service...
os_1  | [  OK  ] Started Network Manager Script Dispatcher Service.
os_1  | [  OK  ] Started Resin proxy configuration service.
os_1  | [  OK  ] Started Balena Application Container Engine.
os_1  |          Starting Balena supervisor...
os_1  |          Starting Load balena healthcheck image...
os_1  | [  OK  ] Started Balena supervisor.
os_1  | [  OK  ] Started Load balena healthcheck image.
os_1  | [  OK  ] Started libcontainer conta…745d4ae5626c43d75e9836f37bb74.
os_1  | [  OK  ] Started libcontainer conta…d7efc105230bb319508a1b5d078cf.
[  OK  ] Stopped Balena supervisor.
os_1  |          Starting Balena supervisor...
os_1  | [  OK  ] Started Balena supervisor.
os_1  | [  OK  ] Stopped OpenVPN.
os_1  |          Starting Prepare OpenVPN...
os_1  | [  OK  ] Started Prepare OpenVPN.
os_1  | [  OK  ] Started OpenVPN.
os_1  | [  OK  ] Started libcontainer conta…2c709edb0797d08d637bf7c9fea05.
[  OK  ] Stopped Balena supervisor.
os_1  |          Starting Balena supervisor...
os_1  | [  OK  ] Started Balena supervisor.
os_1  | [  OK  ] Started libcontainer conta…9e1c648e573cba083734aaf754cbe.
os_1  | [  OK  ] Stopped OpenVPN.
os_1  |          Starting Prepare OpenVPN...
os_1  | [  OK  ] Started Prepare OpenVPN.
os_1  | [  OK  ] Started OpenVPN.


Then I get other errors before the process seems to complete.

os_1  |          Starting Balena supervisor...
os_1  | [  OK  ] Started Balena supervisor.
os_1  | [  OK  ] Started libcontainer conta…413b0ee5ee749cd38ab3a2a45ab2f.
os_1  | [  OK  ] Stopped OpenVPN.
os_1  |          Starting Prepare OpenVPN...
os_1  | [  OK  ] Started Prepare OpenVPN.
os_1  | [  OK  ] Started OpenVPN.
os_1  | [ TIME ] Timed out waiting for device /dev/zram0.
os_1  | [DEPEND] Dependency failed for Enab…sed swap in memory using zram.
os_1  | [ TIME ] Timed out waiting for device /dev/ttyS0.
os_1  | [DEPEND] Dependency failed for Serial Getty on ttyS0.
os_1  | [  OK  ] Reached target Login Prompts.
os_1  | [  OK  ] Reached target Multi-User System.
os_1  |          Starting Update UTMP about System Runlevel Changes...
os_1  | [  OK  ] Started Update UTMP about System Runlevel Changes.

I can get to the console of the container, but I can't see the device on the cloud dashboard.

Mem: 2234672K used, 1420568K free, 70948K shrd, 155736K buff, 1072308K cached
CPU:   7% usr   3% sys   0% nic  89% idle   0% io   0% irq   0% sirq
Load average: 0.59 0.51 0.51 5/613 3381
  PID  PPID USER     STAT   VSZ %VSZ %CPU COMMAND
  275   242 root     S     867m  24%   0% balena-engine-containerd --config /var/run/balena-engine/containerd/
 3278  3243 root     R     4248   0%   0% top
  242     1 root     S     939m  26%   0% /usr/bin/balenad --experimental --log-driver=journald -s overlay2 -H
 1210     1 root     S     721m  20%   0% {runc:[2:INIT]} balena-engine-runc init
  696     1 root     S     651m  18%   0% {runc:[2:INIT]} balena-engine-runc init
  796     1 root     S     651m  18%   0% {runc:[2:INIT]} balena-engine-runc init
 1415     1 root     S     651m  18%   0% {runc:[2:INIT]} balena-engine-runc init
 1522     1 root     S     651m  18%   0% {runc:[2:INIT]} balena-engine-runc init
 2143     1 root     S     651m  18%   0% {runc:[2:INIT]} balena-engine-runc init
 2947     1 root     S     651m  18%   0% {runc:[2:INIT]} balena-engine-runc init

Any ideas?

puc

@puccaso
Author

puccaso commented Dec 29, 2021

I worked out that the resin-init.service error has something to do with resin-init-board not completing successfully:

lsblk: /dev/sda3[/var/lib/docker/volumes/balenaos-in-container_boot/_data]: not a block device

When I go to the volume directory under /var/lib, I can see config.json in the boot volume's _data directory, so I know that part is working.

I've tried configs for both generic x86 and Intel NUC, but I can't seem to progress beyond this point.

puccaso changed the title from "Docker-compose rails resin-init and loops at Balena supervisor" to "Docker-compose up returns resin-init error" on Dec 29, 2021
@klutchell
Contributor

This is the snippet of code that is failing:

# make sure the bootstrap code (boot.img) is removed in case we are using EFI boot
if [ -d /sys/firmware/efi ] ; then
    device="/dev/"$(findmnt --noheadings --canonicalize --output SOURCE /mnt/boot/ | xargs lsblk -no pkname)
    dd if=/dev/zero of=$device bs=446 count=1
fi

https://github.com/balena-os/balena-intel/blob/master/layers/meta-balena-genericx86/recipes-support/resin-init/resin-init-board/resin-init-board

Still investigating the proper workaround for running in a container.
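
For illustration only, and not the upstream fix: the `lsblk` error above comes from `findmnt` reporting a bind-mounted Docker volume as `/dev/sda3[/var/lib/...]`, which is not a block device node. A minimal sketch of one possible guard follows, assuming it is acceptable to skip the bootstrap wipe when running in a container. The helper names `is_block_source` and `parent_disk_of` are hypothetical and do not exist in resin-init-board.

```shell
#!/bin/sh
# Hypothetical helpers, not part of resin-init-board.

# Succeeds only if the given findmnt SOURCE value is a real block device node.
# A bind-mounted Docker volume shows up as "/dev/sda3[/var/lib/...]", which is not.
is_block_source() {
    [ -b "$1" ]
}

# Prints the parent disk (for example /dev/sda) backing a mount point,
# or returns non-zero when the source is not a block device.
parent_disk_of() {
    src=$(findmnt --noheadings --canonicalize --output SOURCE "$1") || return 1
    if ! is_block_source "$src"; then
        echo "skipping: '$src' is not a block device" >&2
        return 1
    fi
    pkname=$(lsblk -no pkname "$src") || return 1
    [ -n "$pkname" ] || return 1
    echo "/dev/$pkname"
}
```

With a guard like this, the EFI branch would only run `dd` when `parent_disk_of /mnt/boot/` succeeds. Skipping seems the safer default in a container, since the device behind the volume presumably belongs to the host.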

@jellyfish-bot

[klutchell] This issue has attached support thread https://jel.ly.fish/5c6f6bf4-0bc4-4f2f-b97e-eb66ee51573e

@zumby

zumby commented Jul 6, 2022

@puccaso Hi. Did you manage to run balenaos-in-container on a Linux server with Docker somehow?

I'm still struggling with these errors. Perhaps this has something to do with the way aufs or overlay is configured on the host server, or with the cgroups configuration:

balenaos-in-container-os-1  | Failed to attach 21 to compat systemd cgroup /docker/74a6d832a885e725ac07fdda777a5eac78e6da246d501ed50225503acd3b6a24/system.slice/dev-hugepages.mount: No such file or directory
balenaos-in-container-os-1  |          Mounting Huge Pages File System...
balenaos-in-container-os-1  | Failed to attach 21 to compat systemd cgroup /docker/74a6d832a885e725ac07fdda777a5eac78e6da246d501ed50225503acd3b6a24/system.slice/dev-hugepages.mount: No such file or directory
balenaos-in-container-os-1  | Failed to attach 22 to compat systemd cgroup /docker/74a6d832a885e725ac07fdda777a5eac78e6da246d501ed50225503acd3b6a24/system.slice/sys-kernel-debug.mount: No such file or directory
balenaos-in-container-os-1  |          Mounting Kernel Debug File System...
balenaos-in-container-os-1  | Failed to attach 22 to compat systemd cgroup /docker/74a6d832a885e725ac07fdda777a5eac78e6da246d501ed50225503acd3b6a24/system.slice/sys-kernel-debug.mount: No such file or directory
balenaos-in-container-os-1  | Failed to attach 23 to compat systemd cgroup /docker/74a6d832a885e725ac07fdda777a5eac78e6da246d501ed50225503acd3b6a24/system.slice/kmod-static-nodes.service: No such file or directory
balenaos-in-container-os-1  |          Starting Resin NTP server configure service...
balenaos-in-container-os-1  |          Starting DNS forwarder and DHCP server...
balenaos-in-container-os-1  | [  OK  ] Started DNS forwarder and DHCP server.
balenaos-in-container-os-1  | [  OK  ] Started Resin NTP server configure service.
[ TIME ] Timed out waiting for device /dev/zram0.
balenaos-in-container-os-1  | [DEPEND] Dependency failed for Enab…sed swap in memory using zram.

@klutchell maybe you have some advice?

@klutchell
Contributor

@zumby have you checked to make sure you are using cgroups v1 and not v2 as per the readme?
https://unix.stackexchange.com/questions/619681/how-can-i-find-out-what-version-of-cgroups-i-have
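
One commonly cited way to check this is to look at the filesystem type mounted at `/sys/fs/cgroup`. The sketch below assumes GNU `stat` and the usual convention that `cgroup2fs` means v2 and `tmpfs` means v1 (or a hybrid setup). The function name `cgroup_version_from_fstype` is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: map the filesystem type of /sys/fs/cgroup to a cgroup version label.
cgroup_version_from_fstype() {
    case "$1" in
        cgroup2fs) echo "v2" ;;       # unified hierarchy
        tmpfs)     echo "v1" ;;       # legacy (or hybrid) hierarchy
        *)         echo "unknown" ;;  # anything else, including a failed lookup
    esac
}

# GNU stat: -f queries the filesystem, %T prints its type as a name.
# Fall back to a marker value if stat fails or the path does not exist.
fstype=$(stat -fc %T /sys/fs/cgroup/ 2>/dev/null || echo "unavailable")
echo "cgroup hierarchy: $(cgroup_version_from_fstype "$fstype") (fstype: $fstype)"
```

Run this on the Docker host, not inside the balenaOS container, since the host's cgroup setup is what matters here.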

Does the behaviour change if you use a different OS release?
https://github.com/balena-os/balenaos-in-container/blob/master/docker-compose.yml#L9

@zumby

zumby commented Jul 6, 2022

@klutchell thanks for the response.

FYI, this is a simple EC2 instance on AWS running Amazon Linux (x86_64).

  1. cgroups seems to be fine, but I'm not much of an expert:

image

  2. As for the images, I've tried the default from docker-compose.yml, which is 2.95.12_rev1-genericx86-64-ext.
    I also tried these:
  • the latest one: 2.98.33-genericx86-64-ext
  • the latest intel-nuc one: 2.98.33-intel-nuc

Today I also tried the newest ones, 2.99.27_rev2-genericx86-64-ext and 2.99.27_rev2-intel-nuc.

FYI, I built them like this:

docker-compose build --build-arg OS_VERSION=2.99.27_rev2

The result is the same for all of them: warnings, then an error, and it ends up stuck.

image

image

In balenaCloud, the device does get added, but it always reports check_localdisk issues:
image

image

Generally, I want a "simulated" or "virtual" device that I can add to balena. The plan was to start this on EC2 in AWS and, once it's in balena, do other things with it, like deploying an actual app. There is a slight chance this has something to do with the virtualisation on AWS EC2.

@klutchell
Contributor

@zumby it looks to me like your device is booted, and you can ignore those warnings. They are expected when running balenaos-in-container due to the way systemd handles pids.

I would not expect your root partition to fully expand since it's a virtual docker volume, so our partition utilities cannot determine the maximum partition size. This is also expected.

Is there anything blocking you from using this device as a simulation? You could also flash the genericx86-64-ext image directly to an AWS instance if you want to skip the extra layer of virtualization. Maybe this forum post can help get you started: https://forums.balena.io/t/host-balena-os-on-aws-as-a-virtual-device/304444

@zumby

zumby commented Jul 6, 2022

@klutchell interesting. I actually haven't moved forward yet (hehe), but I will surely try other things and see if anything blocks me.
As for putting the image directly onto AWS, that seems harder to me and is not too clear yet. But we'll see.

Thanks for the help. I'll comment again if I run into trouble.

4 participants