System completely hangs after upgrading from 11.5 to 12.0 #3206

Closed
gjobin opened this issue Feb 27, 2024 · 18 comments
@gjobin

gjobin commented Feb 27, 2024

Describe the issue you are experiencing

Followed this tutorial to initially install HAOS on TrueNAS scale as a VM.

Host :

  • TrueNAS-SCALE-23.10.1.3

Symptoms :

  • Console access (Spice) is unresponsive to keyboard.
  • Serial Shell is responsive to keyboard, but any command I issue seems to hang.
  • Web UI is loading some pages, but many are not loading properly.
  • I get notifications in the bottom left saying that integration XYZ is loading, but they eventually give way to a more generic notification that keeps reappearing.
  • I tried installing 12.0 from scratch and then restoring my backup (partial or complete), with the same result after a while.

Add-ons :

  • Advanced SSH & Web Terminal
  • Cloudflared
  • Studio Code Server

Integrations (Other than default) :

  • Cync Lights
  • Dreame Vacuum
  • Ecobee
  • HACS
  • Jandy iAqualink
  • Orbit B-hyve
  • Ring
  • Roku
  • Simplisafe
  • SmartThings
  • TP-Link Omada
  • Tuya

What operating system image do you use?

generic-x86-64 (Generic UEFI capable x86-64 systems)

What version of Home Assistant Operating System is installed?

11.5

Did you upgrade the Operating System?

Yes

Steps to reproduce the issue

  1. Upgrade
  2. Wait
  3. System Hanging

Anything in the Supervisor logs that might be useful for us?

Can't really do that once upgraded, because the system is unresponsive, including the VM console.

Anything in the Host logs that might be useful for us?

Can't really do that once upgraded, because the system is unresponsive, including the VM console.

System information

System Information

version core-2024.2.4
installation_type Home Assistant OS
dev false
hassio true
docker true
user root
virtualenv false
python_version 3.12.1
os_name Linux
os_version 6.1.74-haos
arch x86_64
timezone America/New_York
config_dir /config
Home Assistant Community Store
GitHub API ok
GitHub Content ok
GitHub Web ok
GitHub API Calls Remaining 5000
Installed Version 1.34.0
Stage running
Available Repositories 1407
Downloaded Repositories 6
HACS Data ok
Home Assistant Cloud
logged_in false
can_reach_cert_server ok
can_reach_cloud_auth ok
can_reach_cloud ok
Home Assistant Supervisor
host_os Home Assistant OS 11.5
update_channel stable
supervisor_version supervisor-2024.02.0
agent_version 1.6.0
docker_version 24.0.7
disk_total 48.5 GB
disk_used 6.0 GB
healthy true
supported true
board ova
supervisor_api ok
version_api ok
installed_addons Studio Code Server (5.15.0), Advanced SSH & Web Terminal (17.1.1), Cloudflared (5.1.4), Home Assistant Google Drive Backup (0.112.1)
Dashboards
dashboards 1
resources 0
mode auto-gen
Recorder
oldest_recorder_run February 23, 2024 at 1:45 AM
current_recorder_run February 26, 2024 at 6:22 PM
estimated_db_size 37.74 MiB
database_engine sqlite
database_version 3.44.2

Additional information

No response

@gjobin gjobin added the bug label Feb 27, 2024
@fwartner

Can confirm. Results in random crashes.

@jmcollin78

This comment was marked as off-topic.

@sairon
Member

sairon commented Feb 29, 2024

> I tried installing 12.0 from scratch then restore my backup (partial or complete) with the same result after a bit.

Does it mean it only starts to happen after you restore the configuration, but the vanilla OS doesn't show these issues?

Can you share any details of the HW the TrueNAS OS is running on?

The symptoms are similar to out-of-memory issues, do you have any insights about the memory usage of the VM?

@sairon sairon added the hypervisor/kvm KVM related issues label Feb 29, 2024
@Silther

This comment was marked as off-topic.

@gjobin
Author

gjobin commented Feb 29, 2024

> > I tried installing 12.0 from scratch then restore my backup (partial or complete) with the same result after a bit.
>
> Does it mean it only starts to happen after you restore the configuration, but the vanilla OS doesn't show these issues?
>
> Can you share any details of the HW the TrueNAS OS is running on?
>
> The symptoms are similar to out-of-memory issues, do you have any insights about the memory usage of the VM?

I have not tried to create a fresh config on the fresh installation, but it did boot up and allow me to restore the configurations, yes. I also did not wait longer to validate if it would fail after a while, sitting waiting for initialization.

This is my host machine, currently running all my apps and HAOS 11.5:
image

My current 11.5 VM is configured this way
image

With the 12.0 OS crashing, I did try to bump both Minimum Memory Size and Memory Size to 6 GiB, without success.

EDIT : This is what the /config/hardware page shows in 11.5 :
image

@Silther

This comment was marked as off-topic.

@sairon
Member

sairon commented Mar 4, 2024

@gjobin This really looks like the HA VM goes out of memory - the Memory graph in HA does show the actual memory consumption (without buffers/caches), so if it's hovering around 98%, it means it's getting out of memory and probably swapping heavily, showing the symptoms you describe. Here's memory usage of my instance, running way more custom integrations and add-ons than yours:

image

It can't be ruled out that the OS update triggered something to misbehave. For a start, I would restart HA in safe mode to check whether any custom integration is to blame. But most likely the memory consumption was always on the edge even in 11.5, and with some of the recent changes it just went too high.

I also recommend setting the "Minimum memory size" and actual "Memory size" to the same value for the VM. I expect this to disable memory ballooning, i.e. the hypervisor will allocate the fixed amount of RAM instead of increasing it on demand. This can also rule out some lower-level issues.
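
For anyone who wants to check the "without buffers/caches" figure from a shell inside the VM, here is a minimal Python sketch. It assumes that figure corresponds to `MemTotal - MemAvailable` from `/proc/meminfo`. That is a common way to compute it, but whether the HA graph uses exactly this formula is an assumption, and the function name `used_percent` is made up for illustration.

```python
def used_percent(meminfo_text: str) -> float:
    """Return memory in use as a percentage of total, not counting
    reclaimable buffers/caches (i.e. MemTotal - MemAvailable)."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not a "Key: value" line
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])  # sizes are reported in kB
    total = fields["MemTotal"]
    available = fields["MemAvailable"]
    return 100.0 * (total - available) / total
```

To use it, pass in the contents of `/proc/meminfo`, for example `used_percent(open("/proc/meminfo").read())`. A value that stays in the high 90s would match the symptom described above.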

@teijosantala

I have similar issues: the OS crashes randomly (less than once per day).
Here is the call stack of the latest crash:
VirtualBox_HA_04_03_2024_16_39_34

It seems to be related to USB. I removed my Bluetooth adapter to see if that is the cause.

@gjobin
Author

gjobin commented Mar 4, 2024

> It can't be ruled out that the OS update triggered something to misbehave. For a start, I would restart HA in safe mode to check whether any custom integration is to blame. But most likely the memory consumption was always on the edge even in 11.5, and with some of the recent changes it just went too high.
>
> I also recommend setting the "Minimum memory size" and actual "Memory size" to the same value for the VM. I expect this to disable memory ballooning, i.e. the hypervisor will allocate the fixed amount of RAM instead of increasing it on demand. This can also rule out some lower-level issues.

It makes sense to allocate static memory, so I set a fixed amount of RAM, 8 GiB (for now), and here is what it looks like now on 11.5. Interestingly, I have added more integrations/plugins on 11.5 than I had when I started this thread. I am pretty new to HA and was still adding integrations.
image

How do I start in safe mode, after updating, once it is crashing and unresponsive ?

@gjobin
Author

gjobin commented Mar 4, 2024

I just redid the update to 12 with 8 GiB and it seems stable so far.

Here is the current usage:
image

Do you think it's okay to bring it back to 4 GiB?

@gjobin
Author

gjobin commented Mar 4, 2024

Also, since my initial report of the issue, there have been both a Core and a Supervisor update. I wonder if they might have fixed any potential memory issue.

Edit: changed "Core and a Supervisor issue" to "Core and a Supervisor update"

@sairon
Member

sairon commented Mar 4, 2024

> I just redid the update to 12 with 8GiB and it seems stable so far.

Hmm, that looks good indeed. I wonder if there isn't something wrong with the ballooning driver in the newer kernel 🤔 If you're willing to do some more tests, could you set the "minimum memory size" to 512M again and check if it starts to eat the RAM again?

> Would you think it's Okay to bring it back to 4 GiB ?

My guess is that it should be okay to do so. I'd say that many people run it on systems with that (or even lower) amount of RAM.

> Also, in between my initial report of the issue, there has been both, a Core and a Supervisor issue. I wonder if they might have fixed any potential Memory issue.

I am not aware of any recent issues in Core or Supervisor causing memory to leak, so likely not.

@gjobin
Author

gjobin commented Mar 4, 2024

Changed it back to 512 MiB / 4 GiB (minimum / maximum). The system is hanging again.
Moved to 1 GiB / 8 GiB; this is what I see on the hardware page:
image

It seems to me that your assumption is right.

And to further prove it, I set it back to 4GiB/4GiB without any issues.
image

Glad it's working for me now. And it seems that at least my issue is reproducible, which is always good news.
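
To make the pattern from these tests explicit, here is a small Python sketch. It only records the minimum/maximum memory pairs and outcomes reported in this thread for HAOS 12.0 and flags which ones use a fixed allocation (minimum equal to maximum). The `VmMemory` class and its field names are hypothetical and are not part of any TrueNAS API. The 1 GiB / 8 GiB run is left out because its outcome is only visible in a screenshot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmMemory:
    """Hypothetical holder for a VM's memory settings, in MiB."""
    minimum_mib: int
    maximum_mib: int

    @property
    def fixed(self) -> bool:
        # Minimum == maximum means there is no range left to balloon over.
        return self.minimum_mib == self.maximum_mib


# Settings and outcomes as reported in this thread for HAOS 12.0.
REPORTED = [
    (VmMemory(512, 4096), "hangs"),
    (VmMemory(4096, 4096), "stable"),
    (VmMemory(8192, 8192), "stable"),
]

for mem, outcome in REPORTED:
    kind = "fixed" if mem.fixed else "ballooning range"
    print(f"{mem.minimum_mib}/{mem.maximum_mib} MiB ({kind}): {outcome}")
```

In this small sample, every fixed allocation was stable and the one setting with a ballooning range hung, which is consistent with the ballooning theory but does not prove it.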

@kevtuning

Thank you for your investigation !

I have had the same issue with Proxmox since 12.0. I will definitely check my VM memory config when I'm back home. (I know that 4 GB is allocated, but I am not sure about the minimum, and I don't have access to it from the office.)

@redzoro01

Same issue when running 12.0 in Virtual Machine Manager on a Synology NAS. There is definitely a memory problem: several processes are killed by the kernel's Out Of Memory Killer, as you can see in the console messages. They are not always the same processes. Sometimes it is even impossible to get a console connection, and a complete power cycle of the virtual machine is required.
After a downgrade to 11.1 or 11.5, everything works fine.
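
If you can capture the console or `dmesg` output as text, a short Python sketch like the one below can list which processes the OOM killer picked. It assumes the report lines contain the phrase `Out of memory: Killed process <pid> (<name>)`, which is the wording recent Linux kernels use. The helper name `oom_victims` is made up for illustration.

```python
import re

# Matches the kernel's OOM-killer report, e.g.
# "Out of memory: Killed process 1234 (python3) total-vm:..."
OOM_RE = re.compile(r"Out of memory: Killed process (\d+) \(([^)]+)\)")


def oom_victims(log_text: str) -> list[tuple[int, str]]:
    """Return (pid, process name) for every OOM kill found in the log text."""
    return [(int(m.group(1)), m.group(2)) for m in OOM_RE.finditer(log_text)]
```

Feeding it a saved log, for example `oom_victims(open("console.log").read())` (the file name is a placeholder), shows whether the kills hit different processes each time, as described above.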

@vuisme

This comment was marked as off-topic.

@xoatrash

xoatrash commented Mar 18, 2024

Same here. Running on Synology VMM, and I have been getting random freezes every few hours since the latest update. It seems to run out of memory, because I once got that message in the console.

IMG_8754

And then I got problems like:

ha > [ 7413.438309] CIFS: VFS: server 192.168.20.2 does not advertise interfaces
[ 7413.440727] CIFS: VFS: server 192.168.20.2 does not advertise

or

ha > [30798.303328] systemd[1]: systemd-resolved.service: Watchdog timeout (limit 3min)!
[30899.577419] systemd-coredump[39466]: Process 109 (systemd-journal) of user 0 dumped core.

And sometimes there is no message, because the console is frozen.

This also happens with 12.1.


There hasn't been any activity on this issue recently. To keep our backlog manageable we have to clean old issues, as many of them have already been resolved with the latest updates.
Please make sure to update to the latest Home Assistant OS version and check if that solves the issue. Let us know if that works for you by adding a comment 👍
This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale label Jun 17, 2024
@github-actions github-actions bot closed this as not planned Jun 25, 2024