home-assistant.local thread border router disappears #3189

Closed
itpeters opened this issue Aug 23, 2023 · 6 comments

@itpeters

Describe the issue you are experiencing

Hi,

I'm experiencing an issue where my home-assistant.local Thread border router disappears from my Thread network details page. The message displayed is "No border routers where found, maybe the border router is not configured correctly. You can try to reset it to the factory settings." (Unrelated to this issue: "where" in that message should read "were".)

Screenshot 2023-08-23 14 27 00

I am using the Silicon Labs Multiprotocol add-on with the incorporated OTBR.

Even after this happens, my configured Matter over Thread devices continue to function correctly in HA. I have not tried to configure a new Matter over Thread device in this state, as I don't have any more of them.

I am able to fix the issue by restarting the Silicon Labs Multiprotocol add-on. Every time I have observed the border router disappearing, I have also found the add-on consuming 25% CPU (on a 4-core Intel NUC). Restarting the add-on both resolves the high CPU utilization and causes the border router to reappear, but the issue will come back within a few minutes.

What type of installation are you running?

Home Assistant OS

Which operating system are you running on?

Home Assistant Operating System

Which add-on are you reporting an issue with?

Silicon Labs Multiprotocol

What is the version of the add-on?

2.3.1

Steps to reproduce the issue

  1. Start HA
  2. Wait for a few minutes
  3. Observe that the previously available home-assistant.local border router is gone (see attached screenshot above)

System Health information

System Information

version core-2023.8.3
installation_type Home Assistant OS
dev false
hassio true
docker true
user root
virtualenv false
python_version 3.11.4
os_name Linux
os_version 6.1.45
arch x86_64
timezone America/Denver
config_dir /config
Home Assistant Community Store
GitHub API ok
GitHub Content ok
GitHub Web ok
GitHub API Calls Remaining 5000
Installed Version 1.32.1
Stage running
Available Repositories 1269
Downloaded Repositories 19
Home Assistant Cloud
logged_in true
subscription_expiration September 19, 2023 at 6:00 PM
relayer_connected true
relayer_region us-east-1
remote_enabled true
remote_connected true
alexa_enabled false
google_enabled true
remote_server us-east-1-8.ui.nabu.casa
certificate_status ready
can_reach_cert_server ok
can_reach_cloud_auth ok
can_reach_cloud ok
Home Assistant Supervisor
host_os Home Assistant OS 10.5
update_channel stable
supervisor_version supervisor-2023.08.1
agent_version 1.5.1
docker_version 23.0.6
disk_total 228.5 GB
disk_used 49.1 GB
healthy true
supported true
board generic-x86-64
supervisor_api ok
version_api ok
installed_addons File editor (5.6.0), Terminal & SSH (9.7.1), Node-RED (14.4.5), Home Assistant Google Drive Backup (0.111.1), Z-Wave JS UI (1.15.6), Studio Code Server (5.10.1), ESPHome (2023.8.2), Mosquitto broker (6.2.1), MQTT Explorer (browser-1.0.1), Frigate Proxy (1.3), SQLite Web (3.9.2), Advanced SSH & Web Terminal (15.0.7), Silicon Labs Multiprotocol (2.3.1), Matter Server (4.9.0)
Dashboards
dashboards 5
resources 7
views 3
mode storage
Recorder
oldest_recorder_run August 16, 2023 at 2:39 PM
current_recorder_run August 23, 2023 at 2:53 AM
estimated_db_size 237.90 MiB
database_engine sqlite
database_version 3.41.2

Anything in the Supervisor logs that might be useful for us?

Nothing seems relevant or correlated to the right time range.

Anything in the add-on logs that might be useful for us?

otbr-agent[300]: 00:00:00.105 [N] RoutingManager: BR ULA prefix: fd15:cc5d:48de::/48 (loaded)
otbr-agent[300]: 00:00:00.105 [N] RoutingManager: Local on-link prefix: fdc9:e5ac:bb3b:5de4::/64
otbr-agent[300]: 00:00:00.124 [N] Mle-----------: Role disabled -> detached
otbr-agent[300]: 00:00:00.144 [N] Platform------: [netif] Changing interface state to up.
s6-rc: info: service otbr-agent successfully started
s6-rc: info: service otbr-agent-rest-discovery: starting
[13:37:56] INFO: Successfully sent discovery information to Home Assistant.
s6-rc: info: service otbr-agent-rest-discovery successfully started
s6-rc: info: service legacy-services: starting
s6-rc: info: service legacy-services successfully started
Listening on port 9999 for connection...
Accepting connection.
Accepted connection 7.
otbr-agent[300]: 00:00:26.938 [N] Mle-----------: RLOC16 2800 -> fffe
otbr-agent[300]: 00:00:26.941 [W] Platform------: [netif] Failed to process request#5: Unknown error -95
otbr-agent[300]: 00:00:27.658 [N] Mle-----------: Attach attempt 1, AnyPartition reattaching with Active Dataset
otbr-agent[300]: 00:00:34.158 [N] RouterTable---: Allocate router id 10
otbr-agent[300]: 00:00:34.158 [N] Mle-----------: RLOC16 fffe -> 2800
otbr-agent[300]: 00:00:34.161 [N] Mle-----------: Role detached -> leader
otbr-agent[300]: 00:00:34.161 [N] Mle-----------: Partition ID 0x10b144f8
otbr-agent[300]: 00:00:34.168 [W] Platform------: [netif] Failed to process request#6: Unknown error -17
otbr-agent[300]: [NOTE]-BBA-----: BackboneAgent: Backbone Router becomes Primary!
Default: mDNSPlatformSendUDP got error 99 (Cannot assign requested address) sending packet to ff02::fb on interface fe80::a84b:5ff:feff:29d4/vethb145f91/363
Default: mDNSPlatformSendUDP got error 99 (Cannot assign requested address) sending packet to ff02::fb on interface fe80::a84b:5ff:feff:29d4/vethb145f91/363
Default: mDNSPlatformSendUDP got error 99 (Cannot assign requested address) sending packet to ff02::fb on interface fe80::a84b:5ff:feff:29d4/vethb145f91/363
Default: mDNSPlatformSendUDP got error 99 (Cannot assign requested address) sending packet to ff02::fb on interface fe80::a84b:5ff:feff:29d4/vethb145f91/363
Default: mDNSPlatformSendUDP got error 99 (Cannot assign requested address) sending packet to ff02::fb on interface fe80::a84b:5ff:feff:29d4/vethb145f91/363
Default: mDNSPlatformSendUDP got error 99 (Cannot assign requested address) sending packet to ff02::fb on interface fe80::a84b:5ff:feff:29d4/vethb145f91/363
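For anyone triaging a similar log, the excerpt above can be condensed into role transitions, warnings, and repeated mDNS send errors. The following is only a sketch: the `summarize` helper and its regular expressions are hypothetical, written against the line shapes visible in this excerpt, and are not part of the add-on or of OpenThread.

```python
import re
from collections import Counter

# A few lines copied verbatim from the add-on log in this issue.
LOG = """\
otbr-agent[300]: 00:00:00.124 [N] Mle-----------: Role disabled -> detached
otbr-agent[300]: 00:00:26.941 [W] Platform------: [netif] Failed to process request#5: Unknown error -95
otbr-agent[300]: 00:00:34.161 [N] Mle-----------: Role detached -> leader
otbr-agent[300]: 00:00:34.168 [W] Platform------: [netif] Failed to process request#6: Unknown error -17
Default: mDNSPlatformSendUDP got error 99 (Cannot assign requested address) sending packet to ff02::fb on interface fe80::a84b:5ff:feff:29d4/vethb145f91/363
Default: mDNSPlatformSendUDP got error 99 (Cannot assign requested address) sending packet to ff02::fb on interface fe80::a84b:5ff:feff:29d4/vethb145f91/363
"""

# Patterns are assumptions based only on the lines shown in this issue.
ROLE_RE = re.compile(r"Mle-+: Role (\w+) -> (\w+)")
MDNS_ERR_RE = re.compile(r"mDNSPlatformSendUDP got error (\d+) \(([^)]*)\)")
WARN_RE = re.compile(r"\[W\] (.*)$")


def summarize(log_text):
    """Collect role transitions, warning messages, and mDNS send-error counts."""
    roles = []
    warnings = []
    mdns_errors = Counter()
    for line in log_text.splitlines():
        match = ROLE_RE.search(line)
        if match:
            roles.append((match.group(1), match.group(2)))
            continue
        match = MDNS_ERR_RE.search(line)
        if match:
            # Key on (errno, message) so repeated identical errors are counted.
            mdns_errors[(int(match.group(1)), match.group(2))] += 1
            continue
        match = WARN_RE.search(line)
        if match:
            warnings.append(match.group(1))
    return {"roles": roles, "warnings": warnings, "mdns_errors": dict(mdns_errors)}


summary = summarize(LOG)
print(summary)
```

Run against the full add-on log, a summary like this makes it easier to see whether the mDNS send errors begin after the agent becomes leader, as they do in the excerpt.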

Additional information

@agners Opening a new issue per your comment on #3139. I'm currently in the strange state and I'm able to reproduce it seemingly at will, so please let me know if there's more information you need -- I'm happy to iterate on things!

@itpeters
Author

The 2.3.2 update containing #3169 just showed up for me, but I'm going to hold off on updating until I hear something on this just in case there's value in debugging the current state.

@agners
Member

agners commented Aug 24, 2023

It seems that mdnsd being at 100% CPU really makes it unusable for the otbr-agent (e.g. the agent tries to publish a service via mDNS but fails, see #3139 (comment)). I think it's not worth continuing to investigate with the old version. Let's try 2.3.2 with the latest mDNSResponder and see how things behave in that version.
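The 25% figure reported for the add-on and the 100% figure for mdnsd are consistent if the add-on figure is averaged over all four cores, which this issue does not state and is assumed here. A minimal sketch of that conversion, using a hypothetical helper name:

```python
def single_core_percent(total_percent, cores):
    """Convert a CPU figure averaged over all cores into the share of one core.

    Assumes the reported figure is normalized across `cores` cores.
    """
    return total_percent * cores


# 25% of a 4-core machine corresponds to one fully busy core.
busy = single_core_percent(25, 4)
print(busy)  # 100
```

Under that assumption, a single-threaded process spinning on one core would show up as exactly the 25% observed on the 4-core NUC.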

@itpeters
Author

I updated to 2.3.2 at 03:35 local time, and it's now 08:00 here, so roughly 4.5 hours have passed. Everything has been working correctly since the upgrade. In contrast, yesterday while running 2.3.1, the border router would stop being recognized by HA after no more than 30 minutes.

@itpeters
Author

Everything is still good here a day later.

@itpeters
Author

itpeters commented Sep 4, 2023

Still rock solid after the update to the multiprotocol add-on 2.3.2. It seems this did have the same root cause as the 100% CPU utilization issue. Happy to have this closed, or to close it myself if no one objects.

@agners
Member

agners commented Sep 4, 2023

Good to hear, thanks for the update!

@agners closed this as not planned on Sep 4, 2023