All send attempts fail after aborting SendData with timed out callback #6260

Closed
AlCalzone opened this issue Sep 7, 2023 · 52 comments · Fixed by #6296 or #6343
Labels
bug Something isn't working
Projects

Comments

@AlCalzone
Member

AlCalzone commented Sep 7, 2023

https://gist.github.com/robarnold/c94d61f91fa4c78002e87e4805c4ca95 has a log that shows this. We should consider soft-resetting the controller when this happens for a longer stretch of time.

@AlCalzone AlCalzone added the bug Something isn't working label Sep 7, 2023
@zwave-js-bot zwave-js-bot added this to Needs triage in Triage Sep 7, 2023
@Qwerty1979Swe

I have this exact problem.
I have been struggling for days trying to solve this. My zwave-js crashes multiple times a day, so I really can't use it.
I have tried changing from a PC to an RPi 4, a new Z-Stick, long and short USB cables, and a powered USB hub.
My log files look exactly like this one.
Do you know of a fix, or do I need to restart zwave-js multiple times a day?

Many Thanks Jonas

@AlCalzone
Member Author

@Qwerty1979Swe can you share your log? On second look the controller in the log above actually failed to send a callback before the infinite loop.

@AlCalzone
Member Author

Another log: https://github.com/home-assistant/core/files/12593063/zwavejs_current.log
Again, the controller got stuck and failed to send a callback. After that, all send attempts fail immediately.

@AlCalzone AlCalzone changed the title Controller jammed can result in an infinite loop All send attempts fail after aborting SendData with timed out callback Sep 13, 2023
@wljohnson05

This could be the same issue I've had since the beginning of August as well.

@wljohnson05

I just wanted to update this thread with my specs since I'm still having the problem...

Home Assistant is running in a VM on UnRAID version 6.12.4
Home Assistant 2023.9.2
Supervisor 2023.09.2
Operating System 10.5
Frontend 20230911.0 - latest

zwave-js-ui: 8.25.1
zwave-js: 11.14.2

Aeotec Z-Stick 7 - firmware version 7.19.3

@AlCalzone
Member Author

I have a plan on how to work around this in node-zwave-js, but I'm not sure it will work first try. Also it's entirely possible that the 7.19.3 firmware is bugged.

@wljohnson05

> I have a plan on how to work around this in node-zwave-js, but I'm not sure it will work first try. Also it's entirely possible that the 7.19.3 firmware is bugged.

I'm up for testing anything I need to at this point since mine has been broken since at least the beginning of August.

The same bug would have to exist in older firmware as well, though. I had the issue on my original Z-Stick, which was running a different version (honestly can't remember the exact numbers at this point) when this started. I upgraded its firmware at that point during some of my troubleshooting. Then, when I first got this stick to replace it, thinking maybe the stick itself was failing, it was running another older version as well. I upgraded it to the current version available and still have the issue. There's a possibility that there's a firmware bug, but I hadn't upgraded the old Z-Stick in probably years when this issue first popped up for me.

I do have one question. Is it possible to back up the z-stick and restore to a different brand stick? Like if I picked up another 700 series stick, could I restore to that? That might be a last ditch effort if we can't find a fix. I'd rather not buy yet another stick, but if there is no software fix, I'd probably try more physical troubleshooting.

@AlCalzone
Member Author

AlCalzone commented Sep 15, 2023

Yeah. With zwave-js you can even go back to 500 series, as long as the SDK version of the target firmware is 6.61 or higher.
Use NVM backup and restore in zwave-js ui.
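
The version constraint above (target firmware SDK 6.61 or higher) can be checked before attempting a restore. The following is only a minimal sketch: the helper names `compareVersions` and `isValidRestoreTarget` are hypothetical and not part of zwave-js, and it assumes SDK versions are ordered by comparing their dot-separated components numerically.

```typescript
// Hypothetical helpers for illustration; none of these names exist in zwave-js.

// Splits "7.19.3" into [7, 19, 3].
function parseVersion(version: string): number[] {
  return version.split(".").map((part) => Number.parseInt(part, 10));
}

// Returns -1, 0 or 1 depending on whether a is lower than, equal to or higher than b.
// Missing components are treated as 0, so "6.61" equals "6.61.0".
function compareVersions(a: string, b: string): number {
  const pa = parseVersion(a);
  const pb = parseVersion(b);
  const len = Math.max(pa.length, pb.length);
  for (let i = 0; i < len; i++) {
    const diff = (pa[i] ?? 0) - (pb[i] ?? 0);
    if (diff !== 0) return Math.sign(diff);
  }
  return 0;
}

// Minimum SDK version of the target firmware, as stated in the comment above.
const MIN_RESTORE_TARGET_SDK = "6.61";

function isValidRestoreTarget(targetSdkVersion: string): boolean {
  return compareVersions(targetSdkVersion, MIN_RESTORE_TARGET_SDK) >= 0;
}
```

With this sketch, `isValidRestoreTarget("7.19.3")` and `isValidRestoreTarget("6.61")` return `true`, while `isValidRestoreTarget("6.51.10")` returns `false`. The actual backup and restore is still done through the NVM backup and restore function in Z-Wave JS UI.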

@bubbzy

bubbzy commented Sep 16, 2023

@wljohnson05 I have the exact same problem and the exact same setup and versions. Tried updating the stick to firmware 7.19.4 today, but that didn't fix the problem...

Have you guys made any progress?

@bubbzy

bubbzy commented Sep 16, 2023

There seems to be a version 7.20.1.0 available, but according to the dates it was released before the one marked as "latest", which is 7.18.8.0.

I don't get it. Is it worth trying?

https://github.com/SiliconLabs/gecko_sdk/releases

@wljohnson05

wljohnson05 commented Sep 17, 2023

> There seems to be a version 7.20.1.0 available, but according to the dates it was released before the one marked as "latest", which is 7.18.8.0.

> I don't get it. Is it worth trying?

> https://github.com/SiliconLabs/gecko_sdk/releases

I just stuck with the newest guide on the aeotec site that I could find. That had the 7.19.3 on it.
https://aeotec.freshdesk.com/support/solutions/articles/6000263744-update-z-stick-7-with-z-wavejs-ui

I don't mind trying the newer version on there though if something pops up with a higher version number. What's the worst to happen...it already doesn't work.

@bubbzy

bubbzy commented Sep 17, 2023

True, but are we sure the problem is in the stick's firmware?

I would gladly buy another stick and back up and move if I knew it would work. Is there a better stick out there without this problem?

Or is it just a matter of waiting for Z-Wave JS UI to update? I've been having trouble for several weeks now and am really sick of it😃

@wljohnson05

> True, but are we sure the problem is in the stick's firmware?

> I would gladly buy another stick and back up and move if I knew it would work. Is there a better stick out there without this problem?

> Or is it just a matter of waiting for Z-Wave JS UI to update? I've been having trouble for several weeks now and am really sick of it😃

I don't think anyone knows. That was just a potential issue. Mine has been acting up since before August. I have z-wave for every switch in my house with tons of automation built into double/triple taps that is basically unusable now after an hour or so. WAF has basically gone to zero on that. I definitely feel your pain on it not working. My z-stick is actually a new one that I bought while doing my initial troubleshooting (same as my original stick). Yesterday I went ahead and ordered the zooz stick that I'm going to try as well. At least if I'm still having the issues, we'll have more evidence towards it being a software issue with the plugin instead of the stick. Then I'll just have to figure out how to use the extra ones somewhere else.

@bubbzy

bubbzy commented Sep 17, 2023

I have had it for about as long, but I didn't realize it because I rebuilt my Z-Wave network from scratch with a new stick. I was having some problems with my old one, but apparently I walked right into bigger trouble...

Please tell me right away if the new stick works for you, and whether the backup/restore process from one stick to another works or gives you pain...

Since the other, similar issue got closed, is no work being done to solve this?

@wljohnson05

wljohnson05 commented Sep 17, 2023

> I have had it for about as long, but I didn't realize it because I rebuilt my Z-Wave network from scratch with a new stick. I was having some problems with my old one, but apparently I walked right into bigger trouble...

> Please tell me right away if the new stick works for you, and whether the backup/restore process from one stick to another works or gives you pain...

> Since the other, similar issue got closed, is no work being done to solve this?

Will do. The only one I've been in any correspondence with is @AlCalzone, but I don't know if he's on the zwave js UI side or the zwave js side of development. He said it should be fine to restore to a different 700 series stick.

EDIT: Looks like AlCalzone is a zwave js developer, so hopefully he can find something.

@bubbzy

bubbzy commented Sep 17, 2023

Ok thanks! To get to an answer quicker, do you want me to test with yet another stick?

You went for the Zooz. I don't really know which ones are available except these two?

The only positive with this problem is that my network has never been faster (when it works). All the troubleshooting has led to perfecting everything... too bad it crashes all the time😂😂

@wljohnson05

wljohnson05 commented Sep 17, 2023

> Ok thanks! To get to an answer quicker, do you want me to test with yet another stick?

> You went for the Zooz. I don't really know which ones are available except these two?

> The only positive with this problem is that my network has never been faster (when it works). All the troubleshooting has led to perfecting everything... too bad it crashes all the time😂😂

I'd say probably just wait. If a different stick shows the same issue, I'd say that proves more that it's a plugin related issue and not the sticks. If the zooz stick works fine, then we'll know it's a firmware issue that aeotec will have to fix.

Looks like my zooz stick will be here Wednesday, so I'll be testing it then once it arrives.

@wljohnson05

The Zooz stick showed up today. I got my backup restored to it and everything has been up and running for about 80 minutes. We'll see how it is later tonight after it's been up and running for a few hours.

@wljohnson05

wljohnson05 commented Sep 19, 2023

Well...300 minutes of home assistant being up, z-wave has stopped working again. I'm seeing the same jammed messages in the log, so I'd say we've proved that two different sticks are running into the same symptoms. @AlCalzone, did you find a fix in the code?

EDIT: Attached current log

zwave-js-ui-store.zip

@wljohnson05

As the last thing I can try, I've stood up a bare-metal Home Assistant server and restored a backup to it. I'm going to run that for the day and see if I have the same issues.

@AlCalzone
Member Author

@wljohnson05 is this also on firmware 7.19.3? So far all reports of this that I've seen were on that version.

I was on vacation the last 7 days, will try to implement a workaround soon.

@wljohnson05

> @wljohnson05 is this also on firmware 7.19.3? So far all reports of this that I've seen were on that version.

> I was on vacation the last 7 days, will try to implement a workaround soon.

This stick is running 7.19.2. I looked, but I couldn't find a version of 7.19.3 for that stick. Closest thing I could see to the other is a file with "7.18.3" in the name. Not sure if that's a typo or what, so I didn't try to push that.
https://www.support.getzooz.com/kb/article/931-how-to-perform-an-ota-firmware-update-on-your-zst10-700-z-wave-stick/

@AlCalzone
Member Author

> This stick is running 7.19.2

Oh man, then the entire 7.19 release line is just fucked.

@wljohnson05

>> This stick is running 7.19.2

> Oh man, then the entire 7.19 release line is just fucked.

Oh so you're thinking that was intentional, and that 7.19 has a major bug that's so bad that they are rolling back to 7.18?

@AlCalzone
Member Author

Not sure, but I don't think Zooz ever officially recommended 7.19

@wljohnson05

>> So it will essentially act the same way as when mine stops working and I go in and do a soft reset on it? Do you need logs of that whole process of me doing that?

> Would be nice.

>> I guess the only downside to a soft reset is that it will take the z-wave network down for a minute or two

> Normally, this should take on the order of a few seconds. Soft-reset just restarts the stick, not the entire stack. There's a command for this in Z-Wave JS UI in the advanced panel.

>> I hadn't updated my stick firmware for probably years

> Unlikely, since 7.19 came out in December 2022: https://github.com/SiliconLabs/gecko_sdk/releases/tag/v4.2.0 and the likely first release breaking this came out in March 2023: https://github.com/SiliconLabs/gecko_sdk/releases/tag/v4.2.2

>> there was an update somewhere in either zwavejs [...]

> I recently added detection for a jammed stick with retries. Before that, the node would just be marked dead when in reality the controller was the issue. What you're seeing now is an unfortunate infinite loop because transmitting is retried when the controller is considered jammed. The assumption here is that it eventually stops being jammed / is able to transmit again, but in this case the controller is just in a state it never recovers from on its own.

I just got debug logging to a file set back up on it, and next time it acts up, I'll try just doing a soft reset from the GUI and see how that reacts and logs.

I hadn't updated that original stick I had until after I ran into issues. I honestly don't remember what it was on before that though. I hadn't had any reason to update it probably since I bought it when I moved to home assistant from homeseer running a z-net device (probably the best move I decided to do in the home automation world). That hasn't been years, but probably at least early last year I believe.

That makes sense. I had actually been running an automation that I found that kept track of any dead nodes and pings them if that number of devices changes, so I guess that's why I never really had an issue before.
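
For readers following along, the failure mode quoted above (retrying while the controller is considered jammed, with no upper bound) and the proposed workaround (soft-resetting when the jam persists) can be sketched roughly as follows. This is an illustration only, not the code in zwave-js: the names `SendResult`, `JamRecoveryOptions` and `sendWithJamRecovery`, as well as both thresholds, are hypothetical.

```typescript
// Hypothetical sketch; not the zwave-js implementation.

type SendResult = "ok" | "jammed";

interface JamRecoveryOptions {
  /** Attempts one transmission and reports whether the controller was jammed. */
  send: () => Promise<SendResult>;
  /** Restarts the controller, for example through a soft reset. */
  softReset: () => Promise<void>;
  /** Consecutive jammed attempts to tolerate before escalating to a reset. */
  maxJammedAttempts: number;
  /** Soft resets to try before giving up. */
  maxResets: number;
}

// Resolves to true once a transmission succeeds, or false when the
// controller stays jammed even after the allowed number of resets.
async function sendWithJamRecovery(opts: JamRecoveryOptions): Promise<boolean> {
  let resets = 0;
  let jammedAttempts = 0;
  while (true) {
    if ((await opts.send()) === "ok") return true;
    jammedAttempts++;
    // Plain retry: assume the jam might clear on its own.
    if (jammedAttempts < opts.maxJammedAttempts) continue;
    // The controller did not recover on its own. Give up once the reset
    // budget is used up, so the loop is bounded instead of infinite.
    if (resets >= opts.maxResets) return false;
    await opts.softReset();
    resets++;
    jammedAttempts = 0;
  }
}
```

The difference from the behavior described in the quote is the two bounds: without `maxJammedAttempts` and `maxResets`, a controller that never recovers keeps the loop spinning forever. A real driver would presumably also wait between attempts and measure elapsed time rather than count attempts.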

@robarnold

Should I be able to manually soft reset for now on 11.x when this occurs? Because that hasn't worked for me - I can leave the stick plugged in and restart ZUI to get it to start working again.

@asayler

asayler commented Sep 22, 2023

I recently updated my Zooz ZAC93 controller from FW 1.0 (SDK 7.18.?, I think) to 1.2 (SDK 7.19.3) and I believe I've begun to see this same issue. I opened #6300 before I saw this issue and have attached logs and other details there. Zooz recently released the 1.2 (7.19.3) firmware at https://www.support.getzooz.com/kb/article/1158-zooz-ota-firmware-files/, so I expect a number of folks will be upgrading and hitting this issue. I can try to roll back to 1.1 (7.18.3) to see if that fixes it.

@AlCalzone
Member Author

> I can try to roll back to 1.1

Sadly, you can't.

@AlCalzone
Member Author

> Should I be able to manually soft reset for now on 11.x when this occurs?

I'm not sure if Z-Wave JS actually performs the soft-reset in that case, since it is still busy trying to get the other command done.

@wljohnson05

wljohnson05 commented Sep 22, 2023

@AlCalzone, I'm seeing it work with the soft reset using the UI. It takes a little bit of time for it to actually reset, but eventually it came back to working, and any commands I had put in caught up. It should be near the bottom of the log since I just did the fix this morning.
Attachments: error, zwave versions, zwavejs_current.zip

I will say that it seems like the controller goes back to being unable to transmit pretty quickly though. When I restart the z-wave js UI plugin it usually works for several hours before seeing it have issues. The soft reset doesn't seem to clear that up as well (at least doing it manually).

@AlCalzone
Member Author

Thanks for the test. Looks like the soft-reset helped for about an hour, so the workaround will at least do something.

@wljohnson05

> Thanks for the test. Looks like the soft-reset helped for about an hour, so the workaround will at least do something.

Yeah, and when the code is handling the resets, it might not even be noticeable that it's happening since that would catch it before a human would most likely.

@txwindsurfer

Apologies if this isn't helpful but I am not having any of these issues on a 36-node network controlled by a Zooz ZST39 running firmware 1.20 SDK 7.19.3. HA 2023.9.2, Supervisor 10.5 on a Home Assistant Blue. Also, no problems on a 23-node network controlled by a Zooz ZAC93 (same firmware/SDK) on a Home Assistant Yellow. Both locations running Z-Wave-JS UI. Pretty much rock solid for over 2 months in both locations. My experience would indicate that this is not an issue with 7.19.3 on Zooz 800 sticks.

@asayler

asayler commented Sep 23, 2023

@txwindsurfer I'm using a Zooz ZAC93 (which is the serial version of your ZST39) with a 95-node network and the issue is definitely present. It started as soon as I upgraded to the 1.20 (7.19.3) firmware from the stock 1.0 firmware. So the issue does impact Zooz 800-series sticks. There's likely some other variable at play here (perhaps certain types of messages being sent across the network) that triggers the controller to enter this state, and that must not occur in your setup. But I don't think it's because the Zooz 800 gear is immune to the problem.

@asayler

asayler commented Sep 23, 2023

I don't suppose there's a way to trigger the soft reset from within HA that anyone knows of? I have a script that currently power cycles my HA Yellow when this starts to occur, but it would be faster to just soft reset the controller as a stopgap if there's a way to do that from within HA.

@asayler

asayler commented Sep 23, 2023

Also, has anyone reported the issue to Zooz?

@millercentral

FWIW, I have this issue with an Aeotec Z-Stick 700 on 7.19.2, using Zwave JS addon in HA. The stick had been flashed to 7.19 last December and the network had been stable until early August when this issue was first noticed. I rolled back the HAOS Zwave JS addon to 0.186 (which I think uses ZWave JS 11.9.2) and haven't seemed to have the issue since.

@AlCalzone
Member Author

> Also, has anyone reported the issue to Zooz?

Yep

@madbrain76

I'm having this problem on a ZST39 800LR with FW: v1.20, SDK: v7.19.3 .
I can trigger the condition at will by trying to flash my most distant ZEN76 800LR switch. The flashing never completes, and the stick ends up in a bad state.

@AlCalzone
Member Author

AlCalzone commented Sep 27, 2023

See my comment here: Z-Wave JS v12 includes a workaround, and HA and Z-Wave JS UI will pick this version up very soon.

The actual fix needs to happen in the controller firmware - I'm told Silabs are on it and have it as a top priority.

If this continues to be an issue after updating, please open a new issue.

@jamiepenney

This comment was marked as off-topic.

@wljohnson05

@AlCalzone, sorry to report... this didn't fix the issue for me. It takes about the same amount of time before everything just stops getting commands. I've attached the log, which covers from when I upgraded zwave js ui earlier today up through a few minutes ago. Looking at the log, there are times where it said it was jammed, and then no longer jammed. I'm not sure if that is your workaround doing its job or what, but eventually every command fails. Even manually triggering a soft reset in the GUI doesn't fix it.
Attachments: zwavejs_current.zip, firefox_JOqWc9yOaR, firefox_Wf0jwkYhvG

@AlCalzone
Member Author

@wljohnson05 looks like the recovery mechanism isn't working properly if there are still commands on the queue --> #6342

@AlCalzone AlCalzone reopened this Sep 29, 2023
Triage automation moved this from Closed to Needs triage Sep 29, 2023
Triage automation moved this from Needs triage to Closed Sep 29, 2023
@wljohnson05

Looks like with the latest release of HA there was a z-wave js update that came along with it. Since yesterday my z-wave network has been running fine.

Projects
No open projects
Triage
Closed
10 participants