Add HEALTHCHECK to recover from dump1090 failures #29
Comments
Hi @mik3y, thanks for the idea and for the detail. I'm looking to implement this over the next day or so. When your container is unhealthy, can you let me know the exit code of You can issue the following:
I'm interested in the number that is printed from the Something else to consider - when your SDR disconnects/reconnects, does it have the same USB device path? It looks like from your Thanks.
OK, I've added the first iteration of the healthcheck script in 34bda93. I'm just waiting on the build to finish, however hopefully by the time you read this you should be able to try I'm also running this, however I use an external Beast provider (no USB radio mapped as-per https://github.com/mikenye/docker-readsb/wiki/Guide-to-ADS-B-Data-Receiving,-Decoding-and-Sharing,-Leveraging-RTLSDR-and-Docker), so I can't test for your exact use case. If you could run this in anger for a couple of days and let me know if this is stable and fixes your problem, it would be appreciated. Assuming all is good, I'll merge the changes into the master branch. Thanks!
Awesome! I will give this a try.
Exit status is
Good question! I'm stepping around that by giving the container the whole usb device tree:
Worked like a charm - almost! After updating to your tag, I restarted the services and then manually pulled the USB device. That triggered the wedge. Soon after, the health check reported its first failure:
Why it's only almost: Unfortunately I had a slight misunderstanding about this feature; docker does not automatically restart an unhealthy container. It seems a ~simple workaround folks have devised is to kill I guess that would work, but it does mean the first failure will immediately restart the container, as opposed to waiting for a
Instead of having the container kill itself, may I suggest that you consider willfarrell/autoheal instead (as-per the article you linked, and I also use this personally). I'm not sure that killing the init process is a good idea, and may break existing deployments if a user doesn't have Let me know your thoughts.
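For readers unfamiliar with the external-watchdog approach, here is a rough sketch of the general idea in plain shell. It is not autoheal's actual implementation, only an illustration that assumes the `docker` CLI is available and can reach the Docker daemon. The function names and the `WATCH` / `WATCH_INTERVAL` variables are hypothetical.

```shell
#!/bin/sh
# Sketch of an external watchdog: restart containers that Docker reports
# as unhealthy. Illustration only; names below are hypothetical.

# Print the IDs of running containers whose health status is "unhealthy".
list_unhealthy() {
    docker ps --quiet --filter "health=unhealthy"
}

# Restart each unhealthy container once and print the ID that was restarted.
restart_unhealthy() {
    for id in $(list_unhealthy); do
        docker restart "$id" >/dev/null && echo "restarted $id"
    done
}

# Hypothetical entry point: poll every WATCH_INTERVAL seconds when WATCH=1.
if [ "${WATCH:-0}" = "1" ]; then
    while true; do
        restart_unhealthy
        sleep "${WATCH_INTERVAL:-30}"
    done
fi
```

Because this acts on Docker's own health status, a container is only restarted after the healthcheck has failed the configured number of consecutive times, rather than on the first failed check.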
Oh yeah, that's a way better approach & decoupling. I'll give that a shot! Meantime I'll let this run for the next few weeks and report back if the health check doesn't do its thing.
Thanks very much!
Hi @mik3y, how's it been running for you?
No issues thus far!
OK cool. I'm going to merge these changes into master so they'll get built into the
Issue
I seem to have a slightly unreliable SDR (or perhaps pi/power supply). Once every week or two, the USB device disconnects and promptly reconnects. This can be seen in dmesg:
This seems to leave dump1090 in a bad state, in a loop like this:
Bouncing the container, i.e. docker-compose restart, does the trick every time.
Possible solutions
No question to me this is a bug in dump1090's device management, i.e. not reinitializing/re-probing after the disconnect or otherwise grabbing a handle to the 'new' device. But I don't have any familiarity with that code.
As a blunt instrument, we could instead add a Docker HEALTHCHECK directive in order to cause docker to detect and restart the unhealthy container.
One idea would be to run & parse the piaware-status status command. Looks like it detects both conditions:
Unhealthy state:
Healthy:
I started down the road of putting together a PR, but realized I don't know enough about how to do this correctly/without breaking other setups (e.g. whether any of the other stuff in piaware-status is relevant). Let me know what you think!
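To make the "run & parse piaware-status" idea concrete, here is a minimal sketch of what such a check could look like. It is not the healthcheck script that was eventually added to the image. The phrase in `UNHEALTHY_PATTERN` is an assumption about how piaware-status words the unhealthy state and should be adjusted to the real output. The function name and the `RUN_CHECK` switch are hypothetical.

```shell
#!/bin/sh
# Sketch of a healthcheck that inspects piaware-status output.
# ASSUMPTION: the unhealthy report contains the phrase below; adjust it
# to match what piaware-status actually prints on your system.
UNHEALTHY_PATTERN="${UNHEALTHY_PATTERN:-is NOT producing data}"

# Reads status text on stdin. Returns 1 if the unhealthy phrase appears,
# 0 otherwise (the two exit codes a Docker healthcheck command should use).
status_is_healthy() {
    if grep -qF -- "$UNHEALTHY_PATTERN"; then
        return 1
    fi
    return 0
}

# Hypothetical switch: only call the real command when RUN_CHECK=1, so the
# function above can be exercised without piaware installed.
if [ "${RUN_CHECK:-0}" = "1" ]; then
    command -v piaware-status >/dev/null 2>&1 || exit 1
    output=$(piaware-status 2>&1)
    [ -n "$output" ] || exit 1
    printf '%s\n' "$output" | status_is_healthy
    exit $?
fi
```

A script along these lines could be used as the command of a HEALTHCHECK instruction. Matching on a single phrase keeps the check narrow, so other lines in the piaware-status report do not affect the result.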