Error response from daemon: failed to listen to abstract unix socket "/containerd-shim/moby/<uuid>/shim.sock": listen unix /containerd-shim/moby/<uuid>/shim.sock: bind: address already in use: unknown #643

kolbitsch-lastline opened this issue Apr 6, 2019 · 23 comments

Comments

@kolbitsch-lastline

  • This is a bug report
  • This is a feature request
  • I searched existing issues before opening this one

I run containers using the "restart always" policy, but in some situations (the trigger is unclear to me at this point), a subset of containers fail to be restarted by the docker daemon.

In this example, I have a bunch of services that all have (almost) identical configs, and a random subset of service containers is suddenly down (after days of running fine):

root@analyst:~# docker ps | grep worker.000.802
root@analyst:~# docker ps -a | grep worker.000.802
6d504138f7f7        <my-image>    "/entrypoint.sh"         4 weeks ago          Exited (255) 8 days ago                                                                                                                                                                                                                                                                             worker-000-802_1

Other container instances of the service are running fine (and are restarted every once in a while):

root@analyst:~# docker ps | grep worker.000.801
832f53c0f4ce        <my-image>    "/entrypoint.sh"         4 weeks ago          Up 28 minutes                                                                                                                                                                                                                                                                                 worker-000-801_1

When I try (for testing) to manually restart the container that the daemon failed to restart automatically, this fails:

root@analyst:~# docker start 6d504138f7f7
Error response from daemon: failed to listen to abstract unix socket "/containerd-shim/moby/6d504138f7f7ddcd57437006a3a6e70ec4c8ed32c08b5969d788f24eef28f51f/shim.sock": listen unix /containerd-shim/moby/6d504138f7f7ddcd57437006a3a6e70ec4c8ed32c08b5969d788f24eef28f51f/shim.sock: bind: address already in use: unknown
Error: failed to start containers: 6d504138f7f7

Investigating the problem, I found that the unix socket mentioned above does not exist on the file-system, but the error message says "already in use", so I searched via lsof:

root@analyst:~# lsof -U | grep 6d504138f7f7ddcd57437006a3a6e70ec4c8ed32c08b5969d788f24eef28f51f
docker-co 37032            root    3u  unix 0xffff88030db67800      0t0 502614215 @/containerd-shim/moby/6d504138f7f7ddcd57437006a3a6e70ec4c8ed32c08b5969d788f24eef28f51f/shim.sock
docker-co 37032            root    6u  unix 0xffff880dd67da1c0      0t0 323429479 @/containerd-shim/moby/6d504138f7f7ddcd57437006a3a6e70ec4c8ed32c08b5969d788f24eef28f51f/shim.sock

So the socket is indeed in use, but not visible on the file-system... which makes me wonder whether the process (PID 37032) removed it but hasn't properly closed it (yet?) while shutting down.
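
For what it's worth, the leading '@' in the lsof output indicates an abstract unix socket, which never appears on the filesystem; it only exists in the kernel's namespace. It can also be checked via /proc/net/unix (a sketch, with the container ID left as a placeholder):

grep containerd-shim/moby/<container-id>/shim.sock /proc/net/unix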

Stracing the process shows that it's currently waiting on a futex:

root@analyst:~# strace -p 37032
Process 37032 attached
futex(0x7fd008, FUTEX_WAIT, 0, NULL

with no other behavior.

To test further, I decided to kill the process that's supposed to provide the unix socket, and now I can start the container successfully:

root@analyst:~# kill 37032

root@analyst:~# docker start 6d504138f7f7
6d504138f7f7
root@analyst:~# docker ps -a | grep worker.000.802
6d504138f7f7        <my-image>    "/entrypoint.sh"         4 weeks ago          Up 3 seconds                                                                                                                                                                                                                                                                                        worker-000-802_1
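
For completeness, the workaround condensed into a rough sketch (assuming $cid holds the full container ID from the error message):

cid=6d504138f7f7ddcd57437006a3a6e70ec4c8ed32c08b5969d788f24eef28f51f
pid=$(lsof -U | grep "$cid" | awk '{print $2}' | sort -u)   # PID of the stuck shim
kill "$pid"
docker start "$cid"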

Expected behavior

Docker restart policy "always" always restarts a container.

Actual behavior

Docker restart policy "always" randomly fails after a service has been running for longer periods of time (maybe because containerd does not correctly terminate/release the unix socket).

Steps to reproduce the behavior

I have not been able to trigger the problem in a reproducible way, but I have seen dozens of instances over weeks of running services. Interestingly, it happens on different services that use completely unrelated images (aside from the fact that they share a common Debian-based base image).

Output of docker version:

Client:
 Version:           18.06.1-ce
 API version:       1.38
 Go version:        go1.10.3
 Git commit:        5f88b8b
 Built:             Fri Sep 28 15:50:02 2018
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.06.1-ce
  API version:      1.38 (minimum version 1.12)
  Go version:       go1.10.3
  Git commit:       5f88b8b
  Built:            Fri Sep 28 15:49:28 2018
  OS/Arch:          linux/amd64
  Experimental:     false

Output of docker info:

Containers: 51
 Running: 50
 Paused: 0
 Stopped: 1
Images: 7
Server Version: 18.06.1-ce
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 134
 Dirperm1 Supported: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 468a545b9edcd5932818eb9de8e72413e616e86e
runc version: 69663f0bd4b60df09991c08812a60108003fa340
init version: fec3683
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 3.13.0-157-generic
Operating System: Ubuntu 14.04.5 LTS
OSType: linux
Architecture: x86_64
CPUs: 12
Total Memory: 62.87GiB
Name: analyst
ID: AKNM:4XYS:MIJI:G2E6:5DRO:MP2I:Q2MY:CXPE:WDJW:MI4D:WS32:O3ON
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support

Physical host, under constant and high load. The containers that show the problem have memory limits in place using docker-compose:

version: "2.4"

services:
  worker-000-801:
    image: "<my-image>"
    network_mode: "host"
...
    mem_limit: 4294967296

Note that I'm using a private container registry, which is why I decided to replace the image data with my-image.
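
(For reference, 4294967296 bytes is 4 GiB; in the 2.x compose file format this could also be written as mem_limit: 4g.)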

The only potentially-related bug I managed to find online is this:

moby/moby#38726

@1028866041

I've got the same issue in my project...

@slhck

slhck commented May 24, 2019

Same problem here with Docker updates that don't restart the containers.

Running docker rm for the affected containers and re-creating them works, but is not ideal.
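
A sketch of that workaround, assuming a compose-managed service (container and service names are placeholders):

docker rm <container>
docker-compose up -d <service>   # re-creates the container from the compose file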

Edit: Gave myself a +1 a few months later because I had the same issue and found my own answer as a solution…

@sergiomafra

Same problem here with Docker updates that don't restart the containers.

Running docker rm for the affected containers and re-creating them works, but is not ideal.

I've got the same problem here and I can't remove it, unfortunately.

@ravensorb

Any update on this? I too am hitting it.

@sergiomafra

Any update on this? I too am hitting it.

The solution that worked for me was to destroy the container and create a new one with the same volume as the old one.
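
A rough sketch of that approach, with all names (container, volume, mount path, image) as placeholders:

docker stop <old-container>
docker rm <old-container>
docker run -d --name <new-container> -v <volume-name>:<mount-path> <my-image>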

@windli2018

Finding the docker process that holds the socket and killing it resolves the issue:

# set uuid to the full container ID from the error message first
for pid in $(ps -ef | grep docker | awk '{print $2}'); do
  lsof -p "$pid" | grep "$uuid"
done
# then kill the PID whose open files list the shim.sock
kill -9 <pid>

@chenz-svsarrazin

I also have this problem whenever the docker package in Ubuntu gets updated. Not sure whether this is a problem with the packaging or with docker itself.

@limadm

limadm commented Jul 10, 2019

@chenz-svsarrazin Thanks! An apt update + upgrade worked and I didn't need to recreate the container. (Ubuntu 18.04.2 LTS + Docker version 18.09.7, build 2d0083d).

@Rich43

Rich43 commented Jul 14, 2019

Reproduced on Ubuntu 18.10, not sure what caused it but all my servers/containers randomly went down. Could have been an update.

@slhck

slhck commented Jul 14, 2019

There was the 18.09.7 update a few days ago (security update) which restarted the Docker service and, for me, brought down four web servers and corrupted one database. Regular start-up didn't work due to these errors.

@airomyas

kill -9 $(netstat -lnp |grep containerd-sh |awk '{print $9}'|cut -d / -f 1)

@sanekmihailow

I tried killing the PID and downgrading Docker, but it didn't help me.
My solution:

  1. Upgrade Docker from 18.09.2 to 18.09.7 (build 2d0083d)
  2. Upgrade docker-compose to 1.24.0
  3. Start the container (I still received the same error):
# docker start freepbx
...
bind: address already in use: unknown
Error: failed to start containers: freepbx
  4. Reboot.
     After the reboot, the error was gone and the container started.

@ajay-awachar

The solution that worked for me (sketched below):

  1. Reboot the node/server
  2. Restart the docker service
  3. Start the docker container
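
A minimal sketch of those steps, assuming a systemd-based host (the container name is a placeholder):

sudo reboot
# after the host is back up:
sudo systemctl restart docker
docker start <container>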

@Mattie112

We just had the same issue after updating the docker package to version docker-ce-19.03.3-3.el7.x86_64. On CentOS Linux release 7.7.1908 (Core).

Exactly the same as in the first post; however, killing the docker PID did not work for us, nor did a Docker restart. A reboot of the entire server solved the problem.

Any news on this issue? It is really scary that this can happen to our production services.

@dasDaniel

Same issue, couldn't run after an update:

Error: Cannot start service odfenode: failed to listen to abstract unix socket

Version

Client: Docker Engine - Community
 Version:           19.03.3
 API version:       1.40
 Go version:        go1.12.10
 Git commit:        a872fc2
 Built:             Tue Oct  8 00:59:54 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.3
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.10
  Git commit:       a872fc2
  Built:            Tue Oct  8 00:58:28 2019
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.6
  GitCommit:        894b81a4b802e4eb2a91d1ce216b8817763c29fb
 runc:
  Version:          1.0.0-rc8+dev
  GitCommit:        3e425f80a8c931f88e6d94a8c831b9d5aa481657
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

@Granjow

Granjow commented Oct 18, 2019

Same issue here.

In my case, it helped to downgrade to an earlier docker version and then restart the system (just restarting docker did not help). No need to redeploy/remove existing containers.

Example for Ubuntu Xenial:

sudo apt install docker-ce=5:19.03.1~3-0~ubuntu-xenial

@Redsandro

I am seeing the same issue after updating to Docker version 19.03.4. I cannot reboot this Debian machine without a lot of hassle. I wish I hadn't upgraded Docker.

Captain Hindsight advice: pin the Docker version.

You wouldn't expect this from the non-edge channel.
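
For apt-based systems, a sketch of that pinning advice (assumes the docker-ce packages from Docker's repository):

sudo apt-mark hold docker-ce docker-ce-cli containerd.io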

@ducmanhnguyen

ducmanhnguyen commented Nov 22, 2019

Finding the docker process that holds the socket and killing it resolves the issue:

# set uuid to the full container ID from the error message first
for pid in $(ps -ef | grep docker | awk '{print $2}'); do
  lsof -p "$pid" | grep "$uuid"
done
# then kill the PID whose open files list the shim.sock
kill -9 <pid>

this one saved my day! thanks

@cpuguy83
Collaborator

@thaJeztah Is this the same as other issues where it was some packaging-related problem?

@slimsag

slimsag commented Mar 16, 2020

This SO post seems to indicate this may be an issue with the Ubuntu Snap package and that the following may resolve it:

# Remove snap installation, any prior Docker installations
sudo snap remove docker
sudo apt-get remove docker docker-engine docker.io

# Install latest Docker.io version
sudo apt-get update
sudo apt install docker.io

# Run Docker on startup
sudo systemctl start docker
sudo systemctl enable docker

@slimsag

slimsag commented Mar 17, 2020

Based on the comments it seems this happens on 19.03.3 and 19.03.4, and we had someone reproduce it on Xenial with 19.03.8 as well, but I was NOT able to reproduce it with the following:

Add the apt repository if you don't already have it:

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
sudo apt-get update

Install Docker CE 19.03.8 (latest) explicitly:

sudo apt install docker-ce=19.03.8~3-0~ubuntu-xenial

@Jack-2001

Does anyone know the actual reason for this issue?

@nvkhoi112358

Still seeing a similar issue, but with version 27.1.1.
In this case, the first time nginx starts (using Docker Compose 2.29.1) it is fine, but after a down followed by an up it gets this error.
With the same configuration on the host, it is fine.
With a traditional unix socket (visible on the filesystem), it is fine.
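
A sketch of the sequence described above, assuming a compose project with an nginx service:

docker compose up -d nginx    # first start works
docker compose down
docker compose up -d nginx    # fails with "bind: address already in use" on the shim socket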
