Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mosquitto broker stops working, it stalls after working for a while #986

Closed
alwashe opened this issue Oct 10, 2018 · 13 comments
Closed

Mosquitto broker stops working, it stalls after working for a while #986

alwashe opened this issue Oct 10, 2018 · 13 comments

Comments

@alwashe
Copy link

alwashe commented Oct 10, 2018

Raspberry Pi 3b, where i have Raspbian 4.14, mosquito 1.5.3. and mosquito-clienets

After about 30 to 120 min i have to restart the service because no connection is anymore possible.

pi@mqtt:~ $ sudo cat /var/log/mosquitto/mosquitto2.log
1539118234: mosquitto version 1.5.3 starting
1539118234: Config loaded from /etc/mosquitto/mosquitto.conf.
1539118234: Opening ipv4 listen socket on port 1883.
1539118234: Opening ipv6 listen socket on port 1883.
1539118235: New connection from 192.168.178.20 on port 1883.
1539118235: New client connected from 192.168.178.20 as ESP807d3a6e40c1 (c1, k15, u'openhab').
1539118238: New connection from 192.168.178.79 on port 1883.
1539118238: New client connected from 192.168.178.79 as ESP807d3a6e423c (c1, k15, u'openhab').
1539118256: New connection from 192.168.178.201 on port 1883.
1539118256: New client connected from 192.168.178.201 as ohBroker (c1, k60, u'openhab').
1539118320: Socket error on client ESP807d3a6e423c, disconnecting.
1539118321: New connection from 192.168.178.79 on port 1883.
1539118321: New client connected from 192.168.178.79 as ESP807d3a6e423c (c1, k15, u'openhab').
1539118577: Socket error on client ESP807d3a6e423c, disconnecting.
pi@mqtt:~ $ sudo service mosquitto status
● mosquitto.service - LSB: mosquitto MQTT v3.1 message broker
   Loaded: loaded (/etc/init.d/mosquitto; generated; vendor preset: enabled)
   Active: active (exited) since Tue 2018-10-09 20:50:34 UTC; 11h ago
     Docs: man:systemd-sysv-generator(8)
  Process: 3376 ExecStop=/etc/init.d/mosquitto stop (code=exited, status=0/SUCCESS)
  Process: 3415 ExecStart=/etc/init.d/mosquitto start (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/mosquitto.service

Oct 09 20:50:34 mqtt systemd[1]: Starting LSB: mosquitto MQTT v3.1 message broker...
Oct 09 20:50:34 mqtt mosquitto[3415]: Starting network daemon:: mosquitto.
Oct 09 20:50:34 mqtt systemd[1]: Started LSB: mosquitto MQTT v3.1 message broker.
pi@mqtt:~ $ mosquitto_sub -t '$SYS/#' -v -d -u ***** -P *****
Error: Connection refused
pi@mqtt:~ $ sudo service mosquitto restart
pi@mqtt:~ $ sudo service mosquitto status
● mosquitto.service - LSB: mosquitto MQTT v3.1 message broker
   Loaded: loaded (/etc/init.d/mosquitto; generated; vendor preset: enabled)
   Active: active (running) since Wed 2018-10-10 08:27:39 UTC; 5s ago
     Docs: man:systemd-sysv-generator(8)
  Process: 9201 ExecStop=/etc/init.d/mosquitto stop (code=exited, status=0/SUCCESS)
  Process: 9209 ExecStart=/etc/init.d/mosquitto start (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/mosquitto.service
           └─9215 /usr/sbin/mosquitto -c /etc/mosquitto/mosquitto.conf

Oct 10 08:27:39 mqtt systemd[1]: Starting LSB: mosquitto MQTT v3.1 message broker...
Oct 10 08:27:39 mqtt mosquitto[9209]: Starting network daemon:: mosquitto.
Oct 10 08:27:39 mqtt systemd[1]: Started LSB: mosquitto MQTT v3.1 message broker.
pi@mqtt:~ $ mosquitto_sub -t '$SYS/#' -v -d -u ***** -P *****
Client mosqsub|9234-mqtt sending CONNECT
Client mosqsub|9234-mqtt received CONNACK (0)
Client mosqsub|9234-mqtt sending SUBSCRIBE (Mid: 1, Topic: $SYS/#, QoS: 0)
Client mosqsub|9234-mqtt received SUBACK
Subscribed (mid: 1): 0
Client mosqsub|9234-mqtt received PUBLISH (d0, q0, r1, m0, '$SYS/broker/version', ... (23 bytes))
$SYS/broker/version mosquitto version 1.5.3
@alwashe alwashe changed the title Broker stops working, it stalls after working for a while Mosquitto broker stops working, it stalls after working for a while Oct 10, 2018
@karlp
Copy link
Contributor

karlp commented Oct 10, 2018

Can you use lsof to see if it has run out of file handles? Can you strace the running process to see where it's stopped?

@alwashe
Copy link
Author

alwashe commented Oct 10, 2018

i have installed both, i am not sure how to use them

pi@mqtt:~ $ sudo strace -c -p 9215
strace: Process 9215 attached
 ^Cstrace: Process 9215 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 56.38    0.300066           8     36885           read
 19.14    0.101864           8     12294     12294 write
 15.44    0.082198           7     12294           rt_sigprocmask
  9.03    0.048073           8      6147           epoll_wait
------ ----------- ----------- --------- --------- ----------------
100.00    0.532201                 67620     12294 total
pi@mqtt:~ $ sudo lsof -p 9215
COMMAND    PID      USER   FD      TYPE DEVICE SIZE/OFF   NODE NAME
mosquitto 9215 mosquitto  cwd       DIR  179,2     4096      2 /
mosquitto 9215 mosquitto  rtd       DIR  179,2     4096      2 /
mosquitto 9215 mosquitto  txt       REG  179,2   162296   2049 /usr/sbin/mosquitto
mosquitto 9215 mosquitto  mem       REG  179,2    38560   1945 /lib/arm-linux-gnueabihf/libnss_files-2.24.so
mosquitto 9215 mosquitto  mem       REG  179,2    38588   1953 /lib/arm-linux-gnueabihf/libnss_nis-2.24.so
mosquitto 9215 mosquitto  mem       REG  179,2    26456   1943 /lib/arm-linux-gnueabihf/libnss_compat-2.24.so
mosquitto 9215 mosquitto  mem       REG  179,2   129840   1068 /usr/lib/arm-linux-gnueabihf/libuv.so.1.0.0
mosquitto 9215 mosquitto  mem       REG  179,2    46832    973 /usr/lib/arm-linux-gnueabihf/libev.so.4.0.0
mosquitto 9215 mosquitto  mem       REG  179,2    91884   1990 /lib/arm-linux-gnueabihf/libz.so.1.2.8
mosquitto 9215 mosquitto  mem       REG  179,2    71604   1942 /lib/arm-linux-gnueabihf/libnsl-2.24.so
mosquitto 9215 mosquitto  mem       REG  179,2   127300   1967 /lib/arm-linux-gnueabihf/libpthread-2.24.so
mosquitto 9215 mosquitto  mem       REG  179,2  1234700   1906 /lib/arm-linux-gnueabihf/libc-2.24.so
mosquitto 9215 mosquitto  mem       REG  179,2   133604   1095 /usr/lib/arm-linux-gnueabihf/libwebsockets.so.8
mosquitto 9215 mosquitto  mem       REG  179,2     9964   1899 /lib/arm-linux-gnueabihf/libanl-2.24.so
mosquitto 9215 mosquitto  mem       REG  179,2    14048   1988 /lib/arm-linux-gnueabihf/libuuid.so.1.3.0
mosquitto 9215 mosquitto  mem       REG  179,2  1827956  10467 /usr/lib/arm-linux-gnueabihf/libcrypto.so.1.1
mosquitto 9215 mosquitto  mem       REG  179,2   327952  10582 /usr/lib/arm-linux-gnueabihf/libssl.so.1.1
mosquitto 9215 mosquitto  mem       REG  179,2    30548   1989 /lib/arm-linux-gnueabihf/libwrap.so.0.7.6
mosquitto 9215 mosquitto  mem       REG  179,2    26632   1971 /lib/arm-linux-gnueabihf/librt-2.24.so
mosquitto 9215 mosquitto  mem       REG  179,2   452152   1933 /lib/arm-linux-gnueabihf/libm-2.24.so
mosquitto 9215 mosquitto  mem       REG  179,2     9800   1915 /lib/arm-linux-gnueabihf/libdl-2.24.so
mosquitto 9215 mosquitto  mem       REG  179,2    21868  10441 /usr/lib/arm-linux-gnueabihf/libarmmem.so
mosquitto 9215 mosquitto  mem       REG  179,2   138576   1859 /lib/arm-linux-gnueabihf/ld-2.24.so
mosquitto 9215 mosquitto    0u      CHR    1,3      0t0   1028 /dev/null
mosquitto 9215 mosquitto    1u      CHR    1,3      0t0   1028 /dev/null
mosquitto 9215 mosquitto    2u      CHR    1,3      0t0   1028 /dev/null
mosquitto 9215 mosquitto    3u     sock    0,8      0t0  27983 protocol: TCP
mosquitto 9215 mosquitto    4w      REG  179,2    29062 131108 /var/log/mosquitto/mosquitto2.log
mosquitto 9215 mosquitto    5u     IPv4  27991      0t0    TCP *:1883 (LISTEN)
mosquitto 9215 mosquitto    6u     IPv6  27992      0t0    TCP *:1883 (LISTEN)
mosquitto 9215 mosquitto    7u  a_inode   0,11        0   6408 [eventpoll]
mosquitto 9215 mosquitto    8u     IPv4  34831      0t0    TCP mqtt.fritz.box:1883->ESP-6E423C.fritz.box:60225 (ESTABLISHED)
mosquitto 9215 mosquitto    9u     IPv4  29677      0t0    TCP mqtt.fritz.box:1883->ESP-6E40C1.fritz.box:60200 (ESTABLISHED)
mosquitto 9215 mosquitto   11u     IPv4  25200      0t0    TCP mqtt.fritz.box:1883->JmDNS-IP-1.fritz.box:38106 (ESTABLISHED)
pi@mqtt:~ $ sudo strace -p 9215
rt_sigprocmask(SIG_SETMASK, [HUP INT USR1 USR2 TERM], [], 8) = 0
epoll_wait(7, [{EPOLLIN, {u32=8, u64=8}}, {EPOLLIN, {u32=9, u64=9}}], 1000, 100) = 2
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
read(8, "\202", 1)                      = 1
read(8, "\r", 1)                        = 1
read(8, "\317z\0\10MQTT_ACT\0", 13)     = 13
write(8, "\3\na\0", 4)                  = -1 EAGAIN (Resource temporarily unavailable)
read(9, "\202", 1)                      = 1
read(9, "\r", 1)                        = 1
read(9, "\361\240\0\10MQTT_ACT\0", 13)  = 13
write(9, "\3\nn\0", 4)                  = -1 EAGAIN (Resource temporarily unavailable)
rt_sigprocmask(SIG_SETMASK, [HUP INT USR1 USR2 TERM], [], 8) = 0
epoll_wait(7, [{EPOLLIN, {u32=8, u64=8}}, {EPOLLIN, {u32=9, u64=9}}], 1000, 100) = 2
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
read(8, "\202", 1)                      = 1
read(8, "\r", 1)                        = 1
read(8, "\317{\0\10MQTT_ACT\0", 13)     = 13
write(8, "\3\na\0", 4)                  = -1 EAGAIN (Resource temporarily unavailable)
read(9, "\202", 1)                      = 1
read(9, "\r", 1)                        = 1
read(9, "\361\241\0\10MQTT_ACT\0", 13)  = 13
write(9, "\3\nn\0", 4^C)                  = -1 EAGAIN (Resource temporarily unavailable)
strace: Process 9215 detached

at the moment it worked for more than 3h, i did not change anything, it will stall again.

pi@mqtt:~ $ sudo service mosquitto status
● mosquitto.service - LSB: mosquitto MQTT v3.1 message broker
   Loaded: loaded (/etc/init.d/mosquitto; generated; vendor preset: enabled)
   Active: active (running) since Wed 2018-10-10 08:27:39 UTC; 3h 35min ago
     Docs: man:systemd-sysv-generator(8)
  Process: 9201 ExecStop=/etc/init.d/mosquitto stop (code=exited, status=0/SUCCESS)
  Process: 9209 ExecStart=/etc/init.d/mosquitto start (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/mosquitto.service
           └─9215 /usr/sbin/mosquitto -c /etc/mosquitto/mosquitto.conf

Oct 10 08:27:39 mqtt systemd[1]: Starting LSB: mosquitto MQTT v3.1 message broker...
Oct 10 08:27:39 mqtt mosquitto[9209]: Starting network daemon:: mosquitto.
Oct 10 08:27:39 mqtt systemd[1]: Started LSB: mosquitto MQTT v3.1 message broker.

@karlp
Copy link
Contributor

karlp commented Oct 10, 2018 via email

@alwashe
Copy link
Author

alwashe commented Oct 10, 2018

Yes, this makes sense, are the commands right?

@timoline
Copy link

timoline commented Oct 11, 2018

I have exact the same problem after upgrading and fresh reinstall of 1.5.3 (on debian)

It works for a while and then the service is exited (random time range)

@alwashe
Copy link
Author

alwashe commented Oct 11, 2018

the Server ran for about 12 h, longest time.
want to be able to close console.
how do i need to use strace for unattended execution?
sudo strace -p 20095 &> stra.txt

@timoline
Copy link

maybe

timeout 2d strace -p PID

this is for 2 days
DURATION is a floating point number with an optional suffix:
's' for seconds (the default), 'm' for minutes, 'h' for hours or 'd' for days.

@timoline
Copy link

timoline commented Oct 12, 2018

Hope this helps

poll([{fd=7, events=POLLIN}, {fd=9, events=POLLIN}, {fd=14, events=POLLIN}], 3, 0) = 0 (Timeout) rt_sigprocmask(SIG_SETMASK, [HUP INT USR1 USR2 TERM], [], 8) = 0 epoll_wait(10, [{EPOLLIN, {u32=14, u64=14}}], 1000, 100) = 1 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 recvfrom(14, "\210\202\343\206\372\266\340o", 4096, 0, NULL, NULL) = 8 poll([{fd=7, events=POLLIN}, {fd=9, events=POLLIN}, {fd=14, events=POLLIN|POLLOUT}], 3, 0) = 1 ([{fd=14, revents=POLLIN|POLLOUT}]) sendto(14, "\210\2\3\351", 4, MSG_NOSIGNAL, NULL, 0) = 4 shutdown(14, SHUT_WR) = 0 epoll_ctl(10, EPOLL_CTL_ADD, 14, {EPOLLIN|EPOLLOUT, {u32=14, u64=14}}) = -1 EEXIST (File exists) epoll_ctl(10, EPOLL_CTL_MOD, 14, {EPOLLIN|EPOLLOUT, {u32=14, u64=14}}) = 0 rt_sigprocmask(SIG_SETMASK, [HUP INT USR1 USR2 TERM], [], 8) = 0 epoll_wait(10, [{EPOLLIN|EPOLLOUT|EPOLLHUP, {u32=14, u64=14}}], 1000, 100) = 1 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 recvfrom(14, "", 4096, 0, NULL, NULL) = 0 write(4, "1539358230: Socket error on clie"..., 119) = 119 --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x190} --- +++ killed by SIGSEGV +++

@karlp
Copy link
Contributor

karlp commented Oct 15, 2018

we're seeing this too, but the server was restarted before I could grab any debug. WIll be watching closer in future....

@timoline
Copy link

wow, still no official reaction, how is this possible.?
I cant beleive we are the only one...

@timoline
Copy link

timoline commented Oct 19, 2018

really dont know what I changed,
I did a few purges and (re)installs, now it lookslike its working fine again....for a few days in a row...

@alwashe
Copy link
Author

alwashe commented Oct 20, 2018

same here

@alwashe alwashe closed this as completed Oct 20, 2018
@imZack
Copy link
Contributor

imZack commented Oct 20, 2018

@alwashe How did you resolve this one? I'm also investigating an issue similar to you have. Thank you.

@lock lock bot locked as resolved and limited conversation to collaborators Aug 7, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants