Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mosquitto 4.12 on Debian, hangs and eat 100% CPU #496

Closed
nik-sharky opened this issue Jul 17, 2017 · 21 comments
Closed

Mosquitto 4.12 on Debian, hangs and eat 100% CPU #496

nik-sharky opened this issue Jul 17, 2017 · 21 comments

Comments

@nik-sharky
Copy link

Hi,

I have strange problem with mosqutto on debian, it was on 4.10, update to 4.12 doesn't help.
After several days of normal work it hangs and eat 100% of CPU.
mosquttohang

Maybe You can help to investigate reason of it, maybe I can run mosquiito in debug mode?

Mosquitto log has some strange lines with rejected connections like:
1500027624: ACL denying access to client with dangerous client id "paho/4DD5940EF291D60EE2"
1500027634: ACL denying access to client with dangerous client id "paho/4DD5940EF291D60EE2"
1500027644: Socket error on client paho/4DD5940EF291D60EE2, disconnecting.

But as I remember previous logs when it hangs doesn't has this lines.
mosquittolog.txt

Part of strace of existing mosqutto process:
poll([{fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=9, events=POLLIN}], 3, 100) = 1 ([{fd=5, revents=POLLNVAL}]) rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 poll([{fd=8, events=POLLIN}], 1, 0) = 0 (Timeout) rt_sigprocmask(SIG_SETMASK, [INT], [], 8) = 0 poll([{fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=9, events=POLLIN}], 3, 100) = 1 ([{fd=5, revents=POLLNVAL}]) rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 poll([{fd=8, events=POLLIN}], 1, 0) = 0 (Timeout) rt_sigprocmask(SIG_SETMASK, [INT], [], 8) = 0 poll([{fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=9, events=POLLIN}], 3, 100) = 1 ([{fd=5, revents=POLLNVAL}]) rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0

Part of ltrace of existing mosquitto process:
13259 libwebsocket_service(0xa8e5d0, 0, 0xa6e010, 2) = 0 13259 clock_gettime(1, 0x7ffd3d7bf340, 0, -1) = 0 13259 clock_gettime(1, 0x7ffd3d7bf240, 0xd3f135, 24) = 0 13259 memset(0xa96460, '\377', 48) = 0xa96460 13259 time(0) = 1500282490 13259 clock_gettime(1, 0x7ffd3d7bf340, 0xa8e1c8, 6) = 0 13259 time(0) = 1500282490 13259 clock_gettime(1, 0x7ffd3d7bf340, 9, 1) = 0 13259 sigprocmask(SIG_UNBLOCK, <1>, <>) = 0 13259 poll(0xa96460, 3, 100, -1) = 1 13259 sigprocmask(SIG_UNBLOCK, <>, nil) = 0 13259 libwebsocket_service(0xa8e5d0, 0, 0xa6e010, 2) = 0 13259 clock_gettime(1, 0x7ffd3d7bf340, 0, -1) = 0 13259 clock_gettime(1, 0x7ffd3d7bf240, 0xd3f135, 24) = 0 13259 memset(0xa96460, '\377', 48) = 0xa96460 13259 time(0) = 1500282490 13259 clock_gettime(1, 0x7ffd3d7bf340, 0xa8e1c8, 6) = 0 13259 time(0) = 1500282490 13259 clock_gettime(1, 0x7ffd3d7bf340, 9, 1) = 0 13259 sigprocmask(SIG_UNBLOCK, <1>, <>) = 0 13259 poll(0xa96460, 3, 100, -1) = 1 13259 sigprocmask(SIG_UNBLOCK, <>, nil) = 0 13259 libwebsocket_service(0xa8e5d0, 0, 0xa6e010, 2) = 0 13259 clock_gettime(1, 0x7ffd3d7bf340, 0, -1) = 0 13259 clock_gettime(1, 0x7ffd3d7bf240, 0xd3f135, 24) = 0 13259 memset(0xa96460, '\377', 48) = 0xa96460 13259 time(0) = 1500282490 13259 clock_gettime(1, 0x7ffd3d7bf340, 0xa8e1c8, 6) = 0 13259 time(0) = 1500282490 13259 clock_gettime(1, 0x7ffd3d7bf340, 9, 1) = 0 13259 sigprocmask(SIG_UNBLOCK, <1>, <>) = 0 13259 poll(0xa96460, 3, 100, -1) = 1 13259 sigprocmask(SIG_UNBLOCK, <>, nil) = 0 13259 libwebsocket_service(0xa8e5d0, 0, 0xa6e010, 2) = 0 13259 clock_gettime(1, 0x7ffd3d7bf340, 0, -1) = 0 13259 clock_gettime(1, 0x7ffd3d7bf240, 0xd3f135, 24) = 0 13259 memset(0xa96460, '\377', 48) = 0xa96460 13259 time(0) = 1500282490 13259 clock_gettime(1, 0x7ffd3d7bf340, 0xa8e1c8, 6) = 0 13259 time(0) = 1500282490 13259 clock_gettime(1, 0x7ffd3d7bf340, 9, 1) = 0 13259 sigprocmask(SIG_UNBLOCK, <1>, <>) = 0 13259 poll(0xa96460, 3, 100, -1) = 1 13259 sigprocmask(SIG_UNBLOCK, <>, nil) = 0 13259 libwebsocket_service(0xa8e5d0, 0, 0xa6e010, 2) = 0 13259 clock_gettime(1, 0x7ffd3d7bf340, 0, -1) = 0 13259 clock_gettime(1, 0x7ffd3d7bf240, 0xd3f135, 24) = 0 13259 memset(0xa96460, '\377', 48) = 0xa96460 13259 time(0) = 1500282490

@ralight
Copy link
Contributor

ralight commented Jul 17, 2017

Not much of an answer I realise, but could you try 1.4.14?

@nik-sharky
Copy link
Author

nik-sharky commented Jul 18, 2017

Is there any already built deb package for it?
Today is hangs again, I disable websockets listener as experiment.
Will monitor, if it hangs again will try to run 1.4.14

@ralight
Copy link
Contributor

ralight commented Jul 18, 2017

It was all updated apart from for jessie - that is now fixed as well. repo.mosquitto.org.

@ralight
Copy link
Contributor

ralight commented Jul 18, 2017

(thanks for making me realise jessie wasn't up to date!)

@nik-sharky
Copy link
Author

Hi, 3 days ago I disable websockets listener and it works normal without hangs till today.
Maybe it was a problem with websockets listeners in 1.4.12, continue to monitor.

Thanks for debian release.

@nik-sharky
Copy link
Author

It's still work normally, so I think on 99% it was websocket listener issue

@VeeNee
Copy link

VeeNee commented Jul 26, 2017

Same thing here. With websockets activated the mosquitto process sometimes may even not survive for 1 hour. It is not a load dependent issue, production system with 1200 clients and testing environment with 0 clients (and no messages at all) stops processing messages within the same amount of time. There is nothing in the logs at the time of hanging.
Debian Jessie, latest packages from repo.mosquitto.org.

@ralight
Copy link
Contributor

ralight commented Jul 26, 2017

@VeeNee Do you have websockets logging enabled and with a high verbosity?

@VeeNee
Copy link

VeeNee commented Jul 27, 2017

We are using "log_type all" which i believe includes websockets and everything else.

@ralight
Copy link
Contributor

ralight commented Jul 27, 2017

@VeeNee Please also set websockets_log_level 255.

@ralight
Copy link
Contributor

ralight commented Jul 27, 2017

It's possible this bug is related to #490. Would you be able to give the latest changes in the fixes branch a try?

@VeeNee
Copy link

VeeNee commented Jul 28, 2017

Enabled websockets_log_level 255, now waiting till process hangs.
I'am glad to check the changes, but i'am failing to compile mosquitto with websockets enabled (tested the master branch and v1.4.14) due to the following error:

loop.c: In function 'loop_handle_reads_writes':
loop.c:492:22: error: storage size of 'wspoll' isn’t known
    struct lws_pollfd wspoll;

Debian Jessie, GCC 4.9.2

@ralight
Copy link
Contributor

ralight commented Jul 28, 2017

Here is a patch that should work on Jessie:

lws-fixes.patch.txt

@VeeNee
Copy link

VeeNee commented Jul 29, 2017

Thanks, succesfully built with this lws-fixes patch and cherry-picked 2 commits from #490.
After a couple of hours one of the mosquittos (i ran 3) started to eat CPU again. Looks like the port i used for websockets (8080) got scanned by some bots (http proxy detect or something like that). I tried to bring down another mosquitto instance using wget with http_proxy set to mosquitto's websocket port, but after tens of tries i got nothing, mosquitto worked good. I'am attaching the log file from the mosquitto which stopped working.
mosquitto.log.txt

@ralight
Copy link
Contributor

ralight commented Jul 30, 2017

That's a shame but it's a good starting point that you noticed it being scanned.

@dzach
Copy link

dzach commented Aug 21, 2017

@ralight
Thanks so much for the patch! Compiling on Raspberry Pi was failing without it.

@AdamMiltonBarker
Copy link

I am actually getting the same on Centos also, is there a patch available?

@AdamMiltonBarker
Copy link

Here is output I have had issues with not being able to get mosquitto built with websockets for over a year, finally came back to try and tackle it today :) Any help with the below would be appreciated :)

[~/mosquitto-1.4.14]# sudo make
set -e; for d in lib client src; do make -C ${d}; done
make[1]: Entering directory `/home/dcrwslz/mosquitto-1.4.14/lib'
make -C cpp
make[2]: Entering directory `/home/dcrwslz/mosquitto-1.4.14/lib/cpp'
make[2]: Nothing to be done for `all'.
make[2]: Leaving directory `/home/dcrwslz/mosquitto-1.4.14/lib/cpp'
make[1]: Leaving directory `/home/dcrwslz/mosquitto-1.4.14/lib'
make[1]: Entering directory `/home/dcrwslz/mosquitto-1.4.14/client'
make[1]: Nothing to be done for `all'.
make[1]: Leaving directory `/home/dcrwslz/mosquitto-1.4.14/client'
make[1]: Entering directory `/home/dcrwslz/mosquitto-1.4.14/src'
cc -Wall -ggdb -O2  -I. -I.. -I../lib  -DVERSION="\"1.4.14\"" -DTIMESTAMP="\"2017-08-31 13:26:21+0200\"" -DWITH_BROKER -DWITH_TLS -DWITH_TLS_PSK -DWITH_UUID -DWITH_BRIDGE -DWITH_PERSISTENCE -DWITH_MEMORY_TRACKING -DWITH_SYS_TREE -DWITH_WEBSOCKETS -DWITH_EC -c mosquitto.c -o mosquitto.o
cc -Wall -ggdb -O2  -I. -I.. -I../lib  -DVERSION="\"1.4.14\"" -DTIMESTAMP="\"2017-08-31 13:26:21+0200\"" -DWITH_BROKER -DWITH_TLS -DWITH_TLS_PSK -DWITH_UUID -DWITH_BRIDGE -DWITH_PERSISTENCE -DWITH_MEMORY_TRACKING -DWITH_SYS_TREE -DWITH_WEBSOCKETS -DWITH_EC -c bridge.c -o bridge.o
cc -Wall -ggdb -O2  -I. -I.. -I../lib  -DVERSION="\"1.4.14\"" -DTIMESTAMP="\"2017-08-31 13:26:21+0200\"" -DWITH_BROKER -DWITH_TLS -DWITH_TLS_PSK -DWITH_UUID -DWITH_BRIDGE -DWITH_PERSISTENCE -DWITH_MEMORY_TRACKING -DWITH_SYS_TREE -DWITH_WEBSOCKETS -DWITH_EC -c conf.c -o conf.o
cc -Wall -ggdb -O2  -I. -I.. -I../lib  -DVERSION="\"1.4.14\"" -DTIMESTAMP="\"2017-08-31 13:26:21+0200\"" -DWITH_BROKER -DWITH_TLS -DWITH_TLS_PSK -DWITH_UUID -DWITH_BRIDGE -DWITH_PERSISTENCE -DWITH_MEMORY_TRACKING -DWITH_SYS_TREE -DWITH_WEBSOCKETS -DWITH_EC -c context.c -o context.o
cc -Wall -ggdb -O2  -I. -I.. -I../lib  -DVERSION="\"1.4.14\"" -DTIMESTAMP="\"2017-08-31 13:26:21+0200\"" -DWITH_BROKER -DWITH_TLS -DWITH_TLS_PSK -DWITH_UUID -DWITH_BRIDGE -DWITH_PERSISTENCE -DWITH_MEMORY_TRACKING -DWITH_SYS_TREE -DWITH_WEBSOCKETS -DWITH_EC -c database.c -o database.o
cc -Wall -ggdb -O2  -I. -I.. -I../lib  -DVERSION="\"1.4.14\"" -DTIMESTAMP="\"2017-08-31 13:26:21+0200\"" -DWITH_BROKER -DWITH_TLS -DWITH_TLS_PSK -DWITH_UUID -DWITH_BRIDGE -DWITH_PERSISTENCE -DWITH_MEMORY_TRACKING -DWITH_SYS_TREE -DWITH_WEBSOCKETS -DWITH_EC -c logging.c -o logging.o
cc -Wall -ggdb -O2  -I. -I.. -I../lib  -DVERSION="\"1.4.14\"" -DTIMESTAMP="\"2017-08-31 13:26:21+0200\"" -DWITH_BROKER -DWITH_TLS -DWITH_TLS_PSK -DWITH_UUID -DWITH_BRIDGE -DWITH_PERSISTENCE -DWITH_MEMORY_TRACKING -DWITH_SYS_TREE -DWITH_WEBSOCKETS -DWITH_EC -c loop.c -o loop.o
loop.c: In function ‘loop_handle_reads_writes’:
loop.c:492: error: storage size of ‘wspoll’ isn’t known
loop.c:496: warning: implicit declaration of function ‘lws_service_fd’
loop.c:496: warning: implicit declaration of function ‘lws_get_context’
loop.c:492: warning: unused variable ‘wspoll’
make[1]: *** [loop.o] Error 1
make[1]: Leaving directory `/home/dcrwslz/mosquitto-1.4.14/src'
make: *** [mosquitto] Error 2

@hbli
Copy link

hbli commented Oct 9, 2017

I can still reproduce the issue with 1.4.14. Running for 1 or 2 hours. Then 1 cpu core starts to max at 100%.

@hbli
Copy link

hbli commented Oct 9, 2017

I also tried to compile mqtt with websocket=no. Same result.

@ralight
Copy link
Contributor

ralight commented Aug 12, 2022

The versions involved in this issue are now very out of date, so I'm closing it. If this is really still a problem with 2.0.14 then please do reopen the bug and add more details.

@ralight ralight closed this as completed Aug 12, 2022
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 13, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants