Segfault using Mosquitto websockets #303

Closed
wiebeytec opened this issue Nov 2, 2016 · 44 comments

Comments

@wiebeytec

Using Mosquitto 1.4.10 (and 1.4.8), we get segfaults when a websocket client has been running for a while. The segfault seems to originate in libwebsockets 1.7 (shipped with Ubuntu 16.04), but because it's running a callback function, it may actually be Mosquitto. I tried using the newest libwebsockets 2.1 stable, but then it crashes on handshake.

Perhaps related, there are a fair number of SIGPIPE (broken pipe) signals for which I see no apparent reason. We can reproduce this with many TCP clients and one websocket client. There doesn't seem to be a good reason why that websocket client keeps losing its connection. Additionally, before the stack trace, it always seems to have done authentication with HTTP (http://127.0.0.1:80/auth/). The current client is programmed to reconnect after a broken connection, so it appears as though the sequence is disconnect-reconnect-auth-crash.

See below for a stack trace when running Mosquitto 1.4.10 with libwebsockets 1.7.

A core dump is also available, but that probably does contain sensitive information, because we could only reproduce this issue on our live server with enough traffic.

1478092472: |-- aclcheck(xxx, N/04a316b4f85b/battery/260/History/TimeSinceLastFullCharge, 2) CACHEDAUTH: 0
1478092472: |-- mosquitto_auth_acl_check(..., xxx, xxx, N/6cecebc44ede/hub4/0/MaxChargePower, MOSQ_ACL_WRITE)
1478092472: |-- aclcheck(xxx, N/6cecebc44ede/hub4/0/MaxChargePower, 2) CACHEDAUTH: 0
1478092472: |-- mosquitto_auth_acl_check(..., xxx, ccxxx, N/6cecebc44ede/hub4/0/MaxDischargePower, MOSQ_ACL_WRITE)
1478092472: |-- aclcheck(xxx, N/6cecebc44ede/hub4/0/MaxDischargePower, 2) CACHEDAUTH: 0
1478092472: |-- mosquitto_auth_acl_check(..., xxx, xxx, N/d05fb8999e85/system/0/Ac/PvOnGrid/L1/Power, MOSQ_ACL_WRITE)
1478092472: |-- aclcheck(xxx, N/d05fb8999e85/system/0/Ac/PvOnGrid/L1/Power, 2) CACHEDAUTH: 0
1478092472: |-- mosquitto_auth_acl_check(..., xxx, xxx, N/d05fb8999e85/battery/258/Dc/0/Power, MOSQ_ACL_WRITE)
1478092472: |-- aclcheck(xxx, N/d05fb8999e85/battery/258/Dc/0/Power, 2) CACHEDAUTH: 0
1478092472: |-- mosquitto_auth_acl_check(..., ccxxx, xxx, N/6ceceb80aeb1/vebus/257/Dc/0/Voltage, MOSQ_ACL_WRITE)
1478092472: |-- aclcheck(xxx, N/6ceceb80aeb1/vebus/257/Dc/0/Voltage, 2) CACHEDAUTH: 0
1478092472: |-- mosquitto_auth_acl_check(..., xxx, xxx, N/6ceceb8d5c2b/solarcharger/0/Yield/Power, MOSQ_ACL_WRITE)
1478092472: |-- aclcheck(xxx, N/6ceceb8d5c2b/solarcharger/0/Yield/Power, 2) CACHEDAUTH: 0
1478092472: |-- mosquitto_auth_unpwd_check(info@example.com)
1478092472: |-- ** checking backend http
1478092472: |-- url=http://127.0.0.1:80/auth/
1478092472: |-- data=username=info%40example.com&password=hello123&topic=&acc=-1&clientid=
1478092473: |-- getuser(info@example.com) AUTHENTICATED=1 by http

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff6bf7a0c in ?? () from /usr/lib/x86_64-linux-gnu/libwebsockets.so.7
(gdb) 
(gdb) 
(gdb) where
#0  0x00007ffff6bf7a0c in ?? () from /usr/lib/x86_64-linux-gnu/libwebsockets.so.7
#1  0x00007ffff6bf8099 in lws_callback_on_writable () from /usr/lib/x86_64-linux-gnu/libwebsockets.so.7
#2  0x000000000040d591 in do_disconnect (db=db@entry=0x6268e0 <int_db>, context=context@entry=0x2805000) at loop.c:404
#3  0x0000000000414b2c in mqtt3_handle_connect (db=0x6268e0 <int_db>, context=0x7a6f40) at read_handle_server.c:476
#4  0x000000000041ccd6 in callback_mqtt (wsi=<optimized out>, reason=<optimized out>, user=<optimized out>, in=0x665e10, len=66) at websockets.c:363
#5  0x00007ffff6bf6797 in ?? () from /usr/lib/x86_64-linux-gnu/libwebsockets.so.7
#6  0x00007ffff6bfae13 in ?? () from /usr/lib/x86_64-linux-gnu/libwebsockets.so.7
#7  0x00007ffff6c03140 in ?? () from /usr/lib/x86_64-linux-gnu/libwebsockets.so.7
#8  0x00007ffff6bf4e5e in lws_read () from /usr/lib/x86_64-linux-gnu/libwebsockets.so.7
#9  0x00007ffff6bf7564 in lws_service_fd_tsi () from /usr/lib/x86_64-linux-gnu/libwebsockets.so.7
#10 0x00007ffff6c00f5b in lws_plat_service_tsi () from /usr/lib/x86_64-linux-gnu/libwebsockets.so.7
#11 0x000000000040dfeb in mosquitto_main_loop (db=db@entry=0x6268e0 <int_db>, listensock=listensock@entry=0x1ca3a20, listensock_count=listensock_count@entry=2, listener_max=listener_max@entry=6) at loop.c:378
#12 0x0000000000404b36 in main (argc=<optimized out>, argv=<optimized out>) at mosquitto.c:385
@ralight
Contributor

ralight commented Nov 2, 2016

It looks like you're using an authentication plugin, could you provide details of what that is?

@wiebeytec
Author

We use JPMens's auth plugin.

auth_plugin /usr/lib/mosquitto-auth-plugin/auth-plugin.so
auth_opt_backends http
auth_opt_http_ip 127.0.0.1
auth_opt_http_port 80
auth_opt_http_getuser_uri /auth/
auth_opt_http_superuser_uri /superuser/
auth_opt_http_aclcheck_uri /acl/

@wiebeytec
Author

wiebeytec commented Nov 2, 2016

I just learned that our one javascript websocket client was actually many, in different browser tabs, all with the same clientid. That probably accounted for the many reconnects (because the clientid wasn't unique), which seem to be related to the problem. But, it's not the root trigger; there was also a confirmed crash with one client, but it did take a long time for it to crash that time.

@ralight
Contributor

ralight commented Nov 2, 2016

Ah, thanks that's very useful. You're saying that the crash occurred when it was only a single WS client connected and it was definitely not multiple clients sharing the same ID?

@wiebeytec
Author

Yes, but it took a whole evening, as the frontend dev in question reported. The other case takes 1-10 mins, repeatedly.

@wiebeytec
Author

wiebeytec commented Nov 8, 2016

I think I found the cause of the crash. I analyzed the core dump, and found the following:

The call to libwebsocket_callback_on_writable crashes, so I inspected the variables:

(gdb) frame 2
#2  0x000000000040d591 in do_disconnect (db=db@entry=0x6268e0 <int_db>, context=context@entry=0x2805000) at loop.c:404
404                             libwebsocket_callback_on_writable(context->ws_context, context->wsi);
(gdb) list
399             if(context->wsi){
400                     if(context->state != mosq_cs_disconnecting){
401                             context->state = mosq_cs_disconnect_ws;
402                     }
403                     if(context->wsi){
404                             libwebsocket_callback_on_writable(context->ws_context, context->wsi);
405                     }
406                     context->sock = INVALID_SOCKET;
407             }else
408     #endif
(gdb) print context->ws_context
There is no member named ws_context.
(gdb)

Then in mosquitto_internal.h, I see this:

#  ifdef WITH_WEBSOCKETS
#    if defined(LWS_LIBRARY_VERSION_NUMBER)
    struct lws *wsi;
#    else
    struct libwebsocket_context *ws_context;
    struct libwebsocket *wsi;
#    endif
#  endif

The /usr/include/lws_config.h that comes with libwebsockets 1.7 in Ubuntu 16.04 has:

#define LWS_LIBRARY_VERSION_NUMBER (LWS_LIBRARY_VERSION_MAJOR*1000000)+(LWS_LIBRARY_VERSION_MINOR*1000)+LWS_LIBRARY_VERSION_PATCH

So, the callback is called with fields that don't exist in the struct. I didn't even know the compiler wouldn't fail on that...

It was introduced with libwebsockets 1.6 support.

(edit: we were just able to reproduce it with a client)

@wiebeytec
Author

Apparently the cause is the compatibility macros like #define libwebsocket_callback_on_writable(A, B) lws_callback_on_writable((B)). Discarding A does not stop context->ws_context from being accessed.

Fix?:

--- ../../mosquitto-1.4.10/lib/mosquitto_internal.h     2016-11-09 17:58:15.905118435 +0100
+++ mosquitto_internal.h        2016-11-09 17:58:42.917550579 +0100
@@ -215,6 +215,7 @@
 #  ifdef WITH_WEBSOCKETS
 #    if defined(LWS_LIBRARY_VERSION_NUMBER)
        struct lws *wsi;
+       void *ws_context; // Backwards compatability
 #    else
        struct libwebsocket_context *ws_context;
        struct libwebsocket *wsi;

@wiebeytec
Author

Actually, the compatibility macros are fine; that's why GCC accepted it too.

I was able to crash it numerous times at different locations, but all of them are on libwebsocket_callback_on_writable. I can't quite figure out why.

@ralight
Contributor

ralight commented Dec 4, 2016

As I said on #319:

I have really struggled to reproduce this but have now been able to do it consistently. It looks as though the problem is related to the libwebsockets binary in Ubuntu. I couldn't reproduce it at first because I had a separate version of libwebsockets (1.7.x) installed that did not come from Ubuntu. Even recompiling the version which is in Ubuntu myself doesn't result in any crashes.

I'm not really sure what the general solution is here, it seems a bit out of my control. Could you try compiling your own version of libwebsockets though?

@wiebeytec
Author

I reported an Ubuntu bug.

On our production server, I just use Websockify in front of Mosquitto.

@hillkim7

I have a similar problem with mosquitto on Ubuntu 16.04.1 LTS.
Mosquitto suddenly dies, leaving syslog entries like the following:

traps: mosquitto[1012] general protection ip:7f7e8536ba0c sp:7ffd4817ea70 error:0 in libwebsockets.so.7[7f7e85363000+1f000]
or
mosquitto[31696]: segfault at 8 ip 00007f361877aa29 sp 00007ffdc2be3ed0 error 4 in libwebsockets.so.7[7f3618772000+1f000]
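
The two kernel log lines above can be compared by subtracting the mapping base (the first number inside the square brackets) from the faulting instruction pointer, which gives the offset of the crash inside libwebsockets.so.7. A minimal sketch, assuming a POSIX shell with 64-bit arithmetic; the function name fault_offset is made up for illustration:

```shell
# fault_offset IP BASE
# Print the offset of a faulting instruction pointer inside the mapped
# library. Both arguments are hex strings without a 0x prefix, as they
# appear in the kernel log.
fault_offset() {
    printf '0x%x\n' "$(( 0x$1 - 0x$2 ))"
}

# Values copied from the two log lines above.
fault_offset 7f7e8536ba0c 7f7e85363000       # prints 0x8a0c
fault_offset 00007f361877aa29 7f3618772000   # prints 0x8a29
```

The two offsets are 29 bytes apart, and the low 12 bits of the first one (a0c) match frame #0 of the gdb backtrace in the original report, which is consistent with these being the same fault.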

@wiebeytec
Author

@hillkim7 you should comment on the bug report at Ubuntu I linked to above.

@ralight
Contributor

ralight commented Feb 21, 2017

Out of interest could you try again with the new 1.4.11?

@hillkim7

I have two mosquitto servers that have this problem with libwebsockets.so.
So I am quite sure that there must be a bug in the combination of mosquitto and libwebsockets.so on Ubuntu 16.04.1 LTS.
As @ralight suggested, I updated mosquitto to the new 1.4.11.
I will let you know later whether the problem is gone.

@wiebeytec
Author

I think we should keep replying to that Ubuntu bug I submitted. Canonical is far too lax when it comes to fixing bugs, in my opinion.

@hillkim7

hillkim7 commented Mar 8, 2017

The mosquitto halt problem still exists with mosquitto 1.4.11.
It occurs less often with 1.4.11, though.
Anyway, I am going to switch to Websockify. What is Websockify?

@maecky

maecky commented Mar 20, 2017

For me, mosquitto is also crashing when using websockets. One normal publisher / subscriber and one publisher / subscriber with websockets is enough to get it crashing.

Can someone explain the Websockify workaround or push me in the right direction? Or is there another solution?

@wiebeytec
Author

Just Google Websockify. It's a wrapper you can start that serves any TCP socket as a websocket.

@hillkim7

hillkim7 commented Mar 24, 2017

For me, it wasn't easy to use Websockify for MQTT web access.
Instead, I made my own mosquitto init.d script that restarts mosquitto when it gets a segmentation fault.
I am free from the problem now.
You can find this init.d script here: https://github.com/hillkim7/mosquitto-init.d
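
The restart-on-crash idea can be sketched as a small supervisor loop. This is not the linked init.d script, only a hypothetical illustration: the function name run_until_clean_exit and the RESTART_DELAY variable are invented here. It relies on the shell reporting a process killed by a signal as an exit status above 128 (139 for SIGSEGV).

```shell
# run_until_clean_exit MAX CMD [ARGS...]
# Run CMD in the foreground and start it again whenever it ended with a
# status above 128, which is how the shell reports death by signal.
# Stops after MAX runs, or as soon as CMD exits normally, and returns the
# status of the last run.
run_until_clean_exit() {
    max_runs=$1
    shift
    runs=0
    status=0
    while [ "$runs" -lt "$max_runs" ]; do
        runs=$((runs + 1))
        if "$@"; then
            status=0
        else
            status=$?
        fi
        if [ "$status" -le 128 ]; then
            return "$status"
        fi
        echo "run $runs ended with status $status" >&2
        # RESTART_DELAY is a hypothetical knob for this sketch, in seconds.
        sleep "${RESTART_DELAY:-1}"
    done
    return "$status"
}

# Demo with a stand-in command that exits with 139, the status a shell
# reports for a process killed by SIGSEGV.
RESTART_DELAY=0
run_until_clean_exit 2 sh -c 'exit 139' || echo "gave up after $runs runs"
```

With a real broker the call might look like `run_until_clean_exit 10 /usr/sbin/mosquitto -c /etc/mosquitto/mosquitto.conf`, where the configuration path is only an example. A loop like this hides the crash rather than fixing it, so connected clients still get dropped on every restart.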

@wiebeytec
Author

In that case, you may just want to uninstall Mosquitto and libwebsockets7, take the most recent Mosquitto deb file, and obtain the libwebsockets3 from a current Debian jessie mirror if you run Ubuntu (because the mosquitto deb file has a dependency on libwebsockets3 and Ubuntu 16.04 at least doesn't have it).

I suspect that will work fine, because then you have a recent mosquitto, with a libwebsockets that has been used during development.

I actually already set up our server like this, but our websocket MQTT project is on hold. When I get back to it, I'll check whether Mosquitto works without Websockify this way.

@fezeev

fezeev commented Nov 14, 2017

I faced this bug some time ago: Ubuntu 16.04, mosquitto 1.4.8 and libwebsockets 1.7.1. The mosquitto process crashed and restarted several times per day, with segfaults in the logs.

Then I recompiled libwebsockets 1.7.8 from source (it is the last version with the same filename as 1.7.1) and put the compiled library in place of the packaged one.

No crash in two weeks.

@SFSDevel

I could easily reproduce it with Ubuntu 16.04, mosquitto 1.4.8 and libwebsockets 1.7.1 by refreshing the browser a lot. Instead of building libwebsockets, I just updated mosquitto to 1.4.14.
I have not been able to reproduce it since updating. Testing is still in progress.

@ismailyavuz

As I understand from the forums, the libwebsockets problems have been gone since 1.4.12. I have the same problem with 1.4.10, and I'll try 1.4.14 on Ubuntu 16.04.

@hillkim7

hillkim7 commented Feb 7, 2018

I am sorry to bring bad news. The segmentation fault problem is still lingering on Ubuntu 16.04.

  • mosquitto version 1.4.14

Today mosquitto died a couple of times while I was testing my web app, which uses MQTT over websockets.
See:
[747847.985189] traps: mosquitto[19773] general protection ip:7f032b89b072 sp:7fff5c4de410 error:0 in libwebsockets.so.7[7f032b886000+1f000]

@fezeev

fezeev commented Feb 7, 2018

After more than two months, I still have no crashes with mosquitto 1.4.8 and libwebsockets 1.7.8 on Ubuntu 16.04.

@Tifaifai
Contributor

Tifaifai commented Feb 7, 2018

My config is OK :
Ubuntu 16.04.3 LTS
mosquitto version 1.4.14 or 1.4.90 (with bridge dynamic ;) )
libwebsockets-2.4.1 (libwebsockets.so.12)

@ralight
Contributor

ralight commented Feb 7, 2018

@hillkim7 Are you able to reproduce it?

@hillkim7

hillkim7 commented Feb 8, 2018

@ralight Probably. I'll let you know if I reproduce it.
By the way, how can I see my libwebsockets version?
I could only identify it as version 7:
~$ ls -al /usr/lib/x86_64-linux-gnu/libwebsockets.so.7
-rw-r--r-- 1 root root 134160 Feb 22 2016 /usr/lib/x86_64-linux-gnu/libwebsockets.so.7

@karlp
Contributor

karlp commented Feb 8, 2018

https://libwebsockets.org/abi/timeline/libwebsockets/index.html has the SO version for each release.

@Tifaifai
Contributor

Tifaifai commented Feb 8, 2018

libwebsockets v2.4 released 2017-10-16 with soname .12

@hillkim7

hillkim7 commented Feb 10, 2018

The version of my libwebsockets is as follows:

$ dpkg -l|grep websockets
ii  libwebsockets7:amd64   1.7.1-1  amd64     lightweight C websockets library

This is the default libwebsockets package that Ubuntu 16.04 provides.
@Tifaifai How did you update libwebsockets on your Ubuntu machine?

@Tifaifai
Contributor

$ wget https://github.com/warmcat/libwebsockets/archive/v2.4.1.zip
$ mv v2.4.1.zip libwebsockets-2.4.1.zip
$ unzip libwebsockets-2.4.1.zip 
$ cd libwebsockets-2.4.1/
$ mkdir build
$ cd build/
$ cmake ..
$ make
 
Afterwards, copy bin/*, include/* and lib/* to the matching system paths (for example under /usr/local/).

Does that work for you, @hillkim7?

@WilliamHua

WilliamHua commented Apr 13, 2018

@Tifaifai if I follow your instructions, will mosquitto automatically use the correct libwebsockets.so?

I have used your instructions for websockets 1.7.8 (I have put the corresponding files within /usr/local/*).

However when I run the following command: ldd /usr/sbin/mosquitto

I get the following output (for websockets):

libwebsockets.so.7 => /usr/lib/x86_64-linux-gnu/libwebsockets.so.7 (0x00007f9c0ece7000)

Making me think that mosquitto isn't using the correct library? My understanding is that the binary should be using /usr/local/lib/libwebsockets.so.7

I am running Xenial (Ubuntu 16.04.2)

@Tifaifai
Contributor

Hi,
You can run:

sudo ldconfig
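
Whether ldconfig changed anything can be checked by filtering the ldd output for the libwebsockets entry. A small sketch, assuming awk is available; the helper name resolved_lws is made up here, and the sample line is the one quoted in the question above:

```shell
# resolved_lws
# Read `ldd` output on stdin and print, for every libwebsockets entry,
# the soname the binary asks for and the file it currently resolves to.
resolved_lws() {
    awk '$1 ~ /^libwebsockets\.so/ && $2 == "=>" { print $1, $3 }'
}

# Sample ldd line copied from the question above.
printf '%s\n' 'libwebsockets.so.7 => /usr/lib/x86_64-linux-gnu/libwebsockets.so.7 (0x00007f9c0ece7000)' | resolved_lws
```

Against the real binary this would be `ldd /usr/sbin/mosquitto | resolved_lws`. Note that refreshing the linker cache can only help when the self-built library has the same soname as the packaged one, as with 1.7.8 and 1.7.1 (both libwebsockets.so.7, per the comments above). A 2.4.x build installs libwebsockets.so.12, which a mosquitto binary linked against libwebsockets.so.7 will not load.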

@sanhardik

I tried with libwebsockets v2.4.1 and haven't seen an issue in the last couple of days.

@ralight
Contributor

ralight commented May 1, 2018

In my testing I can occasionally get crashes to occur using lws 1.7.1, but haven't been able to track down the cause of the problem. With 2.4.1 I have seen no crashes. I think the problem is with libwebsockets.

@phreaker0

The mosquitto (1.4.15) provided with Ubuntu 18.04 LTS crashes on each TLS websocket connection attempt:

mosquitto[24363]: segfault at 10640 ip 00007fd198cf9a6e sp 00007ffcbf23fcb0 error 4 in libwebsockets.so.8[7fd198ce6000+25000]

Using websockets without TLS seems to work so far.

@Tifaifai
Contributor

'Regression in libwebsockets 2.1.0 - unable to connect to mosquitto via websocket #774' warmcat/libwebsockets#774

Update your libwebsockets to at least version 2.4.1.
It is explained earlier in this thread ;)

@phreaker0

@Tifaifai
I tried to use the updated library with the distribution-provided mosquitto binary, but that binary is linked to the old library, and replacing the old one with the new one resulted in a segmentation fault on startup. So I compiled the latest mosquitto master against the 2.4.2 library, and this works. Thanks.

@Tifaifai
Contributor

Congratulations ;) That's good practice ;)

@robernio

robernio commented Nov 9, 2018

@Tifaifai
Hi, I have installed mosquitto on my Ubuntu 16.04.5 LTS following the https://mosquitto.org/download/ instructions.
But it's mosquitto 1.4.8 and libwebsockets 1.7.1.

Could you please explain how to upgrade to the following?

  • mosquitto version 1.4.14
  • libwebsockets-2.4.1 (libwebsockets.so.12)

I have followed your instructions but have the same problem as @WilliamHua: mosquitto continues using libwebsockets.so.7:
$ wget https://github.com/warmcat/libwebsockets/archive/v2.4.1.zip
$ mv v2.4.1.zip libwebsockets-2.4.1.zip
$ unzip libwebsockets-2.4.1.zip
$ cd libwebsockets-2.4.1/
$ mkdir build
$ cd build/
$ cmake ..
$ make
Afterwards, copy bin/*, include/* and lib/* to the matching system paths (for example under /usr/local/).

@Tifaifai
Contributor

Tifaifai commented Nov 9, 2018

You can use v2.4.2:
https://github.com/warmcat/libwebsockets/archive/v2.4.2.zip

and maybe run
$ ldconfig
to update your linker cache.

@cagdasdoner

Updating libwebsockets solves the problem indeed! I compiled and installed v2.4.2 from source.

@ralight
Contributor

ralight commented May 22, 2019

It seems like moving to libwebsockets 2.4.2 works for a lot of people, so I'm closing this issue.

@ralight ralight closed this as completed May 22, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Aug 20, 2019