ABORT due to failing assertion #1700

Closed
smurfix opened this issue May 21, 2020 · 8 comments

Comments

@smurfix

smurfix commented May 21, 2020

Mosquitto 1.6.9, two bridged brokers, both with persistence=false and retain_available=false.

Mai 21 18:12:30 pi-c3 mosquitto[7800]: Connecting bridge (step 1) upstream (10.107.0.90:51883)
Mai 21 18:12:30 pi-c3 mosquitto[7800]: Connecting bridge (step 2) upstream (10.107.0.90:51883)
Mai 21 18:12:30 pi-c3 mosquitto[7800]: mosquitto: /build/mosquitto-coWg4m/mosquitto-1.6.9/src/loop.c:730: loop_handle_reads_writes: Assertion `pollfds[context->pollfd_index].fd == context->sock' failed.
Mai 21 18:12:30 pi-c3 systemd[1]: mosquitto.service: Main process exited, code=killed, status=6/ABRT
Mai 21 18:12:30 pi-c3 systemd[1]: mosquitto.service: Failed with result 'signal'.
@ralight
Contributor

ralight commented May 21, 2020

What platform are you building on, and what build options are you using? Is there anything else in your config file apart from the bridge and retain_available false?

@smurfix
Author

smurfix commented May 21, 2020

Log:

1590079596: mosquitto version 1.6.9 starting
1590079596: Config loaded from /etc/mosquitto/mosquitto.conf.
1590079596: Opening ipv4 listen socket on port 51883.
1590079596: Connecting bridge (step 1) dev (10.107.3.2:51883)
1590079596: Connecting bridge (step 2) dev (10.107.3.2:51883)
1590079596: Socket error on client local.dev, disconnecting.
1590079601: Connecting bridge (step 1) dev (10.107.3.2:51883)
[ death here ]

strace:

[pid 77380] write(8, "\0204\0\4MQTT\204.\0\5\0\3dev\0 $SYS/broker/connection/dev/state\0\0010", 54) = 54
[pid 77380] read(8, "", 1)              = 0

while the other side does this (a different bridge peer, but the same problem):

4281  rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
4281  read(8, "\20", 1)                 = 1
4281  read(8, "8", 1)                   = 1
4281  read(8, "\0\4MQTT\204,\0\5\0\5pi-f3\0\"$SYS/broker/connection/pi-f3/state\0\0010", 56) = 56
4281  clock_gettime(CLOCK_MONOTONIC, {tv_sec=1112514, tv_nsec=277688747}) = 0
4281  gettimeofday({tv_sec=1590079989, tv_usec=16567}, NULL) = 0
4281  gettimeofday({tv_sec=1590079989, tv_usec=16766}, NULL) = 0
4281  getpid()                          = 4281
4281  send(5, "<29>May 21 18:53:09 mosquitto[4281]: Socket error on client <unknown>, disconnecting.", 85, MSG_NOSIGNAL) = 85
4281  close(8)                          = 0
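For reference when reading these traces: the receiving side reads the MQTT fixed header one byte at a time. "\20" is 0x10 (packet type 1, CONNECT, in the high nibble) and "8" is 0x38, the Remaining Length of 56, after which it reads the 56-byte rest of the packet. A minimal C sketch of the Remaining Length decoding from the MQTT 3.1.1 spec follows; the function name is hypothetical and this is not mosquitto's own code.

```c
#include <stddef.h>
#include <stdint.h>

/* MQTT 3.1.1 control packet type 1 (CONNECT) sits in the high nibble of byte 0. */
enum { MQTT_CONNECT = 1 };

/*
 * Hypothetical helper: decode the "Remaining Length" field that follows the
 * first fixed-header byte. It is 1 to 4 bytes; each byte carries 7 bits of
 * the value, least significant group first, and every byte except the last
 * has 0x80 set. Returns the number of bytes consumed, or -1 if the input is
 * truncated or the field is longer than 4 bytes.
 */
static int mqtt_decode_remaining_length(const uint8_t *buf, size_t len, uint32_t *out)
{
    uint32_t value = 0;
    uint32_t multiplier = 1;

    for (size_t i = 0; i < 4; i++) {
        if (i >= len) {
            return -1; /* need more bytes */
        }
        value += (uint32_t)(buf[i] & 0x7F) * multiplier;
        if ((buf[i] & 0x80) == 0) {
            *out = value; /* last byte of the field */
            return (int)(i + 1);
        }
        multiplier *= 128;
    }
    return -1; /* malformed: continuation bit set on the 4th byte */
}
```

The same arithmetic explains the sender's write(8, ..., 54) above: "\020" plus "4" (0x34, a Remaining Length of 52) is a 2-byte fixed header followed by 52 bytes.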

So apparently there are two bugs here:

  • Aborting a bridge connection causes the connecting server to die an ignoble death some time later.
  • When retain_available is off, that $SYS/…/state LWT message should be sent without the retain flag.
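The second point can be checked against the CONNECT packets in the traces. In MQTT 3.1.1 the Connect Flags byte comes right after the protocol name ("\0\4MQTT") and the protocol level byte ("\204" here); it is "." (0x2E) in the first trace and "," (0x2C) in the second. The small C sketch below, with hypothetical helper names, decodes that byte per the spec and shows that bit 0x20 (Will Retain) is set in both, i.e. the bridge asks for its state will to be retained.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Connect Flags byte of an MQTT 3.1.1 CONNECT variable header. */
struct connect_flags {
    bool username;
    bool password;
    bool will_retain;
    unsigned will_qos;
    bool will;
    bool clean_session;
};

/* Hypothetical helper: split the flags byte into its fields (bit 0 is reserved). */
static struct connect_flags decode_connect_flags(uint8_t b)
{
    struct connect_flags f = {
        .username      = (b & 0x80) != 0,
        .password      = (b & 0x40) != 0,
        .will_retain   = (b & 0x20) != 0,
        .will_qos      = (b >> 3) & 0x03,
        .will          = (b & 0x04) != 0,
        .clean_session = (b & 0x02) != 0,
    };
    return f;
}

/*
 * Hypothetical helper: find the flags byte in a CONNECT body (everything after
 * the fixed header): 2-byte protocol-name length, the name, one protocol level
 * byte, then the flags. Returns -1 if the buffer is too short.
 */
static int connect_flags_byte(const uint8_t *body, size_t len)
{
    if (len < 2) {
        return -1;
    }
    size_t name_len = ((size_t)body[0] << 8) | body[1];
    size_t off = 2 + name_len + 1; /* skip length, name and protocol level */
    if (off >= len) {
        return -1;
    }
    return body[off];
}
```

Decoding 0x2E gives will, will QoS 1, will retain and clean session; 0x2C is the same except that clean session is off.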

@smurfix
Author

smurfix commented May 21, 2020

The platform is Debian; I'm using its 1.6.9 packages. I can build a minimal reproducer with the current mosquitto master as soon as I finish the job this bug has rudely interrupted. ;-)

@ralight
Contributor

ralight commented May 21, 2020

OK, I can reproduce it now, thank you. This only occurs if epoll isn't being used, and the Debian packages should absolutely definitely be using epoll, but they aren't.
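The failed assertion in loop.c:730 checks that the pollfds slot recorded for a context still holds that context's socket. In a poll() build that index is presumably recorded when the array is filled before each poll() call, so a socket that is created or replaced for a context afterwards (as a bridge reconnect plausibly does) leaves the recorded slot stale. The C sketch below illustrates that invariant, with a defensive check in place of the assert; struct conn, build_pollfds and pollfd_slot_current are hypothetical, and this is not mosquitto's code or its actual fix.

```c
#include <poll.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified connection context (not mosquitto's real struct). */
struct conn {
    int sock;         /* current socket, or -1 when not connected */
    int pollfd_index; /* slot filled in for this loop pass, or -1 if none */
};

/* Fill pollfds[] for one loop pass and remember each context's slot. */
static size_t build_pollfds(struct pollfd *pollfds, struct conn *conns, size_t n)
{
    size_t used = 0;
    for (size_t i = 0; i < n; i++) {
        if (conns[i].sock < 0) {
            conns[i].pollfd_index = -1; /* nothing to poll for this context */
            continue;
        }
        pollfds[used].fd = conns[i].sock;
        pollfds[used].events = POLLIN;
        pollfds[used].revents = 0;
        conns[i].pollfd_index = (int)used;
        used++;
    }
    return used;
}

/*
 * Report whether the recorded slot still describes this context's current
 * socket. A context whose socket appeared or changed after build_pollfds()
 * ran fails this check and can simply be skipped until the next pass.
 */
static bool pollfd_slot_current(const struct pollfd *pollfds, size_t used,
                                const struct conn *c)
{
    return c->pollfd_index >= 0
        && (size_t)c->pollfd_index < used
        && pollfds[c->pollfd_index].fd == c->sock;
}
```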

@smurfix
Author

smurfix commented May 22, 2020

Right, they're using poll. I'll file a bug report.

On a related note: why the [censored] does mosquitto never sleep longer than 100 ms? I'd like my computer to save power when nobody is talking to it …

@ralight
Contributor

ralight commented May 22, 2020

There's probably no need; I'll do a new release, 1.6.10, and fix the package at the same time.

As for the 100 ms: it's just been like that forever. I suppose it could be raised to 1 second at most.

@smurfix
Author

smurfix commented May 22, 2020

Umm, but why is there a periodic timer at all? Presumably the code could work out how long it should sleep, and then just wait that long.
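Computing the sleep instead of waking on a fixed tick could look roughly like the C sketch below. It assumes the loop can collect absolute deadlines (keepalive checks, bridge restart times and so on) as CLOCK_MONOTONIC seconds, with 0 meaning "nothing scheduled"; next_timeout_ms and its parameters are hypothetical, not mosquitto API.

```c
#include <limits.h>
#include <stddef.h>
#include <time.h>

/*
 * Hypothetical sketch: derive a poll() timeout in milliseconds from the
 * earliest pending deadline. max_wait_ms caps the sleep; pass -1 for no cap.
 */
static int next_timeout_ms(const time_t *deadlines, size_t n, time_t now, int max_wait_ms)
{
    time_t earliest = 0;
    for (size_t i = 0; i < n; i++) {
        if (deadlines[i] != 0 && (earliest == 0 || deadlines[i] < earliest)) {
            earliest = deadlines[i];
        }
    }
    if (earliest == 0) {
        return max_wait_ms; /* nothing scheduled: -1 blocks until I/O arrives */
    }
    if (earliest <= now) {
        return 0; /* already due: poll without blocking */
    }
    long long ms = (long long)(earliest - now) * 1000;
    if (ms > INT_MAX) {
        ms = INT_MAX;
    }
    if (max_wait_ms >= 0 && ms > max_wait_ms) {
        ms = max_wait_ms;
    }
    return (int)ms;
}
```

The result would be passed as the timeout argument of poll() or epoll_wait(), with now taken from clock_gettime(CLOCK_MONOTONIC, ...) as in the strace above. A cap of 1000 ms keeps a safety net in line with the "1 second at most" suggestion, while -1 lets an idle broker block until I/O arrives.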

@ralight
Contributor

ralight commented May 22, 2020

It's just a matter of simplicity and available time to make changes. Having said that, some other changes I've been making would make this a good time to revisit it.

ralight added a commit that referenced this issue May 25, 2020
This only occurs when compiled without epoll support.

Closes #1700. Thanks to Matthias Urlichs.
FranciscoKnebel pushed a commit to Open-Digital-Twin/mosquitto that referenced this issue Jul 30, 2020
This only occurs when compiled without epoll support.

Closes eclipse#1700. Thanks to Matthias Urlichs.
github-actions bot locked as resolved and limited conversation to collaborators Aug 12, 2022