Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Thread sanitizer reveals missing locks #1374

Closed
jefftrull opened this issue Aug 8, 2019 · 9 comments
Closed

Thread sanitizer reveals missing locks #1374

jefftrull opened this issue Aug 8, 2019 · 9 comments
Milestone

Comments

@jefftrull
Copy link

I recently ran a client's software with gcc's thread sanitizer enabled. It revealed some issues in mosquitto that I've worked around in a fork but I felt I should document them here. The problems are:

  • Inconsistent protection of mosq->in_callback and mosq->state by their respective mutexes
  • mosq->sock may need its own mutex

I've attached a very rough patch showing where the issues are.

@ralight
Copy link
Contributor

ralight commented Aug 8, 2019

Thanks for this. Out of interest, is the code using loop_start(), or threads external to the library?

@jefftrull
Copy link
Author

jefftrull commented Aug 8, 2019

External threads in this case. Do you suspect a user error :)

(we are calling mosquitto_threaded_set)

@ralight
Copy link
Contributor

ralight commented Aug 8, 2019

Not at all, it's just good to know the situation.

I've not look at it properly yet, but I'm curious to know your reasoning as to why you've removed the mutex from the callback calls.

@jefftrull
Copy link
Author

I found that mosquitto_reinitialise was getting called from them and thus the mutexes were reset, causing the following unlock to silently fail... Could this be something we're doing wrong?

@jefftrull
Copy link
Author

mosq->in_callback still gets protected

@jefftrull
Copy link
Author

jefftrull commented Aug 8, 2019

I'm sorry, mosquitto_reinitialise is not getting called again. Something else is happening in the callback that is messing up the locks somehow. Investigating...

@jefftrull
Copy link
Author

OK, thanks for pointing this out. The changes in locking of callback_mutex were a rabbit hole triggered by the fact that we call mosquitto_disconnect from our main thread, taking a path where callback_mutex was not already held. It looks like the (much) easier fix is to change a single test in packet_mosq.c - the value of in_callback is only interesting in non-threaded mode there, so it's safe.
New diff attached

@jefftrull
Copy link
Author

Further investigation reveals that the first change in that last diff (removing the access to mosq->sock) is also unnecessary. Every remaining change protects mosq->state.

@ralight ralight added this to the 1.6.5 milestone Sep 3, 2019
@ralight
Copy link
Contributor

ralight commented Sep 8, 2019

Thanks, I've now added a commit that implements this.

@ralight ralight closed this as completed Sep 8, 2019
ralight added a commit that referenced this issue Sep 8, 2019
Closes #1374. Thanks to Jeff Trull.
ralight added a commit that referenced this issue Sep 18, 2019
Closes #1374. Thanks to Jeff Trull.
@lock lock bot locked as resolved and limited conversation to collaborators Dec 7, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants