Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Broker bridge 2.0.10: Corruption of pollfds on Windows #2173

Closed
NiclasLindgren opened this issue Apr 10, 2021 · 3 comments
Closed

Broker bridge 2.0.10: Corruption of pollfds on Windows #2173

NiclasLindgren opened this issue Apr 10, 2021 · 3 comments

Comments

@NiclasLindgren
Copy link

NiclasLindgren commented Apr 10, 2021

net__socket_close is called which deletes/closes the socket from db.contexts_by_sock, but doesn't call mux__delete(mosq) before trying another connect, when connect is called the socket fd is changed in the context, but the pollfd_index points to the old spot (which also isn't cleared)

This happens in bridge_check which calls mosquitto__check_keepalive which closes the socket, which triggers a call to rc = bridge__connect(context); in the same loop

I am not sure what the proper solution is, because it doesn't seem like you are supposed to call mux__delete from net__socket_close, rather that maybe bridge_check_pending should detect the socket as stale and call mux__delete

however adding mux__delete here (in net_mosq::net__socket_close) works

		HASH_FIND(hh_sock, db.contexts_by_sock, &mosq->sock, sizeof(mosq->sock), mosq_found);
		if(mosq_found){
			HASH_DELETE(hh_sock, db.contexts_by_sock, mosq_found);
			mux__delete(mosq);

This also seems to solve the issue where the broker is "stalled" if there is a bridge that isn't reachable (#2172 )

@NiclasLindgren
Copy link
Author

Another way to make it work would be

static void bridge_check_pending(struct mosquitto *context)
{
int err;
socklen_t len;

if (context->sock == INVALID_SOCKET && context->pollfd_index != -1)
{
	mux__delete(context);
}

But it feels like it is dangerous because alot of code calls net__socket_close which doesn't immediately removes the socket from the poll array

@ralight
Copy link
Contributor

ralight commented Apr 10, 2021

Thanks, you're right it needs some careful consideration of where it should happen. I'll take a look.

@NiclasLindgren
Copy link
Author

Cool, I was wrong that it solves #2172, that problem is much more serious, it seems when you have more than 1 bridge configured only one of them will ever work, all the other (even if reachable) will never connect up, I will investigate. I suspect this must all work on Linux but I am on Windows

@ralight ralight closed this as completed in 238b686 Jun 9, 2021
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 11, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants