Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DBus connection closing unexpectedly #293

Open
chergert opened this issue Sep 9, 2016 · 14 comments
Open

DBus connection closing unexpectedly #293

chergert opened this issue Sep 9, 2016 · 14 comments
Labels

Comments

@chergert
Copy link
Contributor

chergert commented Sep 9, 2016

Occasionally, after a slight bit of inactivity (maybe 30 seconds or so) my next DBus operation fails when tunneling through the flatpak-dbus-proxy.

According to strace (and some extra printfs), it looks like the flatpak-dbus-proxy is getting ECONNRESET from the real dbus daemon. I verified it was the bus side of the proxy by comparing ProxySide *side to &side->client->bus_side in side_closed() of flatpak-proxy.c.

This in turn does a shutdown of the proxy socket to the application, resulting in me getting WARNING: Underlying GIOStream returned 0 bytes on an async read from GDBus.

A small snippet leading up to the error:

poll([{fd=3, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN|POLLOUT}, {fd=14, events=0}], 5, -1) = 2 ([{fd=5, revents=POLLIN}, {fd=7, revents=POLLOUT}])
write(5, "\1\0\0\0\0\0\0\0", 8)         = 8
sendmsg(7, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\1\0\1\211\0\0\0;\2\0\0}\0\0\0\10\1g\0\1s\0\0\1\1o\0\25\0\0\0"..., iov_len=281}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 281
sendmsg(7, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\1\0\1\20\23\0\0<\2\0\0\270\0\0\0\t\1u\0\3\0\0\0\10\1g\0\20aya"..., iov_len=5080}], msg_iovlen=1, msg_control=[{cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_RIGHTS, cmsg_data=[8, 9, 10]}], msg_controllen=32, msg_flags=0}, MSG_NOSIGNAL) = 5080
close(8)                                = 0
close(9)                                = 0
close(10)                               = 0
write(5, "\1\0\0\0\0\0\0\0", 8)         = 8
write(5, "\1\0\0\0\0\0\0\0", 8)         = 8
poll([{fd=3, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=14, events=0}], 5, -1) = 2 ([{fd=5, revents=POLLIN}, {fd=7, revents=POLLIN|POLLERR|POLLHUP}])
read(5, "\6\0\0\0\0\0\0\0", 16)         = 8
write(5, "\1\0\0\0\0\0\0\0", 8)         = 8
recvmsg(7, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\2\1\1\0\0\0\0\233\1\0\0005\0\0\0", iov_len=16}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_CMSG_CLOEXEC) = 16
recvmsg(7, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\6\1s\0\6\0\0\0:1.723\0\0\5\1u\0;\2\0\0\7\1s\0\24\0\0\0"..., iov_len=56}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_CMSG_CLOEXEC) = 56
write(5, "\1\0\0\0\0\0\0\0", 8)         = 8
recvmsg(7, {msg_namelen=0}, MSG_CMSG_CLOEXEC) = -1 ECONNRESET (Connection reset by peer)

I believe FDs 8, 9, and 10 are from a HostCommand RPC to org.freedesktop.Flatpak.Development.

@alexlarsson
Copy link
Member

Hmm, essentially this seems to be doing the "right thing", no? The dbus daemon kicked us off the bus, and the client got a short read. I guess there are two issues here:

  1. The error you get on the sandbox side doesn't seem to be what GDbus normally sees on getting kicked off the bus. (I think?) I wonder how they differ
  2. I assume you didn't expect to get kicked off the bus. Do you have any output from the dbus-daemon when this happened? Can you see what happens with dbus-monitor? Also, you can try running flatpak run with --log-session-bus.

@chergert
Copy link
Contributor Author

chergert commented Sep 9, 2016

I tried looking through the dbus code to find the situations where it would boot us off the connection with timeouts, but I couldn't find any. I had some debug code to watch for the "closed" event on the GDbusConnection and we don't receive one when connected to a normal desktop session. It only occurs for me during the flatpak session.

The only thing different I'm doing in the flatpak version is the HostCommand stuff which includes signal (un)subscribe, but that code was all copied from flatpak-builder.c.

I also tested out dbus-monitor before filing this, and I didn't see anything interesting. But I think that dbus-monitor will only notify of messages, so if there is a problem with wire format stuff, I don't think we'd know about that via dbus-monitor output.

I'll give the --log-session-bus a try and report back.

@chergert
Copy link
Contributor Author

chergert commented Sep 9, 2016

C352: -> org.freedesktop.DBus call org.freedesktop.DBus.AddMatch at /org/freedesktop/DBus
C353: -> org.freedesktop.Flatpak call org.freedesktop.Flatpak.Development.HostCommand at /org/freedesktop/Flatpak/Development
B273: <- org.freedesktop.DBus return from C352

(process:26058): flatpak-WARNING **: Error reading from socket: Error receiving message: Connection reset by peer

The final g_warning() was added by me.

@chergert
Copy link
Contributor Author

chergert commented Sep 9, 2016

I've also been able to replicate this outside the sandbox by using the org.freedesktop.Flatpak.Development interface from the host. So that might indicate something on the service side rather than in the proxy side.

@chergert
Copy link
Contributor Author

chergert commented Sep 9, 2016

I've modified flatpak-session-helper to log when GDBusConnection::closed is emitted, and so far I see no issue. It always stays connected. That is starting to show that the issue exists between the flatpak-proxy and the flatpak-session-helper, which is obviously the dbus session daemon.

I'd guess the dbus-daemon is failing some consistency check (maybe serial related?)

@alexlarsson
Copy link
Member

I wonder if its related to the fd passing that HostCommand uses.

@chergert
Copy link
Contributor Author

chergert commented Sep 12, 2016

I can't confirm it, but I would expect the error to be in dbus-daemon (if it were an fd-passing issue) as that is who I'm connecting with outside the sandbox (and the flatpak-session-helper seems to be unaffected).

I created this test case which just performs the subscribe/hostcommand/unsubscribe dance with a command line provided timeout period. I couldn't get it to fail, so something else must be going on.

https://gist.github.com/chergert/2f11ab7f48c56ab9195c3bb669e4c39f

I even modified the Builder code to use it's own connection, and the error still happens after a given amount of time of inactivity.

@alexlarsson
Copy link
Member

Some things are not super clear. Is this correct:

  • You can reproduce this doing HostCommand frome gnome-builder outside the sandbox (as well as inside)
  • You can't reproduce it in a more limited testcase, either in the sandbox or outside it.

@chergert
Copy link
Contributor Author

chergert commented Sep 12, 2016

Yes, I can reproduce this inside and outside the sandbox. The test-case I linked to above works both inside and outside the sandbox.

I think it's fair to claim that the flatpak-dbus-proxy can be ruled out.

@alexlarsson
Copy link
Member

And by "outside", do you mean that you're completely without proxy, or is it a manually started proxy?

@chergert
Copy link
Contributor Author

chergert commented Sep 13, 2016

No proxy is involved.

@alexlarsson
Copy link
Member

Did you ever figure this out?

@chergert
Copy link
Contributor Author

I did not. We are still using our workaround that creates new GDBusConnections for each process spawn.

@chergert
Copy link
Contributor Author

Pretty sure I haven't seen this in a long time, so closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants