Wait for stream to finish sending before shutdown #3291

Open
matzf opened this issue Oct 19, 2021 · 12 comments

Comments

@matzf

matzf commented Oct 19, 2021

An application using quic-go currently has no reliable way to wait for a QUIC session or stream to finish sending before shutting down.
When an application writes on a stream, the data is buffered internally in quic-go, similar to TCP. If the application then shuts down (or closes the QUIC session, or closes the UDP socket) before all the buffered data has actually been transmitted, that data is simply lost.

Side note: with TCP, this issue (usually) does not occur because the kernel keeps working on the send buffer after the application has shut down (and there is also the SO_LINGER option to explicitly make close block until all data has been transmitted). In QUIC, all of this happens in user space, so obviously once the application has shut down, the buffered data will not be transmitted.
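
For illustration, Go exposes SO_LINGER for TCP sockets via net.TCPConn.SetLinger. A minimal sketch of the TCP-side behavior described above (the address and payload are placeholders):

package main

import "net"

// sendAndLinger writes data and then blocks in Close until the kernel has
// flushed the send buffer, or the linger timeout (in seconds) expires.
func sendAndLinger(data []byte) error {
	conn, err := net.Dial("tcp", "localhost:3333") // placeholder address
	if err != nil {
		return err
	}
	tcpConn := conn.(*net.TCPConn)
	if err := tcpConn.SetLinger(5); err != nil {
		return err
	}
	if _, err := tcpConn.Write(data); err != nil {
		return err
	}
	return tcpConn.Close()
}

func main() {
	_ = sendAndLinger([]byte("some data"))
}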

Side note: for HTTP/3 this issue does not appear to occur by virtue of how QUIC is used. The server is long-lived to begin with, so any data it sends will reliably be transmitted (but CloseGracefully might be affected). The server usually replies after reading the full request and the (short lived) client then reads the full server reply. Thus, the client's data will be fully transmitted before shutting down.

Demonstration / Reproduction:

main.go
package main

import (
	"context"
	"crypto/tls"
	"flag"
	"fmt"
	"io"
	"strings"
	"time"

	"github.com/lucas-clemente/quic-go"
)

func main() {
	serv := flag.Bool("s", false, "Run as server, default run as client")
	certFile := flag.String("cert", "", "Certificate file for server")
	keyFile := flag.String("key", "", "Key file for server")
	flag.Parse()
	var err error
	if *serv {
		err = server(*certFile, *keyFile)
	} else {
		err = client()
	}
	if err != nil {
		fmt.Println("ERROR", err)
	}
}

func server(certFile, keyFile string) error {
	cert, err := tls.LoadX509KeyPair(certFile, keyFile)
	if err != nil {
		return err
	}
	listener, err := quic.ListenAddr(
		"localhost:3333",
		&tls.Config{
			NextProtos:   []string{"test"},
			Certificates: []tls.Certificate{cert},
		},
		nil,
	)
	if err != nil {
		return err
	}
	ctx := context.Background()
	sess, err := listener.Accept(ctx)
	if err != nil {
		return err
	}
	stream, err := sess.AcceptUniStream(ctx)
	if err != nil {
		return err
	}
	stream.SetReadDeadline(time.Now().Add(1 * time.Second)) // make this time out a bit quicker
	n, err := io.Copy(io.Discard, stream)
	fmt.Println("read", n, "from", sess.RemoteAddr(), "err:", err)
	return nil
}

func client() error {
	sess, err := quic.DialAddr(
		"localhost:3333",
		&tls.Config{NextProtos: []string{"test"}, InsecureSkipVerify: true},
		nil,
	)
	if err != nil {
		return err
	}
	stream, err := sess.OpenUniStream()
	if err != nil {
		return err
	}
	data := []byte(strings.Repeat("test_data_", 1000))
	n, err := stream.Write(data)
	if err != nil {
		return err
	}
	fmt.Println("sent", n, "to", sess.RemoteAddr())
	return stream.Close()
}

Run:

  • Start server: go run main.go -s -cert quic-go/internal/testdata/cert.pem -key quic-go/internal/testdata/priv.key
  • Start client: go run main.go
  • Observe that (usually):
    • client says: "sent 10000 to 127.0.0.1:3333"
    • server says e.g.: "read 2346 from 127.0.0.1:49593 err: deadline exceeded"
      Note that this issue is inherently timing-dependent. Sometimes the server reports success (full data read and no error), or it reads the full data but still hits the timeout (the FIN for the stream was not sent/received).

Notes:

  • If we add a sufficiently long time.Sleep() in the client after closing the stream, this example works reliably (a crude sketch of this workaround follows this list).
  • The example uses a unidirectional stream to make it obvious which way the data flows here. But bidirectional streams are affected just the same.
  • In the example, the client is short-lived and its send stream is cut off. However, a short-lived server (i.e. a server sending a response and terminating immediately after closing the stream) would have the same problem. This seems like a less common scenario, but it could still happen (something like netcat -l).
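
A crude sketch of the time.Sleep() workaround mentioned in the first note (the one-second grace period is an arbitrary guess, not a guarantee):

// At the end of client() in the example above, instead of returning
// stream.Close() directly:
if err := stream.Close(); err != nil {
	return err
}
// Arbitrary grace period: hope that quic-go manages to transmit the
// buffered data and the stream FIN before the process exits.
time.Sleep(1 * time.Second)
return nil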

Proposal:

Add an API to allow an application to block until all data on a stream / session has been sent before shutting down.
This could be either at the level of a SendStream or of a Session (or both). For the shutdown issue described above, something at the level of the Session might be more convenient. On the other hand, having such functionality at the level of streams could facilitate other advanced usage.

Perhaps the most straightforward way to add this to the interface would be (a usage sketch follows the list):

  • Session.CloseWithErrorSync(context.Context, ApplicationErrorCode, string) error: this blocks until all buffered data has been transmitted and then closes the session.
  • Stream.CloseSync(context.Context) error: close the stream and then wait for all buffered data to be transmitted.
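
For illustration, the client from the reproduction above could then shut down like this (both methods are hypothetical; neither exists in quic-go today):

// Hypothetical usage of the proposed Stream.CloseSync.
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
if _, err := stream.Write(data); err != nil {
	return err
}
// Close the stream, then block until all buffered data (and the FIN) has
// been transmitted and acknowledged, or until the context expires.
if err := stream.CloseSync(ctx); err != nil {
	return err
}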
@marten-seemann
Member

Have you considered that data can be lost not only in a buffer, but also in transit? Waiting until the data has been sent out doesn't help you, as the packets might still be lost while in flight. If you want a clean shutdown with a guarantee that all data has been transmitted, you'll need to introduce a signal on the application layer.
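
Sketching such an application-layer signal (the one-byte ack "protocol" here is made up for illustration and requires a bidirectional stream):

// Sender: write the payload, close the send direction (FIN), then wait
// for a one-byte application-level acknowledgement from the peer.
if _, err := stream.Write(data); err != nil {
	return err
}
if err := stream.Close(); err != nil {
	return err
}
ack := make([]byte, 1)
if _, err := io.ReadFull(stream, ack); err != nil {
	return err // the peer never confirmed receipt
}
// The receiver, in turn, reads until EOF and then writes the ack byte.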

@matzf
Author

matzf commented Oct 19, 2021

Thanks for your reply, and sorry for being imprecise. Sure, yes, I've considered that packets might get lost -- by "transmitted", I meant "transmitted and acknowledged".

@marten-seemann
Member

That won’t work either. The ACK might be lost.

@matzf
Author

matzf commented Oct 19, 2021

Ok sure, if the ACKs are lost, then the proposed CloseSync will time out eventually. It's ok that this can still fail, but it would be useful to have the option to at least give it a reasonable shot.
True, adding a signal on the application layer makes it possible to work around the lack of this functionality, but this would add complexity to the application (at least in those cases where such a signal does not match the natural protocol) without being any more reliable than the proposed API. As quic-go internally has all the information required to implement such a CloseSync (or whatever API is finally chosen), it would also feel like a hack to build this with additional signalling on top of the library, instead of into it.

@marten-seemann
Member

I disagree. The way that streams are used is highly dependent on the application, and what works well for one application doesn’t work for the other.
Introducing an API that breaks in the case of packet loss seems incredibly brittle.

@matzf
Author

matzf commented Oct 19, 2021

This must be a misunderstanding; the suggested API does not "break" in case of packet loss. Perhaps the function names that I suggested are misleading? The idea is that it would allow the application to wait for the internal buffer to become empty, which, indeed, is something that may never happen. I don't understand why you think this means it "breaks" -- by that interpretation, Write on a stream would break too. To illustrate this point:

stream.Write(data)
stream.Write(make([]byte, SendBufferSize)) // fill the send buffer completely
stream.Write([]byte("x")) // once this returns, we know that `data` was transmitted successfully.
                          // This is the condition we want to await, just without sending the spurious data.

Without this API, there does not appear to be a way to use quic-go as a "drop-in" replacement for TCP without jumping through hoops to ensure correct termination.

@marten-seemann
Member

This must be a misunderstanding, the suggested API does not "break" in case of packet loss. Perhaps the function names that I suggested are misleading.

You're right. The sender will send PTO packets to force the acknowledgement.

Without this API, there does not appear to be a way to use quic-go as a "drop-in" replacement for TCP without jumping through hoops to ensure correct termination.

I think you're right. I wasn't aware of how SO_LINGER works in detail. Here's a good explanation: https://ndeepak.com/posts/2016-10-21-tcprst/

Session.CloseWithErrorSync(context.Context, ApplicationErrorCode, string) error: this blocks until all buffered data has been transmitted and then closes the session.

We would have to be very explicit that the context has to be canceled at some point. Otherwise a malicious peer could just refuse to acknowledge STREAM data it has received, and make us wait forever.
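
Making that explicit in a sketch (CloseWithErrorSync is still the hypothetical method from the proposal):

// Always bound the wait: a malicious peer could withhold ACKs and make an
// unbounded CloseWithErrorSync block forever.
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
if err := sess.CloseWithErrorSync(ctx, 0, "bye"); err != nil {
	return err // e.g. the context expired before all data was acknowledged
}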

@matzf
Author

matzf commented Oct 26, 2021

Thanks for reconsidering this!

In the meantime, I took a cursory look at how this is handled in a few other QUIC implementations:
In quiche, there is an is_draining method at the level of a session, which tells the application that "the connection object cannot yet be dropped, but no new application data can be sent or received", and that the application should continue running the "event loop". quicly has a similar API; after closing a connection, the application needs to keep calling quicly_send until it returns a code indicating that the connection can be closed and freed.
In quinn, streams have a finish method that returns a Future which "completes when the peer has acknowledged all sent data, retransmitting data as needed". There is also a reset method that allows closing the stream immediately.

None of these approaches really fit quic-go, but perhaps they suggest a different way to model this: instead of providing a CloseSync method that blocks internally, we could keep only the Close that returns immediately and add a new method Drained() <-chan struct{}, returning a channel that is closed once the stream has been drained; like so:

stream.Write(data)
stream.Close()
select {
case <-someCtx.Done(): // or a timer, or whatever
	// timeout
case <-stream.Drained():
	// ok
}

Or alternatively, roughly the same thing at the session level. This would make the blocking behavior much more obvious, perhaps reducing the risk that a caller accidentally passes a context that is never cancelled. It also seems less ergonomic for what I imagine are the typical use cases, though, so my personal preference would still be the initial suggestion.
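
For comparison, roughly what the session-level variant could look like (Drained here is just as hypothetical as on the stream):

// Hypothetical: the channel is closed once all buffered stream data has
// been sent and acknowledged.
select {
case <-someCtx.Done():
	// gave up waiting; closing now may lose buffered data
case <-sess.Drained():
	// everything was delivered
}
return sess.CloseWithError(0, "")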

@guodoliu

guodoliu commented Dec 2, 2022

@marten-seemann Hi, any update on this enhancement? I ran into the same issue.


@sukunrt
Collaborator

sukunrt commented Apr 10, 2024

I like the idea of a notification that the stream has finished sending, but that will not help with a graceful connection close. This is because QUIC doesn't specify any behavior for data received on streams when a CONNECTION_CLOSE arrives. That means that even if the QUIC layer has ACKed the data, the remote QUIC stack might discard it on receiving a CONNECTION_CLOSE.
