Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Infinite loop in bufWriter.Write() #7389

Closed
veshij opened this issue Jul 4, 2024 · 7 comments · Fixed by #7394
Closed

Infinite loop in bufWriter.Write() #7389

veshij opened this issue Jul 4, 2024 · 7 comments · Fixed by #7394
Assignees

Comments

@veshij
Copy link
Contributor

veshij commented Jul 4, 2024

What version of gRPC are you using?

v1.63.2

What version of Go are you using (go version)?

1.19.9

What operating system (Linux, Windows, …) and version?

Linux

What did you do?

We observed instances of an application with higher-than-expected CPU usage.
Upon investigation following profile was collected:
Screenshot 2024-07-03 at 4 45 02 PM

Corresponding stack:

2 @ 0xaae73a 0xaae452 0xa597e2 0xa5ad6a 0xa8eee7 0xa8ed32 0xa8ccb9 0xaa43b4 0x5b5dc1
#       0xaae739        google.golang.org/grpc/internal/transport.(*bufWriter).flushKeepBuffer+0x139    external/org_golang_google_grpc/internal/transport/http_util.go:355
#       0xaae451        google.golang.org/grpc/internal/transport.(*bufWriter).Write+0x251              external/org_golang_google_grpc/internal/transport/http_util.go:338
#       0xa597e1        golang.org/x/net/http2.(*Framer).endWrite+0xc1                                  external/org_golang_x_net/http2/frame.go:366
#       0xa5ad69        golang.org/x/net/http2.(*Framer).WriteDataPadded+0x389                          external/org_golang_x_net/http2/frame.go:685
#       0xa8eee6        golang.org/x/net/http2.(*Framer).WriteData+0x586                                external/org_golang_x_net/http2/frame.go:643
#       0xa8ed31        google.golang.org/grpc/internal/transport.(*loopyWriter).processData+0x3d1      external/org_golang_google_grpc/internal/transport/controlbuf.go:973
#       0xa8ccb8        google.golang.org/grpc/internal/transport.(*loopyWriter).run+0xb8               external/org_golang_google_grpc/internal/transport/controlbuf.go:558
#       0xaa43b3        google.golang.org/grpc/internal/transport.NewServerTransport.func2+0xf3         external/org_golang_google_grpc/internal/transport/http2_server.go:335

We suspect that under certain conditions the following code might end up in infinite loop:

	for len(b) > 0 {
		nn := copy(w.buf[w.offset:], b)
		b = b[nn:]
		w.offset += nn
		n += nn
		if w.offset >= w.batchSize {
			err = w.flushKeepBuffer()
		}
	}

Conditions to get into infinite loop state:

  • bufWriter received an error in flushKeepBuffer() thus stopped sending any new data/flushing the buffer, resetting w.offset.
  • w.buf is full - copy(w.buf[w.offset:], b) returns 0

It seems to be safe to terminate the for len(b) > 0 cycle on first received error from flushKeepBuffer().

diff --git a/internal/transport/http_util.go b/internal/transport/http_util.go
index 39cef3bd..5258d386 100644
--- a/internal/transport/http_util.go
+++ b/internal/transport/http_util.go
@@ -335,7 +335,9 @@ func (w *bufWriter) Write(b []byte) (n int, err error) {
 		w.offset += nn
 		n += nn
 		if w.offset >= w.batchSize {
-			err = w.flushKeepBuffer()
+			if err = w.flushKeepBuffer(); err != nil {
+				return n, err
+			}
 		}
 	}
 	return n, err
@arjan-bal arjan-bal self-assigned this Jul 4, 2024
@arjan-bal
Copy link
Contributor

arjan-bal commented Jul 4, 2024

@veshij thank you for reporting this issue! It does seem like the situation you mentioned will always arise when w.conn.Write(w.buf[:w.offset]) returns an error and there is more data to write than the buffer capacity.

Are you interested in contributing a PR with the fix that also contains a unit test? If not, I'll pick it up.

@veshij
Copy link
Contributor Author

veshij commented Jul 4, 2024 via email

@arjan-bal
Copy link
Contributor

arjan-bal commented Jul 4, 2024

You can unit test only the bufWriter struct by passing it a conn that returns an error. Then call buffWriter.Write() with a sufficiently large buffer. If the test gets stuck in an infinite loop as we suspect in the current state, and returns error with the fix made, I think it should be good enough.

@arjan-bal
Copy link
Contributor

Possible root cause is this change from 6 years ago: #2092

@veshij
Copy link
Contributor Author

veshij commented Jul 5, 2024

You can unit test only the bufWriter struct by passing it a conn that returns an error. Then call buffWriter.Write() with a sufficiently large buffer. If the test gets stuck in an infinite loop as we suspect in the current state, and returns error with the fix made, I think it should be good enough.

thank you, working on the PR

@veshij
Copy link
Contributor Author

veshij commented Jul 5, 2024

Created https://github.com/grpc/grpc-go/pull/7394/files
Noticed that bytes written (n) returned in case of an error is incorrect in existing implementation, but probably worth fixing in a separate PR.

Copy link

This issue is labeled as requiring an update from the reporter, and no update has been received after 6 days. If no update is provided in the next 7 days, this issue will be automatically closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants