
Incorrect SDS grpc server logs #51433

Closed
2 tasks done
leosarra opened this issue Jun 6, 2024 · 3 comments · Fixed by #51452

Comments

@leosarra
Contributor

leosarra commented Jun 6, 2024

Is this the right place to submit this?

  • This is not a security vulnerability or a crashing bug
  • This is not a question about how to use Istio

Bug Description

I noticed that when pilot-agent terminates, just before the gateway/sidecar container stops, we get an
"SDS server for workload certificates started, listening on ..." message:

2024-06-06T11:05:55.569197Z     info    Envoy proxy is ready
2024-06-06T11:06:40.213398Z     info    Agent draining Proxy for termination
2024-06-06T11:06:40.213409Z     info    Status server has successfully terminated
2024-06-06T11:06:40.214709Z     info    Graceful termination period is 5s, starting...
2024-06-06T11:06:41.567277Z     warn    Envoy proxy is NOT ready: server is not live, current state is: DRAINING
2024-06-06T11:06:45.217743Z     info    Graceful termination period complete, terminating remaining proxies.
2024-06-06T11:06:45.217810Z     warn    Aborting proxy
2024-06-06T11:06:45.218312Z     info    Envoy aborted normally
2024-06-06T11:06:45.218338Z     warn    Aborted proxy instance
2024-06-06T11:06:45.218344Z     info    Agent has successfully terminated
2024-06-06T11:06:45.219123Z     info    ads     ADS: "" 1 terminated
2024-06-06T11:06:45.219171Z     info    ads     ADS: "" 2 terminated
2024-06-06T11:06:45.219093Z     info    sds     SDS server for workload certificates started, listening on "./var/run/secrets/workload-spiffe-uds/socket"
~container terminated~

This does not make much sense, since the server was definitely running before the agent terminated.

This log is printed at the wrong time because the code does not take into account that s.grpcWorkloadServer.Serve is a blocking call.
As a consequence, the init goroutine is stuck waiting for the server to either return an error or stop serving.
This is not on par with what I presume was the intended behavior of the init goroutine: "start the server up to 5 times and print a success or failure log entry."
https://github.com/istio/istio/blob/master/security/pkg/nodeagent/sds/server.go#L83-L115

When pilot-agent terminates, Serve() is stopped (returning nil) and the code finally prints "SDS server for workload certificates started".

We could remove this log line, as it can be misleading. Ideally, though, we should preserve it and print it at the appropriate time by having the goroutine no longer block on Serve. While we could run Serve() in another goroutine, I don't see a grpc.Server function/flag available to check the server's status, which would help us determine when to print the message. WDYT?

Version

-

Additional Information

No response

@hzxuzhonghu
Member

Move it just before Serve, and change it a little.

@hzxuzhonghu
Copy link
Member

And, more correctly, we can check for grpc.ErrServerStopped.

@leosarra
Contributor Author

leosarra commented Jun 7, 2024

move just before serve, and change a little

Something akin to #51452 ?

And more correctly we can check grpc.ErrServerStopped

Do you mean checking the Serve() error and stopping the retry early on ErrServerStopped?
I think that may not be needed: Serve() will return nil if the server is stopped while Serve is running.
ErrServerStopped will only be returned if we call Serve() again after a stop, and that is usually prevented by the stopped bool in the for-loop.
