Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug: robot-server errors during shutdown if there's an active RPC connection #8155

Closed
SyntaxColoring opened this issue Jul 23, 2021 · 1 comment · Fixed by #8913
Closed
Labels
bug robot server Affects the `robot-server` project robot-svcs Falls under the purview of the Robot Services squad (formerly CPX, Core Platform Experience). software-investigate Our Software team needs to look into this so we can understand more about it.

Comments

@SyntaxColoring
Copy link
Contributor

SyntaxColoring commented Jul 23, 2021

Overview

The opentrons-robot-server service shows errors if it's shut down under certain circumstances, even though the shutdown is supposed to be graceful. This maybe suggests buggy handling of some resource lifetimes?

Steps to reproduce

  1. SSH into a robot and run journalctl -f -u opentrons-robot-server.
  2. Connect to the robot with the Opentrons App. Toggle the switch next to the robot's name to start an RPC connection over a WebSocket. (I think that's what that does, anyway?)
  3. Run systemctl stop opentrons-robot-server over SSH.

Current behavior

These errors are printed:

Jul 22 18:27:11 red-dust opentrons-api[713]: While reading from socket:
                                             Traceback (most recent call last):
                                               File "usr/lib/python3.7/site-packages/robot_server/service/legacy/rpc/rpc.py", line 157, in handle_new_connection
                                               File "usr/lib/python3.7/site-packages/starlette/websockets.py", line 98, in receive_json
                                               File "usr/lib/python3.7/site-packages/starlette/websockets.py", line 80, in _raise_on_disconnect
                                             starlette.websockets.WebSocketDisconnect: 1012
Jul 22 18:27:11 red-dust opentrons-api[713]: Closing WebSocket 1900571920
Jul 22 18:27:11 red-dust opentrons-api[713]: Exception in ASGI application
                                             Traceback (most recent call last):
                                               File "usr/lib/python3.7/site-packages/uvicorn/protocols/websockets/wsproto_impl.py", line 209, in run_asgi
                                               File "usr/lib/python3.7/site-packages/uvicorn/middleware/proxy_headers.py", line 45, in __call__
                                               File "usr/lib/python3.7/site-packages/fastapi/applications.py", line 149, in __call__
                                               File "usr/lib/python3.7/site-packages/starlette/applications.py", line 102, in __call__
                                               File "usr/lib/python3.7/site-packages/starlette/middleware/errors.py", line 146, in __call__
                                               File "usr/lib/python3.7/site-packages/starlette/middleware/base.py", line 21, in __call__
                                               File "usr/lib/python3.7/site-packages/starlette/middleware/cors.py", line 68, in __call__
                                               File "usr/lib/python3.7/site-packages/starlette/exceptions.py", line 58, in __call__
                                               File "usr/lib/python3.7/site-packages/starlette/routing.py", line 550, in __call__
                                               File "usr/lib/python3.7/site-packages/starlette/routing.py", line 283, in handle
                                               File "usr/lib/python3.7/site-packages/starlette/routing.py", line 57, in app
                                               File "usr/lib/python3.7/site-packages/fastapi/routing.py", line 242, in app
                                               File "usr/lib/python3.7/site-packages/robot_server/service/legacy/routers/rpc.py", line 15, in websocket_endpoint
                                               File "usr/lib/python3.7/site-packages/robot_server/service/legacy/rpc/rpc.py", line 165, in handle_new_connection
                                               File "usr/lib/python3.7/site-packages/starlette/websockets.py", line 142, in close
                                               File "usr/lib/python3.7/site-packages/starlette/websockets.py", line 68, in send
                                               File "usr/lib/python3.7/site-packages/uvicorn/protocols/websockets/wsproto_impl.py", line 281, in send
                                               File "usr/lib/python3.7/site-packages/wsproto/__init__.py", line 64, in send
                                               File "usr/lib/python3.7/site-packages/wsproto/connection.py", line 101, in send
                                             wsproto.utilities.LocalProtocolError: Connection cannot be closed in state ConnectionState.LOCAL_CLOSING

And this one, which is more concerning because it happens during the ASGI shutdown handler, which may prevent subsequent shutdown handlers from running and hide further errors.

Jul 22 18:27:11 red-dust opentrons-api[713]: Waiting for application shutdown.
Jul 22 18:27:11 red-dust opentrons-api[713]: Exception in 'lifespan' protocol
                                             Traceback (most recent call last):
                                               File "usr/lib/python3.7/site-packages/uvicorn/lifespan/on.py", line 48, in main
                                               File "usr/lib/python3.7/site-packages/uvicorn/middleware/proxy_headers.py", line 45, in __call__
                                               File "usr/lib/python3.7/site-packages/fastapi/applications.py", line 149, in __call__
                                               File "usr/lib/python3.7/site-packages/starlette/applications.py", line 102, in __call__
                                               File "usr/lib/python3.7/site-packages/starlette/middleware/errors.py", line 146, in __call__
                                               File "usr/lib/python3.7/site-packages/starlette/middleware/base.py", line 21, in __call__
                                               File "usr/lib/python3.7/site-packages/starlette/middleware/cors.py", line 68, in __call__
                                               File "usr/lib/python3.7/site-packages/starlette/exceptions.py", line 58, in __call__
                                               File "usr/lib/python3.7/site-packages/starlette/routing.py", line 539, in __call__
                                               File "usr/lib/python3.7/site-packages/starlette/routing.py", line 526, in lifespan
                                               File "usr/lib/python3.7/site-packages/starlette/routing.py", line 504, in shutdown
                                               File "usr/lib/python3.7/site-packages/robot_server/app.py", line 70, in on_shutdown
                                               File "usr/lib/python3.7/site-packages/robot_server/service/legacy/rpc/rpc.py", line 75, in on_shutdown
                                               File "usr/lib/python3.7/site-packages/starlette/websockets.py", line 142, in close
                                               File "usr/lib/python3.7/site-packages/starlette/websockets.py", line 70, in send
                                             RuntimeError: Cannot call "send" once a close message has been sent.
Jul 22 18:27:11 red-dust opentrons-api[713]: Application shutdown complete.

Expected behavior

opentrons-robot-server should shut down cleanly.

@SyntaxColoring SyntaxColoring added bug software-investigate Our Software team needs to look into this so we can understand more about it. robot-svcs Falls under the purview of the Robot Services squad (formerly CPX, Core Platform Experience). robot server Affects the `robot-server` project labels Jul 23, 2021
@SyntaxColoring
Copy link
Contributor Author

I believe this is moot with the merge of #8913, which removed this RPC endpoint entirely.

I don't know about the "Cannot call 'send' once a close message has been sent" error, but the "Connection cannot be closed in state ConnectionState.LOCAL_CLOSING" error looks to have been caused by this:

async def on_shutdown(self):
"""
Graceful shutdown handler
See https://docs.aiohttp.org/en/stable/web.html#graceful-shutdown
"""
for client_write_tasks in self.clients.copy():
await client_write_tasks.socket.close(code=WS_1001_GOING_AWAY)
self.shutdown()

Closing each open WebSocket on shutdown like this was apparently correct when robot-server was on aiohttp, but incorrect after we ported it to FastAPI. This is because FastAPI shutdown handlers run after all connections are already closed (I think), so closing them again here was redundant and an error.

mcous added a commit that referenced this issue Nov 29, 2021
…8913)

Closes #8155

BREAKING CHANGE: The RPC WebSocket endpoint has been removed. The /protocols endpoints have been
heavily reworked.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug robot server Affects the `robot-server` project robot-svcs Falls under the purview of the Robot Services squad (formerly CPX, Core Platform Experience). software-investigate Our Software team needs to look into this so we can understand more about it.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant