-
Notifications
You must be signed in to change notification settings - Fork 175
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor(engine): do not run redundant cleanup on asyncio.CancelledError #8993
Conversation
Codecov Report
@@ Coverage Diff @@
## edge #8993 +/- ##
==========================================
- Coverage 74.72% 74.72% -0.01%
==========================================
Files 1860 1860
Lines 49407 49421 +14
Branches 4857 4857
==========================================
+ Hits 36918 36928 +10
- Misses 11644 11648 +4
Partials 845 845
Flags with carried forward coverage won't be shown. Click here to find out more.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, I think this is a big improvement in clarity.
@@ -30,6 +32,8 @@ class CommandState: | |||
"""State of all protocol engine command resources.""" | |||
|
|||
is_running: bool | |||
is_gracefully_finished: bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be clearer if is_gracefully_finished
were instead named something like is_done_adding_commands
, or is_expecting_more_commands
?
If I'm reading this correctly, that's the minimal, unique piece of additional state that this boolean adds atop is_running
, is_hardware_stopped
, and stop_requested
. The term "finished" has a lot of overlap with them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure is_expecting_more_commands
is quite right, because in the case of a hard stop / cancel, the engine certainly will not be expecting more commands. I think the graceful
is important; it's what tells the engine that we'd like to report a success or failure result rather than "stopped"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, yeah. command_stream_closed_gracefully
maybe?
I guess I'm trying to pull the name of this boolean towards being about just the input stream of commands, and trying to avoid making it sound like it describes the higher-level state of the whole engine, which is a more complicated notion that needs to be synthesized by get_status()
.
except asyncio.CancelledError: | ||
log.debug("Run task was cancelled") | ||
raise |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would put this check in join()
instead of _run()
, to make things easier to match up. We call .cancel()
on self._run_task
, so it's symmetrical to catch CancelledError
from await self._run_task
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need this check here to exclude asyncio.CancelledError
from the run task error capture. We need to capture run task errors here to feed them into the cleanup func
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah drat, yeah. :(
@@ -263,16 +288,17 @@ def get_status(self) -> EngineStatus: | |||
all_errors = self._state.errors_by_id.values() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Commenting on validate_action_allowed()
, which GitHub won't let me start a thread on):
Should we forbid adding new commands when self._state.is_gracefully_finished
or self._state.is_stop_requested
is True
?
Or would that mess something up about how live LPC commands and protocol commands coexist within the same run?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add a TODO
|
||
if self._cleanup_func is not None: | ||
await self._cleanup_func(error=error) | ||
elif error: | ||
raise error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a caller, I'd be surprised by cleanup only running conditionally on the run_func
succeeding.
But also, as a caller, I never really got the motivation behind this class in the first place, so I might be missing something. I had been thinking of this class as basically an indirection around a try
/finally
block, where run_func
is the try
body and cleanup_func
is the finally
body. So now I guess it's an indirection around just a try
block, without the finally
? But why would the caller not just write the try
itself?
Feel free to say this isn't a question for this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you've misread this code. This if
re-raises the error only if self._cleanup_func
is not defined. This is never true in actual usage, so the cleanup func always runs regardless of the success or failure of the run func.
I agree that this dependency is awkward and needs to go away, but it's not for this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you've misread this code. This if re-raises the error only if self._cleanup_func is not defined. This is never true in actual usage, so the cleanup func always runs regardless of the success or failure of the run func.
Right, but the whole block has been de-indented outside of the finally
, right? So if the run_func
code raises anything—which we still expect to be possible—then this code won't run?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
async def _run(self) -> None:
error = None
try:
if self._run_func is not None:
await self._run_func()
except asyncio.CancelledError:
log.debug("Run task was cancelled")
raise
except Exception as e: # <-- exception in run_func caught here
log.debug(
"Exception raised during protocol run",
exc_info=error,
)
error = e # <-- exception captured, not reraised
if self._cleanup_func is not None:
await self._cleanup_func(error=error) # <-- cleanup runs
elif error:
raise error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tested on robot and seems to work, thanks @mcous!
Overview
During a halt, an
asyncio.CancelledError
was propagating upwards in an unexpected (to me) way. This PR ensures that the engine cleanup logic does not run in the case of anasyncio.CancelledError
, and simplifies theEngineStatus
enum to ensure better status reporting in real-world, PAPIv2 usage.Closes #8980
Changelog
ProtocolEngine.stop
toProtocolEngine.finish
to better communicate that it should be called to tell the engine it is done workingProtocolRunner
when the protocol is done executingProtocolEngine.halt
toProtocolEngine.stop
to better communicate that it stops the engine from doing whatever it wants to be doingactionType: stop
, tooPAUSE_REQUESTED
status enum value for the time beingPAUSED
status is unreachable given current implementation of the legacy support pluginReview requests
Other than checking out the code and test changes, stress test playing, pausing, running to completion, and stopping protocol
/runs
to see if you can get any unexpected statuses back.Risk assessment
Low.