Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(engine): do not run redundant cleanup on asyncio.CancelledError #8993

Merged
merged 6 commits into from
Dec 7, 2021

Conversation

mcous
Copy link
Contributor

@mcous mcous commented Dec 6, 2021

Overview

During a halt, an asyncio.CancelledError was propagating upwards in an unexpected (to me) way. This PR ensures that the engine cleanup logic does not run in the case of an asyncio.CancelledError, and simplifies the EngineStatus enum to ensure better status reporting in real-world, PAPIv2 usage.

Closes #8980

Changelog

  • do not run redundant cleanup on asyncio.CancelledError
  • rename ProtocolEngine.stop to ProtocolEngine.finish to better communicate that it should be called to tell the engine it is done working
    • This method is called by the ProtocolRunner when the protocol is done executing
  • rename ProtocolEngine.halt to ProtocolEngine.stop to better communicate that it stops the engine from doing whatever it wants to be doing
    • This lines up better with the HTTP API client-facing actionType: stop, too
  • Remove PAUSE_REQUESTED status enum value for the time being
    • We will re-introduce it for PAPIv3 / JSON schema v6
    • For now, though, it just confuses PAPIv2 execution reporting, because the PAUSED status is unreachable given current implementation of the legacy support plugin

Review requests

Other than checking out the code and test changes, stress test playing, pausing, running to completion, and stopping protocol /runs to see if you can get any unexpected statuses back.

Risk assessment

Low.

@mcous mcous added the robot-svcs Falls under the purview of the Robot Services squad (formerly CPX, Core Platform Experience). label Dec 6, 2021
@mcous mcous requested a review from a team as a code owner December 6, 2021 21:21
@codecov
Copy link

codecov bot commented Dec 6, 2021

Codecov Report

Merging #8993 (24629cd) into edge (be485fe) will decrease coverage by 0.00%.
The diff coverage is 98.11%.

Impacted file tree graph

@@            Coverage Diff             @@
##             edge    #8993      +/-   ##
==========================================
- Coverage   74.72%   74.72%   -0.01%     
==========================================
  Files        1860     1860              
  Lines       49407    49421      +14     
  Branches     4857     4857              
==========================================
+ Hits        36918    36928      +10     
- Misses      11644    11648       +4     
  Partials      845      845              
Flag Coverage Δ
api 84.68% <98.11%> (-0.01%) ⬇️
robot-server 91.99% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
.../src/opentrons/protocol_engine/actions/__init__.py 100.00% <ø> (ø)
api/src/opentrons/protocol_engine/types.py 100.00% <ø> (ø)
api/src/opentrons/protocol_runner/task_queue.py 95.65% <85.71%> (+0.30%) ⬆️
...i/src/opentrons/protocol_engine/actions/actions.py 100.00% <100.00%> (ø)
...i/src/opentrons/protocol_engine/protocol_engine.py 100.00% <100.00%> (ø)
...pi/src/opentrons/protocol_engine/state/commands.py 99.21% <100.00%> (+0.05%) ⬆️
...i/src/opentrons/protocol_runner/protocol_runner.py 100.00% <100.00%> (ø)
...hardware_control/emulation/module_server/server.py 75.55% <0.00%> (-4.45%) ⬇️
...i/src/opentrons/hardware_control/thread_manager.py 88.80% <0.00%> (-1.61%) ⬇️

Copy link
Contributor

@SyntaxColoring SyntaxColoring left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I think this is a big improvement in clarity.

@@ -30,6 +32,8 @@ class CommandState:
"""State of all protocol engine command resources."""

is_running: bool
is_gracefully_finished: bool
Copy link
Contributor

@SyntaxColoring SyntaxColoring Dec 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be clearer if is_gracefully_finished were instead named something like is_done_adding_commands, or is_expecting_more_commands?

If I'm reading this correctly, that's the minimal, unique piece of additional state that this boolean adds atop is_running, is_hardware_stopped, and stop_requested. The term "finished" has a lot of overlap with them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure is_expecting_more_commands is quite right, because in the case of a hard stop / cancel, the engine certainly will not be expecting more commands. I think the graceful is important; it's what tells the engine that we'd like to report a success or failure result rather than "stopped"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, yeah. command_stream_closed_gracefully maybe?

I guess I'm trying to pull the name of this boolean towards being about just the input stream of commands, and trying to avoid making it sound like it describes the higher-level state of the whole engine, which is a more complicated notion that needs to be synthesized by get_status().

api/src/opentrons/protocol_engine/state/commands.py Outdated Show resolved Hide resolved
Comment on lines +82 to +84
except asyncio.CancelledError:
log.debug("Run task was cancelled")
raise
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would put this check in join() instead of _run(), to make things easier to match up. We call .cancel() on self._run_task, so it's symmetrical to catch CancelledError from await self._run_task.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need this check here to exclude asyncio.CancelledError from the run task error capture. We need to capture run task errors here to feed them into the cleanup func

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah drat, yeah. :(

api/src/opentrons/protocol_engine/state/commands.py Outdated Show resolved Hide resolved
@@ -263,16 +288,17 @@ def get_status(self) -> EngineStatus:
all_errors = self._state.errors_by_id.values()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Commenting on validate_action_allowed(), which GitHub won't let me start a thread on):

Should we forbid adding new commands when self._state.is_gracefully_finished or self._state.is_stop_requested is True?

Or would that mess something up about how live LPC commands and protocol commands coexist within the same run?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add a TODO

Comment on lines +91 to +95

if self._cleanup_func is not None:
await self._cleanup_func(error=error)
elif error:
raise error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a caller, I'd be surprised by cleanup only running conditionally on the run_func succeeding.

But also, as a caller, I never really got the motivation behind this class in the first place, so I might be missing something. I had been thinking of this class as basically an indirection around a try/finally block, where run_func is the try body and cleanup_func is the finally body. So now I guess it's an indirection around just a try block, without the finally? But why would the caller not just write the try itself?

Feel free to say this isn't a question for this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you've misread this code. This if re-raises the error only if self._cleanup_func is not defined. This is never true in actual usage, so the cleanup func always runs regardless of the success or failure of the run func.

I agree that this dependency is awkward and needs to go away, but it's not for this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you've misread this code. This if re-raises the error only if self._cleanup_func is not defined. This is never true in actual usage, so the cleanup func always runs regardless of the success or failure of the run func.

Right, but the whole block has been de-indented outside of the finally, right? So if the run_func code raises anything—which we still expect to be possible—then this code won't run?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    async def _run(self) -> None:
        error = None

        try:
            if self._run_func is not None:
                await self._run_func()
        except asyncio.CancelledError:
            log.debug("Run task was cancelled")
            raise
        except Exception as e:    # <-- exception in run_func caught here
            log.debug(
                "Exception raised during protocol run",
                exc_info=error,
            )
            error = e             # <-- exception captured, not reraised

        if self._cleanup_func is not None:
            await self._cleanup_func(error=error)  # <-- cleanup runs
        elif error:
            raise error

Copy link
Member

@shlokamin shlokamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tested on robot and seems to work, thanks @mcous!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
robot-svcs Falls under the purview of the Robot Services squad (formerly CPX, Core Platform Experience).
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Sort out interaction between HardwareAPI and new engine-based cancellation logic
3 participants