fix(api): Various E-stop fixes #14929

SyntaxColoring · 2024-04-16T21:24:51Z

Overview

Fixes various issues with ProtocolEngine's handling of E-stops—some that go back to when the code was initially added, and some that I accidentally introduced while working on error recovery.

Closes EXEC-382.

Test Plan

All of the following should behave reasonably.

Given that we're not sure how much of this was working properly in v7.2, I think we can be flexible with our interpretation of "behaves reasonably." For example, it's now theoretically possible that the UI briefly shows the run as "stop requested," where it did not before. I think that's fine as long as nothing functionally bad happens, like the run hanging.

Changelog

* ProtocolEngine.estop() was doing low-level state manipulation for individual commands. I accidentally broke this in refactor(api): Split UpdateCommandAction #14726 when I made the command lifecycle stricter; I made it start raising AssertionErrors.

  To fix this, .estop() is rewritten to do nothing more complicated than what .stop() does. This leaves the command state manipulation to CommandStore, where it canonically belongs and where it can be unit-tested easily. This is the change that fixes EXEC-382.

* ProtocolEngine.estop() was eagerly dispatching a HardwareStoppedAction. This is normally reserved for after the protocol has exited and the hardware has settled. But it was issuing it right away, basically lying to higher layers about the engine being "finished."

  This is fixed by removing all of that from .estop(), and relying on the existing teardown code in .finish() instead. This makes E-stop handling work basically the same way as other fatal run errors.

* We were naively calling ProtocolEngine.estop() from a hardware event callback. This did not look thread-safe to me: the hardware event callback presumably runs in the hardware API thread, and the ProtocolEngine lives in the main server thread. See ProtocolEngine's concurrency requirements.

  This is fixed by wrapping the call in loop.call_soon_threadsafe().
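The thread-safety fix in the last item can be illustrated with a small standalone sketch. It is not the code from this PR: `FakeEngine`, `make_hardware_callback`, and the `hardware-api` thread name are hypothetical stand-ins, and the only real API relied on is asyncio's `loop.call_soon_threadsafe()`. The point is that the callback, which runs on another thread, never touches the engine directly; it only schedules the call onto the event loop's thread.

```python
import asyncio
import threading
from typing import Callable, Optional


class FakeEngine:
    """Hypothetical stand-in for ProtocolEngine: only safe to touch from the loop's thread."""

    def __init__(self) -> None:
        self.estop_thread_id: Optional[int] = None
        self.estopped = asyncio.Event()

    def estop(self) -> None:
        # Record which thread actually executed the call.
        self.estop_thread_id = threading.get_ident()
        self.estopped.set()


def make_hardware_callback(
    loop: asyncio.AbstractEventLoop, engine: FakeEngine
) -> Callable[[], None]:
    def on_estop_event() -> None:
        # Runs on the "hardware" thread. Do not call engine.estop() here;
        # schedule it so that it runs on the event loop's own thread.
        loop.call_soon_threadsafe(engine.estop)

    return on_estop_event


async def main() -> bool:
    engine = FakeEngine()
    callback = make_hardware_callback(asyncio.get_running_loop(), engine)

    # Simulate the hardware API thread firing the E-stop event.
    hardware_thread = threading.Thread(target=callback, name="hardware-api")
    hardware_thread.start()

    await asyncio.wait_for(engine.estopped.wait(), timeout=5)
    hardware_thread.join()

    # True if estop() ran on the same thread as the event loop.
    return engine.estop_thread_id == threading.get_ident()


ran_on_loop_thread = asyncio.run(main())
```

Calling `engine.estop()` directly inside `on_estop_event()` would instead run it on the `hardware-api` thread, which is the situation the changelog item describes as unsafe.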

Review requests

None in particular.

Risk assessment

Medium. This can only be tested manually, and it's dealing with concurrency, and touching some confusing parts of our code.

sfoster1

This looks great to me, as discussed

DerekMaggio · 2024-04-18T14:50:50Z

Could any of this affect other flows that are part of the ProtocolEngine? Namely, LPC using PE, and the PE calibration command.
We should do a sanity check and push the e-stop during these flows to ensure it functions as expected.

SyntaxColoring · 2024-04-18T14:57:22Z

> Could any of this affect other flows that are part of the ProtocolEngine? Namely, LPC using PE, and the PE calibration command. We should do a sanity check and push the e-stop during these flows to ensure it functions as expected.

It definitely can affect those, and we definitely should do that testing. I will test at least the things mentioned in the test plan. Feel free to add any more that you think of.

TamarZanzouri

Great work! TY!

SyntaxColoring · 2024-04-18T21:02:54Z

~~I'm looking into issues where, after an E-stop, the run acts as if it was cancelled in the normal way, and tries to home.~~ Fixed, see the messages in the commits below.

This fixes problems I introduced in this PR where certain E-stops would still try to drop tips and home at the end of the run.

I thought that inspecting the `error` argument passed in to `finish()` would be enough to determine whether the run is finishing because of an E-stop. It's not. When the physical E-stop button is pressed, our top-down interruption from `ProtocolEngine.estop()` races against any bottom-up interruption from the electronics. The top-down interruption can either:

* Raise a `RunStoppedError` when the protocol tries to run a new command
* Raise a `RunStoppedError` when the current command is asyncio `.cancel()`'d

Neither of these currently carries any sign that it came from an E-stop. Until they do, we need a separate `from_estop` flag in `CommandState`.
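A rough sketch of why the extra flag is needed, under stated assumptions: the classes below are simplified stand-ins that only borrow the names `CommandState`, `RunStoppedError`, and `EStopActivatedError` from the engine, and `request_stop()` and `should_drop_tips_and_home()` are hypothetical helpers, not the engine's API. The idea is that the error reaching `finish()` may be a plain `RunStoppedError`, so the decision also has to consult state recorded when the stop was requested.

```python
from dataclasses import dataclass
from typing import Optional


class EStopActivatedError(Exception):
    """Stand-in: the bottom-up error coming from the electronics."""


class RunStoppedError(Exception):
    """Stand-in: the top-down error, which carries no E-stop information."""


@dataclass
class CommandState:
    # Simplified stand-in for the real state class.
    stop_requested: bool = False
    from_estop: bool = False


def request_stop(state: CommandState, from_estop: bool) -> None:
    # Hypothetical helper: remember *why* the stop was requested.
    state.stop_requested = True
    state.from_estop = from_estop


def should_drop_tips_and_home(
    state: CommandState, error: Optional[Exception]
) -> bool:
    # Looking at the error alone is not enough: if the top-down
    # interruption wins the race, the error is a RunStoppedError.
    estop_seen = state.from_estop or isinstance(error, EStopActivatedError)
    return not estop_seen


# Top-down interruption wins the race: only the flag reveals the E-stop.
estop_state = CommandState()
request_stop(estop_state, from_estop=True)
estop_cleanup = should_drop_tips_and_home(estop_state, RunStoppedError())

# An ordinary cancel still gets the normal cleanup.
cancel_state = CommandState()
request_stop(cancel_state, from_estop=False)
cancel_cleanup = should_drop_tips_and_home(cancel_state, RunStoppedError())
```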

We permanently set the run result to `FAILED` or `STOPPED` when a stop is requested, not just when the engine is `.finish()`'d. This seems premature to me, but, taking it for granted, it means we need to make sure that we choose `FAILED` if the stop was requested because of an E-stop. Also, because of that premature setting of the run result, we need to make sure that when the E-stop-induced `FinishAction` comes in, the run result already being set doesn't stop it from setting the run error.
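The two requirements in this message can be sketched as a pair of small reducers. This is an illustration only: `RunResult`, `RunState`, `handle_stop()`, and `handle_finish()` are hypothetical names rather than the engine's actual actions and stores, and how errors from an ordinary cancel are filtered is deliberately left out.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RunResult(Enum):
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    STOPPED = "stopped"


@dataclass
class RunState:
    run_result: Optional[RunResult] = None
    run_error: Optional[Exception] = None


def handle_stop(state: RunState, from_estop: bool) -> None:
    # The result is fixed as soon as the stop is requested,
    # so an E-stop has to choose FAILED at this point.
    if state.run_result is None:
        state.run_result = RunResult.FAILED if from_estop else RunResult.STOPPED


def handle_finish(state: RunState, error: Optional[Exception]) -> None:
    if state.run_result is None:
        state.run_result = (
            RunResult.FAILED if error is not None else RunResult.SUCCEEDED
        )
    # Deliberately outside the check above: a result that was already
    # set must not prevent the error from being recorded.
    if error is not None:
        state.run_error = error


estop_run = RunState()
handle_stop(estop_run, from_estop=True)
handle_finish(estop_run, RuntimeError("E-stop pressed"))

cancelled_run = RunState()
handle_stop(cancelled_run, from_estop=False)
handle_finish(cancelled_run, None)
```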

SyntaxColoring added 4 commits April 16, 2024 17:09

ProtocolEngine.finish() docstring.

5cd0b75

Remove dead code.

4c67b02

Gut estop().

64cadc5

Do not finish() from estop(), but call finish() when we call estop().

f9d3ce9

SyntaxColoring force-pushed the estop_fixes branch from 08d0920 to f9d3ce9, April 16, 2024 21:26

SyntaxColoring added 7 commits April 16, 2024 17:33

Elaborate comment.

2652d54

Clean up EStopActivatedError.

25d579b

Remove from_estop from StopAction.

254a9aa

The E-stop is a run-level fatal error, so we want to treat it like other run-level fatal errors: pass it in to ProtocolEngine.finish(), not ProtocolEngine.request_stop().

Update estop() docstring.

99f1603

Update engine store tests.

96ff284

Update unit test for EStopActivatedError cleanup.

0031b68

More unit test updates.

b348e84

SyntaxColoring marked this pull request as ready for review April 18, 2024 13:52

SyntaxColoring requested a review from a team as a code owner April 18, 2024 13:52

SyntaxColoring requested review from a team April 18, 2024 13:52

sfoster1 approved these changes Apr 18, 2024

TamarZanzouri approved these changes Apr 18, 2024

SyntaxColoring added 2 commits April 18, 2024 14:28

Merge branch 'edge' into estop_fixes

ad08946

Delete outdated comment.

caf10ae

SyntaxColoring added 5 commits April 19, 2024 10:02

Wrap EStopActivatedError from validate_action_allowed().

320dfe8

This is intended to feed into ProtocolEngine.finish()'s magic error code searching.

Revert "Wrap EStopActivatedError from validate_action_allowed()."

2c3dd03

This reverts commit 320dfe8. This solution was insufficient because it didn't handle the case where a command started and was then asyncio .cancel()'d by QueueWorker.cancel().
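The gap described here can be reproduced with plain asyncio. In this sketch, `run_command()` and its pre-start check are hypothetical and only loosely analogous to the reverted approach; the real `validate_action_allowed()` and `QueueWorker.cancel()` are not reproduced. A check that runs before a command starts never sees a command that is already awaiting, so cancelling that command surfaces a `CancelledError` with no E-stop information attached.

```python
import asyncio
from typing import Dict, List


class EStopActivatedError(Exception):
    """Stand-in for the engine's E-stop error."""


async def run_command(flags: Dict[str, bool], log: List[str]) -> None:
    # Hypothetical pre-start check: it only runs before the command begins.
    if flags["estopped"]:
        raise EStopActivatedError()
    try:
        await asyncio.sleep(10)  # the command is now "in progress"
    except asyncio.CancelledError:
        log.append("cancelled mid-command")
        raise


async def main() -> List[str]:
    log: List[str] = []
    flags = {"estopped": False}

    task = asyncio.create_task(run_command(flags, log))
    await asyncio.sleep(0)  # let the command get past its pre-start check

    # The E-stop arrives while the command is already running.
    flags["estopped"] = True
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        log.append("no EStopActivatedError seen")
    return log


log = asyncio.run(main())
```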

Add unit test for run state after E-stop.

e7c936e

SyntaxColoring merged commit 4418128 into edge Apr 19, 2024
44 checks passed

SyntaxColoring deleted the estop_fixes branch April 19, 2024 19:41

SyntaxColoring mentioned this pull request May 17, 2024

fix(api): restore empty error blocks on cancelled runs #15215

Merged

2 tasks

Carlos-fernandez pushed a commit that referenced this pull request May 20, 2024

fix(api): Various E-stop fixes (#14929)

3aedbde
