question: how to safely handle a keyboard interrupt in a protocol #11360

Open
laura-wyzer opened this issue Aug 18, 2022 · 14 comments
Labels
software-investigate Our Software team needs to look into this so we can understand more about it.

Comments

@laura-wyzer

Overview

A program we wrote with the Opentrons API uses a try/except block so that the program can stop, home the robot, and drop tips when the user hits Ctrl+C. This had been working for weeks, then all of a sudden it behaved unexpectedly when we interrupted the program as it was picking up a tip. After homing the robot, the pipette rapidly traversed the deck and jammed into the loaded labware, causing the tip to bend. The error below was produced:

SENDTOOPENTRONS

We understand that interrupting the program as it was picking up a tip could cause an error (even though we don't know how that would happen), but we are unsure why the pipette flew across the deck.

Steps to reproduce

  1. Create a program that uses the API to run a protocol and catches keyboard interruptions that occur while running the protocol.
  2. Cause a keyboard interruption right when the robot is picking up a tip

note: do not do this with the labware on the deck since this broke our pipette tip

Current behavior

  1. Keyboard interruptions that occur while a tip is being picked up are not handled by our try/except block
  2. When the interruption is not handled, the robot traverses the deck and ignores the loaded labware

Expected behavior

Upon any keyboard interruption at any point in the protocol, the robot should follow the defined steps in our try/except block (homing the robot, dropping the tips, ending the program)

Operating system

Windows

System and robot setup or anything else?

  • ran in jupyter notebook terminal
  • API defined by opentrons.protocol_api.MAX_SUPPORTED_VERSION.
@mcous mcous added software-investigate Our Software team needs to look into this so we can understand more about it. and removed bug labels Aug 18, 2022
@mcous
Contributor

mcous commented Aug 18, 2022

Are you able to post your protocol, or some minimal reproduction that shows how and where exactly you are using a try/except, and what's in your recovery block? If your try/except is overly broad, it can prevent proper operation of the protocol execution system. There's also a high chance it interferes with the hardware control layer's understanding of robot state (e.g. is there a tip on the pipette?)

@mcous mcous changed the title bug: question: how to safely handle a keyboard interrupt in a protocol Aug 18, 2022
@laura-wyzer
Author

Hi Mike,
Thank you for getting back to me! We have a nondisclosure policy that prevents us from posting our code. However, if there is a confidential way to send it to a member of Opentrons, I can certainly do that as long as it is not shared with a third party. The robot usually seems to keep track of whether or not there is a tip on the pipette: the recovery block correctly drops the tip if one is attached (following code that we wrote), whereas it just homes the robot and ends as instructed if there are no tips on the pipette. There is nothing in our code that tells the robot to move back across the deck after homing (which is what happened when the tip broke), so I think it is more likely that our program is preventing proper operation of the protocol execution system. Please let me know how best to share our code, and thank you for the help!
Best,
Laura Drepanos

@mcous
Contributor

mcous commented Aug 22, 2022

You may email me directly at mike at opentrons dot com. However, a full protocol may be difficult for me to read or reason with. A small reproduction protocol would be much more helpful, if possible.

It would also be helpful simply to see how you are constructing your try/except block, without any of the contents of the try, e.g.

  • What exception type are you catching in the except
  • How deep in the protocol is the try block, or are you wrapping the entire contents of run
  • How are you executing the protocol (e.g. are you using opentrons_execute)?

@laura-wyzer
Author

laura-wyzer commented Aug 22, 2022

Hi Mike,
I see what you are saying now! Here is the general structure of the program, including the lines that may be important:

import traceback
import opentrons.execute
from opentrons import protocol_api
protocol = opentrons.execute.get_protocol_api(opentrons.protocol_api.MAX_SUPPORTED_VERSION)

def run(protocol: protocol_api.ProtocolContext):
    # (protocol defined within the run function)
    ...

try:
    run(protocol)
    protocol.home()
except KeyboardInterrupt:
    print(" user stopped program!")
    protocol.home()
    for i in protocol.loaded_instruments.values():
        if i.has_tip:
            i.drop_tip()
except Exception as e:
    traceback.print_exc()
    protocol.home()
    for i in protocol.loaded_instruments.values():
        if i.has_tip:
            i.drop_tip()

So to answer your questions:

  • the block catches a keyboard interruption and any other exceptions as two separate blocks (since they should have different print statements)
  • I am wrapping the entire contents of run in the try/except block
  • the code I included should show how this is executed (the program is then run from the command line in the Jupyter terminal)

Best,
Laura

@mcous
Contributor

mcous commented Aug 25, 2022

Thanks for the snippet, that is helpful! I definitely think this is an unsafe construct. The fact that it worked for a while just means you got lucky with the timing of your Ctrl+C presses.

It's very important that the hardware is told to halt before continuing with any cleanup activities, like drop tip. In fact, this is exactly how protocols run via the app/HTTP API behave when you issue a cancel:

  1. Halt the hardware to stop all movement / prevent future movement
  2. Reset the hardware
  3. Proceed with any homing and drop tips

Without the halt, this exception handler as written may start interleaving requests to the hardware layer (which is running asynchronously in another thread), causing unexpected movements like the ones you observed.
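To make the ordering concrete, here is a minimal toy model of the halt, settle, then clean up sequence. It does not use the Opentrons API at all: `ToyHardware`, `halt`, `wait_idle`, and `cancel_with_halt` are hypothetical names invented for this sketch, and the worker thread only stands in for a hardware layer that runs asynchronously.

```python
import queue
import threading
import time


class ToyHardware:
    """Hypothetical stand-in for a hardware layer running on its own thread."""

    def __init__(self):
        self._commands = queue.Queue()
        self.log = []  # names of commands the worker actually started
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            name = self._commands.get()
            self.log.append(name)
            time.sleep(0.01)  # pretend the movement takes time
            self._commands.task_done()

    def send(self, name):
        self._commands.put(name)

    def wait_idle(self):
        """Block until every queued command has finished."""
        self._commands.join()

    def halt(self):
        """Discard pending commands, then wait for the in-flight one to end."""
        while True:
            try:
                self._commands.get_nowait()
                self._commands.task_done()
            except queue.Empty:
                break
        self.wait_idle()


def cancel_with_halt(hardware):
    hardware.halt()            # 1. nothing queued and nothing in flight any more
    hardware.send("home")      # 2. only now queue the cleanup moves
    hardware.send("drop_tip")
    hardware.wait_idle()


hw = ToyHardware()
for step in ["pick_up_tip", "move_to_well", "aspirate", "dispense"]:
    hw.send(step)
cancel_with_halt(hw)
print(hw.log)
```

In this model the cleanup commands always run last, and the protocol steps that were still queued never start. Skipping the `halt()` call would leave those steps in the queue ahead of `home` and `drop_tip`, which is one simple way the interleaving described above can arise.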

I'm going to need to look into how best to do this in Jupyter / on the command line, but it will likely be something along the lines of "move recovery to its own script / process so that everything can settle after a KeyboardInterrupt happens".
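One possible shape for "recovery in its own process" is sketched below using only the standard library. This is an assumption about how such a wrapper could look, not a tested Opentrons recipe: `run_with_separate_recovery` and `settle_timeout` are hypothetical names, and the two commands are placeholders for whatever scripts run the protocol and perform the homing and tip drop.

```python
import subprocess
import sys


def run_with_separate_recovery(protocol_cmd, recovery_cmd, settle_timeout=10.0):
    """Run protocol_cmd; if interrupted, stop it fully, then run recovery_cmd."""
    child = subprocess.Popen(protocol_cmd)
    try:
        return child.wait()
    except KeyboardInterrupt:
        # Ctrl+C in a terminal usually reaches the child too; terminate()
        # covers the case where it did not.
        child.terminate()
        try:
            child.wait(timeout=settle_timeout)
        except subprocess.TimeoutExpired:
            child.kill()
            child.wait()
        # The protocol process is gone, so recovery starts in a fresh process
        # with no half-finished commands left over from the old one.
        return subprocess.run(recovery_cmd).returncode


# Placeholder commands; in practice these would launch the protocol script
# and a separate recovery script.
exit_code = run_with_separate_recovery(
    [sys.executable, "-c", "print('protocol finished')"],
    [sys.executable, "-c", "print('recovering')"],
)
```

The point of the design is that the recovery code never shares a Python process with the interrupted protocol, so nothing from the interrupted run is still executing when the cleanup commands are issued.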

@laura-wyzer
Author

Hi Mike,
I see what you are saying, thanks for the help. I'm not sure how I would halt the hardware using python code since there isn't anything about this on the website, so that would be very helpful if you are able to find any options! Or if there are any alternative implementations for replicating the app's feature of homing the robot and dropping tips when the program is cancelled, I can look into those.
Thank you,
Laura

@mcous
Contributor

mcous commented Aug 26, 2022

@laura-wyzer it's going to take me until Monday to really start testing this out, but in the meantime, do you know what happens to your protocol if you remove the try/except block and simply press Ctrl+C? Does the protocol halt, or does it continue executing?

@laura-wyzer
Author

Hi Mike, the protocol does halt if you remove the try/except block and press ctrl+c.

@caroline-wyzer

Hi Mike,
I'm a colleague of Laura's. Any updates on how this might work? We would definitely like to implement something other than just the ctrl+c, since we'll be running most of our programs through ssh anyways. I'm looking through the code on here now and trying to figure out what to replicate.
Thanks!

@croots

croots commented Sep 13, 2022

Hi,

I don't know if it's useful, but I wanted to pitch in and say that we are seeing a similar issue on our end that matches the behavior @mcous is describing. It seems that whenever a set of instructions is halted mid-action (pick up tip, drop tip) and another action is immediately queued, the robot will finish whatever it was doing after the newly queued action.

This behavior persists across separate jupyter notebook blocks. If I interrupt the interpreter and immediately queue the 'recover and reset' block, the 'recover and reset' block will run and then the last behavior (ex 'pick up tip') that was interrupted will complete (ex 'home the pipette above where the tips were picked up').

All of this points to whatever handles robot actions on the back end being vulnerable to race conditions.

@mcous
Contributor

mcous commented Oct 31, 2022

The existing protocol execution system for Python protocols is pretty fundamentally susceptible to race conditions if you're trying to reach in and cancel a protocol run from the same place you are triggering protocol commands. In other words, SSH and Jupyter.

We've been on a multi-year journey of rearchitecting this system, and we're making progress! JSON protocols have been moved to the new system, but Python protocols are still a ways out.

The Opentrons App, however, does not suffer these problems, because it communicates with the robot over an HTTP API. It uses this API to upload a protocol file and kick off a run. Starting, pausing, and stopping the run can all be accomplished with subsequent HTTP requests, and since they come in externally, they can safely and gracefully shut down the run.

If you are able to use HTTP instead of Jupyter or SSH, I think you might have a better experience. Is this something that your workflow would be amenable to? For example, you could write a Python script that runs from your own computer to upload the protocol, run it, and wait for it to complete. In that script, you could wire a KeyboardInterrupt to send an HTTP stop request.

I can add more details to this thread if you're interested!
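As a rough sketch of the "wire a KeyboardInterrupt to an HTTP stop request" idea, the script below starts an already-created run, polls it, and sends a stop action if Ctrl+C arrives. Everything specific here is an assumption to check against the robot's HTTP API reference: port 31950, the `Opentrons-Version` header, the `/runs/{id}/actions` endpoint with `actionType` values `play` and `stop`, and the status strings. The helper names are hypothetical, and uploading the protocol and creating the run are left out.

```python
import json
import time
import urllib.request

# Assumed header; the "*" value is meant to request the latest API version.
HEADERS = {"Opentrons-Version": "*", "Content-Type": "application/json"}
FINISHED = {"succeeded", "failed", "stopped"}  # assumed terminal statuses


def action_request(base_url, run_id, action_type):
    """Build (but do not send) a request that issues a run action."""
    body = json.dumps({"data": {"actionType": action_type}}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/runs/{run_id}/actions", data=body, headers=HEADERS, method="POST"
    )


def status_request(base_url, run_id):
    """Build a request that fetches the current state of the run."""
    return urllib.request.Request(
        f"{base_url}/runs/{run_id}", headers=HEADERS, method="GET"
    )


def send(request):
    """Send a request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def play_until_done(base_url, run_id, send=send, poll_seconds=1.0):
    """Start the run and poll it; on Ctrl+C, ask the robot to stop the run."""
    try:
        send(action_request(base_url, run_id, "play"))
        while True:
            status = send(status_request(base_url, run_id))["data"]["status"]
            if status in FINISHED:
                return status
            time.sleep(poll_seconds)
    except KeyboardInterrupt:
        # The stop arrives from outside the protocol's own process, so the
        # robot can shut the run down itself.
        send(action_request(base_url, run_id, "stop"))
        return "stop-requested"
```

A call might look like `play_until_done("http://<robot-ip>:31950", run_id)`, where `run_id` comes from whatever script created the run. Pausing would presumably use the same `action_request` helper with a `pause` action, and resuming with another `play`.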

@hibazou

hibazou commented Apr 24, 2024

Hi! Could you please add more details to this thread?
I already have a Python script that uploads my protocol and runs it. Can you give me guidelines for wiring a KeyboardInterrupt to send an HTTP stop/pause/continue request?

@caroline-wyzer

Hi! The way I ended up solving this for my purposes was not using keyboard interrupt; it was simpler for us to simply kill the process on the robot and then home the robot. Here's the code we run:

    try:

        # change working directory to jupyter notebooks
        os.chdir("/var/lib/jupyter/notebooks")

        # find all currently running processes and make them into a nice list
        current_processes = subprocess.check_output(["ps", "aux"])
        current_processes = current_processes.decode("UTF-8")
        current_processes = current_processes.split()

        # pull a list of all files on jupyter notebook
        possible_files = []
        for root, dirs, files in os.walk("."):
            for filename in files:
                possible_files.append(filename)
        python_files = []

        # keep only files with a .py extension, so that we only kill Python scripts
        # (and never the cancel script itself)
        for file in possible_files:
            if file.endswith(".py") and file != "cancel.py":
                python_files.append(file)

        # match currently running python files to pids
        for i, item in enumerate(current_processes):
            if item in python_files:
                PID_to_kill = current_processes[i - 3]

        # actually kills
        try:
            os.system(f"kill -9 {PID_to_kill}")
        except NameError:
            pass

        # connects to robot for homing
        protocol = opentrons.execute.get_protocol_api(opentrons.protocol_api.MAX_SUPPORTED_VERSION)

        protocol.set_rail_lights(False)

        try:
            for pip in protocol.loaded_instruments.values():
                pip.drop_tip()
        except Exception as e:
            log_output(message=traceback.format_exc())

        protocol.home()
        os.system("~.")

    except Exception as e:
        log_output(message=traceback.format_exc())

        # connects to robot
        protocol = opentrons.execute.get_protocol_api(opentrons.protocol_api.MAX_SUPPORTED_VERSION)

        protocol.home()

        protocol.set_rail_lights(False)

        os.system("~.")
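The PID lookup in the script above depends on a fixed token offset in the flattened `ps` output and keeps only the last match. A possible alternative, sketched here under the assumption that the `ps` output starts with a header row containing a `PID` column and ends with the command column, is to parse it line by line. `find_pids` and the sample text (including `my_protocol.py`) are hypothetical and only illustrate the parsing.

```python
def find_pids(ps_output, script_names):
    """Return PIDs of processes whose command line mentions one of script_names."""
    lines = ps_output.splitlines()
    if not lines:
        return []
    header = lines[0].split()
    pid_column = header.index("PID")  # raises ValueError if there is no PID column
    pids = []
    for line in lines[1:]:
        fields = line.split()
        if len(fields) < len(header):
            continue  # malformed or truncated row
        # Everything from the last header column onward is the command line.
        command_words = fields[len(header) - 1:]
        if any(word.split("/")[-1] in script_names for word in command_words):
            pids.append(int(fields[pid_column]))
    return pids


SAMPLE = """\
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root       412  0.0  0.1   2200   900 ?        Ss   10:01   0:00 /sbin/init
root      1873  4.2  6.0  90000 60000 pts/0    Sl+  10:15   0:07 python3 my_protocol.py
root      1901  0.3  1.0  20000  9000 pts/1    S+   10:16   0:00 python3 cancel.py
"""

print(find_pids(SAMPLE, {"my_protocol.py"}))  # [1873]
```

Because the function returns every matching PID, the caller can decide whether to kill all of them or refuse to act when more than one script is running.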

When we want to cancel something, we send the above code to the robot and execute it by running this code on the computer we have connected to the OT2:

    def cancel(self) -> None:
        """
        Cancel the ongoing protocol execution.

        Returns:
            None
        """
        # Construct the SSH command
        script = "cancel.py"
        file = f"/var/lib/jupyter/notebooks/execute_program.sh {script}"
        command = f"sh -l -c '{file}'"

        # Transfer python file
        py_file = fr"{self._tab.path}\Scripts\Back_End\OT2_Control\OT2_Programs\{script}"
        py_transfer = fr"scp -i {self._tab.run.key} {py_file} root@{self._tab.run.ip}:/var/lib/jupyter/notebooks"
        os.system(py_transfer)

        cancel_run = Thread(target=self._call_cancel,
                            args=(command,))
        cancel_run.start()

    def _call_cancel(self, command: str) -> None:
        """
        Execute a cancel command via SSH to interrupt protocol execution.

        Parameters:
            command (str): The cancel command.

        Returns:
            None
        """
        self._conn = subprocess.Popen(
            [
                "ssh",
                "-t",
                "-i",
                self._tab.run.key,
                f"root@{str(self._tab.run.ip)}",
                command
            ],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            stdin=subprocess.PIPE,
            bufsize=1,
            universal_newlines=True,
            shell=False
        )

This slows down our cancel by about 10 seconds. If you store the cancel script on the robot itself and execute the cancel directly through the command line, it's pretty much instantaneous.
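For the faster variant, where the cancel script already lives on the robot, the `scp` step can be dropped and only the SSH command remains. The sketch below just builds that command. `build_remote_cancel_command` is a hypothetical helper, the key path and IP address are placeholders, and `execute_program.sh` is the wrapper script from the code above, assumed to be present on the robot already.

```python
import shlex


def build_remote_cancel_command(key_path, robot_ip, script="cancel.py",
                                notebooks_dir="/var/lib/jupyter/notebooks"):
    """Build an ssh argv that runs a cancel script already stored on the robot."""
    remote = f"{notebooks_dir}/execute_program.sh {shlex.quote(script)}"
    return [
        "ssh",
        "-i", key_path,
        f"root@{robot_ip}",
        f"sh -l -c {shlex.quote(remote)}",
    ]


# Placeholder key path and address, for illustration only.
command = build_remote_cancel_command("ot2_ssh_key", "192.0.2.10")
print(command)
```

The resulting list can be passed to `subprocess.run(command, timeout=...)` or to `subprocess.Popen` as in `_call_cancel` above. Passing a list with `shell=False` means the local shell never gets to reinterpret the arguments.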
Hopefully this helps!

@hibazou

hibazou commented May 2, 2024

> Hi! The way I ended up solving this for my purposes was not using keyboard interrupt; it was simpler for us to simply kill the process on the robot and then home the robot. Here's the code we run:

Thank you !

5 participants