Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(robot-server, app): add a new endpoint for fast-fetching all run commands #15031

Merged
merged 14 commits into from
May 1, 2024

Conversation

sanni-t
Copy link
Member

@sanni-t sanni-t commented Apr 26, 2024

Overview

Robot server changes:

Adds a new endpoint- GET /runs/:run_id/commandsAsPreSerializedList to the run commands router. This endpoint returns a list of pre-serialized commands (that are valid json objects) of a finished run.

This endpoint is a much faster alternative to the GET /runs/:run_id/commands endpoint when fetching all commands of a completed run.

Also adds notification publishing when pre-serialized commands become available for a run.

App changes

closes RQA-2645 and RQA-2647

Test Plan

  • Run a large protocol (with ~10k steps) on a Flex and an OT-2
  • While the run is active, verify that hitting the new endpoint returns a 503 service unavailable response
  • Once the run is finished (whether after running successfully/ after failing/ after being canceled), verify that the new endpoint returns a list of stringified commands with total length of list equal to number of commands in the protocol.
  • Un-current the run and again verify the above response
  • Both the above two requests should take only a couple of seconds to get a response (as opposed to 30+ seconds)
  • Test that a notification gets sent when a run ends so that pre-serialized commands can be fetched

Changelog

  • added the new endpoint to runs/commands_router

  • added run commands fetching to run_store

  • added unit tests and integration tests

  • added a runs publish topic for pre-serialized commands

  • added notification publishing for pre-serialized commands availability after a run has been committed to the database. This happens in two places:

    • in RunController, after a run has finished running (either successfully ended, failed with error or stopped by the client)
    • in RunDataManager after a run has been un-currented.
  • added an api client function for getting from new commandsAsPreSerialized endpoint, along with react query and notifications wrapper hooks

  • implement the above in RunPreview component for terminal runs

Review requests

  • check the SQL request for correctness/ efficiency
  • test on a robot
  • test on a robot for endpoint behavior during transition states- when run data is being written to or modified/erased from the database
  • I have added the current command publisher to the 'unsubscribe' list once a run is cleaned up. Does this have any unexpected effect?

Risk assessment

New endpoint so impact on existing code is close to none.

@@ -428,7 +428,7 @@ def get_commands_slice(
)
.order_by(run_command_table.c.index_in_run)
)

# select_all_cmds_as_doc = sqlalchemy.select(sqlalchemy.func.json_array_elements)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops. Left over by mistake. Removing

Comment on lines 67 to 69
const parsedCommandsFromQuery = commandsFromQuery.map(command =>
JSON.parse(command)
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would actually have the react query do this for us, since anyone using this hook will probably want the data to be deserialized

@@ -67,7 +77,7 @@ export const RunPreviewComponent = (
] = React.useState<boolean>(true)
if (robotSideAnalysis == null) return null
const commands =
(isRunTerminal ? commandsFromQuery : robotSideAnalysis.commands) ?? []
(isRunTerminal ? parsedCommandsFromQuery : robotSideAnalysis.commands) ?? []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

talked over slack but falling back to analysis commands if the run commands are empty is probably better

enabled: isRunTerminal,
}).data?.data
const commandsFromQuery =
useAllCommandsAsPreSerializedList(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i just talked to @mjhuff and he thinks it would be easy to have the pre serialized endpoint be hooked up MQTT. we know when we're putting stuff in the DB, so at that point the server can push an update to the client @sanni-t

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good

Copy link
Member

@shlokamin shlokamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

frontend code looks good! i tested on a bot and this seems to work great — nice job y'all!

Comment on lines +63 to +64
staleTime: Infinity,
cacheTime: Infinity,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i guess we don't need staleTime and cacheTime anymore now that we're using the notification service right?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to leave those there in the event that we can't establish MQTT connection and need to fallback to HTTP with those options passed through?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yes you are totally right!

@sanni-t sanni-t changed the title feat(robot-server): add a new endpoint for fast-fetching all run commands feat(robot-server, app): add a new endpoint for fast-fetching all run commands Apr 30, 2024
@sanni-t sanni-t marked this pull request as ready for review April 30, 2024 22:27
@sanni-t sanni-t requested review from a team as code owners April 30, 2024 22:27
@sanni-t sanni-t requested review from mjhuff and jbleon95 and removed request for a team April 30, 2024 22:27
Copy link
Contributor

@mjhuff mjhuff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job on wiring up the notifications!

@mjhuff
Copy link
Contributor

mjhuff commented May 1, 2024

@ncdiehl11 This is my bad for not telling you earlier (and it's also not a blocker) - on L43 of eslintrc.js, we actually have a lint rule to enforce using notification wrappers when they exist.

We should add the HTTP polling equivalent to the list to prevent devs from forgetting and using it by mistake.

@@ -351,6 +365,53 @@ async def get_run_commands(
)


@PydanticResponse.wrap_route(
commands_router.get,
path="/runs/{runId}/commandsAsPreSerializedList",
Copy link
Contributor

@SyntaxColoring SyntaxColoring May 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to do this that doesn't require:

  • A separate endpoint path?
  • New 503 semantics for runs that haven't completed?

i.e. could we make this something like GET /runs/{id}/commands?validate=false instead?

I know we're repeating how we did things with GET /protocols/analyses/{id}/asDocument, but I think it was a mistake for that endpoint to work like that, in hindsight.

Unifying them would have integration testing benefits. Imagine just parametrizing every Tavern test with ?validate=true/?validate=false.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to do this that doesn't require:
A separate endpoint path?

I think the separate endpoint path is the right thing to do here because the shape/type of the response is different from the regular ../commands endpoint

New 503 semantics for runs that haven't completed?

Talked about this briefly on the call and posting here for posterity- Not having a 503 for runs can be two things-

  • returning an empty list in the response: this is misleading and can cause confusion as to whether this run just didn't have any commands or whether the commands just haven't been serialized yet. We agreed that this is not a good option
  • serializing commands from memory (since it's a current run) and returning them as response: this is a good option but this too takes a non-ideal amount of processing time to respond with. I will test for exactly how much time and respond here. If it's faster/ takes similar time as fetching pre-serialized commands, then I'll replace the 503 with it. If it takes more than twice the amount it would to fetch pre-serialized commands, then I think it would be more disruptive to the client flow (and user experience) if the client hits the endpoint at the wrong time and ends up having to wait a long time and blocking rest of the flow.

Copy link
Contributor

@SyntaxColoring SyntaxColoring May 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the separate endpoint path is the right thing to do here because the shape/type of the response is different from the regular ../commands endpoint

Definitely, if the response shape is different. If possible, I'm saying the response shape should not be different. (Discussed in the other thread: #15031 (comment))

If it takes more than twice the amount it would to fetch pre-serialized commands, then I think it would be more disruptive to the client flow (and user experience) if the client hits the endpoint at the wrong time and ends up having to wait a long time and blocking rest of the flow.

I'm not totally following this. How does returning 503 avoid the client having to wait a long time and avoid blocking the rest of the flow? Either way, the client can only move on after the server has spent time serializing everything.

Copy link
Contributor

@SyntaxColoring SyntaxColoring left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM given that it's documented as experimental, but see question above. Thank you!

@ncdiehl11
Copy link
Collaborator

@ncdiehl11 This is my bad for not telling you earlier (and it's also not a blocker) - on L43 of eslintrc.js, we actually have a lint rule to enforce using notification wrappers when they exist.

We should add the HTTP polling equivalent to the list to prevent devs from forgetting and using it by mistake.

[spoke in person] holding off on this in this PR as we are targeting release branch instead of edge where those eslint file rules were added @mjhuff

Comment on lines +210 to +213
- !anystr
- !anystr
- !anystr
- !anystr
Copy link
Contributor

@SyntaxColoring SyntaxColoring May 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I'm just noticing this. Are we returning something like this?

{
    "data": [
        "{\"commandType\": \"home\", ...}"  // JSON list of strings that are parseable as JSON
    ]
}

If so, can we concatenate the command strings on the server side so we always emit straight JSON?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, your example is what the response looks like.
Are you suggesting changing to something like this?-

{
    "data": 
        ' "{\"commandType\": \"home\", ...}", "{\"commandType\": \"aspirate\", ...}", "{...}" ' 
}

..so that data is a single string of concatenated individual command strings that are parseable as JSON?

Copy link
Contributor

@SyntaxColoring SyntaxColoring May 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, sorry, I was unclear. I'm suggesting changing to something like this:

{
    "data": [
        {
            "commandType": "home",
            // ...
        },
        {
            "commandType": "loadLabware",
            // ..
        },
        // ...
    ]
}

So that it matches the normal GET /runs/{id}/commands.


[Edit] Sorry, I'm still being unclear. I mean we could do something like this:

response_body = '{"data":[' + ','.join(serialized_commands) + ']}'

If concatenating JSON fragments like that feels too icky, I'm sure there's a way to write a custom JSON encoder to do it. [Edit: Nope, standard library JSON encoders can't do this.]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just tested this and it's 5x more expensive than just returning the serialized commands list. It's still quite faster than doing the full conversion to pydantic response models so we'll keep that as an option. I've added a todo comment to address this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For posterity: we talked about this in Slack, and the "5x more expensive" result was with code roughly like:

commands_list = [json.loads(c) for c in command_jsons_from_sql]

Rather than with ','.join().

Comment on lines 75 to 80
engine_state_summary = self._run_hooks.get_state_summary(self._run_hooks.run_id)
if (
isinstance(engine_state_summary, StateSummary)
and engine_state_summary.completedAt is not None
):
await self.publish_pre_serialized_commands_notification(run_id=run_id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed in a call, but I didn't explain it well, so I'll try to be more coherent here...

You explained that this notifies the client right away if the commands are already available. That totally makes sense, because there won't be any more notifications later.

However, I think we should go further and send this first notification unconditionally, even when the commands are not available yet. Because I think that's how these topics should work in general.

I view it like this: This notification topic is not meant to say when the commands are ready. It's merely meant to say when to hit the HTTP endpoint, and the HTTP endpoint tells you whether the commands are ready.

Under the scheme I'm describing, an initial unconditional notification would basically confirm to the client that the subscription is active. This is needed in the general case because it solves a missed-update problem (see option 1); every topic should do it or something equivalent.

Maybe you're thinking of the conditional that you have right now as an optimization. Like, the server knows that the client won't like the GET response that is has to offer right now, so why bother inducing a GET request?

But I think we want to avoid the notifications system becoming a bunch of bespoke mini-APIs each with its own rules. Instead, we want it to be dumb and generic and consistent so we can stamp it out on all of our endpoints without thinking about it, share infrastructural code, integration-test it in generic form, and so on. If that comes at a cost of occasionally prompting the client to send out a superfluous GET, I think it's worth it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, though this doesn't have to change for this PR.

Copy link
Member Author

@sanni-t sanni-t May 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I think those are valid points. I can make the notification unconditional upon subscription.
But I think it might need some app side changes too, unless handling a 503 response is already implemented. Thoughts @ncdiehl11 @shlokamin ?

Copy link
Contributor

@SyntaxColoring SyntaxColoring May 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely feel free to leave it out if it needs nontrivial changes or entails testing everything again. I'd be happy with a # todo comment.

Copy link
Contributor

@mjhuff mjhuff May 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To @SyntaxColoring 's point, currently, the client always performs an implicit endpoint hit whenever it subscribes to a topic to solve for the missed-update problem. This behavior is unsurprising from a client perspective, because this is what happens in a non-notification, vanilla HTTP hook (more importantly, see the aside below). In the current iteration of notifications, the server is responsible only for publishing updates for a given topic.

After discussion with @shlokamin the intent of this fast-fetch behavior, it doesn't sound like we need this need this bit. The runs_publisher is always initialized at the start of a run, and the resources this endpoint/topic manage do not change until the end of a run (if I understand correctly?).

The reason initialize() contains a publish for /runs/:runId is that an update to the underlying resource occurs at this point in time. This accounts for the following scenario:

  • A client subscribes to a non-yet existing run.
  • Simultaneously, the client performs an initial GET and the resources errors out/whatever it currently does.
  • The run is created. At this point, the /runs/:runId resource is updated, so we need to publish the update to prevent the client from holding stale data.

TLDR: If the fast-fetch resource can't actually update when a run is created, we don't need this bit. The client will GET it whenever it subscribes, and the endpoint should respond with the current state of the resource. Only publish when the resource updates.

ASIDE: The concept of confirming an active subscription does not play well with the MQTT broker-client model, as the robot-server (a client of the broker) has no ability to know what clients are subscribed to a given topic. The pub-sub model is meant to account for present and future states of a resource, not past states. The responsibility to solve the stale data problem must fall on the client in MQTT's pub-sub model.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shlokamin is there precedent here for checking the response code? Currently, I am null checking the response data object

Copy link
Contributor

@mjhuff mjhuff May 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ncdiehl11 Yes, look at useCurrentMaintenanceRun for an example

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the additional context @mjhuff!
I will remove these lines

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sanni-t and thank you for bearing with me! 😄

@sanni-t sanni-t force-pushed the robot-server-add_new_fast_fetching_run_cmds_endpoint branch from 8fba742 to eb5239f Compare May 1, 2024 16:33
@sanni-t sanni-t changed the base branch from [email protected] to edge May 1, 2024 16:34
Copy link
Member

@sfoster1 sfoster1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See inline - what about returning jsonlines instead of encapsulated json?

async def get_run_commands_as_pre_serialized_list(
runId: str,
run_data_manager: RunDataManager = Depends(get_run_data_manager),
) -> PydanticResponse[SimpleMultiBody[str]]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the fundamental idea with this endpoint is to avoid server work by not doing json serialization, what if this endpoint's response was in the form of jsonlines, json object records (aka the json-formatted Command instances we get from the database) that are line-separated? I think this is a good idea for two reasons:

  1. It requires less server work. With a SimpleMultiBody like this, we still have to do the work of built in json module encoding all those commands into a json array - things like escaping all the internal quotes
  2. It generalizes better (IMO) to queries for running runs, since it's not totally contained and therefore doesn't really have "this is a complete view" semantics

If we're concerned about breaking similarity with the other elements of our api by not being json, one thing we can do is let the client specify Accept for either application/json (MultiBody response) or application/jsonlines

Copy link
Contributor

@SyntaxColoring SyntaxColoring May 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can get advantage 1 without giving up our usual response format, right? JSON lines would look basically this:

response_body = "\n".join(serialized_commands)

But this would be equally fast:

response_body = '{"data":[' + ','.join(serialized_commands) + ']}'

And I'm sure there's a way to write a custom JSON encoder to do it if concatenating JSON fragments like that feels too icky. [Edit: Nope, standard library JSON encoders can't do this.]

Could you clarify your point 2? I'm not sure what you mean by "contained" or how that affects generalization.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting idea. I've added a todo comment to investigate this option further.

@sanni-t sanni-t merged commit f44872b into edge May 1, 2024
17 checks passed
@sanni-t sanni-t deleted the robot-server-add_new_fast_fetching_run_cmds_endpoint branch May 2, 2024 14:12
Carlos-fernandez pushed a commit that referenced this pull request May 20, 2024
… commands (#15031)

# Overview

**Robot server changes:**

Adds a new endpoint- `GET /runs/:run_id/commandsAsPreSerializedList` to
the run commands router. This endpoint returns a list of pre-serialized
commands (that are valid json objects) of a finished run.

This endpoint is a much faster alternative to the `GET
/runs/:run_id/commands` endpoint when fetching all commands of a
completed run.

Also adds notification publishing when pre-serialized commands become
available for a run.

**App changes**

closes RQA-2645 and RQA-2647

# Risk assessment

Back-end: New endpoint so impact on existing code is close to none.
App: Medium. Fixes issues in existing behavior of handling historical runs.

---------

Co-authored-by: ncdiehl11 <[email protected]>
Co-authored-by: ncdiehl11 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants