Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(app-shell,usb-bridge): improve flex usb communication #14170

Merged
merged 7 commits into from
Dec 12, 2023

Conversation

sfoster1
Copy link
Member

@sfoster1 sfoster1 commented Dec 12, 2023

Closes RSS-420, RQA-2053, RQA-1615

This branch has fixes for a couple of the underlying issues with flex usb communication.

First, we need to limit the flex->host directional bandwidth (which we do by limiting the size of the USB packets we send on the python side) because if we send too much we will blow through the host-side serial buffers in their drivers, which are very variable. When we do that, we lose data which is part of an HTTP response; and the node http parser can't really handle that and just hangs. Even if it didn't and found the error, we need that error not to happen.

Second, we need to fix a couple socket lifetime issues on the node side. These were basically

  1. By hook or by crook, close sockets less
  2. When you close sockets, do it correctly so you can reopen them

(1) has some gross results because the code that uses sockets is really gross and we have not gone high when they went low, but it's a minimal changeset.
(2) is a real pain because the control flow bounces around 3 different events on like 3 different emitters and you need to trace it through; what it boils down to is noting that both the readable and writable streams AND the serialport have a "close" event they fire, and that only means the serial port is actually closed if it was the serial port that did it.

This doesn't get us all the way there almost certainly; we do occasionally still drop parts of packets on heavy usage like fetching logs sometimes, and therefore probably need to introduce more timeouts. We also haven't fully tested all the OS's and flows yet but we do get a whole lot farther with that testing than we previously did.

Testing more

At least the following:

  • Getting logs should succeed on all platforms
  • Updating robots should succeed on all platforms
  • Start a protocol and ideally run it on all platforms
  • Do something that the robot-server uses a redirect in on all platforms (attaching a pipette is a good example)
  • Other hot button flows but those above should give a broad test.

sfoster1 and others added 4 commits December 11, 2023 12:31
Desktop OS serial drivers often have very small incoming data buffers on
their serial connections because well serial is typically slow... unless
you've actually got usb on both sides.

The default value of 2048/.01s (204.8kB/s) should be further developed
but it's going to be a good idea to keep this smaller than we think
necessary to support slow client hardware.
There's a bunch of weird stuff in here that we found via too much
logging. May our loss be your gain.

Co-Authored-By: Jamey Huffnagle <[email protected]>
@sfoster1 sfoster1 requested review from a team as code owners December 12, 2023 00:10
@sfoster1 sfoster1 requested review from mjhuff and removed request for a team December 12, 2023 00:10
Copy link

codecov bot commented Dec 12, 2023

Codecov Report

Merging #14170 (da67885) into chore_release-7.1.0 (f7caea6) will increase coverage by 0.01%.
Report is 11 commits behind head on chore_release-7.1.0.
The diff coverage is 19.04%.

Additional details and impacted files

Impacted file tree graph

@@                   Coverage Diff                   @@
##           chore_release-7.1.0   #14170      +/-   ##
=======================================================
+ Coverage                70.45%   70.47%   +0.01%     
=======================================================
  Files                     2512     2512              
  Lines                    71217    71225       +8     
  Branches                  8971     8974       +3     
=======================================================
+ Hits                     50177    50194      +17     
+ Misses                   18844    18835       -9     
  Partials                  2196     2196              
Flag Coverage Δ
app 67.71% <0.00%> (+0.05%) ⬆️
usb-bridge 76.88% <66.66%> (+0.06%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
app-shell/src/robot-update/index.ts 0.00% <ø> (ø)
usb-bridge/ot3usb/serial_thread.py 40.62% <66.66%> (+1.91%) ⬆️
app-shell/src/usb.ts 1.40% <0.00%> (ø)

... and 16 files with indirect coverage changes

Copy link
Contributor

@mjhuff mjhuff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with CI!

Added a few tickets to the PR and wrote a follow-up ticket.

Copy link
Contributor

@fsinapi fsinapi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wow nice! ot3usb changes look good to me!

@SyntaxColoring
Copy link
Contributor

First, we need to limit the flex->host directional bandwidth (which we do by limiting the size of the USB packets we send on the python side) because if we send too much we will blow through the host-side serial buffers in their drivers, which are very variable.

Does the host not have a proper flow control mechanism to apply backpressure to the Flex? Some mechanism must exist at the USB protocol level, right? Are we accidentally subverting it because we're layering a virtual serial port (?) atop raw USB?

Solving this by metering bandwidth seems dicey. Like, it will depend on the load that the host device is under, right?

@sfoster1
Copy link
Member Author

Does the host not have a proper flow control mechanism to apply backpressure to the Flex? Some mechanism must exist at the USB protocol level, right? Are we accidentally subverting it because we're layering a virtual serial port (?) atop raw USB?

You know, that's a pretty good question, since flow control mechanisms do in fact exist for both actual rs-232 and usb cdc-acm, and we're just not using them. I think it's worth taking a pass at that.

Copy link
Member

@shlokamin shlokamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for doing this @sfoster1 and @mjhuff!

Comment on lines -183 to -189
if (urls === null) {
log.warn('No release files in manifest', {
version: CURRENT_VERSION,
manifest,
})
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this to cut down on noise? seems like a useful warning since we expect there to be urls in the release set

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah the noise is just brutal

Comment on lines -44 to +43
keepAliveMsecs: 10000,
keepAliveMsecs: Infinity,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol

data: response.data,
status: response.status,
statusText: response.statusText,
}
} catch (e) {
// eslint-disable-next-line @typescript-eslint/restrict-template-expressions
usbLog.debug(`usbListener error ${e?.message ?? 'unknown'}`)
console.log(`axios request error ${e?.message ?? 'unknown'}`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was this changed cuz reading logs from the usb logger was annoying? cuz it seems like thats the right place to report logs here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah how does one get logs out of the usb logger?

@sfoster1 sfoster1 merged commit 4442886 into chore_release-7.1.0 Dec 12, 2023
16 of 17 checks passed
@sfoster1 sfoster1 deleted the chore_usb-fix branch December 12, 2023 21:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants