Remote change-stream causes zombie 'after save' handlers #4317

Closed
Wizarth opened this issue Mar 10, 2020 · 5 comments

Wizarth commented Mar 10, 2020

Steps to reproduce

When using the remote connector, connecting to the change-stream endpoint creates an observer on the model that is never removed.

With this sandbox, run the following in separate shells:

NODE_ENV=backend node server/server.js
NODE_ENV=frontend node server/server.js
node test.js

Current Behavior

backend is on localhost:3000
frontend is on localhost:3010

backend is using in-memory db for Test model.
frontend is using loopback-connector-remote, connecting to backend, for Test model.

server.js checks once per second how many _observers['after save'] are registered on the Test model.
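
The check amounts to something like the following boot script (a sketch for illustration; the file name is hypothetical and the actual server.js may differ):

```js
// server/boot/log-observers.js (hypothetical file name) - logs the number of
// 'after save' observers on the Test model once per second. _observers is an
// internal property of loopback-datasource-juggler models, used here only for
// debugging.
module.exports = function (app) {
  setInterval(function () {
    var observers = app.models.Test._observers['after save'] || [];
    console.log('Test "after save" observers: %d', observers.length);
  }, 1000);
};
```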

test.js runs two tests. The first connects 100 times to http://localhost:3000/api/Tests/change-stream (using request). After ten seconds, these requests are aborted.

During this time, the backend loopback instance reports 100 after save observers. When the requests are disconnected, the number of observers drops to zero.

In the second test, 100 connections to http://localhost:3010/api/Tests/change-stream are created (the frontend). After ten seconds, these requests are aborted.

During this time, the backend loopback instance reports 100 after save observers. When the requests are disconnected, the number of observers does not drop.

Terminating the frontend instance does not cause the number of observers to drop.
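
Both tests follow roughly this pattern (a sketch; the actual test.js in the sandbox may differ in details):

```js
// Rough outline of test.js: open 100 change-stream connections, hold them for
// ten seconds, then abort them all.
var request = require('request');

function openConnections(url, count) {
  var reqs = [];
  for (var i = 0; i < count; i++) {
    // Consume the stream so the connection stays open.
    reqs.push(request.get({ url: url }).on('data', function () {}));
  }
  return reqs;
}

function runTest(url, done) {
  var reqs = openConnections(url, 100);
  setTimeout(function () {
    reqs.forEach(function (r) { r.abort(); });
    done();
  }, 10000);
}

// Test 1: direct connections to the backend.
runTest('http://localhost:3000/api/Tests/change-stream', function () {
  // Test 2: connections via the frontend, which proxies to the backend
  // through loopback-connector-remote.
  runTest('http://localhost:3010/api/Tests/change-stream', function () {
    console.log('done');
  });
});
```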

Expected Behavior

Disconnecting the HTTP request should remove the observer.
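
Conceptually, the expected cleanup looks like this (a sketch of the expected wiring, not the actual loopback source; Test, res and next are assumed to be the model and the REST handler's response/next):

```js
// The change stream registers an 'after save' observer when it is created;
// destroying the stream when the HTTP connection closes is what should
// unregister it again.
Test.createChangeStream(function (err, changes) {
  if (err) return next(err);
  changes.pipe(res);
  res.on('close', function () {
    changes.destroy(); // expected to trigger removeObserver('after save', ...)
  });
});
```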

Link to reproduction sandbox

https://github.com/Wizarth/loopback-sandbox/tree/change-stream_Observer

Additional information

Node version:

  • win32 x64 10.16.0
  • linux x64 8.14.0

Sandbox:
[email protected] C:...\loopback-sandbox
+-- [email protected]
+-- [email protected]
+-- [email protected]
+-- [email protected]
+-- [email protected]

We are using a setup where front-end Loopback servers communicate with a single back-end Loopback server. The back-end server exists primarily to centralize event handling, while the front end does as much per-session logic as possible. The front end is connected to the back end using loopback-connector-remote (which uses strong-remoting).

The structure is NGINX -> front-end Loopback <> back-end Loopback <> various internal services.

We've set up a web app that uses the JS EventStream class to connect to the persisted model's change-stream endpoint. What we're finding is that, over time, the CPU usage of the back-end Loopback instance increases every time the persisted model is updated, eventually reaching multiple seconds of 100% processor usage by this process, which has all sorts of knock-on effects.

By adding logging to loopback-datasource-juggler/lib/observer.js, we're seeing that once every two minutes a new 'after save' observer is attached on the back-end server, without any corresponding call to removeObserver. (change-stream sends an end event after two minutes of inactivity, which our client responds to by reconnecting.)
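
For illustration, an equivalent way to see this without editing the library is to wrap the model's observe/removeObserver methods, roughly like this sketch (the logging we actually used was added inside lib/observer.js itself):

```js
// Debug-only sketch: wrap observe()/removeObserver() on the Test model to log
// every registration and removal of hooks.
var Test = app.models.Test;
var origObserve = Test.observe.bind(Test);
var origRemoveObserver = Test.removeObserver.bind(Test);

Test.observe = function (operation, listener) {
  var result = origObserve(operation, listener);
  console.log('observe(%j): now %d handler(s)', operation,
    (Test._observers[operation] || []).length);
  return result;
};

Test.removeObserver = function (operation, listener) {
  console.log('removeObserver(%j)', operation);
  return origRemoveObserver(operation, listener);
};
```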

We've also found that connecting NGINX to the back end Loopback for the specific change-stream API doesn't show this behavior.

The provided sandbox fork is an artificial test case of connecting 100 clients at once, to simulate the connections created over time.

Wizarth added the bug label Mar 10, 2020

bajtos commented Mar 10, 2020

Thank you @Wizarth for a detailed bug report.

It makes me wonder: is it possible to recreate the behavior you observed in an automated LoopBack test? We already have tests in place to verify that "after save" hooks are removed when the change stream is closed, see test/change-stream.test.js. These tests call createChangeStream directly from JavaScript; perhaps we need to add a test invoking the method via the REST API?
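
Such a test might look roughly like this (a hypothetical sketch, not existing loopback test code; it assumes the test boots the sandbox app in-process, calls app.listen(3000), and exposes the Test model):

```js
var http = require('http');

it('removes the "after save" observer when the REST stream is aborted', function (done) {
  var Test = app.models.Test;
  var before = (Test._observers['after save'] || []).length;

  var req = http.get('http://localhost:3000/api/Tests/change-stream', function (res) {
    res.on('data', function () {});
    // Give the server a moment to register the observer, then drop the connection.
    setTimeout(function () { req.abort(); }, 200);
  });

  req.on('abort', function () {
    // Allow the server a moment to run its cleanup, then compare counts.
    setTimeout(function () {
      var after = (Test._observers['after save'] || []).length;
      if (after !== before) return done(new Error('"after save" observer leaked'));
      done();
    }, 200);
  });
});
```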

Maybe the problem is specific to the way Node.js, strong-remoting and/or loopback-connector-remote handle keep-alive connections?

I quickly checked our source code & git history; it looks like part of this problem was fixed by #3474 in July 2017, which I guess explains why the problem does not manifest when you connect to your LoopBack backend via NGINX.

bajtos added the major label Mar 10, 2020

Wizarth commented Mar 11, 2020

I was reading the source of strong-remoting, and found that json:true is set for the request.
I updated my test to set json:true as well, and found that both cases (direct connection and forwarding via another loopback instance) cause the zombie handlers. That narrows things down.
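
Roughly, the change to the test amounted to this (a sketch, not the exact code from the sandbox):

```js
var request = require('request');

// Same request as before, with json:true added - the option strong-remoting
// sets on its own outgoing requests.
var req = request.get({
  url: 'http://localhost:3000/api/Tests/change-stream',
  json: true,
});
req.on('data', function () {});
setTimeout(function () { req.abort(); }, 10000);
```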

Edit: I haven't been able to narrow it down further, but I suspect the fault lies in strong-remoting/lib/http-context.js's use of mux-demux.


Wizarth commented Mar 23, 2020

I've narrowed it down to a unit test on strong-remoting, and have submitted an issue there.


Wizarth commented Apr 8, 2020

Fixing strongloop/strong-remoting#482 solved the problem where directly connecting to the backend with json:true would leak handlers.

There remains an issue when forwarding - the handlers survive until the 2 minute change-stream timeout occurs (or some other timeout?), and are then cleaned up.

I think this is an issue in strong-remoting http-invocation - again needing to close the mux-demux instance. Since this does clean up after time, it's a lower priority for me to work on and I don't know if/when I'll be able to address it.
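
For reference, the kind of cleanup being suggested looks roughly like this (a generic sketch, not strong-remoting's actual code; the wrapResponse name and the exact event wiring are illustrative assumptions):

```js
var MuxDemux = require('mux-demux');

// Illustrative pattern: end the mux-demux instance when the underlying HTTP
// response goes away, so streams built on top of it also close and their
// observers get removed.
function wrapResponse(res) {
  var mdm = MuxDemux();
  res.pipe(mdm);

  var cleaned = false;
  function cleanup() {
    if (cleaned) return;
    cleaned = true;
    mdm.end(); // without this, consumers of mdm never see the stream close
  }
  res.on('end', cleanup);
  res.on('close', cleanup);
  res.on('error', cleanup);

  return mdm;
}
```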


bajtos commented May 11, 2020

Thank you @Wizarth for confirming which part of the problem is fixed now 👍

> There remains an issue when forwarding - the handlers survive until the 2 minute change-stream timeout occurs (or some other timeout?), and are then cleaned up.
>
> I think this is an issue in strong-remoting http-invocation - again needing to close the mux-demux instance. Since this does clean up after time, it's a lower priority for me to work on and I don't know if/when I'll be able to address it.

Let's close this GitHub issue as resolved then. If you ever get time to look into the remaining problem yourself, feel free to go ahead and open a pull request to fix it; no need to open a new issue first.

bajtos closed this as completed May 11, 2020