Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dev server v2 kills arc sandbox ungracefully, not giving sandbox plugins a chance to clean up #6871

Open
1 task done
lpsinger opened this issue Jul 18, 2023 · 8 comments · May be fixed by #8230
Open
1 task done

Dev server v2 kills arc sandbox ungracefully, not giving sandbox plugins a chance to clean up #6871

lpsinger opened this issue Jul 18, 2023 · 8 comments · May be fixed by #8230

Comments

@lpsinger
Copy link
Contributor

What version of Remix are you using?

1.18.1

Are all your remix dependencies & dev-dependencies using the same version?

  • Yes

Steps to Reproduce

When you are using dev_v2 mode in an Architect app, when you interrupt npm run dev with ^C, it kills all subprocesses with extreme prejudice and does not permit Architect plugins to clean up.

To reproduce:

  1. Start a basic Architect project using create-remix.
  2. Add a toy sandbox plugin that has a start method (which would normally start a sidecar process --- such as an OpenSearch server) and a stop method (which would normally stop the sidecar process and delete any temporary directories). For example:

plugin-sandbox.js

export default {
  sandbox: {
    start() {
      console.log("starting example sandbox plugin");
    },
    end() {
      console.log("stopping example sandbox plugin");
    },
  },
};
  1. Install this plugin in app.arc.

app.arc

...
@plugins
plugin-remix
  src plugin-remix.js
plugin-example
  src plugin-example.js
...
  1. Fire up the dev server using npm run dev. Wait until it's up and running.
  2. Stop the dev server with a keyboard interrupt (^C on Linux and macOS).

Expected Behavior

The dev server should come down cleanly and you should see both of the log messages from the Architect sandbox plugin, both starting example sandbox plugin and stopping example sandbox plugin. Here's the good output with dev_v2: false set. For a reproducer, see https://github.com/lpsinger/remix-bug-report-arc-sandbox-killed-ungracefully/tree/dev_v1.

$ npm run dev

> dev
> npm-run-all build --parallel "dev:*"


> build
> remix build

 info  building... (NODE_ENV=production)
 info  built (315ms)

> dev:remix
> remix watch


> dev:arc
> cross-env NODE_ENV=development arc sandbox

Watching Remix app in development mode...
💿 Building...
         App ⌁ remix-architect-app
      Region ⌁ us-west-2
     Profile ⌁ @aws profile / AWS_PROFILE not configured
     Version ⌁ Architect 10.13.1
         cwd ⌁ /private/tmp/my-remix-app

✓ Sandbox @http (HTTP API mode / Lambda proxy v2.0 format / live reload) routes
    any /* ................................ server

    http:https://localhost:3333

✓ Sandbox Started in 9ms
❤︎ Local environment ready!

⚬ Sandbox Running 1 Sandbox startup plugin
starting example sandbox plugin
✓ Sandbox Ran Sandbox startup plugin in 7ms
✓ Sandbox File watcher now looking for project changes
💿 Rebuilt in 441ms
[REMIX DEV] D7AD53A4 ready
^C
⚬ Sandbox Running 1 Sandbox shutdown plugin
stopping example sandbox plugin
✓ Sandbox Ran Sandbox shutdown plugin in 0ms
^C

Actual Behavior

Here's the bad output. You see starting example sandbox plugin, but you never see stopping example sandbox plugin. For a reproducer, see https://github.com/lpsinger/remix-bug-report-arc-sandbox-killed-ungracefully/tree/dev_v2.

$ npm run dev
> dev
> remix dev --no-restart -c "arc sandbox -e testing"

 💿  remix dev

 info  building...
 info  built (536ms)
         App ⌁ remix-architect-app
      Region ⌁ us-west-2
     Profile ⌁ @aws profile / AWS_PROFILE not configured
     Version ⌁ Architect 10.13.1
         cwd ⌁ /private/tmp/my-remix-app

✓ Sandbox @http (HTTP API mode / Lambda proxy v2.0 format / live reload) routes
    any /* ................................ server

    http:https://localhost:3333

✓ Sandbox Started in 9ms
❤︎ Local environment ready!

⚬ Sandbox Running 1 Sandbox startup plugin
starting example sandbox plugin
✓ Sandbox Ran Sandbox startup plugin in 4ms
✓ Sandbox File watcher now looking for project changes
^C
@pcattori
Copy link
Contributor

I tried to remove all Remix code for killing the app server process, but my system was still killing the app server when remix dev was CTRL-C'd. I tried removing any process.on calls for SIGINT and SIGTERM, clearing out the body of any exitHooks, and setting cleaup: false for execa.

The only way the app server process stayed alive is when I set detached: true for the subprocess, but detached subprocesses are another can of worms and their behavior differs across OSs. Might revisit this later, but if anyone has more insight into what can be done here, would be appreciated.

@lpsinger
Copy link
Contributor Author

lpsinger commented Dec 6, 2023

@pcattori, I think that part of the problem is that arc sandbox itself doesn't handle signals. I opened the PR architect/sandbox#729 to try to address that. Would you please see if your changes on the Remix side work when combined with that PR?

lpsinger added a commit to lpsinger/remix that referenced this issue Dec 6, 2023
lpsinger added a commit to lpsinger/remix that referenced this issue Dec 6, 2023
@lpsinger lpsinger linked a pull request Dec 6, 2023 that will close this issue
2 tasks
lpsinger added a commit to lpsinger/remix that referenced this issue Dec 6, 2023
lpsinger added a commit to lpsinger/remix that referenced this issue Dec 6, 2023
@Courey
Copy link
Contributor

Courey commented Jan 23, 2024

I did some testing on this issue as I could see it as well. I was able to change my docker to use an entrypoint. In my entrypoint I created a signal trap. I was able to trap all signals. This told me that sending signals from outside of docker into docker was working.

I then ensured that I was using exec in the entrypoint so that the second process after docker-init would be npm run dev. At this point it stopped responding to SIGINT and SIGTERM signals. SIGHUP and SIGKILL continued to function. This told me that the issue was possibly with the npm run process itself. I used strace to trace the npm run dev process and I never registered a sigint or sigterm, it was always just instant death.

I went digging and I have found this issue and this PR that I believe may be the cause (or a partial cause) of this issue.

I have updated my npm version to be the most recent version (10.3.0) and I was able to see it close the process in under a second when I use control + c, so I believe this should fix it.

@Courey
Copy link
Contributor

Courey commented Jan 23, 2024

I tried to remove all Remix code for killing the app server process, but my system was still killing the app server when remix dev was CTRL-C'd. I tried removing any process.on calls for SIGINT and SIGTERM, clearing out the body of any exitHooks, and setting cleaup: false for execa.

The only way the app server process stayed alive is when I set detached: true for the subprocess, but detached subprocesses are another can of worms and their behavior differs across OSs. Might revisit this later, but if anyone has more insight into what can be done here, would be appreciated.

@pcattori could you please ensure your npm version is 10.3.0 and try your experiment again? I am hopeful this is your issue.

lpsinger added a commit to lpsinger/remix that referenced this issue Jan 23, 2024
@Courey
Copy link
Contributor

Courey commented Jan 23, 2024

I spoke to @lpsinger and this did not impact the issue he was having. I want to be more specific about what I was seeing and what was fixed with the npm version change.

In the buggy version, you send a sigint/term to docker (depending on what you have configured and if you use compose). NPM was catching these but doing nothing with them. So as you tried to close the npm run process, it would hang for 10 seconds till it hit the Docker timeout and docker sends a sigkill. So I was unable to get to the remix layer at all because the signal was being trapped before that.

All the npm version change does is make it so that npm isn't just trapping and keeping the signal. So now the signals get passed along.

It does not however show any additional logging or proof of graceful shutdown in remix.

@Courey
Copy link
Contributor

Courey commented Feb 1, 2024

I can reliably "gracefully" terminate the child process now, but I have done terrible things to get there. I am hoping though that this at least helps point in the right direction and maybe one of you will advise further.

in in the devServer_unstable/index file, if I replace exitHook__default["default"](cleanup); with:

  state.appServer.on("exit", ()=>{
    process.exit(0)
  })

  process.on("SIGINT", ()=>{
    cleanup()
  })

and then in to the sandbox/src/cli/index file in architect's sandbox I ALSO add this:

    process.on("exit", (signal)=>{
      sandbox.end()
    })

It will work as expected, but not how I would have thought it would. I added an exit event for the child process so that basically when it exits I then can exit the main process. I did this because handling asynchronous shutdown actions was a beast and this way I'm listening on the main process for the child process to reach exit status which then triggers the exit of the main process. Like I said, super duper ugly, but maybe illuminating?

a maybe odd thing:
If I run the process and then I take the PID for the arc process and kill -2 it, everything terminates as expected within arc and the remix process carries on as expected. I put logs in where sdtin/stdout are piped and so I could see what was passed through. When I kill -2 the process, something was passed through STDOUT 3 times and arc received the exit event twice. I would have expected to only get an exit signal once and I would not have expected anything to come through stdout much less 3 times. I'm not sure what it was as it didn't print so I am assuming it's signal. It could be normal but it seemed a bit odd to me. When I control + c it's just the signal and nothing goes through STDOUT (which is expected).

@pcattori
Copy link
Contributor

pcattori commented Feb 2, 2024

@Courey Thanks for diving deep on this! 🙏

We've been heads down on getting Vite stuff stabilized. We're going to need to clean up some of the CLI code soon anyway, so will try to include some/all of this there. In the meantime, would love any arc experts to help me figure out how best to interop with Vite dev server since that's what we're moving to anyway. I think we can use sandbox to spin up a Vite dev server in middleware mode, and then delegate asset requets and route requests to it within the arc request handler. Conceptually, that all checks out to me, but haven't tried getting it running for real. One piece to figure out is that the Vite middleware is Express-style middleware. So there's a chance we may need to adapt the web fetch standard Request/Response that both arc and Remix prefer to Node-style (i.e. Express-style) IncomingRequest, ServerResponse.

@Courey
Copy link
Contributor

Courey commented Mar 13, 2024

I created this issue with Arc because I was able to isolate the process and send it signals manually and did not get the expected response from arc sandbox, so I thought they might be able to tell me more about that and that might point me in the right direction. That was a red herring. They believe the issue lies with execa and how it handles children of children. They seem extremely helpful and friendly, and I know in your comment you said:

In the meantime, would love any arc experts to help me figure out how best to interop with Vite dev server since that's what we're moving to anyway.

I was hoping to introduce y'all to each other because it sounds like it could be a useful pairing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants