Node.js cannot receive signals like SIGTERM #550

Closed
WesCossick opened this issue Jul 16, 2020 · 15 comments

Comments

@WesCossick

Currently, the Node.js distroless container runs the Node.js process as PID 1:

# ps
PID   USER     TIME  COMMAND
    1 root      0:00 /nodejs/bin/node
   27 root      0:00 sh
   33 root      0:00 ps

According to Node.js's best practices:

Node.js was not designed to run as PID 1 which leads to unexpected behaviour when running inside of Docker. For example, a Node.js process running as PID 1 will not respond to SIGINT (CTRL-C) and similar signals.

So basically, Node.js apps won't receive SIGTERM, SIGINT, etc. when running inside gcr.io/distroless/nodejs.

@WesCossick changed the title from "Node.js images cannot receive signals like SIGTERM" to "Node.js cannot receive signals like SIGTERM" on Jul 16, 2020
@chanseokoh
Member

chanseokoh commented Jul 16, 2020

It's not just Node. And yes, in a lot of cases, you'd want to add an init system. Many people have suggested tini; I recall some other contenders, but I don't remember exactly which. tini looks great and has a good reputation (although I've never used it). For example, take a look at this article.

@WesCossick
Author

After a tremendous amount of research and testing, I've finally gotten to the bottom of what's going on. In case anyone else encounters a similar issue, I've documented what I've learned…

It turns out that Node.js running as PID 1 handles signals like SIGTERM and SIGINT just fine. In fact, Docker doesn't really recommend using Tini unless you need to:

To clear up confusion in the blogosphere, you don’t always need a “init” tool to sit between Docker and Node.js, and you should probably spend more time thinking about how your app stops gracefully.
...
For those that know about init options like docker run --init or using tini in your Dockerfile, they are good backup options when you can’t change your app code, but it’s a much better solution to write code to handle proper signal handling for graceful shutdowns.

The process.on('SIGTERM', ...); handler I was working with wasn't being called, and since I knew it had been called successfully in the past, I had mistakenly attributed the root cause to our switch from Alpine to distroless. Then I assumed that the cause was downgrading from Node.js v12 to v10… also not the case.

After some further testing, I discovered that as soon as any asynchronous code began executing after the SIGTERM signal was sent, the Node.js app would die. Since the process.on('SIGTERM', ...); handler I was working with was inside an async function, it was never being called. But, if I placed the same process.on('SIGTERM', ...); handler outside of the async function, it would execute all the code up to the first asynchronous call, and then die prematurely.

I finally tracked the actual problem down to Prisma, which prevents apps from gracefully shutting down. This is discussed here: prisma/prisma#2917. I was able to verify that Prisma is calling process.exit(0) inside its own signal handler, which is causing apps to exit prematurely if they need to run asynchronous cleanup code in a SIGTERM or SIGINT handler.
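
For comparison with the Python examples later in this thread, here is a minimal `asyncio` sketch of the pattern that avoids this problem: the signal handler only records the request, and the asynchronous cleanup is awaited to completion before the process exits. It is a Python analog, not Node.js or Prisma code, and `serve_until_sigterm` and `cleanup` are hypothetical names.

```python
import asyncio
import signal


async def serve_until_sigterm(cleanup):
    """Wait for SIGTERM, then await `cleanup` to completion before returning."""
    loop = asyncio.get_running_loop()
    stop = asyncio.Event()

    # The handler only records the request. It must not exit the process,
    # otherwise the awaited cleanup below would never get a chance to run.
    loop.add_signal_handler(signal.SIGTERM, stop.set)
    try:
        await stop.wait()
        await cleanup()
    finally:
        loop.remove_signal_handler(signal.SIGTERM)
```

A caller would start it with something like `asyncio.run(serve_until_sigterm(close_connections))`, where `close_connections` stands for whatever coroutine function the app needs to run on shutdown.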

@chanseokoh
Member

Thanks for the update! It surely will help others.

@jsravn

jsravn commented Mar 4, 2021

It seems like an issue with the python distroless image at least. Any ideas?

docker run --rm gcr.io/distroless/python3 -c 'import signal; signal.pause()'

All signals to the python process are ignored in this case.

@chanseokoh
Member

All signals to the python process are ignored in this case.

It works in my testing.

$ docker pull gcr.io/distroless/python3
Using default tag: latest
latest: Pulling from distroless/python3
Digest: sha256:975006719a62860e116b88adeac9dc278d939ddbec5e62f74b2e19f28d8fd3a5
Status: Image is up to date for gcr.io/distroless/python3:latest
gcr.io/distroless/python3:latest
$ docker run --rm gcr.io/distroless/python3 -c 'import signal; signal.pause()'
... container (the Python process) pauses ...

In another terminal,

$ docker ps
CONTAINER ID   IMAGE                       COMMAND                  CREATED         STATUS         PORTS     NAMES
45fed20c6b84   gcr.io/distroless/python3   "/usr/bin/python3.5 …"   3 seconds ago   Up 2 seconds             quirky_jepsen
$ docker exec -it 45 python
Python 3.5.3 (default, Nov 18 2020, 21:09:16) 
[GCC 6.3.0 20170516] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> import signal
>>> print(open('/proc/1/cmdline', 'r').read())
/usr/bin/python3.5-cimport signal; signal.pause()
>>> os.kill(1, signal.SIGINT)
>>> $ 

Then in the original terminal,

$ docker run --rm gcr.io/distroless/python3 -c 'import signal; signal.pause()'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
KeyboardInterrupt
$

OTOH, in #550 (comment), the Node.js docs clearly state

a Node.js process running as PID 1 will not respond to SIGINT (CTRL-C) and similar signals.

@jsravn

jsravn commented Mar 4, 2021

It doesn't seem to work outside of the container, is what I meant.

Neither docker kill -s SIGTERM <containerid> nor kill -SIGTERM <pid> works. In comparison, something like gcr.io/google-containers/pause does handle external signals.

@chanseokoh
Member

All signals to the python process are ignored in this case.

Still, this is not true.

$ docker run --rm gcr.io/distroless/python3 -c 'import signal; signal.pause()'

In another terminal, I do

$ docker ps
CONTAINER ID   IMAGE                       COMMAND                  CREATED         STATUS         PORTS     NAMES
751e3f71ad5a   gcr.io/distroless/python3   "/usr/bin/python3.5 …"   4 seconds ago   Up 3 seconds             goofy_panini
$ docker kill -s SIGINT 75
75

And in the original terminal,

$ docker run --rm gcr.io/distroless/python3 -c 'import signal; signal.pause()'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
KeyboardInterrupt
$ 

@jsravn

jsravn commented Mar 4, 2021

SIGINT works here as well, but not SIGTERM. Is that expected? If I run it outside of distroless, it does quit upon receiving SIGTERM.

@chanseokoh
Member

At least SIGKILL (yes, I did confirm) and SIGINT work. I am not a Python dev, so I don't know about what's with SIGTERM. But this is not a Distroless issue. The behavior is consistent outside Distroless.

Using the official Docker Hub python image,

$ docker pull python
Using default tag: latest
latest: Pulling from library/python
Digest: sha256:e2cd43d291bbd21bed01bcceb5c0a8d8c50a9cef319a7b5c5ff6f85232e82021
Status: Image is up to date for python:latest
docker.io/library/python:latest
$ docker run --name test --rm --entrypoint python python -c 'import signal; signal.pause()'
... container pauses ...

In another terminal,

$ docker ps
CONTAINER ID   IMAGE     COMMAND                  CREATED          STATUS          PORTS     NAMES
0c6a21270f3d   python    "python -c 'import s…"   41 seconds ago   Up 40 seconds             test
$ docker kill -s SIGTERM test
test
$ docker kill -s SIGTERM test
test
$ docker kill -s SIGTERM test
test
$ docker kill -s SIGINT test
test
$ docker kill -s SIGINT test
Error response from daemon: Cannot kill container: test: No such container: test
$

@jsravn

jsravn commented Mar 4, 2021

SIGKILL will work because the OS hard kills the process. SIGINT works I believe because docker itself installs an INT handler, so you can do ^-c. You're right that the official python process doesn't seem to work either. So it seems probably related to being pid 1. How do you expect it to work gracefully in kubernetes though, which relies on SIGTERM for terminating pods? I guess that goes back to the idea of using tini or something. Thanks for your help!

@chanseokoh
Member

chanseokoh commented Mar 4, 2021

SIGINT works I believe because docker itself installs an INT handler, so you can do ^-c.

I don't know what kind of INT handler you are talking about that is installed in which process at which level, but it's not that a process can modify another running process to magically install some signal handler to do something. In fact, no OS allows one process to modify another. A process can only send signals to others.

What is clear is this: the python process registered an INT handler with its own stack-printing code, so it is designed to receive and react to SIGINT. I was able to send SIGINT to python (how I send the signal is not important, BTW), and the python process reacted to it in the tests above by printing its internal stack trace to the console. It's the python process that does this printing in its own way, no one else.

Traceback (most recent call last):
  File "<string>", line 1, in <module>

If you are saying that the Docker runtime simply doesn't allow sending SIGTERM to a process in one of its containers, then that's a Docker runtime problem. (However, I don't think that is the case.)


But then, I found out the correct reason: https://hackernoon.com/my-process-became-pid-1-and-now-signals-behave-strangely-b05c52cc551c

Well PID 1 is special in Linux, amongst other things it ignores any signals unless a handler for that signal is explicitly declared

Note: A process running as PID 1 inside a container is treated specially by Linux: it ignores any signal with the default action. So, the process will not terminate on SIGINT or SIGTERM unless it is coded to do so.
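
To see which signals fall under that rule for a given Python process, one can ask the interpreter what is currently registered for each signal. This is a minimal sketch using only the standard `signal` module; the helper names `disposition` and `dropped_for_pid1` are made up for illustration.

```python
import signal


def disposition(signum):
    """Classify what this interpreter currently has registered for signum."""
    handler = signal.getsignal(signum)
    if handler == signal.SIG_DFL:
        return "default"   # kernel default action; dropped when running as PID 1
    if handler == signal.SIG_IGN:
        return "ignored"
    if callable(handler):
        return "handler"   # a Python-level handler is installed
    return "unknown"       # getsignal() returned None: not installed from Python


def dropped_for_pid1(signums=(signal.SIGINT, signal.SIGTERM, signal.SIGHUP)):
    """Return the signals that would rely on the default action."""
    return [s for s in signums if disposition(s) == "default"]


if __name__ == "__main__":
    for sig in (signal.SIGINT, signal.SIGTERM):
        print(sig.name, disposition(sig))
```

In a freshly started CPython interpreter I would expect SIGINT to report a handler (`signal.default_int_handler`, which raises `KeyboardInterrupt`) and SIGTERM to report the default, which lines up with the behavior seen in the tests above.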

@jsravn

jsravn commented Mar 5, 2021

I meant when you do docker run <container> in your shell, docker itself handles SIGINT so you can do ^-c at the command line. I assume it passes SIGINT onwards to the process. You can see that docker CLI continues to run after doing docker run, it isn't replaced by the container process as far as I can tell.

But it's interesting that python itself seems to handle SIGINT as pid 1 but ignores SIGTERM completely. I still can't explain why that's the case - I guess this is something baked into python.

Doing this, I can get it to respond to SIGTERM:

docker run --rm gcr.io/distroless/python3 -c 'import signal; import sys; signal.signal(signal.SIGTERM, lambda a,b : sys.exit(0)); signal.pause()'
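
The same idea written out as a script, in case the process needs to finish work before exiting rather than calling sys.exit from inside the handler. This is only a sketch: `StopFlag`, `run`, `work` and `cleanup` are hypothetical names, and the polling loop stands in for whatever the real application does.

```python
import signal
import time


class StopFlag:
    """Signal handler that only records that a stop was requested."""

    def __init__(self):
        self.requested = False

    def __call__(self, signum, frame):
        self.requested = True


def run(work, cleanup, poll_seconds=1.0):
    """Call work() repeatedly until SIGTERM arrives, then call cleanup()."""
    stop = StopFlag()
    # Registering any handler means SIGTERM no longer relies on the default
    # action, so the signal is delivered even when this process is PID 1.
    signal.signal(signal.SIGTERM, stop)
    while not stop.requested:
        work()
        time.sleep(poll_seconds)
    cleanup()
```

With this shape, the process stops within roughly one `poll_seconds` interval of receiving SIGTERM, which needs to fit inside the grace period the orchestrator allows before it sends SIGKILL.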

@chanseokoh
Member

chanseokoh commented Mar 5, 2021

Just felt I wanted to add an update to prevent any further dissemination of misinformation or any possible confusion for posterity who are not familiar with signal handling.

SIGINT works I believe because docker itself installs an INT handler, so you can do ^-c

I should clearly point out that this is not true.

Even if the docker CLI didn't implement a SIGINT handler and completely ignored ^-c, the python process running inside a container on the Docker runtime can and will accept SIGINT and run its stack-pretty-printing handler code, whether it runs as PID 1 or not. Python is explicitly coded to do so. (Moreover, even if docker didn't have a SIGINT handler, it already provides another means of sending signals to the running process: docker kill. And even if docker kill didn't exist, you could still send any signal to python, as demonstrated in #550 (comment).)

python itself seems to handle SIGINT as pid 1 but ignores SIGTERM completely. I still can't explain why that's the case

See the linked article in #550 (comment).

@jsravn

jsravn commented Mar 5, 2021

Thanks for the info, but I wasn't trying to disseminate misinformation. Just trying to understand where the issue lies, whether with distroless or elsewhere. I found the linked article itself a bit confusing. It states that "it ignores any signal with the default action", which makes it sound like the process is doing something special to ignore these signals. But all that is happening is that the kernel, as a special case, does not install the default actions for those signals for PID 1.

In summary:

  1. For PID 1, default actions are not installed for signals by the kernel. It's better to look at the manpage for signal: https://linux.die.net/man/7/signal to see the default actions. The linked article and the docker manual state it in a confusing way (to me at least).
  2. The Python runtime installs its own SIGINT handler. Thus the container process responds to SIGINT however you send it, even as PID 1.
  3. Docker CLI, when you do a foreground run, seems to install a SIGINT handler, which it uses to capture SIGINT and then pass it to the container process when you do ^-c. If it didn't have this, the receiving process would not get the SIGINT at the command line (unless you did -it to attach the TTY to the process, probably). You can observe this pretty easily by sending SIGINT to the docker pid, and the python process then also gets the SIGINT. But this is mostly irrelevant to the discussion, which was about SIGTERM.

I believe both Java and golang seem to handle SIGTERM just fine as PID 1, so it's Python just not setting up its own handler for it, and relying on the OS default action to be installed. In my mind, this means it would make some sense for distroless Python to do something special to handle SIGTERM when running python as pid 1. Otherwise people will run into issues with it not stopping properly such as in K8s, unless they take special action.

@chanseokoh
Member

chanseokoh commented Mar 5, 2021

In my mind, this means it would make some sense for distroless Python to do something special to handle SIGTERM when running python as pid 1.

IMO, this is out of scope for Distroless. PID 1 means a lot on Linux and is expected to take on a lot of responsibilities, including adopting orphan processes, reaping zombie processes, handling signals, etc. How you want to deal with these issues depends heavily on your situation. People suggest different ideas and approaches, and if you google, you'll find a lot of articles about what they think their problem is and what the best solution would be in their situation. There are many different ways and tools.

For some, directly running the application process as PID 1 just works, and they think it's perfectly fine; no need for the usual PID 1 responsibilities. If your long-running process can spawn child processes and zombie processes could exhaust all PIDs, deploying a full init system may be what you want, which would also solve your python SIGTERM issue. But some folks don't like this setup and argue that running a single process in a container is the best practice, since it lets the container runtime manage the lifecycle of the application; with multiple processes, there's added complexity in handling the app lifecycle properly.

And in your situation, you only seem to care about whether SIGTERM terminates python or not, but for others, it's not just a matter of a particular signal working in a certain way. For example, you said "Java seems to handle SIGTERM just fine as PID 1", but in a containerized environment, some folks still think the way Java handles SIGTERM is problematic, and people often write a wrapper script around it: #464
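
To make those PID 1 responsibilities concrete, here is a deliberately small sketch of what an init wrapper does: start the real workload as a child, forward termination signals to it, and reap every child that exits. It is an illustration under simplifying assumptions, not a replacement for tini or a full init system, and `run_as_init` and `exit_code` are hypothetical names.

```python
import os
import signal
import sys

FORWARDED = (signal.SIGTERM, signal.SIGINT)


def exit_code(status):
    """Translate a wait status into a shell-style exit code."""
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return 128 + os.WTERMSIG(status)


def run_as_init(argv):
    """Run argv as a child, forward signals to it, and reap exited children."""
    child = {"pid": None}

    def forward(signum, _frame):
        # Relay the signal to the workload instead of acting on it here.
        if child["pid"] is not None:
            try:
                os.kill(child["pid"], signum)
            except ProcessLookupError:
                pass  # The child has already exited.

    # Install handlers before spawning so no signal is lost in between.
    previous = {s: signal.signal(s, forward) for s in FORWARDED}
    try:
        child["pid"] = os.spawnvp(os.P_NOWAIT, argv[0], argv)
        # As PID 1, orphaned grandchildren are re-parented to this process,
        # so wait() also returns (and thereby reaps) them.
        while True:
            pid, status = os.wait()
            if pid == child["pid"]:
                return exit_code(status)
    finally:
        for signum, handler in previous.items():
            signal.signal(signum, handler if handler is not None else signal.SIG_DFL)


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(run_as_init(sys.argv[1:]))
```

Saved under a name of your choosing, it would be used as the container entry point with the real command as its arguments. The trade-off is the one described above: the container now runs two processes, and the wrapper becomes part of the app lifecycle.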
