log.json of a container may grow until it bursts the tmpfs of /run, if a k8s user configures an exec liveness probe with a non-existent executable file name. #8972
Comments
It sounds like a DDoS :( kubelet should have throttling for this.
Actually, the liveness probe is called every few seconds according to the config in pod.yaml, so it is not that frequent. It is just the accumulation of log entries if the pod runs for a long time.
Good idea! Shall we remove the log.json file after …
We can file an issue to seek the runc maintainers' comments. (I don't have background knowledge about this part :( )
Hi all, I'm facing this same issue at the moment, where the /run directory gets filled up by the log.json file and pods start crashing. Right now a quick workaround is to restart containerd, but of course that's not ideal. Do you think there's a less reactive temporary solution for this?
Same issue. Maybe we can keep just the last record or errMsg in log.json so that we can cap its size.
Same question; a lot of log entries like these, at a 1 s interval:
Got the same issue. Every 10 seconds (as the liveness probe is configured) it prints messages to log.json. It's a Calico pod and it has a liveness probe configured like this.
Though it seems fine, it fills the /run folder pretty rapidly. The only solution I found so far is to truncate the file daily with a cronjob. The log.json looks just like @so2bin's:
@Archesky and @so2bin Looks like you are using nvidia-container-toolkit. The toolkit provides its own runtime (hooks) in the /usr/local/nvidia directory. The NVIDIA runtime eventually calls runc, the low-level runtime. The informational messages in log.json are from NVIDIA's runtime and should (IMHO) be set to DEBUG level in the code. See the source code starting on line 75. This is a relatively new addition that adds more text to log.json and fills /run N times faster. The other part of the livenessProbe message comes from this part of the code. It would probably be better at DEBUG level too. Workaround?
I have created NVIDIA/nvidia-container-toolkit#560 to reduce the logging verbosity in the NVIDIA Container Runtime. This was also pointed out in NVIDIA/nvidia-container-toolkit#511
We can let runc log into a FIFO. #10430
Description
The runc exec command prints its error log to /run/containerd/io.containerd.runtime.v2.task/k8s.io/<cid>/log.json. When a k8s user configures an exec liveness probe with an executable file that does not exist in the container, the exec command runs periodically according to the liveness probe config and fails every time. kubelet treats this failure as a liveness probe misconfiguration and only sends an event to the k8s master, rather than restarting the container, so log.json grows without bound if the k8s user ignores the liveness probe failures because everything else seems to work properly.
As a result, the k8s node can become unhealthy, because the tmpfs backing /run may fill up completely.
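For reference, a minimal pod spec that triggers this failure mode might look like the following (all names and values are illustrative; the key point is the nonexistent binary in the exec probe):

```yaml
# Hypothetical pod spec: the probe's command does not exist in the image,
# so every probe run fails via runc exec and appends an error line to log.json.
apiVersion: v1
kind: Pod
metadata:
  name: probe-repro        # name is illustrative
spec:
  containers:
  - name: app
    image: busybox:1.36
    command: ["sleep", "infinity"]
    livenessProbe:
      exec:
        command: ["/no/such/binary"]   # deliberately missing executable
      periodSeconds: 10
```

Leave the pod running and watch the log.json path below grow with each probe period.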
Steps to reproduce the issue
/run/containerd/io.containerd.runtime.v2.task/k8s.io/<cid>/log.json
Describe the results you received and expected
Expected: the misconfiguration of one container does not break the whole node.
Received: the tmpfs of /run fills up and the node breaks.
What version of containerd are you using?
v1.6.14
Any other relevant information
runc version: 1.1.3
Show configuration if it is related to CRI plugin.
config.toml is not related to this issue.