Performance and scaling nomad-driver-podman under high allocation loads #175
Not a solution here, but I have seen behaviour similar to your second case with a smaller cluster. Most of the time, the podman socket's CPU usage is very high even when most of the services are idle, and the more services running on a node, the greater the CPU usage. If any CPU-intensive task happens, the socket stops responding and allocations start failing for a while. I haven't had the chance to debug further, but it might be a bug in podman's service. I don't use docker, so I don't know how it behaves, but I doubt this behaviour is normal there, since I have seen docker machines running more containers.
@jdoss regarding problem 1: I think this is not directly related to this driver. Nomad uses the bin-pack strategy by default to place tasks; it will always try to fill up a node before it considers another one. An alternative is the so-called spread scheduler, which distributes work evenly.

Problem 2: I am aware of this problem; it also happens in our environment. The root cause is unclear for now, and as a workaround we have a systemd timer to check/clean up the socket periodically.

@rina-spinne getting metrics/stats from a single container is somewhat expensive. Running many containers concurrently and polling stats at a high frequency can quickly cause significant load. Maybe you can tune the collection_interval configuration option? It has an aggressive default of just 1 second. A good solution is to align it with your metric collector's interval; that way you end up with 30s or 60s for a typical Prometheus setup.
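The socket-cleanup workaround described above might look something like the following service/timer pair. This is a hypothetical sketch, not the actual units from that environment; the unit names, socket path, ping endpoint, and recovery action are all assumptions:

```ini
# /etc/systemd/system/podman-socket-check.service (hypothetical)
[Unit]
Description=Check the Podman API socket and restart it if unresponsive

[Service]
Type=oneshot
# If a simple ping over the API socket fails or times out,
# bounce podman.socket so new allocations can connect again.
ExecStart=/bin/sh -c 'curl -sf --max-time 5 --unix-socket /run/podman/podman.sock http://d/libpod/_ping || systemctl restart podman.socket'

# /etc/systemd/system/podman-socket-check.timer (hypothetical)
[Unit]
Description=Periodically verify the Podman API socket

[Timer]
OnBootSec=2min
OnUnitActiveSec=5min

[Install]
WantedBy=timers.target
```

And if the collection_interval option being referred to is the Nomad agent's telemetry stanza (an assumption on my part), aligning it with a Prometheus scrape interval could look like:

```hcl
telemetry {
  collection_interval        = "30s"
  prometheus_metrics         = true
  publish_allocation_metrics = true
}
```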
Thanks for this tip. I will modify my jobs to see if I can spread things out to prevent the socket from getting overloaded and report back.
Could you share this unit and timer?
@towe75 @rina-spinne I opened containers/podman#14941 to see if the Podman team has any thoughts on this issue. If you have any additional context to add to that issue, I am sure it would help track things down.
Maybe related: hashicorp/nomad#16246
@jdoss I do not think that it's related.
I have been using the Podman driver for my workloads, and my current project involves a very large Nomad cluster launching 16,000 containers across the cluster. I have noticed some performance and scaling issues that have impacted my deployment of the workloads, and I am wondering if there are any specific steps I could take to improve the stability of my deployments and optimize the number of containers per client node. Here are two major issues I am seeing when launching jobs in batches of 4000:
The failed allocations tend to snowball a client node into an unusable state because the Podman socket cannot fully recover to accept new allocations. This leads to a large number of failed allocations.
Does anyone have any recommendations for changing my jobs so they spread out more evenly across my client nodes? I think I need more time between container starts. I am using these settings in my job:
Also, any thoughts on why the podman socket gets overwhelmed by the driver? My client nodes use Fedora CoreOS, which has pretty decent sysctl settings out of the box, and I am using the Nomad recommended settings as well:
Does anyone else use the Podman driver for high allocation workloads?