CSI directory not bind-mounted to podman containers under 1.3.0 #168

Closed
optiz0r opened this issue May 17, 2022 · 5 comments · Fixed by #169
Comments

optiz0r (Contributor) commented May 17, 2022

Nomad version

Nomad v1.3.0 (52e95d64113e01be05d585d8b4c07f6f19efebbc)

Operating system and Environment details

Issue

The CSI directory (/opt/nomad/data/client/csi/plugins/${uuid}) does not get bind-mounted into a job which defines the csi_plugin stanza. This means:
a) the directory into which the application should create the csi.sock socket may not exist, and
b) Nomad marks the task as failed after a short time because the CSI socket never gets fingerprinted.

This behaviour appears to be specific to the podman task driver, since when the same job is run under the docker or exec drivers, the /csi directory does exist.

(I think this used to work under previous Nomad versions, but I stopped using it a while ago due to the stale-allocations issue, and it may have been under the docker driver rather than the podman one.)

Reproduction steps

  • Run the job spec below
  • Watch the job's stderr logs report the missing directory
  • Watch the task events report the CSI plugin health-check timeout

Expected Result

For the provided job spec, ls should succeed.

For an actual csi controller job:

  • The csi.sock should be created by the controller process
  • Nomad should fingerprint the controller
  • The job should not be marked failed

Actual Result

Stderr from job:

2022-05-17T19:15:50.579617884+00:00 stderr F ls: /csi: No such file or directory

Job-level logs:

Time | Type | Description
-- | -- | --
May 17, '22 20:16:21 +0100 | Killing | Sent interrupt. Waiting 5s before force killing
May 17, '22 20:16:20 +0100 | Killing | CSI plugin did not become healthy before timeout
May 17, '22 20:16:20 +0100 | Plugin became unhealthy | Error: CSI plugin failed probe: timeout while connecting to gRPC socket: failed to stat socket: stat /opt/nomad/data/client/csi/plugins/159ad33f-2834-e84e-5137-5ad2c0a8eeed/csi.sock: no such file or directory
May 17, '22 20:16:20 +0100 | Alloc Unhealthy | Unhealthy because of failed task
May 17, '22 20:16:19 +0100 | Not Restarting | Exceeded allowed attempts 2 in interval 30m0s and mode is "fail"
May 17, '22 20:16:19 +0100 | Terminated | Exit Code: 1
May 17, '22 20:16:19 +0100 | Started | Task started by client
May 17, '22 20:15:56 +0100 | Restarting | Task restarting in 18.630202552s
May 17, '22 20:15:55 +0100 | Terminated | Exit Code: 1
May 17, '22 20:15:55 +0100 | Started | Task started by client

Job file (if appropriate)

job "csi-test" {
  region = "global"
  datacenters = ["dc1"]
  type = "service"

  group "controller" {
    count = 1

    task "controller" {
      driver = "podman"
      config {
        image = "busybox:latest"
        args = ["ls", "-ld", "/csi"]
      }

      csi_plugin {
        id = "test"
        type = "controller"
        mount_dir = "/csi"
      }

      resources {
        cpu = 30
        memory = 64
      }
    }
  }
}

Nomad Server logs (if appropriate)

N/A

Nomad Client logs (if appropriate)

N/A

tgross self-assigned this May 17, 2022
tgross (Member) commented May 18, 2022

Hi @optiz0r, I looked at this a bit and I think there's an issue with podman volume mounting in general, and we need bind-mounting to work in order to configure the mounts needed for CSI. It looks like you ran into this in an unrelated issue, #142.

The podman driver documents that support for the volume mounting capability is "none". This is a little odd, because the capabilities struct is reporting the default for MountConfigs, which is "all".

(Podman does document support for config.volumes, but those are owned entirely by the driver and the Nomad client isn't really involved, which is a bit confusing.)
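
For reference, here's a minimal sketch (mine, not the podman driver's actual code) of how a task driver advertises mount support through Nomad's plugins/drivers package. The key point is that MountConfigSupportAll is the zero value of MountConfigSupport, so a driver that never sets the field advertises "all" even if it doesn't actually wire the client's mounts into its containers:

package example

import "github.com/hashicorp/nomad/plugins/drivers"

// capabilities is an illustrative Capabilities() result, not the podman
// driver's real implementation.
func capabilities() *drivers.Capabilities {
	return &drivers.Capabilities{
		SendSignals: true,
		Exec:        false,
		// MountConfigSupportAll is the zero value, so leaving this field
		// unset implicitly reports full mount support; a driver that cannot
		// honor mounts should set drivers.MountConfigSupportNone instead.
		MountConfigs: drivers.MountConfigSupportAll,
	}
}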

I'm going to move this issue into the podman repo. I haven't spent much time at all in that codebase, so it'll take me a little bit to get up to speed to do more investigation.

tgross transferred this issue from hashicorp/nomad May 18, 2022
optiz0r (Contributor, Author) commented May 18, 2022

Thanks. It makes sense that proper mount support would be needed to make this work. Perhaps this should just be considered a duplicate of #142, being another use case for the same functionality. Happy for you to close this in favour of the other issue if you think that makes sense.

tgross (Member) commented May 18, 2022

Let's keep it open for now just in case there's a second layer to the problem once we start unpacking it 😀

tgross (Member) commented May 19, 2022

I've opened #169 which should fix this, but that doesn't also close out #142, which is a new set of configuration to add.
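
For anyone following along, the change is roughly of this shape: take the mounts the Nomad client passes to the driver in TaskConfig.Mounts (which include the CSI plugin directory) and translate them into bind mounts on the container. This is a hedged sketch of the idea rather than the actual diff in #169, and the OCI-style Mount struct stands in for whatever the podman API actually expects:

package example

import (
	"github.com/hashicorp/nomad/plugins/drivers"
	specs "github.com/opencontainers/runtime-spec/specs-go"
)

// bindMounts is an illustrative translation of the mounts Nomad hands the
// driver into OCI-style bind mounts; see #169 for the real fix.
func bindMounts(cfg *drivers.TaskConfig) []specs.Mount {
	mounts := make([]specs.Mount, 0, len(cfg.Mounts))
	for _, m := range cfg.Mounts {
		opts := []string{"rbind"}
		if m.Readonly {
			opts = append(opts, "ro")
		} else {
			opts = append(opts, "rw")
		}
		mounts = append(mounts, specs.Mount{
			Type:        "bind",
			Source:      m.HostPath, // e.g. <data_dir>/client/csi/plugins/<uuid>
			Destination: m.TaskPath, // e.g. /csi from the csi_plugin mount_dir
			Options:     opts,
		})
	}
	return mounts
}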

optiz0r (Contributor, Author) commented May 19, 2022

I've built and tested this, and it seems to be working for democratic-csi too. Thanks for the very quick turnaround!
