EFS not mounting on computing nodes #6094

Closed
maestro7879 opened this issue Feb 10, 2024 · 1 comment

@maestro7879

Required Info:

  • AWS ParallelCluster version: 3.8.0
  • Full cluster configuration without any credentials or personal data.
  • Cluster name: prod-cluster
  • Output of pcluster describe-cluster command.
  • [Optional] Arn of the cluster CloudFormation main stack:

Bug description and how to reproduce:
I'm receiving the error below when compute nodes launch. The head node is fine and mounts EFS.
This is a custom AMI with AWS ParallelCluster installed on top.

I'm able to mount EFS on the compute nodes manually, so I have ruled out connectivity.
I'm assuming the commands below are what runs during bootstrap. They work when run by hand, but the mount still fails during the compute node build.
sudo mount -t efs -o tls fs-03ef4f0037ca3b4da:/ /opt/parallelcluster/shared
sudo mount -t efs -o tls fs-08ecab8914d696b6f:/ /opt/parallelcluster/home
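A quick way to confirm which of these two mount points actually exist on a compute node after bootstrap is to read /proc/mounts. The sketch below is a minimal, hypothetical helper (`parse_mounts` and `missing_mounts` are illustrative names, not part of ParallelCluster). It assumes the standard Linux /proc/mounts layout, where the first three whitespace-separated fields are the source, the mount point, and the filesystem type.

```python
import os
from typing import Dict, List, NamedTuple


class MountEntry(NamedTuple):
    """Source device/export and filesystem type for one mounted path."""
    source: str
    fstype: str


def parse_mounts(text: str) -> Dict[str, MountEntry]:
    """Map each mount point in /proc/mounts-style text to its source and fstype.

    Field order is: <source> <mount point> <fstype> <options> <dump> <pass>.
    The kernel octal-escapes spaces inside paths, so splitting on whitespace
    is safe; escaped paths are kept as-is (not decoded) in this sketch.
    """
    mounts: Dict[str, MountEntry] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        source, target, fstype = fields[0], fields[1], fields[2]
        mounts[target] = MountEntry(source, fstype)
    return mounts


def missing_mounts(text: str, expected: List[str]) -> List[str]:
    """Return the expected mount points that do not appear in `text`, in order."""
    mounts = parse_mounts(text)
    return [path for path in expected if path not in mounts]


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as f:
        current = f.read()
    # The two paths from the manual mount commands above.
    expected = ["/opt/parallelcluster/shared", "/opt/parallelcluster/home"]
    print("missing mounts:", missing_mounts(current, expected))
```

Run on a compute node whose bootstrap failed, it should list whichever of the two paths never got mounted; `findmnt` or plain `mount` shows the same information interactively.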

If there is somewhere I can look to see what the recipe below is attempting, that would be helpful.
Recipe: aws-parallelcluster-environment::mount_internal_use_ebs

  • volume[mount /opt/parallelcluster/shared] action mount[2024-02-10T01:11:03+00:00] INFO: Processing volume[mount /opt/parallelcluster/shared] action mount (aws-parallelcluster-environment::mount_internal_use_ebs line 22)

    • directory[/opt/parallelcluster/shared] action create[2024-02-10T01:11:03+00:00] INFO: Processing directory[/opt/parallelcluster/shared] action create (aws-parallelcluster-environment::mount_internal_use_ebs line 42)
      [2024-02-10T01:11:03+00:00] INFO: directory[/opt/parallelcluster/shared] mode changed to 1777

      • change mode from '0755' to '01777'
    • mount[mount /opt/parallelcluster/shared] action mount[2024-02-10T01:11:03+00:00] INFO: Processing mount[mount /opt/parallelcluster/shared] action mount (aws-parallelcluster-environment::mount_internal_use_ebs line 51)
      [2024-02-10T01:14:06+00:00] INFO: Retrying execution of mount[mount /opt/parallelcluster/shared], 9 attempts left

@maestro7879
Author

The issue was iptables on the head node: port 2049 (NFS) wasn't open for some reason.
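Since the cause turned out to be port 2049 being closed on the head node, one way to test for this from a compute node, without going through the mount recipe at all, is a plain TCP connection attempt to that port. This is a minimal sketch: `tcp_port_open` is a hypothetical helper, and it only checks TCP reachability, not whether the NFS export itself is configured correctly.

```python
import socket
import sys


def tcp_port_open(host: str, port: int = 2049, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Any OSError (connection refused, timeout, DNS failure) returns False.
    A firewall that silently drops packets typically shows up as a timeout;
    one that rejects them (e.g. an iptables REJECT rule) usually shows up
    as a refused connection instead.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__" and len(sys.argv) > 1:
    head_node = sys.argv[1]  # private IP or hostname of the head node (supplied by the user)
    print("port 2049 reachable:", tcp_port_open(head_node))
```

For example, run it from a compute node with the head node's private IP as the only argument. If it reports the port as unreachable while a manual `mount -t efs` against the EFS file system still works, as in the report above, that points at the head node (its firewall or security group) rather than general VPC connectivity.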
