Error with relion5 using 2D classification on aws g6 instances #1148

Open
Cookiemaster33 opened this issue Jun 20, 2024 · 1 comment

@Cookiemaster33

Hi there, I am running RELION 5 via SGE/qsub on AWS clusters.

So far everything ran fine on g5 instances, which use NVIDIA A10G Tensor Core GPUs. We have now switched to g6 instances, which use NVIDIA L4 Tensor Core GPUs. During 2D classification we get the error: "failed to create cufft plan".

Any idea what could be wrong?

Thanks and best

Toby

Environment:

  • OS: Ubuntu 18.04.5 LTS
  • MPI runtime: [e.g. OpenMPI 2.0.1]
  • RELION version: Relion 5.0
  • Memory: 192 GB
  • GPU: NVIDIA L4 Tensor Core GPU

Dataset:

  • Box size: 180 pix
  • Pixel size: 0.71 Å/px
  • Number of particles: 50,000

Job options:

  • Type of job: Class2D
  • Number of MPI processes: 1
  • Number of threads: 12
  • Full command: `which relion_refine` --o Class2D/job010/run --grad --class_inactivity_threshold 0.1 --grad_write_iter 10 --iter 100 --i Extract/job006/particles.star --dont_combine_weights_via_disc --pool 30 --pad 2 --ctf --tau2_fudge 2 --particle_diameter 198.0 --K 25 --flatten_solvent --zero_mask --center_classes --oversampling 1 --psi_step 12 --offset_range 5 --offset_step 2 --norm --scale --j 12 --gpu "0,1,2,3" --pipeline_control Class2D/job010/

Error message:

in: /relion/src/projector.cpp, line 362
ERROR:
failed to create cufft plan
=== Backtrace ===
/opt/relion/bin/relion_refine(_ZN11RelionErrorC1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES7_l+0x77) [0x56106c48bbd7]
/opt/relion/bin/relion_refine(_ZN9Projector26computeFourierTransformMapER13MultidimArrayIdES2_iibbiPKS1_b+0x36a3) [0x56106c52a8c3]
/opt/relion/bin/relion_refine(_ZN7MlModel23setFourierTransformMapsEbidPK13MultidimArrayIdE+0x901) [0x56106c69d271]
/opt/relion/bin/relion_refine(_ZN11MlOptimiser16expectationSetupEv+0x5a) [0x56106c4b16ea]
/opt/relion/bin/relion_refine(_ZN11MlOptimiser11expectationEv+0x34) [0x56106c4e1824]
/opt/relion/bin/relion_refine(_ZN11MlOptimiser7iterateEv+0x37a) [0x56106c4fd63a]
/opt/relion/bin/relion_refine(main+0x51) [0x56106c476c91]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7) [0x14c1b6623bf7]
/opt/relion/bin/relion_refine(_start+0x2a) [0x56106c47a5ea]

ERROR:
failed to create cufft plan

@biochem-fan
Member

Which version of CUDA did you use to compile RELION?
Is it compatible with "Ubuntu 18.04.5 LTS"? This is a very old OS and you shouldn't use it.

Did you specify CUDA_ARCH? (You shouldn't, if you want to share the binary with different GPUs).
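
One way to sanity-check the first question is to compare the CUDA toolkit version against the GPU's compute capability. The sketch below is only an illustration, not something taken from RELION: the function names are made up, and the small table is an assumption covering just the two GPUs mentioned in this thread (A10G at compute capability 8.6, first supported in CUDA 11.1; L4 at 8.9, first supported in CUDA 11.8). A toolkit that is too old is one possible cause of the cuFFT failure, not a confirmed diagnosis.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: does a CUDA toolkit version natively support a GPU's
# compute capability? The table only covers the GPUs discussed in this thread.

# Print the first CUDA toolkit release that supports the given compute capability.
min_cuda_for_cc() {
    case "$1" in
        8.6) echo "11.1" ;;   # Ampere, e.g. A10G (g5 instances)
        8.9) echo "11.8" ;;   # Ada Lovelace, e.g. L4 (g6 instances)
        *)   echo "unknown"; return 1 ;;
    esac
}

# Return 0 if version $1 >= version $2 (relies on GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Return 0 if toolkit version $1 supports compute capability $2,
# 1 if the toolkit is too old, 2 if the capability is not in the table.
cuda_supports_cc() {
    local min
    min="$(min_cuda_for_cc "$2")" || return 2
    version_ge "$1" "$min"
}

# Example on a GPU node (the compute_cap query field needs a fairly recent driver):
#   cc="$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n 1)"
#   cuda_supports_cc 11.0 "$cc" || echo "toolkit may be too old for this GPU"
```

The toolkit version to pass in is the one RELION was compiled with; `nvcc --version` on the build machine reports it.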
