
RuntimeError: CUDA error: no kernel image is available for execution on the device #20

Open
xibian1120 opened this issue Dec 13, 2023 · 0 comments


When running the code, the following error appears:

Traceback (most recent call last):
  File "./train.py", line 95, in <module>
  File "./train.py", line 78, in main
    zero_shot_evaluation(model, val_loaders, opts)
  File "/media/yxl/a/2025191008/VALOR-master/train_utils.py", line 247, in zero_shot_evaluation
    eval_log = validate(model, test_loader, opts, global_step=0, total_step=opts.num_train_steps)
  File "/media/yxl/a/2025191008/VALOR-master/test.py", line 26, in validate
    val_log = validate_single(model, loader, task.split('--')[0], opts, global_step, total_step,task.split('--')[1])
  File "/home/yxl/anaconda3/envs/valor_env/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/media/yxl/a/2025191008/VALOR-master/test.py", line 40, in validate_single
    return validate_cap(model, val_loader, task, opts, global_step, dset_name)
  File "/home/yxl/anaconda3/envs/valor_env/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/media/yxl/a/2025191008/VALOR-master/test.py", line 161, in validate_cap
    evaluation_dict = model(batch, task_str, compute_loss=False)
  File "/home/yxl/anaconda3/envs/valor_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/yxl/anaconda3/envs/valor_env/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 799, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/home/yxl/anaconda3/envs/valor_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/yxl/anaconda3/envs/valor_env/lib/python3.8/site-packages/apex/amp/_initialize.py", line 196, in new_fwd
    output = old_fwd(*applier(args, input_caster),
  File "/media/yxl/a/2025191008/VALOR-master/model/pretrain.py", line 135, in forward
    return self.forward_cap(batch, task, compute_loss=compute_loss)
  File "/media/yxl/a/2025191008/VALOR-master/model/pretrain.py", line 726, in forward_cap
    return self.generate_cap(batch, task)
  File "/media/yxl/a/2025191008/VALOR-master/model/pretrain.py", line 930, in generate_cap
    video_input = self.get_multimodal_forward_input_video(video_output) 
  File "/media/yxl/a/2025191008/VALOR-master/model/modeling.py", line 490, in get_multimodal_forward_input_video
    video_output =  video_output + self.video_frame_embedding[:,:video_output.shape[1],:].unsqueeze(-2)
RuntimeError: CUDA error: no kernel image is available for execution on the device
  0%|          | 0/1495 [00:00<?, ?it/s]
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 15399) of binary: /home/yxl/anaconda3/envs/valor_env/bin/python
ERROR:torch.distributed.elastic.agent.server.local_elastic_agent:[default] Worker group failed
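Because CUDA kernel launches are asynchronous, an error can surface at a later call than the one that caused it, so the line shown in the traceback (modeling.py, line 490) is not guaranteed to be the kernel that failed. A minimal sketch for narrowing this down, assuming a standalone script run in the same `valor_env` environment: set `CUDA_LAUNCH_BLOCKING=1` so launches become synchronous, then run a bare PyTorch op with the same indexing and `unsqueeze(-2)` pattern as the failing line. The function name and tensor shapes are hypothetical; only the expression is copied from the traceback. If this op succeeds while VALOR still fails, that would suggest the missing kernel is not in PyTorch's own elementwise add.

```python
import os

# Must be set before CUDA is initialized, so set it before importing torch.
# It makes kernel launches synchronous, so errors point at the call that failed.
os.environ.setdefault("CUDA_LAUNCH_BLOCKING", "1")


def smoke_test_broadcast_add():
    """Run one broadcast add on the GPU, mirroring the failing line in modeling.py.

    The shapes are hypothetical; only the indexing/unsqueeze pattern is copied.
    """
    try:
        import torch
    except ImportError:
        return "torch not installed"
    if not torch.cuda.is_available():
        return "CUDA not available"
    video_output = torch.randn(2, 4, 3, 8, device="cuda")  # (batch, frames, tokens, dim)
    video_frame_embedding = torch.randn(1, 8, 8, device="cuda")  # (1, max_frames, dim)
    try:
        out = video_output + video_frame_embedding[:, :video_output.shape[1], :].unsqueeze(-2)
        torch.cuda.synchronize()  # surface any asynchronous launch error here
    except RuntimeError as exc:
        return f"failed: {exc}"
    return f"ok: {tuple(out.shape)}"


if __name__ == "__main__":
    print(smoke_test_broadcast_add())
```

The same environment variable can instead be set on the command line in front of the original launch command, which keeps VALOR's own code path intact.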

I've tried every fix I could find online, but none of them solved the problem.
My environment:
sys.platform linux
Python 3.8.18 (default, Sep 11 2023, 13:40:15) [GCC 11.2.0]
numpy 1.24.4
detectron2 failed to import
detectron2._C not built correctly: No module named 'detectron2'
Compiler ($CXX) c++ (GCC) 7.3.0
CUDA compiler Build cuda_11.1.TC455_06.29190527_0
DETECTRON2_ENV_MODULE
PyTorch 1.9.0+cu111 @/home/yxl/anaconda3/envs/valor_env/lib/python3.8/site-packages/torch
PyTorch debug build False
GPU available True
GPU 0 GeForce RTX 2080 Ti (arch=7.5)
CUDA_HOME /usr/local/cuda
TORCH_CUDA_ARCH_LIST 7.5
Pillow 10.1.0
torchvision 0.10.0+cu111 @/home/yxl/anaconda3/envs/valor_env/lib/python3.8/site-packages/torchvision
torchvision arch flags 3.5, 5.0, 6.0, 7.0, 7.5, 8.0, 8.6
cv2 Not found
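The environment report above shows the GPU as arch 7.5, torchvision compiled for 7.5 among other architectures, `TORCH_CUDA_ARCH_LIST` set to 7.5, and nvcc 11.1 alongside a `+cu111` PyTorch wheel. A minimal sketch that checks those facts mechanically from the reported strings follows; the helper names are hypothetical, and the parsers only handle the formats shown in this report. Note that `TORCH_CUDA_ARCH_LIST` only affects extensions compiled from source against this PyTorch (for example via `torch.utils.cpp_extension`); it does not change the prebuilt PyTorch or torchvision binaries.

```python
import re


def parse_arch_list(text):
    """Parse '3.5, 5.0, 7.5' or a TORCH_CUDA_ARCH_LIST value like '7.5;8.6+PTX'
    into a set of (major, minor) tuples."""
    archs = set()
    for token in re.split(r"[\s,;]+", text.strip()):
        if not token:
            continue
        major, minor = token.replace("+PTX", "").split(".")
        archs.add((int(major), int(minor)))
    return archs


def nvcc_cuda_version(build_line):
    """Extract '11.1' from an nvcc build string like 'Build cuda_11.1.TC455_...'."""
    match = re.search(r"cuda_(\d+)\.(\d+)", build_line)
    return f"{match.group(1)}.{match.group(2)}" if match else None


def torch_cuda_version(torch_version):
    """Extract '11.1' from a wheel tag like '1.9.0+cu111' (last digit is the minor)."""
    match = re.search(r"\+cu(\d+)(\d)$", torch_version)
    return f"{match.group(1)}.{match.group(2)}" if match else None


if __name__ == "__main__":
    gpu_arch = (7, 5)  # "GeForce RTX 2080 Ti (arch=7.5)"
    print(gpu_arch in parse_arch_list("3.5, 5.0, 6.0, 7.0, 7.5, 8.0, 8.6"))  # True
    print(gpu_arch in parse_arch_list("7.5"))  # True
    print(nvcc_cuda_version("Build cuda_11.1.TC455_06.29190527_0")
          == torch_cuda_version("1.9.0+cu111"))  # True
```

All three checks pass for the values in this report, so none of these particular settings is obviously mismatched.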


PyTorch built with:

  • GCC 7.3
  • C++ Version: 201402
  • Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v2.1.2 (Git Hash 98be7e8afa711dc9b66c8ff3504129cb82013cdb)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • CPU capability usage: AVX2
  • CUDA Runtime 11.1
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
  • CuDNN 8.0.5
  • Magma 2.5.2
  • Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.9.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,
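The NVCC architecture flags listed above determine which GPU architectures the prebuilt PyTorch 1.9.0+cu111 binary carries native code for. A minimal sketch that decodes that flag string and applies the CUDA binary-compatibility rules follows: a cubin built for sm_XY runs only on devices with the same major version X and a minor version of at least Y, while embedded PTX (compute_XY) can be JIT-compiled by the driver for any device of capability X.Y or newer. The helper names are hypothetical, and the check covers only this flag string, not separately compiled extensions loaded into the same process.

```python
NVCC_FLAGS = ("-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;"
              "-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;"
              "-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;"
              "-gencode;arch=compute_86,code=sm_86")


def parse_gencode(flags):
    """Split an NVCC '-gencode;arch=...,code=...' string into the SASS (sm_XY)
    and PTX (compute_XY) targets it embeds, as sets of (major, minor) tuples."""
    sass, ptx = set(), set()
    for part in flags.split(";"):
        if not part.startswith("arch="):
            continue  # skip the "-gencode" separators
        # "code=sm_75" or "code=[sm_75,compute_75]"
        code = part.split("code=", 1)[1].strip("[]")
        for target in code.split(","):
            kind, digits = target.split("_")
            arch = (int(digits[:-1]), int(digits[-1]))
            (sass if kind == "sm" else ptx).add(arch)
    return sass, ptx


def can_run(device, sass, ptx):
    """Apply the CUDA binary-compatibility rules to a (major, minor) device."""
    major, minor = device
    # A cubin for sm_XY runs on devices X.z with z >= Y (same major version only).
    if any(m == major and n <= minor for m, n in sass):
        return True
    # PTX for compute_XY can be JIT-compiled for any device >= X.Y.
    return any(arch <= device for arch in ptx)


if __name__ == "__main__":
    sass, ptx = parse_gencode(NVCC_FLAGS)
    print(sorted(sass))
    print(can_run((7, 5), sass, ptx))  # GeForce RTX 2080 Ti (arch=7.5): True
```

Here sm_75 appears directly in the flags, matching GPU 0 (arch=7.5), so by these rules the core PyTorch kernels should have a loadable image for this card.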