pynvml.nvml.NVMLError: System is not in ready state #54

Open
ainhoaVivel opened this issue Jun 18, 2024 · 2 comments

ainhoaVivel commented Jun 18, 2024

  • Python version: 3.9.19
  • Operating System: AlmaLinux 9.3 (Shamrock Pampas Cat)
  • Pynvml version: 11.5.0

Description

I am using CodeCarbon to measure energy consumption. That library uses pynvml in the background to access GPU information. I asked in the CodeCarbon repository, and it seems that the problem is that pynvml is not working properly on my system.

What I Did

I created this script

import pynvml

try:
    pynvml.nvmlInit()
    print("NVML initialized successfully")

    handle = pynvml.nvmlDeviceGetHandleByIndex(0)
    print(f"Device 0: {pynvml.nvmlDeviceGetName(handle)}")

    total_energy = pynvml.nvmlDeviceGetTotalEnergyConsumption(handle)
    print(f"Total energy consumption: {total_energy} mJ")

except pynvml.NVMLError as error:
    print(f"Failed to initialize NVML: {error}")

finally:
    pynvml.nvmlShutdown()

However, I got this output

NVML initialized successfully
Device 0: NVIDIA H100 PCIe
Failed to initialize NVML: System is not in ready state

I have tried several versions of pynvml, but none of them helped. I can't find any additional information about the "System is not in ready state" error either. How can I fix this error?

@Lucas-Otavio

I am also using CodeCarbon and I'm facing some similar issues, but the execution environment and error message are different.

I am trying to dockerize a project that uses CodeCarbon, and it does not work inside the Docker container, even though nvidia-smi produces its usual output.

Environment:

  • Container's Base: nvidia/cuda:12.5.0-devel-ubuntu22.04
  • Python Version: 3.10.12
  • pynvml version: 11.5.0
  • Operating System: Ubuntu 22.04

Output:
When running the same script, the error was different.

NVML initialized successfully
Device 0: NVIDIA GeForce GTX 980M
Failed to initialize NVML: Not Supported

@rjzamora
Collaborator

I have tried several versions of pynvml, but none of them helped. I can't find any additional information about the "System is not in ready state" error either. How can I fix this error?

Sorry for the delayed response @ainhoaVivel! (this project is not really active)

“System is not in ready state” means that the GPU could not be properly initialized, but it's hard to speculate on the cause.

Are you able to run nvidia-smi in a terminal? (sorry if you are no longer working on this)
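As a complement to checking nvidia-smi, a rough sketch like the one below could help narrow down which NVML query actually reports the error. In the script above, both outputs show that nvmlInit() and nvmlDeviceGetName() succeeded, so the "Failed to initialize NVML" message is really coming from the energy query. The helper name probe_nvml and its return format are hypothetical, made up for illustration; the function takes the pynvml module as an argument and only uses calls that pynvml exposes.

```python
def probe_nvml(nvml, index=0):
    """Run a few NVML queries independently and record which ones fail.

    `nvml` is expected to be the imported pynvml module (hypothetical helper,
    not part of pynvml itself).
    """
    results = {}
    # If initialization itself fails, let the exception propagate:
    # there is nothing to shut down in that case.
    nvml.nvmlInit()
    try:
        handle = nvml.nvmlDeviceGetHandleByIndex(index)
        queries = {
            "driver_version": lambda: nvml.nvmlSystemGetDriverVersion(),
            "name": lambda: nvml.nvmlDeviceGetName(handle),
            "power_usage_mw": lambda: nvml.nvmlDeviceGetPowerUsage(handle),
            "total_energy_mj": lambda: nvml.nvmlDeviceGetTotalEnergyConsumption(handle),
        }
        for label, query in queries.items():
            try:
                results[label] = ("ok", query())
            except nvml.NVMLError as error:
                # Keep the NVML message so each failure can be read separately.
                results[label] = ("error", str(error))
    finally:
        nvml.nvmlShutdown()
    return results
```

Assuming pynvml is installed, it could be called as `import pynvml; print(probe_nvml(pynvml))`. If only total_energy_mj fails while the other entries succeed, the problem is specific to that query rather than to NVML initialization.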

When running the same script, the error was different.

@Lucas-Otavio - It looks like Maxwell (the architecture of the GTX 980M) is not supported by nvmlDeviceGetTotalEnergyConsumption. The NVML documentation says: "For Volta or newer fully supported devices".
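For code that has to run on both older and newer GPUs, one possible pattern is to treat "Not Supported" as an expected outcome and re-raise anything else. This is only a sketch: the helper name total_energy_or_none is hypothetical, it takes the pynvml module as an argument, and it assumes that the raised NVMLError carries the NVML return code in its value attribute (read defensively with getattr here).

```python
def total_energy_or_none(nvml, handle):
    """Return total energy in mJ, or None if the device does not support the query.

    `nvml` is expected to be the imported pynvml module (hypothetical helper,
    not part of pynvml itself).
    """
    try:
        return nvml.nvmlDeviceGetTotalEnergyConsumption(handle)
    except nvml.NVMLError as error:
        # Assumption: the error exposes the NVML return code as `.value`.
        if getattr(error, "value", None) == nvml.NVML_ERROR_NOT_SUPPORTED:
            return None
        # Any other failure (e.g. "System is not in ready state") is re-raised.
        raise
```

The caller still owns nvmlInit() and nvmlShutdown(); the helper only wraps the single query so that an unsupported device does not look like an initialization failure.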
