
Mowgli does not detect GPU #10

Open
mdmanurung opened this issue Feb 1, 2024 · 5 comments

@mdmanurung

Dear authors,

Thank you for writing the package.

Prior to running Mowgli, I ensured that CUDA and the GPU are detectable via torch.
[screenshot: torch detects the CUDA GPU]
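(For reference, a check along these lines confirms that torch can see the GPU; this is a minimal sketch rather than the exact cell in the screenshot.)

```python
import torch

# Confirm that PyTorch can see a CUDA device before training.
print(torch.cuda.is_available())      # expected: True
print(torch.cuda.get_device_name(0))  # name of the first visible GPU
```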

However, I got the following error message: ValueError: Expected a cuda device, but got: cpu

This is the full error message:

ValueError                                Traceback (most recent call last)
Cell In[83], line 1
----> 1 model.train(mdata2[1:1000,:])

File /exports/archive/hg-funcgenom-research/mdmanurung/conda/envs/totalvi/lib/python3.9/site-packages/mowgli/models.py:255, in MowgliModel.train(self, mdata, max_iter_inner, max_iter, device, dtype, lr, optim_name, tol_inner, tol_outer, normalize_rows)
    251 try:
    252     for _ in range(max_iter):
    253 
    254         # Perform the `W` optimization step.
--> 255         self.optimize(
    256             loss_fn=self.loss_fn_w,
    257             max_iter=max_iter_inner,
    258             tol=tol_inner,
    259             history=self.losses_h,
    260             pbar=pbar,
    261             device=device,
    262         )
    264         # Update the shared factor `W`.
    265         htgw = 0

File /exports/archive/hg-funcgenom-research/mdmanurung/conda/envs/totalvi/lib/python3.9/site-packages/mowgli/models.py:398, in MowgliModel.optimize(self, loss_fn, max_iter, history, tol, pbar, device)
    394 if i % 10 == 0:
    395 
    396     # Add a value to the loss history.
    397     history.append(loss_fn().cpu().detach())
--> 398     gpu_mem_alloc = torch.cuda.memory_allocated(device=device)
    400     # Populate the progress bar.
    401     pbar.set_postfix(
    402         {
    403             "loss": total_loss,
   (...)
    407         }
    408     )

File /exports/archive/hg-funcgenom-research/mdmanurung/conda/envs/totalvi/lib/python3.9/site-packages/torch/cuda/memory.py:326, in memory_allocated(device)
    311 def memory_allocated(device: Union[Device, int] = None) -> int:
    312     r"""Returns the current GPU memory occupied by tensors in bytes for a given
    313     device.
    314 
   (...)
    324         details about GPU memory management.
    325     """
--> 326     return memory_stats(device=device).get("allocated_bytes.all.current", 0)

File /exports/archive/hg-funcgenom-research/mdmanurung/conda/envs/totalvi/lib/python3.9/site-packages/torch/cuda/memory.py:205, in memory_stats(device)
    202     else:
    203         result.append((prefix, obj))
--> 205 stats = memory_stats_as_nested_dict(device=device)
    206 _recurse_add_to_result("", stats)
    207 result.sort()

File /exports/archive/hg-funcgenom-research/mdmanurung/conda/envs/totalvi/lib/python3.9/site-packages/torch/cuda/memory.py:216, in memory_stats_as_nested_dict(device)
    214 if not is_initialized():
    215     return {}
--> 216 device = _get_device_index(device, optional=True)
    217 return torch._C._cuda_memoryStats(device)

File /exports/archive/hg-funcgenom-research/mdmanurung/conda/envs/totalvi/lib/python3.9/site-packages/torch/cuda/_utils.py:30, in _get_device_index(device, optional, allow_cpu)
     28             raise ValueError('Expected a cuda or cpu device, but got: {}'.format(device))
     29     elif device.type != 'cuda':
---> 30         raise ValueError('Expected a cuda device, but got: {}'.format(device))
     31 if not torch.jit.is_scripting():
     32     if isinstance(device, torch.cuda.device):

Do you have any suggestion to solve this? Thanks in advance.

Regards,
Mikhael

@gjhuizing
Collaborator

Hi @mdmanurung, could you try model.train(..., device="cuda")? This should fix your issue!
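For example (a minimal sketch; mdata is your MuData object and the constructor arguments are only illustrative, see the Mowgli docs):

```python
import mowgli

# Build the model as usual; latent_dim here is only illustrative.
model = mowgli.models.MowgliModel(latent_dim=15)

# Pass the device explicitly so training (and the GPU memory reporting) runs on CUDA.
model.train(mdata, device="cuda")
```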

@mdmanurung
Author

Thanks! It does solve the problem.

@gjhuizing reopened this Feb 1, 2024
@gjhuizing
Collaborator

Awesome! Feel free to reach out here or by email if you have more questions or if you have feedback on the tool :)

Reopening the issue because the error is a bit cryptic; I should make the behavior clearer and add a warning if CPU is selected.

@mdmanurung
Author

Thanks! Out of curiosity, are you preparing a version that can also handle batch correction? It would be very much needed...

@gjhuizing
Collaborator

Not in the works for now, I'm afraid... But you can always apply a batch-correction method that works on counts before Mowgli, or a batch-correction method that works on embeddings (like Harmony) after Mowgli.
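For instance, a rough sketch of the second option with harmonypy, assuming the Mowgli embedding is stored in mdata.obsm["W_OT"] (as in the tutorials) and that mdata.obs["batch"] holds your batch labels:

```python
import harmonypy
import numpy as np
import pandas as pd

# Correct the Mowgli embedding for batch effects with Harmony.
embedding = mdata.obsm["W_OT"]  # cells x factors, written by Mowgli after training
meta = pd.DataFrame({"batch": mdata.obs["batch"].values})

harmony_out = harmonypy.run_harmony(embedding, meta, vars_use=["batch"])

# Z_corr is factors x cells, so transpose back to cells x factors.
mdata.obsm["W_OT_harmony"] = np.asarray(harmony_out.Z_corr.T)
```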
