BRITS imputation test fails on cuda device mismatch #10

Closed
MaciejSkrabski opened this issue Aug 2, 2022 · 4 comments

Comments

@MaciejSkrabski
Contributor

MaciejSkrabski commented Aug 2, 2022

Hi,
the BRITS imputation test fails when I run the imputation tests at commit 6dcc894 on the dev branch.

py3.9_cuda11.3_cudnn8.2.0_0

$ python -m pytest tests/test_imputation.py

./tests/test_imputation.py::TestBRITS::test_parameters Failed with Error: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
  File ".../unittest/case.py", line 59, in testPartExecutor
    yield
  File ".../unittest/case.py", line 588, in run
    self._callSetUp()
  File ".../unittest/case.py", line 547, in _callSetUp
    self.setUp()
  File ".../PyPOTS/pypots/tests/test_imputation.py", line 98, in setUp
    self.brits.fit(self.train_X, self.val_X)
  File "/PyPOTS/pypots/imputation/brits.py", line 504, in fit
    self._train_model(training_loader, val_loader, val_X_intact, val_X_indicating_mask)
  File "/PyPOTS/pypots/imputation/base.py", line 154, in _train_model
    if np.equal(self.best_loss, float("inf")):
  File ".../lib/python3.9/site-packages/torch/_tensor.py", line 732, in __array__
    return self.numpy()
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
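
The failing frame is `np.equal(self.best_loss, float("inf"))`: NumPy asks the tensor for an array via `__array__`, and that conversion is not possible while the tensor lives on `cuda:0`. A minimal sketch of one way to make such a check device-independent, assuming `best_loss` is either a Python float or a 0-dim tensor. The helper name `is_unset` is hypothetical and not part of PyPOTS.

```python
import math

import torch


def is_unset(best_loss):
    """Return True if best_loss still holds the initial +inf sentinel.

    Hypothetical helper for illustration; not part of PyPOTS.
    """
    # .item() copies a single-element tensor's value to a Python float,
    # regardless of which device holds the tensor, so NumPy is never involved.
    if isinstance(best_loss, torch.Tensor):
        best_loss = best_loss.item()
    return math.isinf(best_loss) and best_loss > 0


# Use the GPU when one is present, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

print(is_unset(float("inf")))                       # True
print(is_unset(torch.tensor(0.25, device=device)))  # False
```

Storing the loss as a plain float (for example `loss.item()`) when it is recorded would avoid the conversion at comparison time as well.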
@MaciejSkrabski
Contributor Author

Similar issue with GRUD:

ERROR: test_classify (tests.test_classification.TestGRUD)
----------------------------------------------------------------------
Traceback (most recent call last):
  File ".../PyPOTS/pypots/tests/test_classification.py", line 64, in setUp
    self.grud.fit(self.train_X, self.train_y, self.val_X, self.val_y)
  File ".../PyPOTS/pypots/classification/grud.py", line 151, in fit
    training_set = DatasetForGRUD(train_X, train_y)
  File ".../PyPOTS/pypots/data/dataset_for_grud.py", line 35, in __init__
    self.X_filledLOCF = self.locf.locf_torch(X)
  File ".../PyPOTS/pypots/imputation/locf.py", line 89, in locf_torch
    idx = torch.where(~mask, torch.arange(n_features, device=mask.device), 0)
RuntimeError: Expected condition, x and y to be on the same device, but condition is on cuda:0 and x and y are on cpu and cpu respectively
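
Here `torch.where` receives a condition on `cuda:0` while the other operands end up on the CPU. A minimal sketch of a device-consistent last-observation-carried-forward fill, assuming a simplified 2-D input of shape `[n_series, n_steps]`. The function `locf_2d` is hypothetical and is not the `locf_torch` method from the traceback; it only illustrates building every `torch.where` operand on the input's device.

```python
import torch


def locf_2d(X):
    """Forward-fill NaNs along dim 1 of a 2-D tensor (illustrative sketch)."""
    mask = torch.isnan(X)
    # Create both value operands explicitly on the same device as the condition.
    steps = torch.arange(X.shape[1], device=X.device)
    zero = torch.zeros((), dtype=steps.dtype, device=X.device)
    # Keep the step index where a value is observed, 0 where it is missing.
    idx = torch.where(~mask, steps, zero)
    # The running maximum gives the index of the most recent observed step.
    idx = torch.cummax(idx, dim=1).values
    # Leading NaNs stay NaN because they gather from index 0.
    return torch.gather(X, 1, idx)


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
nan = float("nan")
X = torch.tensor([[1.0, nan, nan, 4.0], [nan, 2.0, nan, 3.0]], device=device)
print(locf_2d(X).cpu())
```

For the input above, the first row becomes `1, 1, 1, 4` and the second row keeps its leading NaN followed by `2, 2, 3`.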

@MaciejSkrabski
Contributor Author

Similar issue with CRLI:

./tests/test_clustering.py::TestCRLI::test_parameters Failed with Error: Training got interrupted. Model was not get trained. Please try fit() again.
  File ".../PyPOTS/pypots/clustering/crli.py", line 350, in _train_model
    results = self.model.forward(inputs, training_object='discriminator')
  File ".../PyPOTS/pypots/clustering/crli.py", line 237, in forward
    inputs = self.cluster(inputs, training_object)
  File ".../PyPOTS/pypots/clustering/crli.py", line 215, in cluster
    imputation, imputed_X, generator_fb_hidden_states = self.generator(inputs)
  File ".../python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File ".../PyPOTS/pypots/clustering/crli.py", line 102, in forward
    f_outputs, f_final_hidden_state = self.f_rnn(inputs)
  File ".../python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File ".../PyPOTS/pypots/clustering/crli.py", line 78, in forward
    estimation = self.output_layer(hidden_state)
  File ".../python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File ".../python3.9/site-packages/torch/nn/modules/linear.py", line 103, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat1 in method wrapper_addmm)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File ".../python3.9/unittest/case.py", line 59, in testPartExecutor
    yield
  File ".../python3.9/unittest/case.py", line 588, in run
    self._callSetUp()
  File ".../python3.9/unittest/case.py", line 547, in _callSetUp
    self.setUp()
  File ".../PyPOTS/pypots/tests/test_clustering.py", line 25, in setUp
    self.crli.fit(self.train_X)
  File ".../PyPOTS/pypots/clustering/crli.py", line 298, in fit
    self._train_model(training_loader)
  File ".../PyPOTS/pypots/clustering/crli.py", line 383, in _train_model
    raise RuntimeError('Training got interrupted. Model was not get trained. Please try fit() again.')
RuntimeError: Training got interrupted. Model was not get trained. Please try fit() again.
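
In this trace the mismatch is reported for `mat1`, which is the input of the linear layer, i.e. `hidden_state` in `self.output_layer(hidden_state)`. The traceback does not show where that tensor was created, so the following is only a sketch of one common cause and remedy: a state tensor allocated inside `forward` without a `device` argument. `TinyCell` is a hypothetical module and not the CRLI code.

```python
import torch
from torch import nn


class TinyCell(nn.Module):
    """Hypothetical recurrent block for illustration; not the CRLI implementation."""

    def __init__(self, n_features, hidden_size):
        super().__init__()
        self.hidden_size = hidden_size
        self.rnn_cell = nn.GRUCell(n_features, hidden_size)
        self.output_layer = nn.Linear(hidden_size, n_features)

    def forward(self, X):
        # X has shape [batch, n_steps, n_features].
        # Allocate the initial state on the input's device and dtype. A bare
        # torch.zeros(...) would land on the default device, normally the CPU.
        hidden_state = torch.zeros(
            X.size(0), self.hidden_size, device=X.device, dtype=X.dtype
        )
        estimations = []
        for t in range(X.size(1)):
            # Estimate step t from the state, then update the state with step t.
            estimations.append(self.output_layer(hidden_state))
            hidden_state = self.rnn_cell(X[:, t], hidden_state)
        return torch.stack(estimations, dim=1), hidden_state


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = TinyCell(n_features=4, hidden_size=8).to(device)
X = torch.randn(3, 5, 4, device=device)
estimation, final_state = model(X)
print(estimation.shape, final_state.shape)
```

The other common cause is a batch that was never moved to the model's device; in that case calling `.to(device)` on each input tensor before the forward pass addresses it.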

@MaciejSkrabski
Contributor Author

And with VaDER:

./tests/test_clustering.py::TestVaDER::test_parameters Failed with Error: Expected condition, x and y to be on the same device, but condition is on cuda:0 and x and y are on cpu and cpu respectively
  File ".../python3.9/unittest/case.py", line 59, in testPartExecutor
    yield
  File ".../python3.9/unittest/case.py", line 588, in run
    self._callSetUp()
  File ".../python3.9/unittest/case.py", line 547, in _callSetUp
    self.setUp()
  File ".../PyPOTS/pypots/tests/test_clustering.py", line 56, in setUp
    self.vader.fit(self.train_X)
  File ".../PyPOTS/pypots/clustering/vader.py", line 323, in fit
    training_set = DatasetForGRUD(train_X)
  File ".../PyPOTS/pypots/data/dataset_for_grud.py", line 35, in __init__
    self.X_filledLOCF = self.locf.locf_torch(X)
  File ".../PyPOTS/pypots/imputation/locf.py", line 89, in locf_torch
    idx = torch.where(~mask, torch.arange(n_features), 0)
RuntimeError: Expected condition, x and y to be on the same device, but condition is on cuda:0 and x and y are on cpu and cpu respectively

@WenjieDu
Owner

Solved by MaciejSkrabski in PR #11, which has been merged, so I am closing this issue.

2 participants