
GPU enabled model raises Exception: expected self and mask to be on the same device, but got mask on cpu and self on cuda:0 #3

Closed
MaciejSkrabski opened this issue Jun 28, 2022 · 4 comments

@MaciejSkrabski (Contributor)

Hello,
great library, but using a GPU-enabled machine results in errors.

pypots version = 0.0.6 (the one available on PyPI)

Code to replicate the problem:

import unittest
from pypots.tests.test_imputation import TestBRITS, TestLOCF, TestSAITS, TestTransformer
from pypots import __version__


if __name__ == "__main__":
    print(__version__)
    unittest.main()

Results:

0.0.6
Running test cases for BRITS...
Model initialized successfully. Number of the trainable parameters: 580976
ERunning test cases for BRITS...
Model initialized successfully. Number of the trainable parameters: 580976
ERunning test cases for LOCF...
LOCF test_MAE: 0.1712224306027283
.Running test cases for LOCF...
.Running test cases for SAITS...
Model initialized successfully. Number of the trainable parameters: 1332704
Exception: expected self and mask to be on the same device, but got mask on cpu and self on cuda:0
ERunning test cases for SAITS...
Model initialized successfully. Number of the trainable parameters: 1332704
Exception: expected self and mask to be on the same device, but got mask on cpu and self on cuda:0
ERunning test cases for Transformer...
Model initialized successfully. Number of the trainable parameters: 666122
epoch 0: training loss 0.7681, validating loss 0.2941
epoch 1: training loss 0.4731, validating loss 0.2395
epoch 2: training loss 0.4235, validating loss 0.2069
epoch 3: training loss 0.3781, validating loss 0.1914
epoch 4: training loss 0.3530, validating loss 0.1837
ERunning test cases for Transformer...
Model initialized successfully. Number of the trainable parameters: 666122
epoch 0: training loss 0.7826, validating loss 0.2820
epoch 1: training loss 0.4687, validating loss 0.2352
epoch 2: training loss 0.4188, validating loss 0.2132
epoch 3: training loss 0.3857, validating loss 0.1977
epoch 4: training loss 0.3604, validating loss 0.1945
E
======================================================================
ERROR: test_impute (pypots.tests.test_imputation.TestBRITS)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "mydirs(...)/python3.9/site-packages/pypots/tests/test_imputation.py", line 99, in setUp
    self.brits.fit(self.train_X, self.val_X)
  File "mydirs(...)/python3.9/site-packages/pypots/imputation/brits.py", line 494, in fit
    training_set = DatasetForBRITS(train_X)  # time_gaps is necessary for BRITS
  File "mydirs(...)/python3.9/site-packages/pypots/data/dataset_for_brits.py", line 62, in __init__
    forward_delta = parse_delta(forward_missing_mask)
  File "mydirs(...)/python3.9/site-packages/pypots/data/dataset_for_brits.py", line 36, in parse_delta
    delta.append(torch.ones(1, n_features) + (1 - m_mask[step]) * delta[-1])
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

======================================================================
ERROR: test_parameters (pypots.tests.test_imputation.TestBRITS)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "mydirs(...)/python3.9/site-packages/pypots/tests/test_imputation.py", line 99, in setUp
    self.brits.fit(self.train_X, self.val_X)
  File "mydirs(...)/python3.9/site-packages/pypots/imputation/brits.py", line 494, in fit
    training_set = DatasetForBRITS(train_X)  # time_gaps is necessary for BRITS
  File "mydirs(...)/python3.9/site-packages/pypots/data/dataset_for_brits.py", line 62, in __init__
    forward_delta = parse_delta(forward_missing_mask)
  File "mydirs(...)/python3.9/site-packages/pypots/data/dataset_for_brits.py", line 36, in parse_delta
    delta.append(torch.ones(1, n_features) + (1 - m_mask[step]) * delta[-1])
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

======================================================================
ERROR: test_impute (pypots.tests.test_imputation.TestSAITS)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "mydirs(...)/python3.9/site-packages/pypots/imputation/base.py", line 83, in _train_model
    results = self.model.forward(inputs)
  File "mydirs(...)/python3.9/site-packages/pypots/imputation/saits.py", line 95, in forward
    imputed_data, [X_tilde_1, X_tilde_2, X_tilde_3] = self.impute(inputs)
  File "mydirs(...)/python3.9/site-packages/pypots/imputation/saits.py", line 62, in impute
    enc_output, _ = encoder_layer(enc_output)
  File "mydirs(...)/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "mydirs(...)/python3.9/site-packages/pypots/imputation/transformer.py", line 122, in forward
    enc_output, attn_weights = self.slf_attn(enc_input, enc_input, enc_input, attn_mask=mask_time)
  File "mydirs(...)/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "mydirs(...)/python3.9/site-packages/pypots/imputation/transformer.py", line 72, in forward
    v, attn_weights = self.attention(q, k, v, attn_mask)
  File "mydirs(...)/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "mydirs(...)/python3.9/site-packages/pypots/imputation/transformer.py", line 32, in forward
    attn = attn.masked_fill(attn_mask == 1, -1e9)
RuntimeError: expected self and mask to be on the same device, but got mask on cpu and self on cuda:0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "mydirs(...)/python3.9/site-packages/pypots/tests/test_imputation.py", line 35, in setUp
    self.saits.fit(self.train_X, self.val_X)
  File "mydirs(...)/python3.9/site-packages/pypots/imputation/saits.py", line 171, in fit
    self._train_model(training_loader, val_loader, val_X_intact, val_X_indicating_mask)
  File "mydirs(...)/python3.9/site-packages/pypots/imputation/base.py", line 123, in _train_model
    raise RuntimeError('Training got interrupted. Model was not get trained. Please try fit() again.')
RuntimeError: Training got interrupted. Model was not get trained. Please try fit() again.

======================================================================
ERROR: test_parameters (pypots.tests.test_imputation.TestSAITS)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "mydirs(...)/python3.9/site-packages/pypots/imputation/base.py", line 83, in _train_model
    results = self.model.forward(inputs)
  File "mydirs(...)/python3.9/site-packages/pypots/imputation/saits.py", line 95, in forward
    imputed_data, [X_tilde_1, X_tilde_2, X_tilde_3] = self.impute(inputs)
  File "mydirs(...)/python3.9/site-packages/pypots/imputation/saits.py", line 62, in impute
    enc_output, _ = encoder_layer(enc_output)
  File "mydirs(...)/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "mydirs(...)/python3.9/site-packages/pypots/imputation/transformer.py", line 122, in forward
    enc_output, attn_weights = self.slf_attn(enc_input, enc_input, enc_input, attn_mask=mask_time)
  File "mydirs(...)/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "mydirs(...)/python3.9/site-packages/pypots/imputation/transformer.py", line 72, in forward
    v, attn_weights = self.attention(q, k, v, attn_mask)
  File "mydirs(...)/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "mydirs(...)/python3.9/site-packages/pypots/imputation/transformer.py", line 32, in forward
    attn = attn.masked_fill(attn_mask == 1, -1e9)
RuntimeError: expected self and mask to be on the same device, but got mask on cpu and self on cuda:0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "mydirs(...)/python3.9/site-packages/pypots/tests/test_imputation.py", line 35, in setUp
    self.saits.fit(self.train_X, self.val_X)
  File "mydirs(...)/python3.9/site-packages/pypots/imputation/saits.py", line 171, in fit
    self._train_model(training_loader, val_loader, val_X_intact, val_X_indicating_mask)
  File "mydirs(...)/python3.9/site-packages/pypots/imputation/base.py", line 123, in _train_model
    raise RuntimeError('Training got interrupted. Model was not get trained. Please try fit() again.')
RuntimeError: Training got interrupted. Model was not get trained. Please try fit() again.

======================================================================
ERROR: test_impute (pypots.tests.test_imputation.TestTransformer)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "mydirs(...)/python3.9/site-packages/pypots/tests/test_imputation.py", line 68, in setUp
    self.transformer.fit(self.train_X, self.val_X)
  File "mydirs(...)/python3.9/site-packages/pypots/imputation/transformer.py", line 257, in fit
    self._train_model(training_loader, val_loader, val_X_intact, val_X_indicating_mask)
  File "mydirs(...)/python3.9/site-packages/pypots/imputation/base.py", line 129, in _train_model
    if np.equal(self.best_loss, float('inf')):
  File "mydirs(...)/python3.9/site-packages/torch/_tensor.py", line 732, in __array__
    return self.numpy()
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

======================================================================
ERROR: test_parameters (pypots.tests.test_imputation.TestTransformer)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "mydirs(...)/python3.9/site-packages/pypots/tests/test_imputation.py", line 68, in setUp
    self.transformer.fit(self.train_X, self.val_X)
  File "mydirs(...)/python3.9/site-packages/pypots/imputation/transformer.py", line 257, in fit
    self._train_model(training_loader, val_loader, val_X_intact, val_X_indicating_mask)
  File "mydirs(...)/python3.9/site-packages/pypots/imputation/base.py", line 129, in _train_model
    if np.equal(self.best_loss, float('inf')):
  File "mydirs(...)/python3.9/site-packages/torch/_tensor.py", line 732, in __array__
    return self.numpy()
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

----------------------------------------------------------------------
Ran 8 tests in 20.239s

FAILED (errors=6)

I suspect that you call .to(device) too early on the data. You might also need to set the device argument when creating new tensors (e.g. in torch.ones in parse_delta).
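
For illustration, here is a minimal sketch of what that fix could look like in parse_delta, assuming m_mask is a (n_steps, n_features) tensor that may already live on the GPU. Names follow the traceback above, but this is not PyPOTS's actual implementation:

import torch

def parse_delta(m_mask):
    # Sketch only: build BRITS-style time-gap (delta) tensors while creating
    # every new tensor on the same device as the mask, instead of letting
    # a bare torch.ones(...) default to CPU.
    n_steps, n_features = m_mask.shape
    device = m_mask.device
    delta = [torch.zeros(1, n_features, device=device)]
    for step in range(1, n_steps):
        delta.append(
            torch.ones(1, n_features, device=device)
            + (1 - m_mask[step]) * delta[-1]
        )
    return torch.cat(delta, dim=0)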

Best regards!

@WenjieDu (Owner)

Hi there,

Thank you so much for your attention to PyPOTS! If you find PyPOTS helpful in your work, please star⭐️ this repository. Your star is your recognition, which can help more people notice PyPOTS and grow the PyPOTS community. It matters and is definitely a kind of contribution.

I have received your message and will respond ASAP. Thank you for your patience! 😃

Best,
Wenjie

@WenjieDu (Owner)

Hi Maciej,

Thank you very much for your feedback! Actually, I noticed this bug and fixed it in this commit on the dev branch, but I haven't released the fix to PyPI yet.

In your PR, you created an additional argument device in the class DatasetForBRITS, but this is unnecessary. In fact, the deltas should be kept on the same device as missing_mask, so you can fetch that device directly with missing_mask.device.
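
As a hedged illustration of that suggestion (hypothetical variable names, not the library's actual code), deriving the device from the mask rather than passing a separate device argument looks like this:

import torch

# Sketch: reuse the device of the mask tensor whenever new tensors are created,
# so no extra `device` argument needs to be threaded through DatasetForBRITS.
missing_mask = torch.ones(48, 37)
if torch.cuda.is_available():
    missing_mask = missing_mask.to("cuda:0")

device = missing_mask.device                                # cpu or cuda:0, whichever holds the data
ones = torch.ones(1, missing_mask.shape[1], device=device)  # created on the same device as the mask
assert ones.device == missing_mask.device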

@MaciejSkrabski (Contributor, Author)

Thanks for looking into it. I see there is a new build available on PyPI. Thank you for the tips on my code.

@WenjieDu (Owner)

Absolutely my pleasure! 😃 Yes, I released PyPOTS v0.0.7 on PyPI to fix this bug, along with some other issues.

And many thanks for your PR, which is the first one for PyPOTS and means a lot to me, even though it wasn't merged into the main branch. I sincerely invite you to join the community and contribute to PyPOTS together with us. If you have ideas or comments on PyPOTS, please let me know; your PRs are always welcome!
