
Look up tensor device member inside Tensor is_pinned() implementation instead of accepting an outside input #128988

Open
daulet-askarov opened this issue Jun 18, 2024 · 3 comments
Labels
module: cuda Related to torch.cuda, and CUDA support in general triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

@daulet-askarov
Contributor

daulet-askarov commented Jun 18, 2024

🚀 The feature, motivation and pitch

Background: We recently found a bug where our cpu_test_tensor.is_pinned() call unexpectedly consumed considerable memory. The original cpu_test_tensor was on device="cpu". It turned out, however, that is_pinned accepts an optional device input param, and by default the internal implementation assumes device="cuda", so a CUDA context was possibly created, consuming additional memory. We are fixing the immediate problem in PR 128896 by passing device=cpu_tensor.device to the is_pinned call.

Proposal: Drop the optional device input to is_pinned() and instead have it automatically look up the device member field from the self Tensor object itself. This should make for a more robust API, where the user avoids being defaulted to a possibly incorrect device value.
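To illustrate the pitfall this proposal addresses, here is a minimal pure-Python sketch. It is not PyTorch's actual implementation; FakeTensor, is_pinned_old, is_pinned_proposed, and the cuda_context_created flag are all hypothetical stand-ins for the real lazy CUDA initialization cost:

```python
# Hypothetical sketch of the pitfall (NOT PyTorch's actual implementation).
# A default device argument silently assumes "cuda", triggering expensive
# backend initialization even when the tensor itself lives on the CPU.

class FakeTensor:
    def __init__(self, device="cpu"):
        self.device = device
        self.cuda_context_created = False  # stand-in for lazy CUDA init

    def is_pinned_old(self, device="cuda"):
        # Old-style API: the caller can (accidentally) probe the wrong device.
        if device == "cuda":
            self.cuda_context_created = True  # simulates the memory cost
        return False

    def is_pinned_proposed(self):
        # Proposed API: no device parameter, so nothing to get wrong;
        # the device is read from the tensor itself.
        if self.device == "cuda":
            self.cuda_context_created = True
        return False

t = FakeTensor(device="cpu")
t.is_pinned_old()                  # implicit device="cuda" -> side effect
print(t.cuda_context_created)      # True: the "CUDA context" was touched

t2 = FakeTensor(device="cpu")
t2.is_pinned_proposed()            # reads t2.device; CUDA never involved
print(t2.cuda_context_created)     # False
```

The sketch shows why removing the parameter is the robust fix: with the old signature, a caller must remember to pass device=tensor.device to avoid the default; with the proposed one, the correct behavior is the only behavior.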

Alternatives

No response

Additional context

See D58687049 for a repro of the original memory problem.

cc @ptrblck @msaroufim

@albanD
Collaborator

albanD commented Jun 19, 2024

A pinned Tensor is always on CPU and pinned with respect to an accelerator (the old default being cuda). Asking whether a Tensor is pinned with respect to cpu doesn't really make sense.

The proper fix here would be to check whether the CUDA context is initialized. If it is not, it is not possible for the Tensor to be pinned, so we can just return False.

@albanD albanD added module: cuda Related to torch.cuda, and CUDA support in general triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module labels Jun 19, 2024
@daulet-askarov
Contributor Author

A pinned Tensor is always on CPU and pinned with respect to an accelerator (the old default being cuda). Asking whether a Tensor is pinned with respect to cpu doesn't really make sense.

The proper fix here would be to check whether the CUDA context is initialized. If it is not, it is not possible for the Tensor to be pinned, so we can just return False.

Oh, I see. So the device input param to is_pinned is not really the tensor's device but the device with respect to which it is pinned, if any. We should probably update the docs for it. Regarding checking whether the CUDA context exists and short-circuiting to return False: that makes total sense. Would you be able to make this change for is_pinned?

@daulet-askarov
Contributor Author

Reopening
