
Fixed CUDA randint generation for large ranges. #126066

Conversation

tringwald
Collaborator

@tringwald tringwald commented May 13, 2024

Fixes #125224

For large ranges, calls to CUDA `randint` use a different `unroll_factor` to generate random ints. This `unroll_factor` was not accounted for correctly in the calculation of the Philox offsets, so some of the random states were reused, resulting in lower entropy (see #125224).

This also affects several other random functions, such as `torch.rand` and `torch.randn`.
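A rough sketch of the offset bookkeeping involved (the function name and the exact formula below are illustrative, not PyTorch's actual implementation): after a kernel launch, the global Philox counter must advance by at least as many engine calls as any single thread performed. If that advance is computed with the wrong `unroll_factor`, the next launch starts inside counter territory the previous launch already consumed, and states get reused.

```python
def philox_offset_increment(numel, total_threads, unroll_factor, engine_calls=4):
    """Counter advance needed so the next launch reuses no Philox state.

    Illustrative only: each thread runs
    ceil(numel / (total_threads * unroll_factor)) grid-stride iterations,
    making `engine_calls` Philox calls per iteration.
    """
    per_pass = total_threads * unroll_factor
    iterations = (numel + per_pass - 1) // per_pass  # ceiling division
    return iterations * engine_calls

# If the bookkeeping assumes unroll_factor=4 but the kernel actually ran
# with unroll_factor=2, the recorded advance is too small, and the next
# launch overlaps counter values that were already consumed.
needed = philox_offset_increment(1_000_000, total_threads=65_536, unroll_factor=2)
recorded = philox_offset_increment(1_000_000, total_threads=65_536, unroll_factor=4)
assert recorded < needed  # -> overlapping states, lower entropy
```

The fix, as described above, is to make the offset calculation use the same `unroll_factor` the kernel actually runs with.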

@tringwald tringwald requested a review from eqy as a code owner May 13, 2024 13:38

pytorch-bot bot commented May 13, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/126066

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (6 Unrelated Failures)

As of commit b0b9064 with merge base 3d56673:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

UNSTABLE - The following job failed but was likely due to flakiness present on trunk and has been marked as unstable:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

test/test_cuda.py (outdated review thread, resolved)
test/test_cuda.py (outdated review thread, resolved)
aten/src/ATen/native/cuda/DistributionTemplates.h (outdated review thread, resolved)
@tringwald
Collaborator Author

@r-barnes Thanks for reviewing, I added some type annotations and changed the C++ parameters to const.

@tringwald tringwald force-pushed the cuda-randint-randomness-for-large-range branch from 3ea6988 to 849bf9e Compare May 13, 2024 21:26
test/test_cuda.py (outdated review thread, resolved)
@tringwald
Collaborator Author

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased cuda-randint-randomness-for-large-range onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout cuda-randint-randomness-for-large-range && git pull --rebase)

@pytorchmergebot pytorchmergebot force-pushed the cuda-randint-randomness-for-large-range branch from b09c3f1 to cb7925c Compare May 14, 2024 07:36
@tringwald tringwald force-pushed the cuda-randint-randomness-for-large-range branch 4 times, most recently from 303b76e to 0a7226b Compare May 18, 2024 20:05
@eqy
Collaborator

eqy commented May 19, 2024

CC @drisspg who might know more about the SDPA tests

@tringwald
Collaborator Author

tringwald commented May 19, 2024

Thanks @eqy. Those tests in test_transformers.py use torch._fill_mem_eff_dropout_mask_, which in turn calls a custom CUDA kernel to populate the dropout mask with uniform values before thresholding. I'm not sure why torch.rand isn't used there, but replacing the custom impl with torch.rand yields some odd test failures.
I've rolled back the test changes for now so I can debug the other failures more easily, but we should probably reconsider whether a custom rand impl is needed for those tests.
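For context, the dropout-mask kernel mentioned above draws uniform values and thresholds them against the dropout probability. A minimal pure-Python sketch of that idea (not the actual CUDA kernel; the function name and seeding are made up for illustration):

```python
import random

def fill_dropout_mask(n, dropout_p, seed=0):
    """Sketch: keep an element (mask value 1.0) when its uniform draw lands
    at or above the dropout probability, otherwise drop it (0.0)."""
    rng = random.Random(seed)
    return [1.0 if rng.random() >= dropout_p else 0.0 for _ in range(n)]

mask = fill_dropout_mask(8, dropout_p=0.5, seed=123)
assert set(mask) <= {0.0, 1.0}               # mask is binary
assert fill_dropout_mask(4, dropout_p=0.0) == [1.0, 1.0, 1.0, 1.0]  # p=0 keeps all
```

Because the mask depends on the uniform draws, any change to the Philox offset bookkeeping shifts which random values land where, which is presumably why these tests are sensitive to this PR.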

@tringwald tringwald force-pushed the cuda-randint-randomness-for-large-range branch from ada1975 to 993afca Compare June 8, 2024 13:28
@pytorch-bot pytorch-bot bot added the release notes: linalg_frontend release notes category label Jun 8, 2024
@tringwald tringwald force-pushed the cuda-randint-randomness-for-large-range branch 4 times, most recently from c6a34be to 04e4e82 Compare June 15, 2024 19:00
@tringwald tringwald requested a review from a team as a code owner June 17, 2024 17:31
@tringwald tringwald force-pushed the cuda-randint-randomness-for-large-range branch from fdd1b56 to 21b653c Compare June 19, 2024 15:03
@tringwald
Collaborator Author

I finally found a way to satisfy all the tests. Do you want to have another look at the changes @eqy? Unfortunately, the allocator_fuzz test resulted in an OOM error, so I had to lower the allocation size.

@tringwald tringwald force-pushed the cuda-randint-randomness-for-large-range branch from be24f90 to b49511c Compare June 25, 2024 16:35
@tringwald tringwald added the release notes: cuda release notes category label Jun 25, 2024
@tringwald
Collaborator Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jun 25, 2024
@pytorchmergebot
Collaborator

Merge failed

Reason: Approvers from one of the following sets are needed:

  • superuser (pytorch/metamates)
  • Core Reviewers (mruberry, lezcano, Skylion007, ngimel, peterbell10)
  • Core Maintainers (soumith, gchanan, ezyang, dzhulgakov, malfet)
Raised by workflow job (details for Dev Infra team)

Failing merge rule: Core Maintainers

@tringwald
Collaborator Author

tringwald commented Jun 26, 2024

Can you have a look at this? @malfet
The changes are BC-breaking for certain random function calls, i.e. the sequence of generated numbers will be different. Should we add a UserWarning for this?
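To illustrate why the fix is BC-breaking for seeded runs (a toy analogy, not Philox itself; the mixing constants are arbitrary): in a counter-based generator, every output is a pure function of (seed, counter), so correcting the counter schedule necessarily changes the seeded sequence, even though each individual value is still a valid draw.

```python
def toy_counter_rng(seed, counters):
    # Toy counter-based generator: each output is a pure function of
    # (seed, counter), loosely mimicking how Philox is keyed and counted.
    return [(seed * 2654435761 + c * 40503) % 2**32 for c in counters]

buggy = toy_counter_rng(42, [0, 1, 1, 2])  # broken schedule reuses counter 1
fixed = toy_counter_rng(42, [0, 1, 2, 3])  # corrected schedule advances every step

assert buggy[1] == buggy[2]  # reused counter -> duplicated "random" value
assert buggy != fixed        # fixing the schedule changes the seeded sequence
```

So code that pins exact values from a seeded CUDA RNG stream in the affected configurations will see different numbers after this change, which is the motivation for asking about a UserWarning.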

@tringwald tringwald force-pushed the cuda-randint-randomness-for-large-range branch from b49511c to b0b9064 Compare July 5, 2024 19:20
@tringwald
Collaborator Author

I need additional approval for this PR. Could either of you take a look at this? @lezcano @Skylion007

Collaborator

@lezcano lezcano left a comment


Approving but didn't review. @eqy did.

@lezcano
Collaborator

lezcano commented Jul 13, 2024

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team


xuhancn pushed a commit to xuhancn/pytorch that referenced this pull request Jul 25, 2024
Pull Request resolved: pytorch#126066
Approved by: https://github.com/eqy, https://github.com/lezcano
Labels: ciflow/trunk (Trigger trunk jobs on your pull request), Merged, open source, release notes: cuda, release notes: linalg_frontend
Projects: none yet
Development: successfully merging this pull request may close issue "Strange behavior of randint using device=cuda"
6 participants