
Fixed CUDA randint generation for large ranges. #126066

Conversation

tringwald
Collaborator

@tringwald tringwald commented May 13, 2024

Fixes #125224

For large ranges, calls to CUDA `randint` use a different `unroll_factor` to generate random ints. This `unroll_factor` was not accounted for correctly in the calculation of the Philox offsets, so some of the random states were reused, resulting in lower entropy (see #125224).

This also affects several other random functions, such as `torch.rand` and `torch.randn`.
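A rough sketch of the offset bookkeeping involved (the function name and the exact formula below are illustrative, not PyTorch's actual implementation): after a kernel launch, the global Philox counter must advance by at least as many engine calls as any single thread performed. If that advance is computed with the wrong `unroll_factor`, the next launch starts inside counter territory the previous launch already consumed, and states get reused.

```python
def philox_offset_increment(numel, total_threads, unroll_factor, engine_calls=4):
    """Counter advance needed so the next launch reuses no Philox state.

    Illustrative only: each thread runs
    ceil(numel / (total_threads * unroll_factor)) grid-stride iterations,
    making `engine_calls` Philox calls per iteration.
    """
    per_pass = total_threads * unroll_factor
    iterations = (numel + per_pass - 1) // per_pass  # ceiling division
    return iterations * engine_calls

# If the bookkeeping assumes unroll_factor=4 but the kernel actually ran
# with unroll_factor=2, the recorded advance is too small, and the next
# launch overlaps counter values that were already consumed.
needed = philox_offset_increment(1_000_000, total_threads=65_536, unroll_factor=2)
recorded = philox_offset_increment(1_000_000, total_threads=65_536, unroll_factor=4)
assert recorded < needed  # -> overlapping states, lower entropy
```

The fix, as described above, is to make the offset calculation use the same `unroll_factor` the kernel actually runs with.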

@tringwald tringwald requested a review from eqy as a code owner May 13, 2024 13:38

pytorch-bot bot commented May 13, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/126066

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (6 Unrelated Failures)

As of commit b0b9064 with merge base 3d56673:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

UNSTABLE - The following job failed but was likely due to flakiness present on trunk and has been marked as unstable:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

test/test_cuda.py (outdated review thread, resolved)
test/test_cuda.py (outdated review thread, resolved)
aten/src/ATen/native/cuda/DistributionTemplates.h (outdated review thread, resolved)
@tringwald
Collaborator Author

@r-barnes Thanks for reviewing, I added some type annotations and changed the C++ parameters to const.

@tringwald tringwald force-pushed the cuda-randint-randomness-for-large-range branch from 3ea6988 to 849bf9e Compare May 13, 2024 21:26
test/test_cuda.py (outdated review thread, resolved)
@tringwald
Collaborator Author

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased cuda-randint-randomness-for-large-range onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout cuda-randint-randomness-for-large-range && git pull --rebase)

@pytorchmergebot pytorchmergebot force-pushed the cuda-randint-randomness-for-large-range branch from b09c3f1 to cb7925c Compare May 14, 2024 07:36
@tringwald tringwald force-pushed the cuda-randint-randomness-for-large-range branch 4 times, most recently from 303b76e to 0a7226b Compare May 18, 2024 20:05
@eqy
Collaborator

eqy commented May 19, 2024

CC @drisspg who might know more about the SDPA tests

@tringwald
Collaborator Author

tringwald commented May 19, 2024

Thanks @eqy. Those tests in test_transformers.py use torch._fill_mem_eff_dropout_mask_, which in turn calls a custom CUDA kernel to populate the dropout mask with uniform values before thresholding. I'm not sure why torch.rand isn't used there, but replacing the custom impl with torch.rand yields some odd test failures.
I've rolled back the test changes for now so I can debug the other failures more easily, but we should probably reconsider whether a custom rand impl is needed for those tests.
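For context, the dropout-mask kernel mentioned above draws uniform values and thresholds them against the dropout probability. A minimal pure-Python sketch of that idea (not the actual CUDA kernel; the function name and seeding are made up for illustration):

```python
import random

def fill_dropout_mask(n, dropout_p, seed=0):
    """Sketch: keep an element (mask value 1.0) when its uniform draw lands
    at or above the dropout probability, otherwise drop it (0.0)."""
    rng = random.Random(seed)
    return [1.0 if rng.random() >= dropout_p else 0.0 for _ in range(n)]

mask = fill_dropout_mask(8, dropout_p=0.5, seed=123)
assert set(mask) <= {0.0, 1.0}               # mask is binary
assert fill_dropout_mask(4, dropout_p=0.0) == [1.0, 1.0, 1.0, 1.0]  # p=0 keeps all
```

Because the mask depends on the uniform draws, any change to the Philox offset bookkeeping shifts which random values land where, which is presumably why these tests are sensitive to this PR.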

@tringwald tringwald force-pushed the cuda-randint-randomness-for-large-range branch from ada1975 to 993afca Compare June 8, 2024 13:28
@pytorch-bot pytorch-bot bot added the release notes: linalg_frontend release notes category label Jun 8, 2024
@tringwald tringwald force-pushed the cuda-randint-randomness-for-large-range branch 4 times, most recently from c6a34be to 04e4e82 Compare June 15, 2024 19:00
@tringwald tringwald requested a review from a team as a code owner June 17, 2024 17:31
@tringwald tringwald force-pushed the cuda-randint-randomness-for-large-range branch from fdd1b56 to 21b653c Compare June 19, 2024 15:03
@tringwald
Collaborator Author

I finally found a way to satisfy all the tests. Do you want to have another look at the changes @eqy? Unfortunately, the allocator_fuzz test resulted in an OOM error, so I had to lower the allocation size.

@tringwald tringwald force-pushed the cuda-randint-randomness-for-large-range branch from be24f90 to b49511c Compare June 25, 2024 16:35
@tringwald tringwald added the release notes: cuda release notes category label Jun 25, 2024
@tringwald
Collaborator Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jun 25, 2024
@pytorchmergebot
Collaborator

Merge failed

Reason: Approvers from one of the following sets are needed:

  • superuser (pytorch/metamates)
  • Core Reviewers (mruberry, lezcano, Skylion007, ngimel, peterbell10)
  • Core Maintainers (soumith, gchanan, ezyang, dzhulgakov, malfet)
Raised by workflow job (details for Dev Infra team)

Failing merge rule: Core Maintainers

@tringwald
Collaborator Author

tringwald commented Jun 26, 2024

Can you have a look at this? @malfet
The changes are BC-breaking for certain random function calls, i.e. the sequence of generated numbers will be different. Should we add a UserWarning for this?
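To illustrate why the fix is BC-breaking for seeded runs (a toy analogy, not Philox itself; the mixing constants are arbitrary): in a counter-based generator, every output is a pure function of (seed, counter), so correcting the counter schedule necessarily changes the seeded sequence, even though each individual value is still a valid draw.

```python
def toy_counter_rng(seed, counters):
    # Toy counter-based generator: each output is a pure function of
    # (seed, counter), loosely mimicking how Philox is keyed and counted.
    return [(seed * 2654435761 + c * 40503) % 2**32 for c in counters]

buggy = toy_counter_rng(42, [0, 1, 1, 2])  # broken schedule reuses counter 1
fixed = toy_counter_rng(42, [0, 1, 2, 3])  # corrected schedule advances every step

assert buggy[1] == buggy[2]  # reused counter -> duplicated "random" value
assert buggy != fixed        # fixing the schedule changes the seeded sequence
```

So code that pins exact values from a seeded CUDA RNG stream in the affected configurations will see different numbers after this change, which is the motivation for asking about a UserWarning.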

@tringwald tringwald force-pushed the cuda-randint-randomness-for-large-range branch from b49511c to b0b9064 Compare July 5, 2024 19:20
@tringwald
Collaborator Author

I need additional approval for this PR. Could either of you take a look at this? @lezcano @Skylion007

Collaborator

@lezcano lezcano left a comment


Approving but didn't review. @eqy did.

@lezcano
Collaborator

lezcano commented Jul 13, 2024

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team


xuhancn pushed a commit to xuhancn/pytorch that referenced this pull request Jul 25, 2024
Pull Request resolved: pytorch#126066
Approved by: https://github.com/eqy, https://github.com/lezcano
Labels: ciflow/trunk (Trigger trunk jobs on your pull request), Merged, open source, release notes: cuda, release notes: linalg_frontend
Projects: none yet
Development: successfully merging this pull request may close issue "Strange behavior of randint using device=cuda"
6 participants