[MPS] Fast math env var #129007
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/129007
Note: Links to docs will display an error until the docs builds have been completed.
✅ No failures as of commit 4a15373 with merge base 9a7e251. This comment was automatically generated by Dr. CI and updates every 15 minutes.
Do you mind landing it separately from the rest of the stack?
Also, it would be nice to expose it not just through the environment variable, but via a property as well, something like `torch.backends.mps.use_fast_math`?
Also, I feel like we are at the point where it's probably time to start compiling those kernels offline rather than during first use of the op, so it would not be helpful anyway.
Hmm, where do we usually store this kind of state, in Python or C++? Can I use the environment variable as the ground truth, so that the property and the environment variable stay synchronized?
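A minimal sketch of what that could look like, treating the environment as the single source of truth so the two views cannot drift apart. The property was never landed, so the names below (including the assumed variable name `PYTORCH_MPS_FAST_MATH`) are hypothetical:

```python
import os

_ENV_VAR = "PYTORCH_MPS_FAST_MATH"  # assumed name; confirm against the PR diff

def get_use_fast_math() -> bool:
    # Read straight from the environment so there is no second copy of the state.
    return os.environ.get(_ENV_VAR, "0") == "1"

def set_use_fast_math(enabled: bool) -> None:
    # Writing back to the environment keeps the C++ side (which calls getenv)
    # and any subprocesses consistent with the Python view. Note the flag only
    # affects kernels that have not been compiled yet.
    os.environ[_ENV_VAR] = "1" if enabled else "0"
```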
@malfet If we plan to compile the kernels offline, let's not put much effort into adding the property.
Sure
@pytorchbot merge -f "Lint + MPS are green"
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
This PR generalizes the `multi_tensor_apply` function for other fused optimizers.

Pull Request resolved: #129105
Approved by: https://github.com/malfet
ghstack dependencies: #129006, #129008, #129007
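For context, here is a conceptual sketch of the batching idea behind `multi_tensor_apply`, illustrated with the public `torch._foreach_*` ops rather than the internal API: instead of launching one kernel per parameter tensor, a fused optimizer applies one operation across the whole tensor list.

```python
import torch

def sgd_per_tensor(params, grads, lr):
    # Naive loop: one kernel launch per tensor, so launch overhead
    # dominates when there are many small tensors.
    for p, g in zip(params, grads):
        p.add_(g, alpha=-lr)

def sgd_batched(params, grads, lr):
    # Batched update: a single foreach op walks the whole tensor list,
    # similar in spirit to what multi_tensor_apply does internally.
    torch._foreach_add_(params, grads, alpha=-lr)
```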
```
[-------------------------------------- Fused SGD --------------------------------------]
                                                        |  Fused: True  |  Fused: False
1 threads: ------------------------------------------------------------------------------
      numel: 1024, num_tensors: 100, momentum: True     |       2       |       15
      numel: 1024, num_tensors: 100, momentum: False    |       2       |        5
      numel: 65536, num_tensors: 100, momentum: True    |       3       |       16
      numel: 65536, num_tensors: 100, momentum: False   |       2       |        5
      numel: 1048576, num_tensors: 100, momentum: True  |      11       |       16
      numel: 1048576, num_tensors: 100, momentum: False |       8       |        6
      numel: 1024, num_tensors: 500, momentum: True     |      29       |       70
      numel: 1024, num_tensors: 500, momentum: False    |      20       |       24
      numel: 65536, num_tensors: 500, momentum: True    |      33       |       76
      numel: 65536, num_tensors: 500, momentum: False   |      22       |       26
      numel: 1048576, num_tensors: 500, momentum: True  |      70       |       80
      numel: 1048576, num_tensors: 500, momentum: False |      43       |       40
      numel: 1024, num_tensors: 1000, momentum: True    |     108       |      139
      numel: 1024, num_tensors: 1000, momentum: False   |      72       |       48
      numel: 65536, num_tensors: 1000, momentum: True   |     116       |      150
      numel: 65536, num_tensors: 1000, momentum: False  |      77       |       52
      numel: 1048576, num_tensors: 1000, momentum: True |     190       |      170
      numel: 1048576, num_tensors: 1000, momentum: False|     120       |       50
```

```python
import torch  # needed for torch.arange and torch.mps below


def profile_fused_sgd():
    from torch.optim.sgd import sgd
    import torch.utils.benchmark as benchmark
    import itertools

    def profile(fn, params, grads, momentum_buffer_list, fused):
        fn(
            params,
            grads,
            momentum_buffer_list,
            momentum=True if len(momentum_buffer_list) > 0 else False,
            dampening=0.0,
            nesterov=False,
            foreach=False,
            fused=fused,
            lr=1e-3,
            weight_decay=0.0,
            maximize=False,
            grad_scale=None,
            found_inf=None,
        )
        torch.mps.synchronize()

    device = "mps"
    results = []
    for num_tensors, numel, momentum in itertools.product(
        [100, 500, 1000], [1024, 65536, 1048576], [True, False]
    ):
        sublabel = f"numel: {numel}, num_tensors: {num_tensors}, momentum: {momentum}"
        print(sublabel)
        params, grads = [
            [torch.arange(numel, dtype=torch.float32, device=device) + (numel * i) for i in range(num_tensors)]
            for _ in range(2)
        ]
        momentum_buffer_list = (
            [torch.arange(numel, dtype=torch.float32, device=device) + (numel * i) for i in range(num_tensors)]
            if momentum
            else []
        )
        fn = sgd
        for fused in [True, False]:
            t = benchmark.Timer(
                stmt="profile(fn, params, grads, momentum_buffer_list, fused)",
                label="Fused SGD",
                sub_label=sublabel,
                globals=locals(),
                description=f"Fused: {fused}",
            ).blocked_autorange(min_run_time=5)
            results.append(t)
    compare = benchmark.Compare(results)
    compare.trim_significant_figures()
    compare.colorize(rowwise=True)
    compare.print()
```

Pull Request resolved: #129350
Approved by: https://github.com/janeyx99
ghstack dependencies: #129006, #129008, #129007, #129105
Stack from ghstack (oldest at bottom):
Allow users to decide whether they want fast math enabled via an environment variable.
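As a usage sketch, assuming the variable is named `PYTORCH_MPS_FAST_MATH` (check the PR diff for the exact spelling), it has to be set before the first MPS kernel is compiled:

```python
import os

# Assumed variable name from this PR; set it before the first MPS kernel
# is compiled, since the flag is read at kernel-compilation time.
os.environ["PYTORCH_MPS_FAST_MATH"] = "1"

import torch

if torch.backends.mps.is_available():
    x = torch.randn(1024, device="mps")
    y = torch.exp(x)  # kernels compiled from here on may use fast math
```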