
[MPS] Fast math env var #129007

Closed · wants to merge 5 commits

Conversation

@qqaatw (Collaborator) commented Jun 18, 2024

[ghstack-poisoned]

@pytorch-bot (bot) commented Jun 18, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/129007

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 4a15373 with merge base 9a7e251:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot (bot) added the ciflow/mps (Run MPS tests, subset of trunk) and release notes: mps (Release notes category) labels on Jun 18, 2024
[ghstack-poisoned]
@qqaatw marked this pull request as ready for review June 18, 2024 23:17

@malfet (Contributor) left a comment

Do you mind landing it separately from the rest of the stack?
Also, it would be nice to expose this not just through the environment variable, but via a property as well, something like torch.backends.mps.use_fast_math?

Also, I feel like we are at the point where it's probably time to start compiling those kernels offline rather than on first use of the op, so the property would not be helpful anyway
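
A minimal sketch of how such a module-level property could be implemented, assuming the common trick of swapping a module's class so that attribute access goes through descriptors. Everything here is illustrative, not the PR's actual API; the PR as discussed only reads an environment variable:

```python
# Hypothetical torch/backends/mps/__init__.py fragment.
# The attribute name `use_fast_math` is the suggestion above, not a real API.
import sys
import types

class _MPSBackendModule(types.ModuleType):
    _use_fast_math = False  # Python-side default

    @property
    def use_fast_math(self) -> bool:
        return self._use_fast_math

    @use_fast_math.setter
    def use_fast_math(self, enabled: bool) -> None:
        self._use_fast_math = bool(enabled)

# Swapping __class__ makes `torch.backends.mps.use_fast_math = True`
# route through the property setter above.
sys.modules[__name__].__class__ = _MPSBackendModule
```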

@qqaatw (Collaborator, Author) commented Jun 19, 2024

Do you mind landing it separately from the rest of the stack? Also, it would be nice to expose this not just through the environment variable, but via a property as well, something like torch.backends.mps.use_fast_math?

Also, I feel like we are at the point where it's probably time to start compiling those kernels offline rather than on first use of the op, so the property would not be helpful anyway

Hmm, where do we usually store this kind of state, in Python or C++? Can I use the environment variable as the ground truth, so that the property and the environment variable stay synchronized?
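
One way to do what is described here, sketched under the assumption that the environment variable is the single source of truth (the variable name below is hypothetical, not the PR's final choice). The getter and setter read and write os.environ; since CPython's os.environ also calls putenv(), C++ code that reads the variable with std::getenv() at kernel-compile time stays in sync, provided the flag is set before the first op triggers compilation:

```python
import os

_ENV_VAR = "PYTORCH_MPS_FAST_MATH"  # hypothetical name, not the PR's final choice

def get_use_fast_math() -> bool:
    # The environment variable is the ground truth; no separate Python state.
    return os.environ.get(_ENV_VAR, "0") == "1"

def set_use_fast_math(enabled: bool) -> None:
    # Assigning via os.environ also calls putenv(), so native code reading
    # std::getenv() at kernel-compile time observes the same value.
    os.environ[_ENV_VAR] = "1" if enabled else "0"
```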

[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
@qqaatw (Collaborator, Author) commented Jun 23, 2024

@malfet if we plan to compile the kernels offline, let's not put much effort into adding the torch.backends.mps.use_fast_math property here and merge this as is? It at least serves as a temporary option for users.

@malfet (Contributor) left a comment

Sure

@malfet (Contributor) commented Jun 25, 2024

@pytorchbot merge -f "Lint + MPS are green"

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f only as a last resort, and instead consider -i/--ignore-current to continue the merge while ignoring current failures; this allows currently pending tests to finish and report signal before the merge.


pytorchmergebot pushed a commit that referenced this pull request Jun 26, 2024
This PR generalizes the multi_tensor_apply function for other fused optimizers

Pull Request resolved: #129105
Approved by: https://github.com/malfet
ghstack dependencies: #129006, #129008, #129007
pytorchmergebot pushed a commit that referenced this pull request Jun 27, 2024
```
[-------------------------------------- Fused SGD --------------------------------------]
                                                          |  Fused: True  |  Fused: False
1 threads: ------------------------------------------------------------------------------
      numel: 1024, num_tensors: 100, momentum: True       |        2      |       15
      numel: 1024, num_tensors: 100, momentum: False      |        2      |        5
      numel: 65536, num_tensors: 100, momentum: True      |        3      |       16
      numel: 65536, num_tensors: 100, momentum: False     |        2      |        5
      numel: 1048576, num_tensors: 100, momentum: True    |       11      |       16
      numel: 1048576, num_tensors: 100, momentum: False   |        8      |        6
      numel: 1024, num_tensors: 500, momentum: True       |       29      |       70
      numel: 1024, num_tensors: 500, momentum: False      |       20      |       24
      numel: 65536, num_tensors: 500, momentum: True      |       33      |       76
      numel: 65536, num_tensors: 500, momentum: False     |       22      |       26
      numel: 1048576, num_tensors: 500, momentum: True    |       70      |       80
      numel: 1048576, num_tensors: 500, momentum: False   |       43      |       40
      numel: 1024, num_tensors: 1000, momentum: True      |      108      |      139
      numel: 1024, num_tensors: 1000, momentum: False     |       72      |       48
      numel: 65536, num_tensors: 1000, momentum: True     |      116      |      150
      numel: 65536, num_tensors: 1000, momentum: False    |       77      |       52
      numel: 1048576, num_tensors: 1000, momentum: True   |      190      |      170
      numel: 1048576, num_tensors: 1000, momentum: False  |      120      |       50
```

```python
def profile_fused_sgd():
    import itertools

    import torch
    import torch.utils.benchmark as benchmark
    from torch.optim.sgd import sgd

    def profile(fn, params, grads, momentum_buffer_list, fused):
        fn(
            params,
            grads,
            momentum_buffer_list,
            momentum=len(momentum_buffer_list) > 0,
            dampening=0.0,
            nesterov=False,
            foreach=False,
            fused=fused,
            lr=1e-3,
            weight_decay=0.0,
            maximize=False,
            grad_scale=None,
            found_inf=None,
        )
        # MPS ops run asynchronously; synchronize so the timer measures real work.
        torch.mps.synchronize()

    device = "mps"
    results = []

    for num_tensors, numel, momentum in itertools.product([100, 500, 1000], [1024, 65536, 1048576], [True, False]):
        sublabel = f"numel: {numel}, num_tensors: {num_tensors}, momentum: {momentum}"
        print(sublabel)
        # Deterministic, distinct values for each tensor in the list.
        params, grads = [
            [torch.arange(numel, dtype=torch.float32, device=device) + (numel * i) for i in range(num_tensors)]
            for _ in range(2)
        ]
        momentum_buffer_list = (
            [torch.arange(numel, dtype=torch.float32, device=device) + (numel * i) for i in range(num_tensors)]
            if momentum
            else []
        )
        fn = sgd

        for fused in [True, False]:
            t = benchmark.Timer(
                stmt="profile(fn, params, grads, momentum_buffer_list, fused)",
                label="Fused SGD",
                sub_label=sublabel,
                globals=locals(),
                description=f"Fused: {fused}",
            ).blocked_autorange(min_run_time=5)
            results.append(t)

    compare = benchmark.Compare(results)
    compare.trim_significant_figures()
    compare.colorize(rowwise=True)
    compare.print()


if __name__ == "__main__":
    profile_fused_sgd()
```
Pull Request resolved: #129350
Approved by: https://github.com/janeyx99
ghstack dependencies: #129006, #129008, #129007, #129105
@github-actions (bot) deleted the gh/qqaatw/18/head branch July 26, 2024 01:56
Labels: ciflow/mps (Run MPS tests, subset of trunk), Merged, open source, release notes: mps (Release notes category)