Run test_mps_allocator_module serially #129340
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/129340

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV: there is 1 currently active SEV. If your PR is affected, please view it below.

✅ No Failures as of commit ee3131d with merge base c9dc988.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot merge

Merge started: Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
@pytorchbot merge -f 'Bypass Windows queue'

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.

Merge started: Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
@pytorchbot revert -m 'The test is now failing again in trunk after a day or so of staying green, we need to continue the investigation' -c weird

An example failure: https://hud.pytorch.org/pytorch/pytorch/commit/dd00f5e78d44f55595728ef03018c01296a31ec9
@pytorchbot successfully started a revert job. Check the current status here.

This reverts commit c888ee3. Reverted #129340 on behalf of https://github.com/huydhn due to: The test is now failing again in trunk after a day or so of staying green, we need to continue the investigation ([comment](#129340 (comment)))

@huydhn your PR has been successfully reverted.
Yes, especially the use of `assertTrue`. The test looks very strange though:

If the "before" value must be zero, why use it in the 2nd assert? I'd say the first assert is the issue: whether or not…

Because…
It's weird, but it is not: this test is just incompatible with parallel run mode, as it assumes no MPS allocation happens between these lines (test/test_mps.py, lines 7835 to 7838 in 92be340).

But if another test runs in parallel, we cannot ensure that memory allocation would be zero at that point. Let's just add `@serialTest` (see the sketch below).
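For context, a minimal sketch of the pattern the test relies on (the variable names are illustrative, not the exact test body, and it requires an MPS-capable machine):

```python
import gc

import torch

# The test assumes nothing else touches the MPS allocator between
# emptying the cache and reading the counter.
gc.collect()
torch.mps.empty_cache()

before = torch.mps.current_allocated_memory()
# Holds only when no other test allocates MPS memory in parallel,
# hence running the test serially.
assert before == 0
```

Under a parallel runner, another worker's allocation between `empty_cache()` and the read makes `before` nonzero, which is exactly the flake described above.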
Based on this #129340 (comment). I.e.:

- `assertTrue(x == y)` -> `assertEqual(x, y)`
- `assertTrue(not x)` -> `assertFalse(x)`
- `assertTrue(x > y)` -> `assertGreater(x, y)`
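For illustration, the payoff of the dedicated methods is in the failure messages: they report both operands instead of a bare `False is not true`. A minimal, self-contained sketch:

```python
import unittest

class AssertStyleExample(unittest.TestCase):
    def test_rich_messages(self):
        x, y = 2, 2
        # On failure this reports "2 != 3", not "False is not true".
        self.assertEqual(x, y)
        # On failure this reports the offending truthy value.
        self.assertFalse(x != y)
        # On failure this reports "2 not greater than 3".
        self.assertGreater(x + 1, y)

if __name__ == "__main__":
    unittest.main()
```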
No, I meant if you assert…
Based on this #129340 (comment). I.e.:

- `assertTrue(x == y)` -> `assertEqual(x, y)`
- `assertTrue(not x)` -> `assertFalse(x)`
- `assertTrue(x > y)` -> `assertGreater(x, y)`

Pull Request resolved: #129569
Approved by: https://github.com/jeanschmidt, https://github.com/Skylion007
Merge started: Your change will be merged while ignoring the following 3 checks: trunk / win-vs2019-cpu-py3 / test (default, 1, 3, windows.4xlarge.nonephemeral), trunk / win-vs2019-cpu-py3 / test (default, 2, 3, windows.4xlarge.nonephemeral), trunk / win-vs2019-cpu-py3 / test (default, 3, 3, windows.4xlarge.nonephemeral). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
test/test_mps.py (outdated)

```diff
@@ -7829,7 +7829,7 @@ def test_device_synchronize(self):
         x.backward(torch.randn_like(x))
         torch.mps.synchronize()
 
-    @unittest.expectedFailure
+    @serialTest
```
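For readers unfamiliar with the marker being removed here: `@unittest.expectedFailure` inverts a test's outcome, so once the underlying bug is fixed the test shows up as an "unexpected success" and the decorator should be dropped, as this diff does. A minimal sketch:

```python
import unittest

class Example(unittest.TestCase):
    @unittest.expectedFailure
    def test_known_bug(self):
        # Reported as an "expected failure" while the bug exists;
        # if this assertion starts passing, the run reports an
        # "unexpected success" and the decorator should be removed.
        self.assertEqual(1, 2)

if __name__ == "__main__":
    unittest.main()
```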
Suggested change:

```diff
-    @serialTest
+    @serialTest()
```
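The parentheses matter because `serialTest` is used as a decorator factory: it must be called to produce the actual decorator. Applying the bare name replaces the test function with the inner decorator, so the test body can silently never run. A generic sketch of the failure mode, assuming a factory-style decorator (not the exact PyTorch implementation):

```python
import functools

def serialTest(condition=True):
    # Decorator factory: calling serialTest() returns the real decorator.
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # A real implementation would mark/force serial execution here.
            return fn(*args, **kwargs)
        return wrapper
    return decorator

@serialTest()  # correct: the factory is called, the body runs
def test_called():
    print("test body ran")

@serialTest    # wrong: test_bare is now the inner `decorator`
def test_bare():
    print("this never runs")

test_called()                     # prints "test body ran"
result = test_bare(lambda: None)  # just wraps its argument and returns
                                  # the wrapper; nothing is printed
```

In a unittest run, the bare-decorated "test" would be called with `self`, return a wrapper, and be reported as passing without executing anything, which is why checking that the test really runs (as noted below) matters.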
Ouch, this makes me think I need to check whether the test is really running (in the PR).
Merge failed. Reason: Comment with id 2200612866 not found. Details for Dev Infra team: raised by workflow job.
LGTM, but please undo the kineto submodule update.
@pytorchbot merge -f 'MPS tests have passed'

Merge started: Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Not sure why this test started to fail (maybe a runner update, see https://hud.pytorch.org/hud/pytorch/pytorch/8a2fed7e6ab4cb6a97d92f6ca14f257370ec3a92/1?per_page=50&name_filter=mps) or why it was XFAIL in the old PR pytorch#97151, but the test is passing locally for me now.

Pull Request resolved: pytorch#129340
Approved by: https://github.com/kit1980, https://github.com/malfet