Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[gettext] fix flaky test on windows #11940

Merged
merged 47 commits into from
Feb 24, 2024

Conversation

picnixz
Copy link
Member

@picnixz picnixz commented Feb 4, 2024

This is a PR for triggering the CI/CD. I'll try to see if it changes anything (and to check if a special statement caused the regular issues). So I'll likely leave that PR open and try to retrigger CI/CD a few times to make it fail.

@jayaddison
Copy link
Contributor

Something I've been wondering about: is the .mo file genuinely a source dependency of the project during rebuilds?

It seems like the failing test is trying to say that "no, modifying the .mo file should not result in detection of files-to-update".

But I think that some of the logic in the codebase (notably here) does seem to consider mo files as dependencies (at least when the builder is configured accordingly).

@picnixz
Copy link
Member Author

picnixz commented Feb 4, 2024

Well the issue is that it sometimes works and not on Windows. I suspect that it's because when we are querying the last modification time, something happens. So I put some debug prints here and there to check.

As for the dependency, I don't know but I think yes.

@picnixz
Copy link
Member Author

picnixz commented Feb 4, 2024

Ok, so I know what happens:

  • When we read the documents, we put some timestamp to indicate when it was read. We use time.time_ns() // 1000 which gives us a lower-bound on the real time since EPOCH in microseconds. In particular, the time of the document is either the one that we stored or something a bit later.

  • When we want to test for the last modification time, we extract the closest microsecond of the last modification time as given by the OS in nanoseconds. So, we do -(ns // -1000).

Now, the failure that I could extract has a time.time_ns() equal to T = 1707062261168486400. So, we will store T // 1000 = 1707062261168486. The exact value would be 1707062261168486.4. Now, if we compare it to a document for which os.stat(file).st_mtime_ns also returns T, we actually compute T' = -(T // -1000) = 1707062261168487 and here is the little issue.

It should be fine to err for the release build because better be safe than sorry. But for the tests, since those failures appear quite often, we always get a notification on the fact that the workflow fails. Also, people might get confused because they don't know whether what they did affected this component or not sometimes. So I suggest that, for the tests only, we could mock time.time_ns for the specific tests and only for Windows platforms.

Actually the whole issue stems from the fact that time.time_ns() has a resolution of 318 microseconds on Windows vs 84ns on Unix-based systems. That's likely the reason why my two files end up having the same st_mtime_ns value but not on Unix (since they are effectively created at different points in time). If Windows had a good time.time_ns() resolution, then this issue would likely disappear.

@picnixz
Copy link
Member Author

picnixz commented Feb 4, 2024

Closing since I'm done with my investigation.

@picnixz
Copy link
Member Author

picnixz commented Feb 10, 2024

So let's work again on that one.

@picnixz picnixz reopened this Feb 10, 2024
@picnixz
Copy link
Member Author

picnixz commented Feb 10, 2024

I think it now works?

@picnixz
Copy link
Member Author

picnixz commented Feb 10, 2024

Merging this one should technically (and I hope so) fix the future issues on Windows that we had.

@jayaddison
Copy link
Contributor

FYI: I've got a branch where I've attempted to keep all the 'fix' parts of this, while filtering-out the refactor-like changes.

That helped me to understand that the key parts are both the time-mocking and also the very-specific modified-time patch after the BOM file has been written but before it is re-read.

It's currently visible for comparison at: master...jayaddison:sphinx:issue-11941/pr-11940-review-experimentation#files_bucket (although I'll probably remove that branch at some point)

I did also make one or two changes during that - nothing too significant. I'll try to remember to provide the relevant things as code-review feedback.

This was referenced Feb 11, 2024
@picnixz
Copy link
Member Author

picnixz commented Feb 24, 2024

I think I'll merge this one next since it's not hard to revert it if there are issues. It's not blocking per se but at least it would reduce the number of random CI/CD failures.

@picnixz picnixz merged commit 707bfbd into sphinx-doc:master Feb 24, 2024
22 checks passed
@picnixz picnixz deleted the fix/windows-test-gettext branch February 24, 2024 12:34
@jayaddison
Copy link
Contributor

Nice work @picnixz!

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 18, 2024
@AA-Turner AA-Turner added this to the 7.3.0 milestone Jul 13, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants