-
-
Notifications
You must be signed in to change notification settings - Fork 5.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: spatial: long hangs using Qhull in SciPy 1.13.0, in Colour CI #20613
Comments
No way it can explain the runtime change from 15min to 4 hrs. Something else must have changed. I'd first verify the "new" run time is stable, then try to replicate locally. Then bisect. |
I will try to check if Github changed anything on their runners, I pointed
to Scipy as the culprit because it is the obvious change and this happens
in QHull by the looks of it but it could also be hardware related.
Thomas Mansencal - colour-science.org <https://www.colour-science.org> -
thomasmansencal.com <https://www.thomasmansencal.com>
…On Tue, 30 Apr 2024 at 12:39 PM, Evgeni Burovski ***@***.***> wrote:
Maybe gh-20337 <#20337>?
No way it can explain the runtime change from 15min to 4 hrs.
Something else must have changed. I'd first verify the "new" run time is
stable, then try to replicate locally. Then bisect.
—
Reply to this email directly, view it on GitHub
<#20613 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAAYLQ57QVN4X3KAL3V6SUTY73R3XAVCNFSM6AAAAABG6542ZSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAOBTHE3DGOJYGQ>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
Confirming it is coming from 1.13.0, I banned the version and did a fresh build in ~15mins using 1.12.0: https://github.com/colour-science/colour/actions/runs/8890511725/job/24410682802 |
|
Hey @ev-br, It is Windows only and I cannot reproduce locally as I'm developing on macOs. |
Just to confirm: with scipy 1.13 the Windows CI run takes 4hrs and MacOS and Linux runs take 15 mins, correct? Also, with scipy 1.12, all of them are 15 mins? It's likely much faster to debug in a Windows VM. If only to see if the issue is specific to the GH runners or windows in general. |
Correct yes! |
Is this a wheel install or you are building it yourself in the CI? |
It is a wheel install, Scipy is one of our dependencies. |
Right. To clarify, the installation is fast, but the test suite takes 4 hours. |
There was a substantial bump in vendored OpenBLAS version between |
Agreed that OpenBLAS version is the prime candidate. We have nightlies that were just upgraded from |
Hello, I built a repro here: https://github.com/KelSolaar/qhull-regression Doing so I narrowed down the issue to what I think is the concurrent use of QHull during distributed unit tests with |
Ah, that is really helpful. That should then have been fixed by gh-20619 (EDIT: or a newer OpenBLAS version in the 1.14.0 wheels). Could you try with |
Are there any wheels on test.pypi or alike? |
They're on PyPI, like all beta/RC releases usually are. You can install with |
I tried with 1.14.0 and no dice: https://github.com/KelSolaar/qhull-regression/actions/runs/9675386482/job/26692806365 |
Hello @KelSolaar, Would you mind trying setting the environment variable OMP_NUM_THREADS=1 within your Windows CI? Let me explain why I think this might help. I tried running your example on my Windows computer, and I noticed something odd - I had limited pytest to 2 cores with To check this, I tried setting OMP_NUM_THREADS to 1, and it ran much faster:
Alternatively, if setting OMP_NUM_THREADS to 1 globally is not an option, an alternative would be to use threadpoolctl, which is a library which can use context managers to turn thread-level parallelism on and off within particular context managers. This could be used to selectively disable parallelism within particular tests, or even within parts of your library where parallelism is known to not help. Example of one of your tests adapted to use threadpoolctl:
By changing both of your tests to limit thread-level parallelism, I get similar performance to the OMP_NUM_THREADS option. Could you try one of those, and report whether that helps? |
Hey @nickodell, Thanks a ton! Confirming that setting the environment variable worked:
I think that in our use case, it makes sense to disable OMP parallelism globally for the unit tests. I'm not sure where this information should go and if it is already in the documentation but if not, it might be worth considering adding somewhere as this is a gold nugget! |
I'm not sure if it's documented anywhere. I know about it because I was bitten by this on a previous project. The closest docs page is probably Debugging linear algebra related issues. There's also the upstream documentation, although that's a little light on why you'd want to restrict the thread count. |
Describe your issue.
Hello,
We have seen our Windows Github Actions build times jumping from 15m to 4h15m after what seems an update from Scipy 1.12.0 to Scipy 1.13.0:
15m: https://github.com/colour-science/colour/actions/runs/8489605040/job/23259884769
4h15m: https://github.com/colour-science/colour/actions/runs/8845206377/job/24288653119
All the functions affected are touching Qhull via our
colour.volume.is_within_mesh_volume
definition:I saw a reference to a Qhull related change in the mailing list here: https://mail.python.org/archives/list/[email protected]/thread/V25I4CIN556ZKKBIA2Q2OLVFGVYVYDEB/ but not the release notes on Github.
Any pointers would be appreciated, we will probably ban this version.
Cheers,
Thomas
Reproducing Code Example
Error message
SciPy/NumPy/Python version and system information
The text was updated successfully, but these errors were encountered: