NumPy seeding #100
The proposed solution relies on the assumption that worker #k always gets the same input, which in turn depends on OS scheduling and on how fast each worker processes its tasks; neither is the same across machines or across runs, right? This would break the main purpose of manual seeding (for me), which is to observe regressions between commits: between two runs, workers with different seeds would get different inputs. Another question: why did the commit that broke manual seeding (9f9e12f) seed only NumPy's RNG and not Python's and torch's as well?
torch takes care of reseeding its workers, but NumPy does not. Using the same seed in every worker is highly flawed: it produces biased results. If reproducibility is key, use only one worker. (Even with this flawed approach, there is no guarantee of equal outputs, as workers may run at different speeds.)
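The bias can be illustrated without torch. A minimal sketch (the `simulate_worker` helper is hypothetical, not the project's code): if every worker seeds NumPy with the same value, their "random" draws coincide exactly, so the workers sample identical augmentations instead of independent ones.

```python
import numpy as np

def simulate_worker(seed):
    # Hypothetical stand-in for one data-loading worker: each worker
    # reseeds with the same seed, as the flawed approach does.
    rng = np.random.RandomState(seed)
    return rng.rand(4)

# Three "workers" started with the same seed produce identical streams.
draws = [simulate_worker(1234) for _ in range(3)]
```

Every element of `draws` is the same array, which is exactly the biased behavior described above.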
I pushed a small change such that workers use different seeds across different data loaders (6afbf8d). Please close if this addresses the issue.
I had already addressed it. Your commit had a few flaws; see the comments on the commit.
Thanks, now applied to all data loaders.
bdb078f looks broken. Now every worker seems to get the same seed, and thus the same sequence of random numbers, when a manual seed is specified. This needs to be prevented, e.g., by reseeding based on the current seed and the worker number.
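A minimal sketch of the suggested fix, assuming a base seed is available to each worker (the function name and `base_seed` parameter are illustrative, not the commit's actual code): derive a distinct per-worker seed from the shared base seed, so streams differ between workers but stay reproducible across runs with the same manual seed.

```python
import numpy as np

def worker_init_fn(worker_id, base_seed=1234):
    # Reseed NumPy from the base seed plus the worker number, so each
    # worker gets its own reproducible stream instead of a shared one.
    np.random.seed((base_seed + worker_id) % 2**32)
```

With PyTorch, a function of this shape can be passed to `DataLoader` via its `worker_init_fn` argument, with the base seed taken from the manually seeded generator.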