
NumPy seeding #100

Closed
rgemulla opened this issue May 19, 2020 · 5 comments

@rgemulla (Member)

bdb078f looks broken. Now every worker seems to get the same seed, and thus the same sequence of random numbers, when a manual seed is specified. This needs to be prevented, e.g., by reseeding each worker based on the current seed and the worker number.
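A minimal sketch of such a per-worker reseeding, assuming a standard PyTorch `DataLoader` (the dataset and function names here are illustrative, not the project's actual code):

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset

class ToyDataset(Dataset):
    def __len__(self):
        return 8
    def __getitem__(self, idx):
        # Any numpy-based randomness here (e.g. negative sampling) now
        # differs per worker instead of repeating the same sequence.
        return idx, np.random.randint(0, 1000)

def worker_init_fn(worker_id):
    # Inside a worker, torch.initial_seed() returns base_seed + worker_id,
    # so it already differs per worker; fold it into numpy's 32-bit range.
    np.random.seed(torch.initial_seed() % 2**32)

if __name__ == "__main__":
    torch.manual_seed(0)  # the manual seed from the report
    loader = DataLoader(ToyDataset(), num_workers=2,
                        worker_init_fn=worker_init_fn)
    for idx, draw in loader:
        print(idx.item(), draw.item())  # draws no longer repeat across workers
```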

@samuelbroscheit (Member) commented May 19, 2020

The proposed solution relies on the assumption that worker #k always gets the same inputs. That in turn depends on OS scheduling and on how fast each worker processes its tasks, which will not be the same on different machines or in different runs, right? This would break the main purpose of manual seeding for me, which is to observe regressions between commits: between two runs, workers with different seeds would then get different inputs.

Another question: why did the commit that broke manual seeding (9f9e12f) only seed numpy, and not Python's and torch's RNGs as well?

@rgemulla (Member, Author)

torch takes care of reseeding its workers, but numpy does not. Using the same seed in every worker is highly flawed: it produces biased results, since all workers draw the same "random" numbers. If reproducibility is key, use only one worker. (Even with this flawed approach, there is no guarantee of equal outputs, as workers may run at different speeds.)
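To illustrate the flaw with a self-contained snippet (not project code): if every worker seeds numpy with the same manual seed, all workers draw the identical stream, so any numpy-based sampling is duplicated across workers.

```python
import numpy as np

# Three "workers" all seeded with the same manual seed draw identical
# sequences -- any numpy-based sampling is duplicated, biasing results.
seed = 42
draws = [np.random.RandomState(seed).randint(0, 100, size=5) for _ in range(3)]
print(draws)  # three identical arrays

# Mixing the worker id into the seed restores distinct streams:
draws = [np.random.RandomState(seed + w).randint(0, 100, size=5) for w in range(3)]
print(draws)  # three different arrays
```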

samuelbroscheit added a commit that referenced this issue May 20, 2020
@rgemulla (Member, Author)

I pushed a small change such that workers use different seeds across different data loaders (6afbf8d). Please close if this addresses the issue.
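One possible way to vary seeds across loaders, sketched under assumptions (this is not necessarily what 6afbf8d does; the offset constant and the `loader_index` parameter are hypothetical):

```python
import numpy as np
import torch

# Hypothetical sketch: give each data loader its own offset so that
# worker k of loader A and worker k of loader B use different numpy
# streams, while remaining deterministic under a fixed manual seed.
def make_worker_init_fn(loader_index):
    def worker_init_fn(worker_id):
        np.random.seed((torch.initial_seed() + 1000003 * loader_index) % 2**32)
    return worker_init_fn

train_init = make_worker_init_fn(0)  # pass as worker_init_fn= to each loader
valid_init = make_worker_init_fn(1)
```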

@samuelbroscheit (Member)

I had already addressed it. Your commit had a few flaws; see the comments on the commit.

@rgemulla (Member, Author)

Thanks, now applied to all data loaders.
