NumPy seeding #100
The proposed solution relies on the assumption that worker #k always gets the same input, which in turn depends on OS scheduling and on how fast each worker processes its tasks; neither is the same across machines or across runs, right? This would break the main purpose of manual seeding (for me), which is to observe regressions between commits: between two runs, workers with different seeds would get different inputs. Another question: why did the commit that broke manual seeding (9f9e12f) seed only NumPy's RNG and not Python's and torch's as well?
torch takes care of reseeding its workers, but NumPy does not. Using the same seed in every worker is highly flawed: it produces biased results. If reproducibility is key, use only one worker. (Even with this flawed approach, there is no guarantee of equal outputs, as workers may run at different speeds.)
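The bias can be illustrated without torch. A minimal sketch (the `simulate_worker` helper is hypothetical, not the project's code): if every worker seeds NumPy with the same value, their "random" draws coincide exactly, so the workers sample identical augmentations instead of independent ones.

```python
import numpy as np

def simulate_worker(seed):
    # Hypothetical stand-in for one data-loading worker: each worker
    # reseeds with the same seed, as the flawed approach does.
    rng = np.random.RandomState(seed)
    return rng.rand(4)

# Three "workers" started with the same seed produce identical streams.
draws = [simulate_worker(1234) for _ in range(3)]
```

Every element of `draws` is the same array, which is exactly the biased behavior described above.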
I pushed a small change such that workers use different seeds across different data loaders (6afbf8d). Please close if this addresses the issue.
I had already addressed it. Your commit had a few flaws; see the comments on the commit.
Thanks, now applied to all data loaders.
bdb078f looks broken. Now every worker seems to get the same seed, and thus the same sequence of random numbers, when a manual seed is specified. This needs to be prevented, e.g., by reseeding based on the current seed and the worker number.
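A minimal sketch of the suggested fix, assuming a base seed is available to each worker (the function name and `base_seed` parameter are illustrative, not the commit's actual code): derive a distinct per-worker seed from the shared base seed, so streams differ between workers but stay reproducible across runs with the same manual seed.

```python
import numpy as np

def worker_init_fn(worker_id, base_seed=1234):
    # Reseed NumPy from the base seed plus the worker number, so each
    # worker gets its own reproducible stream instead of a shared one.
    np.random.seed((base_seed + worker_id) % 2**32)
```

With PyTorch, a function of this shape can be passed to `DataLoader` via its `worker_init_fn` argument, with the base seed taken from the manually seeded generator.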