Running out of memory with julia v0.6 but not v0.5 #151
Comments
Could you please provide the output of …?
Hm. I thought that the commit hash would be there. Could you please provide the exact commit you are on?
Thanks. @amitmurthy I thought the leak was fixed with your latest commit. It would be great if you could take a look at this.
I could run this example on both Julia master and 0.6 without any leaks, with 4 workers and smaller-sized darrays. No issues with the following n values over 1000 iterations: …
Thanks for the assistance :)

    n1,n2,n3 = 2001,2001,701
    for i = 1:16
        dA = drand((n1,n2,n3), workers()[1:i]); dlA = similar(dA); test_gc(dA,dlA);
        d_closeall(); @everywhere gc();
        println(i);
        @fetchfrom 2 run(pipeline(`free`, stdout="bla", append=true))
    end

On … and.... out of memory. On … runs... Without the function call … I don't really understand why this happens, but I hope this helps...
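The helper `test_gc` is never shown in the thread. A minimal hypothetical sketch of what such a GC-stressing helper might look like (plain arrays here for self-containment; the fill-from-source pattern is the assumption, not the issue author's actual code):

```julia
# Hypothetical sketch of the thread's undefined `test_gc` helper:
# it fills the destination from the source while allocating a
# temporary on each call, so repeated calls exercise the collector.
function test_gc(dA, dlA)
    tmp = 2 .* dA .+ 1   # temporary that must be garbage-collected later
    dlA .= tmp           # write result into the preallocated destination
    return nothing
end
```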
I think it is a Julia issue rather than a darray one. Can you try with smaller n values but a larger number of iterations? I could not detect any leaks locally over 1000 iterations with n values …
For many iterations, the memory "oscillates"; it will only run out of memory when you strain the system enough with large …

    for i = 1:16
        n1,n2,n3 = 2001,201,71
        dA = drand((n1,n2,n3), workers()[1:i]); dlA = similar(dA); test_gc(dA,dlA);
        d_closeall(); @everywhere gc();
        println(i);
        @fetchfrom 2 run(`free`)
    end
1
From worker 2: total used free shared buffers cached
From worker 2: Mem: 65985900 3569984 62415916 0 0 272016
From worker 2: -/+ buffers/cache: 3297968 62687932
From worker 2: Swap: 0 0 0
2
From worker 2: total used free shared buffers cached
From worker 2: Mem: 65985900 3844832 62141068 0 0 272112
From worker 2: -/+ buffers/cache: 3572720 62413180
From worker 2: Swap: 0 0 0
3
From worker 2: total used free shared buffers cached
From worker 2: Mem: 65985900 3889336 62096564 0 0 272228
From worker 2: -/+ buffers/cache: 3617108 62368792
From worker 2: Swap: 0 0 0
4
From worker 2: total used free shared buffers cached
From worker 2: Mem: 65985900 3918320 62067580 0 0 272336
From worker 2: -/+ buffers/cache: 3645984 62339916
From worker 2: Swap: 0 0 0
5
From worker 2: total used free shared buffers cached
From worker 2: Mem: 65985900 3955036 62030864 0 0 272420
From worker 2: -/+ buffers/cache: 3682616 62303284
From worker 2: Swap: 0 0 0
6
From worker 2: total used free shared buffers cached
From worker 2: Mem: 65985900 4004760 61981140 0 0 272536
From worker 2: -/+ buffers/cache: 3732224 62253676
From worker 2: Swap: 0 0 0
7
From worker 2: total used free shared buffers cached
From worker 2: Mem: 65985900 4146980 61838920 0 0 272632
From worker 2: -/+ buffers/cache: 3874348 62111552
From worker 2: Swap: 0 0 0
8
From worker 2: total used free shared buffers cached
From worker 2: Mem: 65985900 4185032 61800868 0 0 272720
From worker 2: -/+ buffers/cache: 3912312 62073588
From worker 2: Swap: 0 0 0
9
From worker 2: total used free shared buffers cached
From worker 2: Mem: 65985900 4242176 61743724 0 0 272808
From worker 2: -/+ buffers/cache: 3969368 62016532
From worker 2: Swap: 0 0 0
10
From worker 2: total used free shared buffers cached
From worker 2: Mem: 65985900 4474444 61511456 0 0 272928
From worker 2: -/+ buffers/cache: 4201516 61784384
From worker 2: Swap: 0 0 0
11
From worker 2: total used free shared buffers cached
From worker 2: Mem: 65985900 4573988 61411912 0 0 273028
From worker 2: -/+ buffers/cache: 4300960 61684940
From worker 2: Swap: 0 0 0
12
From worker 2: total used free shared buffers cached
From worker 2: Mem: 65985900 4720300 61265600 0 0 273128
From worker 2: -/+ buffers/cache: 4447172 61538728
From worker 2: Swap: 0 0 0
13
From worker 2: total used free shared buffers cached
From worker 2: Mem: 65985900 4730508 61255392 0 0 273224
From worker 2: -/+ buffers/cache: 4457284 61528616
From worker 2: Swap: 0 0 0
14
From worker 2: total used free shared buffers cached
From worker 2: Mem: 65985900 4854036 61131864 0 0 273320
From worker 2: -/+ buffers/cache: 4580716 61405184
From worker 2: Swap: 0 0 0
15
From worker 2: total used free shared buffers cached
From worker 2: Mem: 65985900 4749008 61236892 0 0 273416
From worker 2: -/+ buffers/cache: 4475592 61510308
From worker 2: Swap: 0 0 0
16
From worker 2: total used free shared buffers cached
From worker 2: Mem: 65985900 4782880 61203020 0 0 273504
From worker 2: -/+ buffers/cache: 4509376 61476524
From worker 2: Swap: 0 0 0
Yes, that is why I think this is an issue with Julia rather than DistributedArrays. Does calling …?
nope... |
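For reference, the cleanup sequence used in the loops above (releasing darrays, then collecting on every worker) can be sketched as below. This is only a plausible reconstruction of the pattern, written with the Julia >= 1.0 `GC.gc` spelling:

```julia
using Distributed  # stdlib; provides @everywhere

# Force a full garbage-collection pass on the master and every worker.
# On Julia 0.6, as used in this thread, the call was `@everywhere gc()`;
# from Julia 1.0 onward it is spelled `GC.gc()`.
function gc_everywhere()
    @everywhere GC.gc()
    return nothing
end
```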
Reopen if relevant? |
Hello,

The code below is a minimal reproducible example that shows the behavior (the original code came up in an application I was writing). Sorry it is a bit convoluted, but on my machine it needed to be so to reproduce the error.

On julia 0.5 the code runs without running out of memory, but not on 0.6. I monitored the memory usage using top, and what happens is the following:

1- The total memory should be the same (about 60% on a 64 Gb node), split in i procs
2- Each function call creates temporary arrays that need to be garbage collected
3- As the number of procs increases, the code runs faster, as one would hope
4- For some reason, in 0.5 the memory de-allocation and garbage collection is faster than in 0.6
5- As a result, as memory is allocated for run i, residual memory from runs i-1,i-2,... is still being deallocated
6- Code runs out of memory...

I am not sure if this is expected behavior, or why 0.5 was more robust.

p.s: I am on master for DistributedArrays

Cheers!
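For anyone reproducing this, per-process memory can also be sampled from Julia itself rather than shelling out to `free` as the snippets above do. A small sketch, assuming Julia >= 0.7 (where `Sys.free_memory` is available); the function name is mine:

```julia
using Distributed  # stdlib; provides @fetchfrom and procs()

# Sample free system memory (in GiB) as observed from each process.
# Sys.free_memory() is available from Julia 0.7 onward; this thread's
# Julia 0.6 runs shelled out to `free` instead.
function worker_free_mem()
    return Dict(w => @fetchfrom(w, Sys.free_memory() / 2^30) for w in procs())
end
```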