Creating CuArray{Tracker.TrackedReal{Float64},1} a few times causes segfaults #121

Closed
tkf opened this issue Aug 20, 2019 · 3 comments
Labels: bug (Something isn't working), cuda array (Stuff about CuArray)

Comments

@tkf
Contributor

tkf commented Aug 20, 2019

I'm not sure if I should post it in Flux/Tracker or CuArrays. Please let me know if I should report it there.

Describe the bug

Running sum(CuArray(param.(ones(10)))) a few times (typically 2 or 3) causes a segfault. It does not happen if I run GC.enable(false) first.
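
To make the GC observation easier to repeat, the toggle can be wrapped in a small helper. This is only a diagnostic sketch, not a fix: `with_gc_disabled` is a hypothetical name, and the commented-out call assumes the same Flux v0.8 / CuArrays v1.1 setup as in this report. `GC.enable` returns the previous GC state, which the helper restores afterwards.

```julia
# Hypothetical helper: run `f` with the garbage collector switched off,
# then restore whatever GC state was active before.
function with_gc_disabled(f)
    prev = GC.enable(false)  # returns the previous GC state (true/false)
    try
        return f()
    finally
        GC.enable(prev)      # restore the earlier state even if `f` throws
    end
end

# CPU-only check that the helper itself works:
with_gc_disabled(() -> sum(ones(10)))

# With the packages from this report loaded, the failing call would be
# wrapped like this (assumption, requires a CUDA GPU):
# with_gc_disabled(() -> sum(CuArray(param.(ones(10)))))
```

If the crash disappears inside the helper and returns outside it, that points at an interaction with garbage collection rather than at the reduction itself.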

To Reproduce

julia> using CuArrays, Flux

julia> sum(CuArray(param.(ones(10))))
10.0 (tracked)

julia> sum(CuArray(param.(ones(10))))
10.0 (tracked)

julia> sum(CuArray(param.(ones(10))))
signal (11): Segmentation fault
in expression starting at no file:0
jl_subtype_env at /buildworker/worker/package_linux64/build/src/subtype.c:1174
abstract_call at ./compiler/abstractinterpretation.jl:577
abstract_eval_call at ./compiler/abstractinterpretation.jl:805
abstract_eval at ./compiler/abstractinterpretation.jl:890
typeinf_local at ./compiler/abstractinterpretation.jl:1135
typeinf_nocycle at ./compiler/abstractinterpretation.jl:1191
typeinf at ./compiler/typeinfer.jl:14
typeinf_ext at ./compiler/typeinfer.jl:576
typeinf_ext at ./compiler/typeinfer.jl:613
jfptr_typeinf_ext_1.clone_1 at /home/takafumi/opt/julia/julia-1.1.1/lib/julia/sys.so (unknown line)
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2197
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1571 [inlined]
jl_type_infer at /buildworker/worker/package_linux64/build/src/gf.c:255
jl_compile_method_internal at /buildworker/worker/package_linux64/build/src/gf.c:1797 [inlined]
jl_fptr_trampoline at /buildworker/worker/package_linux64/build/src/gf.c:1841
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2197
prompt! at ./logging.jl:320
run_interface at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/REPL/src/LineEdit.jl:2268
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2197
run_frontend at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/REPL/src/REPL.jl:1035
run_repl at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/REPL/src/REPL.jl:192
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2197
#734 at ./client.jl:362
jfptr_#734_6048.clone_1 at /home/takafumi/opt/julia/julia-1.1.1/lib/julia/sys.so (unknown line)
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2197
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1571 [inlined]
jl_f__apply at /buildworker/worker/package_linux64/build/src/builtins.c:556
jl_f__apply_latest at /buildworker/worker/package_linux64/build/src/builtins.c:594
#invokelatest#1 at ./essentials.jl:742 [inlined]
invokelatest at ./essentials.jl:741 [inlined]
run_main_repl at ./client.jl:346
exec_options at ./client.jl:284
_start at ./client.jl:436
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2197
unknown function (ip: 0x40191d)
unknown function (ip: 0x401523)
__libc_start_main at /build/glibc-Cl5G7W/glibc-2.23/csu/../csu/libc-start.c:291
unknown function (ip: 0x4015c4)
Allocations: 49584959 (Pool: 49579106; Big: 5853); GC: 99

Expected behavior
I expect Julia to not die.

Build log

(v1.1) pkg> build CuArrays
  Building CUDAnative → `~/.julia/packages/CUDAnative/nItlk/deps/build.log`
  Building Conda ─────→ `~/.julia/packages/Conda/kLXeC/deps/build.log`
  Building FFTW ──────→ `~/.julia/packages/FFTW/2okGQ/deps/build.log`

shell> cat ~/.julia/packages/CUDAnative/nItlk/deps/build.log

Environment details
Details on Julia:

julia> versioninfo()
Julia Version 1.1.1
Commit 55e36cc308 (2019-05-16 04:10 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, broadwell)

(v1.1) pkg> status Flux
    Status `~/.julia/environments/v1.1/Project.toml`
  [1520ce14] AbstractTrees v0.2.1
  [79e6a3ab] Adapt v1.0.0
  [944b1d66] CodecZlib v0.5.2
  [5ae59095] Colors v0.9.5
  [587475ba] Flux v0.8.3
  [e5e0dc1b] Juno v0.7.0
  [1914dd2f] MacroTools v0.5.1
  [872c559c] NNlib v0.6.0
  [189a3867] Reexport v0.2.0
  [ae029012] Requires v0.5.2
  [2913bbd2] StatsBase v0.31.0
  [9f7883ad] Tracker v0.2.2
  [8bb1440f] DelimitedFiles 

(v1.1) pkg> status CuArrays
    Status `~/.julia/environments/v1.1/Project.toml`
  [79e6a3ab] Adapt v1.0.0
  [be33ccc6] CUDAnative v2.2.1
  [3a865a2d] CuArrays v1.1.0
  [1914dd2f] MacroTools v0.5.1
  [872c559c] NNlib v0.6.0
  [ae029012] Requires v0.5.2

julia> println(read(joinpath(dirname(dirname(pathof(CUDAnative))), "deps", "ext.jl"), String))
# autogenerated file, do not edit
const libcudadevrt = "/usr/local/cuda-10.0/targets/x86_64-linux/lib/libcudadevrt.a"
const ptx_support = VersionNumber[v"3.2.0", v"4.0.0", v"4.1.0", v"4.2.0", v"4.3.0", v"5.0.0"]
const configured = true
const libdevice = Dict(v"3.0.0"=>"/usr/local/cuda-8.0/nvvm/libdevice/libdevice.compute_30.10.bc",v"3.5.0"=>"/usr/local/cuda-8.0/nvvm/libdevice/libdevice.compute_35.10.bc",v"5.0.0"=>"/usr/local/cuda-8.0/nvvm/libdevice/libdevice.compute_50.10.bc")
const nvdisasm = "/usr/local/cuda-8.0/bin/nvdisasm"
const target_support = VersionNumber[v"3.0.0", v"3.2.0", v"3.5.0", v"3.7.0", v"5.0.0", v"5.2.0", v"5.3.0", v"6.0.0", v"6.1.0", v"6.2.0"]
const cuda_driver_version = v"10.1.0"
const ptxas = "/usr/local/cuda-8.0/bin/ptxas"
@maleadt
Member

maleadt commented Aug 23, 2019

> I expect Julia to not die.

That's fair.

Can reproduce, happens on about the third call, probably during or right after a GC run.

I'd guess this is a GPU pointer leaking into the CPU GC -- but I'm not sure how that could happen, since we've really locked down all of those conversions (FluxML/Flux.jl#581).
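
One way a pointer could end up in the wrong place is through the element type itself. The sketch below uses stand-in types (`FakeTracker` and `FakeTrackedReal` are hypothetical, written under the assumption that a tracked scalar stores a value plus a reference to a mutable tracker object) to show that such a type is not a bits type, so each element carries a pointer to a GC-managed object instead of plain inline data. Whether this is the actual cause here is not confirmed in this thread.

```julia
# Stand-ins for illustration only; these are not the Tracker.jl definitions.
mutable struct FakeTracker
    grad::Float64
end

struct FakeTrackedReal
    data::Float64
    tracker::FakeTracker  # reference to a heap-allocated, GC-managed object
end

isbitstype(Float64)          # true: plain inline data
isbitstype(FakeTrackedReal)  # false: holds a reference to a mutable object

# Hypothetical guard one could run before uploading an array to the GPU.
function check_bits_eltype(xs::AbstractArray)
    isbitstype(eltype(xs)) ||
        throw(ArgumentError("element type $(eltype(xs)) is not a bits type"))
    return xs
end
```

Calling `check_bits_eltype(ones(10))` returns the array unchanged, while an array of `FakeTrackedReal` values raises an `ArgumentError` before any device copy would be attempted.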

@maleadt
Member

maleadt commented Aug 23, 2019

Running with a gc debug build triggers an assertion, but no verification error:

               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.2.0 (2019-08-20)
 _/ |\__'_|_|_|\__'_|  |  
|__/                   |

julia> using CuArrays, Flux
[ Info: Recompiling stale cache file /home/tim/Julia/depot/compiled/v1.2/CuArrays/7YFE0.ji for CuArrays [3a865a2d-5b23-5a0f-bc46-62713ec82fae]
[ Info: Recompiling stale cache file /home/tim/Julia/depot/compiled/v1.2/Flux/QdkVy.ji for Flux [587475ba-b771-5e3f-ad9e-33799f191a9c]

julia> sum(CuArray(param.(ones(10))))
julia: /home/tim/Julia/julia-1.2/src/gc.c:1127: jl_value_t *jl_gc_pool_alloc(jl_ptls_t, int, int): Assertion `pg->osize == p->osize' failed.

maleadt transferred this issue from JuliaGPU/CuArrays.jl on May 27, 2020
maleadt added the bug and cuda array labels on May 27, 2020
@maleadt
Member

maleadt commented Apr 27, 2024

Flux and CUDA.jl have changed a lot since this issue was filed, so going to close this as stale. Feel free to open a new issue if this still happens.

maleadt closed this as completed on Apr 27, 2024