
Cannot precompile GPU code with PrecompileTools #2006

Closed
beorostica opened this issue Jul 21, 2023 · 5 comments
Labels
bug (Something isn't working) · upstream (Somebody else's problem)

Comments

@beorostica

I'm using PrecompileTools to precompile some functions that use CUDA in a repo I'm working on (KomaMRICore).
In particular, when I precompile the simulate() function in my development environment with the GPU enabled, like so:

module KomaMRICore
...
@setup_workload begin
    obj = brain_phantom2D()
    sys = Scanner()
    seq = read_seq(joinpath(dirname(pathof(KomaMRICore)), "../../examples/3.koma_paper/comparison_accuracy/sequences/EPI/epi_100x100_TE100_FOV230.seq"))
    simParams = KomaMRICore.default_sim_params()
    simParams["gpu"] = true
    @compile_workload begin
        raw = simulate(obj, seq, sys; simParams)
    end
end
end

and then perform the same workload in the Julia REPL of my development environment:

julia> using KomaMRICore
julia> obj = brain_phantom2D();
julia> sys = Scanner();
julia> seq = read_seq(joinpath(dirname(pathof(KomaMRICore)), "../../examples/3.koma_paper/comparison_accuracy/sequences/EPI/epi_100x100_TE100_FOV230.seq"));
julia> simParams = KomaMRICore.default_sim_params();
julia> simParams["gpu"] = true;
julia> raw = simulate(obj, seq, sys; simParams);
I get the following "JIT session error: ..." after manipulating some data from the "raw" object:
julia> abs.(raw.profiles[1].data)
JIT session error: Symbols not found: [ __nv_hypotf ]
JIT session error: Symbols not found: [ __nv_hypotf ]
...
100×1 Matrix{Float32}:
  7.2798567
 19.101593
...


Note that this problem doesn't show up when the CPU is used instead of the GPU (i.e., by setting simParams["gpu"] = false).

This problem seems to be related to CUDA issue #1870, which was solved by changes made directly in the Julia repo (apparently those are already part of 1.9.0-rc3, see SnoopCompile issue #338, so it should work on Julia 1.9.2, though I'm not completely sure).

Any suggestions for solving this, or pointers on how to continue debugging this issue?
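
In the meantime, the workaround I'm considering (just a sketch, not verified to fix the underlying JIT problem) is to keep the GPU path out of the precompile workload entirely and only enable it at runtime:

module KomaMRICore
# ... package code ...
@setup_workload begin
    obj = brain_phantom2D()
    sys = Scanner()
    seq = read_seq(joinpath(dirname(pathof(KomaMRICore)), "../../examples/3.koma_paper/comparison_accuracy/sequences/EPI/epi_100x100_TE100_FOV230.seq"))
    simParams = KomaMRICore.default_sim_params()
    # Precompile only the CPU path; GPU kernels are then compiled on
    # first use at runtime, so no GPU code ends up in the cache.
    simParams["gpu"] = false
    @compile_workload begin
        raw = simulate(obj, seq, sys; simParams)
    end
end
end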

@beorostica added the bug (Something isn't working) label on Jul 21, 2023
@maleadt
Member

maleadt commented Jul 22, 2023

cc @vchuravy

@beorostica changed the title from "Cannot precompile GPU code with SnoopPrecompile" to "Cannot precompile GPU code with PrecompileTools" on Jul 24, 2023
@RomeoV
Contributor

RomeoV commented Aug 5, 2023

Same here, for a Flux.jl / Metalhead.jl precompilation workload on Julia 1.10.0-beta1:

module FastAIStartup
using FastAI, FastVision, Metalhead
import FastVision: RGB, N0f8

import PrecompileTools: @setup_workload, @compile_workload
@setup_workload begin
    labels = ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"]
    @compile_workload begin
        data = ([rand(RGB{N0f8}, 32, 32) for _ in 1:100],
                [rand(labels) for _ in 1:100])
        blocks = (Image{2}(), FastAI.Label{String}(labels))
        task = ImageClassificationSingle(blocks)
        learner = tasklearner(task, data, backbone=ResNet(18).layers[1])
        fitonecycle!(learner, 2)
    end
end
end # module FastAIStartup
See the stacktrace:
(FastAIStartup) pkg> precompile
Precompiling project...
  ✗ FastAIStartup
  0 dependencies successfully precompiled in 32 seconds. 297 already precompiled.

ERROR: The following 1 direct dependency failed to precompile:

FastAIStartup [bf55ac65-409a-4d86-bfc7-3fe70994b7f0]

Failed to precompile FastAIStartup [bf55ac65-409a-4d86-bfc7-3fe70994b7f0] to "/home/romeo/.julia/compiled/v1.10/FastAIStartup/jl_uOXydd".
ERROR: LoadError: LLVM error: Symbol name with unsupported characters
Stacktrace:
   [1] handle_error(reason::Cstring)
     @ LLVM ~/.julia/packages/LLVM/Od0DH/src/core/context.jl:134
   [2] LLVMTargetMachineEmitToMemoryBuffer(T::LLVM.TargetMachine, M::LLVM.Module, codegen::LLVM.API.LLVMCodeGenFileType, ErrorMessage::Base.RefValue{…}, OutMemBuf::Base.RefValue{…})
     @ LLVM.API ~/.julia/packages/LLVM/Od0DH/lib/15/libLLVM_h.jl:4326
   [3] emit(tm::LLVM.TargetMachine, mod::LLVM.Module, filetype::LLVM.API.LLVMCodeGenFileType)
     @ LLVM ~/.julia/packages/LLVM/Od0DH/src/targetmachine.jl:45
   [4] mcgen(job::GPUCompiler.CompilerJob, mod::LLVM.Module, format::LLVM.API.LLVMCodeGenFileType)
     @ GPUCompiler ~/.julia/packages/GPUCompiler/YO8Uj/src/mcgen.jl:72
   [5] macro expansion
     @ GPUCompiler ~/.julia/packages/TimerOutputs/RsWnF/src/TimerOutput.jl:253 [inlined]
   [6] macro expansion
     @ GPUCompiler ~/.julia/packages/GPUCompiler/YO8Uj/src/driver.jl:432 [inlined]
   [7] macro expansion
     @ GPUCompiler ~/.julia/packages/TimerOutputs/RsWnF/src/TimerOutput.jl:253 [inlined]
   [8] macro expansion
     @ GPUCompiler ~/.julia/packages/GPUCompiler/YO8Uj/src/driver.jl:429 [inlined]
   [9] 
     @ GPUCompiler ~/.julia/packages/GPUCompiler/YO8Uj/src/utils.jl:89
  [10] emit_asm
     @ GPUCompiler ~/.julia/packages/GPUCompiler/YO8Uj/src/utils.jl:83 [inlined]
  [11] 
     @ GPUCompiler ~/.julia/packages/GPUCompiler/YO8Uj/src/driver.jl:149
  [12] codegen
     @ GPUCompiler ~/.julia/packages/GPUCompiler/YO8Uj/src/driver.jl:110 [inlined]
  [13] 
     @ GPUCompiler ~/.julia/packages/GPUCompiler/YO8Uj/src/driver.jl:106
  [14] compile
     @ GPUCompiler ~/.julia/packages/GPUCompiler/YO8Uj/src/driver.jl:98 [inlined]
  [15] #1037
     @ CUDA ~/.julia/packages/CUDA/tVtYo/src/compiler/compilation.jl:104 [inlined]
  [16] JuliaContext(f::CUDA.var"#1037#1040"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}})
     @ GPUCompiler ~/.julia/packages/GPUCompiler/YO8Uj/src/driver.jl:47
  [17] compile(job::GPUCompiler.CompilerJob)
     @ CUDA ~/.julia/packages/CUDA/tVtYo/src/compiler/compilation.jl:103
  [18] actual_compilation(cache::Dict{…}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{…}, compiler::typeof(CUDA.compile), linker::typeof(CUDA.link))
     @ GPUCompiler ~/.julia/packages/GPUCompiler/YO8Uj/src/execution.jl:125
  [19] cached_compilation(cache::Dict{…}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{…}, compiler::Function, linker::Function)
     @ GPUCompiler ~/.julia/packages/GPUCompiler/YO8Uj/src/execution.jl:103
  [20] macro expansion
     @ CUDA ~/.julia/packages/CUDA/tVtYo/src/compiler/execution.jl:318 [inlined]
  [21] macro expansion
     @ CUDA ./lock.jl:267 [inlined]
  [22] cufunction(f::GPUArrays.var"#broadcast_kernel#26", tt::Type{Tuple{…}}; kwargs::@Kwargs{})
     @ CUDA ~/.julia/packages/CUDA/tVtYo/src/compiler/execution.jl:313
  [23] cufunction
     @ CUDA ~/.julia/packages/CUDA/tVtYo/src/compiler/execution.jl:310 [inlined]
  [24] macro expansion
     @ CUDA ~/.julia/packages/CUDA/tVtYo/src/compiler/execution.jl:104 [inlined]
  [25] #launch_heuristic#1080
     @ CUDA ~/.julia/packages/CUDA/tVtYo/src/gpuarrays.jl:17 [inlined]
  [26] launch_heuristic
     @ CUDA ~/.julia/packages/CUDA/tVtYo/src/gpuarrays.jl:15 [inlined]
  [27] _copyto!
     @ GPUArrays ~/.julia/packages/GPUArrays/5XhED/src/host/broadcast.jl:65 [inlined]
  [28] copyto!
     @ GPUArrays ~/.julia/packages/GPUArrays/5XhED/src/host/broadcast.jl:46 [inlined]
  [29] copy
     @ GPUArrays ~/.julia/packages/GPUArrays/5XhED/src/host/broadcast.jl:37 [inlined]
  [30] materialize
     @ Base.Broadcast ./broadcast.jl:903 [inlined]
  [31] broadcast_preserving_zero_d
     @ Base.Broadcast ./broadcast.jl:892 [inlined]
  [32] +(A::CUDA.CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, B::CUDA.CuArray{Float32, 4, CUDA.Mem.DeviceBuffer})
     @ Base ./arraymath.jl:8
  [33] add_sum
     @ Base ./reduce.jl:24 [inlined]
  [34] BottomRF
     @ Base ./reduce.jl:86 [inlined]
  [35] afoldl
     @ Base ./operators.jl:543 [inlined]
  [36] _foldl_impl
     @ Base ./reduce.jl:68 [inlined]
  [37] foldl_impl
     @ Base ./reduce.jl:48 [inlined]
  [38] mapfoldl_impl
     @ Base ./reduce.jl:44 [inlined]
  [39] mapfoldl
     @ Base ./reduce.jl:175 [inlined]
  [40] mapreduce
     @ Base ./reduce.jl:307 [inlined]
  [41] sum
     @ Base ./reduce.jl:535 [inlined]
  [42] sum
     @ Base ./reduce.jl:564 [inlined]
  [43] rrule
     @ ChainRules ~/.julia/packages/ChainRules/9sNmB/src/rulesets/Base/mapreduce.jl:25 [inlined]
  [44] rrule
     @ ChainRulesCore ~/.julia/packages/ChainRulesCore/0t04l/src/rules.jl:134 [inlined]
  [45] chain_rrule
     @ Zygote ~/.julia/packages/Zygote/JeHtr/src/compiler/chainrules.jl:223 [inlined]
  [46] macro expansion
     @ Zygote ~/.julia/packages/Zygote/JeHtr/src/compiler/interface2.jl:0 [inlined]
  [47] _pullback
     @ Zygote ~/.julia/packages/Zygote/JeHtr/src/compiler/interface2.jl:81 [inlined]
  [48] addact
     @ Metalhead ~/.julia/packages/Metalhead/qOYEz/src/utilities.jl:19 [inlined]
  [49] _apply
     @ Core ./boot.jl:836 [inlined]
  [50] adjoint
     @ Zygote ~/.julia/packages/Zygote/JeHtr/src/lib/lib.jl:203 [inlined]
  [51] _pullback
     @ Zygote ~/.julia/packages/ZygoteRules/OgCVT/src/adjoint.jl:66 [inlined]
  [52] #_#1
     @ Zygote ~/.julia/packages/PartialFunctions/LzDRN/src/PartialFunctions.jl:24 [inlined]
  [53] _pullback(::Zygote.Context{…}, ::PartialFunctions.var"##_#1", ::@Kwargs{}, ::PartialFunctions.PartialFunction{…}, ::CUDA.CuArray{…}, ::CUDA.CuArray{…})
     @ Zygote ~/.julia/packages/Zygote/JeHtr/src/compiler/interface2.jl:0
  [54] _apply(::Function, ::Vararg{Any})
     @ Core ./boot.jl:836
  [55] adjoint
     @ Zygote ~/.julia/packages/Zygote/JeHtr/src/lib/lib.jl:203 [inlined]
  [56] _pullback
     @ Zygote ~/.julia/packages/ZygoteRules/OgCVT/src/adjoint.jl:66 [inlined]
  [57] PartialFunction
     @ Zygote ~/.julia/packages/PartialFunctions/LzDRN/src/PartialFunctions.jl:24 [inlined]
  [58] _pullback(::Zygote.Context{…}, ::PartialFunctions.PartialFunction{…}, ::CUDA.CuArray{…}, ::CUDA.CuArray{…})
     @ Zygote ~/.julia/packages/Zygote/JeHtr/src/compiler/interface2.jl:0
  [59] _apply(::Function, ::Vararg{Any})
     @ Core ./boot.jl:836
  [60] adjoint
     @ Zygote ~/.julia/packages/Zygote/JeHtr/src/lib/lib.jl:203 [inlined]
  [61] _pullback
     @ Zygote ~/.julia/packages/ZygoteRules/OgCVT/src/adjoint.jl:66 [inlined]
  [62] Parallel
     @ Zygote ~/.julia/packages/Flux/n3cOc/src/layers/basic.jl:527 [inlined]
  [63] _pullback(ctx::Zygote.Context{…}, f::Flux.Parallel{…}, args::CUDA.CuArray{…})
     @ Zygote ~/.julia/packages/Zygote/JeHtr/src/compiler/interface2.jl:0
  [64] macro expansion
     @ Flux ~/.julia/packages/Flux/n3cOc/src/layers/basic.jl:53 [inlined]
  [65] _applychain
     @ Zygote ~/.julia/packages/Flux/n3cOc/src/layers/basic.jl:53 [inlined]
  [66] _pullback(::Zygote.Context{…}, ::typeof(Flux._applychain), ::Tuple{…}, ::CUDA.CuArray{…})
     @ Zygote ~/.julia/packages/Zygote/JeHtr/src/compiler/interface2.jl:0
  [67] Chain
     @ Zygote ~/.julia/packages/Flux/n3cOc/src/layers/basic.jl:51 [inlined]
--- the last 5 lines are repeated 2 more times ---
  [78] _pullback(ctx::Zygote.Context{true}, f::Flux.Chain{Tuple{…}}, args::CUDA.CuArray{Float32, 4, CUDA.Mem.DeviceBuffer})
     @ Zygote ~/.julia/packages/Zygote/JeHtr/src/compiler/interface2.jl:0
  [79] #74
     @ Zygote ~/.julia/packages/FluxTraining/vSpWs/src/training.jl:54 [inlined]
  [80] _pullback(ctx::Zygote.Context{…}, f::FluxTraining.var"#74#76"{}, args::Flux.Chain{…})
     @ Zygote ~/.julia/packages/Zygote/JeHtr/src/compiler/interface2.jl:0
  [81] #77
     @ Zygote ~/.julia/packages/FluxTraining/vSpWs/src/training.jl:70 [inlined]
  [82] _pullback(::Zygote.Context{true}, ::FluxTraining.var"#77#78"{FluxTraining.var"#74#76"{}, Flux.Chain{}})
     @ Zygote ~/.julia/packages/Zygote/JeHtr/src/compiler/interface2.jl:0
  [83] pullback(f::Function, ps::Zygote.Params{Zygote.Buffer{Any, Vector{Any}}})
     @ Zygote ~/.julia/packages/Zygote/JeHtr/src/compiler/interface.jl:384
  [84] gradient(f::Function, args::Zygote.Params{Zygote.Buffer{Any, Vector{Any}}})
     @ Zygote ~/.julia/packages/Zygote/JeHtr/src/compiler/interface.jl:96
  [85] _gradient(f::FluxTraining.var"#74#76"{}, ::Flux.Optimise.Adam, m::Flux.Chain{…}, ps::Zygote.Params{…})
     @ FluxTraining ~/.julia/packages/FluxTraining/vSpWs/src/training.jl:70
  [86] (::FluxTraining.var"#73#75"{})(handle::FluxTraining.var"#handlefn#82"{}, state::FluxTraining.PropDict{…})
     @ FluxTraining ~/.julia/packages/FluxTraining/vSpWs/src/training.jl:53
  [87] runstep(stepfn::FluxTraining.var"#73#75"{}, learner::FluxTraining.Learner, phase::FluxTraining.Phases.TrainingPhase, initialstate::@NamedTuple{})
     @ FluxTraining ~/.julia/packages/FluxTraining/vSpWs/src/training.jl:133
  [88] step!(learner::FluxTraining.Learner, phase::FluxTraining.Phases.TrainingPhase, batch::Tuple{…})
     @ FluxTraining ~/.julia/packages/FluxTraining/vSpWs/src/training.jl:51
  [89] (::FluxTraining.var"#71#72"{FluxTraining.Learner, FluxTraining.Phases.TrainingPhase, MLUtils.DataLoader{}})(::Function)
     @ FluxTraining ~/.julia/packages/FluxTraining/vSpWs/src/training.jl:24
  [90] runepoch(epochfn::FluxTraining.var"#71#72"{}, learner::FluxTraining.Learner, phase::FluxTraining.Phases.TrainingPhase)
     @ FluxTraining ~/.julia/packages/FluxTraining/vSpWs/src/training.jl:105
  [91] epoch!
     @ FluxTraining ~/.julia/packages/FluxTraining/vSpWs/src/training.jl:22 [inlined]
  [92] (::FastAI.var"#157#159"{Tuple{Pair{}, Pair{}}, FluxTraining.Learner, Int64})()
     @ FastAI ~/.julia/packages/FastAI/f27xT/src/training/onecycle.jl:31
  [93] withcallbacks(f::FastAI.var"#157#159"{}, learner::FluxTraining.Learner, callbacks::FluxTraining.Scheduler)
     @ FastAI ~/.julia/packages/FastAI/f27xT/src/training/utils.jl:77
  [94] #156
     @ FastAI ~/.julia/packages/FastAI/f27xT/src/training/onecycle.jl:28 [inlined]
  [95] withfields(f::FastAI.var"#156#158"{}, x::FluxTraining.Learner; kwargs::@Kwargs{})
     @ FastAI ~/.julia/packages/FastAI/f27xT/src/training/utils.jl:52
  [96] fitonecycle!(learner::FluxTraining.Learner, nepochs::Int64, maxlr::Float64; phases::Tuple{…}, wd::Float64, kwargs::@Kwargs{})
     @ FastAI ~/.julia/packages/FastAI/f27xT/src/training/onecycle.jl:27
  [97] fitonecycle!(learner::FluxTraining.Learner, nepochs::Int64, maxlr::Float64)
     @ FastAI ~/.julia/packages/FastAI/f27xT/src/training/onecycle.jl:16
  [98] macro expansion
     @ ~/Documents/julia_playground/FastAIStartup.jl/src/FastAIStartup.jl:14 [inlined]
  [99] macro expansion
     @ ~/.julia/packages/PrecompileTools/0yi7r/src/workloads.jl:74 [inlined]
 [100] macro expansion
     @ ~/Documents/julia_playground/FastAIStartup.jl/src/FastAIStartup.jl:8 [inlined]
 [101] macro expansion
     @ ~/.julia/packages/PrecompileTools/0yi7r/src/workloads.jl:136 [inlined]
 [102] top-level scope
     @ ~/Documents/julia_playground/FastAIStartup.jl/src/FastAIStartup.jl:6
 [103] include
     @ Base ./Base.jl:489 [inlined]
 [104] include_package_for_output(pkg::Base.PkgId, input::String, depot_path::Vector{…}, dl_load_path::Vector{…}, load_path::Vector{…}, concrete_deps::Vector{…}, source::Nothing)
     @ Base ./loading.jl:2216
in expression starting at /home/romeo/Documents/julia_playground/FastAIStartup.jl/src/FastAIStartup.jl:1
in expression starting at stdin:3
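
As a stopgap until this is fixed, I'm skipping the workload on machines where it crashes by gating it behind an environment variable (FASTAISTARTUP_PRECOMPILE is a hypothetical opt-in flag I made up; any name would do):

module FastAIStartup
using FastAI, FastVision, Metalhead
import FastVision: RGB, N0f8
import PrecompileTools: @setup_workload, @compile_workload

# Opt in to the (currently crashing) GPU precompile workload explicitly;
# by default the module precompiles without running it.
if get(ENV, "FASTAISTARTUP_PRECOMPILE", "false") == "true"
    @setup_workload begin
        labels = ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"]
        @compile_workload begin
            data = ([rand(RGB{N0f8}, 32, 32) for _ in 1:100],
                    [rand(labels) for _ in 1:100])
            blocks = (Image{2}(), FastAI.Label{String}(labels))
            task = ImageClassificationSingle(blocks)
            learner = tasklearner(task, data, backbone=ResNet(18).layers[1])
            fitonecycle!(learner, 2)
        end
    end
end
end # module FastAIStartup

If I remember correctly, PrecompileTools also ships a Preferences-based switch to disable workloads, which would be cleaner than a custom flag.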

@maleadt added the upstream (Somebody else's problem) label on Aug 21, 2023
@RomeoV
Contributor

RomeoV commented Sep 26, 2023

FYI: this seems to be fixed now. I haven't run extensive tests, but the code snippet I posted above runs.

julia> versioninfo()
Julia Version 1.10.0-beta2
Commit a468aa198d0 (2023-08-17 06:27 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 16 × 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, tigerlake)
  Threads: 1 on 16 virtual cores

(FastAIStartup) pkg> st
Project FastAIStartup v0.1.0
Status `~/Documents/julia_playground/FastAIStartup.jl/Project.toml`
⌃ [5d0beca9] FastAI v0.5.1
  [7bf02486] FastVision v0.1.1
⌅ [587475ba] Flux v0.13.17
⌃ [dbeba491] Metalhead v0.8.2
⌃ [aea7be01] PrecompileTools v1.1.2
  [02a925ec] cuDNN v1.1.0 `https://github.com/JuliaGPU/CUDA.jl.git:lib/cudnn#master`

@maleadt
Member

maleadt commented Sep 26, 2023

Great, thanks for reporting back!

@maleadt maleadt closed this as completed Sep 26, 2023
@beorostica
Author

Thank you @RomeoV!
It is working with Julia 1.10.0-beta2 here too.
