
Cannot precompile GPU code with PrecompileTools #2006

Closed
beorostica opened this issue Jul 21, 2023 · 5 comments
Labels
bug (Something isn't working) · upstream (Somebody else's problem)

Comments

@beorostica

I'm using PrecompileTools to precompile some functions that use CUDA in a repo I'm working on (KomaMRICore).
In particular, when I precompile the simulate() function in my development environment with the GPU enabled, like so:

module KomaMRICore
...
@setup_workload begin
    obj = brain_phantom2D()
    sys = Scanner()
    seq = read_seq(joinpath(dirname(pathof(KomaMRICore)), "../../examples/3.koma_paper/comparison_accuracy/sequences/EPI/epi_100x100_TE100_FOV230.seq"))
    simParams = KomaMRICore.default_sim_params()
    simParams["gpu"] = true
    @compile_workload begin
        raw = simulate(obj, seq, sys; simParams)
    end
end
end

and then perform the same workload in the Julia REPL of my development environment:

julia> using KomaMRICore
julia> obj = brain_phantom2D();
julia> sys = Scanner();
julia> seq = read_seq(joinpath(dirname(pathof(KomaMRICore)), "../../examples/3.koma_paper/comparison_accuracy/sequences/EPI/epi_100x100_TE100_FOV230.seq"));
julia> simParams = KomaMRICore.default_sim_params();
julia> simParams["gpu"] = true;
julia> raw = simulate(obj, seq, sys; simParams);
I get the following "JIT session error: ..." after manipulating some data from the "raw" object:
julia> abs.(raw.profiles[1].data)
JIT session error: Symbols not found: [ __nv_hypotf ]
JIT session error: Symbols not found: [ __nv_hypotf ]
...
100×1 Matrix{Float32}:
  7.2798567
 19.101593
...


Note that this problem doesn't show up when the CPU is used instead of the GPU (i.e., by setting simParams["gpu"] = false).

This problem seems to be related to CUDA issue #1870, which was solved by changes made directly in the Julia repo (apparently those are already part of 1.9.0-rc3, see SnoopCompile issue #338, so it should work on Julia 1.9.2, though I'm not completely sure).

Any suggestions for solving this, or pointers on how to continue debugging this issue?
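
In the meantime, the workaround I'm considering (just a sketch, not verified to fix the underlying JIT problem) is to keep the GPU path out of the precompile workload entirely and only enable it at runtime:

module KomaMRICore
# ... package code ...
@setup_workload begin
    obj = brain_phantom2D()
    sys = Scanner()
    seq = read_seq(joinpath(dirname(pathof(KomaMRICore)), "../../examples/3.koma_paper/comparison_accuracy/sequences/EPI/epi_100x100_TE100_FOV230.seq"))
    simParams = KomaMRICore.default_sim_params()
    # Precompile only the CPU path; GPU kernels are then compiled on
    # first use at runtime, so no GPU code ends up in the cache.
    simParams["gpu"] = false
    @compile_workload begin
        raw = simulate(obj, seq, sys; simParams)
    end
end
end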

@beorostica added the bug (Something isn't working) label on Jul 21, 2023
@maleadt
Member

maleadt commented Jul 22, 2023

cc @vchuravy

@beorostica changed the title from "Cannot precompile GPU code with SnoopPrecompile" to "Cannot precompile GPU code with PrecompileTools" on Jul 24, 2023
@RomeoV
Contributor

RomeoV commented Aug 5, 2023

Same here, for a Flux.jl / Metalhead.jl precompilation workload on Julia 1.10.0-beta1:

module FastAIStartup
using FastAI, FastVision, Metalhead
import FastVision: RGB, N0f8

import PrecompileTools: @setup_workload, @compile_workload
@setup_workload begin
    labels = ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"]
    @compile_workload begin
        data = ([rand(RGB{N0f8}, 32, 32) for _ in 1:100],
                [rand(labels) for _ in 1:100])
        blocks = (Image{2}(), FastAI.Label{String}(labels))
        task = ImageClassificationSingle(blocks)
        learner = tasklearner(task, data, backbone=ResNet(18).layers[1])
        fitonecycle!(learner, 2)
    end
end
end # module FastAIStartup
See the stacktrace:
(FastAIStartup) pkg> precompile
Precompiling project...
  ✗ FastAIStartup
  0 dependencies successfully precompiled in 32 seconds. 297 already precompiled.

ERROR: The following 1 direct dependency failed to precompile:

FastAIStartup [bf55ac65-409a-4d86-bfc7-3fe70994b7f0]

Failed to precompile FastAIStartup [bf55ac65-409a-4d86-bfc7-3fe70994b7f0] to "/home/romeo/.julia/compiled/v1.10/FastAIStartup/jl_uOXydd".
ERROR: LoadError: LLVM error: Symbol name with unsupported characters
Stacktrace:
   [1] handle_error(reason::Cstring)
     @ LLVM ~/.julia/packages/LLVM/Od0DH/src/core/context.jl:134
   [2] LLVMTargetMachineEmitToMemoryBuffer(T::LLVM.TargetMachine, M::LLVM.Module, codegen::LLVM.API.LLVMCodeGenFileType, ErrorMessage::Base.RefValue{…}, OutMemBuf::Base.RefValue{…})
     @ LLVM.API ~/.julia/packages/LLVM/Od0DH/lib/15/libLLVM_h.jl:4326
   [3] emit(tm::LLVM.TargetMachine, mod::LLVM.Module, filetype::LLVM.API.LLVMCodeGenFileType)
     @ LLVM ~/.julia/packages/LLVM/Od0DH/src/targetmachine.jl:45
   [4] mcgen(job::GPUCompiler.CompilerJob, mod::LLVM.Module, format::LLVM.API.LLVMCodeGenFileType)
     @ GPUCompiler ~/.julia/packages/GPUCompiler/YO8Uj/src/mcgen.jl:72
   [5] macro expansion
     @ GPUCompiler ~/.julia/packages/TimerOutputs/RsWnF/src/TimerOutput.jl:253 [inlined]
   [6] macro expansion
     @ GPUCompiler ~/.julia/packages/GPUCompiler/YO8Uj/src/driver.jl:432 [inlined]
   [7] macro expansion
     @ GPUCompiler ~/.julia/packages/TimerOutputs/RsWnF/src/TimerOutput.jl:253 [inlined]
   [8] macro expansion
     @ GPUCompiler ~/.julia/packages/GPUCompiler/YO8Uj/src/driver.jl:429 [inlined]
   [9] 
     @ GPUCompiler ~/.julia/packages/GPUCompiler/YO8Uj/src/utils.jl:89
  [10] emit_asm
     @ GPUCompiler ~/.julia/packages/GPUCompiler/YO8Uj/src/utils.jl:83 [inlined]
  [11] 
     @ GPUCompiler ~/.julia/packages/GPUCompiler/YO8Uj/src/driver.jl:149
  [12] codegen
     @ GPUCompiler ~/.julia/packages/GPUCompiler/YO8Uj/src/driver.jl:110 [inlined]
  [13] 
     @ GPUCompiler ~/.julia/packages/GPUCompiler/YO8Uj/src/driver.jl:106
  [14] compile
     @ GPUCompiler ~/.julia/packages/GPUCompiler/YO8Uj/src/driver.jl:98 [inlined]
  [15] #1037
     @ CUDA ~/.julia/packages/CUDA/tVtYo/src/compiler/compilation.jl:104 [inlined]
  [16] JuliaContext(f::CUDA.var"#1037#1040"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}})
     @ GPUCompiler ~/.julia/packages/GPUCompiler/YO8Uj/src/driver.jl:47
  [17] compile(job::GPUCompiler.CompilerJob)
     @ CUDA ~/.julia/packages/CUDA/tVtYo/src/compiler/compilation.jl:103
  [18] actual_compilation(cache::Dict{…}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{…}, compiler::typeof(CUDA.compile), linker::typeof(CUDA.link))
     @ GPUCompiler ~/.julia/packages/GPUCompiler/YO8Uj/src/execution.jl:125
  [19] cached_compilation(cache::Dict{…}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{…}, compiler::Function, linker::Function)
     @ GPUCompiler ~/.julia/packages/GPUCompiler/YO8Uj/src/execution.jl:103
  [20] macro expansion
     @ CUDA ~/.julia/packages/CUDA/tVtYo/src/compiler/execution.jl:318 [inlined]
  [21] macro expansion
     @ CUDA ./lock.jl:267 [inlined]
  [22] cufunction(f::GPUArrays.var"#broadcast_kernel#26", tt::Type{Tuple{…}}; kwargs::@Kwargs{})
     @ CUDA ~/.julia/packages/CUDA/tVtYo/src/compiler/execution.jl:313
  [23] cufunction
     @ CUDA ~/.julia/packages/CUDA/tVtYo/src/compiler/execution.jl:310 [inlined]
  [24] macro expansion
     @ CUDA ~/.julia/packages/CUDA/tVtYo/src/compiler/execution.jl:104 [inlined]
  [25] #launch_heuristic#1080
     @ CUDA ~/.julia/packages/CUDA/tVtYo/src/gpuarrays.jl:17 [inlined]
  [26] launch_heuristic
     @ CUDA ~/.julia/packages/CUDA/tVtYo/src/gpuarrays.jl:15 [inlined]
  [27] _copyto!
     @ GPUArrays ~/.julia/packages/GPUArrays/5XhED/src/host/broadcast.jl:65 [inlined]
  [28] copyto!
     @ GPUArrays ~/.julia/packages/GPUArrays/5XhED/src/host/broadcast.jl:46 [inlined]
  [29] copy
     @ GPUArrays ~/.julia/packages/GPUArrays/5XhED/src/host/broadcast.jl:37 [inlined]
  [30] materialize
     @ Base.Broadcast ./broadcast.jl:903 [inlined]
  [31] broadcast_preserving_zero_d
     @ Base.Broadcast ./broadcast.jl:892 [inlined]
  [32] +(A::CUDA.CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, B::CUDA.CuArray{Float32, 4, CUDA.Mem.DeviceBuffer})
     @ Base ./arraymath.jl:8
  [33] add_sum
     @ Base ./reduce.jl:24 [inlined]
  [34] BottomRF
     @ Base ./reduce.jl:86 [inlined]
  [35] afoldl
     @ Base ./operators.jl:543 [inlined]
  [36] _foldl_impl
     @ Base ./reduce.jl:68 [inlined]
  [37] foldl_impl
     @ Base ./reduce.jl:48 [inlined]
  [38] mapfoldl_impl
     @ Base ./reduce.jl:44 [inlined]
  [39] mapfoldl
     @ Base ./reduce.jl:175 [inlined]
  [40] mapreduce
     @ Base ./reduce.jl:307 [inlined]
  [41] sum
     @ Base ./reduce.jl:535 [inlined]
  [42] sum
     @ Base ./reduce.jl:564 [inlined]
  [43] rrule
     @ ChainRules ~/.julia/packages/ChainRules/9sNmB/src/rulesets/Base/mapreduce.jl:25 [inlined]
  [44] rrule
     @ ChainRulesCore ~/.julia/packages/ChainRulesCore/0t04l/src/rules.jl:134 [inlined]
  [45] chain_rrule
     @ Zygote ~/.julia/packages/Zygote/JeHtr/src/compiler/chainrules.jl:223 [inlined]
  [46] macro expansion
     @ Zygote ~/.julia/packages/Zygote/JeHtr/src/compiler/interface2.jl:0 [inlined]
  [47] _pullback
     @ Zygote ~/.julia/packages/Zygote/JeHtr/src/compiler/interface2.jl:81 [inlined]
  [48] addact
     @ Metalhead ~/.julia/packages/Metalhead/qOYEz/src/utilities.jl:19 [inlined]
  [49] _apply
     @ Core ./boot.jl:836 [inlined]
  [50] adjoint
     @ Zygote ~/.julia/packages/Zygote/JeHtr/src/lib/lib.jl:203 [inlined]
  [51] _pullback
     @ Zygote ~/.julia/packages/ZygoteRules/OgCVT/src/adjoint.jl:66 [inlined]
  [52] #_#1
     @ Zygote ~/.julia/packages/PartialFunctions/LzDRN/src/PartialFunctions.jl:24 [inlined]
  [53] _pullback(::Zygote.Context{…}, ::PartialFunctions.var"##_#1", ::@Kwargs{}, ::PartialFunctions.PartialFunction{…}, ::CUDA.CuArray{…}, ::CUDA.CuArray{…})
     @ Zygote ~/.julia/packages/Zygote/JeHtr/src/compiler/interface2.jl:0
  [54] _apply(::Function, ::Vararg{Any})
     @ Core ./boot.jl:836
  [55] adjoint
     @ Zygote ~/.julia/packages/Zygote/JeHtr/src/lib/lib.jl:203 [inlined]
  [56] _pullback
     @ Zygote ~/.julia/packages/ZygoteRules/OgCVT/src/adjoint.jl:66 [inlined]
  [57] PartialFunction
     @ Zygote ~/.julia/packages/PartialFunctions/LzDRN/src/PartialFunctions.jl:24 [inlined]
  [58] _pullback(::Zygote.Context{…}, ::PartialFunctions.PartialFunction{…}, ::CUDA.CuArray{…}, ::CUDA.CuArray{…})
     @ Zygote ~/.julia/packages/Zygote/JeHtr/src/compiler/interface2.jl:0
  [59] _apply(::Function, ::Vararg{Any})
     @ Core ./boot.jl:836
  [60] adjoint
     @ Zygote ~/.julia/packages/Zygote/JeHtr/src/lib/lib.jl:203 [inlined]
  [61] _pullback
     @ Zygote ~/.julia/packages/ZygoteRules/OgCVT/src/adjoint.jl:66 [inlined]
  [62] Parallel
     @ Zygote ~/.julia/packages/Flux/n3cOc/src/layers/basic.jl:527 [inlined]
  [63] _pullback(ctx::Zygote.Context{…}, f::Flux.Parallel{…}, args::CUDA.CuArray{…})
     @ Zygote ~/.julia/packages/Zygote/JeHtr/src/compiler/interface2.jl:0
  [64] macro expansion
     @ Flux ~/.julia/packages/Flux/n3cOc/src/layers/basic.jl:53 [inlined]
  [65] _applychain
     @ Zygote ~/.julia/packages/Flux/n3cOc/src/layers/basic.jl:53 [inlined]
  [66] _pullback(::Zygote.Context{…}, ::typeof(Flux._applychain), ::Tuple{…}, ::CUDA.CuArray{…})
     @ Zygote ~/.julia/packages/Zygote/JeHtr/src/compiler/interface2.jl:0
  [67] Chain
     @ Zygote ~/.julia/packages/Flux/n3cOc/src/layers/basic.jl:51 [inlined]
--- the last 5 lines are repeated 2 more times ---
  [78] _pullback(ctx::Zygote.Context{true}, f::Flux.Chain{Tuple{…}}, args::CUDA.CuArray{Float32, 4, CUDA.Mem.DeviceBuffer})
     @ Zygote ~/.julia/packages/Zygote/JeHtr/src/compiler/interface2.jl:0
  [79] #74
     @ Zygote ~/.julia/packages/FluxTraining/vSpWs/src/training.jl:54 [inlined]
  [80] _pullback(ctx::Zygote.Context{…}, f::FluxTraining.var"#74#76"{}, args::Flux.Chain{…})
     @ Zygote ~/.julia/packages/Zygote/JeHtr/src/compiler/interface2.jl:0
  [81] #77
     @ Zygote ~/.julia/packages/FluxTraining/vSpWs/src/training.jl:70 [inlined]
  [82] _pullback(::Zygote.Context{true}, ::FluxTraining.var"#77#78"{FluxTraining.var"#74#76"{}, Flux.Chain{}})
     @ Zygote ~/.julia/packages/Zygote/JeHtr/src/compiler/interface2.jl:0
  [83] pullback(f::Function, ps::Zygote.Params{Zygote.Buffer{Any, Vector{Any}}})
     @ Zygote ~/.julia/packages/Zygote/JeHtr/src/compiler/interface.jl:384
  [84] gradient(f::Function, args::Zygote.Params{Zygote.Buffer{Any, Vector{Any}}})
     @ Zygote ~/.julia/packages/Zygote/JeHtr/src/compiler/interface.jl:96
  [85] _gradient(f::FluxTraining.var"#74#76"{}, ::Flux.Optimise.Adam, m::Flux.Chain{…}, ps::Zygote.Params{…})
     @ FluxTraining ~/.julia/packages/FluxTraining/vSpWs/src/training.jl:70
  [86] (::FluxTraining.var"#73#75"{})(handle::FluxTraining.var"#handlefn#82"{}, state::FluxTraining.PropDict{…})
     @ FluxTraining ~/.julia/packages/FluxTraining/vSpWs/src/training.jl:53
  [87] runstep(stepfn::FluxTraining.var"#73#75"{}, learner::FluxTraining.Learner, phase::FluxTraining.Phases.TrainingPhase, initialstate::@NamedTuple{})
     @ FluxTraining ~/.julia/packages/FluxTraining/vSpWs/src/training.jl:133
  [88] step!(learner::FluxTraining.Learner, phase::FluxTraining.Phases.TrainingPhase, batch::Tuple{…})
     @ FluxTraining ~/.julia/packages/FluxTraining/vSpWs/src/training.jl:51
  [89] (::FluxTraining.var"#71#72"{FluxTraining.Learner, FluxTraining.Phases.TrainingPhase, MLUtils.DataLoader{}})(::Function)
     @ FluxTraining ~/.julia/packages/FluxTraining/vSpWs/src/training.jl:24
  [90] runepoch(epochfn::FluxTraining.var"#71#72"{}, learner::FluxTraining.Learner, phase::FluxTraining.Phases.TrainingPhase)
     @ FluxTraining ~/.julia/packages/FluxTraining/vSpWs/src/training.jl:105
  [91] epoch!
     @ FluxTraining ~/.julia/packages/FluxTraining/vSpWs/src/training.jl:22 [inlined]
  [92] (::FastAI.var"#157#159"{Tuple{Pair{}, Pair{}}, FluxTraining.Learner, Int64})()
     @ FastAI ~/.julia/packages/FastAI/f27xT/src/training/onecycle.jl:31
  [93] withcallbacks(f::FastAI.var"#157#159"{}, learner::FluxTraining.Learner, callbacks::FluxTraining.Scheduler)
     @ FastAI ~/.julia/packages/FastAI/f27xT/src/training/utils.jl:77
  [94] #156
     @ FastAI ~/.julia/packages/FastAI/f27xT/src/training/onecycle.jl:28 [inlined]
  [95] withfields(f::FastAI.var"#156#158"{}, x::FluxTraining.Learner; kwargs::@Kwargs{})
     @ FastAI ~/.julia/packages/FastAI/f27xT/src/training/utils.jl:52
  [96] fitonecycle!(learner::FluxTraining.Learner, nepochs::Int64, maxlr::Float64; phases::Tuple{…}, wd::Float64, kwargs::@Kwargs{})
     @ FastAI ~/.julia/packages/FastAI/f27xT/src/training/onecycle.jl:27
  [97] fitonecycle!(learner::FluxTraining.Learner, nepochs::Int64, maxlr::Float64)
     @ FastAI ~/.julia/packages/FastAI/f27xT/src/training/onecycle.jl:16
  [98] macro expansion
     @ ~/Documents/julia_playground/FastAIStartup.jl/src/FastAIStartup.jl:14 [inlined]
  [99] macro expansion
     @ ~/.julia/packages/PrecompileTools/0yi7r/src/workloads.jl:74 [inlined]
 [100] macro expansion
     @ ~/Documents/julia_playground/FastAIStartup.jl/src/FastAIStartup.jl:8 [inlined]
 [101] macro expansion
     @ ~/.julia/packages/PrecompileTools/0yi7r/src/workloads.jl:136 [inlined]
 [102] top-level scope
     @ ~/Documents/julia_playground/FastAIStartup.jl/src/FastAIStartup.jl:6
 [103] include
     @ Base ./Base.jl:489 [inlined]
 [104] include_package_for_output(pkg::Base.PkgId, input::String, depot_path::Vector{…}, dl_load_path::Vector{…}, load_path::Vector{…}, concrete_deps::Vector{…}, source::Nothing)
     @ Base ./loading.jl:2216
in expression starting at /home/romeo/Documents/julia_playground/FastAIStartup.jl/src/FastAIStartup.jl:1
in expression starting at stdin:3
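
As a stopgap until this is fixed, I'm skipping the workload on machines where it crashes by gating it behind an environment variable (FASTAISTARTUP_PRECOMPILE is a hypothetical opt-in flag I made up; any name would do):

module FastAIStartup
using FastAI, FastVision, Metalhead
import FastVision: RGB, N0f8
import PrecompileTools: @setup_workload, @compile_workload

# Opt in to the (currently crashing) GPU precompile workload explicitly;
# by default the module precompiles without running it.
if get(ENV, "FASTAISTARTUP_PRECOMPILE", "false") == "true"
    @setup_workload begin
        labels = ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"]
        @compile_workload begin
            data = ([rand(RGB{N0f8}, 32, 32) for _ in 1:100],
                    [rand(labels) for _ in 1:100])
            blocks = (Image{2}(), FastAI.Label{String}(labels))
            task = ImageClassificationSingle(blocks)
            learner = tasklearner(task, data, backbone=ResNet(18).layers[1])
            fitonecycle!(learner, 2)
        end
    end
end
end # module FastAIStartup

If I remember correctly, PrecompileTools also ships a Preferences-based switch to disable workloads, which would be cleaner than a custom flag.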

@maleadt added the upstream (Somebody else's problem) label on Aug 21, 2023
@RomeoV
Contributor

RomeoV commented Sep 26, 2023

FYI: this seems to be fixed now. I haven't run extensive tests, but the code snippet I posted above runs.

julia> versioninfo()
Julia Version 1.10.0-beta2
Commit a468aa198d0 (2023-08-17 06:27 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 16 × 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, tigerlake)
  Threads: 1 on 16 virtual cores

(FastAIStartup) pkg> st
Project FastAIStartup v0.1.0
Status `~/Documents/julia_playground/FastAIStartup.jl/Project.toml`
⌃ [5d0beca9] FastAI v0.5.1
  [7bf02486] FastVision v0.1.1
⌅ [587475ba] Flux v0.13.17
⌃ [dbeba491] Metalhead v0.8.2
⌃ [aea7be01] PrecompileTools v1.1.2
  [02a925ec] cuDNN v1.1.0 `https://github.com/JuliaGPU/CUDA.jl.git:lib/cudnn#master`

@maleadt
Member

maleadt commented Sep 26, 2023

Great, thanks for reporting back!

@maleadt maleadt closed this as completed Sep 26, 2023
@beorostica
Author

Thank you @RomeoV!
It is working with Julia 1.10.0-beta2 here too.
