NVPTX i128 support broken on LLVM 11 / Julia 1.6 #793

Closed
ali-ramadhan opened this issue Mar 29, 2021 · 6 comments
Labels: bug (Something isn't working), upstream (Somebody else's problem)

Comments

ali-ramadhan commented Mar 29, 2021

Describe the bug

I get a segfault when using setindex! with values of type Int128.

This only started happening with Julia 1.6; it was fine with previous versions of CUDA.jl (even going back to CuArrays.jl).

To reproduce

The Minimal Working Example (MWE) for this bug:

julia> using CUDA

julia> A = zeros(3) |> CuArray
3-element CuArray{Float64, 1}:
 0.0
 0.0
 0.0

julia> A .= UInt128(5)

produces this segfault:

signal (11): Segmentation fault
in expression starting at REPL[5]:1
_ZL16getCopyFromPartsRN4llvm12SelectionDAGERKNS_5SDLocEPKNS_7SDValueEjNS_3MVTENS_3EVTEPKNS_5ValueENS_8OptionalIjEENSD_INS_3ISD8NodeTypeEEE.isra.1040 at /home/alir/julia-1.6.0-rc3/usr/bin/../lib/libLLVM-11jl.so (unknown line)
_ZN4llvm16SelectionDAGISel14LowerArgumentsERKNS_8FunctionE at /home/alir/julia-1.6.0-rc3/usr/bin/../lib/libLLVM-11jl.so (unknown line)
_ZN4llvm16SelectionDAGISel20SelectAllBasicBlocksERKNS_8FunctionE at /home/alir/julia-1.6.0-rc3/usr/bin/../lib/libLLVM-11jl.so (unknown line)
_ZN4llvm16SelectionDAGISel20runOnMachineFunctionERNS_15MachineFunctionE.part.835 at /home/alir/julia-1.6.0-rc3/usr/bin/../lib/libLLVM-11jl.so (unknown line)
_ZN4llvm19MachineFunctionPass13runOnFunctionERNS_8FunctionE at /home/alir/julia-1.6.0-rc3/usr/bin/../lib/libLLVM-11jl.so (unknown line)
_ZN4llvm13FPPassManager13runOnFunctionERNS_8FunctionE at /home/alir/julia-1.6.0-rc3/usr/bin/../lib/libLLVM-11jl.so (unknown line)
_ZN4llvm13FPPassManager11runOnModuleERNS_6ModuleE at /home/alir/julia-1.6.0-rc3/usr/bin/../lib/libLLVM-11jl.so (unknown line)
_ZN4llvm6legacy15PassManagerImpl3runERNS_6ModuleE at /home/alir/julia-1.6.0-rc3/usr/bin/../lib/libLLVM-11jl.so (unknown line)
_ZL21LLVMTargetMachineEmitP23LLVMOpaqueTargetMachineP16LLVMOpaqueModuleRN4llvm17raw_pwrite_streamE19LLVMCodeGenFileTypePPc at /home/alir/julia-1.6.0-rc3/usr/bin/../lib/libLLVM-11jl.so (unknown line)
LLVMTargetMachineEmitToMemoryBuffer at /home/alir/julia-1.6.0-rc3/usr/bin/../lib/libLLVM-11jl.so (unknown line)
macro expansion at /home/alir/.julia/packages/LLVM/7Q46C/src/util.jl:85 [inlined]
LLVMTargetMachineEmitToMemoryBuffer at /home/alir/.julia/packages/LLVM/7Q46C/lib/libLLVM_h.jl:3612 [inlined]
emit at /home/alir/.julia/packages/LLVM/7Q46C/src/targetmachine.jl:44
mcgen at /home/alir/.julia/packages/GPUCompiler/XwWPj/src/mcgen.jl:74
unknown function (ip: 0x7f38423e1f73)
_jl_invoke at /home/alir/julia-1.6.0-rc3/src/gf.c:2237 [inlined]
jl_apply_generic at /home/alir/julia-1.6.0-rc3/src/gf.c:2419
macro expansion at /home/alir/.julia/packages/TimerOutputs/4QAIk/src/TimerOutput.jl:206 [inlined]
macro expansion at /home/alir/.julia/packages/GPUCompiler/XwWPj/src/driver.jl:300 [inlined]
macro expansion at /home/alir/.julia/packages/TimerOutputs/4QAIk/src/TimerOutput.jl:206 [inlined]
macro expansion at /home/alir/.julia/packages/GPUCompiler/XwWPj/src/driver.jl:297 [inlined]
#emit_asm#103 at /home/alir/.julia/packages/GPUCompiler/XwWPj/src/utils.jl:62
emit_asm##kw at /home/alir/.julia/packages/GPUCompiler/XwWPj/src/utils.jl:60
unknown function (ip: 0x7f38423d7a38)
_jl_invoke at /home/alir/julia-1.6.0-rc3/src/gf.c:2237 [inlined]
jl_apply_generic at /home/alir/julia-1.6.0-rc3/src/gf.c:2419
cufunction_compile at /home/alir/.julia/packages/CUDA/qEV3Y/src/compiler/execution.jl:306
check_cache at /home/alir/.julia/packages/GPUCompiler/XwWPj/src/cache.jl:44 [inlined]
cached_compilation at /home/alir/.julia/packages/GPUArrays/WV76E/src/host/broadcast.jl:60 [inlined]
cached_compilation at /home/alir/.julia/packages/GPUCompiler/XwWPj/src/cache.jl:0
#cufunction#221 at /home/alir/.julia/packages/CUDA/qEV3Y/src/compiler/execution.jl:294
cufunction at /home/alir/.julia/packages/CUDA/qEV3Y/src/compiler/execution.jl:288 [inlined]
macro expansion at /home/alir/.julia/packages/CUDA/qEV3Y/src/compiler/execution.jl:102 [inlined]
#launch_heuristic#280 at /home/alir/.julia/packages/CUDA/qEV3Y/src/gpuarrays.jl:17 [inlined]
launch_heuristic at /home/alir/.julia/packages/CUDA/qEV3Y/src/gpuarrays.jl:17 [inlined]
copyto! at /home/alir/.julia/packages/GPUArrays/WV76E/src/host/broadcast.jl:66 [inlined]
copyto! at /home/alir/.julia/packages/GPUArrays/WV76E/src/host/broadcast.jl:76 [inlined]
materialize! at ./broadcast.jl:894 [inlined]
materialize! at ./broadcast.jl:891
unknown function (ip: 0x7f38512ecfa5)
_jl_invoke at /home/alir/julia-1.6.0-rc3/src/gf.c:2237 [inlined]
jl_apply_generic at /home/alir/julia-1.6.0-rc3/src/gf.c:2419
jl_apply at /home/alir/julia-1.6.0-rc3/src/julia.h:1703 [inlined]
do_call at /home/alir/julia-1.6.0-rc3/src/interpreter.c:115
eval_value at /home/alir/julia-1.6.0-rc3/src/interpreter.c:204
eval_stmt_value at /home/alir/julia-1.6.0-rc3/src/interpreter.c:155 [inlined]
eval_body at /home/alir/julia-1.6.0-rc3/src/interpreter.c:557
jl_interpret_toplevel_thunk at /home/alir/julia-1.6.0-rc3/src/interpreter.c:669
jl_toplevel_eval_flex at /home/alir/julia-1.6.0-rc3/src/toplevel.c:877
jl_toplevel_eval_flex at /home/alir/julia-1.6.0-rc3/src/toplevel.c:825
jl_toplevel_eval_in at /home/alir/julia-1.6.0-rc3/src/toplevel.c:929
eval at ./boot.jl:360 [inlined]
eval_user_input at /home/alir/julia-1.6.0-rc3/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:139
repl_backend_loop at /home/alir/julia-1.6.0-rc3/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:200
start_repl_backend at /home/alir/julia-1.6.0-rc3/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:185
#run_repl#42 at /home/alir/julia-1.6.0-rc3/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:317
run_repl at /home/alir/julia-1.6.0-rc3/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:305
_jl_invoke at /home/alir/julia-1.6.0-rc3/src/gf.c:2237 [inlined]
jl_apply_generic at /home/alir/julia-1.6.0-rc3/src/gf.c:2419
#874 at ./client.jl:387
jfptr_YY.874_29161 at /home/alir/julia-1.6.0-rc3/usr/lib/julia/sys.so (unknown line)
_jl_invoke at /home/alir/julia-1.6.0-rc3/src/gf.c:2237 [inlined]
jl_apply_generic at /home/alir/julia-1.6.0-rc3/src/gf.c:2419
jl_apply at /home/alir/julia-1.6.0-rc3/src/julia.h:1703 [inlined]
jl_f__call_latest at /home/alir/julia-1.6.0-rc3/src/builtins.c:714
#invokelatest#2 at ./essentials.jl:708 [inlined]
invokelatest at ./essentials.jl:706 [inlined]
run_main_repl at ./client.jl:372
exec_options at ./client.jl:302
_start at ./client.jl:485
jfptr__start_38461 at /home/alir/julia-1.6.0-rc3/usr/lib/julia/sys.so (unknown line)
_jl_invoke at /home/alir/julia-1.6.0-rc3/src/gf.c:2237 [inlined]
jl_apply_generic at /home/alir/julia-1.6.0-rc3/src/gf.c:2419
jl_apply at /home/alir/julia-1.6.0-rc3/src/julia.h:1703 [inlined]
true_main at /home/alir/julia-1.6.0-rc3/src/jlapi.c:560
repl_entrypoint at /home/alir/julia-1.6.0-rc3/src/jlapi.c:702
main at /home/alir/julia-1.6.0-rc3/cli/loader_exe.c:51
__libc_start_main at /lib64/libc.so.6 (unknown line)
_start at /home/alir/julia-1.6.0-rc3/julia (unknown line)
Allocations: 55524571 (Pool: 55505417; Big: 19154); GC: 58
[1]    4650 segmentation fault (core dumped)  ~/julia-1.6.0-rc3/julia --project
Manifest.toml

# This file is machine-generated - editing it directly is not advised

[[AbstractFFTs]]
deps = ["LinearAlgebra"]
git-tree-sha1 = "485ee0867925449198280d4af84bdb46a2a404d0"
uuid = "621f4979-c628-5d54-868e-fcf4e3e8185c"
version = "1.0.1"

[[Adapt]]
deps = ["LinearAlgebra"]
git-tree-sha1 = "ffcfa2d345aaee0ef3d8346a073d5dd03c983ebe"
uuid = "79e6a3ab-5dfb-504d-930d-738a2a938a0e"
version = "3.2.0"

[[ArgTools]]
uuid = "0dad84c5-d112-42e6-8d28-ef12dabb789f"

[[Artifacts]]
uuid = "56f22d72-fd6d-98f1-02f0-08ddc0907c33"

[[BFloat16s]]
deps = ["LinearAlgebra", "Test"]
git-tree-sha1 = "4af69e205efc343068dc8722b8dfec1ade89254a"
uuid = "ab4f0b2a-ad5b-11e8-123f-65d77653426b"
version = "0.1.0"

[[Base64]]
uuid = "2a0f44e3-6c83-55bd-87e4-b1978d98bd5f"

[[CEnum]]
git-tree-sha1 = "215a9aa4a1f23fbd05b92769fdd62559488d70e9"
uuid = "fa961155-64e5-5f13-b03f-caf6b980ea82"
version = "0.4.1"

[[CUDA]]
deps = ["AbstractFFTs", "Adapt", "BFloat16s", "CEnum", "CompilerSupportLibraries_jll", "DataStructures", "ExprTools", "GPUArrays", "GPUCompiler", "LLVM", "LazyArtifacts", "Libdl", "LinearAlgebra", "Logging", "MacroTools", "Memoize", "NNlib", "Printf", "Random", "Reexport", "Requires", "SparseArrays", "Statistics", "TimerOutputs"]
git-tree-sha1 = "870e029382294443a6578190e992bf4cbfd34e22"
uuid = "052768ef-5323-5732-b1bb-66c8b64840ba"
version = "2.6.2"

[[ChainRulesCore]]
deps = ["Compat", "LinearAlgebra", "SparseArrays"]
git-tree-sha1 = "644c24cd6344348f1c645efab24b707088be526a"
uuid = "d360d2e6-b24c-11e9-a2a3-2a2ae2dbcce4"
version = "0.9.34"

[[Compat]]
deps = ["Base64", "Dates", "DelimitedFiles", "Distributed", "InteractiveUtils", "LibGit2", "Libdl", "LinearAlgebra", "Markdown", "Mmap", "Pkg", "Printf", "REPL", "Random", "SHA", "Serialization", "SharedArrays", "Sockets", "SparseArrays", "Statistics", "Test", "UUIDs", "Unicode"]
git-tree-sha1 = "919c7f3151e79ff196add81d7f4e45d91bbf420b"
uuid = "34da2185-b29b-5c13-b0c7-acf172513d20"
version = "3.25.0"

[[CompilerSupportLibraries_jll]]
deps = ["Artifacts", "Libdl"]
uuid = "e66e0078-7015-5450-92f7-15fbd957f2ae"

[[CompoundPeriods]]
deps = ["Dates"]
git-tree-sha1 = "88b8763730e30994a0d6a13b3973ffdcd1a654fe"
uuid = "a216cea6-0a8c-5945-ab87-5ade47210022"
version = "0.4.1"

[[DataStructures]]
deps = ["Compat", "InteractiveUtils", "OrderedCollections"]
git-tree-sha1 = "4437b64df1e0adccc3e5d1adbc3ac741095e4677"
uuid = "864edb3b-99cc-5e75-8d2d-829cb0a9cfe8"
version = "0.18.9"

[[Dates]]
deps = ["Printf"]
uuid = "ade2ca70-3891-5945-98fb-dc099432e06a"

[[DelimitedFiles]]
deps = ["Mmap"]
uuid = "8bb1440f-4735-579b-a4ab-409b98df4dab"

[[Distributed]]
deps = ["Random", "Serialization", "Sockets"]
uuid = "8ba89e20-285c-5b6f-9357-94700520ee1b"

[[Downloads]]
deps = ["ArgTools", "LibCURL", "NetworkOptions"]
uuid = "f43a241f-c20a-4ad4-852c-f6b1247861c6"

[[ExprTools]]
git-tree-sha1 = "10407a39b87f29d47ebaca8edbc75d7c302ff93e"
uuid = "e2ba6199-217a-4e67-a87a-7c52f15ade04"
version = "0.1.3"

[[EzXML]]
deps = ["Printf", "XML2_jll"]
git-tree-sha1 = "0fa3b52a04a4e210aeb1626def9c90df3ae65268"
uuid = "8f5d6c58-4d21-5cfd-889c-e3ad7ee6a615"
version = "1.1.0"

[[GPUArrays]]
deps = ["AbstractFFTs", "Adapt", "LinearAlgebra", "Printf", "Random", "Serialization"]
git-tree-sha1 = "f99a25fe0313121f2f9627002734c7d63b4dd3bd"
uuid = "0c68f7d7-f131-5f86-a1c3-88cf8149b2d7"
version = "6.2.0"

[[GPUCompiler]]
deps = ["DataStructures", "ExprTools", "InteractiveUtils", "LLVM", "Libdl", "Logging", "Scratch", "Serialization", "TimerOutputs", "UUIDs"]
git-tree-sha1 = "ef2839b063e158672583b9c09d2cf4876a8d3d55"
uuid = "61eb1bfa-7361-4325-ad38-22787b887f55"
version = "0.10.0"

[[InteractiveUtils]]
deps = ["Markdown"]
uuid = "b77e0a4c-d291-57a0-90e8-8db25a27a240"

[[JLLWrappers]]
git-tree-sha1 = "a431f5f2ca3f4feef3bd7a5e94b8b8d4f2f647a0"
uuid = "692b3bcd-3c85-4b1f-b108-f13ce0eb3210"
version = "1.2.0"

[[LLVM]]
deps = ["CEnum", "Libdl", "Printf", "Unicode"]
git-tree-sha1 = "b616937c31337576360cb9fb872ec7633af7b194"
uuid = "929cbde3-209d-540e-8aea-75f648917ca0"
version = "3.6.0"

[[LazyArtifacts]]
deps = ["Artifacts", "Pkg"]
uuid = "4af54fe1-eca0-43a8-85a7-787d91b784e3"

[[LibCURL]]
deps = ["LibCURL_jll", "MozillaCACerts_jll"]
uuid = "b27032c2-a3e7-50c8-80cd-2d36dbcbfd21"

[[LibCURL_jll]]
deps = ["Artifacts", "LibSSH2_jll", "Libdl", "MbedTLS_jll", "Zlib_jll", "nghttp2_jll"]
uuid = "deac9b47-8bc7-5906-a0fe-35ac56dc84c0"

[[LibGit2]]
deps = ["Base64", "NetworkOptions", "Printf", "SHA"]
uuid = "76f85450-5226-5b5a-8eaa-529ad045b433"

[[LibSSH2_jll]]
deps = ["Artifacts", "Libdl", "MbedTLS_jll"]
uuid = "29816b5a-b9ab-546f-933c-edad1886dfa8"

[[Libdl]]
uuid = "8f399da3-3557-5675-b5ff-fb832c97cbdb"

[[Libiconv_jll]]
deps = ["Artifacts", "JLLWrappers", "Libdl", "Pkg"]
git-tree-sha1 = "8e924324b2e9275a51407a4e06deb3455b1e359f"
uuid = "94ce4f54-9a6c-5748-9c1c-f9c7231a4531"
version = "1.16.0+7"

[[LinearAlgebra]]
deps = ["Libdl"]
uuid = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"

[[Logging]]
uuid = "56ddb016-857b-54e1-b83d-db4d58db5568"

[[MacroTools]]
deps = ["Markdown", "Random"]
git-tree-sha1 = "6a8a2a625ab0dea913aba95c11370589e0239ff0"
uuid = "1914dd2f-81c6-5fcd-8719-6d5c9610ff09"
version = "0.5.6"

[[Markdown]]
deps = ["Base64"]
uuid = "d6f4376e-aef5-505a-96c1-9c027394607a"

[[MbedTLS_jll]]
deps = ["Artifacts", "Libdl"]
uuid = "c8ffd9c3-330d-5841-b78e-0817d7145fa1"

[[Memoize]]
deps = ["MacroTools"]
git-tree-sha1 = "2b1dfcba103de714d31c033b5dacc2e4a12c7caa"
uuid = "c03570c3-d221-55d1-a50c-7939bbd78826"
version = "0.4.4"

[[Mmap]]
uuid = "a63ad114-7e13-5084-954f-fe012c677804"

[[Mocking]]
deps = ["ExprTools"]
git-tree-sha1 = "916b850daad0d46b8c71f65f719c49957e9513ed"
uuid = "78c3b35d-d492-501b-9361-3d52fe80e533"
version = "0.7.1"

[[MozillaCACerts_jll]]
uuid = "14a3606d-f60d-562e-9121-12d972cd8159"

[[NNlib]]
deps = ["ChainRulesCore", "Compat", "LinearAlgebra", "Pkg", "Requires", "Statistics"]
git-tree-sha1 = "ab1d43fead2ecb9aa5ae460d3d547c2cf8d89461"
uuid = "872c559c-99b0-510c-b3b7-b6c96a88d5cd"
version = "0.7.17"

[[NetworkOptions]]
uuid = "ca575930-c2e3-43a9-ace4-1e988b2c1908"

[[OrderedCollections]]
git-tree-sha1 = "4fa2ba51070ec13fcc7517db714445b4ab986bdf"
uuid = "bac558e1-5e72-5ebc-8fee-abe8a469f55d"
version = "1.4.0"

[[Pkg]]
deps = ["Artifacts", "Dates", "Downloads", "LibGit2", "Libdl", "Logging", "Markdown", "Printf", "REPL", "Random", "SHA", "Serialization", "TOML", "Tar", "UUIDs", "p7zip_jll"]
uuid = "44cfe95a-1eb2-52ea-b672-e2afdf69b78f"

[[Printf]]
deps = ["Unicode"]
uuid = "de0858da-6303-5e67-8744-51eddeeeb8d7"

[[REPL]]
deps = ["InteractiveUtils", "Markdown", "Sockets", "Unicode"]
uuid = "3fa0cd96-eef1-5676-8a61-b3b8758bbffb"

[[Random]]
deps = ["Serialization"]
uuid = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"

[[RecipesBase]]
git-tree-sha1 = "b3fb709f3c97bfc6e948be68beeecb55a0b340ae"
uuid = "3cdcf5f2-1ef4-517c-9805-6587b60abb01"
version = "1.1.1"

[[Reexport]]
git-tree-sha1 = "57d8440b0c7d98fc4f889e478e80f268d534c9d5"
uuid = "189a3867-3050-52da-a836-e630ba90ab69"
version = "1.0.0"

[[Requires]]
deps = ["UUIDs"]
git-tree-sha1 = "4036a3bd08ac7e968e27c203d45f5fff15020621"
uuid = "ae029012-a4dd-5104-9daa-d747884805df"
version = "1.1.3"

[[SHA]]
uuid = "ea8e919c-243c-51af-8825-aaa63cd721ce"

[[Scratch]]
deps = ["Dates"]
git-tree-sha1 = "ad4b278adb62d185bbcb6864dc24959ab0627bf6"
uuid = "6c6a2e73-6563-6170-7368-637461726353"
version = "1.0.3"

[[Serialization]]
uuid = "9e88b42a-f829-5b0c-bbe9-9e923198166b"

[[SharedArrays]]
deps = ["Distributed", "Mmap", "Random", "Serialization"]
uuid = "1a1011a3-84de-559e-8e89-a11a2f7dc383"

[[Sockets]]
uuid = "6462fe0b-24de-5631-8697-dd941f90decc"

[[SparseArrays]]
deps = ["LinearAlgebra", "Random"]
uuid = "2f01184e-e22b-5df5-ae63-d93ebab69eaf"

[[Statistics]]
deps = ["LinearAlgebra", "SparseArrays"]
uuid = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"

[[TOML]]
deps = ["Dates"]
uuid = "fa267f1f-6049-4f14-aa54-33bafae1ed76"

[[Tar]]
deps = ["ArgTools", "SHA"]
uuid = "a4e569a6-e804-4fa4-b0f3-eef7a1d5b13e"

[[Test]]
deps = ["InteractiveUtils", "Logging", "Random", "Serialization"]
uuid = "8dfed614-e22c-5e08-85e1-65c5234f0b40"

[[TimeZones]]
deps = ["Dates", "EzXML", "Mocking", "Pkg", "Printf", "RecipesBase", "Serialization", "Unicode"]
git-tree-sha1 = "4ba8a9579a243400db412b50300cd61d7447e583"
uuid = "f269a46b-ccf7-5d73-abea-4c690281aa53"
version = "1.5.3"

[[TimerOutputs]]
deps = ["Printf"]
git-tree-sha1 = "32cdbe6cd2d214c25a0b88f985c9e0092877c236"
uuid = "a759f4b9-e2f1-59dc-863e-4aeb61b1ea8f"
version = "0.5.8"

[[TimesDates]]
deps = ["CompoundPeriods", "Dates", "TimeZones"]
git-tree-sha1 = "b56fad6f36724a4261db450baa69074037846289"
uuid = "bdfc003b-8df8-5c39-adcd-3a9087f5df4a"
version = "0.2.6"

[[UUIDs]]
deps = ["Random", "SHA"]
uuid = "cf7118a7-6976-5b1a-9a39-7adc72f591a4"

[[Unicode]]
uuid = "4ec0a83e-493e-50e2-b9ac-8f72acf5a8f5"

[[XML2_jll]]
deps = ["Artifacts", "JLLWrappers", "Libdl", "Libiconv_jll", "Pkg", "Zlib_jll"]
git-tree-sha1 = "afd2b541e8fd425cd3b7aa55932a257035ab4a70"
uuid = "02c8fc9c-b97f-50b9-bbe4-9be30ff0a78a"
version = "2.9.11+0"

[[Zlib_jll]]
deps = ["Libdl"]
uuid = "83775a58-1f1d-513f-b197-d71354ab007a"

[[nghttp2_jll]]
deps = ["Artifacts", "Libdl"]
uuid = "8e850ede-7688-5339-a07c-302acd2aaf8d"

[[p7zip_jll]]
deps = ["Artifacts", "Libdl"]
uuid = "3f19e933-33d8-53b3-aaab-bd5110c3b7a0"

Expected behavior

I expected it to set every element of A to 5.0.
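
For reference, the same broadcast on a plain CPU `Array` gives the expected result. The sketch below also shows a possible workaround, assuming the crash is triggered by the 128-bit integer reaching the GPU kernel: convert the scalar to the destination element type on the host first. `broadcast_converted!` is a hypothetical helper, not part of CUDA.jl, and this has not been verified on the affected setup.

```julia
# CPU reference for the expected result; a plain Array stands in for the CuArray.
A = zeros(3)
A .= UInt128(5)
@assert A == [5.0, 5.0, 5.0]

# Hypothetical helper (not part of CUDA.jl): convert the scalar to the
# destination element type on the host, so only a value of that type is broadcast.
function broadcast_converted!(dest::AbstractArray{T}, x) where {T}
    dest .= convert(T, x)
    return dest
end

B = zeros(3)
broadcast_converted!(B, UInt128(5))
@assert B == A
```

With a `CuArray` destination the same call would hand only a `Float64` to the kernel, though whether that avoids the crash on Julia 1.6 is an assumption.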

Version info

Details on Julia:

Julia Version 1.6.0-rc3
Commit 23267f0d46 (2021-03-16 17:04 UTC)
Platform Info:
  OS: Linux (x86_64-redhat-linux)
  CPU: Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, cascadelake)

Details on CUDA:

CUDA toolkit 10.2.89, artifact installation
CUDA driver 10.2.0
NVIDIA driver 440.33.1

Libraries: 
- CUBLAS: 10.2.2
- CURAND: 10.1.2
- CUFFT: 10.1.2
- CUSOLVER: 10.3.0
- CUSPARSE: 10.3.1
- CUPTI: 12.0.0
- NVML: 10.0.0+440.33.1
- CUDNN: 8.10.0 (for CUDA 10.2.0)
- CUTENSOR: 1.2.1 (for CUDA 10.2.0)

Toolchain:
- Julia: 1.6.0-rc3
- LLVM: 11.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5
- Device support: sm_30, sm_32, sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75

4 devices:
  0: TITAN V (sm_70, 8.556 GiB / 11.784 GiB available)
  1: TITAN V (sm_70, 11.771 GiB / 11.784 GiB available)
  2: TITAN V (sm_70, 11.771 GiB / 11.784 GiB available)
  3: TITAN V (sm_70, 862.500 MiB / 11.784 GiB available)

X-Ref: CliMA/Oceananigans.jl#1514

Member

maleadt commented Mar 29, 2021

Underlying LLVM assertion:

julia: /home/tim/Julia/src/julia/deps/srccache/llvm-11.0.1/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:9812: void llvm::SelectionDAGISel::LowerArguments(const llvm::Function&): Assertion `InVals.size() == Ins.size() && "LowerFormalArguments didn't emit the correct number of values!"' failed.

signal (6): Aborted
in expression starting at REPL[3]:1
gsignal at /nix/store/hp8wcylqr14hrrpqap4wdrwzq092wfln-glibc-2.32-37/lib/libc.so.6 (unknown line)
abort at /nix/store/hp8wcylqr14hrrpqap4wdrwzq092wfln-glibc-2.32-37/lib/libc.so.6 (unknown line)
__assert_fail_base.cold.0 at /nix/store/hp8wcylqr14hrrpqap4wdrwzq092wfln-glibc-2.32-37/lib/libc.so.6 (unknown line)
__assert_fail at /nix/store/hp8wcylqr14hrrpqap4wdrwzq092wfln-glibc-2.32-37/lib/libc.so.6 (unknown line)
LowerArguments at /home/tim/Julia/src/julia/deps/srccache/llvm-11.0.1/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:9812
SelectAllBasicBlocks at /home/tim/Julia/src/julia/deps/srccache/llvm-11.0.1/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:1356
runOnMachineFunction at /home/tim/Julia/src/julia/deps/srccache/llvm-11.0.1/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:504
runOnFunction at /home/tim/Julia/src/julia/deps/srccache/llvm-11.0.1/lib/CodeGen/MachineFunctionPass.cpp:73
runOnFunction at /home/tim/Julia/src/julia/deps/srccache/llvm-11.0.1/lib/IR/LegacyPassManager.cpp:1516
runOnModule at /home/tim/Julia/src/julia/deps/srccache/llvm-11.0.1/lib/IR/LegacyPassManager.cpp:1552
runOnModule at /home/tim/Julia/src/julia/deps/srccache/llvm-11.0.1/lib/IR/LegacyPassManager.cpp:1617 [inlined]
run at /home/tim/Julia/src/julia/deps/srccache/llvm-11.0.1/lib/IR/LegacyPassManager.cpp:614
LLVMTargetMachineEmit at /home/tim/Julia/src/julia/deps/srccache/llvm-11.0.1/lib/Target/TargetMachineC.cpp:213
LLVMTargetMachineEmitToMemoryBuffer at /home/tim/Julia/src/julia/deps/srccache/llvm-11.0.1/lib/Target/TargetMachineC.cpp:237
macro expansion at /home/tim/Julia/pkg/LLVM/src/util.jl:85 [inlined]
LLVMTargetMachineEmitToMemoryBuffer at /home/tim/Julia/pkg/LLVM/lib/libLLVM_h.jl:4924 [inlined]
emit at /home/tim/Julia/pkg/LLVM/src/targetmachine.jl:44
Module for which this occurs.
; ModuleID = 'text'
source_filename = "text"
target datalayout = "e-i64:64-i128:128-v16:16-v32:32-n16:32:64"
target triple = "nvptx64-nvidia-cuda"

%printf_args.0 = type { i64 }

@exception7 = private unnamed_addr addrspace(1) constant [12 x i8] c"BoundsError\00", align 1
@exception_flag = weak local_unnamed_addr addrspace(1) externally_initialized global i64 0
@0 = private unnamed_addr addrspace(1) constant [108 x i8] c"ERROR: a %s was thrown during kernel execution.\0A       Run Julia on debug level 2 for device stack traces.\0A\00", align 1
@1 = private unnamed_addr addrspace(1) constant [110 x i8] c"WARNING: could not signal exception status to the host, execution will continue.\0A         Please file a bug.\0A\00", align 1

; Function Attrs: nounwind readnone speculatable willreturn
declare i128 @llvm.ctlz.i128(i128 %0, i1 immarg %1) #0

; Function Attrs: nounwind readnone speculatable willreturn
declare i128 @llvm.cttz.i128(i128 %0, i1 immarg %1) #0

; Function Attrs: nounwind readnone
declare i32 @llvm.nvvm.read.ptx.sreg.tid.x() #1

; Function Attrs: nounwind readnone
declare i32 @llvm.nvvm.read.ptx.sreg.ctaid.x() #1

; Function Attrs: nounwind readnone
declare i32 @llvm.nvvm.read.ptx.sreg.ntid.x() #1

; Function Attrs: nounwind readnone
declare i32 @llvm.nvvm.read.ptx.sreg.nctaid.x() #1

define ptx_kernel void @_Z27julia_broadcast_kernel_193315CuKernelContext13CuDeviceArrayI7Float64Li1ELi1EE11BroadcastedIv5TupleI5OneToI5Int64EE9_identityS3_I7UInt128EES5_({ [1 x i64], i8 addrspace(1)* } %0, { [1 x i128], [1 x [1 x i64]] } %1, i64 signext %2) local_unnamed_addr {
entry:
  %.fca.0.0.extract6 = extractvalue { [1 x i64], i8 addrspace(1)* } %0, 0, 0
  %.fca.0.0.extract = extractvalue { [1 x i128], [1 x [1 x i64]] } %1, 0, 0
  %.inv = icmp sgt i64 %2, 0, !dbg !42
  %3 = select i1 %.inv, i64 %2, i64 0, !dbg !42
  br i1 %.inv, label %L12.i.preheader, label %_Z27julia_broadcast_kernel_193315CuKernelContext13CuDeviceArrayI7Float64Li1ELi1EE11BroadcastedIv5TupleI5OneToI5Int64EE9_identityS3_I7UInt128EES5_.inner.exit, !dbg !50

L12.i.preheader:                                  ; preds = %entry
  %.fca.1.extract = extractvalue { [1 x i64], i8 addrspace(1)* } %0, 1
  %4 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x(), !dbg !52, !range !75
  %narrow = add nuw nsw i32 %4, 1, !dbg !76
  %5 = zext i32 %narrow to i64, !dbg !76
  %6 = call i32 @llvm.nvvm.read.ptx.sreg.ctaid.x(), !dbg !79, !range !88
  %7 = zext i32 %6 to i64, !dbg !89
  %8 = call i32 @llvm.nvvm.read.ptx.sreg.ntid.x(), !dbg !94, !range !103
  %9 = zext i32 %8 to i64, !dbg !104
  %10 = call i32 @llvm.nvvm.read.ptx.sreg.nctaid.x(), !dbg !106, !range !117
  %11 = zext i32 %10 to i64, !dbg !118
  %12 = icmp sgt i64 %.fca.0.0.extract6, 0, !dbg !120
  %13 = select i1 %12, i64 %.fca.0.0.extract6, i64 0, !dbg !120
  %.not20 = icmp eq i128 %.fca.0.0.extract, 0, !dbg !137
  %14 = call i128 @llvm.ctlz.i128(i128 %.fca.0.0.extract, i1 true), !dbg !156, !range !159
  %15 = trunc i128 %14 to i64, !dbg !160
  %16 = trunc i128 %.fca.0.0.extract to i64, !dbg !163
  %17 = add nsw i64 %15, -75, !dbg !165
  %18 = shl i64 %16, %17, !dbg !167
  %19 = icmp ugt i64 %17, 63, !dbg !167
  %.op = and i64 %18, 4503599627370495, !dbg !170
  %20 = select i1 %19, i64 0, i64 %.op, !dbg !170
  %21 = sub nsw i64 74, %15, !dbg !172
  %22 = zext i64 %21 to i128, !dbg !174
  %23 = lshr i128 %.fca.0.0.extract, %22, !dbg !174
  %24 = icmp ugt i64 %21, 127, !dbg !174
  %25 = trunc i128 %23 to i64, !dbg !177
  %.op22 = and i64 %25, 9007199254740991, !dbg !178
  %.op22.op = add nuw nsw i64 %.op22, 1, !dbg !179
  %.op22.op.op = lshr i64 %.op22.op, 1, !dbg !182
  %26 = select i1 %24, i64 0, i64 %.op22.op.op, !dbg !182
  %27 = call i128 @llvm.cttz.i128(i128 %.fca.0.0.extract, i1 true), !dbg !184, !range !159
  %28 = trunc i128 %27 to i64, !dbg !187
  %29 = icmp eq i64 %21, %28, !dbg !189
  %30 = zext i1 %29 to i64, !dbg !190
  %31 = xor i64 %30, -1, !dbg !194
  %32 = and i64 %26, %31, !dbg !196
  %33 = shl nuw nsw i64 %15, 52, !dbg !197
  %34 = sub nuw nsw i64 5179139571476070400, %33, !dbg !197
  br i1 %.not20, label %L12.i.us.preheader, label %L12.i.preheader.L12.i.preheader.split_crit_edge, !dbg !200

L12.i.us.preheader:                               ; preds = %L12.i.preheader
  %35 = mul i64 %9, %7, !dbg !200
  %36 = sub i64 %5, 1, !dbg !200
  %37 = add i64 %35, %36, !dbg !200
  %38 = shl nuw nsw i64 %37, 3, !dbg !200
  %scevgep = getelementptr i8, i8 addrspace(1)* %.fca.1.extract, i64 %38, !dbg !200
  %39 = mul i64 %11, %9, !dbg !200
  %40 = shl i64 %39, 3, !dbg !200
  %41 = add i64 %37, 1, !dbg !200
  br label %L12.i.us, !dbg !200

L12.i.preheader.L12.i.preheader.split_crit_edge:  ; preds = %L12.i.preheader
  %42 = icmp ult i64 %15, 75, !dbg !201
  br i1 %42, label %L12.i.us23.preheader, label %L12.i.preheader2, !dbg !200

L12.i.preheader2:                                 ; preds = %L12.i.preheader.L12.i.preheader.split_crit_edge
  %43 = or i64 %34, %20, !dbg !204
  %44 = mul i64 %9, %7, !dbg !200
  %45 = sub i64 %5, 1, !dbg !200
  %46 = add i64 %44, %45, !dbg !200
  %47 = shl nuw nsw i64 %46, 3, !dbg !200
  %scevgep26 = getelementptr i8, i8 addrspace(1)* %.fca.1.extract, i64 %47, !dbg !200
  %48 = mul i64 %11, %9, !dbg !200
  %49 = shl i64 %48, 3, !dbg !200
  %50 = add i64 %46, 1, !dbg !200
  br label %L12.i, !dbg !200

L12.i.us23.preheader:                             ; preds = %L12.i.preheader.L12.i.preheader.split_crit_edge
  %51 = add nuw i64 %34, %32, !dbg !204
  %52 = mul i64 %9, %7, !dbg !200
  %53 = sub i64 %5, 1, !dbg !200
  %54 = add i64 %52, %53, !dbg !200
  %55 = shl nuw nsw i64 %54, 3, !dbg !200
  %scevgep17 = getelementptr i8, i8 addrspace(1)* %.fca.1.extract, i64 %55, !dbg !200
  %56 = mul i64 %11, %9, !dbg !200
  %57 = shl i64 %56, 3, !dbg !200
  %58 = add i64 %54, 1, !dbg !200
  br label %L12.i.us23, !dbg !200

L12.i.us:                                         ; preds = %L12.i.us.preheader, %L182.i.us
  %lsr.iv13 = phi i64 [ %41, %L12.i.us.preheader ], [ %lsr.iv.next14, %L182.i.us ]
  %lsr.iv9 = phi i8 addrspace(1)* [ %scevgep, %L12.i.us.preheader ], [ %64, %L182.i.us ]
  %lsr.iv = phi i64 [ %3, %L12.i.us.preheader ], [ %lsr.iv.next, %L182.i.us ]
  %.not.us = icmp slt i64 %.fca.0.0.extract6, %lsr.iv13, !dbg !206
  br i1 %.not.us, label %_Z27julia_broadcast_kernel_193315CuKernelContext13CuDeviceArrayI7Float64Li1ELi1EE11BroadcastedIv5TupleI5OneToI5Int64EE9_identityS3_I7UInt128EES5_.inner.exit, label %L78.i.us, !dbg !200

L78.i.us:                                         ; preds = %L12.i.us
  %59 = icmp slt i64 %lsr.iv13, 1, !dbg !211
  %60 = icmp sgt i64 %lsr.iv13, %13, !dbg !231
  %61 = or i1 %59, %60, !dbg !213
  br i1 %61, label %L89.i.us, label %L182.i.us, !dbg !213

L89.i.us:                                         ; preds = %L78.i.us
  call fastcc void @gpu_report_exception(i64 ptrtoint ([12 x i8]* addrspacecast ([12 x i8] addrspace(1)* @exception7 to [12 x i8]*) to i64)), !dbg !213
  call fastcc void @gpu_signal_exception(), !dbg !213
  call void asm sideeffect "trap;", ""() #3, !dbg !213
  call void asm sideeffect "trap;", ""() #3, !dbg !213
  br label %L182.i.us

L182.i.us:                                        ; preds = %L89.i.us, %L78.i.us
  %62 = bitcast i8 addrspace(1)* %lsr.iv9 to i1 addrspace(1)*
  %63 = bitcast i8 addrspace(1)* %lsr.iv9 to double addrspace(1)*
  store double 0.000000e+00, double addrspace(1)* %63, align 8, !dbg !232, !tbaa !242
  %lsr.iv.next = add nsw i64 %lsr.iv, -1, !dbg !245
  %scevgep11 = getelementptr i1, i1 addrspace(1)* %62, i64 %40, !dbg !245
  %64 = bitcast i1 addrspace(1)* %scevgep11 to i8 addrspace(1)*, !dbg !245
  %lsr.iv.next14 = add i64 %lsr.iv13, %39, !dbg !245
  %.not21.not.us = icmp eq i64 %lsr.iv.next, 0, !dbg !245
  br i1 %.not21.not.us, label %_Z27julia_broadcast_kernel_193315CuKernelContext13CuDeviceArrayI7Float64Li1ELi1EE11BroadcastedIv5TupleI5OneToI5Int64EE9_identityS3_I7UInt128EES5_.inner.exit, label %L12.i.us, !dbg !155

L12.i.us23:                                       ; preds = %L12.i.us23.preheader, %L182.i.us34
  %lsr.iv22 = phi i64 [ %58, %L12.i.us23.preheader ], [ %lsr.iv.next23, %L182.i.us34 ]
  %lsr.iv18 = phi i8 addrspace(1)* [ %scevgep17, %L12.i.us23.preheader ], [ %70, %L182.i.us34 ]
  %lsr.iv15 = phi i64 [ %3, %L12.i.us23.preheader ], [ %lsr.iv.next16, %L182.i.us34 ]
  %.not.us25 = icmp slt i64 %.fca.0.0.extract6, %lsr.iv22, !dbg !206
  br i1 %.not.us25, label %_Z27julia_broadcast_kernel_193315CuKernelContext13CuDeviceArrayI7Float64Li1ELi1EE11BroadcastedIv5TupleI5OneToI5Int64EE9_identityS3_I7UInt128EES5_.inner.exit, label %L78.i.us26, !dbg !200

L78.i.us26:                                       ; preds = %L12.i.us23
  %65 = icmp slt i64 %lsr.iv22, 1, !dbg !211
  %66 = icmp sgt i64 %lsr.iv22, %13, !dbg !231
  %67 = or i1 %65, %66, !dbg !213
  br i1 %67, label %L89.i.us27, label %L182.i.us34, !dbg !213

L89.i.us27:                                       ; preds = %L78.i.us26
  call fastcc void @gpu_report_exception(i64 ptrtoint ([12 x i8]* addrspacecast ([12 x i8] addrspace(1)* @exception7 to [12 x i8]*) to i64)), !dbg !213
  call fastcc void @gpu_signal_exception(), !dbg !213
  call void asm sideeffect "trap;", ""() #3, !dbg !213
  call void asm sideeffect "trap;", ""() #3, !dbg !213
  br label %L182.i.us34

L182.i.us34:                                      ; preds = %L89.i.us27, %L78.i.us26
  %68 = bitcast i8 addrspace(1)* %lsr.iv18 to i1 addrspace(1)*
  %69 = bitcast i8 addrspace(1)* %lsr.iv18 to i64 addrspace(1)*
  store i64 %51, i64 addrspace(1)* %69, align 8, !dbg !232, !tbaa !242
  %lsr.iv.next16 = add nsw i64 %lsr.iv15, -1, !dbg !245
  %scevgep20 = getelementptr i1, i1 addrspace(1)* %68, i64 %57, !dbg !245
  %70 = bitcast i1 addrspace(1)* %scevgep20 to i8 addrspace(1)*, !dbg !245
  %lsr.iv.next23 = add i64 %lsr.iv22, %56, !dbg !245
  %.not21.not.us36 = icmp eq i64 %lsr.iv.next16, 0, !dbg !245
  br i1 %.not21.not.us36, label %_Z27julia_broadcast_kernel_193315CuKernelContext13CuDeviceArrayI7Float64Li1ELi1EE11BroadcastedIv5TupleI5OneToI5Int64EE9_identityS3_I7UInt128EES5_.inner.exit, label %L12.i.us23, !dbg !155

L12.i:                                            ; preds = %L12.i.preheader2, %L182.i
  %lsr.iv31 = phi i64 [ %50, %L12.i.preheader2 ], [ %lsr.iv.next32, %L182.i ]
  %lsr.iv27 = phi i8 addrspace(1)* [ %scevgep26, %L12.i.preheader2 ], [ %76, %L182.i ]
  %lsr.iv24 = phi i64 [ %3, %L12.i.preheader2 ], [ %lsr.iv.next25, %L182.i ]
  %.not = icmp slt i64 %.fca.0.0.extract6, %lsr.iv31, !dbg !206
  br i1 %.not, label %_Z27julia_broadcast_kernel_193315CuKernelContext13CuDeviceArrayI7Float64Li1ELi1EE11BroadcastedIv5TupleI5OneToI5Int64EE9_identityS3_I7UInt128EES5_.inner.exit, label %L78.i, !dbg !200

L78.i:                                            ; preds = %L12.i
  %71 = icmp slt i64 %lsr.iv31, 1, !dbg !211
  %72 = icmp sgt i64 %lsr.iv31, %13, !dbg !231
  %73 = or i1 %71, %72, !dbg !213
  br i1 %73, label %L89.i, label %L182.i, !dbg !213

L89.i:                                            ; preds = %L78.i
  call fastcc void @gpu_report_exception(i64 ptrtoint ([12 x i8]* addrspacecast ([12 x i8] addrspace(1)* @exception7 to [12 x i8]*) to i64)), !dbg !213
  call fastcc void @gpu_signal_exception(), !dbg !213
  call void asm sideeffect "trap;", ""() #3, !dbg !213
  call void asm sideeffect "trap;", ""() #3, !dbg !213
  br label %L182.i

L182.i:                                           ; preds = %L89.i, %L78.i
  %74 = bitcast i8 addrspace(1)* %lsr.iv27 to i1 addrspace(1)*
  %75 = bitcast i8 addrspace(1)* %lsr.iv27 to i64 addrspace(1)*
  store i64 %43, i64 addrspace(1)* %75, align 8, !dbg !232, !tbaa !242
  %lsr.iv.next25 = add nsw i64 %lsr.iv24, -1, !dbg !245
  %scevgep29 = getelementptr i1, i1 addrspace(1)* %74, i64 %49, !dbg !245
  %76 = bitcast i1 addrspace(1)* %scevgep29 to i8 addrspace(1)*, !dbg !245
  %lsr.iv.next32 = add i64 %lsr.iv31, %48, !dbg !245
  %.not21.not = icmp eq i64 %lsr.iv.next25, 0, !dbg !245
  br i1 %.not21.not, label %_Z27julia_broadcast_kernel_193315CuKernelContext13CuDeviceArrayI7Float64Li1ELi1EE11BroadcastedIv5TupleI5OneToI5Int64EE9_identityS3_I7UInt128EES5_.inner.exit, label %L12.i, !dbg !155

_Z27julia_broadcast_kernel_193315CuKernelContext13CuDeviceArrayI7Float64Li1ELi1EE11BroadcastedIv5TupleI5OneToI5Int64EE9_identityS3_I7UInt128EES5_.inner.exit: ; preds = %L182.i, %L12.i, %L182.i.us34, %L12.i.us23, %L182.i.us, %L12.i.us, %entry
  ret void
}

define internal fastcc void @gpu_report_exception(i64 zeroext %0) unnamed_addr !dbg !248 {
top:
  %1 = alloca %printf_args.0, align 8
  %2 = addrspacecast %printf_args.0* %1 to %printf_args.0 addrspace(5)*
  %3 = bitcast %printf_args.0* %1 to i8*, !dbg !249
  call void @llvm.lifetime.start.p0i8(i64 8, i8* nonnull %3), !dbg !249
  %4 = bitcast %printf_args.0 addrspace(5)* %2 to i64 addrspace(5)*
  store i64 %0, i64 addrspace(5)* %4, align 8, !dbg !249
  %5 = call i32 @vprintf(i8* getelementptr ([108 x i8], [108 x i8]* addrspacecast ([108 x i8] addrspace(1)* @0 to [108 x i8]*), i64 0, i64 0), i8* nonnull %3), !dbg !249
  call void @llvm.lifetime.end.p0i8(i64 8, i8* nonnull %3), !dbg !249
  ret void, !dbg !257
}

; Function Attrs: argmemonly nounwind willreturn
declare void @llvm.lifetime.start.p0i8(i64 immarg %0, i8* nocapture %1) #2

declare i32 @vprintf(i8* %0, i8* %1) local_unnamed_addr

; Function Attrs: argmemonly nounwind willreturn
declare void @llvm.lifetime.end.p0i8(i64 immarg %0, i8* nocapture %1) #2

define internal fastcc void @gpu_signal_exception() unnamed_addr !dbg !258 {
top:
  %ptr.i = load i64, i64 addrspace(1)* @exception_flag, align 8, !dbg !259
  %.not = icmp eq i64 %ptr.i, 0, !dbg !262
  br i1 %.not, label %L10, label %L6, !dbg !262

L6:                                               ; preds = %top
  %0 = inttoptr i64 %ptr.i to i64*, !dbg !263
  store i64 1, i64* %0, align 1, !dbg !263, !tbaa !268
  call void @llvm.nvvm.membar.sys(), !dbg !272
  br label %L13, !dbg !275

L10:                                              ; preds = %top
  %1 = call i32 @vprintf(i8* getelementptr ([110 x i8], [110 x i8]* addrspacecast ([110 x i8] addrspace(1)* @1 to [110 x i8]*), i64 0, i64 0), i8* null), !dbg !276
  br label %L13, !dbg !276

L13:                                              ; preds = %L10, %L6
  ret void, !dbg !283
}

; Function Attrs: nounwind
declare void @llvm.nvvm.membar.sys() #3

; Function Attrs: nounwind
declare void @llvm.stackprotector(i8* %0, i8** %1) #3

attributes #0 = { nounwind readnone speculatable willreturn }
attributes #1 = { nounwind readnone }
attributes #2 = { argmemonly nounwind willreturn }
attributes #3 = { nounwind }

!llvm.module.flags = !{!0, !1}
!llvm.dbg.cu = !{!2, !5, !7, !8, !9, !11, !12, !13, !15, !16, !17, !18, !19, !20, !21, !22, !23, !24, !25, !26, !27, !28, !29, !30, !31, !32, !33, !34, !35, !36, !37, !38, !40}
!nvvm.annotations = !{!41}

!0 = !{i32 2, !"Dwarf Version", i32 4}
!1 = !{i32 1, !"Debug Info Version", i32 3}
!2 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !3, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: NoDebug, enums: !4, nameTableKind: None)
!3 = !DIFile(filename: "/home/tim/Julia/pkg/GPUArrays/src/host/broadcast.jl", directory: ".")
!4 = !{}
!5 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !6, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: NoDebug, enums: !4, nameTableKind: None)
!6 = !DIFile(filename: "abstractarray.jl", directory: ".")
!7 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !6, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: NoDebug, enums: !4, nameTableKind: None)
!8 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !6, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: NoDebug, enums: !4, nameTableKind: None)
!9 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: NoDebug, enums: !4, nameTableKind: None)
!10 = !DIFile(filename: "/home/tim/Julia/pkg/GPUCompiler/src/runtime.jl", directory: ".")
!11 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: NoDebug, enums: !4, nameTableKind: None)
!12 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: NoDebug, enums: !4, nameTableKind: None)
!13 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !14, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: NoDebug, enums: !4, nameTableKind: None)
!14 = !DIFile(filename: "/home/tim/Julia/pkg/CUDA/src/device/runtime.jl", directory: ".")
!15 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: NoDebug, enums: !4, nameTableKind: None)
!16 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: NoDebug, enums: !4, nameTableKind: None)
!17 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !14, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: NoDebug, enums: !4, nameTableKind: None)
!18 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: NoDebug, enums: !4, nameTableKind: None)
!19 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: NoDebug, enums: !4, nameTableKind: None)
!20 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !14, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: NoDebug, enums: !4, nameTableKind: None)
!21 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: NoDebug, enums: !4, nameTableKind: None)
!22 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !14, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: NoDebug, enums: !4, nameTableKind: None)
!23 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: NoDebug, enums: !4, nameTableKind: None)
!24 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: NoDebug, enums: !4, nameTableKind: None)
!25 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: NoDebug, enums: !4, nameTableKind: None)
!26 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: NoDebug, enums: !4, nameTableKind: None)
!27 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: NoDebug, enums: !4, nameTableKind: None)
!28 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: NoDebug, enums: !4, nameTableKind: None)
!29 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: NoDebug, enums: !4, nameTableKind: None)
!30 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: NoDebug, enums: !4, nameTableKind: None)
!31 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: NoDebug, enums: !4, nameTableKind: None)
!32 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: NoDebug, enums: !4, nameTableKind: None)
!33 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: NoDebug, enums: !4, nameTableKind: None)
!34 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: NoDebug, enums: !4, nameTableKind: None)
!35 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: NoDebug, enums: !4, nameTableKind: None)
!36 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: NoDebug, enums: !4, nameTableKind: None)
!37 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: NoDebug, enums: !4, nameTableKind: None)
!38 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !39, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: NoDebug, enums: !4, nameTableKind: None)
!39 = !DIFile(filename: "/home/tim/Julia/pkg/CUDA/src/device/intrinsics/memory_dynamic.jl", directory: ".")
!40 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !14, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: NoDebug, enums: !4, nameTableKind: None)
!41 = !{void ({ [1 x i64], i8 addrspace(1)* }, { [1 x i128], [1 x [1 x i64]] }, i64)* @_Z27julia_broadcast_kernel_193315CuKernelContext13CuDeviceArrayI7Float64Li1ELi1EE11BroadcastedIv5TupleI5OneToI5Int64EE9_identityS3_I7UInt128EES5_, !"kernel", i32 1}
!42 = !DILocation(line: 292, scope: !43, inlinedAt: !46)
!43 = distinct !DISubprogram(name: "unitrange_last;", linkageName: "unitrange_last", scope: !44, file: !44, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!44 = !DIFile(filename: "range.jl", directory: ".")
!45 = !DISubroutineType(types: !4)
!46 = !DILocation(line: 287, scope: !47, inlinedAt: !48)
!47 = distinct !DISubprogram(name: "UnitRange;", linkageName: "UnitRange", scope: !44, file: !44, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!48 = !DILocation(line: 5, scope: !49, inlinedAt: !50)
!49 = distinct !DISubprogram(name: "Colon;", linkageName: "Colon", scope: !44, file: !44, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!50 = !DILocation(line: 57, scope: !51)
!51 = distinct !DISubprogram(name: "broadcast_kernel", linkageName: "julia_broadcast_kernel_1933", scope: null, file: !3, line: 56, type: !45, scopeLine: 56, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!52 = !DILocation(line: 0, scope: !53, inlinedAt: !55)
!53 = distinct !DISubprogram(name: "macro expansion;", linkageName: "macro expansion", scope: !54, file: !54, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!54 = !DIFile(filename: "/home/tim/Julia/pkg/LLVM/src/interop/base.jl", directory: ".")
!55 = !DILocation(line: 7, scope: !56, inlinedAt: !58)
!56 = distinct !DISubprogram(name: "macro expansion;", linkageName: "macro expansion", scope: !57, file: !57, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!57 = !DIFile(filename: "/home/tim/Julia/pkg/CUDA/src/device/intrinsics/indexing.jl", directory: ".")
!58 = !DILocation(line: 7, scope: !59, inlinedAt: !60)
!59 = distinct !DISubprogram(name: "_index;", linkageName: "_index", scope: !57, file: !57, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!60 = !DILocation(line: 47, scope: !61, inlinedAt: !62)
!61 = distinct !DISubprogram(name: "threadIdx_x;", linkageName: "threadIdx_x", scope: !57, file: !57, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!62 = !DILocation(line: 91, scope: !63, inlinedAt: !64)
!63 = distinct !DISubprogram(name: "threadIdx;", linkageName: "threadIdx", scope: !57, file: !57, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!64 = !DILocation(line: 40, scope: !65, inlinedAt: !67)
!65 = distinct !DISubprogram(name: "threadidx;", linkageName: "threadidx", scope: !66, file: !66, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!66 = !DIFile(filename: "/home/tim/Julia/pkg/CUDA/src/gpuarrays.jl", directory: ".")
!67 = !DILocation(line: 20, scope: !68, inlinedAt: !70)
!68 = distinct !DISubprogram(name: "global_index;", linkageName: "global_index", scope: !69, file: !69, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!69 = !DIFile(filename: "/home/tim/Julia/pkg/GPUArrays/src/device/indexing.jl", directory: ".")
!70 = !DILocation(line: 44, scope: !71, inlinedAt: !72)
!71 = distinct !DISubprogram(name: "linear_index;", linkageName: "linear_index", scope: !69, file: !69, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!72 = !DILocation(line: 66, scope: !73, inlinedAt: !74)
!73 = distinct !DISubprogram(name: "macro expansion;", linkageName: "macro expansion", scope: !69, file: !69, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!74 = !DILocation(line: 58, scope: !51)
!75 = !{i32 0, i32 1023}
!76 = !DILocation(line: 0, scope: !77, inlinedAt: !60)
!77 = distinct !DISubprogram(name: "+;", linkageName: "+", scope: !78, file: !78, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!78 = !DIFile(filename: "int.jl", directory: ".")
!79 = !DILocation(line: 0, scope: !53, inlinedAt: !80)
!80 = !DILocation(line: 7, scope: !56, inlinedAt: !81)
!81 = !DILocation(line: 7, scope: !59, inlinedAt: !82)
!82 = !DILocation(line: 57, scope: !83, inlinedAt: !84)
!83 = distinct !DISubprogram(name: "blockIdx_x;", linkageName: "blockIdx_x", scope: !57, file: !57, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!84 = !DILocation(line: 77, scope: !85, inlinedAt: !86)
!85 = distinct !DISubprogram(name: "blockIdx;", linkageName: "blockIdx", scope: !57, file: !57, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!86 = !DILocation(line: 38, scope: !87, inlinedAt: !67)
!87 = distinct !DISubprogram(name: "blockidx;", linkageName: "blockidx", scope: !66, file: !66, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!88 = !{i32 0, i32 2147483646}
!89 = !DILocation(line: 0, scope: !90, inlinedAt: !92)
!90 = distinct !DISubprogram(name: "toInt64;", linkageName: "toInt64", scope: !91, file: !91, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!91 = !DIFile(filename: "boot.jl", directory: ".")
!92 = !DILocation(line: 752, scope: !93, inlinedAt: !82)
!93 = distinct !DISubprogram(name: "Int64;", linkageName: "Int64", scope: !91, file: !91, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!94 = !DILocation(line: 0, scope: !53, inlinedAt: !95)
!95 = !DILocation(line: 7, scope: !56, inlinedAt: !96)
!96 = !DILocation(line: 7, scope: !59, inlinedAt: !97)
!97 = !DILocation(line: 52, scope: !98, inlinedAt: !99)
!98 = distinct !DISubprogram(name: "blockDim_x;", linkageName: "blockDim_x", scope: !57, file: !57, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!99 = !DILocation(line: 84, scope: !100, inlinedAt: !101)
!100 = distinct !DISubprogram(name: "blockDim;", linkageName: "blockDim", scope: !57, file: !57, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!101 = !DILocation(line: 39, scope: !102, inlinedAt: !67)
!102 = distinct !DISubprogram(name: "blockdim;", linkageName: "blockdim", scope: !66, file: !66, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!103 = !{i32 1, i32 1024}
!104 = !DILocation(line: 0, scope: !90, inlinedAt: !105)
!105 = !DILocation(line: 752, scope: !93, inlinedAt: !97)
!106 = !DILocation(line: 0, scope: !53, inlinedAt: !107)
!107 = !DILocation(line: 7, scope: !56, inlinedAt: !108)
!108 = !DILocation(line: 7, scope: !59, inlinedAt: !109)
!109 = !DILocation(line: 62, scope: !110, inlinedAt: !111)
!110 = distinct !DISubprogram(name: "gridDim_x;", linkageName: "gridDim_x", scope: !57, file: !57, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!111 = !DILocation(line: 70, scope: !112, inlinedAt: !113)
!112 = distinct !DISubprogram(name: "gridDim;", linkageName: "gridDim", scope: !57, file: !57, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!113 = !DILocation(line: 41, scope: !114, inlinedAt: !115)
!114 = distinct !DISubprogram(name: "griddim;", linkageName: "griddim", scope: !66, file: !66, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!115 = !DILocation(line: 29, scope: !116, inlinedAt: !70)
!116 = distinct !DISubprogram(name: "global_size;", linkageName: "global_size", scope: !69, file: !69, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!117 = !{i32 1, i32 2147483647}
!118 = !DILocation(line: 0, scope: !90, inlinedAt: !119)
!119 = !DILocation(line: 752, scope: !93, inlinedAt: !109)
!120 = !DILocation(line: 0, scope: !121, inlinedAt: !123)
!121 = distinct !DISubprogram(name: "max;", linkageName: "max", scope: !122, file: !122, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!122 = !DIFile(filename: "promotion.jl", directory: ".")
!123 = !DILocation(line: 326, scope: !124, inlinedAt: !125)
!124 = distinct !DISubprogram(name: "OneTo;", linkageName: "OneTo", scope: !44, file: !44, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!125 = !DILocation(line: 335, scope: !124, inlinedAt: !126)
!126 = !DILocation(line: 337, scope: !127, inlinedAt: !128)
!127 = distinct !DISubprogram(name: "oneto;", linkageName: "oneto", scope: !44, file: !44, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!128 = !DILocation(line: 213, scope: !129, inlinedAt: !131)
!129 = distinct !DISubprogram(name: "map;", linkageName: "map", scope: !130, file: !130, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!130 = !DIFile(filename: "tuple.jl", directory: ".")
!131 = !DILocation(line: 89, scope: !132, inlinedAt: !133)
!132 = distinct !DISubprogram(name: "axes;", linkageName: "axes", scope: !6, file: !6, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!133 = !DILocation(line: 279, scope: !134, inlinedAt: !136)
!134 = distinct !DISubprogram(name: "CartesianIndices;", linkageName: "CartesianIndices", scope: !135, file: !135, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!135 = !DIFile(filename: "multidimensional.jl", directory: ".")
!136 = !DILocation(line: 81, scope: !73, inlinedAt: !74)
!137 = !DILocation(line: 0, scope: !138, inlinedAt: !139)
!138 = distinct !DISubprogram(name: "==;", linkageName: "==", scope: !122, file: !122, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!139 = !DILocation(line: 360, scope: !138, inlinedAt: !140)
!140 = !DILocation(line: 446, scope: !141, inlinedAt: !142)
!141 = distinct !DISubprogram(name: "==;", linkageName: "==", scope: !78, file: !78, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!142 = !DILocation(line: 116, scope: !143, inlinedAt: !145)
!143 = distinct !DISubprogram(name: "Float64;", linkageName: "Float64", scope: !144, file: !144, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!144 = !DIFile(filename: "float.jl", directory: ".")
!145 = !DILocation(line: 7, scope: !146, inlinedAt: !148)
!146 = distinct !DISubprogram(name: "convert;", linkageName: "convert", scope: !147, file: !147, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!147 = !DIFile(filename: "number.jl", directory: ".")
!148 = !DILocation(line: 103, scope: !149, inlinedAt: !151)
!149 = distinct !DISubprogram(name: "setindex!;", linkageName: "setindex!", scope: !150, file: !150, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!150 = !DIFile(filename: "/home/tim/Julia/pkg/CUDA/src/device/array.jl", directory: ".")
!151 = !DILocation(line: 1286, scope: !152, inlinedAt: !153)
!152 = distinct !DISubprogram(name: "_setindex!;", linkageName: "_setindex!", scope: !6, file: !6, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!153 = !DILocation(line: 1267, scope: !154, inlinedAt: !155)
!154 = distinct !DISubprogram(name: "setindex!;", linkageName: "setindex!", scope: !6, file: !6, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!155 = !DILocation(line: 59, scope: !51)
!156 = !DILocation(line: 0, scope: !157, inlinedAt: !158)
!157 = distinct !DISubprogram(name: "leading_zeros;", linkageName: "leading_zeros", scope: !78, file: !78, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!158 = !DILocation(line: 117, scope: !143, inlinedAt: !145)
!159 = !{i128 0, i128 129}
!160 = !DILocation(line: 0, scope: !161, inlinedAt: !162)
!161 = distinct !DISubprogram(name: "rem;", linkageName: "rem", scope: !78, file: !78, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!162 = !DILocation(line: 385, scope: !157, inlinedAt: !158)
!163 = !DILocation(line: 0, scope: !161, inlinedAt: !164)
!164 = !DILocation(line: 119, scope: !143, inlinedAt: !145)
!165 = !DILocation(line: 0, scope: !166, inlinedAt: !164)
!166 = distinct !DISubprogram(name: "-;", linkageName: "-", scope: !78, file: !78, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!167 = !DILocation(line: 0, scope: !168, inlinedAt: !169)
!168 = distinct !DISubprogram(name: "<<;", linkageName: "<<", scope: !78, file: !78, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!169 = !DILocation(line: 464, scope: !168, inlinedAt: !164)
!170 = !DILocation(line: 0, scope: !171, inlinedAt: !164)
!171 = distinct !DISubprogram(name: "&;", linkageName: "&", scope: !78, file: !78, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!172 = !DILocation(line: 0, scope: !166, inlinedAt: !173)
!173 = !DILocation(line: 121, scope: !143, inlinedAt: !145)
!174 = !DILocation(line: 0, scope: !175, inlinedAt: !176)
!175 = distinct !DISubprogram(name: ">>;", linkageName: ">>", scope: !78, file: !78, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!176 = !DILocation(line: 462, scope: !175, inlinedAt: !173)
!177 = !DILocation(line: 0, scope: !161, inlinedAt: !173)
!178 = !DILocation(line: 0, scope: !171, inlinedAt: !173)
!179 = !DILocation(line: 0, scope: !77, inlinedAt: !180)
!180 = !DILocation(line: 923, scope: !77, inlinedAt: !181)
!181 = !DILocation(line: 122, scope: !143, inlinedAt: !145)
!182 = !DILocation(line: 0, scope: !175, inlinedAt: !183)
!183 = !DILocation(line: 462, scope: !175, inlinedAt: !181)
!184 = !DILocation(line: 0, scope: !185, inlinedAt: !186)
!185 = distinct !DISubprogram(name: "trailing_zeros;", linkageName: "trailing_zeros", scope: !78, file: !78, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!186 = !DILocation(line: 123, scope: !143, inlinedAt: !145)
!187 = !DILocation(line: 0, scope: !161, inlinedAt: !188)
!188 = !DILocation(line: 398, scope: !185, inlinedAt: !186)
!189 = !DILocation(line: 0, scope: !138, inlinedAt: !186)
!190 = !DILocation(line: 0, scope: !191, inlinedAt: !192)
!191 = distinct !DISubprogram(name: "toUInt64;", linkageName: "toUInt64", scope: !91, file: !91, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!192 = !DILocation(line: 757, scope: !193, inlinedAt: !186)
!193 = distinct !DISubprogram(name: "UInt64;", linkageName: "UInt64", scope: !91, file: !91, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!194 = !DILocation(line: 0, scope: !195, inlinedAt: !186)
!195 = distinct !DISubprogram(name: "~;", linkageName: "~", scope: !78, file: !78, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!196 = !DILocation(line: 0, scope: !171, inlinedAt: !186)
!197 = !DILocation(line: 0, scope: !168, inlinedAt: !198)
!198 = !DILocation(line: 464, scope: !168, inlinedAt: !199)
!199 = !DILocation(line: 125, scope: !143, inlinedAt: !145)
!200 = !DILocation(line: 67, scope: !73, inlinedAt: !74)
!201 = !DILocation(line: 0, scope: !202, inlinedAt: !203)
!202 = distinct !DISubprogram(name: "<=;", linkageName: "<=", scope: !78, file: !78, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!203 = !DILocation(line: 118, scope: !143, inlinedAt: !145)
!204 = !DILocation(line: 0, scope: !77, inlinedAt: !205)
!205 = !DILocation(line: 126, scope: !143, inlinedAt: !145)
!206 = !DILocation(line: 83, scope: !207, inlinedAt: !208)
!207 = distinct !DISubprogram(name: "<;", linkageName: "<", scope: !78, file: !78, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!208 = !DILocation(line: 305, scope: !209, inlinedAt: !200)
!209 = distinct !DISubprogram(name: ">;", linkageName: ">", scope: !210, file: !210, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!210 = !DIFile(filename: "operators.jl", directory: ".")
!211 = !DILocation(line: 83, scope: !207, inlinedAt: !212)
!212 = !DILocation(line: 305, scope: !209, inlinedAt: !213)
!213 = !DILocation(line: 702, scope: !214, inlinedAt: !215)
!214 = distinct !DISubprogram(name: "getindex;", linkageName: "getindex", scope: !44, file: !44, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!215 = !DILocation(line: 648, scope: !216, inlinedAt: !218)
!216 = distinct !DISubprogram(name: "_broadcast_getindex_evalf;", linkageName: "_broadcast_getindex_evalf", scope: !217, file: !217, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!217 = !DIFile(filename: "broadcast.jl", directory: ".")
!218 = !DILocation(line: 621, scope: !219, inlinedAt: !220)
!219 = distinct !DISubprogram(name: "_broadcast_getindex;", linkageName: "_broadcast_getindex", scope: !217, file: !217, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!220 = !DILocation(line: 1098, scope: !221, inlinedAt: !222)
!221 = distinct !DISubprogram(name: "#19;", linkageName: "#19", scope: !217, file: !217, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!222 = !DILocation(line: 48, scope: !223, inlinedAt: !225)
!223 = distinct !DISubprogram(name: "ntuple;", linkageName: "ntuple", scope: !224, file: !224, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!224 = !DIFile(filename: "ntuple.jl", directory: ".")
!225 = !DILocation(line: 1098, scope: !226, inlinedAt: !227)
!226 = distinct !DISubprogram(name: "copy;", linkageName: "copy", scope: !217, file: !217, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!227 = !DILocation(line: 883, scope: !228, inlinedAt: !229)
!228 = distinct !DISubprogram(name: "materialize;", linkageName: "materialize", scope: !217, file: !217, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!229 = !DILocation(line: 353, scope: !230, inlinedAt: !136)
!230 = distinct !DISubprogram(name: "getindex;", linkageName: "getindex", scope: !135, file: !135, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!231 = !DILocation(line: 442, scope: !202, inlinedAt: !213)
!232 = !DILocation(line: 74, scope: !53, inlinedAt: !233)
!233 = !DILocation(line: 42, scope: !234, inlinedAt: !236)
!234 = distinct !DISubprogram(name: "macro expansion;", linkageName: "macro expansion", scope: !235, file: !235, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!235 = !DIFile(filename: "/home/tim/Julia/pkg/LLVM/src/interop/pointer.jl", directory: ".")
!236 = !DILocation(line: 42, scope: !237, inlinedAt: !238)
!237 = distinct !DISubprogram(name: "pointerset;", linkageName: "pointerset", scope: !235, file: !235, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!238 = !DILocation(line: 82, scope: !239, inlinedAt: !240)
!239 = distinct !DISubprogram(name: "unsafe_store!;", linkageName: "unsafe_store!", scope: !235, file: !235, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!240 = !DILocation(line: 88, scope: !241, inlinedAt: !148)
!241 = distinct !DISubprogram(name: "arrayset;", linkageName: "arrayset", scope: !150, file: !150, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!242 = !{!243, !243, i64 0, i64 0}
!243 = !{!"custom_tbaa_addrspace(1)", !244, i64 0}
!244 = !{!"custom_tbaa"}
!245 = !DILocation(line: 410, scope: !138, inlinedAt: !246)
!246 = !DILocation(line: 674, scope: !247, inlinedAt: !155)
!247 = distinct !DISubprogram(name: "iterate;", linkageName: "iterate", scope: !44, file: !44, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!248 = distinct !DISubprogram(name: "report_exception", linkageName: "julia_report_exception_3014", scope: null, file: !14, line: 51, type: !45, scopeLine: 51, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !13, retainedNodes: !4)
!249 = !DILocation(line: 74, scope: !250, inlinedAt: !251)
!250 = distinct !DISubprogram(name: "macro expansion;", linkageName: "macro expansion", scope: !54, file: !54, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !13, retainedNodes: !4)
!251 = !DILocation(line: 38, scope: !252, inlinedAt: !254)
!252 = distinct !DISubprogram(name: "macro expansion;", linkageName: "macro expansion", scope: !253, file: !253, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !13, retainedNodes: !4)
!253 = !DIFile(filename: "/home/tim/Julia/pkg/CUDA/src/device/intrinsics/output.jl", directory: ".")
!254 = !DILocation(line: 38, scope: !255, inlinedAt: !256)
!255 = distinct !DISubprogram(name: "_cuprintf;", linkageName: "_cuprintf", scope: !253, file: !253, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !13, retainedNodes: !4)
!256 = !DILocation(line: 52, scope: !248)
!257 = !DILocation(line: 56, scope: !248)
!258 = distinct !DISubprogram(name: "signal_exception", linkageName: "julia_signal_exception_3520", scope: null, file: !14, line: 37, type: !45, scopeLine: 37, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !22, retainedNodes: !4)
!259 = !DILocation(line: 27, scope: !260, inlinedAt: !261)
!260 = distinct !DISubprogram(name: "exception_flag;", linkageName: "exception_flag", scope: !14, file: !14, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !22, retainedNodes: !4)
!261 = !DILocation(line: 38, scope: !258)
!262 = !DILocation(line: 39, scope: !258)
!263 = !DILocation(line: 118, scope: !264, inlinedAt: !266)
!264 = distinct !DISubprogram(name: "unsafe_store!;", linkageName: "unsafe_store!", scope: !265, file: !265, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !22, retainedNodes: !4)
!265 = !DIFile(filename: "pointer.jl", directory: ".")
!266 = !DILocation(line: 118, scope: !264, inlinedAt: !267)
!267 = !DILocation(line: 40, scope: !258)
!268 = !{!269, !269, i64 0}
!269 = !{!"jtbaa_data", !270, i64 0}
!270 = !{!"jtbaa", !271, i64 0}
!271 = !{!"jtbaa"}
!272 = !DILocation(line: 115, scope: !273, inlinedAt: !275)
!273 = distinct !DISubprogram(name: "threadfence_system;", linkageName: "threadfence_system", scope: !274, file: !274, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !22, retainedNodes: !4)
!274 = !DIFile(filename: "/home/tim/Julia/pkg/CUDA/src/device/intrinsics/synchronization.jl", directory: ".")
!275 = !DILocation(line: 41, scope: !258)
!276 = !DILocation(line: 74, scope: !277, inlinedAt: !278)
!277 = distinct !DISubprogram(name: "macro expansion;", linkageName: "macro expansion", scope: !54, file: !54, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !22, retainedNodes: !4)
!278 = !DILocation(line: 38, scope: !279, inlinedAt: !280)
!279 = distinct !DISubprogram(name: "macro expansion;", linkageName: "macro expansion", scope: !253, file: !253, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !22, retainedNodes: !4)
!280 = !DILocation(line: 38, scope: !281, inlinedAt: !282)
!281 = distinct !DISubprogram(name: "_cuprintf;", linkageName: "_cuprintf", scope: !253, file: !253, type: !45, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !22, retainedNodes: !4)
!282 = !DILocation(line: 43, scope: !258)
!283 = !DILocation(line: 48, scope: !258)

Reduced to:

source_filename = "text"
target datalayout = "e-i64:64-i128:128-v16:16-v32:32-n16:32:64"
target triple = "nvptx64-nvidia-cuda"

define void @kernel( [1 x i128] ) {
  ret void
}

So that looks like a pretty serious LLVM bug we're unlikely to be able to fix from within CUDA.jl...

@ali-ramadhan
Author

So that looks like a pretty serious LLVM bug we're unlikely to be able to fix from within CUDA.jl...

Ah that's unfortunate. It's not an important issue (for me at least) but I thought I should open an issue. I don't think many people are mixing CuArrays and Int128 haha.

maleadt changed the title from "Segfault when using setindex! with values of type Int128 (with Julia 1.6)" to "NVPTX i128 support broken on LLVM 11 / Julia 1.6" on Mar 29, 2021
maleadt added the "upstream" label (Somebody else's problem.) on Mar 29, 2021
@maleadt
Member

maleadt commented Apr 7, 2021

This looks like it has always been like this; at least, the assertion is triggered even on ancient versions of LLVM. Reported upstream as https://bugs.llvm.org/show_bug.cgi?id=49877.

@ali-ramadhan
Author

Ah interesting. Just to double check that what I said was right I went back to Julia 1.5 and indeed the minimal working example works there:

julia> using CUDA

julia> A = zeros(3) |> CuArray
3-element CuArray{Float64,1}:
 0.0
 0.0
 0.0

julia> A .= UInt128(5)
3-element CuArray{Float64,1}:
 5.0
 5.0
 5.0

julia> A
3-element CuArray{Float64,1}:
 5.0
 5.0
 5.0

Details on Julia

julia> versioninfo()
Julia Version 1.5.2
Commit 539f3ce943 (2020-09-23 23:17 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, cascadelake)

Details on CUDA

julia> CUDA.versioninfo()
CUDA toolkit 10.2.89, artifact installation
CUDA driver 10.2.0
NVIDIA driver 440.33.1

Libraries: 
- CUBLAS: 10.2.2
- CURAND: 10.1.2
- CUFFT: 10.1.2
- CUSOLVER: 10.3.0
- CUSPARSE: 10.3.1
- CUPTI: 12.0.0
- NVML: 10.0.0+440.33.1
- CUDNN: 8.0.4 (for CUDA 10.2.0)
- CUTENSOR: 1.2.1 (for CUDA 10.2.0)

Toolchain:
- Julia: 1.5.2
- LLVM: 9.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4
- Device support: sm_30, sm_32, sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75

4 devices:
  0: TITAN V (sm_70, 9.202 GiB / 11.784 GiB available)
  1: TITAN V (sm_70, 9.850 GiB / 11.784 GiB available)
  2: TITAN V (sm_70, 9.631 GiB / 11.784 GiB available)
  3: TITAN V (sm_70, 9.874 GiB / 11.784 GiB available)

Environment

] status -m
(@v1.5) pkg> status -m
Status `~/.julia/environments/v1.5/Manifest.toml`
  [c3fe647b] AbstractAlgebra v0.12.0
  [621f4979] AbstractFFTs v0.5.0
  [537997a7] AbstractPlotting v0.15.18
  [1520ce14] AbstractTrees v0.3.3
  [79e6a3ab] Adapt v2.4.0
  [27a7e980] Animations v0.4.1
  [dce04be8] ArgCheck v2.1.0
  [c7e460c6] ArgParse v1.1.1
  [ec485272] ArnoldiMethod v0.1.0
  [4fba245c] ArrayInterface v3.1.1
  [4c555306] ArrayLayouts v0.4.12
  [56f22d72] Artifacts v1.3.0
  [13072b0f] AxisAlgorithms v1.0.0
  [ab4f0b2a] BFloat16s v0.1.0
  [fbb218c0] BSON v0.2.6
  [aae01518] BandedMatrices v0.16.4
  [198e06fe] BangBang v0.3.30
  [6e4b80f9] BenchmarkTools v0.5.0
  [9e28174c] BinDeps v1.0.2
  [b99e7846] BinaryProvider v0.5.10
  [764a87c0] BoundaryValueDiffEq v2.7.1
  [6e34b625] Bzip2_jll v1.0.6+5
  [fa961155] CEnum v0.4.1
  [179af706] CFTime v0.1.1
  [052768ef] CUDA v2.4.1
  [159f3aea] Cairo v1.0.5
  [13f3f980] CairoMakie v0.3.12
  [83423d85] Cairo_jll v1.16.0+6
  [7057c7e9] Cassette v0.3.4
  [324d7699] CategoricalArrays v0.9.2
  [082447d4] ChainRules v0.7.52
  [d360d2e6] ChainRulesCore v0.9.29
  [da1fd8a2] CodeTracking v1.0.5
  [944b1d66] CodecZlib v0.7.0
  [a2cac450] ColorBrewer v0.4.0
  [35d6a980] ColorSchemes v3.10.2
  [3da002f7] ColorTypes v0.10.9
  [c3611d14] ColorVectorSpace v0.8.7
  [5ae59095] Colors v0.12.6
  [861a8166] Combinatorics v1.0.2
  [38540f10] CommonSolve v0.2.0
  [bbf7d656] CommonSubexpressions v0.3.0
  [34da2185] Compat v3.25.0
  [e66e0078] CompilerSupportLibraries_jll v0.3.4+0
  [a33af91c] CompositionsBase v0.1.0
  [a216cea6] CompoundPeriods v0.4.1
  [8f4d0f93] Conda v1.5.0
  [a9693cdc] CondaBinDeps v0.2.0
  [88cd18e8] ConsoleProgressMonitor v0.1.2
  [187b0558] ConstructionBase v1.1.0
  [6add18c4] ContextVariablesX v0.1.1
  [d38c429a] Contour v0.5.7
  [adafc99b] CpuId v0.2.2
  [a8cc5b0e] Crayons v4.0.4
  [d58978e5] Dagger v0.11.0
  [9a962f9c] DataAPI v1.6.0
  [124859b0] DataDeps v0.7.6
  [a93c6f00] DataFrames v0.22.3
  [82cc6244] DataInterpolations v3.3.1
  [864edb3b] DataStructures v0.18.9
  [e2d170a0] DataValueInterfaces v1.0.0
  [244e2a9f] DefineSingletons v0.1.1
  [bcd4f6db] DelayDiffEq v5.28.4
  [2b5f629d] DiffEqBase v6.57.5
  [459566f4] DiffEqCallbacks v2.16.0
  [5a0ffddc] DiffEqFinancial v2.4.0
  [aae7a2af] DiffEqFlux v1.31.0
  [c894b116] DiffEqJump v6.13.0
  [77a26b50] DiffEqNoiseProcess v5.6.0
  [055956cb] DiffEqPhysics v3.9.0
  [41bf760c] DiffEqSensitivity v6.42.0
  [163ba53b] DiffResults v1.0.3
  [b552c78f] DiffRules v1.0.2
  [0c46a032] DifferentialEquations v6.16.0
  [0703355e] DimensionalData v0.15.2
  [c619ae07] DimensionalPlotRecipes v1.2.0
  [b4f34e82] Distances v0.10.2
  [31c24e10] Distributions v0.24.12
  [ced4e74d] DistributionsAD v0.6.19
  [ffbed154] DocStringExtensions v0.8.3
  [5ae413db] EarCut_jll v2.1.5+1
  [da5c29d0] EllipsisNotation v1.1.0
  [7da242da] Enzyme v0.3.0
  [7cc45869] Enzyme_jll v0.0.5+0
  [2e619515] Expat_jll v2.2.7+6
  [d4d017d3] ExponentialUtilities v1.8.0
  [e2ba6199] ExprTools v0.1.3
  [8f5d6c58] EzXML v1.1.0
  [c87230d0] FFMPEG v0.4.0
  [b22a6f82] FFMPEG_jll v4.3.1+4
  [7a1cc6ca] FFTW v1.3.0
  [f5851436] FFTW_jll v3.3.9+7
  [cc61a311] FLoops v0.1.6
  [b9860ae5] FLoopsBase v0.1.0
  [9aa1b823] FastClosures v0.3.2
  [5789e2e9] FileIO v1.4.5
  [1a297f60] FillArrays v0.10.2
  [6a86dc24] FiniteDiff v2.8.0
  [53c48c17] FixedPointNumbers v0.8.4
  [587475ba] Flux v0.11.3
  [a3f928ae] Fontconfig_jll v2.13.1+14
  [59287772] Formatting v0.4.2
  [f6369f11] ForwardDiff v0.10.16
  [b38be410] FreeType v3.0.1
  [d7e528f0] FreeType2_jll v2.10.1+5
  [663a7486] FreeTypeAbstraction v0.8.4
  [559328eb] FriBidi_jll v1.0.5+6
  [069b7b12] FunctionWrappers v1.1.1
  [d9f16b24] Functors v0.1.0
  [fb4132e2] FuzzyCompletions v0.4.0
  [0656b61e] GLFW_jll v3.3.2+1
  [0c68f7d7] GPUArrays v6.2.0
  [61eb1bfa] GPUCompiler v0.8.3
  [28b8d3ca] GR v0.53.0
  [d2c73de3] GR_jll v0.53.0+0
  [a75be94c] GalacticOptim v0.4.7
  [01680d73] GenericSVD v0.3.0
  [5c1252a2] GeometryBasics v0.3.9
  [78b55507] Gettext_jll v0.20.1+7
  [7746bdde] Glib_jll v2.59.0+4
  [c27321d9] Glob v1.3.0
  [af5da776] GlobalSensitivity v1.0.0
  [a2bd30eb] Graphics v1.1.0
  [3b182d85] Graphite2_jll v1.3.13+4
  [3955a311] GridLayoutBase v0.5.1
  [42e2da0e] Grisu v1.0.0
  [cd3eb016] HTTP v0.8.19
  [2e76f6c2] HarfBuzz_jll v2.6.1+10
  [a51ab1cf] ICU_jll v67.1.0+3
  [7073ff75] IJulia v1.23.1
  [7869d1d1] IRTools v0.4.2
  [615f187c] IfElse v0.1.0
  [a09fc81d] ImageCore v0.8.20
  [82e4d734] ImageIO v0.4.1
  [9b13fd28] IndirectArrays v0.5.1
  [d25df0c9] Inflate v0.1.2
  [83e8ac13] IniFile v0.5.0
  [22cec73e] InitialValues v0.2.10
  [1d5cc7b8] IntelOpenMP_jll v2018.0.3+2
  [a98d9a8b] Interpolations v0.13.1
  [8197267c] IntervalSets v0.5.2
  [41ab1584] InvertedIndices v1.0.0
  [f1662d9f] Isoband v0.1.1
  [c8e1da08] IterTools v1.3.0
  [42fd0dbc] IterativeSolvers v0.9.0
  [82899510] IteratorInterfaceExtensions v1.0.0
  [033835bb] JLD2 v0.3.2
  [692b3bcd] JLLWrappers v1.2.0
  [682c06a0] JSON v0.21.1
  [0f8b85d8] JSON3 v1.5.1
  [aacddb02] JpegTurbo_jll v2.0.1+3
  [aa1ae85d] JuliaInterpreter v0.8.8
  [b14d175d] JuliaVariables v0.2.3
  [e5e0dc1b] Juno v0.8.4
  [63c18a36] KernelAbstractions v0.5.3
  [5ab0869b] KernelDensity v0.6.2
  [c1c5ebd0] LAME_jll v3.100.0+3
  [929cbde3] LLVM v3.6.0
  [dd4b983a] LZO_jll v2.10.0+3
  [b964fa9f] LaTeXStrings v1.2.0
  [2ee39098] LabelledArrays v1.5.0
  [23fbe1c1] Latexify v0.14.7
  [a5e1c1ea] LatinHypercubeSampling v1.7.3
  [73f95e8e] LatticeRules v0.0.1
  [1d6d02ad] LeftChildRightSiblingTrees v0.1.2
  [dd192d2f] LibVPX_jll v1.9.0+1
  [e9f186c6] Libffi_jll v3.2.1+4
  [d4300ac3] Libgcrypt_jll v1.8.5+4
  [7e76a0d4] Libglvnd_jll v1.3.0+3
  [7add5ba3] Libgpg_error_jll v1.36.0+3
  [94ce4f54] Libiconv_jll v1.16.0+7
  [4b2f31a3] Libmount_jll v2.34.0+3
  [89763e89] Libtiff_jll v4.1.0+2
  [38a345b3] Libuuid_jll v2.34.0+7
  [093fc24a] LightGraphs v1.3.5
  [d3d80556] LineSearches v7.1.1
  [e6f89c97] LoggingExtras v0.4.5
  [bdcacae8] LoopVectorization v0.8.26
  [6f1432cf] LoweredCodeUtils v1.2.7
  [d00139f3] METIS_jll v5.1.0+5
  [856f044c] MKL_jll v2020.2.254+0
  [d8e11817] MLStyle v0.4.6
  [da04e1cc] MPI v0.16.1
  [7cb0a576] MPICH_jll v3.3.2+10
  [1914dd2f] MacroTools v0.5.6
  [dbb5928d] MappedArrays v0.3.0
  [7eb4fadd] Match v1.1.0
  [739be429] MbedTLS v1.0.3
  [c8ffd9c3] MbedTLS_jll v2.16.8+1
  [442fdcdd] Measures v0.3.1
  [e89f7d12] Media v0.5.0
  [f9f48841] MemPool v0.3.3
  [128add7d] MicroCollections v0.1.0
  [9237b28f] MicrosoftMPI_jll v10.1.3+0
  [e1d29d7a] Missings v0.4.5
  [78c3b35d] Mocking v0.7.1
  [961ee093] ModelingToolkit v5.6.2
  [e94cdb99] MosaicViews v0.2.4
  [99f44e22] MsgPack v1.1.0
  [46d2c3a1] MuladdMacro v0.2.2
  [f9640e96] MultiScaleArrays v1.8.1
  [ffc61752] Mustache v1.0.10
  [85f8d34a] NCDatasets v0.10.4
  [d41bc354] NLSolversBase v7.7.1
  [2774e3e8] NLsolve v4.5.1
  [872c559c] NNlib v0.7.14
  [77ba4419] NaNMath v0.3.5
  [71a1bf82] NameResolution v0.1.5
  [b8a86587] NearestNeighbors v0.4.8
  [f09324ee] Netpbm v1.0.0
  [8913a72c] NonlinearSolve v0.3.8
  [510215fc] Observables v0.3.3
  [d848d694] OceanTurb v0.3.3
  [9e8cae18] Oceananigans v0.52.1
  [6fe1bfb0] OffsetArrays v1.5.3
  [e7412a2a] Ogg_jll v1.3.4+2
  [4536629a] OpenBLAS_jll v0.3.9+5
  [fe0851c0] OpenMPI_jll v4.0.2+2
  [458c3c95] OpenSSL_jll v1.1.1+6
  [efe28fd5] OpenSpecFun_jll v0.5.3+4
  [429524aa] Optim v1.2.3
  [91d4177d] Opus_jll v1.3.1+3
  [bac558e1] OrderedCollections v1.4.0
  [1dea7af3] OrdinaryDiffEq v5.50.2
  [2f80f16e] PCRE_jll v8.42.0+4
  [90014a1f] PDMats v0.10.1
  [f57f5aa1] PNGFiles v0.3.5
  [19eb6ba3] Packing v0.4.1
  [5432bcbf] PaddedViews v0.5.8
  [36c8627f] Pango_jll v1.42.4+10
  [65888b18] ParameterizedFunctions v5.9.0
  [d96e819e] Parameters v0.12.2
  [69de0a69] Parsers v1.0.15
  [0e08944d] PencilArrays v0.4.1
  [4a48f351] PencilFFTs v0.11.3
  [30392449] Pixman_jll v0.40.0+0
  [14b8a8f1] PkgTemplates v0.7.13
  [ccf2f8ad] PlotThemes v2.0.1
  [995b91a9] PlotUtils v1.0.10
  [91a5bcdd] Plots v1.10.1
  [c3e4b0f8] Pluto v0.12.18
  [e409e4f3] PoissonRandom v0.4.0
  [647866c9] PolygonOps v0.1.1
  [2dfb63ee] PooledArrays v0.5.3
  [85a6dd25] PositiveFactorizations v0.2.4
  [8162dcfd] PrettyPrint v0.2.0
  [08abe8d2] PrettyTables v0.10.1
  [27ebfcd6] Primes v0.5.0
  [33c8b6b6] ProgressLogging v0.1.4
  [92933f4c] ProgressMeter v1.5.0
  [438e738f] PyCall v1.92.2
  [d330b81b] PyPlot v2.9.0
  [ede63266] Qt_jll v5.15.2+1
  [1fd47b50] QuadGK v2.4.1
  [8a4e6c94] QuasiMonteCarlo v0.2.2
  [74087812] Random123 v1.3.1
  [fb686558] RandomExtensions v0.4.3
  [e6cf234a] RandomNumbers v1.4.0
  [c84ed2f1] Ratios v0.4.0
  [3cdcf5f2] RecipesBase v1.1.1
  [01d81517] RecipesPipeline v0.2.1
  [731186ca] RecursiveArrayTools v2.11.0
  [f2c3362d] RecursiveFactorization v0.1.8
  [189a3867] Reexport v0.2.0
  [ae029012] Requires v1.1.2
  [ae5879a3] ResettableStacks v1.1.0
  [37e2e3b7] ReverseDiff v1.5.0
  [295af30f] Revise v3.1.11
  [79098fc4] Rmath v0.6.1
  [f50d1b31] Rmath_jll v0.2.2+2
  [7e49a35a] RuntimeGeneratedFunctions v0.5.1
  [21efa798] SIMDPirates v0.8.26
  [476501e8] SLEEFPirates v0.5.5
  [1bc83da4] SafeTestsets v0.0.1
  [0bca4576] SciMLBase v1.7.3
  [6c6a2e73] Scratch v1.0.3
  [d496a93d] SeawaterPolynomials v0.2.0
  [efcf1570] Setfield v0.7.0
  [992d4aef] Showoff v0.3.2
  [73760f76] SignedDistanceFields v0.4.0
  [699a6c99] SimpleTraits v0.9.3
  [ed01d8cd] Sobol v1.4.0
  [b85f4697] SoftGlobalScope v1.1.0
  [a2af1166] SortingAlgorithms v0.3.1
  [47a9eef4] SparseDiffTools v1.13.0
  [276daf66] SpecialFunctions v1.2.1
  [171d559e] SplittablesBase v0.1.13
  [860ef19b] StableRNGs v1.0.0
  [90137ffa] StaticArrays v0.12.5
  [15972242] StaticPermutations v0.2.1
  [2913bbd2] StatsBase v0.33.3
  [4c63d2b9] StatsFuns v0.9.6
  [9672c7b4] SteadyStateDiffEq v1.6.1
  [789caeaf] StochasticDiffEq v6.32.1
  [09ab397b] StructArrays v0.4.4
  [856f2bd8] StructTypes v1.2.3
  [bea87d4a] SuiteSparse_jll v5.4.0+9
  [c3572dad] Sundials v4.4.1
  [fb77eaff] Sundials_jll v5.2.0+1
  [d1185830] SymbolicUtils v0.8.2
  [3783bdb8] TableTraits v1.0.0
  [bd369af6] Tables v1.3.2
  [5d786b92] TerminalLoggers v0.1.3
  [b718987f] TextWrap v1.0.1
  [f269a46b] TimeZones v1.5.3
  [a759f4b9] TimerOutputs v0.5.7
  [bdfc003b] TimesDates v0.2.6
  [9f7883ad] Tracker v0.2.15
  [3bb67fe8] TranscodingStreams v0.9.5
  [28d57a85] Transducers v0.4.59
  [592b5752] Trapz v2.0.2
  [a2a6695c] TreeViews v0.3.0
  [bc48ee85] Tullio v0.2.12
  [30578b45] URIParser v0.4.1
  [3a884ed6] UnPack v1.0.2
  [1cfade01] UnicodeFun v0.4.1
  [1986cc42] Unitful v1.5.0
  [3d5dd08c] VectorizationBase v0.12.33
  [81def892] VersionParsing v1.2.0
  [19fa3120] VertexSafeGraphs v0.1.2
  [a2964d1f] Wayland_jll v1.17.0+4
  [2381bf8a] Wayland_protocols_jll v1.18.0+4
  [efce3f68] WoodburyMatrices v0.5.3
  [02c8fc9c] XML2_jll v2.9.10+3
  [aed1982a] XSLT_jll v1.1.33+4
  [4f6342f7] Xorg_libX11_jll v1.6.9+4
  [0c0b7dd1] Xorg_libXau_jll v1.0.9+4
  [935fb764] Xorg_libXcursor_jll v1.2.0+4
  [a3789734] Xorg_libXdmcp_jll v1.1.3+4
  [1082639a] Xorg_libXext_jll v1.3.4+4
  [d091e8ba] Xorg_libXfixes_jll v5.0.3+4
  [a51aa0fd] Xorg_libXi_jll v1.7.10+4
  [d1454406] Xorg_libXinerama_jll v1.1.4+4
  [ec84b674] Xorg_libXrandr_jll v1.5.2+4
  [ea2f1a96] Xorg_libXrender_jll v0.9.10+4
  [14d82f49] Xorg_libpthread_stubs_jll v0.1.0+3
  [c7cfdc94] Xorg_libxcb_jll v1.13.0+3
  [cc61e674] Xorg_libxkbfile_jll v1.1.0+4
  [12413925] Xorg_xcb_util_image_jll v0.4.0+1
  [2def613f] Xorg_xcb_util_jll v0.4.0+1
  [975044d2] Xorg_xcb_util_keysyms_jll v0.4.0+1
  [0d47668e] Xorg_xcb_util_renderutil_jll v0.3.9+1
  [c22f9ab0] Xorg_xcb_util_wm_jll v0.4.1+1
  [35661453] Xorg_xkbcomp_jll v1.4.2+4
  [33bec58e] Xorg_xkeyboard_config_jll v2.27.0+4
  [c5fb5394] Xorg_xtrans_jll v1.4.0+3
  [c2297ded] ZMQ v1.2.1
  [8f1865be] ZeroMQ_jll v4.3.2+6
  [a5390f91] ZipFile v0.9.3
  [83775a58] Zlib_jll v1.2.11+18
  [3161d3a3] Zstd_jll v1.4.8+0
  [e88e6eb3] Zygote v0.5.17
  [700de1a5] ZygoteRules v0.2.1
  [9a68df92] isoband_jll v0.2.2+0
  [0ac62f75] libass_jll v0.14.0+4
  [f638f0a6] libfdk_aac_jll v0.1.6+4
  [b53b4c65] libpng_jll v1.6.37+6
  [a9144af2] libsodium_jll v1.0.18+1
  [f27f6e37] libvorbis_jll v1.3.6+6
  [3f19e933] p7zip_jll v16.2.0+3
  [1270edf5] x264_jll v2020.7.14+2
  [dfaa095f] x265_jll v3.0.0+3
  [d8fb68d0] xkbcommon_jll v0.9.1+5
  [2a0f44e3] Base64
  [ade2ca70] Dates
  [8bb1440f] DelimitedFiles
  [8ba89e20] Distributed
  [7b1f6079] FileWatching
  [9fa8497b] Future
  [b77e0a4c] InteractiveUtils
  [76f85450] LibGit2
  [8f399da3] Libdl
  [37e2e46d] LinearAlgebra
  [56ddb016] Logging
  [d6f4376e] Markdown
  [a63ad114] Mmap
  [44cfe95a] Pkg
  [de0858da] Printf
  [9abbd945] Profile
  [3fa0cd96] REPL
  [9a3f8284] Random
  [ea8e919c] SHA
  [9e88b42a] Serialization
  [1a1011a3] SharedArrays
  [6462fe0b] Sockets
  [2f01184e] SparseArrays
  [10745b16] Statistics
  [4607b0f0] SuiteSparse
  [8dfed614] Test
  [cf7118a7] UUIDs
  [4ec0a83e] Unicode

@maleadt
Member

maleadt commented Apr 7, 2021

Ah yes, this worked on 1.5 because we were then using byval:

source_filename = "text"
target datalayout = "e-i64:64-i128:128-v16:16-v32:32-n16:32:64"
target triple = "nvptx64-nvidia-cuda"

define void @kernel( [1 x i128]* byval([1 x i128]) ) {
  ret void
}

I had to disable byval because of a performance regression (JuliaGPU/GPUCompiler.jl#92). Maybe it's time to revisit that, especially with https://reviews.llvm.org/D98469.
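
For anyone hitting this in the meantime, one possible workaround (a sketch only, not something confirmed in this thread) is to convert the 128-bit scalar to the array's element type on the host, so that the value handed to the broadcast is a Float64 rather than an Int128/UInt128. The helper name `host_convert` is hypothetical and not part of CUDA.jl. The snippet uses a plain Array so it runs without a GPU; the assumption is that the same pattern carries over to a CuArray.

```julia
# Hypothetical helper (not part of CUDA.jl): convert a scalar to the
# destination array's element type before it takes part in a broadcast.
host_convert(dest::AbstractArray, x::Number) = convert(eltype(dest), x)

# Plain CPU array here; on a GPU this would be `zeros(3) |> CuArray`
# (assumption: the conversion then happens on the host, before any
# kernel is compiled, so no i128 value is passed as a kernel argument).
A = zeros(3)

# The right-hand side is now a Float64, not a UInt128.
A .= host_convert(A, UInt128(5))
```

Whether this sidesteps the crash in every case depends on how the broadcast is lowered, so treat it as something to try rather than a guaranteed fix.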

@maleadt
Member

maleadt commented Apr 27, 2024

Let's close this in favor of #974.
