NVTX-related segfault on Windows under compute-sanitizer #2204

Closed
maleadt opened this issue Dec 15, 2023 · 3 comments
Labels
bug Something isn't working

Comments

@maleadt
Member

maleadt commented Dec 15, 2023

As seen in FluxML/Zygote.jl#1473 (comment):

julia> CUDA.run_compute_sanitizer()
Re-starting your active Julia session...
========= COMPUTE-SANITIZER
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.9.4 (2023-11-14)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> using CUDA

Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x6be45c40 -- nvtxGlobals_v3 at C:\Users\Gerhard\.julia\artifacts\b4eeaf094ffb6aacf1b20ee5d2ac9aa1818fc732\bin\libnvToolsExt.dll (unknown line)
in expression starting at REPL[1]:1
nvtxGlobals_v3 at C:\Users\Gerhard\.julia\artifacts\b4eeaf094ffb6aacf1b20ee5d2ac9aa1818fc732\bin\libnvToolsExt.dll (unknown line)
Allocations: 703421 (Pool: 702539; Big: 882); GC: 1
========= Error: Target application terminated before first instrumented API call
ERROR: failed process: Process(setenv(`'C:\Users\Gerhard\.julia\artifacts\0cdffaf70d865a7149744c4c5670ea6b2145e80d\bin\compute-sanitizer.exe' --tool memcheck --launch-timeout=0 --target-processes=all --report-api-errors=no 'C:\Users\Gerhard\.julia\juliaup\julia-1.9.4+0.x64.w64.mingw32\bin\julia.exe' -Cnative '-JC:\Users\Gerhard\.julia\juliaup\julia-1.9.4+0.x64.w64.mingw32\lib\julia\sys.dll' -g1 '--project=C:\Users\Gerhard\.julia\environments\v1.9\Project.toml'`,["WINDIR=C:\\WINDOWS", "PATH=C:\\Users\\Gerhard\\.julia\\artifacts\\0cdffaf70d865a7149744c4c5670ea6b2145e80d\\bin;C:\\Users\\Gerhard\\.julia\\juliaup\\julia-1.9.4+0.x64.w64.mingw32\\bin\\..\\lib\\julia;C:\\Users\\Gerhard\\.julia\\juliaup\\julia-1.9.4+0.x64.w64.mingw32\\bin\\..\\lib;C:\\Users\\Gerhard\\.julia\\juliaup\\julia-1.9.4+0.x64.w64.mingw32\\bin;E:\\Programs\\VM Ware\\bin\\;C:\\Program Files\\Common Files\\Oracle\\Java\\javapath;C:\\Program Files (x86)\\Razer Chroma SDK\\bin;C:\\Program Files\\Razer Chroma SDK\\bin;C:\\Program Files (x86)\\Razer\\ChromaBroadcast\\bin;C:\\Program Files\\Razer\\ChromaBroadcast\\bin;C:\\Program Files\\ImageMagick-6.9.10-Q16;C:\\Program Files (x86)\\Common Files\\Oracle\\Java\\javapath;C:\\ProgramData\\Oracle\\Java\\javapath;C:\\Windows\\system32;C:\\Windows;C:\\Windows\\System32\\Wbem;C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\;C:\\Program Files\\MiKTeX 2.9\\miktex\\bin\\x64\\;C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\;C:\\Android;C:\\Program Files\\MATLAB\\R2019a\\bin;C:\\Program Files\\MATLAB\\R2018b\\bin;C:\\WINDOWS\\System32\\OpenSSH\\;C:\\Program Files\\gs\\gs9.20\\bin;C:\\ad;C:\\Program Files (x86)\\Intel\\iCLS Client\\;C:\\Program Files\\Intel\\iCLS Client\\;C:\\Program Files (x86)\\GNU\\GnuPG\\pub;C:\\Program Files (x86)\\Intel\\Intel(R) Management Engine Components\\IPT;C:\\Program Files\\Int;C:\\WINDOWS\\system32\\config\\systemprofile\\AppData\\Local\\Microsoft\\WindowsApps;C:\\Program 
Files\\PuTTY\\;C:\\Program Files (x86)\\Gpg4win\\..\\GnuPG\\bin;C:\\platform-tools\\;C:\\Program Files\\Git\\cmd;C:\\Users\\Gerhard\\AppData\\Roaming\\Python\\Python39\\Scripts;C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\;C:\\WINDOWS\\System32\\OpenSSH\\;C:\\Program Files (x86)\\PDFtk\\bin\\;C:\\Program Files (x86)\\PDFtk Server\\bin\\;C:\\Program Files (x86)\\GitExtensions\\;C:\\Program Files\\dotnet\\;C:\\Program Files\\ArangoDB3 3.9.3\\usr\\bin;C:\\Program Files (x86)\\Intel\\Intel(R) Management Engine Components\\DAL;C:\\Program Files\\Intel\\Intel(R) Management Engine Components\\DAL;C:\\Program Files (x86)\\NVIDIA Corporation\\PhysX\\Common;C:\\Program Files\\Git LFS;C:\\Program Files\\Java\\jdk-20\\bin;C:\\Program Files\\nodejs\\;C:\\Users\\Gerhard\\AppData\\Local\\Microsoft\\WindowsApps;C;C:\\Users\\Gerhard\\scoop\\shims;C:\\Users\\Gerhard\\AppData\\Local\\Programs\\Python\\Python39\\Scripts\\;C:\\Users\\Gerhard\\AppData\\Local\\Programs\\Python\\Python39\\;C:\\Program Files\\MATLAB\\R2018a\\bin;C:\\Program Files (x86)\\Intel\\Intel(R) Management E;C:\\Users\\Gerhard\\AppData\\Local\\Programs\\Microsoft VS Code\\bin", "USERDOMAIN_ROAMINGPROFILE=DESKTOP-BTMG2IL", "ZES_ENABLE_SYSMAN=1", "LOCALAPPDATA=C:\\Users\\Gerhard\\AppData\\Local", "HOMEPATH=\\Users\\Gerhard", "PROCESSOR_IDENTIFIER=Intel64 Family 6 Model 158 Stepping 9, GenuineIntel", "NUMBER_OF_PROCESSORS=8", "PATHEXT=.COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH;.MSC", "CYGWIN=nodosfilewarning"  …  "USERPROFILE=C:\\Users\\Gerhard", "DRIVERDATA=C:\\Windows\\System32\\Drivers\\DriverData", "ANDROID_SDK_HOME=C:\\Android", "PROCESSOR_LEVEL=6", "SYSTEMDRIVE=C:", "PROGRAMW6432=C:\\Program Files", "TEMP=C:\\Users\\Gerhard\\AppData\\Local\\Temp", "HOMEDRIVE=C:", "OPENBLAS_MAIN_FREE=1", "PROCESSOR_ARCHITECTURE=AMD64"]), ProcessExited(4294967295)) [4294967295]

Stacktrace:
 [1] pipeline_error
   @ .\process.jl:565 [inlined]
 [2] run(::Cmd; wait::Bool)
   @ Base .\process.jl:480
 [3] run
   @ .\process.jl:477 [inlined]
 [4] run_compute_sanitizer(julia_args::Cmd; tool::String, sanitizer_args::Cmd)
   @ CUDA C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\utilities.jl:200
 [5] run_compute_sanitizer
   @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\utilities.jl:196 [inlined]
 [6] run_compute_sanitizer()
   @ CUDA C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\utilities.jl:196
 [7] top-level scope
   @ REPL[3]:1
 [8] top-level scope
   @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\initialization.jl:208
maleadt added the bug label Dec 15, 2023
@GianlucaFuwa

I am encountering a very similar issue when trying to use nvprof.
I am using nvprof because Pascal support was deprecated and then dropped from Nsight Compute after version 2019.5.1.

nvprof --profile-from-start off C:\Users\gianl\.julia\juliaup\julia-1.9.4+0.x64.w64.mingw32\bin\julia.exe
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.    
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.9.4 (2023-11-14)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release  
|__/                   |

julia> using CUDA

Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x6be45c40 -- nvtxGlobals_v3 at C:\Users\gianl\.julia\artifacts\b4eeaf094ffb6aacf1b20ee5d2ac9aa1818fc732\bin\libnvToolsExt.dll (unknown line)
in expression starting at REPL[1]:1
nvtxGlobals_v3 at C:\Users\gianl\.julia\artifacts\b4eeaf094ffb6aacf1b20ee5d2ac9aa1818fc732\bin\libnvToolsExt.dll (unknown line)
Allocations: 1427174 (Pool: 1426172; Big: 1002); GC: 2
======== Warning: No CUDA application was profiled, exiting
======== Error: Application returned non-zero code 1

Additional info:

julia> versioninfo()
Julia Version 1.9.4
Commit 8e5136fa29 (2023-11-14 08:46 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 4 × Intel(R) Core(TM) i5-7600K CPU @ 3.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, skylake)
  Threads: 1 on 4 virtual cores

julia> CUDA.versioninfo()
CUDA runtime 12.3, artifact installation
CUDA driver 12.3
Unknown NVIDIA driver

CUDA libraries:
- CUBLAS: 12.3.4
- CURAND: 10.3.4
- CUFFT: 11.0.12
- CUSOLVER: 11.5.4
- CUSPARSE: 12.2.0
- CUPTI: 21.0.0
- NVML: missing

Julia packages:
- CUDA: 5.1.2
- CUDA_Driver_jll: 0.7.0+1
- CUDA_Runtime_jll: 0.10.1+0

Toolchain:
- Julia: 1.9.4
- LLVM: 14.0.6

1 device:
  0: NVIDIA GeForce GTX 1060 3GB (sm_61, 2.416 GiB / 3.000 GiB available)

@charleskawczynski
Contributor

I ran into this issue, too.

@maleadt
Member Author

maleadt commented Feb 13, 2024

Tracking this here now: JuliaGPU/NVTX.jl#37
I don't have much Windows experience, so if anybody can help out there, that would be great.

maleadt closed this as completed Feb 13, 2024