CUTENSOR breaks after device_reset! #2319

Closed
maleadt opened this issue Apr 8, 2024 · 2 comments
Labels: bug (Something isn't working), upstream (Somebody else's problem)

Comments

@maleadt
Member

maleadt commented Apr 8, 2024

No description provided.

@maleadt maleadt added the bug and upstream labels Apr 8, 2024
@maleadt
Member Author

maleadt commented Apr 11, 2024

MWE:

using CUDA, cuTENSOR, LinearAlgebra

function test()
    A = CuArray{Float32}(undef, 1000, 8, 3, 2)
    B = CuArray{Float32}(undef, 3, 2, 2, 8)
    C = CuArray{Float32}(undef, 3, 3, 1000, 2)

    tA = CuTensor(A, ['a', 'f', 'b', 'e'])
    tB = CuTensor(B, ['c', 'e', 'd', 'f'])
    tC = CuTensor(C, ['b', 'c', 'a', 'd'])
    mul!(tC, tA, tB)
end

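test()                # first contraction succeeds
CUDA.device_reset!()  # resets the device's primary context
test()                # second contraction fails with the error below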

ERROR: LoadError: CUTENSORError: an internal operation failed (code 14, CUTENSOR_STATUS_INTERNAL_ERROR)

I have an API trace, but it doesn't look like I'm reusing any resources past the reset (the second run creates a new stream, cuTENSOR handle, and memory pool), so I guess this is cuTENSOR itself holding on to some stale state.

libcuda.cuInit(0) = CUDA_SUCCESS
libcuda.cuDevicePrimaryCtxRetain(Base.RefValue{Ptr{CUDA.CUctx_st}}, CuDevice(0)) = CUDA_SUCCESS
 1: Ptr{CUDA.CUctx_st} @0x0000000001560f00
libcuda.cuCtxSetCurrent(CuContext(0x0000000001560f00, instance ae5ac34e40912f32)) = CUDA_SUCCESS
libcuda.cuStreamCreate(Base.RefValue{Ptr{CUDA.CUstream_st}}, CU_STREAM_DEFAULT) = CUDA_SUCCESS
 1: Ptr{CUDA.CUstream_st} @0x0000000001f894c0
libcutensor.cutensorCreate(Base.RefValue{Ptr{cuTENSOR.cutensorHandle}}) = CUTENSOR_STATUS_SUCCESS
 1: Ptr{cuTENSOR.cutensorHandle} @0x000000000f6b9aa0
libcuda.cuMemPoolCreate(Base.RefValue{Ptr{CUDA.CUmemPoolHandle_st}}, Base.RefValue{CUDA.CUmemPoolProps_st}) = CUDA_SUCCESS
 1: Ptr{CUDA.CUmemPoolHandle_st} @0x000000000fd17468
 2: CUDA.CUmemPoolProps_st(CUDA.CU_MEM_ALLOCATION_TYPE_PINNED, CUDA.CU_MEM_HANDLE_TYPE_NONE, CUDA.CUmemLocation_st(CUDA.CU_MEM_LOCATION_TYPE_DEVICE, 0), Ptr{Nothing} @0x0000000000000000, 0x0000000000000000, (0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00))
libcuda.cuDeviceSetMemPool(CuDevice(0), CuMemoryPool(Ptr{CUDA.CUmemPoolHandle_st} @0x000000000fd17468, CuContext(0x0000000001560f00, instance ae5ac34e40912f32))) = CUDA_SUCCESS
libcuda.cuMemPoolSetAttribute(CuMemoryPool(Ptr{CUDA.CUmemPoolHandle_st} @0x000000000fd17468, CuContext(0x0000000001560f00, instance ae5ac34e40912f32)), CU_MEMPOOL_ATTR_RELEASE_THRESHOLD, Base.RefValue{UInt64}) = CUDA_SUCCESS
 3: 18446744073709551615
libcuda.cuMemAllocFromPoolAsync(Base.RefValue{CuPtr{Nothing}}, 192000, CuMemoryPool(Ptr{CUDA.CUmemPoolHandle_st} @0x000000000fd17468, CuContext(0x0000000001560f00, instance ae5ac34e40912f32)), CuStream(0x0000000001f894c0, CuContext(0x0000000001560f00, instance ae5ac34e40912f32)) = CUDA_SUCCESS
 1: CuPtr{Nothing}(0x0000000302000000)
libcuda.cuDeviceGetMemPool(Base.RefValue{Ptr{CUDA.CUmemPoolHandle_st}}, CuDevice(0)) = CUDA_SUCCESS
 1: Ptr{CUDA.CUmemPoolHandle_st} @0x000000000fd17468
libcuda.cuMemAllocFromPoolAsync(Base.RefValue{CuPtr{Nothing}}, 384, CuMemoryPool(Ptr{CUDA.CUmemPoolHandle_st} @0x000000000fd17468, CuContext(0x0000000001560f00, instance ae5ac34e40912f32)), CuStream(0x0000000001f894c0, CuContext(0x0000000001560f00, instance ae5ac34e40912f32)) = CUDA_SUCCESS
 1: CuPtr{Nothing}(0x000000030202ee00)
libcuda.cuDeviceGetMemPool(Base.RefValue{Ptr{CUDA.CUmemPoolHandle_st}}, CuDevice(0)) = CUDA_SUCCESS
 1: Ptr{CUDA.CUmemPoolHandle_st} @0x000000000fd17468
libcuda.cuMemAllocFromPoolAsync(Base.RefValue{CuPtr{Nothing}}, 72000, CuMemoryPool(Ptr{CUDA.CUmemPoolHandle_st} @0x000000000fd17468, CuContext(0x0000000001560f00, instance ae5ac34e40912f32)), CuStream(0x0000000001f894c0, CuContext(0x0000000001560f00, instance ae5ac34e40912f32)) = CUDA_SUCCESS
 1: CuPtr{Nothing}(0x000000030202f000)
libcutensor.cutensorCreateTensorDescriptor(Ptr{cuTENSOR.cutensorHandle} @0x000000000f6b9aa0, Base.RefValue{Ptr{cuTENSOR.cutensorTensorDescriptor}}, 4, 4-element Vector{Int64}, 4-element Vector{Int64}, Float32, 128) = CUTENSOR_STATUS_SUCCESS
 2: Ptr{cuTENSOR.cutensorTensorDescriptor} @0x0000000010650bf0
libcutensor.cutensorCreateTensorDescriptor(Ptr{cuTENSOR.cutensorHandle} @0x000000000f6b9aa0, Base.RefValue{Ptr{cuTENSOR.cutensorTensorDescriptor}}, 4, 4-element Vector{Int64}, 4-element Vector{Int64}, Float32, 128) = CUTENSOR_STATUS_SUCCESS
 2: Ptr{cuTENSOR.cutensorTensorDescriptor} @0x0000000010688990
libcutensor.cutensorCreateTensorDescriptor(Ptr{cuTENSOR.cutensorHandle} @0x000000000f6b9aa0, Base.RefValue{Ptr{cuTENSOR.cutensorTensorDescriptor}}, 4, 4-element Vector{Int64}, 4-element Vector{Int64}, Float32, 128) = CUTENSOR_STATUS_SUCCESS
 2: Ptr{cuTENSOR.cutensorTensorDescriptor} @0x000000001068da50
libcutensor.cutensorCreateContraction(Ptr{cuTENSOR.cutensorHandle} @0x000000000f6b9aa0, Base.RefValue{Ptr{cuTENSOR.cutensorOperationDescriptor}}, CuTensorDescriptor(0x0000000010650bf0), 4-element Vector{Int32}, CUTENSOR_OP_IDENTITY, CuTensorDescriptor(0x0000000010688990), 4-element Vector{Int32}, CUTENSOR_OP_IDENTITY, CuTensorDescriptor(0x000000001068da50), 4-element Vector{Int32}, CUTENSOR_OP_IDENTITY, CuTensorDescriptor(0x000000001068da50), 4-element Vector{Int32}, Float32) = CUTENSOR_STATUS_SUCCESS
 2: Ptr{cuTENSOR.cutensorOperationDescriptor} @0x0000000010693d90
libcutensor.cutensorCreatePlanPreference(Ptr{cuTENSOR.cutensorHandle} @0x000000000f6b9aa0, Base.RefValue{Ptr{cuTENSOR.cutensorPlanPreference}}, CUTENSOR_ALGO_DEFAULT, CUTENSOR_JIT_MODE_DEFAULT) = CUTENSOR_STATUS_SUCCESS
 2: Ptr{cuTENSOR.cutensorPlanPreference} @0x00000000107994f0
libcutensor.cutensorEstimateWorkspaceSize(Ptr{cuTENSOR.cutensorHandle} @0x000000000f6b9aa0, Ptr{cuTENSOR.cutensorOperationDescriptor} @0x0000000010693d90, Ptr{cuTENSOR.cutensorPlanPreference} @0x00000000107994f0, CUTENSOR_WORKSPACE_DEFAULT, Base.RefValue{UInt64}) = CUTENSOR_STATUS_SUCCESS
 5: 16777216
libcutensor.cutensorOperationDescriptorGetAttribute(Ptr{cuTENSOR.cutensorHandle} @0x000000000f6b9aa0, Ptr{cuTENSOR.cutensorOperationDescriptor} @0x0000000010693d90, CUTENSOR_OPERATION_DESCRIPTOR_SCALAR_TYPE, Base.RefValue{cuTENSOR.cutensorDataType_t}, 4) = CUTENSOR_STATUS_SUCCESS
 4: CUTENSOR_R_32F
libcutensor.cutensorCreatePlan(Ptr{cuTENSOR.cutensorHandle} @0x000000000f6b9aa0, Base.RefValue{Ptr{cuTENSOR.cutensorPlan}}, Ptr{cuTENSOR.cutensorOperationDescriptor} @0x0000000010693d90, Ptr{cuTENSOR.cutensorPlanPreference} @0x00000000107994f0, 16777216) = CUTENSOR_STATUS_SUCCESS
 2: Ptr{cuTENSOR.cutensorPlan} @0x000000001ad16250
libcutensor.cutensorPlanGetAttribute(Ptr{cuTENSOR.cutensorHandle} @0x000000000f6b9aa0, Ptr{cuTENSOR.cutensorPlan} @0x000000001ad16250, CUTENSOR_PLAN_REQUIRED_WORKSPACE, Base.RefValue{UInt64}, 8) = CUTENSOR_STATUS_SUCCESS
 4: 16777216
libcuda.cuDeviceGetMemPool(Base.RefValue{Ptr{CUDA.CUmemPoolHandle_st}}, CuDevice(0)) = CUDA_SUCCESS
 1: Ptr{CUDA.CUmemPoolHandle_st} @0x000000000fd17468
libcuda.cuMemAllocFromPoolAsync(Base.RefValue{CuPtr{Nothing}}, 16777216, CuMemoryPool(Ptr{CUDA.CUmemPoolHandle_st} @0x000000000fd17468, CuContext(0x0000000001560f00, instance ae5ac34e40912f32)), CuStream(0x0000000001f894c0, CuContext(0x0000000001560f00, instance ae5ac34e40912f32)) = CUDA_SUCCESS
 1: CuPtr{Nothing}(0x0000000302040a00)
libcutensor.cutensorContract(Ptr{cuTENSOR.cutensorHandle} @0x000000000f6b9aa0, CuTensorPlan(0x000000001ad16250), Base.RefValue{Float32}, 1000×8×3×2 CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, 3×2×2×8 CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, Base.RefValue{Float32}, 3×3×1000×2 CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, 3×3×1000×2 CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, 16777216-element CuArray{UInt8, 1, CUDA.Mem.DeviceBuffer}, 16777216, CuStream(0x0000000001f894c0, CuContext(0x0000000001560f00, instance ae5ac34e40912f32)) = CUTENSOR_STATUS_SUCCESS
 3: 1.0
 6: 0.0
libcutensor.cutensorDestroyTensorDescriptor(CuTensorDescriptor(0x000000001068da50)) = CUTENSOR_STATUS_SUCCESS
libcutensor.cutensorDestroyTensorDescriptor(CuTensorDescriptor(0x0000000010688990)) = CUTENSOR_STATUS_SUCCESS
libcutensor.cutensorDestroyTensorDescriptor(CuTensorDescriptor(0x0000000010650bf0)) = CUTENSOR_STATUS_SUCCESS
libcuda.cuDeviceGetMemPool(Base.RefValue{Ptr{CUDA.CUmemPoolHandle_st}}, CuDevice(0)) = CUDA_SUCCESS
 1: Ptr{CUDA.CUmemPoolHandle_st} @0x000000000fd17468
libcuda.cuMemFreeAsync(DeviceBuffer(16.000 MiB at 0x0000000302040a00), CuStream(0x0000000001f894c0, CuContext(0x0000000001560f00, instance ae5ac34e40912f32)) = CUDA_SUCCESS
libcutensor.cutensorDestroyPlan(CuTensorPlan(0x000000001ad16250)) = CUTENSOR_STATUS_SUCCESS

libcuda.cuDevicePrimaryCtxReset_v2(CuDevice(0)) = CUDA_SUCCESS

libcuda.cuDevicePrimaryCtxRetain(Base.RefValue{Ptr{CUDA.CUctx_st}}, CuDevice(0)) = CUDA_SUCCESS
 1: Ptr{CUDA.CUctx_st} @0x0000000001560f00
libcuda.cuCtxSetCurrent(CuContext(0x0000000001560f00, instance 3218372044399810)) = CUDA_SUCCESS
libcuda.cuStreamCreate(Base.RefValue{Ptr{CUDA.CUstream_st}}, CU_STREAM_DEFAULT) = CUDA_SUCCESS
 1: Ptr{CUDA.CUstream_st} @0x00000000031fa9a0
libcutensor.cutensorCreate(Base.RefValue{Ptr{cuTENSOR.cutensorHandle}}) = CUTENSOR_STATUS_SUCCESS
 1: Ptr{cuTENSOR.cutensorHandle} @0x000000001132c2a0
libcuda.cuMemPoolCreate(Base.RefValue{Ptr{CUDA.CUmemPoolHandle_st}}, Base.RefValue{CUDA.CUmemPoolProps_st}) = CUDA_SUCCESS
 1: Ptr{CUDA.CUmemPoolHandle_st} @0x0000000010843f68
 2: CUDA.CUmemPoolProps_st(CUDA.CU_MEM_ALLOCATION_TYPE_PINNED, CUDA.CU_MEM_HANDLE_TYPE_NONE, CUDA.CUmemLocation_st(CUDA.CU_MEM_LOCATION_TYPE_DEVICE, 0), Ptr{Nothing} @0x0000000000000000, 0x0000000000000000, (0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00))
libcuda.cuDeviceSetMemPool(CuDevice(0), CuMemoryPool(Ptr{CUDA.CUmemPoolHandle_st} @0x0000000010843f68, CuContext(0x0000000001560f00, instance 3218372044399810))) = CUDA_SUCCESS
libcuda.cuMemPoolSetAttribute(CuMemoryPool(Ptr{CUDA.CUmemPoolHandle_st} @0x0000000010843f68, CuContext(0x0000000001560f00, instance 3218372044399810)), CU_MEMPOOL_ATTR_RELEASE_THRESHOLD, Base.RefValue{UInt64}) = CUDA_SUCCESS
 3: 18446744073709551615
libcuda.cuMemAllocFromPoolAsync(Base.RefValue{CuPtr{Nothing}}, 192000, CuMemoryPool(Ptr{CUDA.CUmemPoolHandle_st} @0x0000000010843f68, CuContext(0x0000000001560f00, instance 3218372044399810)), CuStream(0x00000000031fa9a0, CuContext(0x0000000001560f00, instance 3218372044399810)) = CUDA_SUCCESS
 1: CuPtr{Nothing}(0x0000001aa8000000)
libcuda.cuDeviceGetMemPool(Base.RefValue{Ptr{CUDA.CUmemPoolHandle_st}}, CuDevice(0)) = CUDA_SUCCESS
 1: Ptr{CUDA.CUmemPoolHandle_st} @0x0000000010843f68
libcuda.cuMemAllocFromPoolAsync(Base.RefValue{CuPtr{Nothing}}, 384, CuMemoryPool(Ptr{CUDA.CUmemPoolHandle_st} @0x0000000010843f68, CuContext(0x0000000001560f00, instance 3218372044399810)), CuStream(0x00000000031fa9a0, CuContext(0x0000000001560f00, instance 3218372044399810)) = CUDA_SUCCESS
 1: CuPtr{Nothing}(0x0000001aa802ee00)
libcuda.cuDeviceGetMemPool(Base.RefValue{Ptr{CUDA.CUmemPoolHandle_st}}, CuDevice(0)) = CUDA_SUCCESS
 1: Ptr{CUDA.CUmemPoolHandle_st} @0x0000000010843f68
libcuda.cuMemAllocFromPoolAsync(Base.RefValue{CuPtr{Nothing}}, 72000, CuMemoryPool(Ptr{CUDA.CUmemPoolHandle_st} @0x0000000010843f68, CuContext(0x0000000001560f00, instance 3218372044399810)), CuStream(0x00000000031fa9a0, CuContext(0x0000000001560f00, instance 3218372044399810)) = CUDA_SUCCESS
 1: CuPtr{Nothing}(0x0000001aa802f000)
libcutensor.cutensorCreateTensorDescriptor(Ptr{cuTENSOR.cutensorHandle} @0x000000001132c2a0, Base.RefValue{Ptr{cuTENSOR.cutensorTensorDescriptor}}, 4, 4-element Vector{Int64}, 4-element Vector{Int64}, Float32, 128) = CUTENSOR_STATUS_SUCCESS
 2: Ptr{cuTENSOR.cutensorTensorDescriptor} @0x000000001acf33e0
libcutensor.cutensorCreateTensorDescriptor(Ptr{cuTENSOR.cutensorHandle} @0x000000001132c2a0, Base.RefValue{Ptr{cuTENSOR.cutensorTensorDescriptor}}, 4, 4-element Vector{Int64}, 4-element Vector{Int64}, Float32, 128) = CUTENSOR_STATUS_SUCCESS
 2: Ptr{cuTENSOR.cutensorTensorDescriptor} @0x000000001070b260
libcutensor.cutensorCreateTensorDescriptor(Ptr{cuTENSOR.cutensorHandle} @0x000000001132c2a0, Base.RefValue{Ptr{cuTENSOR.cutensorTensorDescriptor}}, 4, 4-element Vector{Int64}, 4-element Vector{Int64}, Float32, 128) = CUTENSOR_STATUS_SUCCESS
 2: Ptr{cuTENSOR.cutensorTensorDescriptor} @0x000000001085b8e0
libcutensor.cutensorCreateContraction(Ptr{cuTENSOR.cutensorHandle} @0x000000001132c2a0, Base.RefValue{Ptr{cuTENSOR.cutensorOperationDescriptor}}, CuTensorDescriptor(0x000000001acf33e0), 4-element Vector{Int32}, CUTENSOR_OP_IDENTITY, CuTensorDescriptor(0x000000001070b260), 4-element Vector{Int32}, CUTENSOR_OP_IDENTITY, CuTensorDescriptor(0x000000001085b8e0), 4-element Vector{Int32}, CUTENSOR_OP_IDENTITY, CuTensorDescriptor(0x000000001085b8e0), 4-element Vector{Int32}, Float32) = CUTENSOR_STATUS_SUCCESS
 2: Ptr{cuTENSOR.cutensorOperationDescriptor} @0x00000000021f4830
libcutensor.cutensorCreatePlanPreference(Ptr{cuTENSOR.cutensorHandle} @0x000000001132c2a0, Base.RefValue{Ptr{cuTENSOR.cutensorPlanPreference}}, CUTENSOR_ALGO_DEFAULT, CUTENSOR_JIT_MODE_DEFAULT) = CUTENSOR_STATUS_SUCCESS
 2: Ptr{cuTENSOR.cutensorPlanPreference} @0x000000001a28c970
libcutensor.cutensorEstimateWorkspaceSize(Ptr{cuTENSOR.cutensorHandle} @0x000000001132c2a0, Ptr{cuTENSOR.cutensorOperationDescriptor} @0x00000000021f4830, Ptr{cuTENSOR.cutensorPlanPreference} @0x000000001a28c970, CUTENSOR_WORKSPACE_DEFAULT, Base.RefValue{UInt64}) = CUTENSOR_STATUS_SUCCESS
 5: 16777216
libcutensor.cutensorOperationDescriptorGetAttribute(Ptr{cuTENSOR.cutensorHandle} @0x000000001132c2a0, Ptr{cuTENSOR.cutensorOperationDescriptor} @0x00000000021f4830, CUTENSOR_OPERATION_DESCRIPTOR_SCALAR_TYPE, Base.RefValue{cuTENSOR.cutensorDataType_t}, 4) = CUTENSOR_STATUS_SUCCESS
 4: CUTENSOR_R_32F
libcutensor.cutensorCreatePlan(Ptr{cuTENSOR.cutensorHandle} @0x000000001132c2a0, Base.RefValue{Ptr{cuTENSOR.cutensorPlan}}, Ptr{cuTENSOR.cutensorOperationDescriptor} @0x00000000021f4830, Ptr{cuTENSOR.cutensorPlanPreference} @0x000000001a28c970, 16777216) = CUTENSOR_STATUS_SUCCESS
 2: Ptr{cuTENSOR.cutensorPlan} @0x000000000fe71880
libcutensor.cutensorPlanGetAttribute(Ptr{cuTENSOR.cutensorHandle} @0x000000001132c2a0, Ptr{cuTENSOR.cutensorPlan} @0x000000000fe71880, CUTENSOR_PLAN_REQUIRED_WORKSPACE, Base.RefValue{UInt64}, 8) = CUTENSOR_STATUS_SUCCESS
 4: 16777216
libcuda.cuDeviceGetMemPool(Base.RefValue{Ptr{CUDA.CUmemPoolHandle_st}}, CuDevice(0)) = CUDA_SUCCESS
 1: Ptr{CUDA.CUmemPoolHandle_st} @0x0000000010843f68
libcuda.cuMemAllocFromPoolAsync(Base.RefValue{CuPtr{Nothing}}, 16777216, CuMemoryPool(Ptr{CUDA.CUmemPoolHandle_st} @0x0000000010843f68, CuContext(0x0000000001560f00, instance 3218372044399810)), CuStream(0x00000000031fa9a0, CuContext(0x0000000001560f00, instance 3218372044399810)) = CUDA_SUCCESS
 1: CuPtr{Nothing}(0x0000001aa8040a00)
libcutensor.cutensorContract(Ptr{cuTENSOR.cutensorHandle} @0x000000001132c2a0, CuTensorPlan(0x000000000fe71880), Base.RefValue{Float32}, 1000×8×3×2 CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, 3×2×2×8 CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, Base.RefValue{Float32}, 3×3×1000×2 CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, 3×3×1000×2 CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, 16777216-element CuArray{UInt8, 1, CUDA.Mem.DeviceBuffer}, 16777216, CuStream(0x00000000031fa9a0, CuContext(0x0000000001560f00, instance 3218372044399810)) = CUTENSOR_STATUS_INTERNAL_ERROR

@maleadt
Member Author

maleadt commented May 24, 2024

According to upstream this is as intended, and device resets shouldn't be used.
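
If the reset was only meant to release GPU memory between runs (an assumption; the issue doesn't say why the reset was called), one possible alternative is to leave the context alone and free memory through CUDA.jl instead. The following is an untested sketch along those lines, not a confirmed fix: it reuses `test()` from the MWE and replaces `CUDA.device_reset!()` with a garbage collection pass followed by `CUDA.reclaim()`.

```julia
using CUDA, cuTENSOR, LinearAlgebra

# Same contraction as in the MWE above.
function test()
    A = CuArray{Float32}(undef, 1000, 8, 3, 2)
    B = CuArray{Float32}(undef, 3, 2, 2, 8)
    C = CuArray{Float32}(undef, 3, 3, 1000, 2)

    tA = CuTensor(A, ['a', 'f', 'b', 'e'])
    tB = CuTensor(B, ['c', 'e', 'd', 'f'])
    tC = CuTensor(C, ['b', 'c', 'a', 'd'])
    mul!(tC, tA, tB)
end

test()
# Assumption: the goal is only to release GPU memory, not to tear down the context.
GC.gc()          # let Julia finalize CuArrays that are no longer referenced
CUDA.reclaim()   # ask CUDA.jl to hand cached pool memory back to the driver
test()
```

Because the primary context is never reset here, the cuTENSOR handle and whatever state the library keeps internally stay tied to a live context.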

@maleadt maleadt closed this as completed May 24, 2024