
Zero length allocation failure #194

Open · agerasev opened this issue Dec 4, 2023 · 5 comments

Comments

@agerasev

agerasev commented Dec 4, 2023

Hi!

I'm facing an issue with zero-length memory allocation (while trying to run candle on a GTX 970). Here is a minimal reproducer:

let dev = cudarc::driver::CudaDevice::new(0).unwrap();
dev.null::<f32>().unwrap();

On my machine it fails with DriverError(CUDA_ERROR_INVALID_VALUE, "invalid argument"). With this workaround it works fine.

I didn't find documentation for cuMemAlloc_v2, but the documentation for cuMemAlloc says:

If bytesize is 0, cuMemAlloc() returns CUDA_ERROR_INVALID_VALUE

Maybe cuMemAlloc_v2 shouldn't be called at all if num_bytes is zero?
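
For illustration, a guard along those lines might look like this. This is a minimal sketch, not cudarc's actual internals: the alloc_bytes name and the zero-pointer sentinel are assumptions, and it presumes cudarc's raw sys bindings plus a CUDA context already current on the calling thread.

use cudarc::driver::{sys, DriverError};

// Hypothetical wrapper: skip cuMemAlloc_v2 entirely for zero-length requests.
unsafe fn alloc_bytes(num_bytes: usize) -> Result<sys::CUdeviceptr, DriverError> {
    if num_bytes == 0 {
        // cuMemAlloc rejects bytesize == 0, so return a sentinel "empty"
        // pointer instead; it must never be dereferenced or passed to cuMemFree.
        return Ok(0);
    }
    let mut dptr: sys::CUdeviceptr = 0;
    sys::cuMemAlloc_v2(&mut dptr, num_bytes).result()?;
    Ok(dptr)
}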

@agerasev
Author

agerasev commented Dec 4, 2023

My system:

$ uname -a
Linux  6.1.0-13-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.55-1 (2023-09-29) x86_64 GNU/Linux
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
$ nvidia-smi -q

==============NVSMI LOG==============

Timestamp                                 : Mon Dec  4 13:40:18 2023
Driver Version                            : 525.125.06
CUDA Version                              : 12.0

Attached GPUs                             : 1
GPU 00000000:03:00.0
    Product Name                          : NVIDIA GeForce GTX 970
    Product Brand                         : GeForce
    Product Architecture                  : Maxwell
    Display Mode                          : Enabled
    Display Active                        : Enabled
    Persistence Mode                      : Enabled
...

@coreylowman
Owner

coreylowman commented Jan 9, 2024

I think this function is behaving as it should - it's returning a result (and the unwrap turns it into a panic). I think this should probably be raised as an issue on candle's repo. Do you know where in candle it's coming from?

@agerasev
Author

agerasev commented Jan 9, 2024

Do you know where in candle it's coming from?

It can occur in many places in candle_core::cuda_backend where alloc or htod_copy is called. There are no checks for zero length there; the calls are assumed to succeed.

I think this function is behaving as it should - it's returning a result (and the unwrap turns it into a panic).

The problem is that this behavior is inconsistent: on most devices a zero-length allocation seems to succeed (and candle relies on this), but on the GTX 970 it fails.
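
As a stopgap, call sites could short-circuit before reaching the driver; a minimal sketch follows (the Option-based handling of the empty case is an illustrative assumption, not candle's actual code):

use std::sync::Arc;
use cudarc::driver::{CudaDevice, CudaSlice, DriverError};

// Hypothetical caller-side guard: never forward a zero-length request to
// the driver, since some devices reject it with CUDA_ERROR_INVALID_VALUE.
fn htod_copy_guarded(
    dev: &Arc<CudaDevice>,
    data: Vec<f32>,
) -> Result<Option<CudaSlice<f32>>, DriverError> {
    if data.is_empty() {
        // Represent an empty buffer without any device allocation.
        return Ok(None);
    }
    dev.htod_copy(data).map(Some)
}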

@coreylowman
Owner

I'm not really sure what we can do in this case; this seems like a driver-level issue. We don't have any device-specific code in cudarc, so I'm not sure what the outcome should be. I'm hesitant to return a null pointer (i.e. not actually call cuMemAlloc) because I don't really know what the downstream effects of that would be or how the CUDA driver interacts with such pointers.

Can you print out the CudaDevice in your example? I want to see if is_async is false:

let dev = cudarc::driver::CudaDevice::new(0).unwrap();
println!("{:?}", dev);

@agerasev
Author

Can you print out the CudaDevice in your example?

CudaDevice {
    cu_device: 0,
    cu_primary_ctx: 0x000055759b945ec0,
    stream: 0x0000000000000000,
    event: 0x000055759bc8d4f0,
    modules: RwLock {
        data: {},
        poisoned: false,
        ..
    },
    ordinal: 0,
    is_async: false,
}
