Support for convolution row sizes undivisible by 32 in ggml_conv_2d_sk_p0
#274
Conversation
The failing test is ggml_conv_2d_sk_p0 with kernel_size = 14.
I suppose that the calculation for the work size buffer here is wrong: Lines 16525 to 16541 in 80be908
I think it should be: cur = sizeof(ggml_fp16_t)*(ne10*ne11*ne12)*ggml_up32(ne00*ne01*ne02); but I'm not sure. Let me know if you give it a try; it's better to try to understand why the 32 rounding does not work.
Unfortunately it didn't work. It first increased the memory requirement for the computation buffer, and when I allocated the required memory the NaN issue came back. But I believe you pointed out the right place to look into. I'll revisit it with a fresh mind and let you know.
I think it's more related to the kernel data.
Ah yes, that might be the case. I'll merge the current PR for now, then.
Thanks, I'll dig deeper into it later on. Now that this is merged, I'll raise a PR to add a link to clip.cpp shortly. |
Rounding up the size of the convolution row in ggml_compute_forward_conv_2d_sk_p0_f16_f32 yields NaN values with kernel_size = 14, which was first reported for large CLIP models in monatis/clip.cpp#9. I debugged it in monatis/clip.cpp#11 and found that the first NaN value appears when the size of the convolution row is rounded up to a multiple of 32; it then propagates through all subsequent calculations, leading to an all-NaN tensor.
ggml/src/ggml.c
Lines 13181 to 13184 in 0a63fc0
With kernel sizes 32 and 16 this is not an issue, because ew0 % 32 == 0 in those cases. With kernel size 14, however, ew0 is 608 (rounded up from 588), and a NaN value first appears at index 598 of the x argument passed to ggml_vec_dot_f16. I couldn't discover the exact mechanism behind it, but disabling the rounding, as in this PR, eliminated the NaN issue at the expense of the leftover multiplications being done elementwise in ggml_vec_dot_f16. I'm not sure this is the most elegant solution, but I wanted to bring the problem to attention along with a working fix.