Support for convolution row sizes undivisible by 32 in ggml_conv_2d_sk_p0 #274

Merged 1 commit into ggerganov:master on Jun 25, 2023

Conversation

monatis (Contributor) commented on Jun 21, 2023:

Rounding up the size of the convolution row in ggml_compute_forward_conv_2d_sk_p0_f16_f32 yields NaN values with kernel_size = 14, which was first reported for large CLIP models in monatis/clip.cpp#9.
I debugged it in monatis/clip.cpp#11 and found that the first NaN value appears when the size of the convolution row is rounded up to a multiple of 32; it then propagates through all subsequent calculations, leading to an all-NaN tensor.

ggml/src/ggml.c, lines 13181 to 13184 in 0a63fc0:

// size of the convolution row - the kernel size unrolled across all channels
// round-up so it is more suitable for SIMD
const int ew0 = ggml_up32(nk0*nk1*ne02);

With kernel sizes 32 and 16 this is not an issue, because nk0*nk1*ne02 is already a multiple of 32 in those cases and the round-up is a no-op. With kernel size 14, however, ew0 is 608 (rounded up from 588), and a NaN value appears at index 598 of the x argument passed to ggml_vec_dot_f16. I couldn't discover the exact mechanism behind it, but disabling the round-up as in this PR eliminates the NaN issue, at the expense of the leftover elements being handled scalar-wise (rather than via SIMD) in ggml_vec_dot_f16.
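
To make the arithmetic above concrete, here is a minimal standalone sketch. The up32 helper is a hypothetical stand-in for ggml_up32 (assumed to round up to the next multiple of 32), and the 3-channel RGB input is likewise an assumption for the CLIP patch-embedding case:

```c
#include <stdio.h>

// Hypothetical stand-in for ggml_up32: round n up to the next multiple of 32.
static int up32(int n) {
    return (n + 31) & ~31;
}

int main(void) {
    const int ne02 = 3;                  // assumed number of input channels (RGB)
    const int kernel_sizes[] = { 32, 16, 14 };

    for (int i = 0; i < 3; ++i) {
        const int k   = kernel_sizes[i];
        const int nk  = k*k*ne02;        // kernel unrolled across all channels
        const int ew0 = up32(nk);        // rounded-up convolution row size
        printf("kernel %2d: row = %4d, rounded = %4d, padding = %2d\n",
               k, nk, ew0, ew0 - nk);
    }
    return 0;
}
// kernel 32: row = 3072, rounded = 3072, padding =  0
// kernel 16: row =  768, rounded =  768, padding =  0
// kernel 14: row =  588, rounded =  608, padding = 20
```

Note that the reported NaN at index 598 falls inside those 20 padded tail elements (indices 588 to 607) of the rounded-up row.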

So I'm not sure this is the most elegant solution, but I wanted to bring it to your attention together with a working fix.

monatis changed the title from "Support for kernel_size = 14 in ggml_conv_2d_sk_p0" to "Support for kernel sizes not-divisible by 32 in ggml_conv_2d_sk_p0" on Jun 21, 2023
monatis (Contributor, Author) commented on Jun 21, 2023:

The failing test is test-grad0, which also fails on master due to a timeout.

monatis changed the title from "Support for kernel sizes not-divisible by 32 in ggml_conv_2d_sk_p0" to "Support for convolution row sizes undivisible by 32 in ggml_conv_2d_sk_p0" on Jun 21, 2023
ggerganov (Owner) commented:

I suppose that the calculation of the work buffer size here is wrong:

ggml/src/ggml.c, lines 16525 to 16541 in 80be908:

const int64_t nk = ne00*ne01;
UNUSED(ne02);
UNUSED(ne03);
UNUSED(nk);
size_t cur = 0;
if (node->src0->type == GGML_TYPE_F16 &&
    node->src1->type == GGML_TYPE_F32) {
    cur = sizeof(ggml_fp16_t)*(ne10*ne11*ne12);
} else if (node->src0->type == GGML_TYPE_F32 &&
           node->src1->type == GGML_TYPE_F32) {
    cur = sizeof(float)*(ne10*ne11*ne12);
} else {
    GGML_ASSERT(false);
}

I think it should be:

cur = sizeof(ggml_fp16_t)*(ne10*ne11*ne12)*ggml_up32(ne00*ne01*ne02);

But I'm not sure. Let me know if you give it a try - it's better to try to understand why the 32-rounding does not work.
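
As a rough cross-check of the two formulas, here is a back-of-the-envelope sketch. It assumes that wdata holds one ew0-wide fp16 row per kernel-sized patch of the input (my reading of the rounding code quoted above, not something confirmed in this thread), and it assumes the 224x224x3 input and 14x14 kernel of the failing CLIP case:

```c
#include <stdio.h>

// Hypothetical stand-in for ggml_up32: round n up to the next multiple of 32.
static long up32(long n) {
    return (n + 31) & ~31L;
}

int main(void) {
    // Assumed shapes for the failing CLIP case: 224x224x3 input, 14x14 kernel.
    const long ne10 = 224, ne11 = 224, ne12 = 3;
    const long nk0  = 14,  nk1  = 14;

    const long ew0  = up32(nk0*nk1*ne12);        // padded convolution row size: 608
    const long rows = (ne10/nk0)*(ne11/nk1);     // one row per output patch: 16*16 = 256

    const long allocated = 2 /* sizeof(ggml_fp16_t) */ * ne10*ne11*ne12; // current formula
    const long needed    = 2 /* sizeof(ggml_fp16_t) */ * rows*ew0;       // if each row is padded to ew0

    printf("allocated = %ld bytes, needed = %ld bytes, short by %ld bytes\n",
           allocated, needed, needed - allocated);
    return 0;
}
// allocated = 301056 bytes, needed = 311296 bytes, short by 10240 bytes
```

Under that layout assumption, the existing formula falls short by exactly the padding bytes whenever the round-up is not a no-op, while multiplying it by ggml_up32(ne00*ne01*ne02) as suggested would allocate far more than needed.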

monatis (Contributor, Author) commented on Jun 24, 2023:

Unfortunately it didn't work. It first increased the memory requirement for the computation buffer, and once I allocated the required memory the NaN issue came back. But I believe you've pointed out the right place to look into. I'll revisit it with a fresh mind and let you know.

monatis (Contributor, Author) commented on Jun 25, 2023:

I think it's more related to the kernel data (src0) not being prepared in wdata, unlike src1 -- I'm trying to understand the memory layout there.
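
If that is indeed what happens, one plausible failure mode (speculation on my part, in line with the comment above): the padded tail of the src1 row in wdata is presumably zeroed, but if the kernel operand x is read in place from src0 with only nk0*nk1*ne02 valid elements, a dot over ew0 elements picks up whatever bytes follow the row, and a single NaN bit pattern there poisons the whole sum, since NaN*0 is still NaN. A toy illustration of that propagation, using plain floats in place of fp16 and a hypothetical dot helper in place of ggml_vec_dot_f16:

```c
#include <stdio.h>
#include <math.h>

// Hypothetical stand-in for ggml_vec_dot_f16: a plain accumulating dot product.
static float dot(int n, const float * x, const float * y) {
    float sum = 0.0f;
    for (int i = 0; i < n; ++i) {
        sum += x[i]*y[i];
    }
    return sum;
}

int main(void) {
    // 5 "valid" elements followed by 3 padding slots.
    float x[8] = { 1, 1, 1, 1, 1, NAN, 0, 0 }; // garbage (NaN) past the unpadded kernel row
    float y[8] = { 2, 2, 2, 2, 2, 0,   0, 0 }; // the zero-padded data row

    printf("dot over padded length: %f\n", dot(8, x, y)); // nan, because NaN*0 == NaN
    printf("dot over valid length:  %f\n", dot(5, x, y)); // 10.000000
    return 0;
}
```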

ggerganov (Owner) commented:

Ah yes - that might be the case.
Overall, the conv operators are quite cryptic - I wonder if there is a better way to implement them.

So in that case, I'll merge the current PR for now.

ggerganov merged commit 70c5a5c into ggerganov:master on Jun 25, 2023
monatis (Contributor, Author) commented on Jun 25, 2023:

Thanks, I'll dig deeper into it later on.

Now that this is merged, I'll raise a PR to add a link to clip.cpp shortly.
