Support for convolution row sizes undivisible by 32 in ggml_conv_2d_sk_p0 #274

Merged 1 commit into ggerganov:master on Jun 25, 2023

Conversation

monatis (Contributor) commented on Jun 21, 2023:

Rounding up the size of the convolution row in ggml_compute_forward_conv_2d_sk_p0_f16_f32 yields NaN values with kernel_size = 14, which was first reported for large CLIP models in monatis/clip.cpp#9.
I debugged it in monatis/clip.cpp#11 and found that the first NaN value appears when the size of the convolution row is rounded up to a multiple of 32; it then propagates through all subsequent calculations, leading to an all-NaN tensor.

ggml/src/ggml.c, lines 13181 to 13184 in 0a63fc0:

// size of the convolution row - the kernel size unrolled across all channels
// round-up so it is more suitable for SIMD
const int ew0 = ggml_up32(nk0*nk1*ne02);

With kernel sizes 32 and 16 this is not an issue, because nk0*nk1*ne02 is already a multiple of 32 in those cases and the round-up is a no-op. With kernel size 14, however, ew0 is 608 (rounded up from 588), and a NaN value appears at index 598 of the x argument passed to ggml_vec_dot_f16. I couldn't discover the exact mechanism behind it, but disabling the round-up as in this PR eliminates the NaN issue, at the expense of the leftover elements being handled scalar-wise (rather than via SIMD) in ggml_vec_dot_f16.
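
To make the arithmetic above concrete, here is a minimal standalone sketch. The up32 helper is a hypothetical stand-in for ggml_up32 (assumed to round up to the next multiple of 32), and the 3-channel RGB input is likewise an assumption for the CLIP patch-embedding case:

```c
#include <stdio.h>

// Hypothetical stand-in for ggml_up32: round n up to the next multiple of 32.
static int up32(int n) {
    return (n + 31) & ~31;
}

int main(void) {
    const int ne02 = 3;                  // assumed number of input channels (RGB)
    const int kernel_sizes[] = { 32, 16, 14 };

    for (int i = 0; i < 3; ++i) {
        const int k   = kernel_sizes[i];
        const int nk  = k*k*ne02;        // kernel unrolled across all channels
        const int ew0 = up32(nk);        // rounded-up convolution row size
        printf("kernel %2d: row = %4d, rounded = %4d, padding = %2d\n",
               k, nk, ew0, ew0 - nk);
    }
    return 0;
}
// kernel 32: row = 3072, rounded = 3072, padding =  0
// kernel 16: row =  768, rounded =  768, padding =  0
// kernel 14: row =  588, rounded =  608, padding = 20
```

Note that the reported NaN at index 598 falls inside those 20 padded tail elements (indices 588 to 607) of the rounded-up row.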

So I'm not sure this is the most elegant solution, but I wanted to bring it to your attention together with a working fix.

monatis changed the title from "Support for kernel_size = 14 in ggml_conv_2d_sk_p0" to "Support for kernel sizes not-divisible by 32 in ggml_conv_2d_sk_p0" on Jun 21, 2023
monatis (Contributor, Author) commented on Jun 21, 2023:

The failing test is test-grad0, which also fails on master due to a timeout.

monatis changed the title from "Support for kernel sizes not-divisible by 32 in ggml_conv_2d_sk_p0" to "Support for convolution row sizes undivisible by 32 in ggml_conv_2d_sk_p0" on Jun 21, 2023
ggerganov (Owner) commented:

I suppose that the calculation of the work buffer size here is wrong:

ggml/src/ggml.c, lines 16525 to 16541 in 80be908:

const int64_t nk = ne00*ne01;
UNUSED(ne02);
UNUSED(ne03);
UNUSED(nk);
size_t cur = 0;
if (node->src0->type == GGML_TYPE_F16 &&
    node->src1->type == GGML_TYPE_F32) {
    cur = sizeof(ggml_fp16_t)*(ne10*ne11*ne12);
} else if (node->src0->type == GGML_TYPE_F32 &&
           node->src1->type == GGML_TYPE_F32) {
    cur = sizeof(float)*(ne10*ne11*ne12);
} else {
    GGML_ASSERT(false);
}

I think it should be:

cur = sizeof(ggml_fp16_t)*(ne10*ne11*ne12)*ggml_up32(ne00*ne01*ne02);

But I'm not sure. Let me know if you give it a try - it's better to try to understand why the 32-rounding does not work.
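
As a rough cross-check of the two formulas, here is a back-of-the-envelope sketch. It assumes that wdata holds one ew0-wide fp16 row per kernel-sized patch of the input (my reading of the rounding code quoted above, not something confirmed in this thread), and it assumes the 224x224x3 input and 14x14 kernel of the failing CLIP case:

```c
#include <stdio.h>

// Hypothetical stand-in for ggml_up32: round n up to the next multiple of 32.
static long up32(long n) {
    return (n + 31) & ~31L;
}

int main(void) {
    // Assumed shapes for the failing CLIP case: 224x224x3 input, 14x14 kernel.
    const long ne10 = 224, ne11 = 224, ne12 = 3;
    const long nk0  = 14,  nk1  = 14;

    const long ew0  = up32(nk0*nk1*ne12);        // padded convolution row size: 608
    const long rows = (ne10/nk0)*(ne11/nk1);     // one row per output patch: 16*16 = 256

    const long allocated = 2 /* sizeof(ggml_fp16_t) */ * ne10*ne11*ne12; // current formula
    const long needed    = 2 /* sizeof(ggml_fp16_t) */ * rows*ew0;       // if each row is padded to ew0

    printf("allocated = %ld bytes, needed = %ld bytes, short by %ld bytes\n",
           allocated, needed, needed - allocated);
    return 0;
}
// allocated = 301056 bytes, needed = 311296 bytes, short by 10240 bytes
```

Under that layout assumption, the existing formula falls short by exactly the padding bytes whenever the round-up is not a no-op, while multiplying it by ggml_up32(ne00*ne01*ne02) as suggested would allocate far more than needed.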

monatis (Contributor, Author) commented on Jun 24, 2023:

Unfortunately it didn't work. It first increased the memory requirement for the computation buffer, and once I allocated the required memory the NaN issue came back. But I believe you've pointed out the right place to look into. I'll revisit it with a fresh mind and let you know.

monatis (Contributor, Author) commented on Jun 25, 2023:

I think it's more related to the kernel data (src0) not being prepared in wdata, unlike src1 -- I'm trying to understand the memory layout there.
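
If that is indeed what happens, one plausible failure mode (speculation on my part, in line with the comment above): the padded tail of the src1 row in wdata is presumably zeroed, but if the kernel operand x is read in place from src0 with only nk0*nk1*ne02 valid elements, a dot over ew0 elements picks up whatever bytes follow the row, and a single NaN bit pattern there poisons the whole sum, since NaN*0 is still NaN. A toy illustration of that propagation, using plain floats in place of fp16 and a hypothetical dot helper in place of ggml_vec_dot_f16:

```c
#include <stdio.h>
#include <math.h>

// Hypothetical stand-in for ggml_vec_dot_f16: a plain accumulating dot product.
static float dot(int n, const float * x, const float * y) {
    float sum = 0.0f;
    for (int i = 0; i < n; ++i) {
        sum += x[i]*y[i];
    }
    return sum;
}

int main(void) {
    // 5 "valid" elements followed by 3 padding slots.
    float x[8] = { 1, 1, 1, 1, 1, NAN, 0, 0 }; // garbage (NaN) past the unpadded kernel row
    float y[8] = { 2, 2, 2, 2, 2, 0,   0, 0 }; // the zero-padded data row

    printf("dot over padded length: %f\n", dot(8, x, y)); // nan, because NaN*0 == NaN
    printf("dot over valid length:  %f\n", dot(5, x, y)); // 10.000000
    return 0;
}
```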

ggerganov (Owner) commented:

Ah yes - that might be the case.
Overall, the conv operators are quite cryptic - I wonder if there is a better way to implement them.

So in that case, I'll merge the current PR for now.

ggerganov merged commit 70c5a5c into ggerganov:master on Jun 25, 2023
monatis (Contributor, Author) commented on Jun 25, 2023:

Thanks, I'll dig deeper into it later on.

Now that this is merged, I'll raise a PR to add a link to clip.cpp shortly.
