Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

issues about YaRN #835

Open
foldl opened this issue May 24, 2024 · 0 comments
Open

issues about YaRN #835

foldl opened this issue May 24, 2024 · 0 comments

Comments

@foldl
Copy link
Contributor

foldl commented May 24, 2024

I am using YaRN when implementing DeepSeek V2 models. And current YaRN does not look good me.

@cebtenzzre Could you take a look on this? Correct me if I am wrong.

  1. theta_base *= freq_scale is done again later in rope_yarn:

    ggml/src/ggml.c

    Line 14077 in 0cbb7c0

    theta_base *= freq_scale;

    In a basic case (ext_factor is 0), the theta uses for cos/sin is scaled by freq_scale * freq_scale. I think this is wrong and this line should be deleted.

  2. value passed to int64_t i0 is wrong: (data type does not matches, either.)

    ggml/src/ggml.c

    Lines 14082 to 14088 in 0cbb7c0

    // simplified from `(ib * n_dims + ic) * inv_ndims`
    float cur_rot = inv_ndims * ic - ib;
    float cos_theta, sin_theta;
    rope_yarn(
    theta_base, freq_scale, corr_dims, cur_rot, ext_factor, attn_factor,
    &cos_theta, &sin_theta

    I think it should be ic here.

    (Confirmed when implementing DeepSeek V2 models)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant