llama: implement YaRN RoPE scaling #2268

Merged 36 commits on Nov 1, 2023
Changes from 1 commit
Commits
8dec38c
llama: implement NTK-By-Parts (NTKv2) RoPE scaling
cebtenzzre Jul 18, 2023
6aeb46b
CUDA implementation
cebtenzzre Jul 19, 2023
9348aa4
Metal implementation
cebtenzzre Jul 21, 2023
a30ae20
implement new YaRN algorithm
cebtenzzre Sep 5, 2023
b5ced4f
Merge branch 'master' of https://github.com/ggerganov/llama.cpp into …
cebtenzzre Sep 5, 2023
826269a
ggml : increase GGML_MAX_OP_PARAMS
cebtenzzre Sep 5, 2023
cf731d5
YaRN : avoid NaN if unused betas are zero
cebtenzzre Sep 5, 2023
dcb058c
YaRN : fix missing parameter in CUDA impl
cebtenzzre Sep 5, 2023
281b26e
convert : reduce unnecessary variables in Params
cebtenzzre Sep 6, 2023
a06c729
Merge branch 'master' of https://github.com/ggerganov/llama.cpp into …
cebtenzzre Sep 21, 2023
dc26a0d
llama : simplify use of context params
cebtenzzre Sep 21, 2023
904d4ed
llama : store YaRN parameters in GGUF
cebtenzzre Sep 14, 2023
56abb9a
fix convert scripts
cebtenzzre Sep 21, 2023
43eaf06
llama : fix C compatibility
cebtenzzre Sep 21, 2023
fe788c4
don't hardcode max_pos_emb
cebtenzzre Sep 21, 2023
e0b120c
address review comments
cebtenzzre Sep 21, 2023
19bb74e
restore backwards compatibility with *.rope.scale_linear
cebtenzzre Sep 21, 2023
4d5fe73
better option descriptions in help
cebtenzzre Sep 21, 2023
7466415
gguf : store scaling type as a string instead of an int
cebtenzzre Oct 7, 2023
4f4e948
improve printing of YaRN parameters
cebtenzzre Oct 7, 2023
5d7a3a5
allow forcing ext_factor to zero if scaling type is YaRN
cebtenzzre Oct 7, 2023
9bd050f
Merge branch 'master' of https://github.com/ggerganov/llama.cpp into …
cebtenzzre Oct 7, 2023
babf0e0
fix rope_cuda parameter order
cebtenzzre Oct 8, 2023
0050e1e
default n_yarn_orig_ctx to n_ctx_train
cebtenzzre Oct 8, 2023
09c3102
fix uninitialized cparams
cebtenzzre Oct 8, 2023
57c3442
make printed param formatting more consistent
cebtenzzre Oct 8, 2023
a20b3e6
fix missing import
cebtenzzre Oct 11, 2023
9ef91b1
Merge branch 'master' of https://github.com/ggerganov/llama.cpp into …
cebtenzzre Oct 13, 2023
9ae10b3
Fix YaRN inverted scaling and add "rope.scaling.type" to GGUF (#1)
jquesnelle Oct 20, 2023
14cf93b
fix YaRN ramp, make mscale conditional, add --yarn-orig-ctx (#2)
jquesnelle Oct 20, 2023
237f1e7
Merge branch 'master' of https://github.com/ggerganov/llama.cpp into …
cebtenzzre Oct 22, 2023
bc8395d
Merge branch 'master' of https://github.com/ggerganov/llama.cpp into …
cebtenzzre Oct 23, 2023
4d5ed83
Merge branch 'master' of https://github.com/ggerganov/llama.cpp into …
cebtenzzre Oct 24, 2023
9fc8238
fix loading rope.scaling.original_context_length from GGUF (#3)
jquesnelle Oct 30, 2023
15f26ef
implement YaRN for GPT-NeoX RoPE
cebtenzzre Nov 1, 2023
081f738
Merge branch 'master' of https://github.com/ggerganov/llama.cpp into …
cebtenzzre Nov 1, 2023
YaRN : avoid NaN if unused betas are zero
cebtenzzre committed Sep 21, 2023
commit cf731d56480b8f155cc163d9bd45b681c80fba47
7 changes: 5 additions & 2 deletions ggml-cuda.cu
@@ -4058,8 +4058,11 @@ static __device__ void rope_yarn(
 ) {
     // Get n-d rotational scaling corrected for extrapolation
     float theta_interp = freq_scale * theta_extrap;
-    float ramp_mix = rope_yarn_ramp(corr_dims.v[0], corr_dims.v[1], i0) * ext_factor;
-    float theta = theta_interp * (1 - ramp_mix) + theta_extrap * ramp_mix;
+    float theta = theta_interp;
+    if (ext_factor != 0.0f) {
+        float ramp_mix = rope_yarn_ramp(corr_dims.v[0], corr_dims.v[1], i0) * ext_factor;
+        theta = theta_interp * (1 - ramp_mix) + theta_extrap * ramp_mix;
+    }
 
     // Get n-d magnitude scaling corrected for interpolation
     if (freq_scale > 1.0f)
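
The ggml-metal.metal and ggml.c hunks below make the same change. The reason the guard is needed, rather than relying on the multiplication by ext_factor, is that IEEE-754 arithmetic propagates NaN: 0 * NaN is still NaN, so a NaN ramp poisons theta even when ext_factor is zero. The following standalone C sketch illustrates that reading of the fix; ramp_sketch and the infinite correction dims are illustrative stand-ins, not the repository's exact code or values.

// Illustrative sketch: why multiplying the ramp by ext_factor == 0 does not
// neutralize a NaN ramp, and how the ext_factor guard avoids evaluating it.
#include <math.h>
#include <stdio.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))
#define MAX(a, b) ((a) > (b) ? (a) : (b))

// Stand-in for rope_yarn_ramp: with a degenerate correction range (e.g. both
// dims non-finite because unused betas were left at zero), the division yields
// NaN, and NaN survives the comparison-based MIN/MAX macros.
static float ramp_sketch(float low, float high, int i0) {
    float y = (i0 / 2 - low) / (high - low);
    return 1.0f - MIN(1.0f, MAX(0.0f, y));
}

int main(void) {
    float theta_extrap = 1.0f, freq_scale = 0.5f, ext_factor = 0.0f;
    float low = INFINITY, high = INFINITY;  // degenerate correction dims
    float theta_interp = freq_scale * theta_extrap;

    // Old form: the ramp is evaluated even though ext_factor == 0.
    float ramp_mix = ramp_sketch(low, high, 4) * ext_factor;  // NaN * 0 == NaN
    float theta_old = theta_interp * (1 - ramp_mix) + theta_extrap * ramp_mix;

    // New form: skip the ramp entirely when ext_factor == 0.
    float theta_new = theta_interp;
    if (ext_factor != 0.0f) {
        ramp_mix = ramp_sketch(low, high, 4) * ext_factor;
        theta_new = theta_interp * (1 - ramp_mix) + theta_extrap * ramp_mix;
    }

    printf("old: %f (nan=%d)  new: %f (nan=%d)\n",
           theta_old, (int) isnan(theta_old), theta_new, (int) isnan(theta_new));
    return 0;
}

Compiled with any C99 compiler, the old form prints nan while the guarded form prints the plain interpolated theta, which matches what the hunks in this commit do on the GPU and CPU paths.
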
7 changes: 5 additions & 2 deletions ggml-metal.metal
@@ -688,8 +688,11 @@ static void rope_yarn(
 ) {
     // Get n-d rotational scaling corrected for extrapolation
     float theta_interp = freq_scale * theta_extrap;
-    float ramp_mix = rope_yarn_ramp(corr_dims[0], corr_dims[1], i0) * ext_factor;
-    float theta = theta_interp * (1 - ramp_mix) + theta_extrap * ramp_mix;
+    float theta = theta_interp;
+    if (ext_factor != 0.0f) {
+        float ramp_mix = rope_yarn_ramp(corr_dims[0], corr_dims[1], i0) * ext_factor;
+        theta = theta_interp * (1 - ramp_mix) + theta_extrap * ramp_mix;
+    }
 
     // Get n-d magnitude scaling corrected for interpolation
     if (freq_scale > 1.0f)
7 changes: 5 additions & 2 deletions ggml.c
@@ -12626,8 +12626,11 @@ static void rope_yarn(
 ) {
     // Get n-d rotational scaling corrected for extrapolation
     float theta_interp = freq_scale * theta_extrap;
-    float ramp_mix = rope_yarn_ramp(corr_dims[0], corr_dims[1], i0) * ext_factor;
-    float theta = theta_interp * (1 - ramp_mix) + theta_extrap * ramp_mix;
+    float theta = theta_interp;
+    if (ext_factor != 0.0f) {
+        float ramp_mix = rope_yarn_ramp(corr_dims[0], corr_dims[1], i0) * ext_factor;
+        theta = theta_interp * (1 - ramp_mix) + theta_extrap * ramp_mix;
+    }
 
     // Get n-d magnitude scaling corrected for interpolation
     if (freq_scale > 1.0f)