llama: implement YaRN RoPE scaling #2268

Merged Nov 1, 2023 (36 commits)

Commits (36)
8dec38c
llama: implement NTK-By-Parts (NTKv2) RoPE scaling
cebtenzzre Jul 18, 2023
6aeb46b
CUDA implementation
cebtenzzre Jul 19, 2023
9348aa4
Metal implementation
cebtenzzre Jul 21, 2023
a30ae20
implement new YaRN algorithm
cebtenzzre Sep 5, 2023
b5ced4f
Merge branch 'master' of https://github.com/ggerganov/llama.cpp into …
cebtenzzre Sep 5, 2023
826269a
ggml : increase GGML_MAX_OP_PARAMS
cebtenzzre Sep 5, 2023
cf731d5
YaRN : avoid NaN if unused betas are zero
cebtenzzre Sep 5, 2023
dcb058c
YaRN : fix missing parameter in CUDA impl
cebtenzzre Sep 5, 2023
281b26e
convert : reduce unnecessary variables in Params
cebtenzzre Sep 6, 2023
a06c729
Merge branch 'master' of https://github.com/ggerganov/llama.cpp into …
cebtenzzre Sep 21, 2023
dc26a0d
llama : simplify use of context params
cebtenzzre Sep 21, 2023
904d4ed
llama : store YaRN parameters in GGUF
cebtenzzre Sep 14, 2023
56abb9a
fix convert scripts
cebtenzzre Sep 21, 2023
43eaf06
llama : fix C compatibility
cebtenzzre Sep 21, 2023
fe788c4
don't hardcode max_pos_emb
cebtenzzre Sep 21, 2023
e0b120c
address review comments
cebtenzzre Sep 21, 2023
19bb74e
restore backwards compatibility with *.rope.scale_linear
cebtenzzre Sep 21, 2023
4d5fe73
better option descriptions in help
cebtenzzre Sep 21, 2023
7466415
gguf : store scaling type as a string instead of an int
cebtenzzre Oct 7, 2023
4f4e948
improve printing of YaRN parameters
cebtenzzre Oct 7, 2023
5d7a3a5
allow forcing ext_factor to zero if scaling type is YaRN
cebtenzzre Oct 7, 2023
9bd050f
Merge branch 'master' of https://github.com/ggerganov/llama.cpp into …
cebtenzzre Oct 7, 2023
babf0e0
fix rope_cuda parameter order
cebtenzzre Oct 8, 2023
0050e1e
default n_yarn_orig_ctx to n_ctx_train
cebtenzzre Oct 8, 2023
09c3102
fix uninitialized cparams
cebtenzzre Oct 8, 2023
57c3442
make printed param formatting more consistent
cebtenzzre Oct 8, 2023
a20b3e6
fix missing import
cebtenzzre Oct 11, 2023
9ef91b1
Merge branch 'master' of https://github.com/ggerganov/llama.cpp into …
cebtenzzre Oct 13, 2023
9ae10b3
Fix YaRN inverted scaling and add "rope.scaling.type" to GGUF (#1)
jquesnelle Oct 20, 2023
14cf93b
fix YaRN ramp, make mscale conditional, add --yarn-orig-ctx (#2)
jquesnelle Oct 20, 2023
237f1e7
Merge branch 'master' of https://github.com/ggerganov/llama.cpp into …
cebtenzzre Oct 22, 2023
bc8395d
Merge branch 'master' of https://github.com/ggerganov/llama.cpp into …
cebtenzzre Oct 23, 2023
4d5ed83
Merge branch 'master' of https://github.com/ggerganov/llama.cpp into …
cebtenzzre Oct 24, 2023
9fc8238
fix loading rope.scaling.original_context_length from GGUF (#3)
jquesnelle Oct 30, 2023
15f26ef
implement YaRN for GPT-NeoX RoPE
cebtenzzre Nov 1, 2023
081f738
Merge branch 'master' of https://github.com/ggerganov/llama.cpp into …
cebtenzzre Nov 1, 2023
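
Several commit titles above refer to the YaRN ramp, the beta_fast/beta_slow cutoffs, mscale, and ext_factor. For orientation only, below is a minimal standalone sketch of the math those names point at, following the YaRN paper's description rather than this PR's code; the function names are made up for illustration, and beta_fast = 32, beta_slow = 1 are assumed defaults.

#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/*
 * Illustrative sketch of YaRN's per-dimension RoPE frequency blend.
 * Names here are invented for this example; this is not llama.cpp's internal API.
 */

// Blended frequency for RoPE dimension pair i (0 <= i < n_dims/2):
// low-frequency dims (few rotations over the original context) get interpolated,
// high-frequency dims (many rotations) keep the original "extrapolated" frequency.
static float yarn_theta(int i, int n_dims, float freq_base, float freq_scale,
                        int n_orig_ctx, float beta_fast, float beta_slow,
                        float ext_factor) {
    float theta_extrap = powf(freq_base, -2.0f * (float)i / (float)n_dims);
    float theta_interp = freq_scale * theta_extrap;  // plain position interpolation

    // number of full rotations this dimension completes over the original training context
    float n_rot = (float)n_orig_ctx * theta_extrap / (2.0f * (float)M_PI);

    // ramp from 0 (fully interpolate) to 1 (fully extrapolate)
    float ramp = (n_rot - beta_slow) / fmaxf(0.001f, beta_fast - beta_slow);
    ramp = fminf(1.0f, fmaxf(0.0f, ramp));

    float mix = ramp * ext_factor;  // ext_factor = 0 disables the blend entirely
    return theta_interp * (1.0f - mix) + theta_extrap * mix;
}

// YaRN's attention magnitude correction ("mscale"), scaled by attn_factor;
// freq_scale < 1 means the context is being stretched by a factor of 1/freq_scale.
static float yarn_mscale(float freq_scale, float attn_factor) {
    return attn_factor * (1.0f + 0.1f * logf(1.0f / freq_scale));
}

Per the "make mscale conditional" commit above, the magnitude correction is applied only when ext_factor is nonzero.
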
Changes from 1 commit: fix uninitialized cparams
cebtenzzre committed Oct 8, 2023
commit 09c31027db2e620d7b97b827fe5b6e3945fd7504
llama.cpp: 20 changes (12 additions, 8 deletions)

@@ -7888,14 +7888,18 @@ struct llama_context * llama_new_context_with_model(
     const auto & hparams = model->hparams;
     auto & cparams = ctx->cparams;

-    cparams.n_batch = params.n_batch;
-    cparams.n_ctx = params.n_ctx == 0 ? hparams.n_ctx_train : params.n_ctx;
-    cparams.rope_freq_base = params.rope_freq_base == 0.0f ? hparams.rope_freq_base_train : params.rope_freq_base;
-    cparams.rope_freq_scale = params.rope_freq_scale == 0.0f ? hparams.rope_freq_scale_train : params.rope_freq_scale;
-    cparams.yarn_ext_factor = params.yarn_ext_factor;
-    cparams.n_threads = params.n_threads;
-    cparams.n_threads_batch = params.n_threads_batch;
-    cparams.mul_mat_q = params.mul_mat_q;
+    cparams.n_batch = params.n_batch;
+    cparams.n_threads = params.n_threads;
+    cparams.n_threads_batch = params.n_threads_batch;
+    cparams.yarn_ext_factor = params.yarn_ext_factor;
+    cparams.yarn_attn_factor = params.yarn_attn_factor;
+    cparams.yarn_beta_fast = params.yarn_beta_fast;
+    cparams.yarn_beta_slow = params.yarn_beta_slow;
+    cparams.mul_mat_q = params.mul_mat_q;
+
+    cparams.n_ctx = params.n_ctx == 0 ? hparams.n_ctx_train : params.n_ctx;
+    cparams.rope_freq_base = params.rope_freq_base == 0.0f ? hparams.rope_freq_base_train : params.rope_freq_base;
+    cparams.rope_freq_scale = params.rope_freq_scale == 0.0f ? hparams.rope_freq_scale_train : params.rope_freq_scale;

     auto rope_scaling_type = params.rope_scaling_type;
     if (rope_scaling_type == LLAMA_ROPE_SCALING_UNSPECIFIED) {
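
The hunk above is the whole fix: before it, only yarn_ext_factor was copied into cparams, leaving yarn_attn_factor, yarn_beta_fast, and yarn_beta_slow uninitialized. For a sense of how the new fields are driven from the caller's side, here is a hypothetical usage sketch; llama_context_default_params and llama_new_context_with_model are existing llama.h calls, while LLAMA_ROPE_SCALING_YARN and the concrete values are illustrative assumptions based on the rest of this PR.

// Hypothetical caller-side sketch: enabling YaRN when creating a context.
// Field names follow the diff above; LLAMA_ROPE_SCALING_YARN and the values
// chosen here are illustrative assumptions, not taken from this commit.
#include "llama.h"

struct llama_context * make_yarn_context(struct llama_model * model) {
    struct llama_context_params params = llama_context_default_params();

    params.n_ctx             = 16384;                   // extended context window
    params.rope_scaling_type = LLAMA_ROPE_SCALING_YARN; // select YaRN scaling
    params.rope_freq_scale   = 0.25f;                   // e.g. 4096 trained -> 16384 target
    params.yarn_ext_factor   = 1.0f;                    // enable the interpolation/extrapolation blend
    params.yarn_attn_factor  = 1.0f;                    // scale on the magnitude correction
    params.yarn_beta_fast    = 32.0f;                   // high-frequency correction threshold
    params.yarn_beta_slow    = 1.0f;                    // low-frequency correction threshold

    // Before this commit, only yarn_ext_factor reached cparams; the other three
    // YaRN fields were read later from uninitialized memory.
    return llama_new_context_with_model(model, params);
}

The diff also shows the fallback behavior: leaving n_ctx at 0 or rope_freq_scale at 0.0f makes llama_new_context_with_model fall back to the training-time values stored in hparams.
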