Add support for DeepseekV2ForCausalLM #7519

Merged
merged 30 commits into from
May 28, 2024
c8c353f
Added initial support for DeepseekV2ForCausalLM.
sszymczy May 16, 2024
b24c9ed
Merge branch 'ggerganov:master' into deepseek-v2
fairydreaming May 17, 2024
0398964
Removed unnecessary tensor operations.
sszymczy May 18, 2024
b50c07c
Added five new DeepSeek-V2-specific parameters:
sszymczy May 18, 2024
79f8417
Added initial support for DeepSeek-V2-Lite model.
sszymczy May 18, 2024
6050941
Corrected mscale calculation.
sszymczy May 18, 2024
7e4786b
Added expert_weights_scale parameter for scaling MoE gate weights.
sszymczy May 19, 2024
71a7422
Temporarily hard-coded mscale value for DeepSeek-V2 (FIXME!).
sszymczy May 19, 2024
f99df46
Replaced hardcoded mscale value with rescaling attn_factor that resul…
sszymczy May 19, 2024
3ae7235
Whitespace formatting fixes.
sszymczy May 19, 2024
68a5103
Referenced the relevant GitHub discussion instead of providing long c…
sszymczy May 20, 2024
7be56da
Added YaRN log multiplier model header parameter corresponding to the…
sszymczy May 20, 2024
842ff3f
Added 16B and 236B model types for DeepSeek-V2.
sszymczy May 21, 2024
c033958
Removed usage of output bias tensor since it's not present in DeepSee…
sszymczy May 21, 2024
a54685b
Merge remote-tracking branch 'upstream/master' into deepseek-v2
sszymczy May 24, 2024
bb9c361
gguf-py : re-add SCALING_YARN_LOG_MUL removed during merge by accident
sszymczy May 24, 2024
f3b5e7d
llama : correct llm_build_moe_ffn() arguments in build_arctic()
sszymczy May 26, 2024
abef8b2
llama : code style corrections
sszymczy May 27, 2024
a654cd9
llama : rename n_expert_ff to n_ff_exp
sszymczy May 27, 2024
5a3e6b6
llama : rename qk_rope_head_dim, qk_nope_head_dim variables to n_embd…
sszymczy May 27, 2024
20769c0
llama : remove trailing whitespaces
sszymczy May 27, 2024
fac1e80
llama : rename moe_intermediate_size variable to n_ff_exp
sszymczy May 27, 2024
56f7011
llama : rename n_leading_dense_layer to n_layer_dense_lead
sszymczy May 27, 2024
82cec8b
llama : use attn_factor in mscale calculation to match the rope_yarn(…
sszymczy May 27, 2024
5cc7ec1
llama : rename query_states, key_states, value_states to q_states, k_…
sszymczy May 27, 2024
d02130d
llama : print DeekSeek-V2-specific parameters in llm_load_print_meta()
sszymczy May 27, 2024
bde971a
convert-hf : fix flake8 Lint errors
sszymczy May 27, 2024
98ff6e1
Merge remote-tracking branch 'upstream/master' into deepseek-v2
sszymczy May 28, 2024
841cd47
llama : replace ggml_new_tensor_3d + ggml_set_inplace + ggml_set_inpl…
sszymczy May 28, 2024
3efb659
gguf-py, llama : whitespace formatting fixes
sszymczy May 28, 2024
Whitespace formatting fixes.
sszymczy committed May 19, 2024
commit 3ae7235e9419085ec47dab72d38f8dcae9dd7e27
2 changes: 1 addition & 1 deletion llama.cpp
@@ -1884,7 +1884,7 @@ struct llama_hparams {
 if (!is_float_close(this->f_norm_rms_eps, other.f_norm_rms_eps, EPSILON)) return true;
 if (!is_float_close(this->rope_freq_base_train, other.rope_freq_base_train, EPSILON)) return true;
 if (!is_float_close(this->rope_freq_scale_train, other.rope_freq_scale_train, EPSILON)) return true;
-if (!is_float_close(this->expert_weights_scale, other.expert_weights_scale,EPSILON)) return true;
+if (!is_float_close(this->expert_weights_scale, other.expert_weights_scale, EPSILON)) return true;
 
 return false;
 }
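
For readers unfamiliar with the pattern in this hunk: the inequality operator of `llama_hparams` compares float fields through a tolerance helper rather than with `!=` directly. The following is a minimal, self-contained sketch of that idea, not the llama.cpp source. The names `is_float_close_sketch` and `hparams_sketch`, the reduced field list, the default values, and the `1e-9f` tolerance are all assumptions made for illustration.

```cpp
#include <cmath>

// Hypothetical stand-in for the helper called in the hunk (assumed semantics:
// absolute-tolerance comparison). Not copied from llama.cpp.
static bool is_float_close_sketch(float a, float b, float abs_tol) {
    if (a == b) return true;                           // exact match, including equal infinities
    if (std::isinf(a) || std::isinf(b)) return false;  // one side infinite, the other not equal to it
    return std::fabs(b - a) <= abs_tol;                // NaN operands compare as "not close"
}

// Hypothetical, reduced parameter struct holding two of the fields seen in the hunk.
struct hparams_sketch {
    float f_norm_rms_eps       = 1e-6f;
    float expert_weights_scale = 1.0f;

    // Returns true as soon as any float field differs by more than the tolerance.
    bool operator!=(const hparams_sketch & other) const {
        const float EPSILON = 1e-9f; // assumed tolerance value
        if (!is_float_close_sketch(this->f_norm_rms_eps, other.f_norm_rms_eps, EPSILON)) return true;
        if (!is_float_close_sketch(this->expert_weights_scale, other.expert_weights_scale, EPSILON)) return true;

        return false;
    }
};
```

With a tolerance this small, the check behaves almost like exact equality for values near 1.0; the benefit of the helper is mainly that infinities and NaN are handled in one place. The commit itself only adds the missing space before `EPSILON` and does not change behavior.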