llama.cpp : add documentation about rope_freq_base and scale values (ggerganov#3401)

* llama.cpp : add documentation about rope_freq_base and scale values

* add notice to hot topics
slaren committed Sep 29, 2023
1 parent bc34dd4 commit 40e07a6
Showing 2 changed files with 6 additions and 5 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -11,6 +11,7 @@ Inference of [LLaMA](https://arxiv.org/abs/2302.13971) model in pure C/C++

### Hot topics

+- ‼️ Breaking change: `rope_freq_base` and `rope_freq_scale` must be set to zero to use the model default values: [#3401](https://github.com/ggerganov/llama.cpp/pull/3401)
- Parallel decoding + continuous batching support added: [#3228](https://github.com/ggerganov/llama.cpp/pull/3228) \
**Devs should become familiar with the new API**
- Local Falcon 180B inference on Mac Studio
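A minimal sketch of what the breaking change means for callers, assuming the C API as of this commit (`llama_model_default_params`, `llama_context_default_params`, `llama_new_context_with_model`); the model path is a placeholder:

```cpp
#include "llama.h"

int main() {
    llama_backend_init(false); // numa = false

    // Load the model ("model.gguf" is a placeholder path).
    llama_model_params mparams = llama_model_default_params();
    llama_model * model = llama_load_model_from_file("model.gguf", mparams);

    llama_context_params cparams = llama_context_default_params();
    // After #3401, zero means "use the value stored in the model metadata".
    // Previously, hard-coded defaults (e.g. 10000.0f / 1.0f) played that role.
    cparams.rope_freq_base  = 0.0f;
    cparams.rope_freq_scale = 0.0f;

    llama_context * ctx = llama_new_context_with_model(model, cparams);

    // ... run inference ...

    llama_free(ctx);
    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```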
10 changes: 5 additions & 5 deletions llama.h
@@ -167,18 +167,18 @@ extern "C" {

struct llama_context_params {
uint32_t seed; // RNG seed, -1 for random
-uint32_t n_ctx; // text context
-uint32_t n_batch; // prompt processing batch size
+uint32_t n_ctx; // text context, 0 = from model
+uint32_t n_batch; // prompt processing maximum batch size
uint32_t n_threads; // number of threads to use for generation
uint32_t n_threads_batch; // number of threads to use for batch processing

// ref: https://github.com/ggerganov/llama.cpp/pull/2054
-float rope_freq_base; // RoPE base frequency
-float rope_freq_scale; // RoPE frequency scaling factor
+float rope_freq_base; // RoPE base frequency, 0 = from model
+float rope_freq_scale; // RoPE frequency scaling factor, 0 = from model

// Keep the booleans together to avoid misalignment during copy-by-value.
bool mul_mat_q; // if true, use experimental mul_mat_q kernels
-bool f16_kv; // use fp16 for KV cache
+bool f16_kv; // use fp16 for KV cache, fp32 otherwise
bool logits_all; // the llama_eval() call computes all logits, not just the last one
bool embedding; // embedding mode only
};
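For illustration only (this is not the actual llama.cpp loader code), the "0 = from model" convention documented in these fields is typically resolved like this: a zero request falls back to the value read from the model file, anything else overrides it.

```cpp
// Hypothetical helper, not part of llama.h: resolve a "0 = from model" field.
static float resolve_or_default(float requested, float model_value) {
    return requested == 0.0f ? model_value : requested;
}

// e.g. rope_freq_base = resolve_or_default(cparams.rope_freq_base, base_from_gguf);
```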
