Finetune LORA #2632

Merged on Sep 28, 2023 (247 commits)

Changes from 1 commit

Commits
5d124d0
fix track_max_mem in forward_batch_wo_cache_flash_attn_train
xaedes Jun 15, 2023
d39c8e6
remove unnecessary Adam(W) optimizer tensors.
xaedes Jun 15, 2023
d395b19
add gradient clipping to AdamW
xaedes Jun 15, 2023
d7003a9
Fix reset of unused g->nodes and g->grads to NULL
xaedes Jun 17, 2023
6e3f95b
implement gradient checkpointing for training
xaedes Jul 28, 2023
e05e441
remove unused compute buffer 3
xaedes Jun 27, 2023
ed4319e
add and use function ggml_build_backward_expand to avoid stack overfl…
xaedes Jul 28, 2023
a80f184
change AdamW decay parameter to work like the torch AdamW decay param…
xaedes Jun 29, 2023
f175ead
change default AdamW weight decay parameter used in training to 0.1 a…
xaedes Jun 29, 2023
97964a4
change default AdamW weight decay parameter defined in ggml to 0.0, m…
xaedes Jun 29, 2023
2c6985f
bug fixes for cross entropy loss
xaedes Jul 2, 2023
2d1e6e0
fix test-grad0 for cross_entropy_loss
xaedes Jul 2, 2023
864e7e3
fix test-grad0 for soft_max
xaedes Jul 2, 2023
87febee
improve finite differences of test-grad0 by using double instead of f…
xaedes Jul 2, 2023
51dc770
change cross_entropy_loss to output average over all rows
xaedes Jul 2, 2023
3744a9b
improve gradient checkpointing
xaedes Jul 2, 2023
fc379a2
disable gradient checkpointing debug output
xaedes Jul 2, 2023
d0fbb7d
llama : fix rope usage in train-text-from-scratch after ChatGLM change
xaedes Jul 28, 2023
c6a18e1
add more training parameters:
xaedes Jul 2, 2023
ce937bc
replace memcpy with reshape operation so that the graph is not cut at…
xaedes Jul 2, 2023
ff759d9
remove unused function argument from get_example_targets_batch
xaedes Jul 2, 2023
e843d6e
measure and print total training time
xaedes Jul 2, 2023
bfc3119
add optimization callback to ggml_opt_resume_g
xaedes Jul 2, 2023
d7aa4d9
use optimization callback in training
xaedes Jul 2, 2023
e6ff072
add minimum number of tensor dimensions to apply weight decay (defaul…
xaedes Jul 2, 2023
58024d3
rename training parameter cos-decay-alpha to cos-decay-min and clarif…
xaedes Jul 3, 2023
17a0898
fix increase of model.train_samples and model.train_tokens
xaedes Jul 3, 2023
24a4b09
change sampling parameters for prediction after training to defaults …
xaedes Jul 3, 2023
1065c3b
tighten abs error bounds for cross_entropy_loss in test-grad0
xaedes Jul 3, 2023
dbbc263
add conditional compilation of using F16 exp in flash attention
xaedes Jul 3, 2023
47055c9
tighten abs error bounds for flash_attn in test-grad0
xaedes Jul 3, 2023
0f6a8ab
tighten abs error bounds for sqrt in test-grad0
xaedes Jul 3, 2023
87035b9
remove out-commented vectorized code of opt_adam
xaedes Jul 3, 2023
ecdc161
ggml : update ggml_rms_norm_back with configurable eps
xaedes Jul 28, 2023
c1a5e11
llama training : fix ggml_rms_norm_back calls to pass configurable eps
xaedes Jul 28, 2023
22cb368
remove trailing whitespace
xaedes Jul 28, 2023
d43af4b
Merge branch 'master' into pr-train-mem-usage-improvements
xaedes Aug 6, 2023
2bf422e
add train function using automatic gradient checkpointing backward pa…
xaedes Aug 6, 2023
fc826c8
in train function replace add_inplace by regular add
xaedes Aug 14, 2023
d437415
don't use allocate hash_map on context
xaedes Aug 14, 2023
cfddc36
correctly clone reshape and permute operations by also cloning tensor…
xaedes Aug 14, 2023
0dd496c
fix variable name and add missing type cast
xaedes Aug 14, 2023
52c92c0
terminate recursive tensor cloning when reaching tensor without src t…
xaedes Aug 14, 2023
345f516
correctly clone view tensors by setting data pointers
xaedes Aug 14, 2023
5a11b75
fix variable names
xaedes Aug 14, 2023
b2f1310
swap arguments to commutative ops to be the same as in `forward_batch…
xaedes Aug 14, 2023
5884b43
add input tensors as checkpoints
xaedes Aug 14, 2023
9716eb8
fix variable name and add missing boolean negation
xaedes Aug 14, 2023
38f4438
make sure some tensors are not reallocated by inserting new temporary…
xaedes Aug 14, 2023
d6c5b03
fix ASSERT to work with zero layers
xaedes Aug 14, 2023
4ed096c
add training options whether to use allocator and/or unified training…
xaedes Aug 14, 2023
865c4cd
integrate unified training function which may use memory allocator
xaedes Aug 14, 2023
3e99a8d
format name of cloned tensors with " (clone)" suffix
xaedes Aug 14, 2023
75baed2
set names for tensors in unified train function for easier debugging
xaedes Aug 14, 2023
fe788a1
allocate graph on context using ggml_new_graph
xaedes Aug 14, 2023
c954f41
remove handwritten training functions
xaedes Aug 14, 2023
271e4d6
remove unused training parameters "use_scratch" and "use_unified"
xaedes Aug 14, 2023
6f161c7
remove trailing whitespace
xaedes Aug 14, 2023
3794dce
remove unused train params: mem_compute1_gb & mem_compute2_gb
xaedes Aug 14, 2023
6e280b2
remove unused forward_batch function
xaedes Aug 14, 2023
faf3e21
add debug asserts in ggml_allocr_alloc to some common pitfalls when u…
xaedes Aug 14, 2023
098654c
only use ggml_allocr_alloc when tensor has NULL data and is no view
xaedes Aug 14, 2023
3e6468b
fix test when to create temporary backward graph
xaedes Aug 14, 2023
5622846
fix memory "leak" in optimizers
xaedes Aug 14, 2023
3b5515b
reverse order of for loop in ggml_build_backward_expand to save memor…
xaedes Aug 14, 2023
316b070
add API functions to access llama model tensors
xaedes Aug 6, 2023
5e059ac
add stub example for finetuning, based on train-text-from-scratch
xaedes Aug 15, 2023
9eb1ef8
move and remove code
xaedes Aug 15, 2023
c0a372f
add API functions to access remaining model parameters:
xaedes Aug 16, 2023
28ee0c8
first draft for LORA finetune training
xaedes Aug 16, 2023
50b1e66
remove const model and layer arguments in API functions for accessing…
xaedes Aug 16, 2023
be7e564
bug fixes to make finetune compile
xaedes Aug 16, 2023
6202753
add debug prints for training memory improvements
xaedes Aug 16, 2023
0ab2507
fix names of lora tensors
xaedes Aug 16, 2023
39a2d15
avoid stack overflow resulting from big ggml_cgraph
xaedes Aug 16, 2023
1151653
replace llama API functions to get model tensors by one function to g…
xaedes Aug 16, 2023
79ad888
remove unused call to not existing llama_get_layer_from_model
xaedes Aug 16, 2023
83cb9ed
implement ggml_compute_forward_out_prod_q_f32
xaedes Aug 16, 2023
83a4ad7
remove trailing whitespace
xaedes Aug 16, 2023
f80e245
add lora finetune support on quantized base model tensors
xaedes Aug 16, 2023
9198b24
add ggml_add_cast API function
xaedes Aug 16, 2023
714fec0
use ggml_add_cast in finetuning
xaedes Aug 16, 2023
0bb897c
bug fix: actually use result type passed to ggml_add_cast
xaedes Aug 17, 2023
44526cb
make sure base model tensors data cannot be used in viewable operations
xaedes Aug 18, 2023
a252111
fix bug in ggml_out_prod which resulted in wrong n_dims of result ten…
xaedes Aug 18, 2023
f358204
avoid keeping in memory ALL of the gradients
xaedes Aug 18, 2023
011f47f
remove trailing whitespace
xaedes Aug 18, 2023
a0c2752
remove debug prints and function to compute tensor data hash
xaedes Aug 18, 2023
113c90f
improve optimization iteration prints
xaedes Aug 18, 2023
7a63d42
adjust maximal values to support finetuning 3B models
xaedes Aug 18, 2023
63cb374
change default finetune params lora_r and lora_alpha to match the n_r…
xaedes Aug 18, 2023
6c98640
bug fix: make sure finetune input gradient is allocated at begin and …
xaedes Aug 18, 2023
65b0561
remove unnecessary src tensor from ggml_get_rows_back
xaedes Aug 18, 2023
3e47890
remove unnecessary src tensor from ggml_repeat & ggml_repeat_back
xaedes Aug 18, 2023
37dfb54
resolve todo
xaedes Aug 18, 2023
d61ed6b
mixing multiple LORA adapters is now possible
xaedes Aug 20, 2023
27c24ff
add option to save finetune output every N iterations
xaedes Aug 20, 2023
8b4106a
also save latest finetune output with ITERATION="LATEST" and print wh…
xaedes Aug 21, 2023
77a3092
update checkpoint train stats before saving via "--save-every"
xaedes Aug 23, 2023
1a5f0a3
add command line option `--rank-wo N` for rank of wo tensor
xaedes Aug 23, 2023
7df517c
update finetune README
xaedes Aug 23, 2023
b04263c
Merge branch 'master' into finetune-lora
xaedes Aug 28, 2023
aecc3b3
fix dump_non_result_info_yaml to output multiple lora adapters
xaedes Aug 28, 2023
aa8016e
bug fix: replace GGML_TYPE_SIZE[t] by ggml_type_size(t)
xaedes Aug 28, 2023
daedc6f
replace llama_n_mult by llama_n_ff
xaedes Aug 28, 2023
5ce92ae
finetune bug fixes to compile with merged in code from master
xaedes Aug 28, 2023
271c030
remove prediction related code to reduce duplicated code with main
xaedes Aug 28, 2023
9a28bce
reduce large memory overhead in train-text-from-scratch
xaedes Aug 28, 2023
49af7fb
add comment explaining why finetune checkpoints are allocated in one …
xaedes Aug 28, 2023
007280c
make default value of float member a float literal
xaedes Aug 28, 2023
1faee64
handle rms_norm and rope parameters the same as in train-text-from-sc…
xaedes Aug 28, 2023
a3b4529
remove unused code
xaedes Aug 28, 2023
ca97583
remove vocab related code as it is unnecessary
xaedes Aug 28, 2023
e030f7b
add LLM_KV_TRAINING_TYPE to train-text-from-scratch checkpoints
xaedes Aug 28, 2023
ecb1b20
add gguf constants and load/save functions from train-text-from-scratch
xaedes Aug 28, 2023
0564f4e
add load & save lora finetune checkpoints via gguf
xaedes Aug 29, 2023
6134ad4
add python script to convert old finetune checkpoint files to gguf
xaedes Aug 29, 2023
1425968
remove old checkpoint save & load code
xaedes Aug 29, 2023
ebff3a1
remove code to print data checksums which was used to verify correctn…
xaedes Aug 29, 2023
5813ac8
omit tokenization when training is disabled, only save llama lora ada…
xaedes Aug 29, 2023
a6165da
remove trailing whitespace
xaedes Aug 29, 2023
e28cf7e
update README.md
xaedes Aug 29, 2023
794bb7e
implement ggml_compute_forward_repeat_f16
xaedes Aug 29, 2023
5f0a4e9
avoid stack overflow of large cgraphs in test-grad0
xaedes Aug 29, 2023
82c5247
add ggml API functions ggml_unravel_index, ggml_get_i32_nd and its an…
xaedes Aug 29, 2023
5fcfa7e
increase test-grad0 context mem size to accommodate for bigger cgraph
xaedes Aug 29, 2023
b1aa26f
add sanity check to ggml_compute_backward, asserting the correct shap…
xaedes Aug 29, 2023
a76e66a
fix ggml_acc_or_set to return tensor of correct shape
xaedes Aug 29, 2023
dd4e4bc
remove unused 'inplace' argument from ggml_compute_backward function
xaedes Aug 29, 2023
8a96d4c
add missing argument 'int i0' to ggml_get_i32_nd & ggml_set_i32_nd he…
xaedes Aug 29, 2023
281245a
Merge branch 'master' into finetune-lora
xaedes Aug 29, 2023
5854f51
fix error message in ggml_allocr_alloc to display actual max_avail
xaedes Aug 29, 2023
bf70e27
fix check_gradient
xaedes Aug 29, 2023
b1709f2
Merge branch 'master' into finetune-lora
xaedes Aug 30, 2023
2392b67
use tensor->view_src instead of ggml_is_view and get_view_source
xaedes Aug 30, 2023
d487e05
move gradient checkpointing code into ggml, new API function:
xaedes Aug 30, 2023
e6b7158
replace custom data getters and setters by ggml functions
xaedes Aug 30, 2023
fc456ed
train-text-from-scratch can train (full finetune) gguf models
xaedes Aug 30, 2023
f3590ad
remove trailing whitespace
xaedes Aug 30, 2023
b26bd4c
add option to save train-text-from-scratch output every N iterations
xaedes Aug 30, 2023
4e986ac
update README.md
xaedes Aug 30, 2023
0c57f9f
fix warnings
xaedes Aug 30, 2023
4fd51c4
fix warnings
xaedes Aug 30, 2023
e0da168
remove finetune option to disable allocator
xaedes Aug 31, 2023
4914f85
add tensor checkpoints only when gradient checkpointing is enabled
xaedes Aug 31, 2023
d554a70
initialize opt ggml context if none was provided
xaedes Sep 1, 2023
7e01d11
add ggml-alloc API function 'ggml_allocr_max_size' to get max size of…
xaedes Sep 1, 2023
5bba329
finetune: automatically allocate all memory and changes to command li…
xaedes Sep 1, 2023
6cbf55a
add finetune to Makefile
xaedes Sep 1, 2023
7acb124
update README.md
xaedes Sep 1, 2023
6809eb7
Merge branch 'master' into finetune-lora
xaedes Sep 1, 2023
c32ad44
print time per iteration and estimate remaining time
xaedes Sep 1, 2023
6ee12b1
increase measured alloc size by tensor_alignment
xaedes Sep 2, 2023
cfe217f
fix README.md
xaedes Sep 2, 2023
ded6382
add some more allocator debug prints
xaedes Sep 2, 2023
8d982c8
bug fix, probably solves the 'ggml_allocr_alloc: not enough space in …
xaedes Sep 2, 2023
1ce7023
revert last commit
xaedes Sep 2, 2023
2d2bdc0
remove unnecessary "0x" before "%p" output
xaedes Sep 2, 2023
80ac697
move measurement memory segment to upper region of the address space
xaedes Sep 2, 2023
406e075
update README.md
xaedes Sep 3, 2023
e07f5c5
fix printf format warnings
xaedes Sep 3, 2023
bdb7092
add missing gguf_free in load_checkpoint_lora_file
xaedes Sep 3, 2023
50589ed
load default rms_norm and rope parameters from base model
xaedes Sep 3, 2023
9ea2f7f
Merge branch 'master' into finetune-lora
xaedes Sep 4, 2023
d3afd71
Merge branch 'master' into finetune-lora
xaedes Sep 4, 2023
c1c3b0e
add gradient accumulation
xaedes Sep 4, 2023
d07b6aa
fix tracking of train_samples and train_tokens
xaedes Sep 5, 2023
786e786
build : fix compile warnings
ggerganov Sep 5, 2023
d375b8f
ggml : fix L-BFGS linesearch loop
ggerganov Sep 5, 2023
867e7c2
Merge branch 'master' into finetune-lora
xaedes Sep 5, 2023
8c2d7e3
improve finetune time measurement
xaedes Sep 6, 2023
c08fcf5
specify default lora rank with '--lora-r N'
xaedes Sep 6, 2023
0393116
Merge branch 'master' into finetune-lora
xaedes Sep 6, 2023
de6170d
fix gradient accumulation bug where the same batch was used for each …
xaedes Sep 6, 2023
0c2c9c7
fix gradient accumulation bug where the same batch was used for each …
xaedes Sep 6, 2023
d7aade7
support grouped-query-attention in ggml_flash_attn and ggml_flash_att…
xaedes Sep 9, 2023
833a56c
add llama API functions to get grouped-query-attention n_head paramet…
xaedes Sep 9, 2023
35260f7
fix finetune to support grouped-query-attention (using flash-attention)
xaedes Sep 9, 2023
aea8b6b
support broadcastable a in out_prod(a, b) and backward pass of broadc…
xaedes Sep 9, 2023
dd32786
test broadcasting mul_mat backward pass
xaedes Sep 9, 2023
9738526
decouple random number generator of each operation test
xaedes Sep 9, 2023
d3aaf08
add comment briefly describing what ggml_repeat_back does
xaedes Sep 9, 2023
d3f1b43
simplify broadcasting mul_mat backward using ggml_repeat_back
xaedes Sep 9, 2023
917d287
add cgraph evaluation order member and corresponding enum type
xaedes Sep 9, 2023
ace9088
measure max compute size for each cgraph eval order and use best order
xaedes Sep 9, 2023
54b21a3
Merge branch 'master' into finetune-lora
xaedes Sep 9, 2023
1cef459
remove unused command line options
xaedes Sep 9, 2023
0e32932
add sample start patterns and options to force new or by default resu…
xaedes Sep 13, 2023
7898652
update shuffle rng state on reshuffle
xaedes Sep 13, 2023
ec57689
exclude known zero values from computations in flash_attn_f32 & flash…
xaedes Sep 13, 2023
7f378a7
remove probably unnecessary exception type flags from stringstream
xaedes Sep 13, 2023
f627e2f
pass correct max number of tokens to llama_tokenize
xaedes Sep 14, 2023
2c59f7b
account for possible leading whitespace that will be added by tokenizer
xaedes Sep 14, 2023
20cf1a4
use unrolled vec_mad in out_prod
xaedes Sep 14, 2023
3a9c1d7
set lora_alpha to value of lora_r if it is not set via command line
xaedes Sep 14, 2023
0971fee
reshuffle original sample order instead of the previous shuffled order
xaedes Sep 14, 2023
d88dae2
block tiling for out-prod inspired by mul-mat
xaedes Sep 14, 2023
76804fa
exclude some more known zero values from computations in flash_attn_f…
xaedes Sep 14, 2023
4f2ce91
add static keywords
xaedes Sep 15, 2023
cc60b3f
remove outcommented old code
xaedes Sep 15, 2023
ab56b63
update train-text-from-scratch with tokenization, sample selection an…
xaedes Sep 15, 2023
00b656f
remove lbfgs related train parameters
xaedes Sep 16, 2023
9f4b1bf
move common train functions into common/train.[h|cpp]
xaedes Sep 16, 2023
a8c8907
move train state into struct train_state
xaedes Sep 16, 2023
ee27333
move train data saving code into callback to unify code of opt_callback
xaedes Sep 16, 2023
e9758ae
move common train params into common/train
xaedes Sep 16, 2023
bef1e97
move common opt_callback into common/train
xaedes Sep 16, 2023
7aa9ea7
fix consume_common_train_arg
xaedes Sep 16, 2023
48d3509
save and load head_count_kv in lora checkpoints
xaedes Sep 16, 2023
571dc94
increase train_samples by used_samples instead of number of batches
xaedes Sep 16, 2023
d3e06d3
Merge branch 'master' into finetune-lora
xaedes Sep 16, 2023
7930caf
fix usage of llama_tokenize
xaedes Sep 16, 2023
8d82d4c
remove static from process_escape since we need it exposed in header
xaedes Sep 16, 2023
9139fec
fix code formating of long function declarations
xaedes Sep 16, 2023
1d33ec5
fix condition in load_train_state_gguf
xaedes Sep 16, 2023
1d09965
use die("msg") instead of replace GGML_ASSERT(!"msg") or throw std::r…
xaedes Sep 16, 2023
9db2664
fix saving and loading of training type
xaedes Sep 16, 2023
dd3e763
remove terminating '\0' from tokenization
xaedes Sep 16, 2023
83061fb
fix compile warnings
xaedes Sep 16, 2023
8721785
fix compile warnings
xaedes Sep 16, 2023
ddf5ac2
use new/delete for train_state instead of malloc/free
xaedes Sep 17, 2023
151bfe9
assert that sample_count > 0, avoiding division by zero
xaedes Sep 17, 2023
bf2ad65
fix frand to return value in interval [0,1)
xaedes Sep 17, 2023
d1bb6fb
add train option "--sample-random-offsets"
xaedes Sep 17, 2023
56a03fa
deduplicate code into function
xaedes Sep 17, 2023
1dbd6bc
remove n_rot hparam, as it must always be hparam.n_embd_head()
xaedes Sep 17, 2023
5ed3098
align code
xaedes Sep 17, 2023
b0ee563
assert correct base model tensor shapes
xaedes Sep 17, 2023
934ad8d
move some params from lora hparams into model hparams and load model …
xaedes Sep 17, 2023
dd94ce4
remove now unnecessary llama API functions to get model params that w…
xaedes Sep 17, 2023
9e10fa9
train-text-from-scratch: automatically allocate model tensors, remove…
xaedes Sep 17, 2023
db38d2b
train-text-from-scratch: automatically allocate opt context
xaedes Sep 17, 2023
f9b5d9b
train-text-from-scratch: automatically allocate input tensors
xaedes Sep 17, 2023
c993246
train-text-from-scratch: automatically allocate compute memory
xaedes Sep 17, 2023
3b9d974
remove unused options and equalize train-text-from-scratch with finetune
xaedes Sep 17, 2023
5ce74ee
initialize opt->loss_after with zero
xaedes Sep 17, 2023
0ede0f4
add export-lora program
xaedes Sep 22, 2023
b91e3dd
remove trailing whitespace
xaedes Sep 22, 2023
d38260b
add export-lora build in Makefile
xaedes Sep 22, 2023
904c19b
remove unused struct tensor_info from export-lora
xaedes Sep 22, 2023
758c46c
add export-lora build dependency to llama
xaedes Sep 22, 2023
9145c87
update finetune README.md
xaedes Sep 22, 2023
da05205
cancel optimization when specified number of epochs is completed
xaedes Sep 22, 2023
2912f17
improve handling of export-lora arguments
xaedes Sep 24, 2023
ad64e33
Fix export-lora.cpp "not enough space in the context's memory pool" (#1)
meatbag-18a Sep 24, 2023
1660658
improve handling of not yet supported tensor types
xaedes Sep 24, 2023
5461129
Merge branch 'master' into HEAD
ggerganov Sep 28, 2023
change sampling parameters for prediction after training to defaults of common.h

and clarify what is context for prediction and what are generated tokens

xaedes committed Jul 28, 2023
commit 24a4b099f37ae2deef2296a0dae4b6fc5f27b266
50 changes: 30 additions & 20 deletions examples/train-text-from-scratch/train-text-from-scratch.cpp
@@ -2799,19 +2799,19 @@ void shuffle_ints(int * begin, int * end) {
 }

 struct my_llama_sampler_params {
-    float temp            = 0.0f;  // <= 0.0 disabled
-    int   top_k           = 20;    // <= 0 to use vocab size
-    float top_p           = 0.95f; // 1.0 = disabled
-    float tfs_z           = 1.00f; // 1.0 = disabled
-    float typical_p       = 1.00f; // 1.0 = disabled
-    int   repeat_last_n   = 64;    // last n tokens to penalize (0 = disable penalty, -1 = context size)
-    float repeat_penalty  = 1.0f;  // 1.0 = disabled
-    float alpha_presence  = 0.0f;  // 0.0 = disabled
-    float alpha_frequency = 0.0f;  // 0.0 = disabled
-    int   mirostat        = 0;     // 0 = disabled, 1 = mirostat, 2 = mirostat 2.0
-    float mirostat_tau    = 5.00f; // target entropy
-    float mirostat_eta    = 0.10f; // learning rate
-    bool  penalize_nl     = true;  // consider newlines as a repeatable token
+    float temp              = 0.0f;  // <= 0.0 disabled
+    int   top_k             = 20;    // <= 0 to use vocab size
+    float top_p             = 0.95f; // 1.0 = disabled
+    float tfs_z             = 1.00f; // 1.0 = disabled
+    float typical_p         = 1.00f; // 1.0 = disabled
+    int   repeat_last_n     = 64;    // last n tokens to penalize (0 = disable penalty, -1 = context size)
+    float repeat_penalty    = 1.0f;  // 1.0 = disabled
+    float presence_penalty  = 0.0f;  // 0.0 = disabled
+    float frequency_penalty = 0.0f;  // 0.0 = disabled
+    int   mirostat          = 0;     // 0 = disabled, 1 = mirostat, 2 = mirostat 2.0
+    float mirostat_tau      = 5.00f; // target entropy
+    float mirostat_eta      = 0.10f; // learning rate
+    bool  penalize_nl       = true;  // consider newlines as a repeatable token
 };

 struct my_llama_sampler {
@@ -2871,8 +2871,8 @@ llama_token sample(struct my_llama_sampler * sampler, float * logits, const llam
         candidates_p,
         last_tokens + n_last_tokens - n_last,
         n_last,
-        params.alpha_frequency,
-        params.alpha_presence);
+        params.frequency_penalty,
+        params.presence_penalty);

     if (!params.penalize_nl) {
         logits[llama_token_nl()] = nl_logit;
@@ -4203,12 +4203,22 @@ int main(int argc, char ** argv) {
     int n_gen = params.n_predict;
     int sample_ctx = n_tokens - n_tokens/8;

-    sampler.params.temp = 0.2f;
-    sampler.params.repeat_penalty = 1.1f;
-    sampler.params.mirostat = 2;
+    // use defaults from common.h
+    sampler.params.top_k = 40;
+    sampler.params.top_p = 0.95f;
+    sampler.params.tfs_z = 1.00f;
+    sampler.params.typical_p = 1.00f;
+    sampler.params.temp = 0.8f;
+    sampler.params.repeat_penalty = 1.1f;
+    sampler.params.repeat_last_n = 64;
+    sampler.params.frequency_penalty = 0.0f;
+    sampler.params.presence_penalty = 0.0f;
+    sampler.params.mirostat = 0;
+    sampler.params.mirostat_tau = 5.00f;
+    sampler.params.mirostat_eta = 0.10f;
     init_sampler(&sampler, lctx);

-    printf("Generating %d tokens.\n", n_gen);
+    printf("[Prediction context]\n");

     struct ggml_tensor * tokens_input = ggml_new_tensor_1d(model.ctx, GGML_TYPE_I32, n_tokens);
     struct ggml_tensor * target_logits = ggml_new_tensor_2d(model.ctx, GGML_TYPE_F32, n_vocab, n_tokens);
@@ -4223,7 +4233,7 @@ int main(int argc, char ** argv) {
         print_token(lctx, ggml_get_i32_1d(tokens_input, i));
     }

-    printf("---\n");
+    printf("\n[Generating %d tokens]\n", n_gen);
     for (int i=0; i<n_gen; ++i) {
         struct ggml_init_params cparams = {
             compute_size, // .mem_size
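For quick reference, the values hard-coded above are the sampling defaults the commit message refers to. A minimal sketch summarizing them follows; the values are taken from the diff above, while the single-struct grouping is only illustrative (common.h of that era keeps these fields in its general parameter struct rather than a dedicated type):

```cpp
// Illustrative summary of the sampling defaults mirrored from common.h,
// with the values exactly as set in the diff above. Not the actual
// common.h declaration -- just a compact reference.
struct sampling_defaults {
    int   top_k             = 40;    // <= 0 to use vocab size
    float top_p             = 0.95f; // 1.0 = disabled
    float tfs_z             = 1.00f; // 1.0 = disabled
    float typical_p         = 1.00f; // 1.0 = disabled
    float temp              = 0.80f; // <= 0.0 disabled
    float repeat_penalty    = 1.10f; // 1.0 = disabled
    int   repeat_last_n     = 64;    // last n tokens to penalize (0 = disable, -1 = context size)
    float frequency_penalty = 0.00f; // 0.0 = disabled
    float presence_penalty  = 0.00f; // 0.0 = disabled
    int   mirostat          = 0;     // 0 = disabled, 1 = mirostat, 2 = mirostat 2.0
    float mirostat_tau      = 5.00f; // target entropy
    float mirostat_eta      = 0.10f; // learning rate
};
```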