Tags: a-h/llama.cpp
samplers : Min-P sampler implementation [alternative to Top P/Top K] (ggerganov#3841)
* Introduce the new Min-P sampler by @kalomaze
  The Min-P sampling method was designed as an alternative to Top-P, and aims to ensure a balance of quality and variety. The parameter *p* represents the minimum probability for a token to be considered, relative to the probability of the most likely token.
* Min-P enabled and set to 0.05 default
Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: cebtenzzre <[email protected]>
ggml : move FP16 <-> FP32 code to ggml-impl.h (ggerganov#3861)
* ggml : move FP16 <-> FP32 stuff to ggml-impl.h
  ggml-ci
* tests : fix ARM build
* ggml : explicitly initialize deprecated type traits
* ggml : add math.h to ggml-impl.h
* ggml : remove duplicate static assert macros
* ggml : prefix lookup tables with ggml_
  ggml-ci
* ggml-impl : move extern "C" to start of file
Extend llama_kv_cache_seq_rm to allow matching any sequence (ggerganov#3843)
* Extend llama_kv_cache_seq_rm to allow matching any sequence
* Replace llama_kv_cache_tokens_rm with llama_kv_cache_clear
  Use llama_kv_cache_clear for cache clearing
  Change calls to llama_kv_cache_tokens_rm that want to delete by position to use llama_kv_cache_seq_rm functionality
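The idea of one removal routine that can target either a single sequence or any sequence, plus a separate full clear, can be illustrated with a toy cache. Everything below is hypothetical (`ToyCell`, `ToyKvCache`, and their members are not llama.cpp types), and the conventions that a negative sequence id means "any sequence" and a negative bound means "open-ended" are assumptions made for this sketch, not something the commit message states.

```cpp
#include <cstddef>
#include <limits>
#include <set>
#include <vector>

// Toy cache cell: a position and the set of sequences that reference it.
// pos == -1 marks an unused cell in this sketch.
struct ToyCell {
    int           pos;
    std::set<int> seq_ids;
};

struct ToyKvCache {
    std::vector<ToyCell> cells;

    // Remove entries with position in [p0, p1).
    // Assumption: seq_id < 0 matches any sequence; p0 < 0 or p1 < 0
    // leaves that side of the range unbounded.
    void seq_rm(int seq_id, int p0, int p1) {
        if (p0 < 0) p0 = 0;
        if (p1 < 0) p1 = std::numeric_limits<int>::max();

        for (ToyCell & c : cells) {
            if (c.pos < p0 || c.pos >= p1) continue;  // outside range or unused
            if (seq_id < 0) {
                c.seq_ids.clear();        // drop the cell for every sequence
            } else {
                c.seq_ids.erase(seq_id);  // drop it for one sequence only
            }
            if (c.seq_ids.empty()) c.pos = -1;  // free the cell
        }
    }

    // Unconditionally empty the whole cache.
    void clear() {
        for (ToyCell & c : cells) {
            c.pos = -1;
            c.seq_ids.clear();
        }
    }

    // Number of cells still in use.
    std::size_t used() const {
        std::size_t n = 0;
        for (const ToyCell & c : cells) {
            if (c.pos >= 0) ++n;
        }
        return n;
    }
};
```

In this toy model, a caller that previously deleted tokens by position regardless of sequence would call `seq_rm(-1, p0, p1)`, while a caller that wanted to wipe everything would call `clear()`.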
make : remove unnecessary dependency on build-info.h (ggerganov#3842)
ggml : quantization refactoring (ggerganov#3833)
* ggml : factor all quantization code in ggml-quants
  ggml-ci
* ggml-quants : fix Zig and Swift builds + quantize tool
  ggml-ci
* quantize : --pure option for disabling k-quant mixtures
Co-authored-by: cebtenzzre <[email protected]>
metal : try cwd for ggml-metal.metal if bundle lookup fails (ggerganov#3793)
* Try cwd for ggml-metal if bundle lookup fails
  When building with `-DBUILD_SHARED_LIBS=ON -DLLAMA_METAL=ON -DLLAMA_BUILD_SERVER=ON`, `server` would fail to load `ggml-metal.metal` because `[bundle pathForResource:...]` returns `nil`. In that case, fall back to `ggml-metal.metal` in the cwd instead of passing `null` as a path.
  Follows up on ggerganov#1782
* Update ggml-metal.m
Co-authored-by: Georgi Gerganov <[email protected]>
llama : allow quantizing k-quants to fall back when tensor size incompatible (ggerganov#3747)
* Allow quantizing k-quants to fall back when tensor size incompatible
* quantizing: Add warning when tensors were incompatible with k-quants
  Clean up k-quants state passing a bit
llama : add option for greedy sampling with probs (ggerganov#3813)
* llama : add option for greedy sampling with probs
* llama : add comment about llama_sample_token_greedy() missing probs
* sampling : temp == 0.0 -> no probs, temp < 0.0 -> probs
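The temperature convention in the last bullet (`temp == 0.0` picks greedily without probabilities, `temp < 0.0` picks greedily and also reports probabilities) can be sketched like this. `GreedyResult` and `greedy_sample` are hypothetical names for illustration only; the actual sampling code in the project may be organized differently, and positive temperatures are deliberately out of scope here.

```cpp
#include <algorithm>
#include <cmath>
#include <stdexcept>
#include <vector>

// Hypothetical result type: the chosen token index and, optionally,
// the softmax probabilities of all candidates.
struct GreedyResult {
    int                id;
    std::vector<float> probs;  // left empty when probabilities are not requested
};

// Greedy selection under the convention from the commit message:
//   temp == 0.0 -> argmax only, no probabilities
//   temp <  0.0 -> argmax plus softmax probabilities
// temp > 0.0 (regular sampling) is not covered by this sketch.
GreedyResult greedy_sample(const std::vector<float> & logits, float temp) {
    if (logits.empty()) {
        throw std::invalid_argument("logits must not be empty");
    }
    if (temp > 0.0f) {
        throw std::invalid_argument("sketch only covers temp <= 0");
    }

    const auto max_it = std::max_element(logits.begin(), logits.end());

    GreedyResult result;
    result.id = static_cast<int>(max_it - logits.begin());

    if (temp < 0.0f) {
        // Numerically stable softmax: subtract the max logit before exp.
        float sum = 0.0f;
        result.probs.reserve(logits.size());
        for (float l : logits) {
            const float e = std::exp(l - *max_it);
            result.probs.push_back(e);
            sum += e;
        }
        for (float & p : result.probs) p /= sum;
    }
    return result;
}
```

Skipping the softmax in the `temp == 0.0` case avoids work when the caller only needs the token, which is one plausible reason for keeping the two modes separate.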
common : print that one line of the syntax help *also* to standard output (ggerganov#3823)