Releases: ggerganov/llama.cpp
b3488
ggml : bugfix: fix the inactive elements is agnostic for risc-v vector… (#8748)

In this code we want inactive elements to retain the value they previously held when mask[i] is false, so the undisturbed policy must be used. With the default agnostic policy of the RVV intrinsics, inactive elements may either keep their value or be overwritten with all 1s.

Co-authored-by: carter.li <[email protected]>
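The distinction between the two policies can be illustrated in plain Python (a conceptual sketch of the semantics only, not the RVV intrinsics themselves; the helper names are hypothetical):

```python
def masked_op_undisturbed(dest, src, mask, op):
    """Undisturbed semantics: inactive elements (mask[i] is False)
    keep the value dest previously held."""
    return [op(s) if m else d for d, s, m in zip(dest, src, mask)]

def masked_op_agnostic(dest, src, mask, op):
    """Agnostic semantics: inactive elements may keep their value OR
    be overwritten with all 1s -- the hardware is free to choose.
    Here we model the all-1s case to show why it breaks code that
    expects the old values to survive."""
    ALL_ONES = 0xFFFFFFFF
    return [op(s) if m else ALL_ONES for _, s, m in zip(dest, src, mask)]

dest = [10, 20, 30, 40]
src = [1, 2, 3, 4]
mask = [True, False, True, False]
double = lambda x: 2 * x

print(masked_op_undisturbed(dest, src, mask, double))  # [2, 20, 6, 40]
print(masked_op_agnostic(dest, src, mask, double))     # inactive lanes clobbered
```

The fix in this release selects the undisturbed variant of the intrinsics so the masked-out lanes are guaranteed to survive.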
b3487
cuda : organize vendor-specific headers into vendors directory (#8746) Signed-off-by: Xiaodong Ye <[email protected]>
b3486
[SYCL] add conv support (#8688)
b3485
cmake: use 1 more thread for non-ggml in CI (#8740)
b3484
chore : Fix vulkan related compiler warnings, add help text, improve CLI options (#8477)

* chore: Fix compiler warnings, add help text, improve CLI options
* Add prototypes for function definitions
* Invert logic of --no-clean option to be more intuitive
* Provide a new help prompt with clear instructions
* chore : Add ignore rule for vulkan shader generator
* Update ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp
* chore : Remove void and apply C++ style empty parameters

Signed-off-by: teleprint-me <[email protected]>
Co-authored-by: 0cc4m <[email protected]>
b3483
llama : refactor session file management (#8699)

* llama : refactor session file management
* llama : saving and restoring state checks for overflow
  The size of the buffers is now given to the functions working with them; otherwise a truncated file could cause out-of-bounds reads.
* llama : stream from session file instead of copying into a big buffer
  Loading session files should no longer cause a memory usage spike.
* llama : llama_state_get_size returns the actual size instead of max
  This is a breaking change, but it makes that function *much* easier to keep up to date, and it makes it reflect the behavior of llama_state_seq_get_size.
* llama : share code between whole and seq_id-specific state saving
  Both session file types now use a more similar format.
* llama : no longer store all hparams in session files
  Instead, the model arch name is stored. The layer count and the embedding dimensions of the KV cache are still verified when loading; storing all the hparams is not necessary.
* llama : fix uint64_t format type
* llama : various integer type cast and format string fixes
  Some platforms use "%lu" and others "%llu" for uint64_t. Not sure how to handle that, so casting to size_t when displaying errors.
* llama : remove _context suffix for llama_data_context
* llama : fix session file loading
  llama_state_get_size cannot be used to get the max size anymore.
* llama : more graceful error handling of invalid session files
* llama : remove LLAMA_MAX_RNG_STATE
  It's no longer necessary to limit the size of the RNG state, because the max size of session files is not estimated anymore.
* llama : cast seq_id in comparison with unsigned n_seq_max
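The overflow-checking idea behind this refactor can be sketched in a few lines (an illustrative Python sketch with hypothetical names, not the actual C++ implementation): a reader that knows the total size up front refuses to read past it, so a truncated session file fails cleanly instead of causing out-of-bounds reads.

```python
class SessionReader:
    """Bounds-checked sequential reader over a session file's bytes."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0

    def read(self, n: int) -> bytes:
        # Reject reads past the end instead of silently truncating:
        # a damaged or truncated file raises rather than corrupting state.
        if self.pos + n > len(self.data):
            raise ValueError("truncated or invalid session file")
        chunk = self.data[self.pos:self.pos + n]
        self.pos += n
        return chunk

r = SessionReader(b"\x01\x02\x03")
print(r.read(2))  # b'\x01\x02'
# r.read(2) would now raise ValueError: only 1 byte remains
```

Streaming through the file in small bounded reads like this, instead of copying everything into one big buffer first, is also what removes the memory usage spike mentioned above.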
b3482
feat: Support Moore Threads GPU (#8383)

* Update doc for MUSA
* Add GGML_MUSA in Makefile
* Add GGML_MUSA in CMake
* CUDA => MUSA
* MUSA adds support for __vsubss4
* Fix CI build failure

Signed-off-by: Xiaodong Ye <[email protected]>
b3479
ggml : add missing semicolon (#0) ggml-ci
b3472
llama : add support for llama 3.1 rope scaling factors (#8676)

* Add llama 3.1 rope scaling factors to llama conversion and inference
  This commit generates the rope factors during conversion and stores them in the resulting model as a tensor. At inference time, these factors are passed to the `ggml_rope_ext` rope operation, improving results for context windows above 8192.
* Update convert_hf_to_gguf.py
* address comments
* address comments
* Update src/llama.cpp
* Update convert_hf_to_gguf.py

Co-authored-by: compilade <[email protected]>
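The per-dimension factors can be sketched as follows (a Python sketch based on the published Llama 3.1 rope-scaling scheme; the default hyperparameters shown are assumptions taken from the Llama 3.1 config and should be verified against convert_hf_to_gguf.py). Each rotary frequency gets a divisor: high-frequency dimensions are left unscaled, low-frequency dimensions are divided by the full scale factor, and dimensions in between are smoothly interpolated.

```python
import math

def llama31_rope_factors(head_dim: int, base: float = 500000.0,
                         scale: float = 8.0,
                         low_freq_factor: float = 1.0,
                         high_freq_factor: float = 4.0,
                         orig_ctx: int = 8192) -> list[float]:
    """Compute one frequency divisor per rotary dimension pair."""
    low_wavelen = orig_ctx / low_freq_factor    # longest unspecial wavelength
    high_wavelen = orig_ctx / high_freq_factor  # shortest special wavelength
    factors = []
    for i in range(0, head_dim, 2):
        freq = base ** (-i / head_dim)
        wavelen = 2 * math.pi / freq
        if wavelen < high_wavelen:
            factors.append(1.0)       # high frequency: leave unscaled
        elif wavelen > low_wavelen:
            factors.append(scale)     # low frequency: full scaling
        else:
            # smooth transition between the two regimes
            smooth = (orig_ctx / wavelen - low_freq_factor) / (
                high_freq_factor - low_freq_factor)
            factors.append(1.0 / ((1.0 - smooth) / scale + smooth))
    return factors

f = llama31_rope_factors(128)
print(f[0], f[-1])  # 1.0 8.0 -- unscaled at high freq, fully scaled at low freq
```

The resulting vector is what gets stored in the model file as a tensor and handed to `ggml_rope_ext` at inference time.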
b3471
llama : add function for model-based max number of graph nodes (#8622)

* llama : model-based max number of graph nodes (ggml-ci)
* llama : disable 405B max_nodes path due to lack of complaints (ggml-ci)
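The idea of a model-based node budget can be sketched like this (a hypothetical illustration; the multiplier and floor below are invented for the example, not the values llama.cpp uses): scale the compute-graph node cap with the number of model tensors, so larger models get a bigger graph instead of hitting a single hard-coded constant.

```python
def model_max_nodes(n_tensors: int) -> int:
    """Hypothetical heuristic: a fixed floor for small models, and a
    per-tensor multiple for large ones (more layers => more tensors
    => more graph nodes needed)."""
    return max(8192, 5 * n_tensors)

print(model_max_nodes(291))    # small model: the floor applies -> 8192
print(model_max_nodes(10000))  # very large model: scaled -> 50000
```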