Releases: allozaur/llama.cpp

b6637

29 Sep 17:10
5f7e166

Fix thinking blocks with quotes + add handling `[THINK]...[/THINK]` b…

b6625

29 Sep 08:05
66bb798

fix: preserved zero values in chat settings inputs and textareas by s…

b6623

29 Sep 07:21
3ffd0fa

perplexity : show more kl-divergence data (#16321)

Adds additional percentile data to the output of `llama-perplexity --kl-divergence`:
- Added 95 percentile (mirroring existing 5 percentile)
- Added 0.1 percentile (mirroring existing 99.9 percentile)
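
As a rough illustration of what these percentiles mean, here is a minimal sketch that computes them from a list of per-token KL-divergence values. The name `toy_percentile` and the linear interpolation between neighbouring samples are assumptions made for this example; they are not taken from the `llama-perplexity` source.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from llama.cpp): percentile of a sample using
// linear interpolation between the two nearest ranks.
// p is a fraction in [0, 1], e.g. 0.95 for the 95th percentile.
static double toy_percentile(std::vector<double> values, double p) {
    if (values.empty()) {
        return 0.0;
    }
    std::sort(values.begin(), values.end());

    const double pos  = p * (double) (values.size() - 1); // fractional rank
    const size_t lo   = (size_t) std::floor(pos);
    const size_t hi   = std::min(lo + 1, values.size() - 1);
    const double frac = pos - (double) lo;

    return values[lo] * (1.0 - frac) + values[hi] * frac;
}
```

Calling `toy_percentile(kld, 0.95)` and `toy_percentile(kld, 0.001)` on the same sample gives the two newly reported values, which pair with the existing 5 and 99.9 percentiles to make the reported tails symmetric.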

b6567

24 Sep 12:41
3a59971

model : add label for LiquidAI LFM2-2.6B model (#16204)

* model : add label for LiquidAI LFM2-2.6B model

HF link: [LiquidAI/LFM2-2.6B](https://huggingface.co/LiquidAI/LFM2-2.6B).

Support for GGUF conversion and inference is added in #14620.

However, because its `n_embd` is similar to that of the 1.2B model, it is labeled as a 1.2B model.
Fix the label by using `n_ff` to identify the model instead.

Output of `llama-bench`:
```
| model                          |       size |     params | backend    | threads |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | --------------: | -------------------: |
| lfm2 1.2B F16                  |   2.18 GiB |     1.17 B | CPU        |      10 |           pp512 |        223.97 ± 5.32 |
| lfm2 2.6B F16                  |   4.79 GiB |     2.57 B | CPU        |      10 |           pp512 |         92.53 ± 4.14 |
| lfm2 350M F16                  | 676.25 MiB |   354.48 M | CPU        |      10 |           pp512 |       725.52 ± 11.70 |
| lfm2 700M F16                  |   1.38 GiB |   742.49 M | CPU        |      10 |           pp512 |       336.22 ± 12.93 |
```
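
The labeling problem can be sketched as follows. This is an illustration only: the struct, the function names, and every numeric value are placeholders chosen for the example, not the real LFM2 hyperparameters or the actual code in `src/llama-model.cpp`.

```cpp
#include <cstdint>
#include <string>

// Toy hyperparameter struct; the field names mirror the ones mentioned above.
struct toy_hparams {
    uint32_t n_embd; // embedding width
    uint32_t n_ff;   // feed-forward width
};

// Keying on n_embd alone: two sizes that share a width get the same label.
// All numbers below are PLACEHOLDERS, not real model values.
static std::string toy_label_by_n_embd(const toy_hparams & hp) {
    switch (hp.n_embd) {
        case 100: return "350M";
        case 200: return "1.2B"; // a larger model with the same width lands here too
        default:  return "unknown";
    }
}

// Keying on n_ff: the two sizes differ here, so each gets its own label.
static std::string toy_label_by_n_ff(const toy_hparams & hp) {
    switch (hp.n_ff) {
        case 1000: return "350M";
        case 2000: return "1.2B";
        case 3000: return "2.6B";
        default:   return "unknown";
    }
}
```

With the placeholder values, a model described by `{200, 3000}` is reported as `1.2B` by the first function and as `2.6B` by the second, which is the kind of mislabeling the commit addresses.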

* Update src/llama-model.cpp

Co-authored-by: Sigbjørn Skjæret <[email protected]>

---------

Co-authored-by: Sigbjørn Skjæret <[email protected]>

b6565

24 Sep 07:51
152729f

common : add missing chrono header for common.cpp (#16211)

Signed-off-by: Uilian Ries <[email protected]>

b6556

23 Sep 08:49
264f1b5

zdnn: refactor codebase + add docs (#16178)

* zdnn: initial matmul refactor

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: rm static from funcs

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: update ggml-zdnn.h

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: change header files to hpp

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: switch to common.hpp

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: move mulmat forward around

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: rm inline from utils

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: code cleanup

Signed-off-by: Aaron Teo <[email protected]>

* docs: add zDNN docs

Signed-off-by: Aaron Teo <[email protected]>

---------

Signed-off-by: Aaron Teo <[email protected]>

b6520

19 Sep 08:19
4067f07

feat: Improve mobile UI for Settings Dialog (#16084)

* feat: Improve mobile UI for Settings Dialog

* chore: update webui build output

* fix: Linting errors

* chore: update webui build output

b6517

18 Sep 22:33
69ffd89

ggml-amx : fix ggml_amx_init() on generic Linux (#16049)

Generalize Linux check to `__linux__` to support non-glibc systems (like musl).
Also, return `false` on unknown/untested OS.

Without this commit, the code compiles (with warnings) but crashes at runtime:

    register_backend: registered backend CPU (1 devices)
    register_device: registered device CPU (Intel(R) Xeon(R) Platinum 8488C)
    build: 6487 (51c4cac6) with x86_64-linux-musl-gcc (GCC) 15.1.0 for x86_64-linux-musl (debug)
    system info: n_threads = 8, n_threads_batch = 8, total_threads = 16
    ....
    print_info: n_ctx_orig_yarn  = 262144
    print_info: rope_finetuned   = unknown
    print_info: model type       = 4B
    Illegal instruction (core dumped)
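
The shape of the fix can be sketched as a platform check that has an explicit fallback branch. The function names are hypothetical, and the body is a stand-in: the real `ggml_amx_init()` does platform-specific work that is not reproduced here.

```cpp
#include <string>

// Hypothetical helper: reports which branch of the platform check was compiled.
static const char * toy_platform() {
#if defined(__linux__)
    return "linux";   // set by compilers targeting Linux, independent of the libc in use
#else
    return "unknown";
#endif
}

// Hypothetical init function following the pattern described in the commit message.
static bool toy_accel_init() {
#if defined(__linux__)
    // Placeholder: a real backend would perform its Linux-specific setup here
    // and return whether that setup succeeded.
    return true;
#else
    // Unknown or untested OS: report failure explicitly instead of
    // falling off the end of the function without a return value.
    return false;
#endif
}
```

The point of the `#else` branch is that every build configuration returns a definite value, so the caller can skip the accelerated path rather than reach instructions the process was never set up to use.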

Signed-off-by: Adrien Gallouët <[email protected]>

b6501

17 Sep 19:34
0320ac5

metal : refactor + optimize v2 (#15995)

* metal : improve naming

* metal : refactor device

ggml-ci

* cont : props

ggml-ci

* metal : apply ggml_mem_ranges_t

ggml-ci

* metal : remove GGML_METAL_USE_BF16

ggml-ci

* metal : refactor device buffer

ggml-ci

* cont : fix naming

* metal : sync before destroying the backend

ggml-ci

* metal : refactor context

ggml-ci

* metal : migrate ggml-metal.m to ggml-metal.cpp

ggml-ci

* metal : adjust ops API

ggml-ci

* metal : use C++ to store pipelines

ggml-ci

* metal : migrate ops to separate functions

ggml-ci

* metal : add ggml_metal_library_t

ggml-ci

* metal : improve naming

ggml-ci

* metal : cleanup

ggml-ci

* metal : add support for GGML_OP_LOG

ggml-ci

* metal : fix error handling

ggml-ci

b6393

05 Sep 21:06
408ff52

Implement --log-colors with always/never/auto (#15792)

`auto` is the default.
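
A tri-state option like this is commonly handled in two steps: parse the string into a mode, then resolve `auto` against whether the output is a terminal. The sketch below assumes that design; the type and function names are hypothetical, and the actual resolution rule used by llama.cpp for `auto` is not stated in the commit message.

```cpp
#include <string>

// Hypothetical tri-state for a --log-colors style option.
enum class toy_color_mode { ALWAYS, NEVER, AUTO };

// Parse the option value. Returns false (leaving `out` untouched) on unknown input.
static bool toy_parse_color_mode(const std::string & s, toy_color_mode & out) {
    if (s == "always") { out = toy_color_mode::ALWAYS; return true; }
    if (s == "never")  { out = toy_color_mode::NEVER;  return true; }
    if (s == "auto")   { out = toy_color_mode::AUTO;   return true; }
    return false;
}

// Decide whether to emit color codes. `is_tty` is supplied by the caller
// (for example from a terminal check on the log stream), which keeps this
// function free of platform-specific calls. Treating AUTO as "color only on
// a terminal" is an assumption of this sketch.
static bool toy_use_colors(toy_color_mode mode, bool is_tty) {
    switch (mode) {
        case toy_color_mode::ALWAYS: return true;
        case toy_color_mode::NEVER:  return false;
        case toy_color_mode::AUTO:   return is_tty;
    }
    return false;
}
```

Initializing the mode to `toy_color_mode::AUTO` before parsing matches the default described above: if the flag is never passed, colors follow the terminal check.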

Signed-off-by: Eric Curtin <[email protected]>