Releases · allozaur/llama.cpp

29 Sep 17:10

5f7e166

b6637

Fix thinking blocks with quotes + add handling `[THINK]...[/THINK]` b…


29 Sep 08:05

github-actions

b6625

66bb798


fix: preserved zero values in chat settings inputs and textareas by s…


29 Sep 07:21

github-actions

b6623

3ffd0fa


perplexity : show more kl-divergence data (#16321)

Adds additional percentile data to the output of `llama-perplexity --kl-divergence`:
- Added the 95th percentile (mirroring the existing 5th percentile)
- Added the 0.1th percentile (mirroring the existing 99.9th percentile)
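
To make the mirrored percentiles concrete, here is a minimal sketch of how the four tail values could be computed from a list of per-token KL-divergence values. The names `percentile`, `kld_tails`, and `summarize_kld` are hypothetical, and the linear-interpolation rule is an assumption; `llama-perplexity` may compute its percentiles differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: percentile of an ascending-sorted sample, using linear
// interpolation between the two nearest ranks. `p` is a fraction in [0, 1].
double percentile(const std::vector<double> & sorted, double p) {
    assert(!sorted.empty() && p >= 0.0 && p <= 1.0);
    const double pos = p * static_cast<double>(sorted.size() - 1);
    const std::size_t lo = static_cast<std::size_t>(std::floor(pos));
    const std::size_t hi = std::min(lo + 1, sorted.size() - 1);
    const double frac = pos - static_cast<double>(lo);
    return sorted[lo] + frac * (sorted[hi] - sorted[lo]);
}

// Hypothetical summary type holding the two low and two high tail percentiles.
struct kld_tails {
    double p0_1;  // 0.1th percentile
    double p5;    // 5th percentile
    double p95;   // 95th percentile
    double p99_9; // 99.9th percentile
};

// Sorts a copy of the input, then reads off the four tail percentiles.
kld_tails summarize_kld(std::vector<double> kld) {
    std::sort(kld.begin(), kld.end());
    return {
        percentile(kld, 0.001),
        percentile(kld, 0.05),
        percentile(kld, 0.95),
        percentile(kld, 0.999),
    };
}
```

Reporting both tails symmetrically makes it easier to see whether the divergence distribution is skewed toward a few badly mismatched tokens.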


24 Sep 12:41

github-actions

b6567

3a59971


model : add label for LiquidAI LFM2-2.6B model (#16204)

* model : add label for LiquidAI LFM2-2.6B model

HF link: [LiquidAI/LFM2-2.6B](https://huggingface.co/LiquidAI/LFM2-2.6B).

Support for GGUF conversion and inference is added in #14620.

However, because its `n_embd` is similar to that of the 1.2B model, it was identified as a 1.2B model.
This change fixes the label by using `n_ff` to identify the model instead.

Output of `llama-bench`:
```
| model                          |       size |     params | backend    | threads |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | --------------: | -------------------: |
| lfm2 1.2B F16                  |   2.18 GiB |     1.17 B | CPU        |      10 |           pp512 |        223.97 ± 5.32 |
| lfm2 2.6B F16                  |   4.79 GiB |     2.57 B | CPU        |      10 |           pp512 |         92.53 ± 4.14 |
| lfm2 350M F16                  | 676.25 MiB |   354.48 M | CPU        |      10 |           pp512 |       725.52 ± 11.70 |
| lfm2 700M F16                  |   1.38 GiB |   742.49 M | CPU        |      10 |           pp512 |       336.22 ± 12.93 |
```

* Update src/llama-model.cpp

Co-authored-by: Sigbjørn Skjæret <[email protected]>

---------

Co-authored-by: Sigbjørn Skjæret <[email protected]>
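
The labeling fix described in this entry can be illustrated with a small sketch: when two model sizes share an embedding width, switching on `n_embd` cannot tell them apart, while switching on the feed-forward width can. The struct, the function names, and every numeric value below are illustrative placeholders, not values taken from `src/llama-model.cpp`.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical, reduced view of a model's hyperparameters.
struct hparams_sketch {
    uint32_t n_embd; // embedding width
    uint32_t n_ff;   // feed-forward width
};

// Labeling by n_embd: both sizes share the same (placeholder) width,
// so the larger model is mislabeled.
std::string label_from_n_embd(const hparams_sketch & hp) {
    switch (hp.n_embd) {
        case 2048: return "1.2B"; // placeholder value; also matches the larger model
        default:   return "unknown";
    }
}

// Labeling by n_ff: the (placeholder) feed-forward widths differ,
// so each size gets its own label.
std::string label_from_n_ff(const hparams_sketch & hp) {
    assert(hp.n_ff > 0); // a zero width would mean the metadata was never loaded
    switch (hp.n_ff) {
        case 8192:  return "1.2B"; // placeholder value
        case 10752: return "2.6B"; // placeholder value
        default:    return "unknown";
    }
}
```

With these placeholder numbers, a model with `n_embd = 2048` and `n_ff = 10752` is labeled "1.2B" by the first function and "2.6B" by the second.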


24 Sep 07:51

github-actions

b6565

152729f


common : add missing chrono header for common.cpp (#16211)

Signed-off-by: Uilian Ries <[email protected]>


23 Sep 08:49

github-actions

b6556

264f1b5


zdnn: refactor codebase + add docs (#16178)

* zdnn: initial matmul refactor

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: rm static from funcs

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: update ggml-zdnn.h

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: change header files to hpp

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: switch to common.hpp

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: move mulmat forward around

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: rm inline from utils

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: code cleanup

Signed-off-by: Aaron Teo <[email protected]>

* docs: add zDNN docs

Signed-off-by: Aaron Teo <[email protected]>

---------

Signed-off-by: Aaron Teo <[email protected]>


19 Sep 08:19

github-actions

b6520

4067f07


feat: Improve mobile UI for Settings Dialog (#16084)

* feat: Improve mobile UI for Settings Dialog

* chore: update webui build output

* fix: Linting errors

* chore: update webui build output


18 Sep 22:33

github-actions

b6517

69ffd89


ggml-amx : fix ggml_amx_init() on generic Linux (#16049)

Generalize Linux check to `__linux__` to support non-glibc systems (like musl).
Also, return `false` on unknown/untested OS.

Without this commit, the code compiles (with warnings) but crashes at runtime:

    register_backend: registered backend CPU (1 devices)
    register_device: registered device CPU (Intel(R) Xeon(R) Platinum 8488C)
    build: 6487 (51c4cac6) with x86_64-linux-musl-gcc (GCC) 15.1.0 for x86_64-linux-musl (debug)
    system info: n_threads = 8, n_threads_batch = 8, total_threads = 16
    ....
    print_info: n_ctx_orig_yarn  = 262144
    print_info: rope_finetuned   = unknown
    print_info: model type       = 4B
    Illegal instruction (core dumped)

Signed-off-by: Adrien Gallouët <[email protected]>
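
As a rough illustration of the platform check described in this entry, the sketch below classifies the build target with the predefined `__linux__` macro, which compilers define for Linux targets regardless of the C library, and refuses initialization on anything unrecognized. The names `os_kind`, `detect_os`, and `amx_init_allowed` are hypothetical, the Windows branch is an assumption, and the real `ggml_amx_init()` does additional work that is not reproduced here.

```cpp
#include <cassert>

// Hypothetical classification of the build target.
enum class os_kind { linux_any, windows, unknown };

// Classify the target using only predefined compiler macros.
// `__linux__` covers glibc and non-glibc (e.g. musl) Linux systems alike.
os_kind detect_os() {
#if defined(__linux__)
    return os_kind::linux_any;
#elif defined(_WIN32)
    return os_kind::windows;
#else
    return os_kind::unknown;
#endif
}

// Hypothetical gate: report whether initialization should proceed at all.
// Returning false on an unknown OS avoids running code paths that were
// never tested there.
bool amx_init_allowed(os_kind os) {
    switch (os) {
        case os_kind::linux_any: return true;  // real code would do its Linux-specific setup here
        case os_kind::windows:   return true;  // assumption: treated as a known platform
        case os_kind::unknown:   return false; // unknown/untested OS
    }
    assert(false && "unreachable");
    return false;
}
```

Every path returns an explicit value, so no platform can fall off the end of the function without one.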


17 Sep 19:34

github-actions

b6501

0320ac5


metal : refactor + optimize v2 (#15995)

* metal : improve naming

* metal : refactor device

ggml-ci

* cont : props

ggml-ci

* metal : apply ggml_mem_ranges_t

ggml-ci

* metal : remove GGML_METAL_USE_BF16

ggml-ci

* metal : refactor device buffer

ggml-ci

* cont : fix naming

* metal : sync before destroying the backend

ggml-ci

* metal : refactor context

ggml-ci

* metal : migrate ggml-metal.m to ggml-metal.cpp

ggml-ci

* metal : adjust ops API

ggml-ci

* metal : use C++ to store pipelines

ggml-ci

* metal : migrate ops to separate functions

ggml-ci

* metal : add ggml_metal_library_t

ggml-ci

* metal : improve naming

ggml-ci

* metal : cleanup

ggml-ci

* metal : add support for GGML_OP_LOG

ggml-ci

* metal : fix error handling

ggml-ci


05 Sep 21:06

github-actions

b6393

408ff52


Implement --log-colors with always/never/auto (#15792)

`auto` is the default.

Signed-off-by: Eric Curtin <[email protected]>
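
A three-state color option like the one in this entry is commonly handled by parsing the value into an enum and resolving `auto` against whether the output is a terminal. The sketch below shows that pattern under stated assumptions: `log_colors_mode`, `parse_log_colors`, and `use_colors` are hypothetical names, and the rule that `auto` follows the terminal check is an assumption rather than a description of the actual implementation.

```cpp
#include <cassert>
#include <string>

// Hypothetical enum for the three accepted option values.
enum class log_colors_mode { always, never, automatic };

// Parse the option value. Returns false for an unrecognized value
// and leaves `mode` untouched in that case.
bool parse_log_colors(const std::string & value, log_colors_mode & mode) {
    if (value == "always") { mode = log_colors_mode::always;    return true; }
    if (value == "never")  { mode = log_colors_mode::never;     return true; }
    if (value == "auto")   { mode = log_colors_mode::automatic; return true; }
    return false;
}

// Decide whether to emit color codes. `output_is_tty` is supplied by the
// caller; on POSIX systems it could come from isatty(fileno(stdout)).
bool use_colors(log_colors_mode mode, bool output_is_tty) {
    switch (mode) {
        case log_colors_mode::always:    return true;
        case log_colors_mode::never:     return false;
        case log_colors_mode::automatic: return output_is_tty; // assumption: auto follows the terminal check
    }
    assert(false && "unreachable");
    return false;
}
```

Passing the terminal check in as a parameter keeps the decision logic portable and easy to test without a real terminal.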

