Skip to content

Releases: allozaur/llama.cpp

b6782

16 Oct 22:03
1bb4f43

Choose a tag to compare

mtmd : support home-cooked Mistral Small Omni (#14928)

b6779

16 Oct 10:22
7a50cf3

Choose a tag to compare

CANN: format code using .clang-format (#15863)

This commit applies .clang-format rules to all source files under the
ggml-cann directory to ensure consistent coding style and readability.
The .clang-format option `SortIncludes: false` has been set to disable
automatic reordering of include directives.
No functional changes are introduced.

Co-authored-by: hipudding <[email protected]>

b6765

15 Oct 10:24
fa882fd

Choose a tag to compare

metal : avoid using Metal's gpuAddress property (#16576)

* metal : avoid using Metal's gpuAddress property

* metal : fix rope kernels buffer check

b6749

13 Oct 09:17
1fb9504

Choose a tag to compare

fix: add remark plugin to render raw HTML as literal text (#16505)

* fix: add remark plugin to render raw HTML as literal text

Implemented a missing MDAST stage to neutralize raw HTML like major LLM WebUIs
do ensuring consistent and safe Markdown rendering

Introduced 'remarkLiteralHtml', a plugin that converts raw HTML nodes in the
Markdown AST into plain-text equivalents while preserving indentation and
line breaks. This ensures consistent rendering and prevents unintended HTML
execution, without altering valid Markdown structure

Kept 'remarkRehype' in the pipeline since it performs the required conversion
from MDAST to HAST for KaTeX, syntax highlighting, and HTML serialization

Refined the link-enhancement logic to skip unnecessary DOM rewrites,
fixing a subtle bug where extra paragraphs were injected after the first
line due to full innerHTML reconstruction, and ensuring links open in new
tabs only when required

Final pipeline: remarkGfm -> remarkMath -> remarkBreaks -> remarkLiteralHtml
-> remarkRehype -> rehypeKatex -> rehypeHighlight -> rehypeStringify

* fix: address review feedback from allozaur

* chore: update webui build output

b6746

13 Oct 08:28
f9bc66c

Choose a tag to compare

CANN: Update several operators to support FP16 data format (#16251)

Many Ascend operators internally use FP16 precision for computation.
If input data is in FP32, it must first be cast to FP16 before
computation, and then cast back to FP32 after computation, which
introduces unnecessary cast operations. Moreover, FP16 computation
requires significantly less workload compared to FP32, leading to
noticeable efficiency improvements.

In this change, `get_rows`, `rms_norm`, and `flash_attn_ext` are extended
to support multiple data types. Validation on the Qwen2 0.5b model shows
correct accuracy and about 10% performance gain in concurrent scenarios.

Co-authored-by: noemotiovon <[email protected]>

b6725

10 Oct 06:58
1faa13a

Choose a tag to compare

webui: updated the chat service to only include max_tokens in the req…

b6681

03 Oct 11:11
84c8e30

Choose a tag to compare

Fix missing messages on sibling navigation (#16408)

* fix: resolve message disappearing issue when navigating between regenerated siblings by using current leaf nodes instead of cached sibling IDs

* chore: update webui build output

* chore: update webui build output

b6677

03 Oct 10:02
7723327

Choose a tag to compare

Capture model name only after first token (streaming) or completed re…

b6673

03 Oct 06:30
d64c810

Choose a tag to compare

test-barrier : do not use more threads than physically available (#16…

b6653

30 Sep 22:59
e74c92e

Choose a tag to compare

model : support GLM 4.6 (make a few NextN/MTP tensors not required) (…