Releases · ngxson/llama.cpp

23 Jul 04:26

14c28df

b5966 Latest

CANN: weight format to NZ for Ascend310P3 (#14407)

* weight format to nz for 310p

* remove quant weight format to nz

* clean code

* fix

* make the conditions for converting weights to NZ format consistent

* clean code

Assets 15

| Asset | SHA-256 | Size | Uploaded (UTC) |
| --- | --- | --- | --- |
| cudart-llama-bin-win-cuda-12.4-x64.zip | 8c79a9b226de4b3cacfd1f83d24f962d0773be79f1e7b75c6af4ded7e32ae1d6 | 373 MB | 2025-07-23T04:26:36Z |
| llama-b5966-bin-macos-arm64.zip | 60035a5578070072526461900938615ab5576fe0e00a4239b29eedacc6195c3a | 10.6 MB | 2025-07-23T04:26:47Z |
| llama-b5966-bin-macos-x64.zip | 34b2386367fed7b9ef1382ded03414073f76ee8738f34341c01be26fe5788ab8 | 27.2 MB | 2025-07-23T04:26:48Z |
| llama-b5966-bin-ubuntu-vulkan-x64.zip | 916a9b090ec521f03af9aca2754ae6f6abdc988010529373ee02a8cf743f28dc | 20.9 MB | 2025-07-23T04:26:50Z |
| llama-b5966-bin-ubuntu-x64.zip | 3a67db7fa577277ba09effb61f4a0c4e720499b9f202652c7134a78f21b66dfa | 12.5 MB | 2025-07-23T04:26:51Z |
| llama-b5966-bin-win-cpu-arm64.zip | 5dfa52553ebb4817b9afecaeb0455b6e611ed0eea12222d56983d84eaf086258 | 10.9 MB | 2025-07-23T04:26:52Z |
| llama-b5966-bin-win-cpu-x64.zip | 580c6c2a42219967fa97678b18821d58a9affcd188eb5cfca6125ca4cc5f45ac | 13.7 MB | 2025-07-23T04:26:53Z |
| llama-b5966-bin-win-cuda-12.4-x64.zip | 5b63c0b4dfa8429e839a154500f91c44270fc8a52f264925f54e654207de0d2c | 129 MB | 2025-07-23T04:26:54Z |
| llama-b5966-bin-win-hip-radeon-x64.zip | 6773578da355d18b68feeea3c3769bb8df413f149fdf65b9831d682e18b2e8af | 299 MB | 2025-07-23T04:26:59Z |
| llama-b5966-bin-win-opencl-adreno-arm64.zip | 9ea43843f7fdff95e36978685b64a9e1b673f62f0f080960cd09cf4c669783e0 | 11.2 MB | 2025-07-23T04:27:07Z |
| Source code (zip) | | | 2025-07-23T03:58:00Z |
| Source code (tar.gz) | | | 2025-07-23T03:58:00Z |
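
Each binary asset for b5966 is listed with a `sha256:` digest, so a downloaded archive can be checked before it is unpacked. The following is a minimal sketch of one way to do that in Python, not an official verification tool. The helper names `sha256_of_file` and `matches_published_hash` are hypothetical, and the script assumes the archive has already been downloaded into the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, published):
    """Compare a file's digest with a published 'sha256:<hex>' string.

    The 'sha256:' prefix is optional and the comparison ignores case.
    """
    expected = published.split(":", 1)[-1].strip().lower()
    return sha256_of_file(path) == expected


if __name__ == "__main__":
    # Assumption: the archive sits in the current working directory.
    archive = Path("llama-b5966-bin-ubuntu-x64.zip")
    # Digest copied from the asset list for this release.
    published = "sha256:3a67db7fa577277ba09effb61f4a0c4e720499b9f202652c7134a78f21b66dfa"
    if archive.exists():
        print("OK" if matches_published_hash(archive, published) else "MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

Reading in chunks keeps memory use flat even for the larger archives, such as the 373 MB CUDA runtime bundle.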

23 Jul 02:13

github-actions

b5965

8c988fa

CUDA: add fused rms norm (#14800)

Assets 15

22 Jul 15:58

github-actions

b5963

84712b6

vulkan: fix rms_norm_mul to handle broadcasting dim0 (#14817)

Assets 15

22 Jul 15:25

github-actions

b5962

d4d1522

llama : add model type detection for rwkv7 7B&14B (#14816)

Signed-off-by: Molly Sophia <[email protected]>

Assets 15

22 Jul 12:52

github-actions

b5961

d1aa0cc

imatrix: add option to display importance score statistics for a give…

Assets 15

22 Jul 11:16

github-actions

b5960

c8ade30

Mtmd: add a way to select device for vision encoder (#14236)

* Mtmd: add a way to select device for vision encoder

* simplify

* format

* Warn user if manual device selection failed

* initialize backend to nullptr

Assets 15

22 Jul 10:50

github-actions

b5959

e28c0b8

cuda : implement bf16 cpy ops and enable bf16 cont (#14763)

* implement bf16 cpy ops and enable bf16 cont

* deduplicate copy functions

* deduplicate checks

Assets 15

22 Jul 07:16

github-actions

b5958

8e6f8bc

opencl: remove unreachable `return` (#14806)

Assets 15

22 Jul 01:42

github-actions

b5957

adef817

server : allow setting `--reverse-prompt` arg (#14799)

Signed-off-by: Molly Sophia <[email protected]>

Assets 15

22 Jul 00:08

github-actions

b5956

48b86c4

cuda: remove linking to cublasLt (#14790)

Signed-off-by: Xiaodong Ye <[email protected]>

Assets 15
