@@ -28,6 +29,30 @@ Inference of Meta's [LLaMA](https://arxiv.org/abs/2302.13971) model (and others)
----
## Quick start
Getting started with llama.cpp is straightforward. Here are several ways to install it on your machine:
- Install `llama.cpp` using [brew, nix or winget](docs/install.md) (see the example below)
- Run with Docker - see our [Docker documentation](docs/docker.md)
- Download pre-built binaries from the [releases page](https://github.com/ggml-org/llama.cpp/releases)
- Build from source by cloning this repository - check out [our build guide](docs/build.md)
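For instance, a typical Homebrew install on macOS or Linux looks like the sketch below; the formula name is the one commonly published for this project, and the `--version` check simply confirms the CLI landed on your `PATH` (see [docs/install.md](docs/install.md) for the nix and winget equivalents).

```sh
# Install the published Homebrew formula
brew install llama.cpp

# Confirm the CLI is available
llama-cli --version
```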
Once installed, you'll need a model to work with. Head to the [Obtaining and quantizing models](#obtaining-and-quantizing-models) section to learn more.
Example commands:
```sh
# Use a local model file
llama-cli -m my_model.gguf
# Or download and run a model directly from Hugging Face
llama-cli -hf ggml-org/gemma-3-1b-it-GGUF
# Launch OpenAI-compatible API server
llama-server -hf ggml-org/gemma-3-1b-it-GGUF
```
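As a rough sketch of what the last command gives you: `llama-server` exposes an OpenAI-compatible HTTP API, so once it is running you can talk to it with an ordinary chat-completions request. The port below is the server's usual default; adjust it if you start the server with a different `--port`.

```sh
# Send a chat request to the OpenAI-compatible endpoint of a running llama-server
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello!"}]}'
```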
## Description
The main goal of `llama.cpp` is to enable LLM inference with minimal setup and state-of-the-art performance on a wide range of hardware - locally and in the cloud.
@@ -230,6 +255,7 @@ Instructions for adding support for new models: [HOWTO-add-model.md](docs/develo
</details>
## Supported backends
| Backend | Target devices |
@@ -246,24 +272,18 @@ Instructions for adding support for new models: [HOWTO-add-model.md](docs/develo
|[OpenCL](docs/backend/OPENCL.md)| Adreno GPU |
|[RPC](https://github.com/ggml-org/llama.cpp/tree/master/tools/rpc)| All |
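Backends are generally switched on at configure time with a CMake option named after the backend. As an illustrative sketch only, using the CUDA backend from the full table (check [docs/build.md](docs/build.md) and the per-backend docs for the exact option on your target):

```sh
# Configure and build with the CUDA backend enabled
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release
```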
## Building the project
The main product of this project is the `llama` library. Its C-style interface can be found in [include/llama.h](include/llama.h).
The project also includes many example programs and tools using the `llama` library. The examples range from simple, minimal code snippets to sophisticated sub-projects such as an OpenAI-compatible HTTP server. Possible methods for obtaining the binaries:
- Clone this repository and build locally, see [how to build](docs/build.md)
- On macOS or Linux, install `llama.cpp` via [brew, flox or nix](docs/install.md)
- Use a Docker image, see [documentation for Docker](docs/docker.md)
- Download pre-built binaries from [releases](https://github.com/ggml-org/llama.cpp/releases)
## Obtaining and quantizing models
The [Hugging Face](https://huggingface.co) platform hosts a [number of LLMs](https://huggingface.co/models?library=gguf&sort=trending) compatible with `llama.cpp`:
You can either manually download the GGUF file or directly use any `llama.cpp`-compatible models from [Hugging Face](https://huggingface.co/) or other model hosting sites, such as [ModelScope](https://modelscope.cn/), by using this CLI argument: `-hf <user>/<model>[:quant]`. For example:
```sh
llama-cli -hf ggml-org/gemma-3-1b-it-GGUF
```
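The optional `:quant` suffix selects a specific quantization from the repository. Available tags vary per repo, so the `Q8_0` below is only an assumed example:

```sh
# Pick a specific quantization tag from the repo (Q8_0 is assumed to be published there)
llama-cli -hf ggml-org/gemma-3-1b-it-GGUF:Q8_0
```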
By default, the CLI downloads from Hugging Face; you can switch to other endpoints with the environment variable `MODEL_ENDPOINT`. For example, to download model checkpoints from ModelScope or another model-sharing community, set `MODEL_ENDPOINT=https://www.modelscope.cn/`.
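A quick sketch of how that looks in practice, assuming the same repository name is mirrored on ModelScope:

```sh
# Use ModelScope as the download endpoint for this invocation only
MODEL_ENDPOINT=https://www.modelscope.cn/ llama-cli -hf ggml-org/gemma-3-1b-it-GGUF
```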
docs/build.md (4 additions, 0 deletions)
@@ -1,5 +1,9 @@
# Build llama.cpp locally
The main product of this project is the `llama` library. Its C-style interface can be found in [include/llama.h](include/llama.h).
The project also includes many example programs and tools using the `llama` library. The examples range from simple, minimal code snippets to sophisticated sub-projects such as an OpenAI-compatible HTTP server.
This expression is automatically updated within the [nixpkgs repo](https://github.com/NixOS/nixpkgs/blob/nixos-24.05/pkgs/by-name/ll/llama-cpp/package.nix#L164).
## Flox
On Mac and Linux, Flox can be used to install llama.cpp within a Flox environment via
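The command completing that sentence is cut off in this excerpt; it is presumably the Flox package install, roughly as sketched below (the package name is an assumption):

```sh
# Install llama.cpp into the active Flox environment (package name assumed)
flox install llama-cpp
```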