Follow the instructions below to install OpenVINO runtime and build llama.cpp with the OpenVINO backend.

- Follow the guide to install OpenVINO Runtime from an archive file: [Linux](https://docs.openvino.ai/2025/get-started/install-openvino/install-openvino-archive-linux.html) | [Windows](https://docs.openvino.ai/2025/get-started/install-openvino/install-openvino-archive-windows.html)

<details>
<summary>📦 Click to expand OpenVINO 2025.3 installation from an archive file on Ubuntu</summary>
<br>

```bash
# Installation commands for the OpenVINO 2025.3 archive go here; see the
# guide linked above for the exact download and setup steps.
```

</details>
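
With the runtime installed, llama.cpp is built against it with the usual CMake flow. A minimal sketch only: the `GGML_OPENVINO` option name is an assumption, and the `setupvars.sh` path assumes the archive was installed to `/opt/intel/openvino_2025.3`; the `ReleaseOV` build directory matches the binary paths used below.

```bash
# Minimal build sketch (assumed option name and install path; see the full
# guide for the exact flags).
source /opt/intel/openvino_2025.3/setupvars.sh   # make the OpenVINO runtime discoverable
cmake -B build/ReleaseOV -DCMAKE_BUILD_TYPE=Release -DGGML_OPENVINO=ON
cmake --build build/ReleaseOV -j
```
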
Control OpenVINO behavior using these environment variables:

```bash
export GGML_OPENVINO_CACHE_DIR=/tmp/ov_cache
export GGML_OPENVINO_PROFILING=1

GGML_OPENVINO_DEVICE=GPU ./build/ReleaseOV/bin/llama-simple -m ~/models/Llama-3.2-1B-Instruct.fp16.gguf -n 50 "The story of AI is "
```
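
`GGML_OPENVINO_DEVICE` selects which OpenVINO device the model is compiled for; `GPU` is shown above. As a sketch, the other standard OpenVINO device identifiers should work the same way, though support for each of them by this backend is an assumption here:

```bash
# Run the same example on the OpenVINO CPU device. CPU (like GPU and NPU) is a
# standard OpenVINO device identifier; backend support for it is assumed.
GGML_OPENVINO_DEVICE=CPU ./build/ReleaseOV/bin/llama-simple -m ~/models/Llama-3.2-1B-Instruct.fp16.gguf -n 50 "The story of AI is "
```
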
### Build llama.cpp with OpenVINO Backend Using Docker
You can also build and run llama.cpp with the OpenVINO backend using Docker.
```bash
# Build the base runtime image with compiled shared libraries and minimal dependencies.
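# NOTE: the actual image build and run commands are not shown in this excerpt.
# A rough sketch only: the Dockerfile path and image tag below are hypothetical,
# and llama-server is assumed to listen on its default port 8080.
#
#   docker build -t llama-cpp-openvino -f .devops/openvino.Dockerfile .
#   docker run --rm -p 8080:8080 -v ~/models:/models llama-cpp-openvino \
#       -m /models/Llama-3.2-1B-Instruct.fp16.gguf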
# If you are behind a proxy, make sure to set NO_PROXY so that requests to localhost are not proxied
export NO_PROXY=localhost,127.0.0.1

# Test health endpoint
curl -f http://localhost:8080/health

# Test with a simple prompt
curl -X POST "http://localhost:8080/v1/chat/completions" -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Write a poem about OpenVINO"}],"max_tokens":100}' | jq .
```

---
## Notes about GPU-accelerated backends
The GPU may still be used to accelerate some parts of the computation even when using the `-ngl 0` option. You can fully disable GPU acceleration by using `--device none`.
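
For example, to keep every layer on the CPU and ensure the GPU is not used at all (the model path is illustrative):

```bash
# -ngl 0 offloads no layers; --device none additionally prevents the backend
# from using the GPU for any part of the computation.
./build/bin/llama-cli -m ~/models/Llama-3.2-1B-Instruct.fp16.gguf -ngl 0 --device none -p "The story of AI is "
```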