
Commit 7d8ea73

Update OV dockerfile to use OV2025.3 and update build docs
1 parent 09ead55 commit 7d8ea73

2 files changed: 63 additions & 4 deletions

.devops/openvino.Dockerfile

Lines changed: 2 additions & 2 deletions
@@ -1,5 +1,5 @@
-ARG OPENVINO_VERSION_MAJOR=2025.2
-ARG OPENVINO_VERSION_FULL=2025.2.0.19140.c01cd93e24d
+ARG OPENVINO_VERSION_MAJOR=2025.3
+ARG OPENVINO_VERSION_FULL=2025.3.0.19807.44526285f24
 ARG UBUNTU_VERSION=24.04
 
 # Optional proxy build arguments - empty by default
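The two `ARG` lines above pin which OpenVINO archive the image is built against. As a hedged aside (not part of this diff), Docker's standard `--build-arg` flag can override an `ARG` at build time, so an older drop can be tried without editing the Dockerfile; the values below are simply the ones removed above, and this assumes the Dockerfile's download logic still accepts them.

```bash
# Sketch: rebuild against the previous OpenVINO release by overriding the
# ARGs at build time (values taken from the removed lines in this diff).
docker build \
  --build-arg OPENVINO_VERSION_MAJOR=2025.2 \
  --build-arg OPENVINO_VERSION_FULL=2025.2.0.19140.c01cd93e24d \
  -t llama-openvino:base-2025.2 -f .devops/openvino.Dockerfile .
```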

docs/build.md

Lines changed: 61 additions & 2 deletions
@@ -612,7 +612,7 @@ Follow the instructions below to install OpenVINO runtime and build llama.cpp wi
 - Follow the guide to install OpenVINO Runtime from an archive file: [Linux](https://docs.openvino.ai/2025/get-started/install-openvino/install-openvino-archive-linux.html) | [Windows](https://docs.openvino.ai/2025/get-started/install-openvino/install-openvino-archive-windows.html)
 
 <details>
-<summary>📦 Click to expand OpenVINO 2025.3 installation on Ubuntu</summary>
+<summary>📦 Click to expand OpenVINO 2025.3 installation from an archive file on Ubuntu</summary>
 <br>
 
 ```bash
@@ -698,9 +698,68 @@ Control OpenVINO behavior using these environment variables:
 export GGML_OPENVINO_CACHE_DIR=/tmp/ov_cache
 export GGML_OPENVINO_PROFILING=1
 
-./build/ReleaseOV/bin/llama-simple -m ~/models/Llama-3.2-1B-Instruct.fp16.gguf -n 50 "The story of AI is "
+GGML_OPENVINO_DEVICE=GPU ./build/ReleaseOV/bin/llama-simple -m ~/models/Llama-3.2-1B-Instruct.fp16.gguf -n 50 "The story of AI is "
+```
+
+### Docker build Llama.cpp with OpenVINO Backend
+You can build and run llama.cpp with the OpenVINO backend using Docker.
+
+```bash
+# Build the base runtime image with compiled shared libraries and minimal dependencies.
+docker build -t llama-openvino:base -f .devops/openvino.Dockerfile .
+
+# Build the complete image with all binaries, Python tools, the gguf-py library, and model conversion utilities.
+docker build --target=full -t llama-openvino:full -f .devops/openvino.Dockerfile .
+
+# Build a minimal CLI-only image containing just the llama-cli executable.
+docker build --target=light -t llama-openvino:light -f .devops/openvino.Dockerfile .
+
+# Build a server-only image with the llama-server executable, health check endpoint, and REST API support.
+docker build --target=server -t llama-openvino:server -f .devops/openvino.Dockerfile .
+
+# If you are behind a proxy:
+docker build --build-arg http_proxy=$http_proxy --build-arg https_proxy=$https_proxy --target=light -t llama-openvino:light -f .devops/openvino.Dockerfile .
 ```
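A quick, hedged way to confirm that the four targets built above actually exist locally (standard Docker commands, not part of the diff):

```bash
# Sketch: list every image tagged under the llama-openvino repository.
docker images llama-openvino

# Optionally check the entrypoint baked into one of them; the exact value
# depends on .devops/openvino.Dockerfile and is not asserted here.
docker inspect --format '{{json .Config.Entrypoint}}' llama-openvino:light
```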
 
+Run the llama.cpp OpenVINO backend Docker container.
+Save sample models in `~/models` as [shown above](#3-download-sample-model); this directory is mounted into the container in the examples below.
+
+```bash
+# Run the Docker container
+docker run --rm -it -v ~/models:/models llama-openvino:light --no-warmup -m /models/Llama-3.2-1B-Instruct.fp16.gguf
+
+# With Intel GPU access (iGPU or dGPU)
+docker run --rm -it -v ~/models:/models \
+  --device=/dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -u $(id -u):$(id -g) \
+  llama-openvino:light --no-warmup -m /models/Llama-3.2-1B-Instruct.fp16.gguf
+
+# With Intel NPU access
+docker run --rm -it --env GGML_OPENVINO_DEVICE=NPU -v ~/models:/models \
+  --device=/dev/accel --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -u $(id -u):$(id -g) \
+  llama-openvino:light --no-warmup -m /models/Llama-3.2-1B-Instruct.fp16.gguf
+```
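The `--group-add=$(stat -c "%g" /dev/dri/render* | head -n 1)` idiom passes the host's render-group GID into the container so the non-root user set by `-u $(id -u):$(id -g)` can open the GPU device node. A hedged host-side sanity check before running the containers above (plain shell, not from the diff):

```bash
# Sketch: confirm the device nodes exist on the host and note the GID that
# --group-add forwards into the container.
ls -l /dev/dri/                            # render nodes for iGPU/dGPU
stat -c "%g" /dev/dri/render* | head -n 1  # GID used by --group-add above

# NPU device nodes live under /dev/accel on typical setups; the exact node
# name varies by driver, so treat this as an assumption to verify.
ls -l /dev/accel/ 2>/dev/null || echo "no /dev/accel device nodes found"
```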
+
+Run the llama.cpp server with the OpenVINO backend:
+```bash
+# Run the server Docker container
+docker run --rm -it -p 8080:8080 -v ~/models:/models llama-openvino:server --no-warmup -m /models/Llama-3.2-1B-Instruct.fp16.gguf
+
+# In a NEW terminal, test the server with curl
+
+# If you are behind a proxy, make sure to set NO_PROXY so localhost is not proxied
+export NO_PROXY=localhost,127.0.0.1
+
+# Test the health endpoint
+curl -f http://localhost:8080/health
+
+# Test with a simple prompt
+curl -X POST "http://localhost:8080/v1/chat/completions" -H "Content-Type: application/json" \
+  -d '{"messages":[{"role":"user","content":"Write a poem about OpenVINO"}],"max_tokens":100}' | jq .
+```
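Since `llama-server` exposes an OpenAI-compatible `/v1/chat/completions` endpoint, the generated text can be pulled out of the response with `jq`; a small follow-up sketch, assuming the server container above is still listening on localhost:8080:

```bash
# Sketch: print only the assistant's reply from the OpenAI-compatible
# response body.
curl -s -X POST "http://localhost:8080/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Write a poem about OpenVINO"}],"max_tokens":100}' \
  | jq -r '.choices[0].message.content'
```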
+
+
+---
 ## Notes about GPU-accelerated backends
 
 The GPU may still be used to accelerate some parts of the computation even when using the `-ngl 0` option. You can fully disable GPU acceleration by using `--device none`.
