## OpenVINO

[OpenVINO](https://docs.openvino.ai/2025/index.html) is an open-source toolkit for optimizing and deploying high-performance AI inference, specifically designed for Intel hardware, including CPUs, GPUs, and NPUs, in the cloud, on-premises, and at the edge.

The OpenVINO backend enhances performance by leveraging hardware-specific optimizations and can be enabled for use with llama.cpp.
Follow the instructions below to install the OpenVINO runtime and build llama.cpp with OpenVINO support.
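
As a rough sketch, a typical build might look like the following (assuming the backend follows the `GGML_<BACKEND>=ON` CMake convention used by llama.cpp's other backends; the exact flag name and install paths may differ):

```bash
# Make the OpenVINO runtime visible to CMake
# (path assumes a default install of the OpenVINO runtime)
source /opt/intel/openvino/setupvars.sh

# Configure and build with the OpenVINO backend enabled;
# -DGGML_OPENVINO=ON is an assumed flag name following the
# GGML_<BACKEND> convention (cf. GGML_CUDA, GGML_SYCL)
cmake -B build -DGGML_OPENVINO=ON -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release -j
```
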
Control OpenVINO behavior using these environment variables (a combined usage example follows the list):
- **`GGML_OPENVINO_DEVICE`**: Specify the target device for OpenVINO inference. If not set, the first available device is selected automatically in priority order: GPU, CPU, NPU. Setting it to `NPU` targets Intel NPUs and enables static compilation mode for optimal performance.
- **`GGML_OPENVINO_CACHE_DIR`**: Directory for model caching (recommended: `/tmp/ov_cache`). If set, enables model caching in OpenVINO. Note: not yet supported on NPU devices.
- **`GGML_OPENVINO_WEIGHT_AS_INPUT`**: Pass the weights as input to the OpenVINO model instead of creating Constant nodes for them.
- **`GGML_OPENVINO_PROFILING`**: Enable execution time profiling.
- **`GGML_OPENVINO_DUMP_CGRAPH`**: Save the compute graph to `cgraph.txt`.
- **`GGML_OPENVINO_DUMP_IR`**: Export OpenVINO IR files with timestamps.
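
Putting these together, a minimal usage sketch (the model path and prompt are placeholders, and the binary path assumes the build layout above):

```bash
# Run inference on the GPU with model caching enabled
GGML_OPENVINO_DEVICE=GPU \
GGML_OPENVINO_CACHE_DIR=/tmp/ov_cache \
./build/bin/llama-cli -m /path/to/model.gguf -p "Hello"
```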