
Commit b9e805f

Minor updates for raising PR
1 parent 556fa6d commit b9e805f

File tree

3 files changed (+4, -40 lines changed)


CMakePresets.json

Lines changed: 0 additions & 20 deletions
@@ -1,26 +1,6 @@
 {
     "version": 4,
     "configurePresets": [
-        {
-            "name": "ReleaseOV",
-            "generator": "Ninja",
-            "binaryDir": "${sourceDir}/build/${presetName}",
-            "installDir": "${sourceDir}/build/install/${presetName}",
-            "cacheVariables": {
-                "CMAKE_BUILD_TYPE": "Release",
-                "GGML_OPENVINO": true,
-                "OpenVINO_DIR": "$env{OPENVINO_LLAMA_PATH}/build/Release"
-            }
-        },
-        {
-            "name": "ReleaseCPU",
-            "generator": "Ninja",
-            "binaryDir": "${sourceDir}/build/${presetName}",
-            "installDir": "${sourceDir}/build/install/${presetName}",
-            "cacheVariables": {
-                "CMAKE_BUILD_TYPE": "Release"
-            }
-        },
         {
             "name": "base",
             "hidden": true,

docs/build.md

Lines changed: 3 additions & 18 deletions
@@ -595,7 +595,7 @@ To read documentation for how to build on IBM Z & LinuxONE, [click here](./build
 
 ## OpenVINO
 
-[OpenVINO](https://docs.openvino.ai/2025/index.html) is an open-source toolkit for optimizing and deploying high-performance AI inference, specifically designed for Intel hardware, including CPUs, GPUs, and NPUs, in the cloud, on-premises, and on the edge.
+[OpenVINO](https://docs.openvino.ai/2025/index.html) is an open-source toolkit for optimizing and deploying high-performance AI inference, specifically designed for Intel hardware, including CPUs, GPUs, and NPUs, in the cloud, on-premises, and on the edge.
 The OpenVINO backend enhances performance by leveraging hardware-specific optimizations and can be enabled for use with llama.cpp.
 
 Follow the instructions below to install OpenVINO runtime and build llama.cpp with OpenVINO support.
@@ -697,9 +697,8 @@ export GGML_OPENVINO_CACHE_DIR=/tmp/ov_cache
 
 Control OpenVINO behavior using these environment variables:
 
-- **`GGML_OPENVINO_DEVICE`**: Specify the target device for OpenVINO inference. If not set, automatically selects the first available device in priority order: GPU, CPU, NPU. When set to `NPU` to use Intel NPUs, it enables static compilation mode for optimal performance.
-- **`GGML_OPENVINO_CACHE_DIR`**: Directory for model caching (recommended: `/tmp/ov_cache`). If set, enables model caching in OpenVINO. Note: Not supported when using NPU devices yet.
-- **`GGML_OPENVINO_WEIGHT_AS_INPUT`**: Pass the weights as input to the OpenVINO model instead of creating Constant nodes for them.
+- **`GGML_OPENVINO_DEVICE`**: Specify the target device for OpenVINO inference. If not set, automatically selects the first available device in priority order: GPU, CPU, NPU. When set to `NPU` to use Intel NPUs, it enables static compilation mode for optimal performance.
+- **`GGML_OPENVINO_CACHE_DIR`**: Directory for model caching (recommended: `/tmp/ov_cache`). If set, enables model caching in OpenVINO. Note: Not supported when using NPU devices yet.
 - **`GGML_OPENVINO_PROFILING`**: Enable execution time profiling.
 - **`GGML_OPENVINO_DUMP_CGRAPH`**: Save compute graph to `cgraph.txt`.
 - **`GGML_OPENVINO_DUMP_IR`**: Export OpenVINO IR files with timestamps.
@@ -714,20 +713,6 @@ export GGML_OPENVINO_PROFILING=1
 
 ./build/ReleaseOV/bin/llama-simple -m ~/models/Llama-3.2-1B-Instruct.fp16.gguf -n 50 "The story of AI is "
 ```
-> **Note:** To apply your code changes, clear the `GGML_OPENVINO_CACHE_DIR` directory and rebuild the project.
-
-### Using Llama.cpp's Built-in CPU Backend (for Comparison)
-
-To compare performance with the default CPU backend:
-
-```bash
-# Build CPU-only version
-cmake --preset ReleaseCPU
-cmake --build build/ReleaseCPU --parallel
-
-# Run with the default CPU backend
-./build/ReleaseCPU/bin/llama-simple -m ~/models/Llama-3.2-1B-Instruct.fp16.gguf -n 50 "The story of AI is "
-```
 
 ## Notes about GPU-accelerated backends
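The environment variables documented in this file are plain runtime switches read by the backend. A usage sketch, assuming the OpenVINO build sits in `build/ReleaseOV` and the model path used elsewhere in the docs:

```bash
# Select the target device explicitly and enable model caching
# (caching is not yet supported on NPU, per the docs above).
export GGML_OPENVINO_DEVICE=GPU
export GGML_OPENVINO_CACHE_DIR=/tmp/ov_cache

# Optional: per-op execution time profiling
export GGML_OPENVINO_PROFILING=1

./build/ReleaseOV/bin/llama-simple -m ~/models/Llama-3.2-1B-Instruct.fp16.gguf -n 50 "The story of AI is "
```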

ggml/src/ggml-openvino/ggml-decoder.cpp

Lines changed: 1 addition & 2 deletions
@@ -57,8 +57,7 @@ GgmlOvDecoder::GgmlOvDecoder(struct ggml_cgraph* cgraph,
     }
 
     if (getenv("GGML_OPENVINO_DUMP_CGRAPH")) {
-        auto timestamp = (long long) ggml_time_us();
-        std::string filename = "cgraph_" + std::to_string(timestamp) + ".txt";
+        std::string filename = "cgraph.txt";
         dump_cgraph(cgraph, filename);
     }
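With the timestamp dropped, each dump overwrites one fixed file instead of accumulating `cgraph_<timestamp>.txt` files. A quick check, assuming the relative path above resolves to the working directory:

```bash
# Enable the compute-graph dump and run once; the graph lands in cgraph.txt,
# and repeated runs simply overwrite it.
export GGML_OPENVINO_DUMP_CGRAPH=1
./build/ReleaseOV/bin/llama-simple -m ~/models/Llama-3.2-1B-Instruct.fp16.gguf -n 50 "The story of AI is "
less cgraph.txt
```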
