Description
System Information
- Laptop: DELL PRO 14 PLUS PB14250
- Processor: Intel® Core™ Ultra 7 265U, vPro®
- Operating System: Windows 11 Pro
- NPU Driver Version: 32.0.100.4239
- Python Environment: Clean `conda` environment (see steps to reproduce)
- Key Libraries:
  - `openvino==2025.3.0`
  - `openvino-genai==2025.3.0.0`
  - `numpy==1.26.4` (explicitly pinned; I have also tried newer versions of numpy to no avail)
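For anyone trying to confirm they are on the same package set, the installed versions can be read back from the active environment. This is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical, and the package names are the ones listed in this report.

```python
from importlib import metadata

# Package names taken from this report; adjust as needed.
PACKAGES = ["openvino", "openvino-genai", "openvino-tokenizers", "numpy"]


def installed_versions(names):
    """Return {package: version}, with None for packages that are not installed."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}=={version}" if version else f"{name}: not installed")
```

Running this inside the `npu_repro` kernel should show whether the kernel actually picks up the pinned versions rather than packages from another environment.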
Describe the bug
When attempting to initialize openvino_genai.LLMPipeline with device='NPU', the Python kernel immediately crashes.
- The crash is instantaneous and silent (no Python exception is raised, and `try...except` blocks are ineffective).
- In Jupyter, this manifests as a "Kernel for notebook appears to have died" message.
- CPU utilization rises slightly just before the crash, after which CPU, RAM, and NPU utilization immediately fall back to baseline.
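Because the process dies before Python can raise anything, one way to gather more information is to run the failing call in a child interpreter and inspect its exit code and stderr from the parent. The following is a sketch under assumptions, not something I have confirmed produces useful output on this machine: `run_isolated` and `CHILD_SCRIPT` are hypothetical names, and `-X faulthandler` only prints a traceback if the crash is a fault that Python's `faulthandler` can intercept.

```python
import subprocess
import sys


def run_isolated(code, timeout=300):
    """Run `code` in a fresh interpreter and report how that process exited."""
    result = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stdout, result.stderr


# Hypothetical child script mirroring the failing call from this report.
CHILD_SCRIPT = """
import openvino_genai as ov_genai
pipe = ov_genai.LLMPipeline("DeepSeek-R1-Distill-Qwen-1.5B-int4-cw-ov", "NPU")
print("pipeline created")
"""

if __name__ == "__main__":
    code, out, err = run_isolated(CHILD_SCRIPT)
    print("exit code:", code)
    print("stdout:", out)
    print("stderr:", err)
```

A non-zero exit code with no Python traceback in stderr would point to a native crash rather than a Python-level error, and the parent process (or Jupyter kernel) survives to report it.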
Steps to Reproduce
- Create a minimal, clean conda environment to isolate dependencies:

      conda create --name npu_repro python=3.11 -y
      conda activate npu_repro

- Install the required packages with a pinned `numpy` version:

      pip install jupyterlab ipykernel "numpy<2.0" openvino==2025.3.0 openvino-tokenizers==2025.3.0.0 openvino-genai==2025.3.0.0

- Download a pre-converted, NPU-ready model from Hugging Face:
  - Model: `OpenVINO/DeepSeek-R1-Distill-Qwen-1.5B-int4-cw-ov` (officially supported)
  - URL: https://huggingface.co/OpenVINO/DeepSeek-R1-Distill-Qwen-1.5B-int4-cw-ov
  - Save all files into a local folder (e.g., `DeepSeek-R1-Distill-Qwen-1.5B-int4-cw-ov`).
- Run the following Python code in a Jupyter Notebook using the `npu_repro` kernel:

      import os
      from pathlib import Path

      import openvino_genai as ov_genai

      print("Attempting to initialize LLMPipeline for NPU...")

      # Path to the downloaded model
      model_directory = Path("DeepSeek-R1-Distill-Qwen-1.5B-int4-cw-ov")

      try:
          # The following line causes an immediate kernel crash
          pipe = ov_genai.LLMPipeline(
              models_path=str(model_directory),
              device="NPU",  # also crashes for CPU or GPU
          )
          print("This line is never reached.")
      except Exception as e:
          print(f"This exception block is never reached. Error: {e}")
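Since the model files are saved by hand, an incomplete or truncated download is worth ruling out before the pipeline is constructed. The check below is a sketch only: `missing_model_files` is a hypothetical helper, and the file names in `EXPECTED_FILES` are my assumption about what the repository contains, so they should be compared against the file list on the model page.

```python
from pathlib import Path

# Assumed file names; verify against the file list on the Hugging Face model page.
EXPECTED_FILES = (
    "openvino_model.xml",
    "openvino_model.bin",
    "openvino_tokenizer.xml",
    "openvino_detokenizer.xml",
)


def missing_model_files(model_dir, expected=EXPECTED_FILES):
    """Return the expected files that are absent or empty in model_dir."""
    root = Path(model_dir)
    missing = []
    for name in expected:
        path = root / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print(missing_model_files("DeepSeek-R1-Distill-Qwen-1.5B-int4-cw-ov"))
```

An empty list means every assumed file is present and non-empty; it does not prove the files are uncorrupted.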
Expected behavior
The LLMPipeline object should initialize successfully without crashing the kernel, allowing for subsequent inference on the NPU.
Additional context and workarounds attempted
The crash is persistent and appears to be a fundamental incompatibility between the openvino-genai library and the hardware/driver stack on this specific CPU. The following workarounds, based on official documentation and debugging, have been attempted and did not resolve the issue:
- Setting `DISABLE_OPENVINO_GENAI_NPU_L0=1`: The kernel still crashes.
- Using `GENERATE_HINT: "BEST_PERF"`: The kernel still crashes.
- Explicit Tokenizer Loading: Manually creating an `ov_genai.Tokenizer` object and passing it to the `LLMPipeline` constructor still results in a crash.
- Using Larger Models (8B): Crashes as expected, likely due to memory, but the 1.5B model should be well within the system's capabilities.
- Verifying Driver: The installed NPU driver (`32.0.100.4239`) is newer than the version `32.0.100.3104` mentioned as a requirement in the documentation.
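The driver comparison in the last item can be checked mechanically instead of by eye. This is a small sketch, assuming the driver version is a dot-separated sequence of integers; `parse_driver_version` is a hypothetical helper name.

```python
def parse_driver_version(text):
    """Turn a dotted version string such as '32.0.100.4239' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


installed = parse_driver_version("32.0.100.4239")  # version reported on this laptop
required = parse_driver_version("32.0.100.3104")   # version named in the documentation

# Tuples compare element by element, so 4239 vs 3104 decides the result.
print(installed >= required)  # True
```

Comparing tuples of integers avoids the pitfalls of comparing the raw strings, where for example `"32.0.100.999"` would sort after `"32.0.100.4239"`.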
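This issue seems to be specific to the openvino-genai library rather than to the model or the hardware in general: the LLMPipeline constructor crashes on the NPU (and, as noted in the code above, on CPU and GPU as well), whereas a similar setup using the older optimum.intel library can successfully run inference on the CPU and integrated GPU without crashing.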
Thank you for your help in looking into this.