Environment
- OS: Ubuntu 24.04
- OpenVINO GenAI: 2025.3
- CPU: Intel Core Ultra 5 125H
Background
In the 2025.2 release notes, the following known issue was documented:
Component: OpenVINO GenAI
ID: 167065, 168564, 168360, 168339, 168361
Description:
Models such as Qwen-7B-Chat, Phi4-Reasoning, Llama-3.2-1B-Instruct, Qwen3-8B, and DeepSeek-R1-Distill* show reduced accuracy in chat scenarios compared to regular generation requests. Currently no workaround is available; a fix is planned for future releases.
Current Observation
With OpenVINO GenAI 2025.3, I tested Llama-3.2-1B-Instruct in chat scenarios and still observed noticeable accuracy degradation compared to:
- The same model used in regular (non-chat) generation
- Running the model via optimum.intel.openvino (OVModelForCausalLM)
This suggests the issue may still persist in 2025.3; a minimal repro sketch follows.
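For reference, this is roughly how I compared the two OpenVINO GenAI paths. The model directory, prompt, and generation settings are placeholders, not the exact values from my runs:

```python
import openvino_genai as ov_genai

# Placeholder path to the exported OpenVINO IR of Llama-3.2-1B-Instruct
model_dir = "Llama-3.2-1B-Instruct-ov"

pipe = ov_genai.LLMPipeline(model_dir, "CPU")

prompt = "Explain the difference between a process and a thread."

# Regular generation: a standalone request with no chat state
regular_output = pipe.generate(prompt, max_new_tokens=256)

# Chat scenario: the pipeline manages conversation state between calls
pipe.start_chat()
chat_output = pipe.generate(prompt, max_new_tokens=256)
pipe.finish_chat()

print("regular:", regular_output)
print("chat:   ", chat_output)
```

Even for a single-turn conversation like this, the chat-scenario output is noticeably worse than the regular-generation output.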
Questions
- Has this issue already been fixed in 2025.3?
- If not yet fixed, is there an ETA for when the fix will be released?
- Until then, is using optimum.intel.openvino's OVModelForCausalLM the recommended workaround to avoid the accuracy degradation?
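For context, this is roughly how I ran the optimum.intel path that did not show the degradation. The model ID and generation settings are illustrative, not the exact configuration I used:

```python
from optimum.intel.openvino import OVModelForCausalLM
from transformers import AutoTokenizer

# Illustrative model ID; the same Llama-3.2-1B-Instruct weights as above
model_id = "meta-llama/Llama-3.2-1B-Instruct"

model = OVModelForCausalLM.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Build the chat prompt explicitly via the model's chat template,
# so the conversation formatting is fully under the caller's control
messages = [
    {"role": "user", "content": "Explain the difference between a process and a thread."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```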