
[GenAI 2025.3] Accuracy drop in chat mode for Llama-3.2-1B-Instruct (regression check for ID 167065, 168564, …) #2791

@e950280

Description


Environment

  • OS: Ubuntu 24.04
  • OpenVINO GenAI: 2025.3
  • CPU: Intel Core Ultra 5 125H

Background

In the 2025.2 release notes, the following known issue was documented:

Component: OpenVINO GenAI
ID: 167065, 168564, 168360, 168339, 168361
Description:
Models such as Qwen-7B-Chat, Phi4-Reasoning, Llama-3.2-1B-Instruct, Qwen3-8B, and DeepSeek-R1-Distill* show reduced accuracy in chat scenarios compared to regular generation requests. Currently no workaround is available; a fix is planned for future releases.


Current Observation

With OpenVINO GenAI 2025.3, I tested Llama-3.2-1B-Instruct in chat scenarios and still observed noticeable accuracy degradation compared to:

  • The same model used in regular (non-chat) generation
  • Running via optimum.intel.openvino (OVModelForCausalLM)

This suggests the issue still persists. A minimal sketch of how the two GenAI paths were compared follows.
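For reference, this is roughly how I compared the two paths (a minimal sketch; the model directory is a placeholder for your own exported Llama-3.2-1B-Instruct OpenVINO IR, and the prompt is just illustrative):

```python
import openvino_genai as ov_genai

# Placeholder path to a locally exported Llama-3.2-1B-Instruct OpenVINO IR.
model_dir = "Llama-3.2-1B-Instruct-ov"
prompt = "What is the capital of France?"

pipe = ov_genai.LLMPipeline(model_dir, "CPU")
config = ov_genai.GenerationConfig()
config.max_new_tokens = 128

# Regular generation request: answers look as expected.
print(pipe.generate(prompt, config))

# Chat scenario: the same prompt produces noticeably worse answers.
pipe.start_chat()
print(pipe.generate(prompt, config))
pipe.finish_chat()
```

The only difference between the two calls is the start_chat()/finish_chat() bracketing, which is what makes the degradation attributable to the chat path rather than to the model export itself.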


Question

  • Has this issue already been fixed in 2025.3?
  • If not yet fixed, is there an ETA for when the fix will be released?
  • Until then, is using optimum.intel.openvino's OVModelForCausalLM the recommended workaround to avoid the accuracy degradation? (A sketch of that path is included after this list.)
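
For completeness, the optimum.intel path that does not show the degradation looks roughly like this (a sketch on my side; the Hugging Face model id and the chat-template usage follow standard transformers conventions and are assumptions, not taken from the release notes):

```python
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "meta-llama/Llama-3.2-1B-Instruct"  # assumed Hugging Face model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
# export=True converts the PyTorch checkpoint to OpenVINO IR on the fly.
model = OVModelForCausalLM.from_pretrained(model_id, export=True)

messages = [{"role": "user", "content": "What is the capital of France?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output_ids = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```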

Labels

category: LLM (LLM pipeline: stateful, static)
