
OpenVINO model inference significantly slower than ONNX model #2883

@rathoreaniket007

Description

Hi, I converted a PyTorch .pt model to both OpenVINO and ONNX formats using the ultralytics package.
Conversion code for OpenVINO:


from ultralytics import YOLO


def load_model(model_path):
    # Load the PyTorch checkpoint with the ultralytics YOLO wrapper
    session = YOLO(model_path)
    return session


model_path = "/home/Desktop/Open-Vino/models/TIP_DET_20032025.pt"
session = load_model(model_path)

# Export to OpenVINO IR with dynamic input shapes and FP16 weights
session.export(format="openvino", dynamic=True, half=True)

Conversion code for ONNX:


from ultralytics import YOLO


def load_model(model_path):
    # Load the PyTorch checkpoint with the ultralytics YOLO wrapper
    session = YOLO(model_path)
    return session


model_path = "/home/Desktop/Open-Vino/models/TIP_DET_20032025.pt"
session = load_model(model_path)

# Export to ONNX with NMS included in the exported graph
session.export(format="onnx", nms=True)
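
For comparison, the exported ONNX model was timed through the same YOLO wrapper; a minimal sketch of that measurement (the .onnx path assumes the default ultralytics export naming next to the .pt file):

from ultralytics import YOLO
import time

# Path is an assumption based on the default ultralytics export naming
onnx_model_path = "/home/Desktop/Open-Vino/models/TIP_DET_20032025.onnx"
onnx_model = YOLO(onnx_model_path, task="detect")

IMAGE_PATH = "/home/Desktop/Open-Vino/SLAP/MN000MAR1_L__Final_Frame_16042024122233.bmp"

start = time.time()
res = onnx_model(IMAGE_PATH)
end = time.time()
print(f"Time taken {(end - start) * 1000}ms")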

Inference code for OpenVINO model:

from pathlib import Path
from PIL import Image
from ultralytics import YOLO
import openvino as ov
import time

core = ov.Core()
device = "CPU"

det_model_path = Path("/home/Desktop/Open-Vino/models/TIP_DET_20032025_openvino_model/TIP_DET_20032025.xml")
det_ov_model = core.read_model(det_model_path)

ov_config = {}

# Fix the input shape when running on a non-CPU device
if device != "CPU":
    det_ov_model.reshape({0: [1, 3, 320, 320]})

if "GPU" in device or (device == "AUTO" and "GPU" in core.available_devices):
    ov_config = {"GPU_DISABLE_WINOGRAD_CONVOLUTION": "YES"}

det_compiled_model = core.compile_model(det_ov_model, device, ov_config)

# Wrap the exported OpenVINO model directory with the ultralytics predictor
det_model = YOLO(det_model_path.parent, task="detect")

if det_model.predictor is None:
    custom = {"conf": 0.25, "batch": 1, "save": False, "mode": "predict"}
    args = {**det_model.overrides, **custom}
    det_model.predictor = det_model._smart_load("predictor")(overrides=args, _callbacks=det_model.callbacks)
    det_model.predictor.setup_model(model=det_model.model)

# Swap in the manually compiled model so inference uses it
det_model.predictor.model.ov_compiled_model = det_compiled_model

IMAGE_PATH = "/home/Desktop/Open-Vino/SLAP/MN000MAR1_L__Final_Frame_16042024122233.bmp"

start = time.time()
res = det_model(IMAGE_PATH)
end = time.time()

print("=====")
print(f"Time taken {(end - start) * 1000}ms")
Image.fromarray(res[0].plot()[:, :, ::-1])

Observations:

Average inference time using the OpenVINO model: ~200ms

Average inference time using the ONNX model: ~40ms
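
The averages were collected over repeated runs after a warm-up call, reusing det_model and IMAGE_PATH from the snippet above; a minimal sketch (the iteration count is illustrative):

# Average over repeated runs; the first call is discarded as warm-up
N_RUNS = 20

_ = det_model(IMAGE_PATH)  # warm-up: includes one-time predictor setup cost

times_ms = []
for _ in range(N_RUNS):
    t0 = time.time()
    _ = det_model(IMAGE_PATH)
    times_ms.append((time.time() - t0) * 1000)

print(f"Average inference time over {N_RUNS} runs: {sum(times_ms) / len(times_ms):.1f}ms")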

I expected the OpenVINO model to perform at least on par with the ONNX model (or even faster), especially on CPU. Is there something I might have missed during conversion or inference setup that could explain this difference?

Thanks for your help!
