
OpenVINO model inference significantly slower than ONNX model #2883

@rathoreaniket007

Description

Hi, I converted a PyTorch .pt model to both OpenVINO and ONNX formats using the ultralytics package.
Conversion code for OpenVINO:


from ultralytics import YOLO


def load_model(model_path):
    # Load the PyTorch checkpoint with the ultralytics YOLO wrapper
    session = YOLO(model_path)
    return session


model_path = "/home/Desktop/Open-Vino/models/TIP_DET_20032025.pt"
session = load_model(model_path)

# Export to OpenVINO IR with dynamic input shapes and FP16 weights
session.export(format="openvino", dynamic=True, half=True)

Conversion code for ONNX:


from ultralytics import YOLO


def load_model(model_path):
    # Load the PyTorch checkpoint with the ultralytics YOLO wrapper
    session = YOLO(model_path)
    return session


model_path = "/home/Desktop/Open-Vino/models/TIP_DET_20032025.pt"
session = load_model(model_path)

# Export to ONNX with NMS included in the exported graph
session.export(format="onnx", nms=True)
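
For comparison, the exported ONNX model was timed through the same YOLO wrapper; a minimal sketch of that measurement (the .onnx path assumes the default ultralytics export naming next to the .pt file):

from ultralytics import YOLO
import time

# Path is an assumption based on the default ultralytics export naming
onnx_model_path = "/home/Desktop/Open-Vino/models/TIP_DET_20032025.onnx"
onnx_model = YOLO(onnx_model_path, task="detect")

IMAGE_PATH = "/home/Desktop/Open-Vino/SLAP/MN000MAR1_L__Final_Frame_16042024122233.bmp"

start = time.time()
res = onnx_model(IMAGE_PATH)
end = time.time()
print(f"Time taken {(end - start) * 1000}ms")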

Inference code for OpenVINO model:

from pathlib import Path
from PIL import Image
from ultralytics import YOLO
import openvino as ov
import time

core = ov.Core()
device = "CPU"

det_model_path = Path("/home/Desktop/Open-Vino/models/TIP_DET_20032025_openvino_model/TIP_DET_20032025.xml")
det_ov_model = core.read_model(det_model_path)

ov_config = {}

# Fix the input shape when running on a non-CPU device
if device != "CPU":
    det_ov_model.reshape({0: [1, 3, 320, 320]})

if "GPU" in device or (device == "AUTO" and "GPU" in core.available_devices):
    ov_config = {"GPU_DISABLE_WINOGRAD_CONVOLUTION": "YES"}

det_compiled_model = core.compile_model(det_ov_model, device, ov_config)

# Wrap the exported OpenVINO model directory with the ultralytics predictor
det_model = YOLO(det_model_path.parent, task="detect")

if det_model.predictor is None:
    custom = {"conf": 0.25, "batch": 1, "save": False, "mode": "predict"}
    args = {**det_model.overrides, **custom}
    det_model.predictor = det_model._smart_load("predictor")(overrides=args, _callbacks=det_model.callbacks)
    det_model.predictor.setup_model(model=det_model.model)

# Swap in the manually compiled model so inference uses it
det_model.predictor.model.ov_compiled_model = det_compiled_model

IMAGE_PATH = "/home/Desktop/Open-Vino/SLAP/MN000MAR1_L__Final_Frame_16042024122233.bmp"

start = time.time()
res = det_model(IMAGE_PATH)
end = time.time()

print("=====")
print(f"Time taken {(end - start) * 1000}ms")
Image.fromarray(res[0].plot()[:, :, ::-1])

Observations:

Average inference time using the OpenVINO model: ~200ms

Average inference time using the ONNX model: ~40ms
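
The averages were collected over repeated runs after a warm-up call, reusing det_model and IMAGE_PATH from the snippet above; a minimal sketch (the iteration count is illustrative):

# Average over repeated runs; the first call is discarded as warm-up
N_RUNS = 20

_ = det_model(IMAGE_PATH)  # warm-up: includes one-time predictor setup cost

times_ms = []
for _ in range(N_RUNS):
    t0 = time.time()
    _ = det_model(IMAGE_PATH)
    times_ms.append((time.time() - t0) * 1000)

print(f"Average inference time over {N_RUNS} runs: {sum(times_ms) / len(times_ms):.1f}ms")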

I expected the OpenVINO model to perform at least on par with the ONNX model (or even faster), especially on CPU. Is there something I might have missed during conversion or inference setup that could explain this difference?

Thanks for your help!
