
Error when setting max_prompt_len #3703

@jpm-canonical

Description


Describe the bug
When passing --max_prompt_len 2048 as a CLI option, OVMS fails to start. It reports that the provided value is not an integer:

/root/.cache/bazel/_bazel_root/bc57d4817a53cab8c785464da57d1983/execroot/ovms/external/llm_engine/src/cpp/src/utils.cpp:66:
Failed to extract MAX_PROMPT_LEN. Type mismatch: expected types: int or int64_t

To Reproduce
Steps to reproduce the behavior:

  1. Steps to prepare models repository:
    cd /model-repo && git clone https://huggingface.co/helenai/Qwen2.5-VL-3B-Instruct-ov-int4-npu
    
  2. OVMS launch command:
    ovms --rest_port 8326 --rest_bind_address 127.0.0.1 --source_model Qwen2.5-VL-3B-Instruct-ov-int4-npu --model_repository_path /model-repo --target_device NPU --task text_generation --max_prompt_len 2048 --log_level DEBUG
    

Logs

$ ovms --rest_port 8326 --rest_bind_address 127.0.0.1 --source_model Qwen2.5-VL-3B-Instruct-ov-int4-npu --model_repository_path /model-repo --target_device NPU --task text_generation --max_prompt_len 2048 --log_level DEBUG
[2025-10-15 13:17:18.989][171741][serving][info][server.cpp:91] OpenVINO Model Server 2025.3.0.6e2e910de
[2025-10-15 13:17:18.989][171741][serving][info][server.cpp:92] OpenVINO backend 2025.3.0.dev20250826
[2025-10-15 13:17:18.989][171741][serving][debug][server.cpp:93] CLI parameters passed to ovms server
[2025-10-15 13:17:18.989][171741][serving][debug][server.cpp:100] model_path: /model-repo/Qwen2.5-VL-3B-Instruct-ov-int4-npu
[2025-10-15 13:17:18.989][171741][serving][debug][server.cpp:101] model_name: Qwen2.5-VL-3B-Instruct-ov-int4-npu
[2025-10-15 13:17:18.989][171741][serving][debug][server.cpp:102] batch_size: 
[2025-10-15 13:17:18.989][171741][serving][debug][server.cpp:103] shape: 
[2025-10-15 13:17:18.989][171741][serving][debug][server.cpp:104] model_version_policy: 
[2025-10-15 13:17:18.989][171741][serving][debug][server.cpp:105] nireq: 0
[2025-10-15 13:17:18.989][171741][serving][debug][server.cpp:106] target_device: NPU
[2025-10-15 13:17:18.989][171741][serving][debug][server.cpp:107] plugin_config: 
[2025-10-15 13:17:18.989][171741][serving][debug][server.cpp:108] stateful: false
[2025-10-15 13:17:18.989][171741][serving][debug][server.cpp:109] metrics_enabled: false
[2025-10-15 13:17:18.989][171741][serving][debug][server.cpp:110] metrics_list: 
[2025-10-15 13:17:18.989][171741][serving][debug][server.cpp:111] idle_sequence_cleanup: true
[2025-10-15 13:17:18.989][171741][serving][debug][server.cpp:112] max_sequence_number: 500
[2025-10-15 13:17:18.989][171741][serving][debug][server.cpp:113] low_latency_transformation: false
[2025-10-15 13:17:18.989][171741][serving][debug][server.cpp:117] gRPC port: 0
[2025-10-15 13:17:18.989][171741][serving][debug][server.cpp:118] REST port: 8326
[2025-10-15 13:17:18.989][171741][serving][debug][server.cpp:119] gRPC bind address: 0.0.0.0
[2025-10-15 13:17:18.989][171741][serving][debug][server.cpp:120] REST bind address: 127.0.0.1
[2025-10-15 13:17:18.989][171741][serving][debug][server.cpp:121] REST workers: 22
[2025-10-15 13:17:18.989][171741][serving][debug][server.cpp:122] gRPC workers: 1
[2025-10-15 13:17:18.989][171741][serving][debug][server.cpp:123] gRPC channel arguments: 
[2025-10-15 13:17:18.989][171741][serving][debug][server.cpp:124] log level: DEBUG
[2025-10-15 13:17:18.989][171741][serving][debug][server.cpp:125] log path: 
[2025-10-15 13:17:18.989][171741][serving][debug][server.cpp:126] file system poll wait milliseconds: 1000
[2025-10-15 13:17:18.989][171741][serving][debug][server.cpp:127] sequence cleaner poll wait minutes: 5
[2025-10-15 13:17:18.989][171741][serving][debug][server.cpp:128] model_repository_path: /model-repo
[2025-10-15 13:17:18.989][171741][serving][debug][optimum_export.cpp:160] Path already exists on local filesystem. Not downloading to path: /model-repo/Qwen2.5-VL-3B-Instruct-ov-int4-npu
Model: Qwen2.5-VL-3B-Instruct-ov-int4-npu downloaded to: /model-repo/Qwen2.5-VL-3B-Instruct-ov-int4-npu
[2025-10-15 13:17:18.989][171741][serving][debug][filesystem.cpp:191] Creating file /model-repo/Qwen2.5-VL-3B-Instruct-ov-int4-npu/graph.pbtxt
Graph: graph.pbtxt created in: /model-repo/Qwen2.5-VL-3B-Instruct-ov-int4-npu
[2025-10-15 13:17:18.989][171741][serving][info][pythoninterpretermodule.cpp:37] PythonInterpreterModule starting
Python version:
3.12.3 (main, Aug 14 2025, 17:47:21) [GCC 13.3.0]
Python sys.path output:
['', '/my-path/lib/python', '/my-path/lib/python3.12/site-packages', '/usr/lib/python312.zip', '/usr/lib/python3.12', '/usr/lib/python3.12/lib-dynload', '/usr/lib/python3/dist-packages']
[2025-10-15 13:17:19.000][171741][serving][debug][python_backend.cpp:46] Creating python backend
[2025-10-15 13:17:19.000][171741][serving][info][pythoninterpretermodule.cpp:50] PythonInterpreterModule started
[2025-10-15 13:17:19.001][171741][modelmanager][debug][mediapipefactory.cpp:52] Registered Calculators: AddHeaderCalculator, AlignmentPointsRectsCalculator, AnnotationOverlayCalculator, AnomalyCalculator, AnomalySerializationCalculator, AssociationNormRectCalculator, BeginLoopDetectionCalculator, BeginLoopFloatCalculator, BeginLoopGpuBufferCalculator, BeginLoopImageCalculator, BeginLoopImageFrameCalculator, BeginLoopIntCalculator, BeginLoopMatrixCalculator, BeginLoopMatrixVectorCalculator, BeginLoopModelApiDetectionCalculator, BeginLoopNormalizedLandmarkListVectorCalculator, BeginLoopNormalizedRectCalculator, BeginLoopRectanglePredictionCalculator, BeginLoopStringCalculator, BeginLoopTensorCalculator, BeginLoopUint64tCalculator, BoxDetectorCalculator, BoxTrackerCalculator, CallbackCalculator, CallbackPacketCalculator, CallbackWithHeaderCalculator, ClassificationCalculator, ClassificationListVectorHasMinSizeCalculator, ClassificationListVectorSizeCalculator, ClassificationSerializationCalculator, ClipDetectionVectorSizeCalculator, ClipNormalizedRectVectorSizeCalculator, ColorConvertCalculator, ConcatenateBoolVectorCalculator, ConcatenateClassificationListCalculator, ConcatenateClassificationListVectorCalculator, ConcatenateDetectionVectorCalculator, ConcatenateFloatVectorCalculator, ConcatenateImageVectorCalculator, ConcatenateInt32VectorCalculator, ConcatenateJointListCalculator, ConcatenateLandmarListVectorCalculator, ConcatenateLandmarkListCalculator, ConcatenateLandmarkListVectorCalculator, ConcatenateLandmarkVectorCalculator, ConcatenateNormalizedLandmarkListCalculator, ConcatenateNormalizedLandmarkListVectorCalculator, ConcatenateRenderDataVectorCalculator, ConcatenateStringVectorCalculator, ConcatenateTensorVectorCalculator, ConcatenateTfLiteTensorVectorCalculator, ConcatenateUInt64VectorCalculator, ConstantSidePacketCalculator, CountingSourceCalculator, CropCalculator, DefaultSidePacketCalculator, DequantizeByteArrayCalculator, DetectionCalculator, 
DetectionClassificationCombinerCalculator, DetectionClassificationResultCalculator, DetectionClassificationSerializationCalculator, DetectionExtractionCalculator, DetectionLabelIdToTextCalculator, DetectionLetterboxRemovalCalculator, DetectionProjectionCalculator, DetectionSegmentationCombinerCalculator, DetectionSegmentationResultCalculator, DetectionSegmentationSerializationCalculator, DetectionSerializationCalculator, DetectionsToRectsCalculator, DetectionsToRenderDataCalculator, EmbeddingsCalculator, EmbeddingsCalculatorOV, EmptyLabelCalculator, EmptyLabelClassificationCalculator, EmptyLabelDetectionCalculator, EmptyLabelRotatedDetectionCalculator, EmptyLabelSegmentationCalculator, EndLoopAffineMatrixCalculator, EndLoopBooleanCalculator, EndLoopClassificationListCalculator, EndLoopDetectionCalculator, EndLoopFloatCalculator, EndLoopGpuBufferCalculator, EndLoopImageCalculator, EndLoopImageFrameCalculator, EndLoopImageSizeCalculator, EndLoopLandmarkListVectorCalculator, EndLoopMatrixCalculator, EndLoopModelApiDetectionClassificationCalculator, EndLoopModelApiDetectionSegmentationCalculator, EndLoopNormalizedLandmarkListVectorCalculator, EndLoopNormalizedRectCalculator, EndLoopPolygonPredictionsCalculator, EndLoopRectanglePredictionsCalculator, EndLoopRenderDataCalculator, EndLoopTensorCalculator, EndLoopTfLiteTensorCalculator, FaceLandmarksToRenderDataCalculator, FeatureDetectorCalculator, FlowLimiterCalculator, FlowPackagerCalculator, FlowToImageCalculator, FromImageCalculator, GateCalculator, GetClassificationListVectorItemCalculator, GetDetectionVectorItemCalculator, GetLandmarkListVectorItemCalculator, GetNormalizedLandmarkListVectorItemCalculator, GetNormalizedRectVectorItemCalculator, GetRectVectorItemCalculator, GraphProfileCalculator, HandDetectionsFromPoseToRectsCalculator, HandLandmarksToRectCalculator, HttpLLMCalculator, HttpSerializationCalculator, ImageCloneCalculator, ImageCroppingCalculator, ImageGenCalculator, ImagePropertiesCalculator, 
ImageToTensorCalculator, ImageTransformationCalculator, ImmediateMuxCalculator, InferenceCalculatorCpu, InstanceSegmentationCalculator, InverseMatrixCalculator, IrisToRenderDataCalculator, KeypointDetectionCalculator, LandmarkLetterboxRemovalCalculator, LandmarkListVectorSizeCalculator, LandmarkProjectionCalculator, LandmarkVisibilityCalculator, LandmarksRefinementCalculator, LandmarksSmoothingCalculator, LandmarksToDetectionCalculator, LandmarksToRenderDataCalculator, MakePairCalculator, MatrixMultiplyCalculator, MatrixSubtractCalculator, MatrixToVectorCalculator, MediaPipeInternalSidePacketToPacketStreamCalculator, MergeCalculator, MergeDetectionsToVectorCalculator, MergeGpuBuffersToVectorCalculator, MergeImagesToVectorCalculator, ModelInferHttpRequestCalculator, ModelInferRequestImageCalculator, MotionAnalysisCalculator, MuxCalculator, NonMaxSuppressionCalculator, NonZeroCalculator, NormalizedLandmarkListVectorHasMinSizeCalculator, NormalizedRectVectorHasMinSizeCalculator, OpenCvEncodedImageToImageFrameCalculator, OpenCvImageEncoderCalculator, OpenCvPutTextCalculator, OpenCvVideoDecoderCalculator, OpenCvVideoEncoderCalculator, OpenVINOConverterCalculator, OpenVINOInferenceAdapterCalculator, OpenVINOInferenceCalculator, OpenVINOModelServerSessionCalculator, OpenVINOTensorsToClassificationCalculator, OpenVINOTensorsToDetectionsCalculator, OverlayCalculator, PacketClonerCalculator, PacketGeneratorWrapperCalculator, PacketInnerJoinCalculator, PacketPresenceCalculator, PacketResamplerCalculator, PacketSequencerCalculator, PacketThinnerCalculator, PassThroughCalculator, PreviousLoopbackCalculator, PyTensorOvTensorConverterCalculator, PythonExecutorCalculator, QuantizeFloatVectorCalculator, RectToRenderDataCalculator, RectToRenderScaleCalculator, RectTransformationCalculator, RefineLandmarksFromHeatmapCalculator, RerankCalculator, RerankCalculatorOV, ResourceProviderCalculator, RoiTrackingCalculator, RotatedDetectionCalculator, RotatedDetectionSerializationCalculator, 
RoundRobinDemuxCalculator, SegmentationCalculator, SegmentationSerializationCalculator, SegmentationSmoothingCalculator, SequenceShiftCalculator, SerializationCalculator, SetLandmarkVisibilityCalculator, SidePacketToStreamCalculator, SplitAffineMatrixVectorCalculator, SplitClassificationListVectorCalculator, SplitDetectionVectorCalculator, SplitFloatVectorCalculator, SplitImageVectorCalculator, SplitJointListCalculator, SplitLandmarkListCalculator, SplitLandmarkVectorCalculator, SplitMatrixVectorCalculator, SplitNormalizedLandmarkListCalculator, SplitNormalizedLandmarkListVectorCalculator, SplitNormalizedRectVectorCalculator, SplitTensorVectorCalculator, SplitTfLiteTensorVectorCalculator, SplitUint64tVectorCalculator, SsdAnchorsCalculator, StreamToSidePacketCalculator, StringToInt32Calculator, StringToInt64Calculator, StringToIntCalculator, StringToUint32Calculator, StringToUint64Calculator, StringToUintCalculator, SwitchDemuxCalculator, SwitchMuxCalculator, TensorsToClassificationCalculator, TensorsToDetectionsCalculator, TensorsToFloatsCalculator, TensorsToLandmarksCalculator, TensorsToSegmentationCalculator, TfLiteConverterCalculator, TfLiteCustomOpResolverCalculator, TfLiteInferenceCalculator, TfLiteModelCalculator, TfLiteTensorsToDetectionsCalculator, TfLiteTensorsToFloatsCalculator, TfLiteTensorsToLandmarksCalculator, ThresholdingCalculator, ToImageCalculator, TrackedDetectionManagerCalculator, Tvl1OpticalFlowCalculator, UpdateFaceLandmarksCalculator, VideoPreStreamCalculator, VisibilityCopyCalculator, VisibilitySmoothingCalculator, WarpAffineCalculator, WarpAffineCalculatorCpu, WorldLandmarkProjectionCalculator

[2025-10-15 13:17:19.001][171741][modelmanager][debug][mediapipefactory.cpp:52] Registered Subgraphs: FaceDetection, FaceDetectionFrontDetectionToRoi, FaceDetectionFrontDetectionsToRoi, FaceDetectionShortRange, FaceDetectionShortRangeByRoiCpu, FaceDetectionShortRangeCpu, FaceLandmarkCpu, FaceLandmarkFrontCpu, FaceLandmarkLandmarksToRoi, FaceLandmarksFromPoseCpu, FaceLandmarksFromPoseToRecropRoi, FaceLandmarksModelLoader, FaceLandmarksToRoi, FaceTracking, HandLandmarkCpu, HandLandmarkModelLoader, HandLandmarksFromPoseCpu, HandLandmarksFromPoseToRecropRoi, HandLandmarksLeftAndRightCpu, HandLandmarksToRoi, HandRecropByRoiCpu, HandTracking, HandVisibilityFromHandLandmarksFromPose, HandWristForPose, HolisticLandmarkCpu, HolisticTrackingToRenderData, InferenceCalculator, IrisLandmarkCpu, IrisLandmarkLandmarksToRoi, IrisLandmarkLeftAndRightCpu, IrisRendererCpu, PoseDetectionCpu, PoseDetectionToRoi, PoseLandmarkByRoiCpu, PoseLandmarkCpu, PoseLandmarkFiltering, PoseLandmarkModelLoader, PoseLandmarksAndSegmentationInverseProjection, PoseLandmarksToRoi, PoseSegmentationFiltering, SwitchContainer, TensorsToFaceLandmarks, TensorsToFaceLandmarksWithAttention, TensorsToPoseLandmarksAndSegmentation

[2025-10-15 13:17:19.001][171741][modelmanager][debug][mediapipefactory.cpp:52] Registered InputStreamHandlers: BarrierInputStreamHandler, DefaultInputStreamHandler, EarlyCloseInputStreamHandler, FixedSizeInputStreamHandler, ImmediateInputStreamHandler, MuxInputStreamHandler, SyncSetInputStreamHandler, TimestampAlignInputStreamHandler

[2025-10-15 13:17:19.001][171741][modelmanager][debug][mediapipefactory.cpp:52] Registered OutputStreamHandlers: InOrderOutputStreamHandler

[2025-10-15 13:17:19.312][171741][modelmanager][info][modelmanager.cpp:156] Available devices for Open VINO: CPU, GPU, NPU
[2025-10-15 13:17:19.313][171741][modelmanager][debug][ov_utils.hpp:56] Logging OpenVINO Core plugin: CPU; plugin configuration
[2025-10-15 13:17:19.313][171741][modelmanager][debug][ov_utils.hpp:91] OpenVINO Core plugin: CPU; plugin configuration: { AVAILABLE_DEVICES: , CPU_DENORMALS_OPTIMIZATION: NO, CPU_SPARSE_WEIGHTS_DECOMPRESSION_RATE: 1, DEVICE_ARCHITECTURE: intel64, DEVICE_ID: , DEVICE_TYPE: integrated, DYNAMIC_QUANTIZATION_GROUP_SIZE: 32, ENABLE_CPU_PINNING: YES, ENABLE_CPU_RESERVATION: NO, ENABLE_HYPER_THREADING: YES, ENABLE_TENSOR_PARALLEL: NO, EXECUTION_DEVICES: CPU, EXECUTION_MODE_HINT: PERFORMANCE, FULL_DEVICE_NAME: Intel(R) Core(TM) Ultra 7 155H, INFERENCE_NUM_THREADS: 0, INFERENCE_PRECISION_HINT: f32, KEY_CACHE_GROUP_SIZE: 0, KEY_CACHE_PRECISION: u8, KV_CACHE_PRECISION: u8, LOG_LEVEL: LOG_NONE, MODEL_DISTRIBUTION_POLICY: , NUM_STREAMS: 1, OPTIMIZATION_CAPABILITIES: FP32 INT8 BIN EXPORT_IMPORT, PERFORMANCE_HINT: LATENCY, PERFORMANCE_HINT_NUM_REQUESTS: 0, PERF_COUNT: NO, RANGE_FOR_ASYNC_INFER_REQUESTS: 1 1 1, RANGE_FOR_STREAMS: 1 22, SCHEDULING_CORE_TYPE: ANY_CORE, VALUE_CACHE_GROUP_SIZE: 0, VALUE_CACHE_PRECISION: u8 }
[2025-10-15 13:17:19.313][171741][modelmanager][debug][ov_utils.hpp:56] Logging OpenVINO Core plugin: GPU; plugin configuration
[2025-10-15 13:17:19.313][171741][modelmanager][debug][ov_utils.hpp:91] OpenVINO Core plugin: GPU; plugin configuration: { ACTIVATIONS_SCALE_FACTOR: -1, AVAILABLE_DEVICES: 0, CACHE_DIR: , CACHE_ENCRYPTION_CALLBACKS: , CACHE_MODE: optimize_speed, COMPILATION_NUM_THREADS: 22, CONFIG_FILE: , DEVICE_ARCHITECTURE: GPU: vendor=0x8086 arch=v12.71.4, DEVICE_GOPS: {f16:9216,f32:4608,i8:18432,u8:18432}, DEVICE_ID: 0, DEVICE_LUID: 409a0000499a0000, DEVICE_PCI_INFO: {domain: 0 bus: 0 device: 0x2 function: 0}, DEVICE_TYPE: integrated, DEVICE_UUID: 8680557d080000000002000000000000, DYNAMIC_QUANTIZATION_GROUP_SIZE: 0, ENABLE_CPU_PINNING: NO, ENABLE_CPU_RESERVATION: NO, EXECUTION_MODE_HINT: PERFORMANCE, FULL_DEVICE_NAME: Intel(R) Arc(TM) Graphics (iGPU), GPU_DEVICE_ID: 0x7d55, GPU_DEVICE_TOTAL_MEM_SIZE: 62509150208, GPU_DISABLE_WINOGRAD_CONVOLUTION: NO, GPU_ENABLE_LOOP_UNROLLING: YES, GPU_ENABLE_LORA_OPERATION: YES, GPU_ENABLE_SDPA_OPTIMIZATION: YES, GPU_EXECUTION_UNITS_COUNT: 128, GPU_HOST_TASK_PRIORITY: MEDIUM, GPU_MEMORY_STATISTICS: {cl_mem:0,unknown:0,usm_device:0,usm_host:0,usm_shared:0}, GPU_QUEUE_PRIORITY: MEDIUM, GPU_QUEUE_THROTTLE: MEDIUM, GPU_UARCH_VERSION: 12.71.4, INFERENCE_PRECISION_HINT: f16, KV_CACHE_PRECISION: dynamic, MAX_BATCH_SIZE: 1, MODEL_PRIORITY: MEDIUM, MODEL_PTR: 0, NUM_STREAMS: 1, OPTIMAL_BATCH_SIZE: 1, OPTIMIZATION_CAPABILITIES: FP32 BIN FP16 INT8 GPU_USM_MEMORY EXPORT_IMPORT, PERFORMANCE_HINT: LATENCY, PERFORMANCE_HINT_NUM_REQUESTS: 0, PERF_COUNT: NO, RANGE_FOR_ASYNC_INFER_REQUESTS: 1 2 1, RANGE_FOR_STREAMS: 1 2, WEIGHTS_PATH:  }
[2025-10-15 13:17:19.313][171741][modelmanager][debug][ov_utils.hpp:56] Logging OpenVINO Core plugin: NPU; plugin configuration
[2025-10-15 13:17:19.313][171741][modelmanager][debug][ov_utils.hpp:91] OpenVINO Core plugin: NPU; plugin configuration: { AVAILABLE_DEVICES: 3720, CACHE_DIR: , COMPILATION_NUM_THREADS: 22, DEVICE_ARCHITECTURE: 3720, DEVICE_GOPS: {bf16:0,f16:4300.8,f32:0,i8:8601.6,u8:8601.6}, DEVICE_ID: , DEVICE_PCI_INFO: {domain: 0 bus: 0 device: 0xb function: 0}, DEVICE_TYPE: integrated, DEVICE_UUID: 80d1d11eb73811eab3de0242ac130004, ENABLE_CPU_PINNING: NO, EXECUTION_DEVICES: NPU, EXECUTION_MODE_HINT: PERFORMANCE, FULL_DEVICE_NAME: Intel(R) AI Boost, INFERENCE_PRECISION_HINT: f16, LOG_LEVEL: LOG_ERROR, MODEL_PRIORITY: MEDIUM, MODEL_PTR: 0, NPU_BYPASS_UMD_CACHING: NO, NPU_COMPILATION_MODE_PARAMS: , NPU_COMPILER_DYNAMIC_QUANTIZATION: NO, NPU_COMPILER_VERSION: 458777, NPU_DEFER_WEIGHTS_LOAD: NO, NPU_DEVICE_ALLOC_MEM_SIZE: 0, NPU_DEVICE_TOTAL_MEM_SIZE: 67001077760, NPU_DRIVER_VERSION: 1759931171, NPU_MAX_TILES: 2, NPU_QDQ_OPTIMIZATION: NO, NPU_QDQ_OPTIMIZATION_AGGRESSIVE: NO, NPU_TILES: -1, NPU_TURBO: NO, NUM_STREAMS: 1, OPTIMAL_NUMBER_OF_INFER_REQUESTS: 1, OPTIMIZATION_CAPABILITIES: FP16 INT8 EXPORT_IMPORT, PERFORMANCE_HINT: LATENCY, PERFORMANCE_HINT_NUM_REQUESTS: 1, PERF_COUNT: NO, RANGE_FOR_ASYNC_INFER_REQUESTS: 1 10 1, RANGE_FOR_STREAMS: 1 4, WEIGHTS_PATH: , WORKLOAD_TYPE: DEFAULT }
[2025-10-15 13:17:19.313][171741][serving][info][capimodule.cpp:40] C-APIModule starting
[2025-10-15 13:17:19.313][171741][serving][info][capimodule.cpp:42] C-APIModule started
[2025-10-15 13:17:19.313][171741][serving][info][grpcservermodule.cpp:102] GRPCServerModule starting
[2025-10-15 13:17:19.313][171741][serving][info][grpcservermodule.cpp:106] GRPCServerModule started
[2025-10-15 13:17:19.313][171741][serving][info][grpcservermodule.cpp:107] Port was not set. GRPC server will not be started.
[2025-10-15 13:17:19.313][171741][serving][info][httpservermodule.cpp:35] HTTPServerModule starting
[2025-10-15 13:17:19.313][171741][serving][info][httpservermodule.cpp:39] Will start 22 REST workers
[2025-10-15 13:17:19.313][171741][serving][debug][drogon_http_server.cpp:40] Starting http thread pool for streaming (22 threads)
[2025-10-15 13:17:19.314][171741][serving][debug][drogon_http_server.cpp:42] Thread pool started
[2025-10-15 13:17:19.314][171741][serving][debug][drogon_http_server.cpp:66] DrogonHttpServer::startAcceptingRequests()
[2025-10-15 13:17:19.314][171741][serving][debug][drogon_http_server.cpp:148] Waiting for drogon to become ready on port 8326...
[2025-10-15 13:17:19.314][171785][serving][debug][drogon_http_server.cpp:102] Starting to listen on port 8326
[2025-10-15 13:17:19.314][171785][serving][debug][drogon_http_server.cpp:103] Thread pool size for unary (22 drogon threads)
[2025-10-15 13:17:19.365][171741][serving][debug][drogon_http_server.cpp:157] Drogon run procedure took: 50.069 ms
[2025-10-15 13:17:19.365][171741][serving][info][drogon_http_server.cpp:158] REST server listening on port 8326 with 22 unary threads and 22 streaming threads
[2025-10-15 13:17:19.365][171741][serving][info][httpservermodule.cpp:61] HTTPServerModule started
[2025-10-15 13:17:19.365][171741][serving][info][httpservermodule.cpp:62] Started REST server at 127.0.0.1:8326
[2025-10-15 13:17:19.365][171741][serving][info][servablemanagermodule.cpp:51] ServableManagerModule starting
[2025-10-15 13:17:19.365][171741][modelmanager][debug][modelmanager.cpp:446] Graph: Qwen2.5-VL-3B-Instruct-ov-int4-npu path: /model-repo/Qwen2.5-VL-3B-Instruct-ov-int4-npu/graph.pbtxt exists
[2025-10-15 13:17:19.365][171741][modelmanager][debug][modelmanager.cpp:807] Loading metric cli settings only once per server start.
[2025-10-15 13:17:19.365][171741][modelmanager][debug][modelmanager.cpp:238] Adding mediapipe graph config for Qwen2.5-VL-3B-Instruct-ov-int4-npu, /model-repo/Qwen2.5-VL-3B-Instruct-ov-int4-npu/graph.pbtxt
[2025-10-15 13:17:19.365][171741][modelmanager][debug][modelmanager.cpp:940] Subconfig path: /model-repo/Qwen2.5-VL-3B-Instruct-ov-int4-npu/subconfig.json provided for graph: Qwen2.5-VL-3B-Instruct-ov-int4-npu does not exist. Loading subconfig models will be skipped.
[2025-10-15 13:17:19.365][171741][modelmanager][debug][modelmanager.cpp:971] Subconfiguration file doesn't have models property.
[2025-10-15 13:17:19.365][171741][modelmanager][debug][modelmanager.cpp:490] Mediapipe graph:Qwen2.5-VL-3B-Instruct-ov-int4-npu was not loaded so far. Triggering load
[2025-10-15 13:17:19.365][171741][modelmanager][debug][mediapipegraphdefinition.cpp:129] Started validation of mediapipe: Qwen2.5-VL-3B-Instruct-ov-int4-npu
[2025-10-15 13:17:19.365][171741][modelmanager][debug][mediapipe_utils.cpp:84] setting input stream: input packet type: UNKNOWN from: HTTP_REQUEST_PAYLOAD:input
[2025-10-15 13:17:19.365][171741][modelmanager][debug][mediapipe_utils.cpp:84] setting output stream: output packet type: UNKNOWN from: HTTP_RESPONSE_PAYLOAD:output
[2025-10-15 13:17:19.366][171741][serving][info][mediapipegraphdefinition.cpp:421] MediapipeGraphDefinition initializing graph nodes
[2025-10-15 13:17:19.366][171741][modelmanager][info][servable_initializer.cpp:491] Initializing Visual Language Model Legacy servable
[2025-10-15 13:17:19.459][171741][serving][error][servable_initializer.cpp:80] Error during llm node initialization for models_path: /model-repo/Qwen2.5-VL-3B-Instruct-ov-int4-npu/./ exception: Exception from /root/.cache/bazel/_bazel_root/bc57d4817a53cab8c785464da57d1983/execroot/ovms/external/llm_engine/src/cpp/src/utils.cpp:66:
Failed to extract MAX_PROMPT_LEN. Type mismatch: expected types: int or int64_t

[2025-10-15 13:17:19.459][171741][modelmanager][error][servable_initializer.cpp:495] Error during LLM node resources initialization: The LLM Node resource initialization failed
[2025-10-15 13:17:19.459][171741][serving][error][mediapipegraphdefinition.cpp:472] Failed to process LLM node graph Qwen2.5-VL-3B-Instruct-ov-int4-npu
[2025-10-15 13:17:19.460][171741][modelmanager][debug][pipelinedefinitionstatus.hpp:50] Mediapipe: Qwen2.5-VL-3B-Instruct-ov-int4-npu state: BEGIN handling: ValidationFailedEvent: 
[2025-10-15 13:17:19.460][171741][modelmanager][info][pipelinedefinitionstatus.hpp:59] Mediapipe: Qwen2.5-VL-3B-Instruct-ov-int4-npu state changed to: LOADING_PRECONDITION_FAILED after handling: ValidationFailedEvent: 
[2025-10-15 13:17:19.460][171741][modelmanager][error][modelmanager.cpp:184] Couldn't start model manager
[2025-10-15 13:17:19.460][171741][serving][error][servablemanagermodule.cpp:58] ovms::ModelManager::Start() Error: The LLM Node resource initialization failed
[2025-10-15 13:17:19.460][171741][serving][info][grpcservermodule.cpp:188] GRPCServerModule shutting down
[2025-10-15 13:17:19.460][171741][serving][info][grpcservermodule.cpp:198] GRPCServerModule shutdown
[2025-10-15 13:17:19.460][171741][serving][info][httpservermodule.cpp:73] HTTPServerModule shutting down
[2025-10-15 13:17:19.465][171785][serving][debug][drogon_http_server.cpp:138] drogon::run() exits normally
[2025-10-15 13:17:19.465][171741][serving][info][httpservermodule.cpp:84] Shutdown HTTP server
[2025-10-15 13:17:19.465][171741][serving][info][servablemanagermodule.cpp:65] ServableManagerModule shutting down
[2025-10-15 13:17:19.465][171741][serving][info][servablemanagermodule.cpp:71] ServableManagerModule shutdown
[2025-10-15 13:17:19.465][171741][serving][info][pythoninterpretermodule.cpp:61] PythonInterpreterModule shutting down
[2025-10-15 13:17:19.465][171741][serving][debug][python_backend.cpp:52] Python backend destructor start
[2025-10-15 13:17:19.465][171741][serving][debug][python_backend.cpp:56] Python backend destructor end
[2025-10-15 13:17:19.465][171741][serving][info][pythoninterpretermodule.cpp:65] PythonInterpreterModule shutdown
[2025-10-15 13:17:19.467][171741][serving][info][capimodule.cpp:50] C-APIModule shutting down
[2025-10-15 13:17:19.467][171741][serving][info][capimodule.cpp:52] C-APIModule shutdown

Configuration

  1. OVMS version: 2025.3.0
  2. OVMS config.json file: none
  3. CPU, accelerator's versions if applicable: n/a
  4. Model repository directory structure: /model-repo/Qwen2.5-VL-3B-Instruct-ov-int4-npu
  5. Model or publicly available similar model that reproduces the issue: n/a

Additional context
None.

Labels: bug