Describe the bug
When --max_prompt_len 2048 is passed as a CLI option, OVMS fails to start up. The LLM node initialization complains that the value provided is not an integer:
/root/.cache/bazel/_bazel_root/bc57d4817a53cab8c785464da57d1983/execroot/ovms/external/llm_engine/src/cpp/src/utils.cpp:66:
Failed to extract MAX_PROMPT_LEN. Type mismatch: expected types: int or int64_t
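For illustration only, a minimal C++ sketch of the failure mode the message points at: the MAX_PROMPT_LEN entry apparently reaches the pipeline properties as a string rather than an int/int64_t, so the type check in the GenAI utils rejects it. This is not the actual OVMS/GenAI code; only the property key name is taken from the error text, everything else is an assumption.

#include <openvino/core/any.hpp>
#include <cstdint>
#include <iostream>
#include <string>

int main() {
    ov::AnyMap props;

    // Hypothetical failure mode: the CLI value is forwarded as text.
    props["MAX_PROMPT_LEN"] = std::string("2048");
    const ov::Any& value = props.at("MAX_PROMPT_LEN");
    if (!value.is<int>() && !value.is<int64_t>()) {
        // Same condition as in the reported message: expected types: int or int64_t.
        std::cout << "Failed to extract MAX_PROMPT_LEN. Type mismatch" << std::endl;
    }

    // An integer-typed value would satisfy the check.
    props["MAX_PROMPT_LEN"] = static_cast<int64_t>(2048);
    std::cout << props.at("MAX_PROMPT_LEN").is<int64_t>() << std::endl;  // prints 1
    return 0;
}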
To Reproduce
Steps to reproduce the behavior:
- Steps to prepare the models repository:
  cd /model-repo && git clone https://huggingface.co/helenai/Qwen2.5-VL-3B-Instruct-ov-int4-npu
- OVMS launch command:
  ovms --rest_port 8326 --rest_bind_address 127.0.0.1 --source_model Qwen2.5-VL-3B-Instruct-ov-int4-npu --model_repository_path /model-repo --target_device NPU --task text_generation --max_prompt_len 2048 --log_level DEBUG
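Side note, not part of the reproduction: a hedged sketch of the direction a fix or workaround would presumably take, namely converting the --max_prompt_len text to an integer before it is handed to the GenAI pipeline properties. The helper name and signature below are hypothetical and do not come from the OVMS sources.

#include <openvino/core/any.hpp>
#include <cstdint>
#include <string>

// Hypothetical helper: parse the CLI text and store it with an integer type,
// so the downstream "int or int64_t" extraction succeeds.
ov::AnyMap buildNpuPipelineProperties(const std::string& maxPromptLenCli) {
    ov::AnyMap props;
    props["MAX_PROMPT_LEN"] = static_cast<int64_t>(std::stoll(maxPromptLenCli));
    return props;
}

// Example: buildNpuPipelineProperties("2048") yields a MAX_PROMPT_LEN property of type int64_t.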
Logs
$ ovms --rest_port 8326 --rest_bind_address 127.0.0.1 --source_model Qwen2.5-VL-3B-Instruct-ov-int4-npu --model_repository_path /model-repo --target_device NPU --task text_generation --max_prompt_len 2048 --log_level DEBUG
[2025-10-15 13:17:18.989][171741][serving][info][server.cpp:91] OpenVINO Model Server 2025.3.0.6e2e910de
[2025-10-15 13:17:18.989][171741][serving][info][server.cpp:92] OpenVINO backend 2025.3.0.dev20250826
[2025-10-15 13:17:18.989][171741][serving][debug][server.cpp:93] CLI parameters passed to ovms server
[2025-10-15 13:17:18.989][171741][serving][debug][server.cpp:100] model_path: /model-repo/Qwen2.5-VL-3B-Instruct-ov-int4-npu
[2025-10-15 13:17:18.989][171741][serving][debug][server.cpp:101] model_name: Qwen2.5-VL-3B-Instruct-ov-int4-npu
[2025-10-15 13:17:18.989][171741][serving][debug][server.cpp:102] batch_size:
[2025-10-15 13:17:18.989][171741][serving][debug][server.cpp:103] shape:
[2025-10-15 13:17:18.989][171741][serving][debug][server.cpp:104] model_version_policy:
[2025-10-15 13:17:18.989][171741][serving][debug][server.cpp:105] nireq: 0
[2025-10-15 13:17:18.989][171741][serving][debug][server.cpp:106] target_device: NPU
[2025-10-15 13:17:18.989][171741][serving][debug][server.cpp:107] plugin_config:
[2025-10-15 13:17:18.989][171741][serving][debug][server.cpp:108] stateful: false
[2025-10-15 13:17:18.989][171741][serving][debug][server.cpp:109] metrics_enabled: false
[2025-10-15 13:17:18.989][171741][serving][debug][server.cpp:110] metrics_list:
[2025-10-15 13:17:18.989][171741][serving][debug][server.cpp:111] idle_sequence_cleanup: true
[2025-10-15 13:17:18.989][171741][serving][debug][server.cpp:112] max_sequence_number: 500
[2025-10-15 13:17:18.989][171741][serving][debug][server.cpp:113] low_latency_transformation: false
[2025-10-15 13:17:18.989][171741][serving][debug][server.cpp:117] gRPC port: 0
[2025-10-15 13:17:18.989][171741][serving][debug][server.cpp:118] REST port: 8326
[2025-10-15 13:17:18.989][171741][serving][debug][server.cpp:119] gRPC bind address: 0.0.0.0
[2025-10-15 13:17:18.989][171741][serving][debug][server.cpp:120] REST bind address: 127.0.0.1
[2025-10-15 13:17:18.989][171741][serving][debug][server.cpp:121] REST workers: 22
[2025-10-15 13:17:18.989][171741][serving][debug][server.cpp:122] gRPC workers: 1
[2025-10-15 13:17:18.989][171741][serving][debug][server.cpp:123] gRPC channel arguments:
[2025-10-15 13:17:18.989][171741][serving][debug][server.cpp:124] log level: DEBUG
[2025-10-15 13:17:18.989][171741][serving][debug][server.cpp:125] log path:
[2025-10-15 13:17:18.989][171741][serving][debug][server.cpp:126] file system poll wait milliseconds: 1000
[2025-10-15 13:17:18.989][171741][serving][debug][server.cpp:127] sequence cleaner poll wait minutes: 5
[2025-10-15 13:17:18.989][171741][serving][debug][server.cpp:128] model_repository_path: /model-repo
[2025-10-15 13:17:18.989][171741][serving][debug][optimum_export.cpp:160] Path already exists on local filesystem. Not downloading to path: /model-repo/Qwen2.5-VL-3B-Instruct-ov-int4-npu
Model: Qwen2.5-VL-3B-Instruct-ov-int4-npu downloaded to: /model-repo/Qwen2.5-VL-3B-Instruct-ov-int4-npu
[2025-10-15 13:17:18.989][171741][serving][debug][filesystem.cpp:191] Creating file /model-repo/Qwen2.5-VL-3B-Instruct-ov-int4-npu/graph.pbtxt
Graph: graph.pbtxt created in: /model-repo/Qwen2.5-VL-3B-Instruct-ov-int4-npu
[2025-10-15 13:17:18.989][171741][serving][info][pythoninterpretermodule.cpp:37] PythonInterpreterModule starting
Python version:
3.12.3 (main, Aug 14 2025, 17:47:21) [GCC 13.3.0]
Python sys.path output:
['', '/my-path/lib/python', '/my-path/lib/python3.12/site-packages', '/usr/lib/python312.zip', '/usr/lib/python3.12', '/usr/lib/python3.12/lib-dynload', '/usr/lib/python3/dist-packages']
[2025-10-15 13:17:19.000][171741][serving][debug][python_backend.cpp:46] Creating python backend
[2025-10-15 13:17:19.000][171741][serving][info][pythoninterpretermodule.cpp:50] PythonInterpreterModule started
[2025-10-15 13:17:19.001][171741][modelmanager][debug][mediapipefactory.cpp:52] Registered Calculators: AddHeaderCalculator, AlignmentPointsRectsCalculator, AnnotationOverlayCalculator, AnomalyCalculator, AnomalySerializationCalculator, AssociationNormRectCalculator, BeginLoopDetectionCalculator, BeginLoopFloatCalculator, BeginLoopGpuBufferCalculator, BeginLoopImageCalculator, BeginLoopImageFrameCalculator, BeginLoopIntCalculator, BeginLoopMatrixCalculator, BeginLoopMatrixVectorCalculator, BeginLoopModelApiDetectionCalculator, BeginLoopNormalizedLandmarkListVectorCalculator, BeginLoopNormalizedRectCalculator, BeginLoopRectanglePredictionCalculator, BeginLoopStringCalculator, BeginLoopTensorCalculator, BeginLoopUint64tCalculator, BoxDetectorCalculator, BoxTrackerCalculator, CallbackCalculator, CallbackPacketCalculator, CallbackWithHeaderCalculator, ClassificationCalculator, ClassificationListVectorHasMinSizeCalculator, ClassificationListVectorSizeCalculator, ClassificationSerializationCalculator, ClipDetectionVectorSizeCalculator, ClipNormalizedRectVectorSizeCalculator, ColorConvertCalculator, ConcatenateBoolVectorCalculator, ConcatenateClassificationListCalculator, ConcatenateClassificationListVectorCalculator, ConcatenateDetectionVectorCalculator, ConcatenateFloatVectorCalculator, ConcatenateImageVectorCalculator, ConcatenateInt32VectorCalculator, ConcatenateJointListCalculator, ConcatenateLandmarListVectorCalculator, ConcatenateLandmarkListCalculator, ConcatenateLandmarkListVectorCalculator, ConcatenateLandmarkVectorCalculator, ConcatenateNormalizedLandmarkListCalculator, ConcatenateNormalizedLandmarkListVectorCalculator, ConcatenateRenderDataVectorCalculator, ConcatenateStringVectorCalculator, ConcatenateTensorVectorCalculator, ConcatenateTfLiteTensorVectorCalculator, ConcatenateUInt64VectorCalculator, ConstantSidePacketCalculator, CountingSourceCalculator, CropCalculator, DefaultSidePacketCalculator, DequantizeByteArrayCalculator, DetectionCalculator, DetectionClassificationCombinerCalculator, DetectionClassificationResultCalculator, DetectionClassificationSerializationCalculator, DetectionExtractionCalculator, DetectionLabelIdToTextCalculator, DetectionLetterboxRemovalCalculator, DetectionProjectionCalculator, DetectionSegmentationCombinerCalculator, DetectionSegmentationResultCalculator, DetectionSegmentationSerializationCalculator, DetectionSerializationCalculator, DetectionsToRectsCalculator, DetectionsToRenderDataCalculator, EmbeddingsCalculator, EmbeddingsCalculatorOV, EmptyLabelCalculator, EmptyLabelClassificationCalculator, EmptyLabelDetectionCalculator, EmptyLabelRotatedDetectionCalculator, EmptyLabelSegmentationCalculator, EndLoopAffineMatrixCalculator, EndLoopBooleanCalculator, EndLoopClassificationListCalculator, EndLoopDetectionCalculator, EndLoopFloatCalculator, EndLoopGpuBufferCalculator, EndLoopImageCalculator, EndLoopImageFrameCalculator, EndLoopImageSizeCalculator, EndLoopLandmarkListVectorCalculator, EndLoopMatrixCalculator, EndLoopModelApiDetectionClassificationCalculator, EndLoopModelApiDetectionSegmentationCalculator, EndLoopNormalizedLandmarkListVectorCalculator, EndLoopNormalizedRectCalculator, EndLoopPolygonPredictionsCalculator, EndLoopRectanglePredictionsCalculator, EndLoopRenderDataCalculator, EndLoopTensorCalculator, EndLoopTfLiteTensorCalculator, FaceLandmarksToRenderDataCalculator, FeatureDetectorCalculator, FlowLimiterCalculator, FlowPackagerCalculator, FlowToImageCalculator, FromImageCalculator, GateCalculator, GetClassificationListVectorItemCalculator, 
GetDetectionVectorItemCalculator, GetLandmarkListVectorItemCalculator, GetNormalizedLandmarkListVectorItemCalculator, GetNormalizedRectVectorItemCalculator, GetRectVectorItemCalculator, GraphProfileCalculator, HandDetectionsFromPoseToRectsCalculator, HandLandmarksToRectCalculator, HttpLLMCalculator, HttpSerializationCalculator, ImageCloneCalculator, ImageCroppingCalculator, ImageGenCalculator, ImagePropertiesCalculator, ImageToTensorCalculator, ImageTransformationCalculator, ImmediateMuxCalculator, InferenceCalculatorCpu, InstanceSegmentationCalculator, InverseMatrixCalculator, IrisToRenderDataCalculator, KeypointDetectionCalculator, LandmarkLetterboxRemovalCalculator, LandmarkListVectorSizeCalculator, LandmarkProjectionCalculator, LandmarkVisibilityCalculator, LandmarksRefinementCalculator, LandmarksSmoothingCalculator, LandmarksToDetectionCalculator, LandmarksToRenderDataCalculator, MakePairCalculator, MatrixMultiplyCalculator, MatrixSubtractCalculator, MatrixToVectorCalculator, MediaPipeInternalSidePacketToPacketStreamCalculator, MergeCalculator, MergeDetectionsToVectorCalculator, MergeGpuBuffersToVectorCalculator, MergeImagesToVectorCalculator, ModelInferHttpRequestCalculator, ModelInferRequestImageCalculator, MotionAnalysisCalculator, MuxCalculator, NonMaxSuppressionCalculator, NonZeroCalculator, NormalizedLandmarkListVectorHasMinSizeCalculator, NormalizedRectVectorHasMinSizeCalculator, OpenCvEncodedImageToImageFrameCalculator, OpenCvImageEncoderCalculator, OpenCvPutTextCalculator, OpenCvVideoDecoderCalculator, OpenCvVideoEncoderCalculator, OpenVINOConverterCalculator, OpenVINOInferenceAdapterCalculator, OpenVINOInferenceCalculator, OpenVINOModelServerSessionCalculator, OpenVINOTensorsToClassificationCalculator, OpenVINOTensorsToDetectionsCalculator, OverlayCalculator, PacketClonerCalculator, PacketGeneratorWrapperCalculator, PacketInnerJoinCalculator, PacketPresenceCalculator, PacketResamplerCalculator, PacketSequencerCalculator, PacketThinnerCalculator, PassThroughCalculator, PreviousLoopbackCalculator, PyTensorOvTensorConverterCalculator, PythonExecutorCalculator, QuantizeFloatVectorCalculator, RectToRenderDataCalculator, RectToRenderScaleCalculator, RectTransformationCalculator, RefineLandmarksFromHeatmapCalculator, RerankCalculator, RerankCalculatorOV, ResourceProviderCalculator, RoiTrackingCalculator, RotatedDetectionCalculator, RotatedDetectionSerializationCalculator, RoundRobinDemuxCalculator, SegmentationCalculator, SegmentationSerializationCalculator, SegmentationSmoothingCalculator, SequenceShiftCalculator, SerializationCalculator, SetLandmarkVisibilityCalculator, SidePacketToStreamCalculator, SplitAffineMatrixVectorCalculator, SplitClassificationListVectorCalculator, SplitDetectionVectorCalculator, SplitFloatVectorCalculator, SplitImageVectorCalculator, SplitJointListCalculator, SplitLandmarkListCalculator, SplitLandmarkVectorCalculator, SplitMatrixVectorCalculator, SplitNormalizedLandmarkListCalculator, SplitNormalizedLandmarkListVectorCalculator, SplitNormalizedRectVectorCalculator, SplitTensorVectorCalculator, SplitTfLiteTensorVectorCalculator, SplitUint64tVectorCalculator, SsdAnchorsCalculator, StreamToSidePacketCalculator, StringToInt32Calculator, StringToInt64Calculator, StringToIntCalculator, StringToUint32Calculator, StringToUint64Calculator, StringToUintCalculator, SwitchDemuxCalculator, SwitchMuxCalculator, TensorsToClassificationCalculator, TensorsToDetectionsCalculator, TensorsToFloatsCalculator, TensorsToLandmarksCalculator, TensorsToSegmentationCalculator, 
TfLiteConverterCalculator, TfLiteCustomOpResolverCalculator, TfLiteInferenceCalculator, TfLiteModelCalculator, TfLiteTensorsToDetectionsCalculator, TfLiteTensorsToFloatsCalculator, TfLiteTensorsToLandmarksCalculator, ThresholdingCalculator, ToImageCalculator, TrackedDetectionManagerCalculator, Tvl1OpticalFlowCalculator, UpdateFaceLandmarksCalculator, VideoPreStreamCalculator, VisibilityCopyCalculator, VisibilitySmoothingCalculator, WarpAffineCalculator, WarpAffineCalculatorCpu, WorldLandmarkProjectionCalculator
[2025-10-15 13:17:19.001][171741][modelmanager][debug][mediapipefactory.cpp:52] Registered Subgraphs: FaceDetection, FaceDetectionFrontDetectionToRoi, FaceDetectionFrontDetectionsToRoi, FaceDetectionShortRange, FaceDetectionShortRangeByRoiCpu, FaceDetectionShortRangeCpu, FaceLandmarkCpu, FaceLandmarkFrontCpu, FaceLandmarkLandmarksToRoi, FaceLandmarksFromPoseCpu, FaceLandmarksFromPoseToRecropRoi, FaceLandmarksModelLoader, FaceLandmarksToRoi, FaceTracking, HandLandmarkCpu, HandLandmarkModelLoader, HandLandmarksFromPoseCpu, HandLandmarksFromPoseToRecropRoi, HandLandmarksLeftAndRightCpu, HandLandmarksToRoi, HandRecropByRoiCpu, HandTracking, HandVisibilityFromHandLandmarksFromPose, HandWristForPose, HolisticLandmarkCpu, HolisticTrackingToRenderData, InferenceCalculator, IrisLandmarkCpu, IrisLandmarkLandmarksToRoi, IrisLandmarkLeftAndRightCpu, IrisRendererCpu, PoseDetectionCpu, PoseDetectionToRoi, PoseLandmarkByRoiCpu, PoseLandmarkCpu, PoseLandmarkFiltering, PoseLandmarkModelLoader, PoseLandmarksAndSegmentationInverseProjection, PoseLandmarksToRoi, PoseSegmentationFiltering, SwitchContainer, TensorsToFaceLandmarks, TensorsToFaceLandmarksWithAttention, TensorsToPoseLandmarksAndSegmentation
[2025-10-15 13:17:19.001][171741][modelmanager][debug][mediapipefactory.cpp:52] Registered InputStreamHandlers: BarrierInputStreamHandler, DefaultInputStreamHandler, EarlyCloseInputStreamHandler, FixedSizeInputStreamHandler, ImmediateInputStreamHandler, MuxInputStreamHandler, SyncSetInputStreamHandler, TimestampAlignInputStreamHandler
[2025-10-15 13:17:19.001][171741][modelmanager][debug][mediapipefactory.cpp:52] Registered OutputStreamHandlers: InOrderOutputStreamHandler
[2025-10-15 13:17:19.312][171741][modelmanager][info][modelmanager.cpp:156] Available devices for Open VINO: CPU, GPU, NPU
[2025-10-15 13:17:19.313][171741][modelmanager][debug][ov_utils.hpp:56] Logging OpenVINO Core plugin: CPU; plugin configuration
[2025-10-15 13:17:19.313][171741][modelmanager][debug][ov_utils.hpp:91] OpenVINO Core plugin: CPU; plugin configuration: { AVAILABLE_DEVICES: , CPU_DENORMALS_OPTIMIZATION: NO, CPU_SPARSE_WEIGHTS_DECOMPRESSION_RATE: 1, DEVICE_ARCHITECTURE: intel64, DEVICE_ID: , DEVICE_TYPE: integrated, DYNAMIC_QUANTIZATION_GROUP_SIZE: 32, ENABLE_CPU_PINNING: YES, ENABLE_CPU_RESERVATION: NO, ENABLE_HYPER_THREADING: YES, ENABLE_TENSOR_PARALLEL: NO, EXECUTION_DEVICES: CPU, EXECUTION_MODE_HINT: PERFORMANCE, FULL_DEVICE_NAME: Intel(R) Core(TM) Ultra 7 155H, INFERENCE_NUM_THREADS: 0, INFERENCE_PRECISION_HINT: f32, KEY_CACHE_GROUP_SIZE: 0, KEY_CACHE_PRECISION: u8, KV_CACHE_PRECISION: u8, LOG_LEVEL: LOG_NONE, MODEL_DISTRIBUTION_POLICY: , NUM_STREAMS: 1, OPTIMIZATION_CAPABILITIES: FP32 INT8 BIN EXPORT_IMPORT, PERFORMANCE_HINT: LATENCY, PERFORMANCE_HINT_NUM_REQUESTS: 0, PERF_COUNT: NO, RANGE_FOR_ASYNC_INFER_REQUESTS: 1 1 1, RANGE_FOR_STREAMS: 1 22, SCHEDULING_CORE_TYPE: ANY_CORE, VALUE_CACHE_GROUP_SIZE: 0, VALUE_CACHE_PRECISION: u8 }
[2025-10-15 13:17:19.313][171741][modelmanager][debug][ov_utils.hpp:56] Logging OpenVINO Core plugin: GPU; plugin configuration
[2025-10-15 13:17:19.313][171741][modelmanager][debug][ov_utils.hpp:91] OpenVINO Core plugin: GPU; plugin configuration: { ACTIVATIONS_SCALE_FACTOR: -1, AVAILABLE_DEVICES: 0, CACHE_DIR: , CACHE_ENCRYPTION_CALLBACKS: , CACHE_MODE: optimize_speed, COMPILATION_NUM_THREADS: 22, CONFIG_FILE: , DEVICE_ARCHITECTURE: GPU: vendor=0x8086 arch=v12.71.4, DEVICE_GOPS: {f16:9216,f32:4608,i8:18432,u8:18432}, DEVICE_ID: 0, DEVICE_LUID: 409a0000499a0000, DEVICE_PCI_INFO: {domain: 0 bus: 0 device: 0x2 function: 0}, DEVICE_TYPE: integrated, DEVICE_UUID: 8680557d080000000002000000000000, DYNAMIC_QUANTIZATION_GROUP_SIZE: 0, ENABLE_CPU_PINNING: NO, ENABLE_CPU_RESERVATION: NO, EXECUTION_MODE_HINT: PERFORMANCE, FULL_DEVICE_NAME: Intel(R) Arc(TM) Graphics (iGPU), GPU_DEVICE_ID: 0x7d55, GPU_DEVICE_TOTAL_MEM_SIZE: 62509150208, GPU_DISABLE_WINOGRAD_CONVOLUTION: NO, GPU_ENABLE_LOOP_UNROLLING: YES, GPU_ENABLE_LORA_OPERATION: YES, GPU_ENABLE_SDPA_OPTIMIZATION: YES, GPU_EXECUTION_UNITS_COUNT: 128, GPU_HOST_TASK_PRIORITY: MEDIUM, GPU_MEMORY_STATISTICS: {cl_mem:0,unknown:0,usm_device:0,usm_host:0,usm_shared:0}, GPU_QUEUE_PRIORITY: MEDIUM, GPU_QUEUE_THROTTLE: MEDIUM, GPU_UARCH_VERSION: 12.71.4, INFERENCE_PRECISION_HINT: f16, KV_CACHE_PRECISION: dynamic, MAX_BATCH_SIZE: 1, MODEL_PRIORITY: MEDIUM, MODEL_PTR: 0, NUM_STREAMS: 1, OPTIMAL_BATCH_SIZE: 1, OPTIMIZATION_CAPABILITIES: FP32 BIN FP16 INT8 GPU_USM_MEMORY EXPORT_IMPORT, PERFORMANCE_HINT: LATENCY, PERFORMANCE_HINT_NUM_REQUESTS: 0, PERF_COUNT: NO, RANGE_FOR_ASYNC_INFER_REQUESTS: 1 2 1, RANGE_FOR_STREAMS: 1 2, WEIGHTS_PATH: }
[2025-10-15 13:17:19.313][171741][modelmanager][debug][ov_utils.hpp:56] Logging OpenVINO Core plugin: NPU; plugin configuration
[2025-10-15 13:17:19.313][171741][modelmanager][debug][ov_utils.hpp:91] OpenVINO Core plugin: NPU; plugin configuration: { AVAILABLE_DEVICES: 3720, CACHE_DIR: , COMPILATION_NUM_THREADS: 22, DEVICE_ARCHITECTURE: 3720, DEVICE_GOPS: {bf16:0,f16:4300.8,f32:0,i8:8601.6,u8:8601.6}, DEVICE_ID: , DEVICE_PCI_INFO: {domain: 0 bus: 0 device: 0xb function: 0}, DEVICE_TYPE: integrated, DEVICE_UUID: 80d1d11eb73811eab3de0242ac130004, ENABLE_CPU_PINNING: NO, EXECUTION_DEVICES: NPU, EXECUTION_MODE_HINT: PERFORMANCE, FULL_DEVICE_NAME: Intel(R) AI Boost, INFERENCE_PRECISION_HINT: f16, LOG_LEVEL: LOG_ERROR, MODEL_PRIORITY: MEDIUM, MODEL_PTR: 0, NPU_BYPASS_UMD_CACHING: NO, NPU_COMPILATION_MODE_PARAMS: , NPU_COMPILER_DYNAMIC_QUANTIZATION: NO, NPU_COMPILER_VERSION: 458777, NPU_DEFER_WEIGHTS_LOAD: NO, NPU_DEVICE_ALLOC_MEM_SIZE: 0, NPU_DEVICE_TOTAL_MEM_SIZE: 67001077760, NPU_DRIVER_VERSION: 1759931171, NPU_MAX_TILES: 2, NPU_QDQ_OPTIMIZATION: NO, NPU_QDQ_OPTIMIZATION_AGGRESSIVE: NO, NPU_TILES: -1, NPU_TURBO: NO, NUM_STREAMS: 1, OPTIMAL_NUMBER_OF_INFER_REQUESTS: 1, OPTIMIZATION_CAPABILITIES: FP16 INT8 EXPORT_IMPORT, PERFORMANCE_HINT: LATENCY, PERFORMANCE_HINT_NUM_REQUESTS: 1, PERF_COUNT: NO, RANGE_FOR_ASYNC_INFER_REQUESTS: 1 10 1, RANGE_FOR_STREAMS: 1 4, WEIGHTS_PATH: , WORKLOAD_TYPE: DEFAULT }
[2025-10-15 13:17:19.313][171741][serving][info][capimodule.cpp:40] C-APIModule starting
[2025-10-15 13:17:19.313][171741][serving][info][capimodule.cpp:42] C-APIModule started
[2025-10-15 13:17:19.313][171741][serving][info][grpcservermodule.cpp:102] GRPCServerModule starting
[2025-10-15 13:17:19.313][171741][serving][info][grpcservermodule.cpp:106] GRPCServerModule started
[2025-10-15 13:17:19.313][171741][serving][info][grpcservermodule.cpp:107] Port was not set. GRPC server will not be started.
[2025-10-15 13:17:19.313][171741][serving][info][httpservermodule.cpp:35] HTTPServerModule starting
[2025-10-15 13:17:19.313][171741][serving][info][httpservermodule.cpp:39] Will start 22 REST workers
[2025-10-15 13:17:19.313][171741][serving][debug][drogon_http_server.cpp:40] Starting http thread pool for streaming (22 threads)
[2025-10-15 13:17:19.314][171741][serving][debug][drogon_http_server.cpp:42] Thread pool started
[2025-10-15 13:17:19.314][171741][serving][debug][drogon_http_server.cpp:66] DrogonHttpServer::startAcceptingRequests()
[2025-10-15 13:17:19.314][171741][serving][debug][drogon_http_server.cpp:148] Waiting for drogon to become ready on port 8326...
[2025-10-15 13:17:19.314][171785][serving][debug][drogon_http_server.cpp:102] Starting to listen on port 8326
[2025-10-15 13:17:19.314][171785][serving][debug][drogon_http_server.cpp:103] Thread pool size for unary (22 drogon threads)
[2025-10-15 13:17:19.365][171741][serving][debug][drogon_http_server.cpp:157] Drogon run procedure took: 50.069 ms
[2025-10-15 13:17:19.365][171741][serving][info][drogon_http_server.cpp:158] REST server listening on port 8326 with 22 unary threads and 22 streaming threads
[2025-10-15 13:17:19.365][171741][serving][info][httpservermodule.cpp:61] HTTPServerModule started
[2025-10-15 13:17:19.365][171741][serving][info][httpservermodule.cpp:62] Started REST server at 127.0.0.1:8326
[2025-10-15 13:17:19.365][171741][serving][info][servablemanagermodule.cpp:51] ServableManagerModule starting
[2025-10-15 13:17:19.365][171741][modelmanager][debug][modelmanager.cpp:446] Graph: Qwen2.5-VL-3B-Instruct-ov-int4-npu path: /model-repo/Qwen2.5-VL-3B-Instruct-ov-int4-npu/graph.pbtxt exists
[2025-10-15 13:17:19.365][171741][modelmanager][debug][modelmanager.cpp:807] Loading metric cli settings only once per server start.
[2025-10-15 13:17:19.365][171741][modelmanager][debug][modelmanager.cpp:238] Adding mediapipe graph config for Qwen2.5-VL-3B-Instruct-ov-int4-npu, /model-repo/Qwen2.5-VL-3B-Instruct-ov-int4-npu/graph.pbtxt
[2025-10-15 13:17:19.365][171741][modelmanager][debug][modelmanager.cpp:940] Subconfig path: /model-repo/Qwen2.5-VL-3B-Instruct-ov-int4-npu/subconfig.json provided for graph: Qwen2.5-VL-3B-Instruct-ov-int4-npu does not exist. Loading subconfig models will be skipped.
[2025-10-15 13:17:19.365][171741][modelmanager][debug][modelmanager.cpp:971] Subconfiguration file doesn't have models property.
[2025-10-15 13:17:19.365][171741][modelmanager][debug][modelmanager.cpp:490] Mediapipe graph:Qwen2.5-VL-3B-Instruct-ov-int4-npu was not loaded so far. Triggering load
[2025-10-15 13:17:19.365][171741][modelmanager][debug][mediapipegraphdefinition.cpp:129] Started validation of mediapipe: Qwen2.5-VL-3B-Instruct-ov-int4-npu
[2025-10-15 13:17:19.365][171741][modelmanager][debug][mediapipe_utils.cpp:84] setting input stream: input packet type: UNKNOWN from: HTTP_REQUEST_PAYLOAD:input
[2025-10-15 13:17:19.365][171741][modelmanager][debug][mediapipe_utils.cpp:84] setting output stream: output packet type: UNKNOWN from: HTTP_RESPONSE_PAYLOAD:output
[2025-10-15 13:17:19.366][171741][serving][info][mediapipegraphdefinition.cpp:421] MediapipeGraphDefinition initializing graph nodes
[2025-10-15 13:17:19.366][171741][modelmanager][info][servable_initializer.cpp:491] Initializing Visual Language Model Legacy servable
[2025-10-15 13:17:19.459][171741][serving][error][servable_initializer.cpp:80] Error during llm node initialization for models_path: /model-repo/Qwen2.5-VL-3B-Instruct-ov-int4-npu/./ exception: Exception from /root/.cache/bazel/_bazel_root/bc57d4817a53cab8c785464da57d1983/execroot/ovms/external/llm_engine/src/cpp/src/utils.cpp:66:
Failed to extract MAX_PROMPT_LEN. Type mismatch: expected types: int or int64_t
[2025-10-15 13:17:19.459][171741][modelmanager][error][servable_initializer.cpp:495] Error during LLM node resources initialization: The LLM Node resource initialization failed
[2025-10-15 13:17:19.459][171741][serving][error][mediapipegraphdefinition.cpp:472] Failed to process LLM node graph Qwen2.5-VL-3B-Instruct-ov-int4-npu
[2025-10-15 13:17:19.460][171741][modelmanager][debug][pipelinedefinitionstatus.hpp:50] Mediapipe: Qwen2.5-VL-3B-Instruct-ov-int4-npu state: BEGIN handling: ValidationFailedEvent:
[2025-10-15 13:17:19.460][171741][modelmanager][info][pipelinedefinitionstatus.hpp:59] Mediapipe: Qwen2.5-VL-3B-Instruct-ov-int4-npu state changed to: LOADING_PRECONDITION_FAILED after handling: ValidationFailedEvent:
[2025-10-15 13:17:19.460][171741][modelmanager][error][modelmanager.cpp:184] Couldn't start model manager
[2025-10-15 13:17:19.460][171741][serving][error][servablemanagermodule.cpp:58] ovms::ModelManager::Start() Error: The LLM Node resource initialization failed
[2025-10-15 13:17:19.460][171741][serving][info][grpcservermodule.cpp:188] GRPCServerModule shutting down
[2025-10-15 13:17:19.460][171741][serving][info][grpcservermodule.cpp:198] GRPCServerModule shutdown
[2025-10-15 13:17:19.460][171741][serving][info][httpservermodule.cpp:73] HTTPServerModule shutting down
[2025-10-15 13:17:19.465][171785][serving][debug][drogon_http_server.cpp:138] drogon::run() exits normally
[2025-10-15 13:17:19.465][171741][serving][info][httpservermodule.cpp:84] Shutdown HTTP server
[2025-10-15 13:17:19.465][171741][serving][info][servablemanagermodule.cpp:65] ServableManagerModule shutting down
[2025-10-15 13:17:19.465][171741][serving][info][servablemanagermodule.cpp:71] ServableManagerModule shutdown
[2025-10-15 13:17:19.465][171741][serving][info][pythoninterpretermodule.cpp:61] PythonInterpreterModule shutting down
[2025-10-15 13:17:19.465][171741][serving][debug][python_backend.cpp:52] Python backend destructor start
[2025-10-15 13:17:19.465][171741][serving][debug][python_backend.cpp:56] Python backend destructor end
[2025-10-15 13:17:19.465][171741][serving][info][pythoninterpretermodule.cpp:65] PythonInterpreterModule shutdown
[2025-10-15 13:17:19.467][171741][serving][info][capimodule.cpp:50] C-APIModule shutting down
[2025-10-15 13:17:19.467][171741][serving][info][capimodule.cpp:52] C-APIModule shutdown
Configuration
- OVMS version: 2025.3.0
- OVMS config.json file: none
- CPU, accelerator's versions if applicable: n/a
- Model repository directory structure:
  /model-repo/Qwen2.5-VL-3B-Instruct-ov-int4-npu
- Model or publicly available similar model that reproduces the issue: n/a