v1.0.5
infer/vllm/process
- Improved Error and Status Reporting
  - The output now includes `status` and `error_msg` fields, providing clarity on the run's outcome. `status` takes one of three forms:
    - `OK`: the run succeeded.
    - `PART_<err>`: the run completed partially, with a specific error.
    - `ERR_<err>`: the run failed with a specific error.
  - A new function detects several predefined error patterns:
    - `OOR`: `terminate called after throwing an instance of 'std::out_of_range'`
    - `MFS`: `DtException: Must find space in DDR`
    - `UMG`: `DtException: Unable to map graph within architecture constraints`
    - `PVF`: `DtException: Program verification failed`
    - `VMS`: `DtException: Need to find a valid memory space`
    - `DIR`: `RuntimeError.*DDR init retried`
    - `RPC`: `TimeoutError: RPC call to execute_model timed out.`
    - `PLT`: `assert prompt_len <= self.tkv`
    - `CTL`: `Please reduce the length of the messages or completion`
  - If a run fails without a recognizable error pattern, its status is `ERR_UNKNOWN`.
- Captured parameter information even for failed runs:
  - In `direct` mode, used `FMWORK ARG` as a fallback to record values such as `batch_size`, `input_size`, and `output_size`.
  - In `server` mode, parsed the input and batch sizes from `server.cmd` and `client.cmd`.
- Captured client request completion data, which is included in the `notes` field in the format `successful_requests:<num>` and `num-prompts:<num>`.
- Added a `--model` argument to normalize the model name in the output. The script extracts the original model name from the logs and splits it into a standardized model name and a new `model_version` field.
  - For example, if the model in the log is `ibm-granite/granite-3.3-8b-instruct/main` and the `--model` argument is `ibm-granite/granite-3.3-8b-instruct`, the output shows `model: "ibm-granite/granite-3.3-8b-instruct"` and `model_version: "main"`.
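The status and error-pattern detection described above can be sketched in Python as follows. This is a minimal illustration, not the tool's actual code: the names `ERROR_PATTERNS`, `detect_error`, and `run_status`, the use of completed versus total request counts to choose between `OK`, `PART_<err>`, and `ERR_<err>`, and the assumption that `<err>` is one of the three-letter codes are all hypothetical.

```python
import re
from typing import Optional, Tuple

# Hypothetical table: short error codes mapped to the log patterns listed in
# the release notes. Literal "." characters are escaped so each entry works as
# a regular expression (DIR already uses ".*").
ERROR_PATTERNS = {
    "OOR": r"terminate called after throwing an instance of 'std::out_of_range'",
    "MFS": r"DtException: Must find space in DDR",
    "UMG": r"DtException: Unable to map graph within architecture constraints",
    "PVF": r"DtException: Program verification failed",
    "VMS": r"DtException: Need to find a valid memory space",
    "DIR": r"RuntimeError.*DDR init retried",
    "RPC": r"TimeoutError: RPC call to execute_model timed out\.",
    "PLT": r"assert prompt_len <= self\.tkv",
    "CTL": r"Please reduce the length of the messages or completion",
}


def detect_error(log_text: str) -> Optional[Tuple[str, str]]:
    """Return (code, matched_text) for the first known pattern found, else None."""
    for code, pattern in ERROR_PATTERNS.items():
        match = re.search(pattern, log_text)
        if match:
            return code, match.group(0)
    return None


def run_status(log_text: str, completed: int, total: int) -> Tuple[str, str]:
    """Return hypothetical (status, error_msg) values for one run."""
    if completed >= total:
        return "OK", ""
    found = detect_error(log_text)
    if found is None:
        # Unrecognized failure: fall back to ERR_UNKNOWN.
        return "ERR_UNKNOWN", ""
    code, message = found
    # Assumption: at least one completed request means a partial run.
    prefix = "PART" if completed > 0 else "ERR"
    return f"{prefix}_{code}", message
```

Under these assumptions, `run_status("DtException: Must find space in DDR", 0, 8)` returns `("ERR_MFS", "DtException: Must find space in DDR")`.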
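The `--model` normalization can be sketched as a small Python function. It assumes, without confirmation from the release notes, that the logged model name contains the `--model` value, optionally followed by `/` and a version segment; the function name `normalize_model` and the fallback of returning the logged name unchanged with an empty version are hypothetical.

```python
from typing import Tuple


def normalize_model(log_model: str, model_arg: str) -> Tuple[str, str]:
    """Split a logged model name into (model, model_version).

    Hypothetical helper: assumes the logged name contains the --model value,
    optionally followed by "/<version>".
    """
    name = model_arg.strip("/")
    idx = log_model.find(name)
    if not name or idx == -1:
        # --model value not found: keep the logged name, no version.
        return log_model, ""
    remainder = log_model[idx + len(name):]
    if remainder and not remainder.startswith("/"):
        # Partial match such as "...-8b" inside "...-8b-instruct": reject it.
        return log_model, ""
    return name, remainder.strip("/")
```

With the example from the notes, `normalize_model("ibm-granite/granite-3.3-8b-instruct/main", "ibm-granite/granite-3.3-8b-instruct")` returns `("ibm-granite/granite-3.3-8b-instruct", "main")`. Requiring a `/` boundary after the match keeps a shorter `--model` value from swallowing part of a longer model name.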
infer/vllm/runner
- In `server` mode, the `runner` script now prints the contents of `server.log` directly to the console and appends them to `runner.log` after execution. This lets pipeline users who may not have access to the file system view the complete server logs.
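The `server`-mode log handling in the `runner` script can be sketched in Python as follows. Only the file names `server.log` and `runner.log` come from the notes; the function name `forward_server_log` and the choice to return the copied text are hypothetical, and the real script may be implemented differently (for example, in shell).

```python
import sys
from pathlib import Path


def forward_server_log(server_log: Path, runner_log: Path) -> str:
    """Echo server.log to stdout, then append it to runner.log (hypothetical)."""
    # Tolerate a missing log file and undecodable bytes from the server process.
    text = server_log.read_text(errors="replace") if server_log.exists() else ""
    sys.stdout.write(text)
    sys.stdout.flush()
    # Mode "a" appends, so earlier runner output is preserved.
    with runner_log.open("a") as out:
        out.write(text)
    return text
```

A runner would call `forward_server_log(Path("server.log"), Path("runner.log"))` once the server has exited. Appending instead of overwriting keeps the runner's own output ahead of the server log in `runner.log`.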