
v1.0.5


Released by @WarningRan on 08 Sep 19:41 (commit 625332e)

infer/vllm/process

  • Improved Error and Status Reporting

    • The output now includes status and error_msg fields that describe the run's outcome. status is one of OK (the run succeeded), PART_<err> (the run partially completed before hitting error <err>), or ERR_<err> (the run failed with error <err>). A new function also detects the following predefined error patterns, each identified by a three-letter code:

      • OOR: terminate called after throwing an instance of 'std::out_of_range'
      • MFS: DtException: Must find space in DDR
      • UMG: DtException: Unable to map graph within architecture constraints
      • PVF: DtException: Program verification failed
      • VMS: DtException: Need to find a valid memory space
      • DIR: RuntimeError.*DDR init retried
      • RPC: TimeoutError: RPC call to execute_model timed out.
      • PLT: assert prompt_len <= self.tkv
      • CTL: Please reduce the length of the messages or completion
      • If a run fails without a recognizable error pattern, the status will be ERR_UNKNOWN.
  • Captured parameter information even for failed runs

    • In direct mode, FMWORK ARG is used as a fallback to record values such as batch_size, input_size, and output_size.
    • In server mode, input and batch sizes are parsed from server.cmd and client.cmd.
  • Captured client request completion data and recorded it in the notes field in the format successful_requests:<num> and num-prompts:<num>.

  • Added a --model argument to normalize the model name in the output. The script extracts the original model name from the logs and splits it into a standardized model name and a new model_version field.

    • For example, if the model in the log is ibm-granite/granite-3.3-8b-instruct/main and the --model argument is ibm-granite/granite-3.3-8b-instruct, the output will show model: "ibm-granite/granite-3.3-8b-instruct" and model_version: "main".
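The error detection and status reporting in infer/vllm/process might look roughly like the Python sketch below. It is a minimal sketch, not the script's actual code: the names ERROR_PATTERNS, detect_error, and run_status, the first-match ordering, and the failed/has_results inputs used to choose between PART_ and ERR_ are all hypothetical. The pattern texts come from the list in these notes; because the DIR entry already contains .*, every entry is treated as a regular expression, with literal dots escaped.

```python
import re
from typing import Optional

# Hypothetical table mirroring the error patterns listed in these notes.
# The real script's names, regexes, and matching order may differ.
ERROR_PATTERNS = {
    "OOR": r"terminate called after throwing an instance of 'std::out_of_range'",
    "MFS": r"DtException: Must find space in DDR",
    "UMG": r"DtException: Unable to map graph within architecture constraints",
    "PVF": r"DtException: Program verification failed",
    "VMS": r"DtException: Need to find a valid memory space",
    "DIR": r"RuntimeError.*DDR init retried",
    "RPC": r"TimeoutError: RPC call to execute_model timed out\.",
    "PLT": r"assert prompt_len <= self\.tkv",
    "CTL": r"Please reduce the length of the messages or completion",
}


def detect_error(log_text: str) -> Optional[str]:
    """Return the code of the first listed pattern found in the log, or None."""
    for code, pattern in ERROR_PATTERNS.items():
        if re.search(pattern, log_text):
            return code
    return None


def run_status(log_text: str, failed: bool, has_results: bool) -> str:
    """Build a status string (hypothetical inputs: failed, has_results)."""
    if not failed:
        return "OK"
    code = detect_error(log_text)
    if code is None:
        # Per the notes, an unrecognized failure is reported as ERR_UNKNOWN.
        return "ERR_UNKNOWN"
    # Assumption: a run that produced some results before failing is "partial".
    prefix = "PART" if has_results else "ERR"
    return f"{prefix}_{code}"
```

Checking the patterns in a fixed order means a log containing several known errors reports only the first one in the table; that ordering is an assumption, not documented behavior.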
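The --model normalization can be sketched as a prefix split, assuming the logged model name is the --model value optionally followed by /<version>. normalize_model is a hypothetical helper; returning an empty version when there is no suffix, and leaving names that do not start with the --model value unchanged, are assumptions, since these notes do not say how the script handles those cases.

```python
from typing import Tuple


def normalize_model(logged_name: str, model_arg: str) -> Tuple[str, str]:
    """Split a logged model name into (model, model_version).

    Hypothetical helper: assumes the logged name is the --model value,
    optionally followed by "/<version>".
    """
    model_arg = model_arg.rstrip("/")
    if logged_name == model_arg:
        # No version suffix in the log (assumed to yield an empty version).
        return model_arg, ""
    if logged_name.startswith(model_arg + "/"):
        # Everything after "<model_arg>/" is treated as the version.
        return model_arg, logged_name[len(model_arg) + 1:]
    # Assumption: names that do not match --model are left untouched.
    return logged_name, ""
```

With the example from these notes, normalize_model("ibm-granite/granite-3.3-8b-instruct/main", "ibm-granite/granite-3.3-8b-instruct") returns ("ibm-granite/granite-3.3-8b-instruct", "main").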

infer/vllm/runner

  • In server mode, the runner script now prints the contents of server.log directly to the console and appends them to runner.log after execution. This allows pipeline users, who may not have access to the file system, to easily view the complete server logs.
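The server-mode log handling in infer/vllm/runner amounts to printing one file to the console and then appending it to another. The Python sketch below illustrates that behavior under stated assumptions: forward_server_log is a hypothetical function, the actual runner may well be a shell script, and reading the whole log into memory is a simplification for illustration.

```python
import sys
from pathlib import Path
from typing import TextIO


def forward_server_log(server_log: Path, runner_log: Path, out: TextIO = sys.stdout) -> None:
    """Print server.log to the console, then append it to runner.log (hypothetical helper)."""
    # Replace undecodable bytes so a partially corrupted log still prints.
    text = server_log.read_text(errors="replace")
    out.write(text)
    out.flush()
    # Append mode keeps whatever the runner already wrote to runner.log.
    with runner_log.open("a") as f:
        f.write(text)
```

Appending after execution, rather than redirecting the server's output into runner.log while it runs, keeps the runner's own messages and the server log in separate contiguous sections.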