
v1.0.8


@WarningRan WarningRan released this 01 Oct 14:28
c418191

infer/vllm/client

  • Added UTC timestamp logging at benchmark start and end (client_time_start, client_time_end)
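A minimal sketch of what the new client-side timestamp logging could look like. The field names `client_time_start` and `client_time_end` come from the release notes; the helper name `utc_now_iso` and the results-dict shape are hypothetical.

```python
from datetime import datetime, timezone

def utc_now_iso():
    """Return the current UTC time as a timezone-aware ISO-8601 string."""
    return datetime.now(timezone.utc).isoformat()

# Hypothetical usage: record UTC timestamps around a benchmark run.
results = {}
results["client_time_start"] = utc_now_iso()
# ... run the benchmark ...
results["client_time_end"] = utc_now_iso()
```

Using `datetime.now(timezone.utc)` rather than the naive `datetime.utcnow()` keeps the offset (`+00:00`) in the string, so downstream tooling can parse it unambiguously.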

infer/vllm/process

  • Added --enable_prom_metrics flag to collect CPU and memory metrics from Prometheus/Thanos
  • Added new output fields: req_thp, num_prompts, server_id, client_mode, prom_metrics
  • Added support for multi-client benchmark mode (detects client.log.* files)
  • Added extraction of warmup time from server.log in server mode
  • Improved fallback logic: calculates average sizes from actual token counts when they are not explicitly configured
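The multi-client detection described above could be as simple as globbing for numbered log files. The `client.log.*` pattern is from the release notes; the function name `find_client_logs` and the single-file fallback are assumptions for illustration.

```python
import glob
import os

def find_client_logs(result_dir):
    """Detect multi-client benchmark mode from numbered client logs.

    Returns (log_paths, is_multi_client). Falls back to a single
    client.log when no client.log.* files are present.
    """
    multi = sorted(glob.glob(os.path.join(result_dir, "client.log.*")))
    if multi:
        return multi, True  # one log per client => multi-client mode
    single = os.path.join(result_dir, "client.log")
    return ([single] if os.path.exists(single) else []), False
```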

infer/vllm/prometheus_metrics.py (new file)

  • Queries CPU and memory metrics from Prometheus/Thanos
  • Supports avg_over_time and max_over_time query functions
  • Requires environment variables: THANOS_API_TOKEN, THANOS_API_URL
  • Uses 5-minute step resolution for queries
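A sketch of how such queries can be issued. The `THANOS_API_TOKEN`/`THANOS_API_URL` environment variables, the `avg_over_time`/`max_over_time` functions, and the 5-minute step are from the release notes; the helper names and the use of the standard Prometheus HTTP `query_range` endpoint with bearer-token auth are assumptions, not confirmed from the project source.

```python
import json
import os
import urllib.parse
import urllib.request

def wrap_query(metric, fn="avg_over_time", window="5m"):
    """Wrap a PromQL selector in a range function, e.g. avg_over_time(m[5m])."""
    if fn not in ("avg_over_time", "max_over_time"):
        raise ValueError(f"unsupported query function: {fn}")
    return f"{fn}({metric}[{window}])"

def query_thanos(promql, start, end, step="300"):
    """Run a range query against a Prometheus-compatible HTTP API.

    Assumes THANOS_API_URL points at the API root and THANOS_API_TOKEN
    is a bearer token; step="300" matches the 5-minute resolution.
    """
    base = os.environ["THANOS_API_URL"].rstrip("/")
    params = urllib.parse.urlencode(
        {"query": promql, "start": start, "end": end, "step": step}
    )
    req = urllib.request.Request(
        f"{base}/api/v1/query_range?{params}",
        headers={"Authorization": f"Bearer {os.environ['THANOS_API_TOKEN']}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

In practice the new `prometheus-api-client` dependency would handle the HTTP layer; the raw-endpoint version above only illustrates the query shape.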

infer/vllm/runner

  • Changed timestamp generation to UTC

Dependencies

  • Added prometheus-api-client