v1.0.8
infer/vllm/client
- Added UTC timestamp logging at benchmark start and end (`client_time_start`, `client_time_end`); see the sketch below
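A minimal sketch of what this logging might look like, assuming Python's standard `logging` and `datetime` modules; the `timed_benchmark` wrapper and logger name are hypothetical, while the two field names come from the entry above:

```python
import logging
from datetime import datetime, timezone

logger = logging.getLogger("client")  # hypothetical logger name

def timed_benchmark(run_fn):
    # Record wall-clock boundaries in UTC so client and server logs align.
    logger.info("client_time_start=%s", datetime.now(timezone.utc).isoformat())
    result = run_fn()
    logger.info("client_time_end=%s", datetime.now(timezone.utc).isoformat())
    return result
```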
infer/vllm/process
- Added `--enable_prom_metrics` flag to collect CPU and memory metrics from Prometheus/Thanos
- Added new output fields: `req_thp`, `num_prompts`, `server_id`, `client_mode`, `prom_metrics`
- Added support for multi-client benchmark mode (detects `client.log.*` files)
- Extract warmup time from `server.log` for server mode
- Better fallback logic: calculates average sizes from actual token counts when not explicitly configured (see the sketch after this list)
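A rough sketch of the multi-client detection and the token-count fallback, assuming all logs live in one results directory; only the `client.log.*` pattern comes from the entry above, and the record field names (`prompt_tokens`, `completion_tokens`) are assumptions:

```python
import glob
import os

def find_client_logs(log_dir):
    # Multi-client mode leaves one client.log.<i> file per client;
    # fall back to a single client.log otherwise.
    multi = sorted(glob.glob(os.path.join(log_dir, "client.log.*")))
    return multi if multi else [os.path.join(log_dir, "client.log")]

def average_sizes(records, configured_input=None, configured_output=None):
    # Fallback: derive average prompt/output sizes from the actual token
    # counts when they were not explicitly configured for the run.
    n = len(records)
    avg_in = (configured_input if configured_input is not None
              else sum(r["prompt_tokens"] for r in records) / n)
    avg_out = (configured_output if configured_output is not None
               else sum(r["completion_tokens"] for r in records) / n)
    return avg_in, avg_out
```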
infer/vllm/prometheus_metrics.py (new file)
- Query CPU and memory metrics from Prometheus/Thanos
- Supports `avg_over_time` and `max_over_time` query functions
- Requires environment variables: `THANOS_API_TOKEN`, `THANOS_API_URL`
- Uses 5-minute step resolution for queries (see the sketch below)
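A sketch of the query path using the `prometheus-api-client` package added under Dependencies; the metric selectors and the Bearer-token header format are assumptions, while the environment variables, the two range functions, and the 5-minute step come from the entries above:

```python
import os
from datetime import datetime, timedelta, timezone

from prometheus_api_client import PrometheusConnect

prom = PrometheusConnect(
    url=os.environ["THANOS_API_URL"],
    headers={"Authorization": f"Bearer {os.environ['THANOS_API_TOKEN']}"},
)

def query_metric(selector, func="avg_over_time", window_minutes=60):
    # Wrap the selector in avg_over_time/max_over_time and query the
    # range API at 5-minute step resolution.
    end = datetime.now(timezone.utc)
    start = end - timedelta(minutes=window_minutes)
    return prom.custom_query_range(
        query=f"{func}({selector}[5m])",
        start_time=start,
        end_time=end,
        step="5m",
    )

# Placeholder selectors: the real ones depend on the cluster's metric names.
mem = query_metric('container_memory_working_set_bytes{pod=~"vllm-.*"}')
cpu = query_metric('container_cpu_usage_seconds_total{pod=~"vllm-.*"}',
                   func="max_over_time")
```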
infer/vllm/runner
- Changed timestamp generation to UTC format
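The change amounts to stamping runs in UTC rather than local time; the exact format string below is an assumption:

```python
from datetime import datetime, timezone

# Before (local time): datetime.now().strftime("%Y%m%d_%H%M%S")
run_stamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
```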
Dependencies
- Added `prometheus-api-client`