
v1.0.8


@WarningRan WarningRan released this 01 Oct 14:28
c418191

infer/vllm/client

  • Added UTC timestamp logging at benchmark start and end (client_time_start, client_time_end)
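A minimal sketch of what the new client-side timestamp logging could look like. The field names `client_time_start` and `client_time_end` come from the release notes; the helper name `utc_now_iso` and the results-dict shape are hypothetical.

```python
from datetime import datetime, timezone

def utc_now_iso():
    """Return the current UTC time as a timezone-aware ISO-8601 string."""
    return datetime.now(timezone.utc).isoformat()

# Hypothetical usage: record UTC timestamps around a benchmark run.
results = {}
results["client_time_start"] = utc_now_iso()
# ... run the benchmark ...
results["client_time_end"] = utc_now_iso()
```

Using `datetime.now(timezone.utc)` rather than the naive `datetime.utcnow()` keeps the offset (`+00:00`) in the string, so downstream tooling can parse it unambiguously.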

infer/vllm/process

  • Added --enable_prom_metrics flag to collect CPU and memory metrics from Prometheus/Thanos
  • Added new output fields: req_thp, num_prompts, server_id, client_mode, prom_metrics
  • Added support for multi-client benchmark mode (detects client.log.* files)
  • Added extraction of warmup time from server.log in server mode
  • Improved fallback logic: calculates average sizes from actual token counts when they are not explicitly configured
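The multi-client detection described above could be as simple as globbing for numbered log files. The `client.log.*` pattern is from the release notes; the function name `find_client_logs` and the single-file fallback are assumptions for illustration.

```python
import glob
import os

def find_client_logs(result_dir):
    """Detect multi-client benchmark mode from numbered client logs.

    Returns (log_paths, is_multi_client). Falls back to a single
    client.log when no client.log.* files are present.
    """
    multi = sorted(glob.glob(os.path.join(result_dir, "client.log.*")))
    if multi:
        return multi, True  # one log per client => multi-client mode
    single = os.path.join(result_dir, "client.log")
    return ([single] if os.path.exists(single) else []), False
```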

infer/vllm/prometheus_metrics.py (new file)

  • Queries CPU and memory metrics from Prometheus/Thanos
  • Supports avg_over_time and max_over_time query functions
  • Requires environment variables: THANOS_API_TOKEN, THANOS_API_URL
  • Uses 5-minute step resolution for queries
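A sketch of how such queries can be issued. The `THANOS_API_TOKEN`/`THANOS_API_URL` environment variables, the `avg_over_time`/`max_over_time` functions, and the 5-minute step are from the release notes; the helper names and the use of the standard Prometheus HTTP `query_range` endpoint with bearer-token auth are assumptions, not confirmed from the project source.

```python
import json
import os
import urllib.parse
import urllib.request

def wrap_query(metric, fn="avg_over_time", window="5m"):
    """Wrap a PromQL selector in a range function, e.g. avg_over_time(m[5m])."""
    if fn not in ("avg_over_time", "max_over_time"):
        raise ValueError(f"unsupported query function: {fn}")
    return f"{fn}({metric}[{window}])"

def query_thanos(promql, start, end, step="300"):
    """Run a range query against a Prometheus-compatible HTTP API.

    Assumes THANOS_API_URL points at the API root and THANOS_API_TOKEN
    is a bearer token; step="300" matches the 5-minute resolution.
    """
    base = os.environ["THANOS_API_URL"].rstrip("/")
    params = urllib.parse.urlencode(
        {"query": promql, "start": start, "end": end, "step": step}
    )
    req = urllib.request.Request(
        f"{base}/api/v1/query_range?{params}",
        headers={"Authorization": f"Bearer {os.environ['THANOS_API_TOKEN']}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

In practice the new `prometheus-api-client` dependency would handle the HTTP layer; the raw-endpoint version above only illustrates the query shape.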

infer/vllm/runner

  • Changed timestamp generation to UTC

Dependencies

  • Added prometheus-api-client