Labels: kind/bug
Description
What happened:
I'm seeing the following request lifecycle metrics report:
```json
{
  "load_summary": {
    "count": 300,
    "schedule_accuracy": {
      "mean": 41.392056,
      "min": 0.00021657249,
      "max": 190.29817,
      "p90": 149.24591
    },
    "send_duration": 300.10364,
    "requested_rate": 1,
    "achieved_rate": 0.9996547
  },
  "successes": {
    "count": 283,
    "latency": {
      "request_latency": {
        "mean": 13.698023,
        "min": 0.06531374,
        "max": 180.07655,
        "p90": 14.04373
      },
      "normalized_time_per_output_token": {},
      "time_per_output_token": {
        "mean": 0.042900182,
        "min": 0.002089543,
        "max": 3.380381,
        "p90": 0.04745394
      },
      "time_to_first_token": {
        "mean": 0.31647572,
        "min": 0.02519054,
        "max": 1.2869523,
        "p90": 0.94861716
      },
      "inter_token_latency": {
        "mean": 0.031301748,
        "min": 0.0000011720113,
        "max": 179.12752,
        "p90": 0.016776484
      }
    },
    "throughput": {
      "input_tokens_per_sec": 243.59926,
      "output_tokens_per_sec": 0,
      "total_tokens_per_sec": 243.59926,
      "requests_per_sec": 0.90133476
    },
    "prompt_len": {
      "mean": 270.265,
      "min": 2,
      "max": 1927,
      "p90": 715.4
    },
    "output_len": {}
  },
  "failures": {
    "count": 17,
    "request_latency": {
      "mean": 4.065165,
      "min": 0.030937817,
      "max": 67.88596,
      "p90": 0.18836944
    }
  }
}
```
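The report is internally inconsistent: 283 requests succeeded, yet every output-token statistic is empty or zero. A quick way to spot this across many per-stage reports is a check like the one below. It is a minimal sketch; `check_lifecycle_report` is a hypothetical helper (not part of inference-perf) and it assumes the key layout shown in the report above.

```python
import json


def check_lifecycle_report(report: dict) -> list:
    """Flag internally inconsistent fields in a lifecycle summary report.

    Hypothetical helper: assumes the load_summary / successes / failures
    layout shown in the report above.
    """
    problems = []
    total = report.get("load_summary", {}).get("count", 0)
    successes = report.get("successes", {})
    n_ok = successes.get("count", 0)
    n_failed = report.get("failures", {}).get("count", 0)

    # Every sent request should end up as either a success or a failure.
    if n_ok + n_failed != total:
        problems.append(f"{n_ok} successes + {n_failed} failures != {total} sent")

    if n_ok == 0:
        return problems  # no successes, so empty output stats are expected

    # An empty dict means no per-request values were available to summarize.
    if not successes.get("output_len"):
        problems.append("successes.output_len is empty")
    if not successes.get("latency", {}).get("normalized_time_per_output_token"):
        problems.append("successes.latency.normalized_time_per_output_token is empty")
    if successes.get("throughput", {}).get("output_tokens_per_sec", 0) == 0:
        problems.append("successes.throughput.output_tokens_per_sec is 0")
    return problems


# Trimmed excerpt of the report above (only the fields the check reads).
excerpt = json.loads("""
{
  "load_summary": {"count": 300},
  "successes": {
    "count": 283,
    "latency": {"normalized_time_per_output_token": {}},
    "throughput": {"output_tokens_per_sec": 0},
    "output_len": {}
  },
  "failures": {"count": 17}
}
""")

for problem in check_lifecycle_report(excerpt):
    print(problem)
```

On this excerpt the success/failure counts add up (283 + 17 = 300), so only the three empty output-token fields are flagged.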
What you expected to happen:
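`output_len` should never be `None`, yet in the report above it is empty (`{}`), as is `normalized_time_per_output_token`, and `output_tokens_per_sec` is 0, even though 283 requests succeeded. Either better error handling is needed, or there's a bug in how the completion API response is parsed.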
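One possible shape for more robust parsing is sketched below. It assumes vLLM's OpenAI-compatible `/v1/completions` endpoint streaming Server-Sent Events (one `data: {json}` line per chunk, terminated by `data: [DONE]`); `parse_completion_stream` and `compute_output_len` are hypothetical names, not inference-perf's actual code. A `usage` chunk only arrives if the request asks for it (the OpenAI API's `stream_options: {"include_usage": true}`), so the sketch falls back to tokenizing the accumulated text and raises an error instead of returning `None`.

```python
import json
from typing import Callable, Iterable, Optional, Tuple


def parse_completion_stream(lines: Iterable[str]) -> Tuple[str, Optional[int]]:
    """Collect generated text and, if present, usage from an SSE completion stream.

    Assumes OpenAI-compatible /v1/completions streaming: each event is a
    line "data: {json}" and the stream ends with "data: [DONE]".
    """
    parts = []
    completion_tokens: Optional[int] = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators and SSE comment lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices") or []:
            # The completions API streams "text"; chat completions would
            # use choice["delta"]["content"] instead.
            parts.append(choice.get("text") or "")
        usage = chunk.get("usage")
        if usage and usage.get("completion_tokens") is not None:
            completion_tokens = usage["completion_tokens"]
    return "".join(parts), completion_tokens


def compute_output_len(lines: Iterable[str], count_tokens: Callable[[str], int]) -> int:
    """Prefer server-reported usage, fall back to tokenizing, never return None."""
    text, usage_tokens = parse_completion_stream(lines)
    if usage_tokens is not None:
        return usage_tokens
    if not text:
        # Surface the problem instead of silently recording an empty output_len.
        raise ValueError("stream contained no completion text")
    return count_tokens(text)


sample = [
    'data: {"choices": [{"index": 0, "text": "Hello", "finish_reason": null}]}',
    "",
    'data: {"choices": [{"index": 0, "text": " world", "finish_reason": "length"}]}',
    'data: {"choices": [], "usage": {"prompt_tokens": 5, "completion_tokens": 2, "total_tokens": 7}}',
    "data: [DONE]",
]
# Whitespace splitting stands in for the real tokenizer here.
print(compute_output_len(sample, lambda s: len(s.split())))  # 2
```

In practice `count_tokens` would wrap the configured tokenizer, for example `lambda s: len(tok.encode(s, add_special_tokens=False))` with a Hugging Face `AutoTokenizer` loaded for google/gemma-3-4b-it.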
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Environment:
config.yaml:

```yaml
config:
  api:
    streaming: true
    type: completion
  data:
    input_distribution: null
    output_distribution: null
    type: shareGPT
  load:
    stages:
      - duration: 300
        rate: 1
      - duration: 300
        rate: 12.02
      - duration: 300
        rate: 19.47
      - duration: 300
        rate: 24.5
      - duration: 300
        rate: 27.91
      - duration: 300
        rate: 30.21
      - duration: 300
        rate: 31.76
      - duration: 300
        rate: 32.81
      - duration: 300
        rate: 33.52
      - duration: 300
        rate: 34
    type: constant
  metrics:
    prometheus:
      filters:
        - namespace="default"
      google_managed: true
      scrape_interval: 15
      url: null
    type: prometheus
  report:
    prometheus:
      per_stage: true
      summary: true
    request_lifecycle:
      per_request: true
      per_stage: true
      summary: true
  server:
    base_url: http://gemma-3-4b-it-vllm-service.default.svc.cluster.local:8000
    ignore_eos: true
    model_name: google/gemma-3-4b-it
    type: vllm
  storage:
    google_cloud_storage:
      bucket_name: slabe-dev-bucket
      path: default
  tokenizer:
    pretrained_model_name_or_path: google/gemma-3-4b-it
```
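As a sanity check on this config, a constant-rate stage should send roughly duration × rate requests. The sketch below uses a hypothetical helper with the stages copied by hand from the config. It gives 300 for the first stage, which matches `"count": 300` and `"requested_rate": 1` in the report, so that report appears to cover stage 0 only.

```python
# Load stages copied from the config above: (duration in seconds, rate in req/s).
STAGES = [
    (300, 1), (300, 12.02), (300, 19.47), (300, 24.5), (300, 27.91),
    (300, 30.21), (300, 31.76), (300, 32.81), (300, 33.52), (300, 34),
]


def expected_requests(stages):
    """Expected request count per constant-rate stage: duration * rate, rounded."""
    # round() absorbs floating-point noise such as 300 * 12.02.
    return [round(duration * rate) for duration, rate in stages]


per_stage = expected_requests(STAGES)
for i, n in enumerate(per_stage):
    print(f"stage {i}: ~{n} requests")
print("total:", sum(per_stage))
```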