Open
Labels
kind/bug: Categorizes issue or PR as related to a bug.
needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
Description
What happened:
Deployed with the following config and expected all 10 of the rates to be unique:
load:
  interval: 10
  sweep:
    num_stages: 10
    stage_duration: 300
    type: linear
Observed the following requested rates: [1,1,2,2,2,3,3,3,4,4]
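One possible explanation, offered only as a hypothesis and not confirmed against the inference-perf source: the observed sequence is exactly what you get when 10 linearly spaced rates between 1 and 4 are rounded to integers. The sketch below reproduces that pattern. The endpoints `1.0` and `4.0` and the helper name `linear_rates` are assumptions chosen to match the observed output; they are not taken from the tool.

```python
# Hypothetical reproduction: linear spacing followed by integer rounding.
# The endpoints (1.0 and 4.0) are assumed, not read from inference-perf.

def linear_rates(start: float, stop: float, num_stages: int) -> list[float]:
    """Return num_stages evenly spaced rates from start to stop, inclusive."""
    step = (stop - start) / (num_stages - 1)
    return [start + i * step for i in range(num_stages)]

rates = linear_rates(1.0, 4.0, 10)

# Rounding to whole requests per second collapses neighbouring stages.
rounded = [round(r) for r in rates]

print(rounded)            # [1, 1, 2, 2, 2, 3, 3, 3, 4, 4]
print(len(set(rates)))    # 10 distinct rates before rounding
print(len(set(rounded)))  # 4 distinct rates after rounding
```

If this is indeed the cause, keeping the rates as floats (or widening the swept range so the step is at least 1) would yield 10 unique stages.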
Environment:
- inference-perf version: quay.io/inference-perf/inference-perf:latest at 2025-09-20, 06:33:01 PM
- config.yml (entire one printed by the benchmark run):
api:
  streaming: true
  type: completion
data:
  input_distribution: null
  output_distribution: null
  shared_prefix:
    num_groups: 1
    num_prompts_per_group: 100000
    output_len: 128
    question_len: 64
    system_prompt_len: 64
  type: shared_prefix
load:
  interval: 10
  sweep:
    num_stages: 10
    stage_duration: 300
    type: linear
  type: constant
metrics:
  prometheus:
    filters:
    - namespace="test-ns"
    google_managed: true
    scrape_interval: 15
    url: null
  type: prometheus
report:
  prometheus:
    per_stage: true
    summary: true
  request_lifecycle:
    per_request: true
    per_stage: true
    summary: true
server:
  base_url: http://gpt-oss-20b-vllm-service.test-ns.svc.cluster.local:8000
  ignore_eos: true
  model_name: openai/gpt-oss-20b
  type: vllm
storage:
  google_cloud_storage:
    bucket_name: test-bucket
    path: test-ns
tokenizer:
  pretrained_model_name_or_path: openai/gpt-oss-20b
  token: hf_xxxxx