Skip to content

Load sweeps create duplicate request rates #229

@Bslabe123

Description

@Bslabe123

What happened:

Deployed with the following config and expected all 10 of the rates to be unique:

  load:
    interval: 10
    sweep:
      num_stages: 10
      stage_duration: 300
      type: linear

Observed the following requested rates: [1,1,2,2,2,3,3,3,4,4]

Environment:

  • inference-perf version: quay.io/inference-perf/inference-perf:latest at 2025-09-20, 06:33:01 PM
  • config.yml (entire one printed by the benchmark run):
api:
  streaming: true
  type: completion
data:
  input_distribution: null
  output_distribution: null
  shared_prefix:
    num_groups: 1
    num_prompts_per_group: 100000
    output_len: 128
    question_len: 64
    system_prompt_len: 64
  type: shared_prefix
load:
  interval: 10
  sweep:
    num_stages: 10
    stage_duration: 300
    type: linear
  type: constant
metrics:
  prometheus:
    filters:
    - namespace="test-ns"
    google_managed: true
    scrape_interval: 15
    url: null
  type: prometheus
report:
  prometheus:
    per_stage: true
    summary: true
  request_lifecycle:
    per_request: true
    per_stage: true
    summary: true
server:
  base_url: http://gpt-oss-20b-vllm-service.test-ns.svc.cluster.local:8000
  ignore_eos: true
  model_name: openai/gpt-oss-20b
  type: vllm
storage:
  google_cloud_storage:
    bucket_name: test-bucket
    path: test-ns
tokenizer:
  pretrained_model_name_or_path: openai/gpt-oss-20b
  token: hf_xxxxx

Metadata

Metadata

Assignees

No one assigned

    Labels

    kind/bugCategorizes issue or PR as related to a bug.needs-triageIndicates an issue or PR lacks a `triage/foo` label and requires one.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions