Open
Labels: General perf&lt;NV&gt; (broad performance issues not specific to a particular component), Inference runtime&lt;NV&gt; (general operational aspects of TRTLLM execution not in other categories), Performance (TRTLLM model inference speed, throughput, efficiency), bug (something isn't working)
Description
System Info
GPU: NVIDIA A100, NVIDIA H100
TensorRT-LLM version: 1.0.0rc5
TensorRT-LLM commit: b3ca159
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
Build: following the Gemma 3 guide
Serve: trtllm-serve with max_attention_window = [512, 512, 512, 512, 512, 3100]
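For reference, a minimal sketch of such a serve invocation. The YAML file name and the exact `kv_cache_config.max_attention_window` / `--extra_llm_api_options` spelling are my assumptions about the LLM-API options mechanism, not copied from the report:

```
# extra-config.yaml (hypothetical file name)
kv_cache_config:
  max_attention_window: [512, 512, 512, 512, 512, 3100]
```

```
# model path is a placeholder
trtllm-serve <model_dir> --extra_llm_api_options extra-config.yaml
```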
Expected behavior
When the sequence length is shorter than the minimum attention window, the concurrent batch size should remain the same as when max_attention_window is not set, since no layer's KV cache is truncated yet. When the sequence is longer, the batch size should increase, because the windows cap each sequence's KV-cache footprint.
This works as expected in vLLM.
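The expectation above can be sketched with a small arithmetic model. The window list matches the repro; the cache capacity, layer count, and helper function are hypothetical illustrations, not TensorRT-LLM internals:

```python
def kv_tokens_per_seq(seq_len, windows):
    # Each layer caches at most min(seq_len, window) tokens.
    return sum(min(seq_len, w) for w in windows)

# One window entry per layer (hypothetical 6-layer model).
windows = [512, 512, 512, 512, 512, 3100]
cache_capacity = 100_000  # hypothetical total KV-cache token budget

for seq_len in (400, 4096):
    windowed = kv_tokens_per_seq(seq_len, windows)
    unwindowed = seq_len * len(windows)  # footprint without windowing
    print(f"seq_len={seq_len}: batch {cache_capacity // unwindowed} "
          f"-> {cache_capacity // windowed} with windows")
```

For a 400-token sequence both footprints are equal, so the batch size should not change; for a 4096-token sequence the windowed footprint is much smaller, so the batch size should grow. The report observes the opposite: a decrease in both cases.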
Actual behavior
However, the batch size decreases in both cases, resulting in a significant drop in throughput.
Additional notes
This behavior has continued since the referenced commit.