
[Performance]: [AutoDeploy] Benchmark and analyze AD-vLLM perf gap for Nemotron MoE FP8 tp=1 #9268

@galagam


Proposal to improve performance

Model: Nemotron MoE, FP8, tp=1. Compare AutoDeploy (AD) performance against vLLM on H100 and B200.

  • Sweep over max concurrency and plot Pareto curves of aggregate output throughput (output tok/s) versus per-user speed (tok/user/s).
  • Dump profiler traces for both vLLM and AD.
  • Analyze the traces and identify possible performance optimizations for AD.
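A minimal sketch of how the concurrency sweep could be reduced to a Pareto curve, assuming each benchmark run has already been summarized as one record. The `SweepPoint` layout and the `pareto_front` helper are hypothetical illustrations, not part of AD, vLLM, or any benchmark tool; the raw values would come from whatever client drives the sweep. Both axes are treated as "higher is better", so a run is kept only if no other run is at least as fast per user while also delivering at least as much aggregate throughput.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SweepPoint:
    """One benchmark run at a fixed max concurrency (hypothetical record layout)."""
    max_concurrency: int
    output_tok_per_s: float     # aggregate output throughput across all requests
    tok_per_user_per_s: float   # output speed seen by a single user


def pareto_front(points: List[SweepPoint]) -> List[SweepPoint]:
    """Keep the points that no other point beats on both axes (both maximized).

    Points are visited from fastest to slowest per-user speed; a point survives
    only if its aggregate throughput exceeds that of every point visited before it.
    Exact duplicates collapse to a single entry.
    """
    ordered = sorted(
        points, key=lambda p: (-p.tok_per_user_per_s, -p.output_tok_per_s)
    )
    front: List[SweepPoint] = []
    best_throughput = float("-inf")
    for p in ordered:
        if p.output_tok_per_s > best_throughput:
            front.append(p)
            best_throughput = p.output_tok_per_s
    return front
```

Computing one front per backend and GPU and overlaying them shows the AD-vLLM gap at each interactivity level, rather than only at a single concurrency setting. The returned points are ordered by decreasing tok/user/s, so they can be plotted directly as a line.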
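For the trace analysis step, one possible starting point is to aggregate GPU time per kernel name in each trace and rank where AD spends more time than vLLM. This sketch assumes both traces are uncompressed Chrome trace JSON files (the format written by `torch.profiler.profile(...).export_chrome_trace(path)`) in which GPU kernels are complete events (`"ph": "X"`) tagged `"cat": "kernel"`. The helper names `kernel_time_by_name` and `rank_kernel_gaps` are hypothetical.

```python
import json
from collections import defaultdict
from typing import Dict, List, Tuple


def kernel_time_by_name(trace_path: str, category: str = "kernel") -> Dict[str, float]:
    """Total duration (microseconds) of complete events of one category, per event name."""
    with open(trace_path) as f:
        data = json.load(f)
    # Chrome trace JSON is either {"traceEvents": [...]} or a bare list of events.
    events = data.get("traceEvents", []) if isinstance(data, dict) else data
    totals: Dict[str, float] = defaultdict(float)
    for ev in events:
        # "X" = complete event with a duration; skip instants, counters, metadata.
        if ev.get("ph") == "X" and ev.get("cat") == category:
            totals[ev["name"]] += float(ev.get("dur", 0.0))
    return dict(totals)


def rank_kernel_gaps(
    ad: Dict[str, float], vllm: Dict[str, float], top: int = 10
) -> List[Tuple[str, float, float]]:
    """Rows of (name, AD time, vLLM time), largest AD-minus-vLLM excess first."""
    rows = [(n, ad.get(n, 0.0), vllm.get(n, 0.0)) for n in set(ad) | set(vllm)]
    rows.sort(key=lambda r: r[1] - r[2], reverse=True)
    return rows[:top]
```

Exact name matching is only a first pass: the two stacks may use different kernels for the same operation (for example, different MoE or attention implementations), so rows that appear on only one side usually need to be grouped by operation by hand before drawing conclusions. Comparing traces captured at the same max concurrency and for the same number of decode steps keeps the totals comparable.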

Report of performance regression

No response

Misc discussion on performance

No response

Your current environment (if you think it is necessary)

No response

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.

Metadata

Labels

  • General perf<NV>: Broad performance issues not specific to a particular component
  • Performance: TRTLLM model inference speed, throughput, efficiency. Latency, benchmarks, regressions, opts.


Projects

Status

In progress

