Conversation
@CUHKSZzxy CUHKSZzxy commented May 9, 2025

Objective

Align with the vLLM v1 metrics system and beyond. Key alignments:

  • Monotonic Timestamps:
    -- Uses time.perf_counter() for interval calculations (avoids clock-drift issues).
  • Metric Types:
    -- Gauges: active requests, cache usage, etc.
    -- Counters: token totals, request success / failure counts, etc.
    -- Histograms: TTFT (time to first token), TPOT (time per output token, i.e., inter-token latency), end-to-end latency, etc.
  • Metrics Publishing:
    -- CLI logging
    -- Prometheus & Grafana
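As a stdlib-only illustration of how the three metric types behave (the actual implementation uses prometheus_client objects; the class and metric semantics below are sketched, and the example values are invented):

```python
import bisect

class Gauge:
    """A value that can go up or down, e.g. the number of running requests."""
    def __init__(self):
        self.value = 0.0
    def set(self, v):
        self.value = v

class Counter:
    """A monotonically increasing total, e.g. generated tokens."""
    def __init__(self):
        self.value = 0.0
    def inc(self, n=1.0):
        assert n >= 0, "counters only go up"
        self.value += n

class Histogram:
    """Observations sorted into buckets by upper bound (le), e.g. TTFT seconds."""
    def __init__(self, buckets):
        self.buckets = sorted(buckets)
        self.counts = [0] * (len(self.buckets) + 1)  # last slot is +Inf
    def observe(self, v):
        self.counts[bisect.bisect_left(self.buckets, v)] += 1

running = Gauge(); running.set(3)     # gauge: 3 requests in flight right now
tokens = Counter(); tokens.inc(128)   # counter: 128 tokens generated so far
ttft_hist = Histogram([0.05, 0.1, 0.5, 1.0])
ttft_hist.observe(0.042)              # lands in the <=0.05 bucket
```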

We only record critical timestamps and events inside the engine process, without further processing there. Heavyweight metric calculation and publishing are kept out of the main loop to minimize overhead.
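A minimal sketch of that split, assuming a hypothetical RequestTimings holder (not lmdeploy's actual class names): the hot path only stamps time.perf_counter() values, and consumers derive intervals later.

```python
import time
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class RequestTimings:
    """Hot path stores monotonic timestamps only; derivation happens elsewhere."""
    arrival: float = field(default_factory=time.perf_counter)
    first_token: Optional[float] = None
    finished: Optional[float] = None

    def on_first_token(self):
        # Called once in the engine loop: just record, never compute.
        if self.first_token is None:
            self.first_token = time.perf_counter()

    def on_finish(self):
        self.finished = time.perf_counter()

# Cold path (CLI logger / Prometheus exporter), outside the main loop:
def ttft(t: RequestTimings) -> float:
    return t.first_token - t.arrival

def e2e_latency(t: RequestTimings) -> float:
    return t.finished - t.arrival
```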

For convenient Grafana visualization and usage, we align with SGLang.

TODO

  • Refactor: After the MP engine lands, things change ... the global singleton context will be local to each process. Carrying the information from the engine to the async engine seems the most convenient and least error-prone way to do it; otherwise we may perform IPC frequently.
  • Refactor: 1. Avoid parameter passing (singleton context), 2. Reduce computation overhead (CPU overhead is high, but can be solved with the MP engine)
  • Refactor: Decouple prometheus_client, only install / import when needed
  • Update: Add user guide
  • Refactor: Reduce messy parameters, pack things into a class
  • Feature: Grafana visualization
  • Feature: Expert information collections (deferred in another PR)
  • Refactor: Minimize the modifications to async engine generate() and engine _async_loop_main()
  • Fix: Use time.perf_counter()

Usage

Start the server with --enable-metrics

lmdeploy serve api_server Qwen/Qwen2.5-7B-Instruct --enable-metrics
  • Metrics Publishing - Logging
    With --enable-metrics, key metrics (e.g., finished / unfinished / running / waiting requests, token throughputs, cache usage) are printed to the terminal every 10 seconds.
    [screenshot: cli_log]

  • Metrics Publishing - Prometheus & Grafana
    -- Raw Metrics
    Access the raw Prometheus metrics via http://localhost:23333/metrics/ . You can also curl the metrics endpoint (curl http://localhost:23333/metrics/) to view the raw Prometheus results. No extra setup is required for this step.
    [screenshot: prometheus]
    -- Prometheus Panel
    Access the Prometheus panel via http://localhost:9090 (9090 is the current default port for the Prometheus panel). This requires extra setup; please check the user guide for details.
    [screenshot: prometheus_panel]
    -- Grafana Panel
    Access the Grafana panel via http://localhost:3000 (3000 is the current default port for the Grafana panel). This requires extra setup; please check the user guide for details.
    [screenshot: grafana_panel]
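For a programmatic check of the raw endpoint, here is a small stdlib-only sketch; the simplified parser ignores the labels and timestamps that real exposition-format lines may carry, and the metric name in the test is invented.

```python
from urllib.request import urlopen

def fetch_metrics(url="http://localhost:23333/metrics/"):
    """Fetch the raw Prometheus text from a running api_server."""
    with urlopen(url, timeout=5) as resp:
        return resp.read().decode()

def parse_samples(text):
    """Very simplified parser: skip '# HELP'/'# TYPE' comments and split
    each remaining 'name value' line. Real lines may also carry labels
    in {...} and timestamps, which this sketch does not handle."""
    samples = {}
    for line in text.splitlines():
        if line and not line.startswith("#"):
            name, _, value = line.rpartition(" ")
            samples[name] = float(value)
    return samples
```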

Request Timeline

The following diagram depicts how we define and calculate time intervals during the request lifecycle, which adheres to vLLM.
[diagram: timeline]
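The diagram itself is not reproduced here, but the interval definitions can be sketched as follows (names and example values are illustrative; TPOT is shown with one common definition, averaging decode time over the tokens after the first):

```python
def intervals(arrival, first_token, finished, num_output_tokens):
    """Derive the timeline metrics from three monotonic timestamps (seconds)."""
    ttft = first_token - arrival        # time to first token
    e2e = finished - arrival            # end-to-end request latency
    # Inter-token latency: decode time averaged over the remaining tokens.
    tpot = (finished - first_token) / max(num_output_tokens - 1, 1)
    return ttft, tpot, e2e

# e.g. arrival at t=0 s, first token at 0.08 s, 101 tokens done at 2.08 s
ttft, tpot, e2e = intervals(0.0, 0.08, 2.08, 101)
```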

Performance Impacts

  • Conclusion

Tested with Qwen2.5-0.5B / Qwen2.5-7B / Qwen2.5-32B; no obvious performance impact. (Requires #3627)

Check the following tables for output throughput details. We conducted tests using 1,000 prompts, with input length 1k and output length 1k. Each model was tested three times to reduce the impact of performance fluctuations.

  • Qwen2.5-0.5B, TP1

    | W/O metrics (tokens/s) | W/ metrics (tokens/s) |
    | ---------------------- | --------------------- |
    | 20387                  | 20555                 |
    | 20341                  | 20877                 |
    | 20746                  | 20771                 |

  • Qwen2.5-7B, TP1

    | W/O metrics (tokens/s) | W/ metrics (tokens/s) |
    | ---------------------- | --------------------- |
    | 8836                   | 8721                  |
    | 8780                   | 8736                  |
    | 8800                   | 8723                  |

  • Qwen2.5-32B, TP2

    | W/O metrics (tokens/s) | W/ metrics (tokens/s) |
    | ---------------------- | --------------------- |
    | 3019                   | 3160                  |
    | 3167                   | 3165                  |
    | 3189                   | 3173                  |
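As a sanity check, averaging each configuration's three runs from the tables above shows the with/without-metrics difference stays within roughly ±1.3%:

```python
from statistics import mean

# Throughput numbers copied from the tables above: (without metrics, with metrics).
runs = {
    "Qwen2.5-0.5B TP1": ([20387, 20341, 20746], [20555, 20877, 20771]),
    "Qwen2.5-7B TP1":   ([8836, 8780, 8800],    [8721, 8736, 8723]),
    "Qwen2.5-32B TP2":  ([3019, 3167, 3189],    [3160, 3165, 3173]),
}

for name, (without, with_metrics) in runs.items():
    delta = (mean(with_metrics) - mean(without)) / mean(without) * 100
    print(f"{name}: {mean(without):.0f} -> {mean(with_metrics):.0f} tok/s ({delta:+.2f}%)")
```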

Related Issues & PR

CUHKSZzxy added 2 commits May 9, 2025 20:38
Conflicts:
	lmdeploy/messages.py
	lmdeploy/pytorch/engine/engine.py
	lmdeploy/pytorch/engine/engine_instance.py
	lmdeploy/pytorch/messages.py
	lmdeploy/pytorch/paging/scheduler.py
@CUHKSZzxy CUHKSZzxy added the WIP label May 9, 2025
@CUHKSZzxy CUHKSZzxy removed the WIP label May 26, 2025
@CUHKSZzxy CUHKSZzxy marked this pull request as ready for review May 26, 2025 13:24
@CUHKSZzxy CUHKSZzxy removed the WIP label Jul 3, 2025
@lvhan028 lvhan028 mentioned this pull request Jul 7, 2025
  - job_name: lmdeploy
    static_configs:
      - targets:
          - '$host_ip:$api_server_port1' # <= Modify this
@RunningLeon RunningLeon (Collaborator) commented Jul 8, 2025

Can we configure all the DP server URLs here and show their data in the Grafana board?

@RunningLeon RunningLeon (Collaborator) left a comment
LGTM

@grimoire grimoire (Collaborator) left a comment
LGTM

@lvhan028 lvhan028 merged commit 1e8ce56 into InternLM:main Jul 9, 2025
5 checks passed

voycey commented Aug 5, 2025

I see this has been merged, but --enable-metrics is still not working?

@RunningLeon RunningLeon (Collaborator)

@voycey Hi, this feature only works for backend=pytorch, while the default backend is turbomind. Metrics for the turbomind backend will be added in another PR.


voycey commented Aug 5, 2025

Docs don't mention anything about this being limited to the PyTorch backend :(

Any ETA on Turbomind metrics? It's running incredibly fast for me and I would like to see what the tokens/second rate is; is there any other way?

@CUHKSZzxy CUHKSZzxy (Collaborator, Author)

@voycey
Thanks for your feedback; metrics support for turbomind is on the way and will be ready soon.
Check the following PR

@CUHKSZzxy CUHKSZzxy mentioned this pull request Aug 5, 2025

Labels

enhancement New feature or request


6 participants