SGLang Tracing: Add trace-level, trace-module, and unify tracing/request-stage-metrics #13152

sufeng-buaa · 2025-11-12T12:09:04Z

Motivation

This PR is a response to #10916. For details on the motivation and visual output, please refer to that issue.

To reduce span overhead, we added the trace-level feature. To support broader use cases beyond request tracing, we introduced trace-module.

Modifications

Refactored tracing package from global-state functions to a class-based design with instance storage. This facilitates integration with request stage metrics and provides a hook for future dynamic instrumentation.
Implemented a wrapper class "SglangStageContext" that internally aggregates the trace context and the metric collector, so that timestamps are collected uniformly and routed to different export paths based on configuration.
Added trace level mechanism to assign levels to each RequestStage, helping reduce excessive trace data in production environments.
Added trace module mechanism to extend the trace package beyond request tracing, enabling its use in other modules such as hicache.
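
The unified collection and level filtering described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `Stage`, `StageContext`, `slice_start`, `slice_end`, and the level assigned to each stage are hypothetical names and values chosen for the example.

```python
import time
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional, Tuple


class Stage(Enum):
    # (name, level) pairs; names and levels are illustrative assumptions.
    TOKENIZE = ("tokenize", 1)
    PREFILL = ("prefill", 2)
    DECODE_STEP = ("decode_step", 3)

    @property
    def label(self) -> str:
        return self.value[0]

    @property
    def level(self) -> int:
        return self.value[1]


@dataclass
class StageContext:
    """Hypothetical wrapper: one timestamp feeds both metrics and traces."""

    trace_level: int  # 0 means no spans are exported
    enable_metrics: bool = True
    spans: List[Tuple[str, float, float]] = field(default_factory=list)
    durations: Dict[str, float] = field(default_factory=dict)
    _starts: Dict[Stage, float] = field(default_factory=dict)

    def slice_start(self, stage: Stage, ts: Optional[float] = None) -> None:
        self._starts[stage] = time.monotonic() if ts is None else ts

    def slice_end(self, stage: Stage, ts: Optional[float] = None) -> None:
        end = time.monotonic() if ts is None else ts
        start = self._starts.pop(stage)
        # Metrics path: every stage duration is recorded when metrics are on.
        if self.enable_metrics:
            self.durations[stage.label] = end - start
        # Trace path: only stages at or below the configured level become spans.
        if 0 < stage.level <= self.trace_level:
            self.spans.append((stage.label, start, end))
```

With `trace_level=1`, a level-3 stage in this sketch still contributes to the metrics but produces no span, which is the overhead reduction the trace level is meant to provide.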

I thought about unifying TimeStat too, but it would require too many changes, so I dropped that idea for now. Maybe I will push a draft patch later.

Instrumentation Overhead Evaluation

The overhead of each instrumentation point remains almost unchanged compared to before. See #9962 and #10804

Checklist

Format your code according to the Format code with pre-commit.
Add unit tests according to the Run and add unit tests.
Update documentation according to Write documentations.
Provide accuracy and speed benchmark results according to Test the accuracy and Benchmark the speed.

gemini-code-assist · 2025-11-12T12:09:40Z

Summary of Changes

Hello @sufeng-buaa, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly overhauls the tracing and metrics collection system to enhance flexibility, reduce overhead, and broaden its applicability. By transitioning to a class-based RequestTimeRecorder and introducing granular control via trace levels and modules, the system can now more efficiently capture performance data and trace execution flows across various components, including cross-process and cross-node interactions. This foundational change aims to provide richer observability while allowing users to tailor the verbosity of tracing to their specific needs.

Highlights

Tracing System Refactor: The tracing package has been refactored from global-state functions to a class-based design, specifically introducing the RequestTimeRecorder class. This change centralizes trace context and metric collection, allowing for more flexible integration and future dynamic instrumentation.
Unified Metrics and Tracing: A new RequestTimeRecorder wrapper class is implemented to uniformly aggregate trace context and metric collection. This class routes timestamps to different export paths based on configuration, simplifying how both tracing and request-stage metrics are handled.
Trace Level Mechanism: A trace level mechanism has been added, allowing users to assign different levels (1 to 3) to each RequestStage. This enables more granular control over the amount of trace data collected, helping to reduce overhead in production environments by only capturing necessary details.
Trace Module Mechanism: A trace module mechanism is introduced to extend the tracing package's applicability beyond just request tracing. This allows other modules, such as hicache, to leverage the tracing framework, making it more versatile.
Command-Line Argument Changes: The --enable-trace command-line argument has been replaced with --trace-level (an integer from 0-3) and a new --trace-module argument to specify which module to trace (e.g., 'request'). This provides more precise control over tracing activation and scope.
Documentation Updates: The documentation for production request tracing (docs/references/production_request_trace.md) has been updated to reflect the new --trace-level and --trace-module options, as well as the revised API for marking request stages and propagating trace contexts using the RequestTimeRecorder.


gemini-code-assist

Code Review

This pull request introduces a significant refactoring of the tracing system in sglang. It replaces the global tracing functions with a class-based design centered around RequestTimeRecorder, unifying tracing with request stage metrics. It also adds --trace-level and --trace-module for more granular control over tracing, replacing the old --enable-trace flag. The changes are extensive, touching documentation, server arguments, and core scheduler logic. My review focused on ensuring the new API is used consistently, the documentation is accurate, and the refactoring is sound. I've identified a few issues in the documentation that need correction and a critical typo in a function name that would lead to a runtime error. Overall, this is a solid enhancement to the project's observability features.

python/sglang/srt/managers/tokenizer_manager.py

python/sglang/srt/tracing/req_time_recorder.py

docs/references/production_request_trace.md

python/sglang/srt/managers/scheduler.py


sufeng-buaa · 2025-11-13T04:37:35Z

All feedback from Bot Assist has been addressed.

zhanghaotong · 2025-11-13T08:42:44Z

Hi~ I'm running your code with the following command:

python -m sglang.launch_server --trace-level 3 --otlp-traces-endpoint 0.0.0.0:4317  --model-path /mnt/modelops/models/Qwen3-8B/ --host 0.0.0.0 --log-level info  --port 8001

However, I forgot to install the OpenTelemetry packages. As a result, the engine crashed with the error shown below:

And perhaps we should explicitly check for the required OpenTelemetry dependencies when tracing is enabled, and raise a clear error to inform users if they are missing?

sufeng-buaa · 2025-11-13T08:46:22Z

> Hi~ I'm running your code with the following command:
>
> python -m sglang.launch_server --trace-level 3 --otlp-traces-endpoint 0.0.0.0:4317  --model-path /mnt/modelops/models/Qwen3-8B/ --host 0.0.0.0 --log-level info  --port 8001
>
> However, I forgot to install the OpenTelemetry packages. As a result, the engine crashed with the error shown below:
>
> And perhaps we should explicitly check for the required OpenTelemetry dependencies when tracing is enabled, and raise a clear error to inform users if they are missing?

I did forget to verify the case where OpenTelemetry is not installed but tracing is enabled. I'll fix it as soon as possible.
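
One way such a check could look is sketched below. It is only an assumption about the shape of the fix, not the change that was actually pushed: `missing_modules`, `check_tracing_dependencies`, and the default list of required module names are hypothetical.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the module names that cannot be located, without importing them."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # Raised for dotted names when the parent package is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


def check_tracing_dependencies(
    trace_level: int,
    required: Iterable[str] = ("opentelemetry.sdk",),  # assumed requirement list
) -> None:
    """Fail fast with a readable message when tracing is on but deps are missing."""
    if trace_level <= 0:
        return  # tracing disabled, nothing to verify
    missing = missing_modules(required)
    if missing:
        raise RuntimeError(
            "Tracing is enabled (--trace-level > 0) but these modules are not "
            f"installed: {', '.join(missing)}. Install the OpenTelemetry "
            "packages or set --trace-level 0."
        )
```

Calling a check like this once during argument validation would turn the crash into a single actionable error message at startup.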

ShangmingCai · 2025-11-13T08:53:37Z

python/sglang/srt/tracing/trace.py



+@dataclass
+class SglangTraceEvent:


Suggested change

class SglangTraceEvent:

class SGLangTraceEvent:

nit: we should probably use the correct capitalization of SGLang.

I have renamed all 'Sglang***' to 'SGLang***'.

Signed-off-by: Feng Su <[email protected]>

sufeng-buaa · 2025-11-13T10:05:55Z

> Hi~ I'm running your code with the following command:
>
> python -m sglang.launch_server --trace-level 3 --otlp-traces-endpoint 0.0.0.0:4317  --model-path /mnt/modelops/models/Qwen3-8B/ --host 0.0.0.0 --log-level info  --port 8001
>
> However, I forgot to install the OpenTelemetry packages. As a result, the engine crashed with the error shown below:
>
> And perhaps we should explicitly check for the required OpenTelemetry dependencies when tracing is enabled, and raise a clear error to inform users if they are missing?

Fixed

acelyc111 · 2025-11-16T14:21:29Z

docs/references/production_request_trace.md

+        stage_context.metric_trace_slice_end(RequestStage.TOKENIZER)
        ```

    - In trace_slice_end, use auto_next_anon to automatically create the next anonymous slice, which can reduce the number of instrumentation points needed.


Should it be?

Suggested change

- In trace_slice_end, use auto_next_anon to automatically create the next anonymous slice, which can reduce the number of instrumentation points needed.

- In metric_trace_slice_end, use auto_next_anon to automatically create the next anonymous slice, which can reduce the number of instrumentation points needed.

OK, I will correct it.

acelyc111 · 2025-11-16T14:34:08Z

python/sglang/srt/server_args.py

+        parser.add_argument(
+            "--trace-module",
+            type=str,
+            default=ServerArgs.trace_module,


What are the optional items for this argument?

The tracing package is not only used for tracing requests. For example, we are currently exploring its use in monitoring hierarchical caches. Therefore, we use the --trace-module parameter to enable tracing for specific modules. The default is "request".

acelyc111 · 2025-11-16T14:36:40Z

python/sglang/srt/server_args.py

-            help="Enable opentelemetry trace",
+            "--trace-level",
+            type=int,
+            default=ServerArgs.trace_level,


How about listing the choices and describing the meanings like --log-requests-level?

ok, good suggestion.
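
A sketch of what listing the choices could look like with plain `argparse` follows. The default values and help wording are assumptions for illustration (the PR takes its defaults from `ServerArgs`), and `build_parser` is a hypothetical helper.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--trace-level",
        type=int,
        default=0,  # assumed default: tracing disabled
        choices=[0, 1, 2, 3],
        help=(
            "Tracing verbosity. 0: tracing disabled. 1 to 3: tracing enabled, "
            "with higher values producing more detailed traces."
        ),
    )
    parser.add_argument(
        "--trace-module",
        type=str,
        default="request",
        help="Module to trace. Currently only 'request' is supported.",
    )
    return parser
```

Declaring `choices` lets `argparse` reject an out-of-range level at startup and shows the valid values in `--help`.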

ShangmingCai · 2025-11-17T09:16:40Z

docs/references/production_request_trace.md

 # Production Request Tracing

-SGlang exports request trace data based on the OpenTelemetry Collector. You can enable tracing by adding the `--enable-trace` and configure the OpenTelemetry Collector endpoint using `--otlp-traces-endpoint` when launching the server.
+SGLang exports request trace data based on the OpenTelemetry Collector. You can enable tracing by adding the `--trace-level` and configure the OpenTelemetry Collector endpoint using `--otlp-traces-endpoint` when launching the server. The `--trace-level` option accepts configurable values from `1` to `3`, with higher numbers indicating more detailed tracing. Additionally, you can use `--trace-module` to specify the module to trace; currently, only `request` is supported.


QQ: since you removed --enable-trace, should we inform the user --trace-level 0 is equal to setting tracing option to False here?

ok, I have updated the doc.

ShangmingCai · 2025-11-17T09:30:53Z

python/sglang/srt/tracing/trace_metric_warpper.py

+class SGLangStageContext(SGLangTraceReqContext):
+    def __init__(


Actually, I am not sure about the naming here. Do we really need the SGLang prefix when developing a feature in the python/sglang dir? Seems unnecessary to me.

Maybe something like InferenceStageContext or TracingStageContext, or another more accurate option?

Hmm, as for the naming, I'm not entirely sure. This class encapsulates both tracing and metrics, so using "Tracing" as a prefix feels incomplete. My current naming is indeed not ideal. Let me think about it for a moment.

How about StageObserveContext or TraceMetricContext?

TraceMetricContext sounds better

ShangmingCai · 2025-11-17T09:40:19Z

python/sglang/srt/tracing/trace_metric_warpper.py

+    metric_trace_slice = metric_trace_slice_end
+
+
+class NoOpStageContext:


Does this mean "No operation"? NullStageContext sounds more accurate to me.

OK, it sounds good.
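
For readers unfamiliar with the pattern being named here, a minimal null-object sketch is shown below. The class bodies, method names, and the `make_stage_context` factory are hypothetical and only illustrate the idea; they are not the classes in this PR.

```python
class RecordingStageContext:
    """Hypothetical real context: records every slice boundary it is given."""

    def __init__(self) -> None:
        self.events = []

    def slice_start(self, stage: str) -> None:
        self.events.append(("start", stage))

    def slice_end(self, stage: str) -> None:
        self.events.append(("end", stage))


class NullStageContext:
    """Same interface, but every call does nothing."""

    def slice_start(self, stage: str) -> None:
        pass

    def slice_end(self, stage: str) -> None:
        pass


def make_stage_context(trace_level: int, enable_metrics: bool):
    # Callers never branch on whether tracing is on; they always get an
    # object that accepts the same calls.
    if trace_level > 0 or enable_metrics:
        return RecordingStageContext()
    return NullStageContext()
```

The benefit is that instrumentation points stay unconditional: the hot path calls `slice_start` and `slice_end` either way, and the disabled case costs only an empty method call.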

ShangmingCai · 2025-11-17T09:56:03Z

python/sglang/srt/tracing/trace_metric_warpper.py

+    name: str,
+    reqs: List,
+    ts: Optional[int] = None,
+    attrs: Dict[str, Any] = {},


Should this be Optional[Dict[str, Any]] = None as well?
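
The reason this matters: a `{}` default is created once at function definition time and shared by every call that omits the argument. A small standalone illustration, using hypothetical function names unrelated to the PR code:

```python
from typing import Any, Dict, Optional


def tag_shared(name: str, attrs: Dict[str, Any] = {}) -> Dict[str, Any]:
    # Pitfall: the same dict object is reused across calls without `attrs`.
    attrs.setdefault("names", []).append(name)
    return attrs


def tag_fresh(name: str, attrs: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    # Safe variant: a new dict is created for each call that omits `attrs`.
    if attrs is None:
        attrs = {}
    attrs.setdefault("names", []).append(name)
    return attrs
```

Calling `tag_shared("a")` and then `tag_shared("b")` returns a dict containing both names, while two calls to `tag_fresh` each return a dict with only their own name. If the function never mutates `attrs` the shared default is harmless, but `Optional[...] = None` removes the risk entirely.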

ShangmingCai

typo: python/sglang/srt/tracing/trace_metric_warpper.py -> python/sglang/srt/tracing/trace_metric_wrapper.py

sufeng-buaa requested review from ByronHsu, CatherineSue, JustinTong0323, Ying1123, hnyls2002, ispobock, merrymercy, slin1237, xiezhq-hermann and zhyncs as code owners November 12, 2025 12:09

github-actions bot added the documentation Improvements or additions to documentation label Nov 12, 2025

gemini-code-assist bot reviewed Nov 12, 2025


sufeng-buaa added 4 commits November 13, 2025 12:30

Sglang Tracing: Unify tracing and req stage metrics, add trace-level and trace-module

ae001a6

Sglang Tracing: update doc

8395446

Sglang Tracing: update test cases

5dac5da

fix lint

e75d844

sufeng-buaa force-pushed the sufeng-buaa/unify-trace-metric branch from c12a958 to e75d844 Compare November 13, 2025 04:34

stmatengss added the run-ci label Nov 13, 2025

rename 'NoOpTimeRecorder' to 'NoOpStageContext'

71f9d2a

ShangmingCai reviewed Nov 13, 2025


sufeng-buaa added 3 commits November 13, 2025 17:44

sglang tracing: fix crash when enable tracing but not install otlp

a847eba

sglang tracing: rename 'Sglang***' to 'SGLang***'

cd6c8d8

Signed-off-by: Feng Su <[email protected]>

sglang tracing: fix dependence install

a1c82a8

Signed-off-by: Feng Su <[email protected]>

sufeng-buaa requested a review from Fridge003 as a code owner November 13, 2025 10:04

github-actions bot added the dependencies Pull requests that update a dependency file label Nov 13, 2025

hnyls2002 assigned hnyls2002, ShangmingCai and sufeng-buaa Nov 13, 2025

acelyc111 reviewed Nov 16, 2025


sufeng-buaa and others added 2 commits November 17, 2025 11:01

trace: add more explanations

3ef8db2

fix tracing doc

6d3735a

ShangmingCai changed the title ~~Sglang Tracing: Add trace-level, trace-module, and unify tracing/request-stage-metrics~~ SGLang Tracing: Add trace-level, trace-module, and unify tracing/request-stage-metrics Nov 17, 2025

ShangmingCai added 2 commits November 17, 2025 11:17

lint

2ed4718

Merge branch 'main' into sufeng-buaa/unify-trace-metric

9b13760

ShangmingCai reviewed Nov 17, 2025


trace: update doc

c9efa58

ShangmingCai reviewed Nov 17, 2025

