SGLang Tracing: Add trace-level, trace-module, and unify tracing/request-stage-metrics #13152
Summary of Changes
Hello @sufeng-buaa, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly overhauls the tracing and metrics collection system to enhance flexibility, reduce overhead, and broaden its applicability. By transitioning to a class-based
Highlights
Code Review
This pull request introduces a significant refactoring of the tracing system in sglang. It replaces the global tracing functions with a class-based design centered around RequestTimeRecorder, unifying tracing with request stage metrics. It also adds --trace-level and --trace-module for more granular control over tracing, replacing the old --enable-trace flag. The changes are extensive, touching documentation, server arguments, and core scheduler logic. My review focused on ensuring the new API is used consistently, the documentation is accurate, and the refactoring is sound. I've identified a few issues in the documentation that need correction and a critical typo in a function name that would lead to a runtime error. Overall, this is a solid enhancement to the project's observability features.
c12a958 to e75d844
All feedback from Bot Assist has been addressed.
python/sglang/srt/tracing/trace.py (Outdated)

    @dataclass
    class SglangTraceEvent:

Suggested change:

    -class SglangTraceEvent:
    +class SGLangTraceEvent:

nit: we should probably use the correct capitalization of SGLang.
I have renamed all 'Sglang***' to 'SGLang***'
Signed-off-by: Feng Su <[email protected]>

    stage_context.metric_trace_slice_end(RequestStage.TOKENIZER)
    ```
    - In trace_slice_end, use auto_next_anon to automatically create the next anonymous slice, which can reduce the number of instrumentation points needed.

Should it be this instead?
Suggested change:

    -- In trace_slice_end, use auto_next_anon to automatically create the next anonymous slice, which can reduce the number of instrumentation points needed.
    +- In metric_trace_slice_end, use auto_next_anon to automatically create the next anonymous slice, which can reduce the number of instrumentation points needed.

OK, I will correct it.

    parser.add_argument(
        "--trace-module",
        type=str,
        default=ServerArgs.trace_module,

What are the valid options for this argument?
The tracing package is not only used for tracking requests. For example, we are currently exploring its use in monitoring hierarchical caches. Therefore, we use the --trace-module parameter to enable tracing for specific modules. The default is "request".

        help="Enable opentelemetry trace",
        "--trace-level",
        type=int,
        default=ServerArgs.trace_level,

How about listing the choices and describing the meanings like --log-requests-level?
ok, good suggestion.
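A minimal sketch of what this suggestion could look like with plain argparse. It is not the PR's code: the `build_parser` helper, the default level of 0, and the help wording are assumptions for illustration. The thread only establishes that level 0 disables tracing, that levels 1 to 3 give progressively more detail, and that `request` is the default and currently the only supported module.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser; the real flags are wired through ServerArgs."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--trace-level",
        type=int,
        default=0,  # assumed default: tracing disabled
        choices=[0, 1, 2, 3],
        help="Tracing verbosity. 0: tracing disabled. 1 to 3: tracing "
        "enabled, with higher numbers producing more detailed traces.",
    )
    parser.add_argument(
        "--trace-module",
        type=str,
        default="request",
        choices=["request"],  # the thread says only 'request' is supported
        help="Module to trace. Currently only 'request' is supported.",
    )
    return parser


args = build_parser().parse_args(["--trace-level", "2"])
# With --enable-trace gone, "is tracing on?" is derived from the level.
tracing_enabled = args.trace_level > 0
```

Listing `choices` lets argparse reject out-of-range values at startup and shows the accepted values in `--help`, which is the behavior the reviewer points to in `--log-requests-level`.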

     # Production Request Tracing

    -SGlang exports request trace data based on the OpenTelemetry Collector. You can enable tracing by adding the `--enable-trace` and configure the OpenTelemetry Collector endpoint using `--otlp-traces-endpoint` when launching the server.
    +SGLang exports request trace data based on the OpenTelemetry Collector. You can enable tracing by adding the `--trace-level` and configure the OpenTelemetry Collector endpoint using `--otlp-traces-endpoint` when launching the server. The `--trace-level` option accepts configurable values from `1` to `3`, with higher numbers indicating more detailed tracing. Additionally, you can use `--trace-module` to specify the module to trace; currently, only `request` is supported.

QQ: since you removed --enable-trace, should we tell users here that --trace-level 0 is equivalent to disabling tracing?
ok, I have updated the doc.

    class SGLangStageContext(SGLangTraceReqContext):
        def __init__(

Actually, I am not sure about the naming here. Do we really need the SGLang prefix when developing a feature in the python/sglang dir? Seems unnecessary to me.
Maybe something like InferenceStageContext or TracingStageContext, or another more accurate option?
Hmm, as for the naming, I'm not entirely sure. This class encapsulates both tracing and metrics, so using "Tracing" as a prefix feels incomplete. My current naming is indeed not ideal. Let me think about it for a moment.
How about StageObserveContext or TraceMetricContext?
TraceMetricContext sounds better

    metric_trace_slice = metric_trace_slice_end


    class NoOpStageContext:

Does this mean "No operation"? NullStageContext sounds more accurate to me.
ok, it sounds good.
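For readers unfamiliar with the pattern under discussion, here is a minimal sketch of a null-object context. The class bodies, the string stage names, and the `make_context` factory are simplified assumptions for illustration; only the names `TraceMetricContext`, `NullStageContext`, and `metric_trace_slice_end` come from this thread.

```python
from typing import Any, Dict, List, Optional, Tuple


class TraceMetricContext:
    """Toy stand-in: records finished slices in memory instead of exporting."""

    def __init__(self) -> None:
        self.slices: List[Tuple[str, Dict[str, Any]]] = []

    def metric_trace_slice_end(
        self, stage: str, attrs: Optional[Dict[str, Any]] = None
    ) -> None:
        self.slices.append((stage, attrs or {}))


class NullStageContext:
    """Same interface as above, but every call does nothing."""

    def metric_trace_slice_end(
        self, stage: str, attrs: Optional[Dict[str, Any]] = None
    ) -> None:
        pass


def make_context(trace_level: int):
    # Hypothetical factory: callers never need an "is tracing on?" check.
    return TraceMetricContext() if trace_level > 0 else NullStageContext()


disabled = make_context(0)
disabled.metric_trace_slice_end("tokenizer")  # silently ignored

enabled = make_context(1)
enabled.metric_trace_slice_end("tokenizer")  # recorded
```

Because both classes expose the same method, instrumentation points stay unconditional and the disabled path costs one empty method call.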

        name: str,
        reqs: List,
        ts: Optional[int] = None,
        attrs: Dict[str, Any] = {},

Should this be Optional[Dict[str, Any]] = None as well?
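The concern behind this question is Python's shared mutable default. The sketch below uses two hypothetical functions (`record_shared` and `record_safe`, not part of the PR) to show the difference. Whether the reviewed function actually mutates `attrs` is not visible in this thread; the problem only appears when it does.

```python
from typing import Any, Dict, Optional


def record_shared(name: str, attrs: Dict[str, Any] = {}) -> Dict[str, Any]:
    # The default dict is created once, when the function is defined,
    # and reused by every call that omits `attrs`.
    attrs["name"] = name
    return attrs


def record_safe(name: str, attrs: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    # A fresh dict is created on each call that omits `attrs`.
    attrs = {} if attrs is None else attrs
    attrs["name"] = name
    return attrs


a = record_shared("prefill")
b = record_shared("decode")
# a and b are the same dict, so the second call overwrote a["name"].

c = record_safe("prefill")
d = record_safe("decode")
# c and d are independent dicts.
```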
ShangmingCai left a comment
typo: python/sglang/srt/tracing/trace_metric_warpper.py -> python/sglang/srt/tracing/trace_metric_wrapper.py



Motivation
This PR is a response to #10916. For details on the motivation and visual output, please refer to the issue.
To reduce span overhead, we added the trace-level feature. To support broader use cases beyond request tracing, we introduced trace-module.
Modifications
I thought about unifying TimeStat too, but it would require too many changes, so I gave up on that. Maybe I will push a draft patch later.
Instrumentation Overhead Evaluation
The overhead of each instrumentation point remains almost unchanged compared to before. See #9962 and #10804