@sufeng-buaa
Contributor

@sufeng-buaa sufeng-buaa commented Nov 12, 2025

Motivation

This PR is a response to #10916. For details on the motivation and visual output, please refer to the issue.

To reduce span overhead, we added the trace-level feature. To support broader use cases beyond request tracing, we introduced trace-module.

Modifications

  1. Refactored the tracing package from global-state functions to a class-based design with instance storage. This facilitates integration with request stage metrics and provides a hook for future dynamic instrumentation.
  2. Implemented a wrapper class, "SglangStageContext", that internally aggregates the trace context and the metric collector, collects timestamps uniformly, and routes them to different export paths based on configuration.
  3. Added a trace-level mechanism that assigns a level to each RequestStage, helping reduce excessive trace data in production environments.
  4. Added a trace-module mechanism to extend the trace package beyond request tracing, enabling its use in other modules such as hicache.
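
Items 2 and 3 can be pictured roughly as follows. This is a minimal sketch under assumptions, not the code in this PR: the names `StageRecorder`, `Stage`, `metric_sink`, and `trace_sink`, and the level assigned to each example stage, are hypothetical and chosen only for illustration.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class Stage:
    """A request stage with an assumed trace level (1 = coarse, 3 = most detailed)."""

    name: str
    level: int


# Hypothetical stages; the real RequestStage members and their levels may differ.
TOKENIZE = Stage("tokenize", level=1)
PREFILL_CHUNK = Stage("prefill_chunk", level=3)

# A sink receives (stage name, start timestamp, end timestamp).
Sink = Callable[[str, float, float], None]


@dataclass
class StageRecorder:
    """Collects one pair of timestamps per stage and routes it to the enabled sinks."""

    trace_level: int = 0  # 0 means tracing is disabled
    metric_sink: Optional[Sink] = None
    trace_sink: Optional[Sink] = None
    _open: Dict[str, float] = field(default_factory=dict)

    def slice_start(self, stage: Stage, ts: Optional[float] = None) -> None:
        self._open[stage.name] = time.monotonic() if ts is None else ts

    def slice_end(self, stage: Stage, ts: Optional[float] = None) -> None:
        end = time.monotonic() if ts is None else ts
        start = self._open.pop(stage.name, None)
        if start is None:
            return  # no matching slice_start, nothing to record
        # Metrics are exported whenever a metric sink is configured.
        if self.metric_sink is not None:
            self.metric_sink(stage.name, start, end)
        # Spans are exported only for stages at or below the configured level.
        if self.trace_sink is not None and stage.level <= self.trace_level:
            self.trace_sink(stage.name, start, end)
```

With `trace_level=1`, both stages would still reach the metric sink, but only `TOKENIZE` would produce a span, which is the intent of assigning levels to stages.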

I thought about unifying TimeStat too, but it would require too many changes, so I gave up on that. Maybe I will push a draft patch later.

Instrumentation Overhead Evaluation

The overhead of each instrumentation point remains almost unchanged compared to before. See #9962 and #10804

Checklist

@gemini-code-assist
Contributor

Summary of Changes

Hello @sufeng-buaa, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly overhauls the tracing and metrics collection system to enhance flexibility, reduce overhead, and broaden its applicability. By transitioning to a class-based RequestTimeRecorder and introducing granular control via trace levels and modules, the system can now more efficiently capture performance data and trace execution flows across various components, including cross-process and cross-node interactions. This foundational change aims to provide richer observability while allowing users to tailor the verbosity of tracing to their specific needs.

Highlights

  • Tracing System Refactor: The tracing package has been refactored from global-state functions to a class-based design, specifically introducing the RequestTimeRecorder class. This change centralizes trace context and metric collection, allowing for more flexible integration and future dynamic instrumentation.
  • Unified Metrics and Tracing: A new RequestTimeRecorder wrapper class is implemented to uniformly aggregate trace context and metric collection. This class routes timestamps to different export paths based on configuration, simplifying how both tracing and request-stage metrics are handled.
  • Trace Level Mechanism: A trace level mechanism has been added, allowing users to assign different levels (1 to 3) to each RequestStage. This enables more granular control over the amount of trace data collected, helping to reduce overhead in production environments by only capturing necessary details.
  • Trace Module Mechanism: A trace module mechanism is introduced to extend the tracing package's applicability beyond just request tracing. This allows other modules, such as hicache, to leverage the tracing framework, making it more versatile.
  • Command-Line Argument Changes: The --enable-trace command-line argument has been replaced with --trace-level (an integer from 0-3) and a new --trace-module argument to specify which module to trace (e.g., 'request'). This provides more precise control over tracing activation and scope.
  • Documentation Updates: The documentation for production request tracing (docs/references/production_request_trace.md) has been updated to reflect the new --trace-level and --trace-module options, as well as the revised API for marking request stages and propagating trace contexts using the RequestTimeRecorder.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a significant refactoring of the tracing system in sglang. It replaces the global tracing functions with a class-based design centered around RequestTimeRecorder, unifying tracing with request stage metrics. It also adds --trace-level and --trace-module for more granular control over tracing, replacing the old --enable-trace flag. The changes are extensive, touching documentation, server arguments, and core scheduler logic. My review focused on ensuring the new API is used consistently, the documentation is accurate, and the refactoring is sound. I've identified a few issues in the documentation that need correction and a critical typo in a function name that would lead to a runtime error. Overall, this is a solid enhancement to the project's observability features.

@sufeng-buaa sufeng-buaa force-pushed the sufeng-buaa/unify-trace-metric branch from c12a958 to e75d844 Compare November 13, 2025 04:34
@sufeng-buaa
Contributor Author

All feedback from Bot Assist has been addressed.

@zhanghaotong
Contributor

Hi~ I'm running your code with the following command:

python -m sglang.launch_server --trace-level 3 --otlp-traces-endpoint 0.0.0.0:4317  --model-path /mnt/modelops/models/Qwen3-8B/ --host 0.0.0.0 --log-level info  --port 8001

However, I forgot to install the OpenTelemetry packages. As a result, the engine crashed with the error shown below:
image
And perhaps we should explicitly check for the required OpenTelemetry dependencies when tracing is enabled, and raise a clear error to inform users if they are missing?

@sufeng-buaa
Contributor Author

I did forget to verify the case where OpenTelemetry is not installed but tracing is enabled. I'll fix it as soon as possible.
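
The check suggested above could look something like the following. This is only a sketch: the function name `check_tracing_dependencies` and the wording of the error are assumptions, not the fix that was actually pushed. It relies on `importlib.util.find_spec`, which returns `None` when a top-level package cannot be found.

```python
import importlib.util
from typing import Sequence


def check_tracing_dependencies(
    trace_level: int,
    required: Sequence[str] = ("opentelemetry",),
) -> None:
    """Raise a descriptive error if tracing is enabled but a package is missing."""
    if trace_level <= 0:
        return  # tracing disabled, nothing to verify
    # Only top-level package names are checked here.
    missing = [name for name in required if importlib.util.find_spec(name) is None]
    if missing:
        raise ImportError(
            f"--trace-level {trace_level} requires the following packages, "
            f"which are not installed: {', '.join(missing)}. "
            "Install them or set --trace-level 0."
        )
```

Calling this once during argument validation would turn a crash deep inside the engine into an immediate, readable startup error.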



@dataclass
class SglangTraceEvent:
Collaborator

@ShangmingCai ShangmingCai Nov 13, 2025

Suggested change
- class SglangTraceEvent:
+ class SGLangTraceEvent:

nit: we should probably use the correct uppercase and lowercase of SGLang.

Contributor Author

I have renamed all 'Sglang***' to 'SGLang***'

@github-actions github-actions bot added the dependencies Pull requests that update a dependency file label Nov 13, 2025
@sufeng-buaa
Contributor Author

Fixed

```
stage_context.metric_trace_slice_end(RequestStage.TOKENIZER)
```
- In trace_slice_end, use auto_next_anon to automatically create the next anonymous slice, which can reduce the number of instrumentation points needed.
Collaborator

Should it be?

Suggested change
- - In trace_slice_end, use auto_next_anon to automatically create the next anonymous slice, which can reduce the number of instrumentation points needed.
+ - In metric_trace_slice_end, use auto_next_anon to automatically create the next anonymous slice, which can reduce the number of instrumentation points needed.

Contributor Author

OK, I will correct it.
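
For readers unfamiliar with the `auto_next_anon` idea discussed in this thread, here is a toy model of it. The class `SliceRecorder` and its method signatures are hypothetical and much simpler than the real tracing API; the sketch only illustrates why ending a slice with `auto_next_anon` removes the need for a separate start call.

```python
from typing import List, Optional, Tuple


class SliceRecorder:
    """Toy recorder: a slice is a (name, start, end) interval on one timeline."""

    def __init__(self) -> None:
        self.slices: List[Tuple[str, float, float]] = []
        self._open_start: Optional[float] = None

    def slice_start(self, ts: float) -> None:
        # Open an anonymous slice; it receives its name when it is ended.
        self._open_start = ts

    def slice_end(self, name: str, ts: float, auto_next_anon: bool = False) -> None:
        if self._open_start is None:
            return  # no open slice, nothing to close
        self.slices.append((name, self._open_start, ts))
        # Optionally open the next anonymous slice exactly where this one ended.
        self._open_start = ts if auto_next_anon else None


rec = SliceRecorder()
rec.slice_start(ts=0.0)
rec.slice_end("tokenize", ts=1.0, auto_next_anon=True)
rec.slice_end("dispatch", ts=2.5)  # no slice_start call was needed here
```

In this model, two back-to-back stages need three instrumentation points instead of four.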

parser.add_argument(
"--trace-module",
type=str,
default=ServerArgs.trace_module,
Collaborator

What are the optional items for this argument?

Contributor Author

The tracing package is not only used for tracking requests; for example, we are currently exploring its use in monitoring hierarchical caches. Therefore, we use the --trace-module parameter to enable tracing for specific modules. The default is "request".

help="Enable opentelemetry trace",
"--trace-level",
type=int,
default=ServerArgs.trace_level,
Collaborator

How about listing the choices and describing the meanings like --log-requests-level?

Contributor Author

ok, good suggestion.
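
One way to act on this suggestion is to let `argparse` enforce the allowed values and spell out their meaning in the help text. The sketch below is an assumption about how that might look, not the final code: the default of `0`, the `choices` lists, and the help wording are illustrative, and the real arguments take their defaults from `ServerArgs`.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--trace-level",
    type=int,
    default=0,  # assumed default; the PR uses ServerArgs.trace_level
    choices=[0, 1, 2, 3],
    help="Tracing verbosity. 0: tracing disabled. 1-3: tracing enabled, "
    "with higher values recording more detailed stages.",
)
parser.add_argument(
    "--trace-module",
    type=str,
    default="request",
    choices=["request"],  # only 'request' is mentioned as supported so far
    help="Module to trace. Currently only 'request' is supported.",
)

args = parser.parse_args(["--trace-level", "2"])
```

With `choices` set, an out-of-range value such as `--trace-level 5` is rejected by the parser itself before the server starts.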

@ShangmingCai ShangmingCai changed the title Sglang Tracing: Add trace-level, trace-module, and unify tracing/request-stage-metrics SGLang Tracing: Add trace-level, trace-module, and unify tracing/request-stage-metrics Nov 17, 2025
# Production Request Tracing

- SGlang exports request trace data based on the OpenTelemetry Collector. You can enable tracing by adding the `--enable-trace` and configure the OpenTelemetry Collector endpoint using `--otlp-traces-endpoint` when launching the server.
+ SGLang exports request trace data based on the OpenTelemetry Collector. You can enable tracing by adding the `--trace-level` and configure the OpenTelemetry Collector endpoint using `--otlp-traces-endpoint` when launching the server. The `--trace-level` option accepts configurable values from `1` to `3`, with higher numbers indicating more detailed tracing. Additionally, you can use `--trace-module` to specify the module to trace; currently, only `request` is supported.
Collaborator

QQ: since you removed --enable-trace, should we inform the user --trace-level 0 is equal to setting tracing option to False here?

Contributor Author

ok, I have updated the doc.

Comment on lines +142 to +143
class SGLangStageContext(SGLangTraceReqContext):
def __init__(
Collaborator

Actually, I am not sure about the naming here. Do we really need the SGLang prefix when developing a feature in the python/sglang dir? Seems unnecessary to me.

Maybe something like InferenceStageContext or TracingStageContext, or another more accurate option?

Contributor Author

Hmm, as for the naming, I'm not entirely sure. This class encapsulates both tracing and metrics, so using "Tracing" as a prefix feels incomplete. My current naming is indeed not ideal. Let me think about it for a moment.

Contributor Author

How about StageObserveContext or TraceMetricContext?

Collaborator

TraceMetricContext sounds better

metric_trace_slice = metric_trace_slice_end


class NoOpStageContext:
Collaborator

Does this mean "No operation"? NullStageContext sounds more accurate to me.

Contributor Author

ok, it sounds good.
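
The class under discussion appears to follow the null-object pattern: when tracing and metrics are both off, callers get an object with the same methods that does nothing, so call sites need no `if enabled:` checks. A minimal sketch of that pattern, with hypothetical class and method names that are not taken from the PR:

```python
from typing import List, Tuple, Union


class RecordingStageContext:
    """Stand-in for the real context: stores every call it receives."""

    def __init__(self) -> None:
        self.events: List[Tuple[str, str, float]] = []

    def slice_start(self, stage: str, ts: float) -> None:
        self.events.append(("start", stage, ts))

    def slice_end(self, stage: str, ts: float) -> None:
        self.events.append(("end", stage, ts))


class NullStageContext:
    """Same interface as the recording context, but every method is a no-op."""

    def slice_start(self, stage: str, ts: float) -> None:
        pass

    def slice_end(self, stage: str, ts: float) -> None:
        pass


def make_stage_context(enabled: bool) -> Union[RecordingStageContext, NullStageContext]:
    # Callers use the returned object the same way in both cases.
    return RecordingStageContext() if enabled else NullStageContext()
```

Under this reading, "Null" and "NoOp" describe the same idea; the choice is purely about naming convention.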

name: str,
reqs: List,
ts: Optional[int] = None,
attrs: Dict[str, Any] = {},
Collaborator

Should this be Optional[Dict[str, Any]] = None as well?
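
The concern behind this question is Python's mutable default argument behavior: a `{}` default is created once, when the function is defined, and is then shared by every call that omits the argument. The functions below are illustrative stand-ins, not the method from the PR, and the problem only surfaces if the function (or its caller) mutates the dictionary.

```python
from typing import Any, Dict, Optional


def add_event_shared(name: str, attrs: Dict[str, Any] = {}) -> Dict[str, Any]:
    # The same dict object is reused across calls that omit `attrs`.
    attrs["name"] = name
    return attrs


def add_event_safe(name: str, attrs: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    # A fresh dict is created on each call that omits `attrs`.
    attrs = {} if attrs is None else attrs
    attrs["name"] = name
    return attrs


first = add_event_shared("a")
second = add_event_shared("b")
# `first` and `second` are the same object, so first["name"] has been overwritten.
```

Using `Optional[...] = None` with an explicit fallback avoids this sharing entirely.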

Collaborator

@ShangmingCai ShangmingCai left a comment

typo: python/sglang/srt/tracing/trace_metric_warpper.py -> python/sglang/srt/tracing/trace_metric_wrapper.py

Labels

dependencies Pull requests that update a dependency file documentation Improvements or additions to documentation run-ci

6 participants