Skip to content

Conversation

@nv-guomingz
Copy link
Collaborator

@nv-guomingz nv-guomingz commented Jul 17, 2025

It's a mirror PR of #5707 since @Superjomn is on vacation.

Summary by CodeRabbit

  • New Features
    • API parameters and methods are now visually tagged as "beta", "prototype", or "deprecated" in documentation and code, providing clearer stability indicators.
    • Documentation includes custom styles and tags to highlight feature maturity and deprecated elements.
    • Warnings are issued when unstable ("beta" or "prototype") features are used.
    • Added new API configuration options and custom roles to enhance documentation styling and clarity.
    • Introduced new API parameters for batching and log probability normalization.
  • Bug Fixes
    • Improved consistency in handling deprecated and unstable fields in documentation generation.
    • Added warnings when unsupported parameters are used with certain backends.
  • Tests
    • Enhanced tests to verify API status annotations and ensure alignment between code and reference specifications.
    • Added unit tests for API status registry functionality.
  • Documentation
    • Updated documentation to display API status tags and improved configuration for better clarity and navigation.

@nv-guomingz nv-guomingz requested review from a team as code owners July 17, 2025 04:54
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jul 17, 2025

Walkthrough

This update introduces structured API status tagging throughout the codebase and documentation. It adds status metadata (such as "beta", "prototype", "deprecated") to Pydantic model fields and API methods, updates Sphinx documentation to display these tags, and implements tests to ensure status consistency. Supporting utilities and CSS styles are included.

Changes

Files/Groups Change Summary
docs/source/_static/custom.css Added new CSS classes for tag and code element styling based on API status.
docs/source/conf.py Enhanced Sphinx config: added autodoc_pydantic options, custom tag role, included custom.css, and imported tag_llm_params().
docs/source/helper.py Added helper to get Pydantic methods; updated doc options for class docs to exclude these methods conditionally.
tensorrt_llm/llmapi/llm.py Applied @set_api_status("beta") decorator to several methods/properties; removed _LLM_REPR; updated docstring.
tensorrt_llm/llmapi/llm_args.py Introduced custom Field wrapper with status metadata; updated many field declarations with status; added new fields and warning method.
tensorrt_llm/llmapi/utils.py Added ApiParamTagger and ApiStatusRegistry classes for tagging API params and methods with status; updated docstring amendment; changed deprecated field detection.
tests/unittest/api_stability/api_stability_core.py Added optional status to snapshot dataclasses; implemented a test verifying API status consistency between code and reference YAML.
tests/unittest/api_stability/references/llm.yaml Annotated many parameters and methods with status fields; removed some parameters; marked deprecated fields.
tests/unittest/api_stability/references_committed/llm.yaml Added stream_interval parameter to the committed reference YAML.
tests/unittest/llmapi/test_utils.py Added unit test for the ApiStatusRegistry utility verifying status setting and retrieval on functions and methods.
tensorrt_llm/bench/benchmark/throughput.py Added warning and removal of unsupported batching_type parameter for PyTorch backend.

Sequence Diagram(s)

sequenceDiagram
    participant Dev as Developer
    participant Model as Pydantic Model
    participant Utils as ApiParamTagger/ApiStatusRegistry
    participant Docs as Sphinx

    Dev->>Model: Define field with Field(..., status="beta")
    Model->>Utils: Expose field metadata
    Utils->>Docs: Tag fields/methods with :tag:`beta`
    Docs->>Dev: Render documentation with status tags and custom CSS
Loading
sequenceDiagram
    participant User as User
    participant LLM as TorchLlmArgs
    participant Logger as Logger

    User->>LLM: Set field values (some with status="beta")
    LLM->>LLM: warn_on_unstable_feature_usage()
    alt Field is unstable
        LLM->>Logger: Log warning about unstable feature usage
    end
Loading

Estimated code review effort

3 (~45 minutes)

Possibly related PRs

Suggested reviewers

  • Superjomn
  • HuiGao-NV

Poem

In the docs, a tag appears—
"Beta", "prototype", warnings in your ears!
Fields and methods, status clear,
With CSS flair, the message is near.
Rabbits hop through YAML and code,
Ensuring your APIs are properly bestowed.
🐇✨

✨ Finishing Touches
  • 📝 Generate Docstrings

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@nv-guomingz nv-guomingz force-pushed the user/guomingz/mirror_5707 branch 2 times, most recently from 280e995 to 9b38c27 Compare July 17, 2025 04:55
@nv-guomingz
Copy link
Collaborator Author

/bot run

@nv-guomingz nv-guomingz changed the title User/guomingz/mirror 5707 [TRTLLM-5061] chore: add tags to API reference Jul 17, 2025
@tensorrt-cicd
Copy link
Collaborator

PR_Github #12155 [ run ] triggered by Bot

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (3)
tests/unittest/llmapi/test_utils.py (1)

4-24: Comprehensive test for ApiStatusRegistry functionality.

The test thoroughly validates the status registry for both standalone functions and class methods. Consider using unique function names instead of reusing _my_method to improve test clarity and avoid potential confusion.

 @ApiStatusRegistry.set_api_status("beta")
-def _my_method(self, *args, **kwargs):
+def _my_method_beta(self, *args, **kwargs):
     pass

-assert ApiStatusRegistry.get_api_status(_my_method) == "beta"
+assert ApiStatusRegistry.get_api_status(_my_method_beta) == "beta"

 @ApiStatusRegistry.set_api_status("prototype")
-def _my_method(self, *args, **kwargs):
+def _my_method_prototype(self, *args, **kwargs):
     pass

-assert ApiStatusRegistry.get_api_status(_my_method) == "prototype"
+assert ApiStatusRegistry.get_api_status(_my_method_prototype) == "prototype"
tensorrt_llm/llmapi/utils.py (1)

551-555: Fix docstring formatting

The docstring should follow PEP 257 conventions with proper summary and description separation.

 class ApiParamTagger:
-    ''' A helper to tag the api doc according to the status of the fields.
-    The status is set in the json_schema_extra of the field.
-    '''
+    '''Tag API documentation according to field status.
+    
+    The status is set in the json_schema_extra of the field.
+    '''
tensorrt_llm/llmapi/llm_args.py (1)

1856-1856: Fix line length violations.

Three lines exceed the 120-character limit. Please reformat these field definitions to comply with the style guidelines.

-        "Note that each CUDA graph can use up to 200 MB of extra memory.",
+        "Note that each CUDA graph can use up to 200 MB of extra memory.",
         status="beta")

-        default=False,
-        description="Disable the overlap scheduler.",
+        default=False,
+        description="Disable the overlap scheduler.",
         status="beta")

-        "If true, will iterate over sampling_params of each request and use the corresponding sampling strategy, e.g. top-k, top-p, etc.",
+        "If true, will iterate over sampling_params of each request and use the "
+        "corresponding sampling strategy, e.g. top-k, top-p, etc.",

Also applies to: 1862-1862, 1873-1873

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7e033c3 and 9b38c27.

📒 Files selected for processing (10)
  • docs/source/_static/custom.css (1 hunks)
  • docs/source/conf.py (3 hunks)
  • docs/source/helper.py (2 hunks)
  • tensorrt_llm/llmapi/llm.py (8 hunks)
  • tensorrt_llm/llmapi/llm_args.py (14 hunks)
  • tensorrt_llm/llmapi/utils.py (2 hunks)
  • tests/unittest/api_stability/api_stability_core.py (5 hunks)
  • tests/unittest/api_stability/references/llm.yaml (3 hunks)
  • tests/unittest/api_stability/references_committed/llm.yaml (1 hunks)
  • tests/unittest/llmapi/test_utils.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
tests/unittest/llmapi/test_utils.py (1)
tensorrt_llm/llmapi/utils.py (3)
  • ApiStatusRegistry (588-629)
  • set_api_status (605-618)
  • get_api_status (621-622)
tensorrt_llm/llmapi/utils.py (1)
tensorrt_llm/logger.py (1)
  • debug (143-144)
🪛 Ruff (0.12.2)
tensorrt_llm/llmapi/llm_args.py

1856-1856: Line too long (138 > 120)

(E501)


1862-1862: Line too long (142 > 120)

(E501)


1873-1873: Line too long (127 > 120)

(E501)

tensorrt_llm/llmapi/utils.py

549-551: 1 blank line required between summary line and description

(D205)


557-558: One-line docstring should fit on one line

Reformat to one line

(D200)


568-574: 1 blank line required between summary line and description

(D205)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (17)
docs/source/_static/custom.css (1)

1-26: Well-structured CSS for API status tags.

The CSS implementation is clean and follows good practices:

  • Semantic class names for different API status types
  • Good color contrast for accessibility
  • Consistent styling approach with appropriate padding and sizing
tests/unittest/api_stability/references_committed/llm.yaml (1)

93-95: Properly added stream_interval parameter.

The new stream_interval parameter follows the correct pattern with appropriate type annotation and default value.

docs/source/helper.py (2)

289-299: Clean utility function for extracting Pydantic methods.

The implementation correctly creates a dummy Pydantic model to extract inherited methods and properly excludes __init__ from the filtering.


316-327: Improved documentation generation with proper filtering.

The updates to autodoc options and selective exclusion of Pydantic methods for specific classes will result in cleaner, more focused API documentation.

docs/source/conf.py (5)

15-15: Proper import for custom role implementation.

The nodes import from docutils is necessary for implementing the custom tag role.


64-73: Well-configured autodoc_pydantic options.

The configuration options provide good control over documentation generation with proper member ordering and selective visibility of validators and signatures.


153-155: Proper CSS file inclusion.

The custom CSS file is correctly included to support the API status tag styling.


158-162: Standard implementation of custom Sphinx role.

The tag_role function follows Sphinx conventions for custom role implementation and correctly creates literal nodes with appropriate CSS classes.


168-171: Proper integration of API status tagging system.

The setup function correctly imports and calls tag_llm_params() and registers the custom tag role, integrating the API status system into the documentation build process.

tests/unittest/api_stability/api_stability_core.py (1)

407-407: LGTM!

Creating a deep copy of the reference before merging is the correct approach to preserve the non-committed API state for validation.

tensorrt_llm/llmapi/utils.py (1)

496-497: LGTM!

The change correctly updates the deprecation check to use the new unified status field approach.

tests/unittest/api_stability/references/llm.yaml (1)

1-194: LGTM!

The status annotations are consistently applied and align with the API stability tracking system implementation.

tensorrt_llm/llmapi/llm.py (2)

43-43: LGTM!

The @set_api_status("beta") decorators are correctly applied to mark the stability status of various API methods and properties.

Also applies to: 215-216, 425-426, 439-440, 453-454, 475-476, 672-673


1050-1050: LGTM!

The documentation correctly directs users to TorchLlmArgs for detailed argument information.

tensorrt_llm/llmapi/llm_args.py (3)

64-92: Well-designed custom Field wrapper implementation.

The custom Field wrapper cleanly extends Pydantic's Field to support API status metadata. The implementation correctly handles existing json_schema_extra and provides clear documentation for the status levels.


1017-1017: Consistent and appropriate status annotations applied.

The status annotations are consistently applied across fields and align well with the API stability framework. The categorization appears logical:

  • "beta" for stable but newer features
  • "prototype" for experimental features
  • "deprecated" for fields with deprecation warnings

Also applies to: 1023-1023, 1034-1036, 1039-1040, 1053-1055, 1057-1062, 1067-1069, 1096-1097, 1101-1102, 1106-1106, 1109-1110, 1113-1115, 1137-1140, 1146-1147, 1151-1152, 1156-1157, 1179-1179, 1837-1839, 1846-1847, 1856-1857, 1861-1862, 1865-1866, 1869-1870, 1875-1876, 1881-1882, 1885-1887, 1892-1893, 1896-1897, 1900-1900, 1904-1905, 1908-1910, 1922-1922, 1936-1936, 1943-1944


2132-2150: Effective implementation of unstable feature warnings.

The method properly identifies explicitly set fields with unstable status and provides clear warnings to users. The implementation correctly:

  • Uses exclude_unset=True to only check user-provided values
  • Extracts status from field metadata safely
  • Warns only for unstable statuses ("beta", "prototype")
  • Provides actionable warning messages

class ParamSnapshot:
annotation: type
default: Any = None
status: Optional[str] = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Add status field handling in serialization methods

The new status field should be included in the to_dict() method to ensure it's properly serialized when saving snapshots.

Apply this diff to fix the serialization:

 def to_dict(self):
     d = {f.name: getattr(self, f.name) for f in fields(self)}
     d['annotation'] = repr_annotation(d['annotation'])
     if d['default'] == inspect._empty:
         d['default'] = "inspect._empty"
+    if d['status'] is None:
+        del d['status']
     return d

Also applies to: 115-121

🤖 Prompt for AI Agents
In tests/unittest/api_stability/api_stability_core.py at lines 76 and 115-121,
the new 'status' field is declared but not included in the serialization
methods. Update the to_dict() method to add the 'status' field so it is properly
serialized when saving snapshots. Ensure that any deserialization or related
methods also handle the 'status' field accordingly.

class MethodSnapshot:
parameters: Dict[str, ParamSnapshot]
return_annotation: type
status: Optional[str] = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Add status field handling in serialization methods

The new status field should be included in the to_dict() method to ensure it's properly serialized when saving snapshots.

Apply this diff to fix the serialization:

 def to_dict(self):
     d = {f.name: getattr(self, f.name) for f in fields(self)}
     d['parameters'] = {
         name: param.to_dict()
         for name, param in d['parameters'].items()
     }
     d['return_annotation'] = repr_annotation(d['return_annotation'])
+    if d['status'] is None:
+        del d['status']
     return d

Also applies to: 170-177

🤖 Prompt for AI Agents
In tests/unittest/api_stability/api_stability_core.py at lines 133 and 170-177,
the new 'status' field is declared but not included in the serialization
methods. Update the to_dict() method to add the 'status' field so it is properly
serialized when saving snapshots. Ensure that any other serialization or
deserialization methods in these lines also handle the 'status' field
accordingly.

if method_name == "__init__":
for param_name, param_data in method_data.get('parameters',
{}).items():
print(f"param_name: {param_name}, param_data: {param_data}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Remove debug print statement

Debug print statements should not be committed to production test code.

-                    print(f"param_name: {param_name}, param_data: {param_data}")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
print(f"param_name: {param_name}, param_data: {param_data}")
🤖 Prompt for AI Agents
In tests/unittest/api_stability/api_stability_core.py at line 529, remove the
debug print statement that outputs param_name and param_data, as debug prints
should not be present in committed test code.

def my_method(self, *args, **kwargs):
pass
'''
method_to_status = {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Thread safety concern with mutable class variable

Using a mutable class variable method_to_status could cause race conditions in multi-threaded environments.

Consider using a thread-safe approach:

import threading

class ApiStatusRegistry:
    _lock = threading.Lock()
    method_to_status = {}
    
    @classmethod
    def set_api_status(cls, status: str):
        def decorator(func):
            with cls._lock:
                if func.__qualname__ in cls.method_to_status:
                    logger.debug(
                        f"Method {func.__qualname__} already has a status, skipping the decorator"
                    )
                    return func
                cls.method_to_status[func.__qualname__] = status
            func.__doc__ = cls.amend_api_doc_with_status_tags(func)
            return func
        return decorator
🤖 Prompt for AI Agents
In tensorrt_llm/llmapi/utils.py at line 605, the mutable class variable
method_to_status is not thread-safe and may cause race conditions. To fix this,
introduce a threading.Lock as a class-level lock and wrap all accesses and
modifications to method_to_status within this lock to ensure thread safety.
Update the decorator to acquire the lock before checking or setting
method_to_status entries and release it afterward.

Comment on lines +635 to +636
set_api_status = ApiStatusRegistry().set_api_status
get_api_status = ApiStatusRegistry().get_api_status
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Incorrect singleton instantiation

The class uses classmethods but creates an instance. Either use instance methods or remove the instantiation.

-set_api_status = ApiStatusRegistry().set_api_status
-get_api_status = ApiStatusRegistry().get_api_status
+set_api_status = ApiStatusRegistry.set_api_status
+get_api_status = ApiStatusRegistry.get_api_status
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
set_api_status = ApiStatusRegistry().set_api_status
get_api_status = ApiStatusRegistry().get_api_status
set_api_status = ApiStatusRegistry.set_api_status
get_api_status = ApiStatusRegistry.get_api_status
🤖 Prompt for AI Agents
In tensorrt_llm/llmapi/utils.py around lines 635 to 636, the code incorrectly
creates an instance of ApiStatusRegistry to call classmethods. Remove the
instantiation and call set_api_status and get_api_status directly on the
ApiStatusRegistry class without creating an object.

@tensorrt-cicd
Copy link
Collaborator

PR_Github #12155 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #9027 completed with status: 'FAILURE'

@nv-guomingz nv-guomingz force-pushed the user/guomingz/mirror_5707 branch from 9b38c27 to fdc6c35 Compare July 17, 2025 14:47
@nv-guomingz
Copy link
Collaborator Author

/bot run --stage-list "A100X-PyTorch-1"

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (2)
tensorrt_llm/llmapi/utils.py (2)

605-605: Thread safety concern with mutable class variable

Using a mutable class variable method_to_status could cause race conditions in multi-threaded environments.


635-636: Incorrect singleton instantiation

The class uses classmethods but creates an instance. Either use instance methods or remove the instantiation.

🧹 Nitpick comments (4)
docs/source/helper.py (1)

289-299: Consider caching the Pydantic methods list for better performance.

The function creates a dummy Pydantic BaseModel class every time it's called, which could be inefficient if called multiple times during documentation generation.

Consider caching the result:

+@cache
 def get_pydantic_methods() -> list[str]:
     from pydantic import BaseModel

     class Dummy(BaseModel):
         pass

     methods = set(
         [method for method in dir(Dummy) if not method.startswith('_')])
     methods.discard("__init__")
     return list(methods)

Alternatively, use a more direct approach:

+from functools import lru_cache
+
+@lru_cache(maxsize=1)
 def get_pydantic_methods() -> list[str]:
     from pydantic import BaseModel
-
-    class Dummy(BaseModel):
-        pass
-
-    methods = set(
-        [method for method in dir(Dummy) if not method.startswith('_')])
-    methods.discard("__init__")
-    return list(methods)
+    
+    # Get methods directly from BaseModel without creating a dummy class
+    methods = set(method for method in dir(BaseModel) if not method.startswith('_'))
+    methods.discard("__init__")
+    return list(methods)
tensorrt_llm/llmapi/utils.py (2)

551-589: The ApiParamTagger implementation looks solid.

The class correctly processes Pydantic model fields and adds status tags to their descriptions. The implementation properly rebuilds the model after modifying field descriptions.

Consider adding some validation for the status values:

 def amend_pydantic_field_description_with_tags(self, cls: Type[BaseModel],
                                                field_names: list[str],
                                                tag: str) -> None:
+    # Validate status tag
+    valid_statuses = {"beta", "prototype", "deprecated", "stable"}
+    if tag not in valid_statuses:
+        logger.warning(f"Unknown status tag: {tag}. Valid values: {valid_statuses}")
+        
     assert field_names
     for field_name in field_names:

591-633: Address static analysis hints for docstring formatting.

The static analysis tools correctly identified docstring formatting issues.

Apply these fixes for better docstring formatting:

 class ApiParamTagger:
-    ''' A helper to tag the api doc according to the status of the fields.
+    """A helper to tag the api doc according to the status of the fields.
+    
     The status is set in the json_schema_extra of the field.
-    '''
+    """

     def __call__(self, cls: Type[BaseModel]) -> None:
-        self.process_pydantic_model(cls)
+        """Process the Pydantic model to add tags to the fields."""
+        self.process_pydantic_model(cls)

     def process_pydantic_model(self, cls: Type[BaseModel]) -> None:
-        """Process the Pydantic model to add tags to the fields.
-        """
+        """Process the Pydantic model to add tags to the fields."""

     def amend_pydantic_field_description_with_tags(self, cls: Type[BaseModel],
                                                    field_names: list[str],
                                                    tag: str) -> None:
-        """Amend the description of the fields with tags.
+        """Amend the description of the fields with tags.
+        
         e.g. :tag:`beta` or :tag:`prototype`
+        
         Args:
             cls: The Pydantic BaseModel class.
             field_names: The names of the fields to amend.
             tag: The tag to add to the fields.
         """
tensorrt_llm/llmapi/llm_args.py (1)

1864-1881: Fix line length violations in field descriptions.

Several field descriptions exceed the 120 character limit. Consider breaking these into multiple lines for better readability and code style consistency.

For example, line 1864 could be reformatted as:

         description="CUDA graph config.If true, use CUDA graphs for decoding. \
         CUDA graphs are only created for the batch sizes in cuda_graph_config.batch_sizes, \
         and are enabled for batches that consist of decoding requests *only* \
         (the reason is that it's hard to capture a single graph with prefill requests \
         since the input shapes are a function of the sequence lengths).\
-         Note that each CUDA graph can use up to 200 MB of extra memory.",
+         Note that each CUDA graph can use up to 200 MB of extra memory.",

Similar adjustments can be made to lines 1870 and 1881.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9b38c27 and fdc6c35.

📒 Files selected for processing (10)
  • docs/source/_static/custom.css (1 hunks)
  • docs/source/conf.py (3 hunks)
  • docs/source/helper.py (2 hunks)
  • tensorrt_llm/llmapi/llm.py (8 hunks)
  • tensorrt_llm/llmapi/llm_args.py (14 hunks)
  • tensorrt_llm/llmapi/utils.py (2 hunks)
  • tests/unittest/api_stability/api_stability_core.py (5 hunks)
  • tests/unittest/api_stability/references/llm.yaml (3 hunks)
  • tests/unittest/api_stability/references_committed/llm.yaml (1 hunks)
  • tests/unittest/llmapi/test_utils.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • tensorrt_llm/llmapi/llm.py
🚧 Files skipped from review as they are similar to previous changes (6)
  • docs/source/_static/custom.css
  • tests/unittest/api_stability/references_committed/llm.yaml
  • tests/unittest/llmapi/test_utils.py
  • tests/unittest/api_stability/api_stability_core.py
  • tests/unittest/api_stability/references/llm.yaml
  • docs/source/conf.py
🧰 Additional context used
🪛 Ruff (0.12.2)
tensorrt_llm/llmapi/llm_args.py

1864-1864: Line too long (138 > 120)

(E501)


1870-1870: Line too long (142 > 120)

(E501)


1881-1881: Line too long (127 > 120)

(E501)

tensorrt_llm/llmapi/utils.py

549-551: 1 blank line required between summary line and description

(D205)


557-558: One-line docstring should fit on one line

Reformat to one line

(D200)


568-574: 1 blank line required between summary line and description

(D205)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (8)
docs/source/helper.py (1)

316-327: The autodoc options refactoring looks good.

The changes properly standardize the autodoc options and add filtering for Pydantic methods on specific classes. This should improve the documentation clarity by hiding irrelevant inherited methods.

tensorrt_llm/llmapi/utils.py (1)

496-496: LGTM! Field name change aligns with the new status system.

The change from checking "deprecated" to "status" == "deprecated" is consistent with the new unified status tagging system introduced in this PR.

tensorrt_llm/llmapi/llm_args.py (6)

64-92: Well-designed custom Field wrapper for API status tagging.

The implementation correctly extends Pydantic's Field with status metadata support. The function properly handles existing json_schema_extra values and provides clear documentation for the status levels.


1025-1187: Comprehensive API status tagging enhances user guidance.

The status annotations on BaseLlmArgs fields provide clear indicators of API stability:

  • beta status for features like gpus_per_node and enable_attention_dp
  • prototype status for experimental features like cp_config and various stats options
  • deprecated status for fields being phased out like max_lora_rank and backend

This systematic approach helps users make informed decisions about which features to use in production.


1731-1735: New fields properly implemented.

The addition of batching_type and normalize_log_probs fields to TrtLlmArgs follows the established patterns with appropriate type annotations, default values, and descriptions.


1846-1952: Appropriate status annotations for PyTorch backend features.

The status annotations on TorchLlmArgs fields properly reflect the experimental nature of many PyTorch backend features:

  • beta status for features like cuda_graph_config and moe_config
  • prototype status for experimental features like torch_compile_config
  • deprecated status for build_config being phased out

This helps users understand the maturity level of PyTorch backend features.


2140-2158: Well-implemented warning system for unstable features.

The warn_on_unstable_feature_usage method provides valuable user feedback:

  • Only warns about explicitly set fields using exclude_unset=True
  • Properly checks for status metadata in json_schema_extra
  • Provides clear warning messages for beta and prototype features
  • Helps users make informed decisions about feature stability

64-2158: Excellent implementation of API stability tagging system.

This comprehensive update introduces a well-designed API stability framework that:

  1. Extends Pydantic seamlessly - The custom Field wrapper integrates naturally with existing code
  2. Provides clear stability indicators - Status levels (prototype, beta, deprecated) are well-defined
  3. Offers user guidance - The warning system helps users understand feature maturity
  4. Maintains consistency - Status annotations are applied systematically across all argument classes

The implementation demonstrates thoughtful design that enhances the user experience while maintaining backward compatibility.

@tensorrt-cicd
Copy link
Collaborator

PR_Github #12212 [ run ] triggered by Bot

@tensorrt-cicd
Copy link
Collaborator

PR_Github #12212 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #9068 (Partly Tested) completed with status: 'FAILURE'

@nv-guomingz nv-guomingz force-pushed the user/guomingz/mirror_5707 branch from fdc6c35 to 9d0ea69 Compare July 17, 2025 16:33
@nv-guomingz
Copy link
Collaborator Author

/bot run --disable-fail-fast

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (5)
tests/unittest/api_stability/api_stability_core.py (3)

76-76: Status field serialization needs to be handled.

The new status field should be included in the to_dict() method to ensure it's properly serialized when saving snapshots.


133-133: Status field serialization needs to be handled.

The new status field should be included in the to_dict() method to ensure it's properly serialized when saving snapshots.


529-529: Remove debug print statement.

Debug print statements should not be committed to production test code.

tensorrt_llm/llmapi/utils.py (2)

605-605: Thread safety concern with mutable class variable.

Using a mutable class variable method_to_status could cause race conditions in multi-threaded environments. Consider using a thread-safe approach with locks.


635-636: Incorrect singleton instantiation.

The code incorrectly creates an instance of ApiStatusRegistry to call classmethods. The methods should be called directly on the class.

🧹 Nitpick comments (2)
tensorrt_llm/llmapi/utils.py (1)

551-588: Well-designed API parameter tagging functionality.

The ApiParamTagger class provides a clean way to add status tags to Pydantic model fields. The implementation correctly:

  • Processes model fields with status metadata
  • Amends field descriptions with appropriate tags
  • Rebuilds the model schema

Consider minor docstring formatting improvements as suggested by static analysis:

-    ''' A helper to tag the api doc according to the status of the fields.
-    The status is set in the json_schema_extra of the field.
-    '''
+    """A helper to tag the api doc according to the status of the fields.
+    
+    The status is set in the json_schema_extra of the field.
+    """
tensorrt_llm/llmapi/llm_args.py (1)

1864-1864: Address line length violations for code style consistency.

The static analysis tool identified three lines that exceed the 120-character limit. Consider reformatting these field descriptions to comply with the length limit.

For line 1864, consider:

-        "Note that each CUDA graph can use up to 200 MB of extra memory.",
+        "Note that each CUDA graph can use up to 200 MB of extra memory.",

For line 1870, break the description:

-        description="Disable the overlap scheduler.",
+        description="Disable the overlap scheduler.",

For line 1881, break the description:

-        "If true, will iterate over sampling_params of each request and use the corresponding sampling strategy, e.g. top-k, top-p, etc.",
+        "If true, will iterate over sampling_params of each request and use the "
+        "corresponding sampling strategy, e.g. top-k, top-p, etc.",

Also applies to: 1870-1870, 1881-1881

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fdc6c35 and 9d0ea69.

📒 Files selected for processing (10)
  • docs/source/_static/custom.css (1 hunks)
  • docs/source/conf.py (3 hunks)
  • docs/source/helper.py (2 hunks)
  • tensorrt_llm/llmapi/llm.py (8 hunks)
  • tensorrt_llm/llmapi/llm_args.py (14 hunks)
  • tensorrt_llm/llmapi/utils.py (2 hunks)
  • tests/unittest/api_stability/api_stability_core.py (5 hunks)
  • tests/unittest/api_stability/references/llm.yaml (3 hunks)
  • tests/unittest/api_stability/references_committed/llm.yaml (1 hunks)
  • tests/unittest/llmapi/test_utils.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (7)
  • docs/source/_static/custom.css
  • tests/unittest/api_stability/references_committed/llm.yaml
  • tests/unittest/llmapi/test_utils.py
  • docs/source/helper.py
  • docs/source/conf.py
  • tests/unittest/api_stability/references/llm.yaml
  • tensorrt_llm/llmapi/llm.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
tensorrt_llm/llmapi/utils.py (1)
tensorrt_llm/logger.py (1)
  • debug (143-144)
🪛 Ruff (0.12.2)
tensorrt_llm/llmapi/llm_args.py

1864-1864: Line too long (138 > 120)

(E501)


1870-1870: Line too long (142 > 120)

(E501)


1881-1881: Line too long (127 > 120)

(E501)

tensorrt_llm/llmapi/utils.py

549-551: 1 blank line required between summary line and description

(D205)


557-558: One-line docstring should fit on one line

Reformat to one line

(D200)


568-574: 1 blank line required between summary line and description

(D205)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (14)
tests/unittest/api_stability/api_stability_core.py (3)

7-7: Import added for debugging - remove if debug prints are cleaned up.

The pprint import appears to be used for debugging purposes. If debug print statements are removed from the code, this import should also be removed.


407-407: Good addition for isolating non-committed API data.

Creating a deep copy of the reference ensures proper isolation when working with non-committed API data for status validation.


452-532: Well-structured API status validation test.

The test method effectively validates API status consistency between live code and reference YAML. The logic properly handles:

  • Filtering for llm.yaml reference file only
  • Checking non-committed APIs
  • Validating both method and parameter statuses
  • Providing clear error messages for mismatches

The implementation is comprehensive and serves its purpose well.

tensorrt_llm/llmapi/utils.py (3)

496-497: Correct update to status field handling.

The change from checking a boolean deprecated field to checking status == "deprecated" is consistent with the new unified status field approach.


591-633: Good API status management functionality.

The ApiStatusRegistry class provides a clean interface for managing API method statuses with:

  • Decorator-based status assignment
  • Automatic docstring amendment
  • Clear retrieval methods

The functionality is well-designed for its intended purpose.


586-588: Clean utility function for parameter tagging.

The tag_llm_params() function provides a simple and clean interface for applying status tagging to LLM parameters.

tensorrt_llm/llmapi/llm_args.py (8)

64-92: Well-implemented custom Field wrapper with clear status documentation.

The custom Field function correctly extends Pydantic's Field to add status metadata to the JSON schema. The implementation properly handles existing json_schema_extra values and provides clear documentation of the status levels.


1025-1077: Consistent and appropriate status assignments for existing fields.

The status annotations properly reflect the maturity levels of the fields:

  • Beta features are correctly marked for features that are recommended but not fully stable
  • Prototype features are marked for experimental functionality
  • Deprecated fields include both status annotation and deprecation messages

1104-1124: Appropriate prototype status assignments for experimental features.

The prototype status is correctly applied to experimental features like iter stats, PEFT cache, scheduler config, and cache transceiver config, which aligns with their experimental nature.


1187-1187: Correct deprecation status for backend field.

The deprecated status is appropriate for the backend field, which is already excluded from JSON schema and appears to be legacy functionality.


1731-1735: Well-implemented new fields with appropriate types and defaults.

The new batching_type and normalize_log_probs fields are correctly typed with sensible defaults and clear descriptions. The absence of status annotations indicates they're considered stable features.


1845-1952: Comprehensive and appropriate status assignments for PyTorch backend features.

The status annotations correctly reflect the maturity levels of PyTorch backend features:

  • Beta status for features that are recommended but still maturing
  • Prototype status for experimental features like autotuner and torch compile config
  • Deprecated status for build_config which is being phased out

2140-2158: Excellent runtime warning implementation for unstable features.

The warn_on_unstable_feature_usage method provides valuable user feedback by:

  • Only checking explicitly set fields (using exclude_unset=True)
  • Properly accessing status metadata from field info
  • Providing clear warnings for beta and prototype features
  • Helping users understand the stability implications of their configuration

64-2158: Comprehensive and well-executed API status tagging implementation.

This implementation effectively establishes a unified mechanism for API status annotation:

  • Custom Field wrapper seamlessly extends Pydantic functionality
  • Status levels are consistently applied across all field types
  • Runtime warnings help users understand stability implications
  • The approach integrates well with the broader API status tagging effort described in the AI summary

The implementation demonstrates good architectural design and attention to user experience.

@tensorrt-cicd
Copy link
Collaborator

PR_Github #12220 [ run ] triggered by Bot

@tensorrt-cicd
Copy link
Collaborator

PR_Github #12220 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #9074 completed with status: 'FAILURE'

@nv-guomingz nv-guomingz force-pushed the user/guomingz/mirror_5707 branch from 9d0ea69 to 53103d7 Compare July 18, 2025 11:25
@nv-guomingz
Copy link
Collaborator Author

/bot run

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (2)
tests/unittest/api_stability/api_stability_core.py (1)

452-532: Remove debug print statement and consider thread safety

The test implementation is comprehensive and correctly validates API status consistency. However, there's a debug print statement that should be removed.

The debug print statement on line 529 should be removed as noted in previous reviews.

Additionally, consider the thread safety implications when accessing the ApiStatusRegistry.method_to_status dictionary, as noted in the utils.py review comments.

-                    print(f"param_name: {param_name}, param_data: {param_data}")
tensorrt_llm/llmapi/utils.py (1)

591-636: Fix thread safety and singleton instantiation issues

The ApiStatusRegistry implementation has two critical issues identified in previous reviews that need to be addressed:

  1. Thread safety concern with the mutable class variable
  2. Incorrect singleton instantiation pattern

These issues were previously identified but still need to be fixed:

+import threading
+
 class ApiStatusRegistry:
-    ''' A registry to store the status of the api.
+    """A registry to store the status of the api.

     usage:

     @ApiStatusRegistry.set_api_status("beta")
     def my_method(self, *args, **kwargs):
         pass

     class App:
         @ApiStatusRegistry.set_api_status("beta")
         def my_method(self, *args, **kwargs):
             pass
-    '''
+    """
+    _lock = threading.Lock()
     method_to_status = {}

     @classmethod
     def set_api_status(cls, status: str):
         def decorator(func):
+            with cls._lock:
                 # Use qualified name to support class methods
                 if func.__qualname__ in cls.method_to_status:
                     logger.debug(
                         f"Method {func.__qualname__} already has a status, skipping the decorator"
                     )
                     return func
                 cls.method_to_status[func.__qualname__] = status
             func.__doc__ = cls.amend_api_doc_with_status_tags(func)
             return func
         return decorator

     @classmethod
     def get_api_status(cls, method: Callable) -> Optional[str]:
+        with cls._lock:
             return cls.method_to_status.get(method.__qualname__, None)

-set_api_status = ApiStatusRegistry().set_api_status
-get_api_status = ApiStatusRegistry().get_api_status
+set_api_status = ApiStatusRegistry.set_api_status
+get_api_status = ApiStatusRegistry.get_api_status
🧹 Nitpick comments (3)
tests/unittest/api_stability/api_stability_core.py (1)

7-7: Unused import should be removed

The pprint import is not used anywhere in the code and should be removed to keep the imports clean.

-from pprint import pprint
tensorrt_llm/llmapi/utils.py (1)

551-588: ApiParamTagger implementation needs docstring formatting fixes

The ApiParamTagger class implementation is functionally correct and provides a clean way to add status tags to Pydantic model fields. However, there are some docstring formatting issues that should be addressed.

Apply these fixes for better docstring formatting:

 class ApiParamTagger:
-    ''' A helper to tag the api doc according to the status of the fields.
+    """A helper to tag the api doc according to the status of the fields.
+    
     The status is set in the json_schema_extra of the field.
-    '''
+    """

     def __call__(self, cls: Type[BaseModel]) -> None:
         self.process_pydantic_model(cls)

     def process_pydantic_model(self, cls: Type[BaseModel]) -> None:
         """Process the Pydantic model to add tags to the fields.
+        
         """
tensorrt_llm/llmapi/llm_args.py (1)

1864-1864: Fix line length violations.

Several lines exceed the 120-character limit. Consider breaking these long description strings for better readability.

For line 1864:

-        description="CUDA graph config.If true, use CUDA graphs for decoding. \
-        CUDA graphs are only created for the batch sizes in cuda_graph_config.batch_sizes, \
-        and are enabled for batches that consist of decoding requests *only* \
-        (the reason is that it's hard to capture a single graph with prefill requests \
-        since the input shapes are a function of the sequence lengths).\
-         Note that each CUDA graph can use up to 200 MB of extra memory.",
+        description="CUDA graph config. If true, use CUDA graphs for decoding. "
+        "CUDA graphs are only created for the batch sizes in cuda_graph_config.batch_sizes, "
+        "and are enabled for batches that consist of decoding requests *only* "
+        "(the reason is that it's hard to capture a single graph with prefill requests "
+        "since the input shapes are a function of the sequence lengths). "
+        "Note that each CUDA graph can use up to 200 MB of extra memory.",

Similar formatting should be applied to lines 1870 and 1881.

Also applies to: 1870-1870, 1881-1881

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9d0ea69 and 53103d7.

📒 Files selected for processing (10)
  • docs/source/_static/custom.css (1 hunks)
  • docs/source/conf.py (3 hunks)
  • docs/source/helper.py (2 hunks)
  • tensorrt_llm/llmapi/llm.py (8 hunks)
  • tensorrt_llm/llmapi/llm_args.py (14 hunks)
  • tensorrt_llm/llmapi/utils.py (2 hunks)
  • tests/unittest/api_stability/api_stability_core.py (5 hunks)
  • tests/unittest/api_stability/references/llm.yaml (3 hunks)
  • tests/unittest/api_stability/references_committed/llm.yaml (1 hunks)
  • tests/unittest/llmapi/test_utils.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • tensorrt_llm/llmapi/llm.py
🚧 Files skipped from review as they are similar to previous changes (6)
  • docs/source/_static/custom.css
  • tests/unittest/llmapi/test_utils.py
  • tests/unittest/api_stability/references_committed/llm.yaml
  • docs/source/helper.py
  • tests/unittest/api_stability/references/llm.yaml
  • docs/source/conf.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
tensorrt_llm/llmapi/utils.py (1)
tensorrt_llm/logger.py (1)
  • debug (143-144)
🪛 Ruff (0.12.2)
tensorrt_llm/llmapi/llm_args.py

1864-1864: Line too long (138 > 120)

(E501)


1870-1870: Line too long (142 > 120)

(E501)


1881-1881: Line too long (127 > 120)

(E501)

tensorrt_llm/llmapi/utils.py

549-551: 1 blank line required between summary line and description

(D205)


557-558: One-line docstring should fit on one line

Reformat to one line

(D200)


568-574: 1 blank line required between summary line and description

(D205)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (8)
tests/unittest/api_stability/api_stability_core.py (3)

76-76: Status field addition looks good

The addition of the optional status field to track API status metadata is well-implemented and aligns with the API status tagging system.


133-133: Status field addition looks good

The addition of the optional status field to MethodSnapshot is consistent with the ParamSnapshot implementation and properly supports the API status tracking functionality.


407-407: Deep copy for non-committed reference is appropriate

Creating a deep copy of the reference as non_committed_reference is the correct approach to isolate non-committed API data from committed APIs, enabling proper status validation.

tensorrt_llm/llmapi/utils.py (1)

496-496: Deprecated field check update is correct

The change to check for deprecated fields using field_info.get("status", None) == "deprecated" is appropriate and consistent with the new status tagging system.

tensorrt_llm/llmapi/llm_args.py (4)

64-91: Well-implemented custom Field wrapper for API status tagging.

The custom Field function effectively extends Pydantic's Field to support status metadata. The implementation correctly handles both existing and new json_schema_extra dictionaries, and the type annotations properly constrain the status values to valid options.


1025-1025: Consistent and logical API status annotations applied.

The status annotations throughout the file are well-chosen and consistent:

  • Deprecated: Legacy fields being phased out (max_lora_rank, backend, build_config)
  • Beta: Stable newer features recommended for use (gpus_per_node, enable_attention_dp)
  • Prototype: Experimental features not recommended for production (cp_config, various stats configurations)

The annotations align well with the broader API stability framework and will help users understand feature maturity levels.

Also applies to: 1031-1031, 1043-1044, 1047-1048, 1061-1063, 1065-1070, 1075-1077, 1104-1105, 1109-1110, 1114-1114, 1117-1118, 1121-1123, 1146-1148, 1154-1155, 1159-1160, 1164-1165, 1187-1187, 1845-1847, 1854-1855, 1864-1865, 1868-1870, 1873-1874, 1877-1878, 1883-1884, 1889-1890, 1894-1895, 1900-1901, 1904-1905, 1908-1908, 1912-1913, 1916-1918, 1930-1930, 1944-1944, 1951-1952


1731-1735: New fields properly implemented.

The new batching_type and normalize_log_probs fields are well-implemented with appropriate types, defaults, and descriptions. Since they lack status annotations, they are correctly treated as stable features.


2140-2158: Effective implementation of unstable feature warning system.

The warn_on_unstable_feature_usage method properly identifies explicitly set fields and warns users about beta/prototype features. The implementation correctly uses model_dump(exclude_unset=True) to focus only on user-configured fields and provides clear warnings about feature stability.

@tensorrt-cicd
Copy link
Collaborator

PR_Github #12312 [ run ] triggered by Bot

@tensorrt-cicd
Copy link
Collaborator

PR_Github #12312 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #9146 completed with status: 'FAILURE'

@nv-guomingz nv-guomingz force-pushed the user/guomingz/mirror_5707 branch 2 times, most recently from b67933b to e03f790 Compare July 22, 2025 04:21
@nv-guomingz
Copy link
Collaborator Author

/bot run

@tensorrt-cicd
Copy link
Collaborator

PR_Github #12518 [ run ] triggered by Bot

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (5)
tests/unittest/api_stability/api_stability_core.py (3)

76-76: Add status field handling in serialization methods

The new status field should be included in the to_dict() method to ensure it's properly serialized when saving snapshots.


133-133: Add status field handling in serialization methods

The new status field should be included in the to_dict() method to ensure it's properly serialized when saving snapshots.


529-529: Remove debug print statement

Debug print statements should not be committed to production test code.

tensorrt_llm/llmapi/utils.py (2)

605-605: Thread safety concern with mutable class variable

Using a mutable class variable method_to_status could cause race conditions in multi-threaded environments.


635-636: Incorrect singleton instantiation

The class uses classmethods but creates an instance. Either use instance methods or remove the instantiation.

🧹 Nitpick comments (3)
tests/unittest/api_stability/api_stability_core.py (1)

7-7: Remove unused import

The pprint import doesn't appear to be used anywhere in the file. Consider removing it to keep the imports clean.

-from pprint import pprint
tensorrt_llm/llmapi/utils.py (1)

551-584: Well-implemented API parameter tagging functionality

The ApiParamTagger class effectively handles status tagging for Pydantic model fields. The logic correctly reads status from json_schema_extra metadata and amends field descriptions with appropriate tags.

Consider fixing the docstring formatting issues flagged by static analysis:

-    ''' A helper to tag the api doc according to the status of the fields.
-    The status is set in the json_schema_extra of the field.
-    '''
+    """A helper to tag the api doc according to the status of the fields.
+    
+    The status is set in the json_schema_extra of the field.
+    """
tensorrt_llm/llmapi/llm_args.py (1)

1868-1870: Good status annotations for PyTorch backend fields, but fix line length violations.

The status annotations properly categorize PyTorch-specific fields by their maturity levels. However, there are line length violations that need to be addressed.

Apply these fixes for the line length violations:

-        "Lower values trigger more frequent garbage collection.",
-        status="beta")
+        "Lower values trigger more frequent garbage collection.",
+        status="beta")

-        "CUDA graph config.If true, use CUDA graphs for decoding. \
-        CUDA graphs are only created for the batch sizes in cuda_graph_config.batch_sizes, \
-        and are enabled for batches that consist of decoding requests *only* \
-        (the reason is that it's hard to capture a single graph with prefill requests \
-        since the input shapes are a function of the sequence lengths).\
-         Note that each CUDA graph can use up to 200 MB of extra memory.",
+        "CUDA graph config. If true, use CUDA graphs for decoding. "
+        "CUDA graphs are only created for the batch sizes in cuda_graph_config.batch_sizes, "
+        "and are enabled for batches that consist of decoding requests *only* "
+        "(the reason is that it's hard to capture a single graph with prefill requests "
+        "since the input shapes are a function of the sequence lengths). "
+        "Note that each CUDA graph can use up to 200 MB of extra memory.",

-        "If true, will iterate over sampling_params of each request and use the corresponding sampling strategy, e.g. top-k, top-p, etc.",
+        "If true, will iterate over sampling_params of each request and use the "
+        "corresponding sampling strategy, e.g. top-k, top-p, etc.",

-        "If true, will use the TRTLLM sampler instead of the PyTorch sampler. The TRTLLM sampler has a wide coverage of sampling strategies.",
+        "If true, will use the TRTLLM sampler instead of the PyTorch sampler. "
+        "The TRTLLM sampler has a wide coverage of sampling strategies.",

-        "If true, enables per request stats per iteration. Must also set enable_iter_perf_stats to true to get request stats.",
+        "If true, enables per request stats per iteration. "
+        "Must also set enable_iter_perf_stats to true to get request stats.",

Also applies to: 1877-1878, 1887-1888, 1891-1893, 1896-1897, 1900-1901, 1906-1907, 1912-1913, 1916-1918, 1923-1924, 1927-1928, 1931-1931, 1935-1936, 1939-1941, 1953-1953, 1967-1967, 1974-1975

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 53103d7 and e03f790.

📒 Files selected for processing (10)
  • docs/source/_static/custom.css (1 hunks)
  • docs/source/conf.py (3 hunks)
  • docs/source/helper.py (2 hunks)
  • tensorrt_llm/llmapi/llm.py (9 hunks)
  • tensorrt_llm/llmapi/llm_args.py (14 hunks)
  • tensorrt_llm/llmapi/utils.py (2 hunks)
  • tests/unittest/api_stability/api_stability_core.py (5 hunks)
  • tests/unittest/api_stability/references/llm.yaml (3 hunks)
  • tests/unittest/api_stability/references_committed/llm.yaml (1 hunks)
  • tests/unittest/llmapi/test_utils.py (1 hunks)
🧬 Code Graph Analysis (2)
tensorrt_llm/llmapi/llm_args.py (1)
tensorrt_llm/builder.py (1)
  • default (50-58)
tensorrt_llm/llmapi/utils.py (3)
tensorrt_llm/_torch/auto_deploy/llm_args.py (1)
  • LlmArgs (24-238)
tensorrt_llm/llmapi/llm_args.py (2)
  • decorator (508-522)
  • decorator (538-548)
tensorrt_llm/logger.py (1)
  • debug (143-144)
🪛 Ruff (0.12.2)
tensorrt_llm/llmapi/llm_args.py

1887-1887: Line too long (138 > 120)

(E501)


1893-1893: Line too long (142 > 120)

(E501)


1904-1904: Line too long (127 > 120)

(E501)

tensorrt_llm/llmapi/utils.py

549-551: 1 blank line required between summary line and description

(D205)


557-558: One-line docstring should fit on one line

Reformat to one line

(D200)


568-574: 1 blank line required between summary line and description

(D205)

✅ Files skipped from review due to trivial changes (1)
  • tensorrt_llm/llmapi/llm.py
🚧 Files skipped from review as they are similar to previous changes (6)
  • tests/unittest/llmapi/test_utils.py
  • tests/unittest/api_stability/references_committed/llm.yaml
  • docs/source/helper.py
  • docs/source/_static/custom.css
  • docs/source/conf.py
  • tests/unittest/api_stability/references/llm.yaml
🧰 Additional context used
🧬 Code Graph Analysis (2)
tensorrt_llm/llmapi/llm_args.py (1)
tensorrt_llm/builder.py (1)
  • default (50-58)
tensorrt_llm/llmapi/utils.py (3)
tensorrt_llm/_torch/auto_deploy/llm_args.py (1)
  • LlmArgs (24-238)
tensorrt_llm/llmapi/llm_args.py (2)
  • decorator (508-522)
  • decorator (538-548)
tensorrt_llm/logger.py (1)
  • debug (143-144)
🪛 Ruff (0.12.2)
tensorrt_llm/llmapi/llm_args.py

1887-1887: Line too long (138 > 120)

(E501)


1893-1893: Line too long (142 > 120)

(E501)


1904-1904: Line too long (127 > 120)

(E501)

tensorrt_llm/llmapi/utils.py

549-551: 1 blank line required between summary line and description

(D205)


557-558: One-line docstring should fit on one line

Reformat to one line

(D200)


568-574: 1 blank line required between summary line and description

(D205)

🔇 Additional comments (8)
tests/unittest/api_stability/api_stability_core.py (2)

407-407: Good separation of committed and non-committed references

Creating a deep copy to isolate non-committed API data from committed APIs is a clean approach that supports the new status validation functionality effectively.


452-533: Well-implemented API status validation

The test_api_status method provides comprehensive validation of API status consistency between live code and reference files. The logic correctly:

  • Separates committed vs non-committed API validation
  • Validates method status annotations using the API registry
  • Checks parameter status consistency for __init__ methods
  • Provides clear error messages for missing or mismatched status tags

This is a solid implementation that ensures API status metadata remains synchronized across the codebase.

tensorrt_llm/llmapi/utils.py (2)

496-497: Good update to align with new status field approach

The condition change from checking a "deprecated" key to checking "status" == "deprecated" properly aligns with the new unified status tagging system.


586-588: Clean utility function for LLM parameter tagging

The tag_llm_params function provides a clean interface for applying status tags to the main LlmArgs model. Good encapsulation.

tensorrt_llm/llmapi/llm_args.py (4)

64-92: Well-implemented custom Field wrapper with proper fallback handling.

The custom Field function correctly wraps Pydantic's Field and adds status metadata to json_schema_extra. The implementation handles both dictionary and non-dictionary cases for json_schema_extra, ensuring robustness.


1025-1026: Appropriate status annotations for API stability tracking.

The status annotations correctly identify the maturity level of various fields:

  • "deprecated" for legacy fields like max_lora_rank, max_loras, max_cpu_loras, and backend
  • "beta" for stable but evolving features like gpus_per_node, moe_cluster_parallel_size, enable_attention_dp
  • "prototype" for experimental features like cp_config, iterator stats fields, and various cache configs

This aligns well with the API stability tagging system and helps users understand feature maturity.

Also applies to: 1030-1031, 1042-1044, 1047-1048, 1061-1063, 1065-1070, 1075-1077, 1104-1105, 1109-1110, 1114-1114, 1117-1118, 1121-1123, 1146-1148, 1154-1155, 1159-1160, 1164-1165, 1187-1187


1740-1744: New fields added with appropriate defaults.

The addition of batching_type and normalize_log_probs fields to TrtLlmArgs looks correct:

  • batching_type uses the proper BatchingType enum from the relevant code snippets
  • Both fields have sensible defaults (None and False respectively)
  • Clear descriptions are provided

2163-2181: Well-implemented warning method for unstable feature usage.

The warn_on_unstable_feature_usage method correctly:

  • Uses model_dump(exclude_unset=True) to only check explicitly set fields
  • Accesses field metadata from model_fields and json_schema_extra
  • Warns only for "beta" and "prototype" status values
  • Provides clear messaging about the feature's stability level

This provides valuable feedback to users about unstable features they're using.

@tensorrt-cicd
Copy link
Collaborator

PR_Github #12518 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #9303 completed with status: 'FAILURE'

@nv-guomingz nv-guomingz force-pushed the user/guomingz/mirror_5707 branch from e03f790 to 9e8977c Compare July 22, 2025 16:54
@nv-guomingz nv-guomingz requested a review from a team as a code owner July 22, 2025 16:54
@nv-guomingz nv-guomingz force-pushed the user/guomingz/mirror_5707 branch from 9e8977c to 1a9909b Compare July 22, 2025 16:54
@nv-guomingz
Copy link
Collaborator Author

/bot run

@tensorrt-cicd
Copy link
Collaborator

PR_Github #12594 [ run ] triggered by Bot

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (2)
tensorrt_llm/llmapi/utils.py (2)

605-605: Thread safety concern with mutable class variable remains unaddressed.

The past review comment about thread safety with the mutable class variable method_to_status has not been addressed. This could still cause race conditions in multi-threaded environments.


635-636: Incorrect singleton instantiation pattern remains unaddressed.

The past review comment about incorrect instantiation of ApiStatusRegistry for classmethods has not been addressed. The code still creates unnecessary instances to call classmethods.

🧹 Nitpick comments (1)
tensorrt_llm/llmapi/llm_args.py (1)

1887-1887: Address line length issue in CUDA graph description.

The line exceeds the 120-character limit. Consider breaking the description into multiple lines.

Apply this diff to improve readability:

-        "CUDA graphs are only created for the batch sizes in cuda_graph_config.batch_sizes, \
-        and are enabled for batches that consist of decoding requests *only* \
-        (the reason is that it's hard to capture a single graph with prefill requests \
-        since the input shapes are a function of the sequence lengths).\
-         Note that each CUDA graph can use up to 200 MB of extra memory.",
+        "CUDA graphs are only created for the batch sizes in cuda_graph_config.batch_sizes, "
+        "and are enabled for batches that consist of decoding requests *only* "
+        "(the reason is that it's hard to capture a single graph with prefill requests "
+        "since the input shapes are a function of the sequence lengths). "
+        "Note that each CUDA graph can use up to 200 MB of extra memory.",
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e03f790 and 1a9909b.

📒 Files selected for processing (11)
  • docs/source/_static/custom.css (1 hunks)
  • docs/source/conf.py (3 hunks)
  • docs/source/helper.py (2 hunks)
  • tensorrt_llm/bench/benchmark/throughput.py (1 hunks)
  • tensorrt_llm/llmapi/llm.py (9 hunks)
  • tensorrt_llm/llmapi/llm_args.py (14 hunks)
  • tensorrt_llm/llmapi/utils.py (2 hunks)
  • tests/unittest/api_stability/api_stability_core.py (5 hunks)
  • tests/unittest/api_stability/references/llm.yaml (3 hunks)
  • tests/unittest/api_stability/references_committed/llm.yaml (1 hunks)
  • tests/unittest/llmapi/test_utils.py (1 hunks)
🪛 Ruff (0.12.2)
tensorrt_llm/llmapi/llm_args.py

1887-1887: Line too long (138 > 120)

(E501)


1893-1893: Line too long (142 > 120)

(E501)


1904-1904: Line too long (127 > 120)

(E501)

tensorrt_llm/llmapi/utils.py

549-551: 1 blank line required between summary line and description

(D205)


557-558: One-line docstring should fit on one line

Reformat to one line

(D200)


568-574: 1 blank line required between summary line and description

(D205)

✅ Files skipped from review due to trivial changes (1)
  • tensorrt_llm/bench/benchmark/throughput.py
🚧 Files skipped from review as they are similar to previous changes (8)
  • docs/source/_static/custom.css
  • tests/unittest/api_stability/references_committed/llm.yaml
  • tests/unittest/llmapi/test_utils.py
  • docs/source/helper.py
  • tests/unittest/api_stability/api_stability_core.py
  • docs/source/conf.py
  • tests/unittest/api_stability/references/llm.yaml
  • tensorrt_llm/llmapi/llm.py
🧰 Additional context used
🪛 Ruff (0.12.2)
tensorrt_llm/llmapi/llm_args.py

1887-1887: Line too long (138 > 120)

(E501)


1893-1893: Line too long (142 > 120)

(E501)


1904-1904: Line too long (127 > 120)

(E501)

tensorrt_llm/llmapi/utils.py

549-551: 1 blank line required between summary line and description

(D205)


557-558: One-line docstring should fit on one line

Reformat to one line

(D200)


568-574: 1 blank line required between summary line and description

(D205)

🔇 Additional comments (11)
tensorrt_llm/llmapi/utils.py (2)

496-496: LGTM! Consistent with new status metadata approach.

The change from checking the "deprecated" key to field_info.get("status", None) == "deprecated" correctly aligns with the new status metadata structure introduced in the Field wrapper.


551-589: LGTM! Well-designed API parameter tagging utility.

The ApiParamTagger class provides a clean approach to process Pydantic models and amend field descriptions with status tags based on json_schema_extra metadata. The implementation correctly rebuilds the model schema after modifications.

tensorrt_llm/llmapi/llm_args.py (9)

64-92: Well-designed custom Field wrapper for API stability tracking.

The custom Field wrapper effectively extends Pydantic's Field to support status metadata while maintaining backward compatibility. The implementation correctly handles the json_schema_extra dictionary merging and provides clear documentation for the status levels.


1025-1025: Consistent status annotation application.

The addition of status="beta" to the gpus_per_node field is appropriate and follows the established pattern for marking unstable API parameters.


1030-1031: Good status annotation for MoE cluster parameter.

Marking moe_cluster_parallel_size as beta status is appropriate given its specialized nature for MoE models.


1042-1044: Appropriate beta status for attention data parallel.

The beta status for enable_attention_dp correctly reflects its experimental nature.


1047-1048: Correct prototype status for context parallel config.

Marking cp_config as prototype status is appropriate given its advanced and potentially unstable nature.


1061-1063: Proper deprecation marking for max_lora_rank.

The combination of deprecated parameter and status="deprecated" correctly indicates this field should no longer be used.


1740-1744: New fields properly documented and typed.

The addition of batching_type and normalize_log_probs fields follows good practices with clear descriptions and appropriate types.


1868-1870: Appropriate deprecation of build_config in TorchLlmArgs.

Marking the build_config field as deprecated in TorchLlmArgs is appropriate since it's mentioned as "just a dummy BuildConfig" in the comment.


2163-2181: Excellent feature for warning about unstable API usage.

The warn_on_unstable_feature_usage method provides valuable feedback to users about potentially unstable features they're using. The implementation correctly checks field metadata and provides informative warnings.

@tensorrt-cicd
Copy link
Collaborator

PR_Github #12594 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #9372 completed with status: 'FAILURE'

@nv-guomingz nv-guomingz force-pushed the user/guomingz/mirror_5707 branch from 1a9909b to 7d34d45 Compare July 23, 2025 02:53
@nv-guomingz
Copy link
Collaborator Author

/bot run

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (4)
tests/unittest/api_stability/api_stability_core.py (2)

76-76: Status fields properly declared but serialization incomplete

The new status fields are properly declared in both dataclasses. However, the serialization methods still need to handle these fields properly as noted in the past review comments.

Also applies to: 133-133


452-533: Well-structured API status validation with one debug print issue

The test_api_status method provides comprehensive validation of API status consistency between live code and reference YAML. The implementation properly:

  1. Limits scope to llm.yaml reference file
  2. Imports required dependencies within the method scope
  3. Validates both method and parameter status annotations
  4. Provides detailed error messages for mismatches

However, there's a debug print statement that should be removed.

The debug print at line 529 should be removed as noted in past review comments:

-                    print(f"param_name: {param_name}, param_data: {param_data}")
tensorrt_llm/llmapi/utils.py (2)

591-632: API status registry needs thread safety improvements

The ApiStatusRegistry class provides a good foundation for managing API method statuses, but has thread safety concerns as noted in past reviews.

The implementation correctly uses __qualname__ for method identification and provides proper status retrieval and docstring amendment functionality.


635-636: Fix incorrect singleton instantiation

The code incorrectly creates an instance to call classmethods, as noted in past review comments.

-set_api_status = ApiStatusRegistry().set_api_status
-get_api_status = ApiStatusRegistry().get_api_status
+set_api_status = ApiStatusRegistry.set_api_status
+get_api_status = ApiStatusRegistry.get_api_status
🧹 Nitpick comments (2)
tests/unittest/api_stability/api_stability_core.py (1)

7-7: Unused import detected

The pprint import is added but not used in the visible code. Consider removing it to keep imports clean.

-from pprint import pprint
tensorrt_llm/llmapi/llm_args.py (1)

1887-1887: Address line length violations for code style consistency.

Three lines exceed the 120-character limit. Consider breaking these long description strings for better readability:

-        "Note that each CUDA graph can use up to 200 MB of extra memory.",
+        "Note that each CUDA graph can use up to 200 MB of extra memory.",

-        description="Disable the overlap scheduler.",
+        description="Disable the overlap scheduler.",

-        "If true, will iterate over sampling_params of each request and use the corresponding sampling strategy, e.g. top-k, top-p, etc.",
+        "If true, will iterate over sampling_params of each request and use "
+        "the corresponding sampling strategy, e.g. top-k, top-p, etc.",

Also applies to: 1893-1893, 1904-1904

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1a9909b and 7d34d45.

📒 Files selected for processing (11)
  • docs/source/_static/custom.css (1 hunks)
  • docs/source/conf.py (3 hunks)
  • docs/source/helper.py (2 hunks)
  • tensorrt_llm/bench/benchmark/throughput.py (1 hunks)
  • tensorrt_llm/llmapi/llm.py (9 hunks)
  • tensorrt_llm/llmapi/llm_args.py (14 hunks)
  • tensorrt_llm/llmapi/utils.py (2 hunks)
  • tests/unittest/api_stability/api_stability_core.py (5 hunks)
  • tests/unittest/api_stability/references/llm.yaml (3 hunks)
  • tests/unittest/api_stability/references_committed/llm.yaml (1 hunks)
  • tests/unittest/llmapi/test_utils.py (1 hunks)
🪛 Ruff (0.12.2)
tensorrt_llm/llmapi/utils.py

549-551: 1 blank line required between summary line and description

(D205)


557-558: One-line docstring should fit on one line

Reformat to one line

(D200)


568-574: 1 blank line required between summary line and description

(D205)

tensorrt_llm/llmapi/llm_args.py

1887-1887: Line too long (138 > 120)

(E501)


1893-1893: Line too long (142 > 120)

(E501)


1904-1904: Line too long (127 > 120)

(E501)

🚧 Files skipped from review as they are similar to previous changes (8)
  • tests/unittest/api_stability/references_committed/llm.yaml
  • docs/source/_static/custom.css
  • tests/unittest/llmapi/test_utils.py
  • tensorrt_llm/bench/benchmark/throughput.py
  • docs/source/helper.py
  • docs/source/conf.py
  • tensorrt_llm/llmapi/llm.py
  • tests/unittest/api_stability/references/llm.yaml
🧰 Additional context used
🪛 Ruff (0.12.2)
tensorrt_llm/llmapi/utils.py

549-551: 1 blank line required between summary line and description

(D205)


557-558: One-line docstring should fit on one line

Reformat to one line

(D200)


568-574: 1 blank line required between summary line and description

(D205)

tensorrt_llm/llmapi/llm_args.py

1887-1887: Line too long (138 > 120)

(E501)


1893-1893: Line too long (142 > 120)

(E501)


1904-1904: Line too long (127 > 120)

(E501)

🔇 Additional comments (8)
tests/unittest/api_stability/api_stability_core.py (1)

407-407: Good implementation of reference isolation

Creating a deep copy of the reference as non_committed_reference is a solid approach to isolate non-committed API elements for separate validation.

tensorrt_llm/llmapi/utils.py (2)

496-496: Correct alignment with new status metadata structure

The change from checking "deprecated" to field_info.get("status", None) == "deprecated" properly aligns with the new unified status metadata approach introduced in this PR.


551-588: Well-implemented parameter tagging utility

The ApiParamTagger class provides a clean mechanism for adding status tags to Pydantic model fields. The implementation properly:

  1. Processes model fields with status metadata
  2. Amends field descriptions with appropriate tags
  3. Forces model rebuild to reflect changes
  4. Provides a convenient tag_llm_params function for easy usage

The approach of modifying field descriptions and rebuilding the model schema is appropriate for documentation integration.

Note: The static analysis hints about docstring formatting are cosmetic and don't affect functionality.

tensorrt_llm/llmapi/llm_args.py (5)

64-92: LGTM! Well-structured API status tagging implementation.

The custom Field wrapper effectively extends Pydantic's functionality to support status metadata. The implementation properly handles edge cases and maintains backward compatibility.

Minor suggestion: Consider clarifying the docstring for the default parameter since ... is Pydantic's sentinel for required fields, not a literal default value.


1025-1025: Well-categorized API status annotations.

The status annotations are appropriately assigned:

  • Deprecated fields correctly identify legacy parameters with modern replacements (e.g., max_lora_ranklora_config.max_lora_rank)
  • Beta fields mark stable but potentially evolving features
  • Prototype fields identify experimental or debugging functionality

This systematic approach to API stability labeling will help users make informed decisions about feature adoption.

Also applies to: 1031-1031, 1044-1044, 1048-1048, 1062-1063, 1069-1070, 1076-1077, 1105-1105, 1110-1110, 1114-1114, 1118-1118, 1123-1123, 1148-1148, 1155-1155, 1160-1160, 1165-1165, 1187-1187, 1869-1870, 1878-1878, 1888-1888, 1893-1893, 1897-1897, 1901-1901, 1907-1907, 1913-1913, 1918-1918, 1924-1924, 1928-1928, 1931-1931, 1936-1936, 1941-1941, 1953-1953, 1967-1967, 1975-1975


2163-2181: Excellent runtime warning implementation for unstable features.

The method effectively provides user feedback about beta and prototype feature usage by:

  • Only checking explicitly set fields (avoiding false positives)
  • Providing clear, actionable warning messages
  • Distinguishing between beta and prototype stability levels

This helps users make informed decisions about feature adoption in production environments.


2740-2744: Well-structured new field additions.

The new batching_type and normalize_log_probs fields are properly implemented:

  • Appropriate types and sensible defaults
  • Clear, descriptive documentation
  • Consistent with existing field patterns in the class

64-2181: Excellent implementation of API stability framework.

This comprehensive update successfully introduces structured API status tagging throughout the LLM arguments system. Key strengths:

  1. Clean Architecture: The custom Field wrapper elegantly extends Pydantic without breaking existing functionality
  2. Systematic Categorization: Status annotations are thoughtfully applied across deprecated, beta, and prototype features
  3. Runtime Feedback: The warning mechanism helps developers make informed decisions about feature usage
  4. Future-Proof Design: The framework is extensible and integrates well with the broader API stability system

This will significantly improve developer experience by providing clear guidance on feature maturity and stability.

@nv-guomingz nv-guomingz force-pushed the user/guomingz/mirror_5707 branch from 7d34d45 to 8fb25e1 Compare July 23, 2025 06:50
Signed-off-by: Superjomn <[email protected]>
Signed-off-by: nv-guomingz <[email protected]>
@nv-guomingz nv-guomingz force-pushed the user/guomingz/mirror_5707 branch from 8fb25e1 to 9cd6594 Compare July 23, 2025 10:12
@nv-guomingz
Copy link
Collaborator Author

/bot run

@tensorrt-cicd
Copy link
Collaborator

PR_Github #12704 [ run ] triggered by Bot

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (2)
tensorrt_llm/llmapi/utils.py (2)

605-605: Thread safety concern with mutable class variable

Using a mutable class variable method_to_status could cause race conditions in multi-threaded environments.


635-636: Incorrect singleton instantiation

The class uses classmethods but creates an instance. Either use instance methods or remove the instantiation.

🧹 Nitpick comments (2)
tensorrt_llm/llmapi/utils.py (1)

551-589: Fix docstring formatting issues but functionality looks good.

The ApiParamTagger class implementation is sound and correctly processes Pydantic models to add status tags. However, there are docstring formatting issues that should be addressed.

Apply these formatting fixes:

 class ApiParamTagger:
-    ''' A helper to tag the api doc according to the status of the fields.
-    The status is set in the json_schema_extra of the field.
-    '''
+    """A helper to tag the api doc according to the status of the fields.
+    
+    The status is set in the json_schema_extra of the field.
+    """

     def __call__(self, cls: Type[BaseModel]) -> None:
-        self.process_pydantic_model(cls)
+        """Process the given Pydantic model."""
+        self.process_pydantic_model(cls)

     def process_pydantic_model(self, cls: Type[BaseModel]) -> None:
-        """Process the Pydantic model to add tags to the fields.
-        """
+        """Process the Pydantic model to add tags to the fields."""

     def amend_pydantic_field_description_with_tags(self, cls: Type[BaseModel],
                                                    field_names: list[str],
                                                    tag: str) -> None:
-        """Amend the description of the fields with tags.
-        e.g. :tag:`beta` or :tag:`prototype`
+        """Amend the description of the fields with tags.
+        
+        e.g. :tag:`beta` or :tag:`prototype`
+        
         Args:
             cls: The Pydantic BaseModel class.
             field_names: The names of the fields to amend.
             tag: The tag to add to the fields.
         """
tensorrt_llm/llmapi/llm_args.py (1)

1867-1867: Fix line length violations in field descriptions.

The static analysis correctly identifies lines exceeding 120 characters. These can be fixed by breaking the descriptions into multiple lines.

Apply these fixes for better code formatting:

-        Note that each CUDA graph can use up to 200 MB of extra memory.",
+        Note that each CUDA graph can use up to 200 MB of extra memory."
+        "",

-        "If true, will iterate over sampling_params of each request and use the corresponding sampling strategy, e.g. top-k, top-p, etc.",
+        "If true, will iterate over sampling_params of each request and use the "
+        "corresponding sampling strategy, e.g. top-k, top-p, etc.",

For line 1873, the description appears to be just "Disable the overlap scheduler." which should already be under the limit, so please double-check that specific line.

Also applies to: 1873-1873, 1884-1884

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8fb25e1 and 9cd6594.

📒 Files selected for processing (11)
  • docs/source/_static/custom.css (1 hunks)
  • docs/source/conf.py (3 hunks)
  • docs/source/helper.py (2 hunks)
  • tensorrt_llm/bench/benchmark/throughput.py (2 hunks)
  • tensorrt_llm/llmapi/llm.py (9 hunks)
  • tensorrt_llm/llmapi/llm_args.py (14 hunks)
  • tensorrt_llm/llmapi/utils.py (2 hunks)
  • tests/unittest/api_stability/api_stability_core.py (5 hunks)
  • tests/unittest/api_stability/references/llm.yaml (3 hunks)
  • tests/unittest/api_stability/references_committed/llm.yaml (1 hunks)
  • tests/unittest/llmapi/test_utils.py (1 hunks)
🧬 Code Graph Analysis (1)
tensorrt_llm/llmapi/utils.py (2)
tensorrt_llm/_torch/auto_deploy/llm_args.py (1)
  • LlmArgs (224-316)
tensorrt_llm/logger.py (1)
  • debug (143-144)
🪛 Ruff (0.12.2)
tensorrt_llm/llmapi/llm_args.py

1867-1867: Line too long (138 > 120)

(E501)


1873-1873: Line too long (142 > 120)

(E501)


1884-1884: Line too long (127 > 120)

(E501)

tensorrt_llm/llmapi/utils.py

549-551: 1 blank line required between summary line and description

(D205)


557-558: One-line docstring should fit on one line

Reformat to one line

(D200)


568-574: 1 blank line required between summary line and description

(D205)

🚧 Files skipped from review as they are similar to previous changes (9)
  • tests/unittest/api_stability/references_committed/llm.yaml
  • tests/unittest/llmapi/test_utils.py
  • docs/source/helper.py
  • docs/source/_static/custom.css
  • tensorrt_llm/bench/benchmark/throughput.py
  • tests/unittest/api_stability/api_stability_core.py
  • docs/source/conf.py
  • tensorrt_llm/llmapi/llm.py
  • tests/unittest/api_stability/references/llm.yaml
🧰 Additional context used
🧬 Code Graph Analysis (1)
tensorrt_llm/llmapi/utils.py (2)
tensorrt_llm/_torch/auto_deploy/llm_args.py (1)
  • LlmArgs (224-316)
tensorrt_llm/logger.py (1)
  • debug (143-144)
🪛 Ruff (0.12.2)
tensorrt_llm/llmapi/llm_args.py

1867-1867: Line too long (138 > 120)

(E501)


1873-1873: Line too long (142 > 120)

(E501)


1884-1884: Line too long (127 > 120)

(E501)

tensorrt_llm/llmapi/utils.py

549-551: 1 blank line required between summary line and description

(D205)


557-558: One-line docstring should fit on one line

Reformat to one line

(D200)


568-574: 1 blank line required between summary line and description

(D205)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (6)
tensorrt_llm/llmapi/utils.py (1)

496-496: LGTM! Consistent with the new status tagging system.

The change from checking "deprecated" key to field_info.get("status", None) == "deprecated" aligns with the unified status metadata approach being introduced throughout the codebase.

tensorrt_llm/llmapi/llm_args.py (5)

64-92: Excellent implementation of the custom Field wrapper.

The implementation properly handles all edge cases for json_schema_extra and provides a clean API for adding status metadata to Pydantic fields. The documentation is comprehensive and the allowed status values follow a logical hierarchy.


1005-1005: Well-structured status annotations for BaseLlmArgs fields.

The status assignments are logical and consistent:

  • "beta" for features that are usable but not fully stable
  • "prototype" for experimental features
  • "deprecated" for fields being phased out with clear migration guidance

This will help users make informed decisions about feature usage.

Also applies to: 1011-1011, 1024-1024, 1028-1028, 1042-1043, 1049-1050, 1056-1057, 1085-1085, 1090-1090, 1094-1094, 1098-1098, 1103-1103, 1128-1128, 1135-1135, 1140-1140, 1145-1145, 1167-1167


1719-1724: Clean addition of new stable fields to TrtLlmArgs.

The new batching_type and normalize_log_probs fields are properly typed with appropriate defaults and descriptions. The absence of status annotations indicates these are stable APIs.


1847-1849: Comprehensive and well-categorized status annotations for TorchLlmArgs.

The status assignments are logical:

  • build_config appropriately marked as deprecated
  • Performance/debugging features marked as "prototype"
  • More mature but evolving features marked as "beta"

This provides clear guidance on the maturity level of PyTorch backend features.

Also applies to: 1857-1857, 1867-1867, 1872-1872, 1876-1876, 1880-1880, 1886-1886, 1892-1892, 1897-1897, 1903-1903, 1907-1907, 1910-1910, 1915-1915, 1920-1920, 1932-1932, 1946-1946, 1954-1954


2142-2160: Excellent implementation of unstable feature warning system.

The method effectively leverages the status metadata to warn users about potentially unstable features they're using. Key strengths:

  • Only warns about explicitly set fields
  • Clear, actionable warning messages
  • Proper method chaining with return self
  • Good integration with the status annotation system

@tensorrt-cicd
Copy link
Collaborator

PR_Github #12704 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #9452 completed with status: 'FAILURE'

@nv-guomingz nv-guomingz deleted the user/guomingz/mirror_5707 branch September 30, 2025 07:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants