Commit a364050

rebase

Signed-off-by: QI JUN <[email protected]>

2 parents: 6311856 + 58a8a8f

236 files changed: +7357 −4969 lines


CONTRIBUTING.md

Lines changed: 13 additions & 0 deletions

```diff
@@ -101,12 +101,18 @@ Developer workflow for code contributions is as follows:
 
 The naming of the merge requests in TensorRT-LLM follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/). If the PR includes an API change that might break user code/API usage, consider adding "BREAKING CHANGE" to the title so that reviewers know what to expect. Additionally, if the PR is not related to any bug or task, consider using "chore" or None as the placeholder.
 
+> [!IMPORTANT]
+> For NVIDIA developers, please include the JIRA number or NVBUG ID in the PR title whenever possible.
+
 Good PR Title Examples:
 * feat: Add support for starcoder-v2 FP8 base + FP16/BF16 LoRA
 * BREAKING CHANGE: Set default max batch size to 2048
 * chore: Remove version from plugins .so
 * None: Stringized enums for better error msgs
 * fix https://github.com/NVIDIA/TensorRT-LLM/issues/700: a Memory leak issue in C++ runtime
+* [TRTLLM-5516] perf: replicate dummy request for cuda graph padding (**NVIDIAN only**)
+* [nvbug/5334370] fix: Fix one model EAGLE3 (**NVIDIAN only**)
+
 
 This is important for tracking what has been submitted to which release and makes it easier for others to track bugs or tasks. It can also be helpful when compiling GitHub release announcements.
 
@@ -118,6 +124,13 @@ In the PR description, please consider addressing these points:
 * Potential performance or functional impacts of the changes. If there are risks, please inform the reviewers.
 * Link to the related PRs.
 
+> [!IMPORTANT]
+> For NVIDIA developers, please submit feature or bug fixes to the dedicated branch specified in the nvbug
+> **Keywords** field. For example, if a bug is reported on the release/v0.20 branch, please submit the fix to
+> `release/v0.20` instead of the main branch.
+
+Meanwhile, please add the "release blocker" label to any PRs that could potentially cause a release delay.
+
 
 ## Tests and Code Review for Protected APIs
 
```
README.md

Lines changed: 6 additions & 1 deletion

```diff
@@ -9,7 +9,7 @@ TensorRT-LLM
 [![python](https://img.shields.io/badge/python-3.10-green)](https://www.python.org/downloads/release/python-31012/)
 [![cuda](https://img.shields.io/badge/cuda-12.9.0-green)](https://developer.nvidia.com/cuda-downloads)
 [![trt](https://img.shields.io/badge/TRT-10.11.0-green)](https://developer.nvidia.com/tensorrt)
-[![version](https://img.shields.io/badge/release-0.21.0rc2-green)](./tensorrt_llm/version.py)
+[![version](https://img.shields.io/badge/release-1.0.0rc0-green)](./tensorrt_llm/version.py)
 [![license](https://img.shields.io/badge/license-Apache%202-blue)](./LICENSE)
 
 [Architecture](./docs/source/torch/arch_overview.md)&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;[Performance](./docs/source/performance/perf-overview.md)&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;[Examples](https://nvidia.github.io/TensorRT-LLM/quick-start-guide.html)&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;[Documentation](./docs/source/)&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;[Roadmap](https://github.com/NVIDIA/TensorRT-LLM/issues?q=is%3Aissue%20state%3Aopen%20label%3Aroadmap)
@@ -18,6 +18,9 @@ TensorRT-LLM
 <div align="left">
 
 ## Tech Blogs
+* [06/19] Disaggregated Serving in TensorRT-LLM
+[➡️ link](./docs/source/blogs/tech_blog/blog5_Disaggregated_Serving_in_TensorRT-LLM.md)
+
 * [06/05] Scaling Expert Parallelism in TensorRT-LLM (Part 1: Design and Implementation of Large-scale EP)
 [➡️ link](./docs/source/blogs/tech_blog/blog4_Scaling_Expert_Parallelism_in_TensorRT-LLM.md)
 
@@ -31,6 +34,7 @@ TensorRT-LLM
 [➡️ link](./docs/source/blogs/tech_blog/blog1_Pushing_Latency_Boundaries_Optimizing_DeepSeek-R1_Performance_on_NVIDIA_B200_GPUs.md)
 
 ## Latest News
+* [06/17] Join NVIDIA and DeepInfra for a developer meetup on June 26 ✨ [➡️ link](https://events.nvidia.com/scaletheunscalablenextgenai)
 * [05/22] Blackwell Breaks the 1,000 TPS/User Barrier With Meta’s Llama 4 Maverick
 [➡️ link](https://developer.nvidia.com/blog/blackwell-breaks-the-1000-tps-user-barrier-with-metas-llama-4-maverick/)
 * [04/10] TensorRT-LLM DeepSeek R1 performance benchmarking best practices now published.
@@ -223,3 +227,4 @@ To get started with TensorRT-LLM, visit our documentation:
 - [Quantized models on Hugging Face](https://huggingface.co/collections/nvidia/model-optimizer-66aa84f7966b3150262481a4): A growing collection of quantized (e.g., FP8, FP4) and optimized LLMs, including [DeepSeek FP4](https://huggingface.co/nvidia/DeepSeek-R1-FP4), ready for fast inference with TensorRT-LLM.
 - [NVIDIA Dynamo](https://github.com/ai-dynamo/dynamo): A datacenter scale distributed inference serving framework that works seamlessly with TensorRT-LLM.
 - [AutoDeploy](./examples/auto_deploy/README.md): An experimental backend for TensorRT-LLM to simplify and accelerate the deployment of PyTorch models.
+- [WeChat Discussion Group](https://github.com/NVIDIA/TensorRT-LLM/issues/5359): A real-time channel for TensorRT-LLM Q&A and news.
```

cpp/include/tensorrt_llm/batch_manager/trtGptModelOptionalParams.h

Lines changed: 0 additions & 140 deletions
This file was deleted.

cpp/include/tensorrt_llm/executor/executor.h

Lines changed: 2 additions & 0 deletions

```diff
@@ -1351,6 +1351,8 @@ class GuidedDecodingConfig
 {
     /// @brief Enable guided decoding with XGrammar backend.
     kXGRAMMAR = 0,
+    /// @brief Enable guided decoding with LLGuidance backend.
+    kLLGUIDANCE = 1,
 };
 
 explicit GuidedDecodingConfig(GuidedDecodingBackend backend,
```
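The new enum value only extends the backend selector. As a minimal, hypothetical usage sketch (the constructor's remaining parameters are truncated in the diff above and assumed here to be defaulted):

```cpp
// Hypothetical sketch: choosing the new LLGuidance backend at runtime.
// Assumes the trailing GuidedDecodingConfig constructor parameters
// (cut off in the diff above) have default values.
#include "tensorrt_llm/executor/executor.h"

namespace tle = tensorrt_llm::executor;

tle::GuidedDecodingConfig makeGuidedDecodingConfig(bool useLLGuidance)
{
    auto const backend = useLLGuidance
        ? tle::GuidedDecodingConfig::GuidedDecodingBackend::kLLGUIDANCE
        : tle::GuidedDecodingConfig::GuidedDecodingBackend::kXGRAMMAR;
    return tle::GuidedDecodingConfig{backend};
}
```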

cpp/kernels/xqa/CMakeLists.txt

Lines changed: 23 additions & 14 deletions

```diff
@@ -84,21 +84,30 @@ add_custom_command(
 add_custom_target(xqa_sources_h DEPENDS ${XQA_SOURCES_H})
 
 if(BUILD_XQA_TESTS)
-  # GoogleTest Preparation - Code block copied from
-  # https://google.github.io/googletest/quickstart-cmake.html
-  include(FetchContent)
-  FetchContent_Declare(
-    googletest
-    GIT_REPOSITORY https://github.com/google/googletest.git
-    GIT_TAG v1.15.2)
-  include(GoogleTest)
+  # Try to find system installed GTest first
+  find_package(GTest QUIET)
+  if(NOT GTest_FOUND)
+    message(STATUS "System GTest not found, fetching from repository")
+    include(FetchContent)
+    FetchContent_Declare(
+      googletest
+      GIT_REPOSITORY https://github.com/google/googletest.git
+      GIT_TAG v1.15.2)
+    FetchContent_MakeAvailable(googletest)
+    include(GoogleTest)
+  endif()
 
-  # Add Eigen via FetchContent
-  FetchContent_Declare(
-    eigen
-    GIT_REPOSITORY https://gitlab.com/libeigen/eigen.git
-    GIT_TAG 3.4.0)
-  FetchContent_MakeAvailable(googletest eigen)
+  # Try to find system installed Eigen first
+  find_package(Eigen3 3.4 QUIET)
+  if(NOT Eigen3_FOUND)
+    message(STATUS "System Eigen not found, fetching from repository")
+    include(FetchContent)
+    FetchContent_Declare(
+      eigen
+      GIT_REPOSITORY https://gitlab.com/libeigen/eigen.git
+      GIT_TAG 3.4.0)
+    FetchContent_MakeAvailable(eigen)
+  endif()
 
   enable_testing()
   add_executable(
```
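The revised logic follows the common find-first, fetch-as-fallback CMake pattern: a system-installed GTest or Eigen satisfies the dependency with no network access, and FetchContent is consulted only when find_package fails, which keeps offline and distro-packaged builds working.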

cpp/kernels/xqa/barriers.cuh

Lines changed: 1 addition & 1 deletion

```diff
@@ -434,7 +434,7 @@ using CtaBarrier = MBarrier<Scope::CTA>;
 using CgaBarrier = MBarrier<Scope::CGA>;
 
 template <uint32_t nbBars>
-__device__ inline bool toParity(uint32_t i)
+__device__ inline constexpr bool toParity(uint32_t i)
 {
     return i % (nbBars * 2) / nbBars;
 }
```
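Adding constexpr lets the expected-phase parity be evaluated at compile time whenever the barrier index is a constant. A small host-side sketch (a hypothetical replica of the same arithmetic, not the kernel header itself) shows the cycle the function encodes:

```cpp
// Hypothetical host-side replica of toParity's arithmetic, to show the
// parity cycle that constexpr now allows the compiler to fold.
#include <cstdint>

template <uint32_t nbBars>
constexpr bool toParity(uint32_t i)
{
    // With nbBars barrier slots, the expected parity stays 0 for the
    // first nbBars uses, flips to 1 for the next nbBars, then repeats.
    return i % (nbBars * 2) / nbBars;
}

// Compile-time checks, possible only because toParity is constexpr.
static_assert(toParity<2>(0) == false, "");
static_assert(toParity<2>(1) == false, "");
static_assert(toParity<2>(2) == true, "");
static_assert(toParity<2>(3) == true, "");
static_assert(toParity<2>(4) == false, "");
```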
