Commit c5cbc34

rebase
Signed-off-by: QI JUN <[email protected]>
2 parents 2e123ef + 5558563 commit c5cbc34

33 files changed, 680 additions and 355 deletions

CONTRIBUTING.md

Lines changed: 13 additions & 0 deletions
@@ -101,12 +101,18 @@ Developer workflow for code contributions is as follows:

 The naming of the merge requests in TensorRT-LLM follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/). If the PR includes an API change that might break user code/API usage, consider adding "BREAKING CHANGE" in the title so that reviewers know what to expect. Additionally, if the PR is not related to any bug and task, consider using "chore" or None as the placeholder.

+[!IMPORTANT]
+For NVIDIA developers, please include the JIRA number or NVBUG ID in the PR title whenever possible.
+
 Good PR Titles Examples:
 * feat: Add support for starcoder-v2 FP8 base + FP16/BF16 LoRA
 * BREAKING CHANGE: Set default max batch size to 2048
 * chore: Remove version from plugins .so
 * None: Stringized enums for better error msgs
 * fix https://github.com/NVIDIA/TensorRT-LLM/issues/700: a Memory leak issue in C++ runtime
+* [TRTLLM-5516] perf: replicate dummy request for cuda graph padding (**NVIDIAN only**)
+* [nvbug/5334370] fix: Fix one model EAGLE3 (**NVIDIAN only**)
+

 This is important for tracking and collecting what has been submitted to which release and makes it easier for others to track the bugs or tasks. It could also be helpful when collecting GitHub publish announcement.

@@ -118,6 +124,13 @@ In the PR description, please consider addressing these points:
 * Potential performance or functional impacts of the changes. If there are risks, please inform the reviewers.
 * Link to the related PRs.

+[!IMPORTANT]
+For NVIDIA developers, please submit feature or bug fixes to the dedicated branch specified in the nvbug
+**Keywords** field. For example, if a bug is reported on the release/v0.20 branch, please submit the fix to
+`release/v0.20` instead of the main branch.
+
+Meanwhile, please add the "release blocker" label to any PRs that could potentially cause a release delay.
+

 ## Tests and Code Review for Protected APIs
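
The tracker-ID title convention added to CONTRIBUTING.md in this commit could be checked mechanically. The following is a minimal sketch, not part of TensorRT-LLM: the helper name `hasTrackerPrefix` is hypothetical, and the ID grammar (`TRTLLM-<digits>` or `nvbug/<digits>` in square brackets, followed by a space) is an assumption inferred only from the two example titles in the diff.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper, not part of TensorRT-LLM. Returns true when a PR title
// starts with a bracketed tracker ID shaped like the two examples in the diff:
// "[TRTLLM-5516] ..." or "[nvbug/5334370] ...". The accepted ID grammar is an
// assumption based on those two examples alone.
inline bool hasTrackerPrefix(const std::string& title)
{
    // '^' anchors the match at the start of the title; the trailing space
    // requires some text to be separated from the closing bracket.
    static const std::regex pattern(R"(^\[(TRTLLM-[0-9]+|nvbug/[0-9]+)\] )");
    return std::regex_search(title, pattern);
}
```

Calling `hasTrackerPrefix` on the two NVIDIA-only example titles returns true, while a plain Conventional Commits title such as `feat: ...` returns false. Since the guideline says "whenever possible", a check like this would likely suit a warning better than a hard failure.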

README.md

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ TensorRT-LLM
 [![python](https://img.shields.io/badge/python-3.10-green)](https://www.python.org/downloads/release/python-31012/)
 [![cuda](https://img.shields.io/badge/cuda-12.9.0-green)](https://developer.nvidia.com/cuda-downloads)
 [![trt](https://img.shields.io/badge/TRT-10.11.0-green)](https://developer.nvidia.com/tensorrt)
-[![version](https://img.shields.io/badge/release-0.21.0rc2-green)](./tensorrt_llm/version.py)
+[![version](https://img.shields.io/badge/release-0.21.0rc3-green)](./tensorrt_llm/version.py)
 [![license](https://img.shields.io/badge/license-Apache%202-blue)](./LICENSE)

 [Architecture](./docs/source/torch/arch_overview.md)&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;[Performance](./docs/source/performance/perf-overview.md)&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;[Examples](https://nvidia.github.io/TensorRT-LLM/quick-start-guide.html)&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;[Documentation](./docs/source/)&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;[Roadmap](https://github.com/NVIDIA/TensorRT-LLM/issues?q=is%3Aissue%20state%3Aopen%20label%3Aroadmap)

cpp/kernels/xqa/CMakeLists.txt

Lines changed: 23 additions & 14 deletions
@@ -84,21 +84,30 @@ add_custom_command(
 add_custom_target(xqa_sources_h DEPENDS ${XQA_SOURCES_H})

 if(BUILD_XQA_TESTS)
-  # GoogleTest Preparation - Code block copied from
-  # https://google.github.io/googletest/quickstart-cmake.html
-  include(FetchContent)
-  FetchContent_Declare(
-    googletest
-    GIT_REPOSITORY https://github.com/google/googletest.git
-    GIT_TAG v1.15.2)
-  include(GoogleTest)
+  # Try to find system installed GTest first
+  find_package(GTest QUIET)
+  if(NOT GTest_FOUND)
+    message(STATUS "System GTest not found, fetching from repository")
+    include(FetchContent)
+    FetchContent_Declare(
+      googletest
+      GIT_REPOSITORY https://github.com/google/googletest.git
+      GIT_TAG v1.15.2)
+    FetchContent_MakeAvailable(googletest)
+    include(GoogleTest)
+  endif()

-  # Add Eigen via FetchContent
-  FetchContent_Declare(
-    eigen
-    GIT_REPOSITORY https://gitlab.com/libeigen/eigen.git
-    GIT_TAG 3.4.0)
-  FetchContent_MakeAvailable(googletest eigen)
+  # Try to find system installed Eigen first
+  find_package(Eigen3 3.4 QUIET)
+  if(NOT Eigen3_FOUND)
+    message(STATUS "System Eigen not found, fetching from repository")
+    include(FetchContent)
+    FetchContent_Declare(
+      eigen
+      GIT_REPOSITORY https://gitlab.com/libeigen/eigen.git
+      GIT_TAG 3.4.0)
+    FetchContent_MakeAvailable(eigen)
+  endif()

   enable_testing()
   add_executable(

cpp/kernels/xqa/barriers.cuh

Lines changed: 1 addition & 1 deletion
@@ -434,7 +434,7 @@ using CtaBarrier = MBarrier<Scope::CTA>;
 using CgaBarrier = MBarrier<Scope::CGA>;

 template <uint32_t nbBars>
-__device__ inline bool toParity(uint32_t i)
+__device__ inline constexpr bool toParity(uint32_t i)
 {
     return i % (nbBars * 2) / nbBars;
 }
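
To illustrate what marking `toParity` as `constexpr` allows, here is a minimal host-side sketch. It copies the function body from the diff but drops the `__device__` qualifier so that it builds with a plain C++ compiler; that omission and the choice of `nbBars = 2` are assumptions made for illustration only. The result is false for the first `nbBars` indices, true for the next `nbBars`, and then the pattern repeats.

```cpp
#include <cassert>
#include <cstdint>

// Host-side copy of the helper from the diff. __device__ is dropped here
// (an assumption for illustration) so a plain C++ compiler accepts it.
template <uint32_t nbBars>
inline constexpr bool toParity(uint32_t i)
{
    // i % (nbBars * 2) folds i into one full cycle of 2 * nbBars slots;
    // dividing by nbBars yields 0 for the first half and 1 for the second.
    return i % (nbBars * 2) / nbBars;
}

// Because the function is constexpr, the parity pattern can be verified
// at compile time.
static_assert(!toParity<2>(0), "indices 0..1 map to parity 0");
static_assert(!toParity<2>(1), "indices 0..1 map to parity 0");
static_assert(toParity<2>(2), "indices 2..3 map to parity 1");
static_assert(toParity<2>(3), "indices 2..3 map to parity 1");
static_assert(!toParity<2>(4), "the pattern repeats every 2 * nbBars indices");
```

Without `constexpr`, the same checks would have to run at runtime; with it, the compiler can also fold the call when the argument is a compile-time constant.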
