Commit a364050

rebase

Signed-off-by: QI JUN <[email protected]>

2 parents: 6311856 + 58a8a8f

236 files changed: +7357 −4969 lines


CONTRIBUTING.md

Lines changed: 13 additions & 0 deletions

```diff
@@ -101,12 +101,18 @@ Developer workflow for code contributions is as follows:
 
 The naming of the merge requests in TensorRT-LLM follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/). If the PR includes an API change that might break user code/API usage, consider adding "BREAKING CHANGE" to the title so that reviewers know what to expect. Additionally, if the PR is not related to any bug or task, consider using "chore" or None as the placeholder.
 
+> [!IMPORTANT]
+> For NVIDIA developers, please include the JIRA number or NVBUG ID in the PR title whenever possible.
+
 Good PR Title Examples:
 * feat: Add support for starcoder-v2 FP8 base + FP16/BF16 LoRA
 * BREAKING CHANGE: Set default max batch size to 2048
 * chore: Remove version from plugins .so
 * None: Stringized enums for better error msgs
 * fix https://github.com/NVIDIA/TensorRT-LLM/issues/700: a Memory leak issue in C++ runtime
+* [TRTLLM-5516] perf: replicate dummy request for cuda graph padding (**NVIDIAN only**)
+* [nvbug/5334370] fix: Fix one model EAGLE3 (**NVIDIAN only**)
+
 
 This is important for tracking what has been submitted to which release and makes it easier for others to track bugs or tasks. It can also be helpful when compiling GitHub release announcements.
 
@@ -118,6 +124,13 @@ In the PR description, please consider addressing these points:
 * Potential performance or functional impacts of the changes. If there are risks, please inform the reviewers.
 * Link to the related PRs.
 
+> [!IMPORTANT]
+> For NVIDIA developers, please submit feature or bug fixes to the dedicated branch specified in the nvbug
+> **Keywords** field. For example, if a bug is reported on the release/v0.20 branch, please submit the fix to
+> `release/v0.20` instead of the main branch.
+
+Meanwhile, please add the "release blocker" label to any PRs that could potentially cause a release delay.
+
 
 ## Tests and Code Review for Protected APIs
 
```
README.md

Lines changed: 6 additions & 1 deletion

```diff
@@ -9,7 +9,7 @@ TensorRT-LLM
 [![python](https://img.shields.io/badge/python-3.10-green)](https://www.python.org/downloads/release/python-31012/)
 [![cuda](https://img.shields.io/badge/cuda-12.9.0-green)](https://developer.nvidia.com/cuda-downloads)
 [![trt](https://img.shields.io/badge/TRT-10.11.0-green)](https://developer.nvidia.com/tensorrt)
-[![version](https://img.shields.io/badge/release-0.21.0rc2-green)](./tensorrt_llm/version.py)
+[![version](https://img.shields.io/badge/release-1.0.0rc0-green)](./tensorrt_llm/version.py)
 [![license](https://img.shields.io/badge/license-Apache%202-blue)](./LICENSE)
 
 [Architecture](./docs/source/torch/arch_overview.md)&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;[Performance](./docs/source/performance/perf-overview.md)&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;[Examples](https://nvidia.github.io/TensorRT-LLM/quick-start-guide.html)&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;[Documentation](./docs/source/)&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;[Roadmap](https://github.com/NVIDIA/TensorRT-LLM/issues?q=is%3Aissue%20state%3Aopen%20label%3Aroadmap)
@@ -18,6 +18,9 @@ TensorRT-LLM
 <div align="left">
 
 ## Tech Blogs
+* [06/19] Disaggregated Serving in TensorRT-LLM
+[➡️ link](./docs/source/blogs/tech_blog/blog5_Disaggregated_Serving_in_TensorRT-LLM.md)
+
 * [06/05] Scaling Expert Parallelism in TensorRT-LLM (Part 1: Design and Implementation of Large-scale EP)
 [➡️ link](./docs/source/blogs/tech_blog/blog4_Scaling_Expert_Parallelism_in_TensorRT-LLM.md)
 
@@ -31,6 +34,7 @@ TensorRT-LLM
 [➡️ link](./docs/source/blogs/tech_blog/blog1_Pushing_Latency_Boundaries_Optimizing_DeepSeek-R1_Performance_on_NVIDIA_B200_GPUs.md)
 
 ## Latest News
+* [06/17] Join NVIDIA and DeepInfra for a developer meetup on June 26 ✨ [➡️ link](https://events.nvidia.com/scaletheunscalablenextgenai)
 * [05/22] Blackwell Breaks the 1,000 TPS/User Barrier With Meta’s Llama 4 Maverick
 [➡️ link](https://developer.nvidia.com/blog/blackwell-breaks-the-1000-tps-user-barrier-with-metas-llama-4-maverick/)
 * [04/10] TensorRT-LLM DeepSeek R1 performance benchmarking best practices now published.
@@ -223,3 +227,4 @@ To get started with TensorRT-LLM, visit our documentation:
 - [Quantized models on Hugging Face](https://huggingface.co/collections/nvidia/model-optimizer-66aa84f7966b3150262481a4): A growing collection of quantized (e.g., FP8, FP4) and optimized LLMs, including [DeepSeek FP4](https://huggingface.co/nvidia/DeepSeek-R1-FP4), ready for fast inference with TensorRT-LLM.
 - [NVIDIA Dynamo](https://github.com/ai-dynamo/dynamo): A datacenter scale distributed inference serving framework that works seamlessly with TensorRT-LLM.
 - [AutoDeploy](./examples/auto_deploy/README.md): An experimental backend for TensorRT-LLM to simplify and accelerate the deployment of PyTorch models.
+- [WeChat Discussion Group](https://github.com/NVIDIA/TensorRT-LLM/issues/5359): A real-time channel for TensorRT-LLM Q&A and news.
```

cpp/include/tensorrt_llm/batch_manager/trtGptModelOptionalParams.h

Lines changed: 0 additions & 140 deletions
This file was deleted.

cpp/include/tensorrt_llm/executor/executor.h

Lines changed: 2 additions & 0 deletions

```diff
@@ -1351,6 +1351,8 @@ class GuidedDecodingConfig
 {
     /// @brief Enable guided decoding with XGrammar backend.
     kXGRAMMAR = 0,
+    /// @brief Enable guided decoding with LLGuidance backend.
+    kLLGUIDANCE = 1,
 };
 
 explicit GuidedDecodingConfig(GuidedDecodingBackend backend,
```
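The new enum value only extends the backend selector. As a minimal, hypothetical usage sketch (the constructor's remaining parameters are truncated in the diff above and assumed here to be defaulted):

```cpp
// Hypothetical sketch: choosing the new LLGuidance backend at runtime.
// Assumes the trailing GuidedDecodingConfig constructor parameters
// (cut off in the diff above) have default values.
#include "tensorrt_llm/executor/executor.h"

namespace tle = tensorrt_llm::executor;

tle::GuidedDecodingConfig makeGuidedDecodingConfig(bool useLLGuidance)
{
    auto const backend = useLLGuidance
        ? tle::GuidedDecodingConfig::GuidedDecodingBackend::kLLGUIDANCE
        : tle::GuidedDecodingConfig::GuidedDecodingBackend::kXGRAMMAR;
    return tle::GuidedDecodingConfig{backend};
}
```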

cpp/kernels/xqa/CMakeLists.txt

Lines changed: 23 additions & 14 deletions

```diff
@@ -84,21 +84,30 @@ add_custom_command(
 add_custom_target(xqa_sources_h DEPENDS ${XQA_SOURCES_H})
 
 if(BUILD_XQA_TESTS)
-  # GoogleTest Preparation - Code block copied from
-  # https://google.github.io/googletest/quickstart-cmake.html
-  include(FetchContent)
-  FetchContent_Declare(
-    googletest
-    GIT_REPOSITORY https://github.com/google/googletest.git
-    GIT_TAG v1.15.2)
-  include(GoogleTest)
+  # Try to find system installed GTest first
+  find_package(GTest QUIET)
+  if(NOT GTest_FOUND)
+    message(STATUS "System GTest not found, fetching from repository")
+    include(FetchContent)
+    FetchContent_Declare(
+      googletest
+      GIT_REPOSITORY https://github.com/google/googletest.git
+      GIT_TAG v1.15.2)
+    FetchContent_MakeAvailable(googletest)
+    include(GoogleTest)
+  endif()
 
-  # Add Eigen via FetchContent
-  FetchContent_Declare(
-    eigen
-    GIT_REPOSITORY https://gitlab.com/libeigen/eigen.git
-    GIT_TAG 3.4.0)
-  FetchContent_MakeAvailable(googletest eigen)
+  # Try to find system installed Eigen first
+  find_package(Eigen3 3.4 QUIET)
+  if(NOT Eigen3_FOUND)
+    message(STATUS "System Eigen not found, fetching from repository")
+    include(FetchContent)
+    FetchContent_Declare(
+      eigen
+      GIT_REPOSITORY https://gitlab.com/libeigen/eigen.git
+      GIT_TAG 3.4.0)
+    FetchContent_MakeAvailable(eigen)
+  endif()
 
   enable_testing()
   add_executable(
```
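The revised logic follows the common find-first, fetch-as-fallback CMake pattern: a system-installed GTest or Eigen satisfies the dependency with no network access, and FetchContent is consulted only when find_package fails, which keeps offline and distro-packaged builds working.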

cpp/kernels/xqa/barriers.cuh

Lines changed: 1 addition & 1 deletion

```diff
@@ -434,7 +434,7 @@ using CtaBarrier = MBarrier<Scope::CTA>;
 using CgaBarrier = MBarrier<Scope::CGA>;
 
 template <uint32_t nbBars>
-__device__ inline bool toParity(uint32_t i)
+__device__ inline constexpr bool toParity(uint32_t i)
 {
     return i % (nbBars * 2) / nbBars;
 }
```
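Adding constexpr lets the expected-phase parity be evaluated at compile time whenever the barrier index is a constant. A small host-side sketch (a hypothetical replica of the same arithmetic, not the kernel header itself) shows the cycle the function encodes:

```cpp
// Hypothetical host-side replica of toParity's arithmetic, to show the
// parity cycle that constexpr now allows the compiler to fold.
#include <cstdint>

template <uint32_t nbBars>
constexpr bool toParity(uint32_t i)
{
    // With nbBars barrier slots, the expected parity stays 0 for the
    // first nbBars uses, flips to 1 for the next nbBars, then repeats.
    return i % (nbBars * 2) / nbBars;
}

// Compile-time checks, possible only because toParity is constexpr.
static_assert(toParity<2>(0) == false, "");
static_assert(toParity<2>(1) == false, "");
static_assert(toParity<2>(2) == true, "");
static_assert(toParity<2>(3) == true, "");
static_assert(toParity<2>(4) == false, "");
```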
