Commit b86561e

fix numBlocksCoop
Signed-off-by: Enwei Zhu <[email protected]>
1 parent e63f847 commit b86561e

1 file changed (+1, -2 lines)


cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/RoutingDeepSeek.cu

Lines changed: 1 addition & 2 deletions
@@ -647,8 +647,7 @@ void run(Data& data, void* stream)
 //
 // The upper bound is a strict requirement. The number of blocks should be determined by querying
 // the device properties, or conservatively low.
-// /!\ The following number is not portable!! (but works on H100 and B200)
-int const numBlocksCoop = 128;
+static int const numBlocksCoop = tensorrt_llm::common::getMultiProcessorCount();
 
 // Maximum number of tokens supported by the kernel using a cooperative launch.
 int const maxTokensCoop = (numBlocksCoop * numThreadsHist * 64) / data.mTopK;
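A minimal, self-contained sketch of what the new line does, assuming no GPU is available: `hypotheticalGetMultiProcessorCount()` is a hypothetical stand-in for `tensorrt_llm::common::getMultiProcessorCount()` (which presumably queries the CUDA runtime for the current device's SM count), and the SM count of 132, `numThreadsHist = 256` and `topK = 8` are made-up values, not taken from the commit. It shows two things: the `maxTokensCoop` formula from the diff, and the effect of the `static` qualifier, which makes the device query run only once per process.

```cpp
#include <cassert>

// Counts how many times the (hypothetical) device query runs, to show the caching.
int g_smQueryCount = 0;

// Hypothetical stand-in for tensorrt_llm::common::getMultiProcessorCount().
// On a real GPU this would query the CUDA runtime, e.g.
//   cudaDeviceGetAttribute(&count, cudaDevAttrMultiProcessorCount, device);
int hypotheticalGetMultiProcessorCount()
{
    ++g_smQueryCount;
    return 132; // hypothetical SM count, not taken from the commit
}

// Mirrors the bound computed in run(): one cooperative block per SM.
int maxTokensCoop(int numThreadsHist, int topK)
{
    assert(topK > 0);
    // Function-local static: initialized on the first call only
    // (thread-safe since C++11); later calls reuse the cached value.
    static int const numBlocksCoop = hypotheticalGetMultiProcessorCount();
    return (numBlocksCoop * numThreadsHist * 64) / topK;
}
```

With these assumed values, `maxTokensCoop(256, 8)` evaluates to 132 * 256 * 64 / 8 = 270336, and a second call does not repeat the query. One consequence of the `static` worth keeping in mind: if the query reads the current device, the cached count belongs to whichever device was current on the first call. Sizing the grid by SM count presumably keeps a cooperative launch within the co-residency limit that `cudaLaunchCooperativeKernel` enforces, provided at least one block of this kernel fits per SM.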
