1 parent e63f847 commit b86561e
cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/RoutingDeepSeek.cu
@@ -647,8 +647,7 @@ void run(Data& data, void* stream)
  //
  // The upper bound is a strict requirement. The number of blocks should be determined by querying
  // the device properties, or conservatively low.
- // /!\ The following number is not portable!! (but works on H100 and B200)
- int const numBlocksCoop = 128;
+ static int const numBlocksCoop = tensorrt_llm::common::getMultiProcessorCount();

  // Maximum number of tokens supported by the kernel using a cooperative launch.
  int const maxTokensCoop = (numBlocksCoop * numThreadsHist * 64) / data.mTopK;
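
To see how the change feeds into the token bound, here is a minimal host-side sketch of the same arithmetic. It is not the code from this commit: `queryMultiProcessorCountStub` is a hypothetical stand-in for `tensorrt_llm::common::getMultiProcessorCount()`, and the values 132 (SM count), 256 (`numThreadsHist`) and 8 (`topK`) are assumptions chosen only for illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for tensorrt_llm::common::getMultiProcessorCount().
// It returns a fixed example value and counts how often it is called.
static int gQueryCalls = 0;

int queryMultiProcessorCountStub()
{
    ++gQueryCalls;
    return 132; // example SM count, assumed for illustration
}

// Same arithmetic as the maxTokensCoop line in the diff.
constexpr int maxTokensCoop(int numBlocksCoop, int numThreadsHist, int topK)
{
    return (numBlocksCoop * numThreadsHist * 64) / topK;
}

int maxTokensCoopForDevice(int numThreadsHist, int topK)
{
    // Function-local static: the query runs on the first call only,
    // and later calls reuse the cached block count.
    static int const numBlocksCoop = queryMultiProcessorCountStub();
    return maxTokensCoop(numBlocksCoop, numThreadsHist, topK);
}
```

With these assumed values, the old hard-coded 128 blocks give a bound of 262144 tokens, while 132 blocks give 270336. Because of the `static`, the block count is fixed at the first call, so in this sketch a later call made after the device changes would still see the cached value.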