```c
static uint64_t calculateMaxKernelExecTimeUsecs(struct inspectorCollInfo *collInfo,
                                                inspectorTimingSource_t *timingSource) {
  // ...
  for (uint32_t i = 0; i < collInfo->nChannels; i++) {
    struct inspectorKernelChInfo *kernelCh = &collInfo->kernelCh[i];       // <-- get channel[i] info
    uint64_t gpuExecTimeUsecs = calculateKernelGpuExecTimeUsecs(kernelCh); // <-- stopClk - startClk
    if (gpuExecTimeUsecs > 0) {
      if (gpuExecTimeUsecs > maxKernelExecTimeUsecs) {
        maxKernelExecTimeUsecs = gpuExecTimeUsecs;                         // <-- max over all channels
        bestTimingSource = inspectorTimingSourceKernelGpu;
      }
    } else {
      // ...
    }
  }
  // ...
}
```
Let's say something goes wrong with one of the kernels, making one channel finish much later than all the others:
```
time point   0     1     2    ...   98    99
channel 0    start       end
channel 1    start       end
channel 2    start       end
channel 3                           start end
```
In this case the kernel execution time should arguably be 99 - 0 = 99, but the current algorithm reports max(2 - 0, 99 - 98) = 2 (illustrated by the sketch below).
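To make the two candidate metrics concrete, here is a minimal standalone sketch (not the inspector's actual code; the `chTiming` struct and the timestamps are hypothetical, taken from the diagram above) that computes both the per-channel maximum duration and the overall span from earliest start to latest stop:

```c
#include <inttypes.h>
#include <stdio.h>

// Hypothetical per-channel timing record, for illustration only.
struct chTiming { uint64_t startClk; uint64_t stopClk; };

int main(void) {
  // Timestamps from the diagram: channels 0-2 run [0, 2], channel 3 runs [98, 99].
  struct chTiming ch[4] = {{0, 2}, {0, 2}, {0, 2}, {98, 99}};

  uint64_t maxPerChannel = 0;  // what the current algorithm reports
  uint64_t minStart = UINT64_MAX, maxStop = 0;
  for (int i = 0; i < 4; i++) {
    uint64_t d = ch[i].stopClk - ch[i].startClk;
    if (d > maxPerChannel) maxPerChannel = d;
    if (ch[i].startClk < minStart) minStart = ch[i].startClk;
    if (ch[i].stopClk > maxStop) maxStop = ch[i].stopClk;
  }

  // Prints 2: the longest single channel.
  printf("max per-channel duration: %" PRIu64 "\n", maxPerChannel);
  // Prints 99: the wall-clock span of the whole kernel across channels.
  printf("overall span (maxStop - minStart): %" PRIu64 "\n", maxStop - minStart);
  return 0;
}
```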
Could you elaborate on why the inspector picks the latter algorithm? I'm not sure whether this is working as intended, or whether there is some mechanism that makes it more reasonable than the former one. Thanks! 😃