Required prerequisites
Motivation
The goal of pipeline planning is to run pipeline operations across heterogeneous compute devices (tensor cores, vector cores, and ODMA units) in parallel to minimize total execution time.
We can employ a critical-path-aware greedy scheduling algorithm that prioritizes commands on the longest dependency chain, ensuring that bottleneck operations complete as early as possible. To maximize hardware utilization, the algorithm should support multi-iteration pipelining, where operations from different iterations execute concurrently whenever they have no inter-iteration dependencies; for example, iteration 1's memory transfers can run while iteration 0's compute operations are still in flight.
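A minimal sketch of the scheduling idea described above, assuming a hypothetical `Op` representation with a name, a target device, a duration, and explicit dependencies (none of these names come from the actual pipeline IR). Each op's priority is its duration plus the longest downstream chain, and the greedy loop always dispatches the ready op with the highest priority, with each device executing one op at a time:

```python
from dataclasses import dataclass, field

@dataclass
class Op:
    name: str
    device: str                 # e.g. "tensor", "vector", "odma" (illustrative)
    duration: int
    deps: list = field(default_factory=list)   # names of ops that must finish first

def schedule(ops):
    by_name = {op.name: op for op in ops}
    succs = {op.name: [] for op in ops}
    for op in ops:
        for d in op.deps:
            succs[d].append(op.name)

    # Critical-path priority: own duration + longest chain through successors.
    prio = {}
    def cp(name):
        if name not in prio:
            prio[name] = by_name[name].duration + max(
                (cp(s) for s in succs[name]), default=0)
        return prio[name]
    for op in ops:
        cp(op.name)

    # Greedy list scheduling: pick the ready op with the highest priority.
    finish, dev_free = {}, {}
    done, order = set(), []
    while len(done) < len(ops):
        ready = [o for o in ops
                 if o.name not in done and all(d in done for d in o.deps)]
        op = max(ready, key=lambda o: prio[o.name])
        start = max(dev_free.get(op.device, 0),
                    max((finish[d] for d in op.deps), default=0))
        finish[op.name] = start + op.duration
        dev_free[op.device] = finish[op.name]
        done.add(op.name)
        order.append((op.name, start, finish[op.name]))
    return order
```

On a toy chain load(odma,2) -> matmul(tensor,4) -> act(vector,1) -> store(odma,1), the longest-chain op (`load`, priority 8) is dispatched first and the makespan is 8. Extending this to multi-iteration pipelining would amount to instantiating the op graph once per iteration without cross-iteration edges, so each device picks up the next iteration's work as soon as it goes idle.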
Additionally, we can perform buffer liveness analysis to reduce the memory footprint by identifying when logical buffers from different iterations can safely reuse the same physical memory locations.
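The liveness-based reuse could be sketched as follows, under the simplifying assumption that each logical buffer's live range is a single interval (first write to last read) in scheduled time; two buffers share a physical slot when their intervals do not overlap. Buffer names and intervals here are purely illustrative:

```python
def assign_slots(live_ranges):
    """live_ranges: {buffer_name: (first_use, last_use)} in schedule time.
    Returns {buffer_name: slot_index}, greedily reusing freed slots so that
    non-overlapping live ranges map to the same physical slot."""
    slots = {}                  # buffer -> assigned slot
    free = []                   # slot indices whose buffers are dead
    next_slot = 0
    active = []                 # (last_use, buffer) currently live

    for buf, (start, end) in sorted(live_ranges.items(), key=lambda kv: kv[1][0]):
        # Release slots of buffers whose live range ended before this start.
        still_live = []
        for last, b in active:
            if last < start:
                free.append(slots[b])
            else:
                still_live.append((last, b))
        active = still_live

        slot = free.pop() if free else next_slot
        if slot == next_slot:
            next_slot += 1
        slots[buf] = slot
        active.append((end, buf))
    return slots
```

For example, with `{"iter0_act": (0, 3), "iter0_out": (2, 5), "iter1_act": (4, 7)}`, iteration 1's activation buffer reuses iteration 0's slot (its range starts after iter0_act dies), so two physical slots cover three logical buffers.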
Solution
No response
Alternatives
No response
Additional context
No response