Required prerequisites
Motivation
The goal of pipeline planning is to run pipeline operations across heterogeneous compute devices (tensor cores, vector cores, and ODMA units) in parallel to minimize total execution time.
We can employ a critical-path-aware greedy scheduling algorithm that prioritizes commands on the longest dependency chain, ensuring that bottleneck operations complete as early as possible. To maximize hardware utilization, the algorithm should support multi-iteration pipelining, where operations from different iterations execute concurrently whenever they have no inter-iteration dependencies; for example, iteration 1's memory transfers can run while iteration 0's compute operations are still in flight.
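A minimal sketch of the scheduling idea described above, assuming a hypothetical `Op` representation with a name, a target device, a duration, and explicit dependencies (none of these names come from the actual pipeline IR). Each op's priority is its duration plus the longest downstream chain, and the greedy loop always dispatches the ready op with the highest priority, with each device executing one op at a time:

```python
from dataclasses import dataclass, field

@dataclass
class Op:
    name: str
    device: str                 # e.g. "tensor", "vector", "odma" (illustrative)
    duration: int
    deps: list = field(default_factory=list)   # names of ops that must finish first

def schedule(ops):
    by_name = {op.name: op for op in ops}
    succs = {op.name: [] for op in ops}
    for op in ops:
        for d in op.deps:
            succs[d].append(op.name)

    # Critical-path priority: own duration + longest chain through successors.
    prio = {}
    def cp(name):
        if name not in prio:
            prio[name] = by_name[name].duration + max(
                (cp(s) for s in succs[name]), default=0)
        return prio[name]
    for op in ops:
        cp(op.name)

    # Greedy list scheduling: pick the ready op with the highest priority.
    finish, dev_free = {}, {}
    done, order = set(), []
    while len(done) < len(ops):
        ready = [o for o in ops
                 if o.name not in done and all(d in done for d in o.deps)]
        op = max(ready, key=lambda o: prio[o.name])
        start = max(dev_free.get(op.device, 0),
                    max((finish[d] for d in op.deps), default=0))
        finish[op.name] = start + op.duration
        dev_free[op.device] = finish[op.name]
        done.add(op.name)
        order.append((op.name, start, finish[op.name]))
    return order
```

On a toy chain load(odma,2) -> matmul(tensor,4) -> act(vector,1) -> store(odma,1), the longest-chain op (`load`, priority 8) is dispatched first and the makespan is 8. Extending this to multi-iteration pipelining would amount to instantiating the op graph once per iteration without cross-iteration edges, so each device picks up the next iteration's work as soon as it goes idle.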
Additionally, we can perform buffer liveness analysis to reduce the memory footprint by identifying when logical buffers from different iterations can safely reuse the same physical memory locations.
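The liveness-based reuse could be sketched as follows, under the simplifying assumption that each logical buffer's live range is a single interval (first write to last read) in scheduled time; two buffers share a physical slot when their intervals do not overlap. Buffer names and intervals here are purely illustrative:

```python
def assign_slots(live_ranges):
    """live_ranges: {buffer_name: (first_use, last_use)} in schedule time.
    Returns {buffer_name: slot_index}, greedily reusing freed slots so that
    non-overlapping live ranges map to the same physical slot."""
    slots = {}                  # buffer -> assigned slot
    free = []                   # slot indices whose buffers are dead
    next_slot = 0
    active = []                 # (last_use, buffer) currently live

    for buf, (start, end) in sorted(live_ranges.items(), key=lambda kv: kv[1][0]):
        # Release slots of buffers whose live range ended before this start.
        still_live = []
        for last, b in active:
            if last < start:
                free.append(slots[b])
            else:
                still_live.append((last, b))
        active = still_live

        slot = free.pop() if free else next_slot
        if slot == next_slot:
            next_slot += 1
        slots[buf] = slot
        active.append((end, buf))
    return slots
```

For example, with `{"iter0_act": (0, 3), "iter0_out": (2, 5), "iter1_act": (4, 7)}`, iteration 1's activation buffer reuses iteration 0's slot (its range starts after iter0_act dies), so two physical slots cover three logical buffers.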
Solution
No response
Alternatives
No response
Additional context
No response