-
Notifications
You must be signed in to change notification settings - Fork 343
[AIG] Add LongestPathAnalysis implementation #8529
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
a19c629
to
3e65f98
Compare
This patch implements the LongestPathAnalysis for the AIG dialect, which calculates the maximum delay through combinational paths in a circuit where each AIG operation has a unit delay. The implementation: - Adds the core analysis infrastructure with three main components: - Context: Thread-safe interface to access analysis results across modules - LocalVisitor: Per-module visitor that computes paths within module boundaries - Impl: Top-level implementation that orchestrates analysis across hierarchies - Implements hierarchical analysis that works across module boundaries - Supports parallel execution for improved performance on large designs - Provides both closed paths (register-to-register) and open paths analysis - Includes debug point tracing for detailed path information - Adds a PrintLongestPathAnalysis pass for outputting results The analysis is designed to be scalable for large designs through techniques like redundant path elimination, graph parallelism, and early pruning of paths. It's particularly useful for identifying critical paths, estimating circuit performance, and guiding optimization passes.
3e65f98
to
c92f0c0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really really cool! I love the split into a thread-safe global and a multi-threaded local part of the analysis to make things scale. And using equivalence classes to deal with aliasing operations is very clever 🥳 💯 Can't wait to see what we can do with this in CIRCT 🚀
// expected-remark-re @+4 {{fanOut=Object($root.x[0]), fanIn=Object($root.a[0], delay=1, history=[{{.+}}])}} | ||
// expected-remark-re @+3 {{fanOut=Object($root.x[0]), fanIn=Object($root.b[0], delay=1, history=[{{.+}}])}} | ||
// expected-remark-re @+2 {{fanOut=Object($root.x[1]), fanIn=Object($root.a[1], delay=1, history=[{{.+}}])}} | ||
// expected-remark-re @+1 {{fanOut=Object($root.x[1]), fanIn=Object($root.b[1], delay=1, history=[{{.+}}])}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could @below
work for these, instead of counting the number of lines?
// expected-remark @+2 {{fanOut=Object($root.x[0]), fanIn=Object($root.a[0], delay=2, history=[Object($root.c2.x[0], delay=2, comment="output port"), Object($root/c2:child.a[0], delay=1, comment="input port"), Object($root.c1.x[0], delay=1, comment="output port"), Object($root/c1:child.a[0], delay=0, comment="input port"), Object($root.a[0], delay=0, comment="input port")])}} | ||
// expected-remark @+1 {{fanOut=Object($root.x[0]), fanIn=Object($root.b[0], delay=2, history=[Object($root.c2.x[0], delay=2, comment="output port"), Object($root/c2:child.a[0], delay=1, comment="input port"), Object($root.c1.x[0], delay=1, comment="output port"), Object($root/c1:child.b[0], delay=0, comment="input port"), Object($root.b[0], delay=0, comment="input port")])}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really really cool 🥳
const char *ir = R"MLIR( | ||
hw.module private @basic(in %clock : !seq.clock, in %a : i2, in %b : i2, out x : i2, out y: i4) { | ||
%p = seq.firreg %a clock %clock : i2 | ||
%q = seq.firreg %s clock %clock : i2 | ||
%r = hw.instance "inst" @child(a: %p: i2, b: %b: i2) -> (x: i2) | ||
%s = aig.and_inv not %p, %q, %r : i2 | ||
%dummy = comb.concat %s, %a : i2, i2 | ||
hw.output %s, %dummy : i2, i4 | ||
} | ||
hw.module private @child(in %a : i2, in %b : i2, out x : i2) { | ||
%r = aig.and_inv not %a, %b {sv.namehint = "child.r"} : i2 | ||
hw.output %r : i2 | ||
} | ||
)MLIR"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know if there is some precedence for having MLIR blobs encoded in the C++ source as strings? This feels sensible since you are trying to go through a lot of the details provided by the analysis. On the other hand, you could argue that maybe the printer should have a verbose option where it prints all the interesting info you'd want to check, which would then allow you to check this with FileCheck.
//===----------------------------------------------------------------------===// | ||
// Context | ||
//===----------------------------------------------------------------------===// | ||
// This class provides a thread-safe interface to access the analysis results. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// This class provides a thread-safe interface to access the analysis results. | |
/// This class provides a thread-safe interface to access the analysis results. |
// Track equivalence classes for data movement ops | ||
// Treat output as equivalent to input for pure data movement operations | ||
llvm::EquivalenceClasses<std::pair<Value, size_t>> ec; | ||
DenseMap<std::pair<Value, size_t>, std::pair<Value, size_t>> ecMap; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice use of EquivalenceClasses
! 🥳 I'm wondering if you still need to track that separate ecMap
🤔 It looks like you use it to map from an entry in the class to the leader, which I think the ec
can do for you directly 🤔 Might be mistaken though.
// Wait for the thread to finish. | ||
while (!done.load()) | ||
std::this_thread::sleep_for(std::chrono::milliseconds(100)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be worth using a condition variable here to avoid sleeping. Not sure how much of a performance implication this has in practice 🙂
auto leader = ec.getOrInsertLeaderValue({to, toBitPos}); | ||
auto &slot = ecMap[leader]; | ||
if (!slot.first) | ||
slot = {to, toBitPos}; | ||
// Merge classes, and visit the leader. | ||
auto newLeader = ec.unionSets({to, toBitPos}, {from, fromBitPos}); | ||
assert(leader == *newLeader); | ||
return visitValue(to, toBitPos, results); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really nice idea to get rid of aliases in the IR and boil things down to a single computed value regardless of how many wires and bit slices you have of that value 💯
auto *node = ctx->instanceGraph->lookup(moduleName); | ||
if (!node || !node->getModule()) { | ||
op.emitError() << "module " << moduleName << " not found"; | ||
return failure(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think we could turn this into an assert? I think the hw.instance
verifier would already error if the module did not exist 🤔
auto cost = std::max(1u, llvm::Log2_64_Ceil(size)); | ||
// Create edges each operand with cost ceil(log(size)). | ||
for (auto operand : op->getOperands()) | ||
if (failed(addEdge(operand, bitPos, cost, results))) | ||
return failure(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using the logarithm as an approximation for the ideal work depth of a logic op is a nice idea! If I remember correctly there was even some theoretical work by Torsten Hoefler showing that this holds in many cases.
LongestPathAnalysis::~LongestPathAnalysis() {} | ||
LongestPathAnalysis::~LongestPathAnalysis() { delete impl; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could impl be an std::unique_ptr
to avoid having to deal with naked pointers and allocation/deallocation?
This patch implements the LongestPathAnalysis for the AIG dialect, which calculates the maximum delay through combinational paths in a circuit where each AIG operation has a unit delay. The implementation:
The analysis is designed to be scalable for large designs through techniques like redundant path elimination, graph parallelism, and early pruning of paths. It's particularly useful for identifying critical paths, estimating circuit performance, and guiding optimization passes.