[Datapath] Implement Add with Carry-In to Improve Subtraction Circuit Implementation#9949
[Datapath] Implement Add with Carry-In to Improve Subtraction Circuit Implementation#9949sarthakmangla1 wants to merge 1 commit intollvm:mainfrom
Conversation
There was a problem hiding this comment.
Cool, thank you for contributing to CIRCT! I launched benchmark https://uenoku.github.io/circt-synth-tracker/pr-benchmarks/pr-9949/report.html and there are improvements in sub results 👍 (also there was no translation validation failure, the logic is certainly sound).
The code looks clean and makes sense.
I'm also curios what lowering pattern would improve the benchmark like https://github.com/pulp-platform/ELAU/blob/master/src/AddSubC.sv
| auto width = op.getType().getIntOrFloatBitWidth(); | ||
| // Implement a naive Ripple-carry full adder. | ||
| Value carry; | ||
| Value carry = carryIn; // null Value → carry_in = 0 |
There was a problem hiding this comment.
Generally please use ASCII
| Value carry = carryIn; // null Value → carry_in = 0 | |
| Value carry = carryIn; // null Value -> carry_in = 0 |
| auto arch = determineAdderArch(op, width); | ||
| if (arch == AdderArchitecture::RippleCarry) | ||
| return lowerRippleCarryAdder(op, operands[0], operands[1], carryIn, | ||
| rewriter); | ||
| return lowerParallelPrefixAdder(op, operands[0], operands[1], carryIn, | ||
| rewriter); | ||
| } | ||
| return failure(); // 3-input add without const_1 — handled elsewhere | ||
| } | ||
|
|
||
| // Lower only when there are two inputs. | ||
| // Variadic operands must be lowered in a different pattern. | ||
| if (inputs.size() != 2) | ||
| return failure(); | ||
|
|
||
| auto arch = determineAdderArch(op, width); | ||
| if (arch == AdderArchitecture::RippleCarry) | ||
| return lowerRippleCarryAdder(op, inputs, rewriter); | ||
| return lowerParallelPrefixAdder(op, inputs, rewriter); | ||
| return lowerRippleCarryAdder(op, inputs[0], inputs[1], Value(), rewriter); | ||
| return lowerParallelPrefixAdder(op, inputs[0], inputs[1], Value(), rewriter); |
There was a problem hiding this comment.
Could you create a helper function for this part and share at two places?
auto arch = determineAdderArch(op, width);
if (arch == AdderArchitecture::RippleCarry)
return lowerRippleCarryAdder(op, inputs[0], inputs[1], Value(), rewriter);
return lowerParallelPrefixAdder(op, inputs[0], inputs[1], Value(), rewriter);| // This pattern comes from compress(a, b, const_1) in DatapathToComb, which | ||
| // arises from sub → add(a, ~b, 1). Detect const_1 in any operand position. | ||
| if (inputs.size() == 3) { | ||
| for (unsigned i = 0; i < 3; ++i) { |
There was a problem hiding this comment.
Because comb::AddOp and datapath::CompressOp have commutative trait, usually constant is pushed to the last element. I think it's fine to check only the last element.
https://github.com/llvm/llvm-project/blob/3d421d59ad247afeadf5d4f886c9dea14a5eb229/mlir/docs/Canonicalization.md?plain=1#L117-L118
cowardsa
left a comment
There was a problem hiding this comment.
Thanks for this PR - really very clean for a first PR so great job!
Obviously would be great to add FileCheck tests and circt-lec tests
A nice follow-up PR will be to handle the case when you've not got a constant but you've got a variable bit - this will be more difficult because it'll be zero-extended to match the operator's width
| addends.push_back( | ||
| extractBits(rewriter, input)); // Extract bits from each input | ||
| // Special case: compress(a, b, const_1) with exactly 3 inputs where one | ||
| // input is the constant 1. This pattern arises from sub → add(a, ~b, 1). |
| auto constOp = inputs[i].getDefiningOp<hw::ConstantOp>(); | ||
| if (!constOp || !constOp.getValue().isOne()) | ||
| continue; |
There was a problem hiding this comment.
As above constOp should be found as the final argument
| // replace compress + add together with comb.add(a, b, const_1) so the | ||
| // downstream prefix adder can bake carry_in=1 directly into g[0]. | ||
| if (inputs.size() == 3) { | ||
| for (unsigned i = 0; i < 3; ++i) { |
There was a problem hiding this comment.
Can remove this outer loop when we know where the final argument is
This is a draft PR. This is my first PR to CIRCT.
Summary
Subtraction
a - bis lowered toa + ~b + 1. Previously, the constant1was passed as a full-width data input through the Wallace treecompressor, and then a separate parallel prefix adder was used on top, which wastes gates.
This PR fixes the issue in two places:
DatapathToComb: Detects
compress(a, b, const_1)followed by a downstreamcomb.add, and replaces both with a singlecomb.add(a, b, const_1).CombToSynth: Detects a 3-input
comb.addwith aconst_1argument and bakes the carry-in directly intog[0]of the parallel prefixadder, eliminating the extra adder stage.
Results
For a 4-bit subtractor, gate count drops from 55 to 47.
TODO