
hw: Add Reduction Feature #163

Open
Lore0599 wants to merge 9 commits into main from reduction-vc-rebase

@Lore0599 (Contributor) commented Mar 3, 2026

Reduction

This PR introduces reduction support in the FlooNoC architecture. The reduction functionality is integrated with the previously supported multicast feature.

🔎 Feature Overview:

Reduction support can be divided into two different cases:

  • Parallel: Parallel reduction is used for synchronization mechanisms and leverages the previously supported CollectB feature. In particular, synchronization is enabled by the LsbAnd operation. This feature is typically supported in the narrow router.
  • Sequential: To support scalar or floating-point reductions, a sequential reduction mechanism has been added. This is supported in both wide and narrow routers.
    To perform arithmetic operations, the router exposes wide and narrow offload ports towards a functional unit. Floating-point operations can leverage the Direct Compute Access (DCA) logic to reuse existing functional units in the target SoC.
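The parallel case can be pictured as a single combinational reduction over all contributing inputs. Below is a minimal sketch, assuming that LsbAnd ANDs the least-significant bit of every input selected for the reduction (for example, the arrival flags of a barrier). The module `lsb_and_sketch` and its ports are hypothetical and do not correspond to FlooNoC code.

```systemverilog
// Hypothetical sketch, not the FlooNoC implementation.
// Assumption: LsbAnd reduces the inputs selected by mask_i by
// AND-ing the least-significant bit of each payload.
module lsb_and_sketch #(
  parameter int unsigned NumInputs = 4,
  parameter int unsigned DataWidth = 32
) (
  input  logic [NumInputs-1:0][DataWidth-1:0] data_i,
  input  logic [NumInputs-1:0]                mask_i,   // inputs taking part
  output logic                                result_o
);
  always_comb begin
    result_o = 1'b1;  // neutral element of AND
    for (int unsigned i = 0; i < NumInputs; i++) begin
      if (mask_i[i]) result_o &= data_i[i][0];
    end
  end
endmodule
```

In contrast, the sequential case described above offloads arithmetic operations to a functional unit through the wide and narrow offload ports.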

⚙️ Configuration

To simplify the top-level configuration, all previously separate collective-related parameters have been merged into the CollectiveCfg structure.
At the top level, the user only needs to select which macro operations should be enabled (collect_op_fe_cfg_t). Internally, these operations are mapped into micro operations, which determine which features must be enabled at a finer granularity.
For example:

  • Multicast support requires both multicast functionality in the router and a primitive parallel reduction mechanism to collect incoming responses (CollectB).
  • Synchronization requires parallel reduction support for the requests and multicast support for the responses.

This fine-grained configuration is hidden from the user through the use of frontend operations (collect_op_fe_cfg_t), which are internally remapped to backend operations (collect_op_be_cfg_t).

For an example of how to integrate and enable the different operations, see the picobello SoC.
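The frontend-to-backend remapping described in this section can be sketched as a function over packed structs. The frontend fields reuse the names of the CollectiveOpDefaultCfg fields that appear in this PR; the backend struct, its fields, and `map_fe_to_be` are hypothetical, and the sketch simplifies by collapsing all arithmetic operations into a single sequential-reduction flag.

```systemverilog
// Hypothetical sketch of a frontend -> backend collective-op remapping.
// Frontend field names follow CollectiveOpDefaultCfg in this PR; the
// backend struct and map_fe_to_be() are assumptions, not floo_pkg code.
package collect_cfg_sketch_pkg;

  // Macro operations selected by the user.
  typedef struct packed {
    logic EnNarrowMulticast;
    logic EnWideMulticast;
    logic EnLSBAnd;
    logic EnF_Add, EnF_Mul, EnF_Min, EnF_Max;
    logic EnA_Add, EnA_Mul, EnA_Min_S, EnA_Min_U, EnA_Max_S, EnA_Max_U;
  } fe_cfg_t;

  // Micro features (hypothetical names).
  typedef struct packed {
    logic NarrowMcast;    // fork narrow requests in the router
    logic WideMcast;      // fork wide requests in the router
    logic NarrowCollectB; // parallel reduction of narrow B responses
    logic WideCollectB;   // parallel reduction of wide B responses
    logic ReqParallelRed; // parallel (LsbAnd) reduction of requests
    logic RspMcast;       // multicast of the responses
    logic SeqRed;         // sequential reduction via the offload ports
  } be_cfg_t;

  function automatic be_cfg_t map_fe_to_be(fe_cfg_t fe);
    be_cfg_t be = '0;
    // Multicast: router multicast plus CollectB to merge the B responses.
    be.NarrowMcast    = fe.EnNarrowMulticast;
    be.NarrowCollectB = fe.EnNarrowMulticast;
    be.WideMcast      = fe.EnWideMulticast;
    be.WideCollectB   = fe.EnWideMulticast;
    // Synchronization: parallel reduction of requests, multicast of responses.
    be.ReqParallelRed = fe.EnLSBAnd;
    be.RspMcast       = fe.EnLSBAnd;
    // Any arithmetic operation needs the sequential reduction path.
    be.SeqRed = |{fe.EnF_Add, fe.EnF_Mul, fe.EnF_Min, fe.EnF_Max,
                  fe.EnA_Add, fe.EnA_Mul, fe.EnA_Min_S, fe.EnA_Min_U,
                  fe.EnA_Max_S, fe.EnA_Max_U};
    return be;
  endfunction

endpackage
```

The point of the split is that the user only touches `fe_cfg_t`, while the hardware is elaborated from the derived micro features.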

➕ Additions

  • floo_pkg.sv:
    • Adds CollectiveCfg to the route configuration structure.
    • OpCfg selects which macro operations are supported.
    • RedCfg specifies how many pipeline stages the reduction unit should implement.
    • Includes helper functions to abstract configuration complexity.
  • floo_reduction_unit.sv: Implements the main logic responsible for handling and offloading sequential reductions.
  • floo_alu.sv: An ALU used for narrow sequential reductions. Ideally, the IPU from the Snitch cluster would be reused, but this would require adding Snitch as a dependency of FlooNoC.
  • floo_reduction_arbiter.sv/floo_output_arbiter.sv: Refactors the router and arbiter logic.
  • reduction/typedef.svh: Adds typedef definitions for the reduction offload interface.

➖ Removed

  • Removed parameters used to enable multicast, parallel reduction, and sequential reduction.
    These features are now automatically enabled based on the selected operations.

⚠️ IMPORTANT

With the introduction of reduction support, all collective operations (multicast, synchronization, and reduction) can now be executed entirely within the NoC, provided that the loopback feature is enabled.
This was not the case in the multicast implementation supported in v0.7.0, which relied on the multicast capabilities of the endpoint interconnect.

TODO

  • Rebase on top of main
  • Check FlooNoC CI
  • Remove or adapt the reduction TB
  • Split the synthesis wrapper contribution

@Lore0599 changed the title from "Add Reduction Feature" to "hw: Add Reduction Feature" on Mar 3, 2026
Lore0599 and others added 8 commits on March 6, 2026 10:40
When an endpoint initiates a wide multicast DMA transfer from another
endpoint to itself (and possibly other endpoints), the following
deadlock occurs. The DMA issues an AR, causing a read burst to come
back from the router to the initiator endpoint. When the first read
beats arrive, the DMA issues a write burst. This write burst loops
back to the initiator endpoint, and may take control of the physical
link. Since writes lock the link for the entire burst, and the burst
cannot complete because it needs the stalled read beats to feed it
(a DMA requirement), the system deadlocks. Note that this can also
happen without multicast, as long as the system uses loopback.

hw: Add spill registers for VCs in chimney

hw: Fix reduction support with VC for read and write (working)

hw: Merging collective configuration parameters

hw: Initial setup for PnR experiments

hw: Add internal offload cuts

hw: Fix parametrization multicast

hw: Fix parametrization adding micro collective ops

pnr: Adapt synth wrapper to script

fixes to script synthesis runs

hw: Create two physical channels on wide_in interface of eject port

hw: Support outstanding barriers with overlapping inputs

hw: Fix `floo_reduction_sync`

hw: Add comment to add assertion in `floo_route_xymask`

synth: Fix synth wrapper and include chimney

hw: Clean reduction_sync

hw: Re-implement reduction simple controller (reduction_unit)

synth: Adapt support for chimney with VC
@Lore0599 force-pushed the reduction-vc-rebase branch from d19db12 to e8d8e3e on March 6, 2026 09:43
@Lore0599 marked this pull request as ready for review on March 6, 2026 11:00
@Lore0599 force-pushed the reduction-vc-rebase branch from d123d89 to d3fb3f4 on March 6, 2026 12:27
@Lore0599 force-pushed the reduction-vc-rebase branch from d3fb3f4 to 8d19d92 on March 6, 2026 14:01
@fischeti (Collaborator) left a comment

I had a first look and have a first round of feedback. In general it looks quite clean. Good job at cleaning this up and making it ready!

So far, I mainly looked at the top-level modules (routers, chimney), and I don't see any bigger issues there. I would just make sure that the default configuration is not affected. This seems to be mostly true thanks to proper parametrization, but sometimes it is a bit hard to see in the GitHub diff view.

Collaborator

Is this file intentionally tracked?

version: 0.39.9
source:
Git: https://github.com/pulp-platform/axi.git
Git: https://github.com/Lura518/axi.git
Collaborator

revert

common_verification: 0.2.3
idma: {version: 0.6.2, upstream_name: iDMA}
floo_noc_pd: {path: ./pd, target: "floo_synth"}
FPnew: { git: "https://github.com/pulp-platform/cvfpu.git", rev: pulp-v0.1.3 }
Collaborator

Use the newest version pulp-v0.2.3, which is also used in snitch_cluster and cheshire afaik.


PD_REMOTE ?= git@iis-git.ee.ethz.ch:axi-noc/floo_noc_pd.git
PD_BRANCH ?= master
PD_BRANCH ?= feature/reduction
Collaborator

Is there a GitLab MR already open?

Comment on lines +313 to +325
EnNarrowMulticast : 1'b0,
EnWideMulticast : 1'b0,
EnLSBAnd : 1'b0,
EnF_Add : 1'b0,
EnF_Mul : 1'b0,
EnF_Min : 1'b0,
EnF_Max : 1'b0,
EnA_Add : 1'b0,
EnA_Mul : 1'b0,
EnA_Min_S : 1'b0,
EnA_Min_U : 1'b0,
EnA_Max_S : 1'b0,
EnA_Max_U : 1'b0
Collaborator

'{default: '0} would be a bit shorter here.
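For illustration, a sketch with a hypothetical four-field stand-in for collect_op_fe_cfg_t (`op_cfg_t` and the package name are assumptions), showing that both forms produce the same all-zero value:

```systemverilog
// Hypothetical stand-in for floo_pkg::collect_op_fe_cfg_t (subset of fields).
package default_pattern_sketch_pkg;

  typedef struct packed {
    logic EnNarrowMulticast;
    logic EnWideMulticast;
    logic EnLSBAnd;
    logic EnF_Add;
  } op_cfg_t;

  // Field-by-field form, as in the diff.
  localparam op_cfg_t VerboseCfg = '{
    EnNarrowMulticast : 1'b0,
    EnWideMulticast   : 1'b0,
    EnLSBAnd          : 1'b0,
    EnF_Add           : 1'b0
  };

  // Shorter form: every field takes the default value '0.
  localparam op_cfg_t CompactCfg = '{default: '0};

endpackage
```

A side benefit of the `default` form is that fields added to the struct later are zeroed without touching the default configuration.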

Comment on lines +61 to +66
/// Parameter to define which type of collective operation support
parameter floo_pkg::collect_op_fe_cfg_t CollectiveOpCfg = floo_pkg::CollectiveOpDefaultCfg,
/// Parameter for the wide reduction configuration
parameter floo_pkg::reduction_cfg_t RdWideCfg = floo_pkg::ReductionDefaultCfg,
/// Parameter for the narrow reduction configuration
parameter floo_pkg::reduction_cfg_t RdNarrowCfg = floo_pkg::ReductionDefaultCfg
Collaborator

nit: Move up as well here

@@ -11,46 +12,58 @@
/// Wrapper of a multi-link router for narrow and wide links
module floo_nw_router #(
Collaborator

You can also import floo_pkg globally in this module since it is used a lot.

Comment on lines +656 to +659
user_aw = axi_narrow_out_req_o.aw.user;
user_aw.collective_mask = '0;
user_aw.collective_op = Unicast;
axi_narrow_out_req_o.aw.user = user_aw;
Collaborator

Suggested change, replacing:
user_aw = axi_narrow_out_req_o.aw.user;
user_aw.collective_mask = '0;
user_aw.collective_op = Unicast;
axi_narrow_out_req_o.aw.user = user_aw;
with:
axi_narrow_out_req_o.aw.user.collective_mask = '0;
axi_narrow_out_req_o.aw.user.collective_op = Unicast;

Wouldn't that be equivalent? Or are there some type casts that I am missing?
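Whether the two forms are equivalent depends on how the AW user field is typed. A minimal sketch, with a hypothetical struct `coll_user_t` and hypothetical port names: if the user field is declared as a packed struct, its members can be written directly; if it is declared as a flat `logic` vector, member access is illegal and a typed temporary (or cast) is needed.

```systemverilog
// Hypothetical collective user-field layout (not the FlooNoC definition).
package user_sketch_pkg;
  typedef struct packed {
    logic [1:0] other;            // stands in for unrelated user bits
    logic [3:0] collective_mask;
    logic [1:0] collective_op;    // 2'd0 stands in for Unicast
  } coll_user_t;
endpackage

module user_field_sketch
  import user_sketch_pkg::*;
(
  input  coll_user_t user_typed_i,  // user field declared as the struct
  input  logic [7:0] user_flat_i,   // user field declared as a flat vector
  output coll_user_t user_typed_o,
  output logic [7:0] user_flat_o
);
  // Typed field: members can be written directly.
  always_comb begin
    user_typed_o = user_typed_i;
    user_typed_o.collective_mask = '0;
    user_typed_o.collective_op   = 2'd0;
  end

  // Flat vector: it has no members, so a typed temporary is required.
  coll_user_t tmp;
  always_comb begin
    tmp = coll_user_t'(user_flat_i);
    tmp.collective_mask = '0;
    tmp.collective_op   = 2'd0;
    user_flat_o = tmp;
  end
endmodule
```

For the same input bits both outputs are identical, so with a struct-typed user field the direct assignment is equivalent to the temporary.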

assign collective_mask = '0;
end

// Because the W doesn't have a user field it is required to store the AW user field
Collaborator

But W does have a user field, no?

Comment on lines +1124 to +1125
if (is_en_narrow_reduction(CollectOpCfg)) begin
if (is_reduction_op(red_coll_operation[NarrowAw])) begin
Collaborator

You can combine the two if statements.
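A sketch of the combined condition; the package, the `op_e` enum, and the `EnNarrowReduction` parameter (standing in for is_en_narrow_reduction(CollectOpCfg)) are hypothetical. Merging is only behavior-preserving when neither if statement has an else branch.

```systemverilog
// Hypothetical stand-ins: the real helpers live in floo_pkg.
package if_merge_sketch_pkg;
  typedef enum logic [1:0] {Unicast, Multicast, SeqReduction} op_e;

  function automatic logic is_reduction_op(op_e op);
    return op == SeqReduction;
  endfunction
endpackage

module if_merge_sketch
  import if_merge_sketch_pkg::*;
#(
  // Stands in for is_en_narrow_reduction(CollectOpCfg).
  parameter bit EnNarrowReduction = 1'b1
) (
  input  op_e  op_i,
  output logic reduce_o
);
  always_comb begin
    reduce_o = 1'b0;
    // Nested form:
    //   if (EnNarrowReduction) begin
    //     if (is_reduction_op(op_i)) begin ... end
    //   end
    // Combined form, equivalent as long as neither if has an else branch:
    if (EnNarrowReduction && is_reduction_op(op_i)) begin
      reduce_o = 1'b1;
    end
  end
endmodule
```

Since the first operand is an elaboration-time constant, synthesis should still prune the branch when the feature is disabled.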


2 participants