When an endpoint initiates a wide multicast DMA transfer from another endpoint to itself (and possibly other endpoints), the following deadlock occurs. The DMA issues an AR, causing a read burst to come back from the router to the initiator endpoint. When the first read beats arrive, the DMA issues a write burst. This write burst loops back to the initiator endpoint and may take control of the physical link. Since a write holds the link for the entire burst, yet the burst cannot complete because it needs the read beats stalled behind it to feed the write (a DMA requirement), the system deadlocks. Note that this can also happen without multicast, as long as the system uses loopback.
hw: Add spill registers for VCs in chimney
hw: Fix reduction support with VC for read and write (working)
hw: Merging collective configuration parameters
hw: Initial setup for PnR experiments
hw: Add internal offload cuts
hw: Fix parametrization multicast
hw: Fix parametrization adding micro collective ops
pnr: Adapt synth wrapper to script fixes to script synthesis runs
hw: Create two physical channels on wide_in interface of eject port
hw: Support outstanding barriers with overlapping inputs
hw: Fix `floo_reduction_sync`
hw: Add comment to add assertion in `floo_route_xymask`
synt: Fix synth wrapper and include chimney
hw: Clean reduction_sync
hw: Re-implement reduction simple controller (reduction_unit)
synth: Adapt support for chimney with VC
d19db12 to e8d8e3e
d123d89 to d3fb3f4
d3fb3f4 to 8d19d92
fischeti left a comment
I had a first look and have a first round of feedback. In general, it looks quite clean. Good job cleaning this up and making it ready!
So far, I mainly looked at the top-level modules (routers, chimney), and I don't see any major issues there. I would just make sure that the default configuration is not affected. This seems to be mostly true thanks to proper parametrization, but sometimes it is a bit hard to see in the GitHub diff view.
Is this file intentionally tracked?
version: 0.39.9
source:
Git: https://github.com/pulp-platform/axi.git
Git: https://github.com/Lura518/axi.git
common_verification: 0.2.3
idma: {version: 0.6.2, upstream_name: iDMA}
floo_noc_pd: {path: ./pd, target: "floo_synth"}
FPnew: { git: "https://github.com/pulp-platform/cvfpu.git", rev: pulp-v0.1.3 }
Use the newest version, pulp-v0.2.3, which is also used in snitch_cluster and cheshire, AFAIK.
PD_REMOTE ?= git@iis-git.ee.ethz.ch:axi-noc/floo_noc_pd.git
PD_BRANCH ?= master
PD_BRANCH ?= feature/reduction
Is there a GitLab MR already open?
EnNarrowMulticast : 1'b0,
EnWideMulticast : 1'b0,
EnLSBAnd : 1'b0,
EnF_Add : 1'b0,
EnF_Mul : 1'b0,
EnF_Min : 1'b0,
EnF_Max : 1'b0,
EnA_Add : 1'b0,
EnA_Mul : 1'b0,
EnA_Min_S : 1'b0,
EnA_Min_U : 1'b0,
EnA_Max_S : 1'b0,
EnA_Max_U : 1'b0
'{default: '0} would be a bit shorter here.
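A minimal sketch of the suggested form, assuming the flags above are 1-bit members of a packed struct. The package, type, and parameter names (`example_cfg_pkg`, `example_cfg_t`, `ExampleDefaultCfg`) are hypothetical stand-ins; the actual type and default parameter in `floo_pkg` are not visible in this hunk.

```systemverilog
// Hypothetical package mirroring the flags from the diff above;
// the real type name and field order in floo_pkg may differ.
package example_cfg_pkg;
  typedef struct packed {
    logic EnNarrowMulticast;
    logic EnWideMulticast;
    logic EnLSBAnd;
    logic EnF_Add;
    logic EnF_Mul;
    logic EnF_Min;
    logic EnF_Max;
    logic EnA_Add;
    logic EnA_Mul;
    logic EnA_Min_S;
    logic EnA_Min_U;
    logic EnA_Max_S;
    logic EnA_Max_U;
  } example_cfg_t;

  // Clears every member without listing each field explicitly.
  localparam example_cfg_t ExampleDefaultCfg = '{default: '0};
endpackage
```

Named keys can be combined with `default` when only one flag should differ, e.g. `'{EnNarrowMulticast: 1'b1, default: '0}`.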
/// Parameter to define which type of collective operation support
parameter floo_pkg::collect_op_fe_cfg_t CollectiveOpCfg = floo_pkg::CollectiveOpDefaultCfg,
/// Parameter for the wide reduction configuration
parameter floo_pkg::reduction_cfg_t RdWideCfg = floo_pkg::ReductionDefaultCfg,
/// Parameter for the narrow reduction configuration
parameter floo_pkg::reduction_cfg_t RdNarrowCfg = floo_pkg::ReductionDefaultCfg
nit: Move up as well here
@@ -11,46 +12,58 @@
/// Wrapper of a multi-link router for narrow and wide links
module floo_nw_router #(
You can also import floo_pkg globally in this module since it is used a lot.
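For reference, SystemVerilog allows a package import in the module header, before the parameter port list, which makes package types and constants visible in the parameters, ports, and body without the `floo_pkg::` prefix. A minimal self-contained sketch; `demo_pkg`, `demo_op_e`, and `demo_router` are hypothetical stand-ins for `floo_pkg` and `floo_nw_router`:

```systemverilog
// Hypothetical stand-in for floo_pkg.
package demo_pkg;
  typedef enum logic [1:0] {Unicast, Multicast} demo_op_e;
endpackage

// Header import: demo_pkg names are usable in the parameter list,
// the port list, and the module body without the demo_pkg:: prefix.
module demo_router
  import demo_pkg::*;
#(
  parameter demo_op_e DefaultOp = Unicast
) (
  output demo_op_e op_o
);
  assign op_o = DefaultOp;
endmodule
```

Unlike a file-level import into `$unit`, a header import stays local to the module that declares it.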
user_aw = axi_narrow_out_req_o.aw.user;
user_aw.collective_mask = '0;
user_aw.collective_op = Unicast;
axi_narrow_out_req_o.aw.user = user_aw;
Suggested change:
- user_aw = axi_narrow_out_req_o.aw.user;
- user_aw.collective_mask = '0;
- user_aw.collective_op = Unicast;
- axi_narrow_out_req_o.aw.user = user_aw;
+ axi_narrow_out_req_o.aw.user.collective_mask = '0;
+ axi_narrow_out_req_o.aw.user.collective_op = Unicast;
Wouldn't that be equivalent? Or are there some type casts that I am missing?
assign collective_mask = '0;
end
// Because the W doesn't have a user field it is required to store the AW user field
But W does have a user field, no?
if (is_en_narrow_reduction(CollectOpCfg)) begin
if (is_reduction_op(red_coll_operation[NarrowAw])) begin
You can combine the two if statements.
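A sketch of the merged condition, assuming both checks sit in the same procedural block (e.g. `always_comb`). The parameter `EnNarrowReduction` and the input `is_red_op_i` are hypothetical stand-ins for `is_en_narrow_reduction(CollectOpCfg)` and `is_reduction_op(red_coll_operation[NarrowAw])`:

```systemverilog
module combined_if_sketch #(
  // Hypothetical stand-in for is_en_narrow_reduction(CollectOpCfg).
  parameter bit EnNarrowReduction = 1'b1
) (
  // Hypothetical stand-in for is_reduction_op(red_coll_operation[NarrowAw]).
  input  logic is_red_op_i,
  output logic handle_red_o
);
  always_comb begin
    handle_red_o = 1'b0;
    // One condition instead of two nested ifs; the constant term is
    // folded at elaboration when EnNarrowReduction is 0.
    if (EnNarrowReduction && is_red_op_i) begin
      handle_red_o = 1'b1;
    end
  end
endmodule
```

If the outer `if` is instead a generate-time condition that guards declarations, it has to remain a separate generate block.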
Reduction
This PR introduces reduction support in the FlooNoC architecture. The reduction functionality is integrated with the previously supported multicast feature.
🔎 Feature Overview:
Reduction support can be divided into two different cases:
CollectB feature. In particular, synchronization is enabled by the LsbAnd operation. This feature is typically supported in the narrow router.
To perform arithmetic operations, the router exposes wide and narrow offload ports towards a functional unit. Floating-point operations can leverage the Direct Compute Access (DCA) logic to reuse existing functional units in the target SoC.
⚙️ Configuration
To simplify the top-level configuration, all previously separate collective-related parameters have been merged into the CollectiveCfg structure.
At the top level, the user only needs to select which macro operations should be enabled (collect_op_fe_cfg_t). Internally, these operations are mapped to micro operations, which determine which features must be enabled at a finer granularity. For example:
CollectB).
This fine-grained configuration is hidden from the user through the use of frontend operations (collect_op_fe_cfg_t), which are internally remapped to backend operations (collect_op_be_cfg_t).
For an example of how to integrate and enable the different operations, see the picobello SoC.
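A minimal sketch of such a frontend-to-backend remapping as an elaboration-time constant function. All type, field, and function names here (`collective_cfg_sketch_pkg`, `fe_cfg_t`, `be_cfg_t`, `fe_to_be`, `EnNarrowSync`, `EnWideFAddReduce`) are hypothetical; only the backend `En*` flags borrow names from the configuration diff, and the real `collect_op_fe_cfg_t` to `collect_op_be_cfg_t` mapping in `floo_pkg` may differ.

```systemverilog
package collective_cfg_sketch_pkg;
  // Hypothetical frontend view: macro operations chosen by the user.
  typedef struct packed {
    logic EnNarrowSync;      // synchronization, realized via LSB-AND
    logic EnWideMulticast;
    logic EnWideFAddReduce;  // floating-point add reduction
  } fe_cfg_t;

  // Hypothetical backend view: a subset of the micro-operation flags.
  typedef struct packed {
    logic EnWideMulticast;
    logic EnLSBAnd;
    logic EnF_Add;
  } be_cfg_t;

  // Constant function: remaps macro operations to micro operations.
  function automatic be_cfg_t fe_to_be(fe_cfg_t fe);
    be_cfg_t be;
    be                 = '{default: '0};
    be.EnWideMulticast = fe.EnWideMulticast;
    be.EnLSBAnd        = fe.EnNarrowSync;
    be.EnF_Add         = fe.EnWideFAddReduce;
    return be;
  endfunction

  // The user only sets frontend flags; backend flags are derived.
  localparam fe_cfg_t UserCfg    = '{EnNarrowSync: 1'b1, default: '0};
  localparam be_cfg_t DerivedCfg = fe_to_be(UserCfg);
endpackage
```

Because the function is evaluated at elaboration, the derived flags are constants, so logic for disabled micro operations can be pruned.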
➕ Additions
floo_pkg.sv: CollectiveCfg to the route configuration structure.
  OpCfg selects which macro operations are supported.
  RedCfg specifies how many pipeline stages the reduction unit should implement.
floo_reduction_unit.sv: Implements the main logic responsible for handling and offloading sequential reductions.
floo_alu.sv: An ALU used for narrow sequential reductions. Ideally, the IPU from the snitch cluster could be reused, but this would require adding Snitch as a dependency of FlooNoC.
➖ Removed
These features are now automatically enabled based on the selected operations.
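The `floo_alu.sv` entry under Additions describes a small ALU for narrow sequential reductions. A minimal combinational sketch over the integer micro operations named in the configuration flags (`A_Add`, `A_Mul`, signed and unsigned `A_Min`/`A_Max`, `LSBAnd`) could look as follows. The package, enum, and module names are hypothetical, and the LSB-AND semantics (AND of bit 0, upper bits cleared) is an assumption, not taken from `floo_alu.sv`.

```systemverilog
// Hypothetical encoding of the narrow integer reduction operations.
package red_alu_sketch_pkg;
  typedef enum logic [2:0] {
    OpA_Add, OpA_Mul, OpA_Min_S, OpA_Min_U, OpA_Max_S, OpA_Max_U, OpLSBAnd
  } red_int_op_e;
endpackage

module red_alu_sketch
  import red_alu_sketch_pkg::*;
#(
  parameter int unsigned Width = 64
) (
  input  red_int_op_e      op_i,
  input  logic [Width-1:0] a_i,
  input  logic [Width-1:0] b_i,
  output logic [Width-1:0] res_o
);
  always_comb begin
    unique case (op_i)
      OpA_Add:   res_o = a_i + b_i;
      OpA_Mul:   res_o = a_i * b_i;  // keeps only the low Width bits
      OpA_Min_S: res_o = ($signed(a_i) < $signed(b_i)) ? a_i : b_i;
      OpA_Min_U: res_o = (a_i < b_i) ? a_i : b_i;
      OpA_Max_S: res_o = ($signed(a_i) > $signed(b_i)) ? a_i : b_i;
      OpA_Max_U: res_o = (a_i > b_i) ? a_i : b_i;
      // Assumed semantics: AND of the LSBs with upper bits cleared,
      // e.g. each participant contributes 1 to signal barrier arrival.
      OpLSBAnd:  res_o = {{(Width-1){1'b0}}, a_i[0] & b_i[0]};
      default:   res_o = '0;
    endcase
  end
endmodule
```

In a sequential reduction, the running result would be fed back as `a_i` and the next incoming operand applied as `b_i`.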
With the introduction of reduction support, all collective operations (multicast, synchronization, and reduction) can now be executed entirely within the NoC, provided that the loopback feature is enabled. This ensures that the full collective operation is handled exclusively within the NoC system.
This was not the case in the multicast implementation supported in v0.7.0, which relied on the multicast capabilities of the endpoint interconnect.
TODO