feat[next-dace]: Use SDFG library node for lowering of broadcast and reduce #2386

Draft
edopao wants to merge 76 commits into GridTools:main from edopao:dace-fill_node

Conversation

@edopao
Contributor

@edopao edopao commented Nov 11, 2025

TODO:

Contributor

@philip-paul-mueller philip-paul-mueller left a comment


There are some refinements needed.

Comment thread src/gt4py/next/program_processors/runners/dace/sdfg_library_nodes.py Outdated


@dace_library.node
class Fill(dace_nodes.LibraryNode):
Contributor


I would add some more semantics, i.e. an input connector that receives the value that should be broadcast, and an output connector for the result.

I am also wondering if it would make sense to have two different library nodes:
one where the value that is broadcast is a literal, like `0.0`, and one (probably the current one) where the value is read from another data descriptor (might be hard to integrate into the lowering).
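To make the two-variant idea concrete, here is a toy sketch in plain Python (all names are hypothetical, not DaCe API): one node type carries a compile-time literal, the other names an input connector, and a single expansion routine handles both.

```python
from dataclasses import dataclass

@dataclass
class FillLiteral:
    value: float          # compile-time literal, e.g. 0.0

@dataclass
class FillFromConnector:
    input_connector: str  # the value is read from this connector at runtime

def expand_fill(node, shape, read_connector=None):
    # Toy expansion: materialize the broadcast result as a flat list.
    if isinstance(node, FillLiteral):
        value = node.value
    else:
        value = read_connector(node.input_connector)
    size = 1
    for extent in shape:
        size *= extent
    return [value] * size

assert expand_fill(FillLiteral(0.0), (2, 3)) == [0.0] * 6
assert expand_fill(FillFromConnector("inp"), (2,), read_connector=lambda c: 1.5) == [1.5, 1.5]
```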

Comment thread src/gt4py/next/program_processors/runners/dace/sdfg_library_nodes.py Outdated
@philip-paul-mueller
Contributor

@edopao
I am not sure whether we should add the transformations we need already in this PR or in a later one.
If we put them in a later one, we should patch the optimizer to expand the node right at the beginning; this way we preserve the current behaviour and performance.
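That "expand right at the beginning" patch can be sketched with stand-ins (`FakeSDFG` and the pipeline step are placeholders for illustration, not the real DaCe/gt4py objects; `dace.SDFG` does provide `expand_library_nodes()`):

```python
class FakeSDFG:
    # Stand-in for dace.SDFG, only to make the sketch runnable here.
    def __init__(self):
        self.log = []

    def expand_library_nodes(self):
        self.log.append("expand")

def optimize(sdfg):
    # Hypothetical optimizer entry point: expand the new GTIR library
    # nodes first, then run the existing pipeline, so the behaviour and
    # performance of the current lowering are preserved.
    sdfg.expand_library_nodes()
    sdfg.log.append("existing_pipeline")

sdfg = FakeSDFG()
optimize(sdfg)
assert sdfg.log == ["expand", "existing_pipeline"]
```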

@edopao
Contributor Author

edopao commented Nov 24, 2025

cscs-ci run default

@edopao
Contributor Author

edopao commented Nov 24, 2025

cscs-ci run default

@edopao
Contributor Author

edopao commented Nov 24, 2025

cscs-ci run default

@edopao
Contributor Author

edopao commented Dec 10, 2025

No plan for now to integrate this feature.

philip-paul-mueller and others added 27 commits May 4, 2026 11:33
…o_sdfg_primitives.py

Co-authored-by: Edoardo Paone <edoardo16@gmail.com>
Contributor Author

@edopao edopao left a comment


Very good, just some minor comments.

library as dace_library,
nodes as dace_nodes,
properties as dace_properties,
subsets as dace_sbs,
Contributor Author


In the transformation module, we use the dace_sbs alias, in the lowering module we use dace_subsets. It's OK to use dace_sbs in this module, but let's try to keep it consistent.

```python
for i in range(len(broadcast_in_dim)):
    assert output.shape[broadcast_in_dim[i]] == value_to_broadcast.shape[i]
```
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change (addition to the docstring):
In other words, the result array shape has the same size as the broadcast domain.
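The invariant can be exercised standalone; a minimal pure-Python sketch (the function name is hypothetical, not from the PR):

```python
def check_broadcast_in_dim(output_shape, value_shape, broadcast_in_dim):
    # Dimension i of the broadcast value maps to dimension
    # broadcast_in_dim[i] of the output, so the extents must match and
    # the result shape covers the whole broadcast domain.
    assert len(broadcast_in_dim) == len(value_shape)
    for i in range(len(broadcast_in_dim)):
        assert output_shape[broadcast_in_dim[i]] == value_shape[i]

# A (3,)-shaped value broadcast into a (2, 3) output along dimension 1:
check_broadcast_in_dim((2, 3), (3,), (1,))
```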

Args:
broadcast_in_dim: How to broadcast.
Contributor Author


Suggested change
broadcast_in_dim: How to broadcast.
broadcast_in_dim: How to broadcast, see the class documentation.


Args:
broadcast_in_dim: How to broadcast.
params: The parameters that should be used for the expansion. If given one
Contributor Author


Suggested change
params: The parameters that should be used for the expansion. If given one
params: The parameters that should be used for the expansion. If given, one


Todo:
- While for the output it is probably okay to always require an adjacent
AccessNode for the input it might be possible to be on the other side
Contributor Author


Suggested change
AccessNode for the input it might be possible to be on the other side
AccessNode, the input nodes might be outside a map scope.

However, I don't understand how this could happen.

# A fundamental requirement is that `bcast_result` is only generated by us.
# ADR-18 guarantees us this if it is transient and has a single producer,
# `bcast_node`. However, since we will remove `bcast_result`, we have to
# make sure that it is not used every where else.
Contributor Author


Suggested change
# make sure that it is not used every where else.
# make sure that it is not used anywhere else.


match consumer := consumer_edge.dst:
case dace_nodes.AccessNode():
# TODO(phimuell): Are there more checks needed.
Contributor Author


I suggest removing this todo comment before merge, unless there are known cases.

Suggested change
# TODO(phimuell): Are there more checks needed.

# Check single use data if it was not known at the beginning.
if self._single_use_data is None:
find_single_use_data = dace_analysis.FindSingleUseData()
single_use_data = find_single_use_data.apply_pass(sdfg, None)
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be wrong to store `single_use_data` now? I am asking because it is used again inside `apply()`.
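One way to do that caching, roughly (class and helper names are hypothetical, not the PR's code): compute the analysis once and store it so `apply()` can reuse it.

```python
class InlineSketch:
    def __init__(self, analysis_fn, single_use_data=None):
        self._analysis_fn = analysis_fn
        self._single_use_data = single_use_data
        self.analysis_runs = 0  # only for demonstrating the caching

    def _single_use(self, sdfg):
        # Lazily compute and cache the single-use-data analysis.
        if self._single_use_data is None:
            self.analysis_runs += 1
            self._single_use_data = self._analysis_fn(sdfg)
        return self._single_use_data

    def can_be_applied(self, sdfg):
        return bool(self._single_use(sdfg))

    def apply(self, sdfg):
        return self._single_use(sdfg)

t = InlineSketch(lambda sdfg: {"bcast_result"})
t.can_be_applied(None)
t.apply(None)
assert t.analysis_runs == 1  # the analysis ran only once
```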

Comment on lines +255 to +257
# We need new transformations in order to deal with GTIR library nodes.
# For now, we simply expand these nodes before starting optimizing.
# TODO: Remove once transformations are ready.
Contributor Author


Why not call `ScalarBroadcastInliner` before expanding?

# probably yes, as we can remove the read and write of the initial data
# only the write to final destination is left. If the consumers are Maps
# the thing is a bit different. As we have to create the intermediate
# allocation. If the read of the memory is okay the `InlineBroadcastAccess`
Contributor Author


InlineBroadcastAccess does not exist yet.

