Skip to content

perf[next-dace]: Enhance LoopBlocking pass#2578

Open
iomaganaris wants to merge 38 commits intoGridTools:mainfrom
iomaganaris:extend_loopblocking
Open

perf[next-dace]: Enhance LoopBlocking pass#2578
iomaganaris wants to merge 38 commits intoGridTools:mainfrom
iomaganaris:extend_loopblocking

Conversation

@iomaganaris
Copy link
Copy Markdown
Contributor

Added the following features to the LoopBlocking pass:

  • Memlets that have independent data are promoted in the outer Map so that they only have to be read once
  • Added option blocking_independent_node_threshold. LoopBlocking is only going to be applied to Maps that have more than the threshold number independent variables

Copy link
Copy Markdown
Contributor

@philip-paul-mueller philip-paul-mueller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the changes but there some things that needs to be addressed.

Comment thread src/gt4py/next/program_processors/runners/dace/transformations/gpu_utils.py Outdated
Comment thread src/gt4py/next/program_processors/runners/dace/transformations/loop_blocking.py Outdated
Comment thread src/gt4py/next/program_processors/runners/dace/transformations/loop_blocking.py Outdated
Comment thread src/gt4py/next/program_processors/runners/dace/transformations/loop_blocking.py Outdated
"Independent memlets should only be inputs to maps that have a single parameter. "
"Those should always be neighbor reductions."
)
edge.data.subset = next(iter(original_dst_of_in_edge.params))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is correct that you have to update subset here, but does does not make much sense to me.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understood your comment. Should I change something?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean original_dst_of_in_edge is a MapEntry and it does not have an attribute called params, the Map has this, i.e. original_dst_of_in_edge.map.params would exists.
Map::params stores the iteration variables (ordered accordingly to this function).
However, now you take the first iteration variable, which at this point should be horizontal dimension.
So I do not understand the logic that is applied here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

original_dst_of_in_edge is a MapEntry indeed but it has a params attribute.
You're right that next(iter(original_dst_of_in_edge.params)) is wrong. Instead it should be I believe:

edge.data.subset = dace_subsets.Range.from_indices(original_dst_of_in_edge.params)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I miss something, but here you modifying the (downstream) Memlets such that instead reading from the global data they read from promoted_anode.
You do this because you know that the read (expressed by the Memlet you promoted) does not depend on K.
However, has far as I understand original_dst_of_in_edge.params contains K?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

original_dst_of_in_edge is the dst of the edge before any changes. In case this is a Map, I have the impression that based on our lowering it can only be a neighbor reduction. In case it's a neighbor reduction it shouldn't have as parameter a vertical or horizontal index. I think in general in our lowering there should be a Map inside another Map that refers to vertical or horizontal indexes. In case there is such map it should be one that was created by the LoopBlocking pass. However we only call this pass once and even if we called it more than once then the inner Map would be dependent and also after the latest changes I added, the edges pointing to such map shouldn't be considered for promotion.
Let me know if the above is right and makes sense

Comment thread src/gt4py/next/program_processors/runners/dace/transformations/loop_blocking.py Outdated
Comment on lines +776 to +781
elif isinstance(original_dst_of_in_edge, dace_nodes.NestedSDFG):
raise NotImplementedError("Promotion of memlets to NestedSDFG not implemented yet.")
elif isinstance(original_dst_of_in_edge, dace_nodes.LibraryNode):
raise NotImplementedError(
"Promotion of memlets to LibraryNode not implemented yet."
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can remove them, since original_dst_of_in_edge is always classified as a dependent node.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why I could remove these checks. Couldn't we have a dependent NestedSDFG or LibraryNode?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not fully sure what I meant with it.
But thinking about it now, I think the can_be_applied() should make sure that you do not hit that case.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that in that case we actually have to implement the handling of those cases instead of silently avoid applying the transformation but I haven't seen any of these cases yet so that's why I left it as it is 🙈

Comment on lines +725 to +727
for subset_range in in_edge.data.subset.ranges:
if subset_range not in independent_outer_map_as_range.ranges:
new_subset.append(subset_range)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks a bit brittle, because you look at sizes.
However, currently I do not have a better idea, beside looking at the subset and checking if it contains the blocking parameter, but I am not sure if this is better.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case we have parameters in the ranges I think we actually check the parameters and not the sizes. For example:

(Pdb) p in_edge.data.subset.ranges
[(__i0, __i0, 1), (0, 2, 1)]
(Pdb) p independent_outer_map_as_range.ranges
[(__i0, __i0, 1)]
(Pdb) p new_subset
[(0, 2, 1)]

)

if isinstance(original_dst_of_in_edge, dace_nodes.MapEntry):
for edge in state.out_edges(original_dst_of_in_edge):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not saw this before, but iterating over the out edges is not enough, as there could be nested Maps.
I think there is something in utils, the reroute or so that can help you do it or at least give you some hints on how to do it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If isinstance(edge.dst, dace_nodes.MapEntry) would it help if we called

dace_sdutils.canonicalize_memlet_trees_for_map(state=state, map_node=edge.dst)
        dace_propagation.propagate_memlets_map_scope(sdfg, state, edge.dst)

?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No.
What I meant is that one level of Memlets might not be enough, as they could "continue".
Thus you must use something like here, i.e. you iterate over the Memlet tree.
Ideally you could use reconfigure_dataflow_after_rerouting(), however, that function can only handle simple shifts.
You do not have simple shifts because in line 726 you conditionally add.
What you could do is adding dummy dimensions, i.e. size 0 dimensions.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried traversing the memlet tree as you mentioned. Hopefully that should be enough? 🙈 Even if there some indirection using another AccessNode, based on what I've seen until now, accessing this AccessNode will be done using only local indexes which we don't touch

Copy link
Copy Markdown
Contributor

@philip-paul-mueller philip-paul-mueller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some additional comments.

import enum
import warnings
from typing import Any, Callable, Optional, Sequence, TypeAlias, Union
from typing import Any, Callable, List, Optional, Sequence, TypeAlias, Union
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
from typing import Any, Callable, List, Optional, Sequence, TypeAlias, Union
from typing import Any, Callable, Optional, Sequence, TypeAlias, Union

gpu_block_size_3d: Optional[Sequence[int | str] | str] = None,
gpu_maxnreg: Optional[int] = None,
blocking_dim: Optional[gtx_common.Dimension] = None,
blocking_dims: Optional[List[gtx_common.Dimension]] = None,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
blocking_dims: Optional[List[gtx_common.Dimension]] = None,
blocking_dims: Optional[Sequence[gtx_common.Dimension]] = None,


gpu_map.gpu_block_size = tuple(block_size)
if self.maxnreg is not None:
if self.maxnreg is not None and gpu_map.gpu_maxnreg == 0:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for my curiosity, what is the intention behind this change?

dtype=str,
blocking_parameters = dace_properties.Property(
dtype=list,
allow_none=True,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
allow_none=True,
blocking_parameters = dace_properties.ListProperty(
dtype=str,

@@ -52,6 +54,8 @@ class LoopBlocking(dace_transformation.SingleStateTransformation):
blocking_parameter: On which parameter should we block.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs to be updated as well.


_ = sdfg.reset_cfg_list()
dace_sdutils.canonicalize_memlet_trees_for_map(state=state, map_node=outer_map_entry)
dace_propagation.propagate_memlets_map_scope(sdfg, state, outer_map_entry)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if Memlet propagation is needed here.


self._populate_memlet_to_promote(matched_blocking_var, state, outer_map_entry)
# Below checks are necessary for MyPy
if self._memlet_to_promote and len(self._memlet_to_promote) == 0:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if self._memlet_to_promote and len(self._memlet_to_promote) == 0:
if len(self._memlet_to_promote) == 0:


for in_edge in self._memlet_to_promote:
if isinstance(in_edge.dst, dace_nodes.AccessNode):
raise NotImplementedError(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This case should be rejected already in the populate_memlet_to_promote() function.

Comment on lines +798 to +800
corresponding_inner_map_out_edges = list(
state.out_edges_by_connector(in_edge.dst, "OUT_" + in_edge.dst_conn[3:])
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You only have connectors starting with IN_ at Map nodes.
It can even be None.
So doing this operation, dst_conn[3:], unprotected by an if isinstance(node, MapEntry) is most certainly wrong.

This function should probably look like:

inode = add_new_intermediate_storage()
add_edge(outer_map, inode, modified_memlet1())
e = add_edge(inode, original_edge.dst, dcpy(original_edge.data))
for mtree in state.memlet_tree(e).traverse(include_self=True):
    apply_correction(mtree.edge)

I think that the best idea would be to copy the logic of MapFusionVertical::compute_reduced_intermediate().

return
assert self._memlet_to_promote is not None

for in_edge in self._memlet_to_promote:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took the liberty and tried to come up with something different: iomaganaris#2
Feel free to modify.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants