perf[next-dace]: Enhance LoopBlocking pass #2578
iomaganaris wants to merge 38 commits into GridTools:main
philip-paul-mueller left a comment
I like the changes, but there are some things that need to be addressed.
        "Independent memlets should only be inputs to maps that have a single parameter. "
        "Those should always be neighbor reductions."
    )
    edge.data.subset = next(iter(original_dst_of_in_edge.params))
It is correct that you have to update the subset here, but this does not make much sense to me.
Not sure I understood your comment. Should I change something?
I mean that original_dst_of_in_edge is a MapEntry, and it does not have an attribute called params; the Map has this, i.e. original_dst_of_in_edge.map.params would exist.
Map::params stores the iteration variables (ordered according to this function).
However, now you take the first iteration variable, which at this point should be the horizontal dimension.
So I do not understand the logic that is applied here.
original_dst_of_in_edge is indeed a MapEntry, but it does have a params attribute.
You're right that next(iter(original_dst_of_in_edge.params)) is wrong. Instead, I believe it should be:
edge.data.subset = dace_subsets.Range.from_indices(original_dst_of_in_edge.params)
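For illustration, the difference between the two expressions can be sketched in plain Python. This is only a model: it does not call DaCe, the parameter names are hypothetical, and the `(begin, end, step)` tuples are an assumed stand-in for range entries.

```python
# Plain-Python model; no DaCe calls are made here.
params = ["__i0", "__i1"]  # hypothetical iteration variables of a Map

# next(iter(...)) keeps only the first iteration variable and drops the rest.
first_only = next(iter(params))

# One (begin, end, step) entry per iteration variable, each selecting one index.
point_subset = [(p, p, 1) for p in params]

print(first_only)
print(point_subset)
```

With a single-parameter Map both forms refer to the same variable, but only the second has the shape of a subset with one entry per dimension.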
Maybe I am missing something, but here you are modifying the (downstream) Memlets such that, instead of reading from the global data, they read from promoted_anode.
You do this because you know that the read (expressed by the Memlet you promoted) does not depend on K.
However, as far as I understand, original_dst_of_in_edge.params contains K?
original_dst_of_in_edge is the dst of the edge before any changes. If it is a Map, my impression is that, based on our lowering, it can only be a neighbor reduction, and a neighbor reduction shouldn't have a vertical or horizontal index as a parameter. In general, I think our lowering shouldn't produce a Map inside another Map that refers to vertical or horizontal indexes. If there is such a Map, it should be one that was created by the LoopBlocking pass. However, we only call this pass once, and even if we called it more than once, the inner Map would be dependent; also, after the latest changes I added, the edges pointing to such a Map shouldn't be considered for promotion.
Let me know if the above is right and makes sense.
    elif isinstance(original_dst_of_in_edge, dace_nodes.NestedSDFG):
        raise NotImplementedError("Promotion of memlets to NestedSDFG not implemented yet.")
    elif isinstance(original_dst_of_in_edge, dace_nodes.LibraryNode):
        raise NotImplementedError(
            "Promotion of memlets to LibraryNode not implemented yet."
        )
I think you can remove them, since original_dst_of_in_edge is always classified as a dependent node.
Not sure why I could remove these checks. Couldn't we have a dependent NestedSDFG or LibraryNode?
I am not fully sure what I meant by it.
But thinking about it now, I think the can_be_applied() should make sure that you do not hit that case.
I think that in that case we actually have to implement the handling of those cases instead of silently not applying the transformation, but I haven't seen any of these cases yet, so that's why I left it as it is 🙈
    for subset_range in in_edge.data.subset.ranges:
        if subset_range not in independent_outer_map_as_range.ranges:
            new_subset.append(subset_range)
This looks a bit brittle, because you look at sizes.
However, currently I do not have a better idea, besides looking at the subset and checking if it contains the blocking parameter, but I am not sure if this is better.
In case we have parameters in the ranges I think we actually check the parameters and not the sizes. For example:
(Pdb) p in_edge.data.subset.ranges
[(__i0, __i0, 1), (0, 2, 1)]
(Pdb) p independent_outer_map_as_range.ranges
[(__i0, __i0, 1)]
(Pdb) p new_subset
[(0, 2, 1)]
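The filtering shown in the debugger session can be reproduced with a small plain-Python sketch. Ranges are modelled as `(begin, end, step)` tuples, which is an assumption made for illustration and not the DaCe representation. The second part shows the situation the reviewer calls brittle: when the outer range is purely numeric, the comparison is by value only.

```python
# Ranges are modelled as (begin, end, step) tuples; this is not the DaCe type.
in_edge_ranges = [("__i0", "__i0", 1), (0, 2, 1)]
independent_outer_map_ranges = [("__i0", "__i0", 1)]

# Symbolic case: the entry that mentions the parameter __i0 is removed.
new_subset = [r for r in in_edge_ranges if r not in independent_outer_map_ranges]
print(new_subset)

# Numeric case (hypothetical): two unrelated dimensions with the same extent
# are indistinguishable, so both are removed.
numeric_ranges = [(0, 2, 1), (0, 2, 1)]
numeric_outer = [(0, 2, 1)]
collided = [r for r in numeric_ranges if r not in numeric_outer]
print(collided)
```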
    )

    if isinstance(original_dst_of_in_edge, dace_nodes.MapEntry):
        for edge in state.out_edges(original_dst_of_in_edge):
I did not see this before, but iterating over the out edges is not enough, as there could be nested Maps.
I think there is something in utils, the reroute or so, that can help you do it or at least give you some hints on how to do it.
If isinstance(edge.dst, dace_nodes.MapEntry) would it help if we called
dace_sdutils.canonicalize_memlet_trees_for_map(state=state, map_node=edge.dst)
dace_propagation.propagate_memlets_map_scope(sdfg, state, edge.dst)
?
No.
What I meant is that one level of Memlets might not be enough, as they could "continue".
Thus you must use something like here, i.e. you iterate over the Memlet tree.
Ideally you could use reconfigure_dataflow_after_rerouting(); however, that function can only handle simple shifts.
You do not have simple shifts because in line 726 you conditionally add.
What you could do is add dummy dimensions, i.e. size 0 dimensions.
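To illustrate why one level of out edges is not enough, here is a toy model of a Memlet tree in plain Python. `ToyTreeNode` and the edge names are hypothetical and only mimic the idea of a tree of connected edges; this is not the DaCe API.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class ToyTreeNode:
    """Hypothetical stand-in for one edge in a Memlet tree."""

    name: str
    children: List["ToyTreeNode"] = field(default_factory=list)

    def traverse(self, include_self: bool = True) -> Iterator["ToyTreeNode"]:
        # Depth-first walk over this edge and everything that continues from it.
        if include_self:
            yield self
        for child in self.children:
            yield from child.traverse(include_self=True)


# promoted_edge -> nested_map_edge -> tasklet_edge
inner = ToyTreeNode("nested_map_edge", [ToyTreeNode("tasklet_edge")])
root = ToyTreeNode("promoted_edge", [inner])

one_level = [child.name for child in root.children]
whole_tree = [node.name for node in root.traverse()]
print(one_level)
print(whole_tree)
```

Looking only at the direct children misses `tasklet_edge`, which sits behind the nested Map; the full traversal visits it.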
I tried traversing the memlet tree as you mentioned. Hopefully that should be enough? 🙈 Even if there is some indirection using another AccessNode, based on what I've seen so far, this AccessNode will only be accessed using local indexes, which we don't touch.
…contains the blocking parameter
37a6568 to d2947cc
philip-paul-mueller left a comment
Some additional comments.
    import enum
    import warnings
    from typing import Any, Callable, Optional, Sequence, TypeAlias, Union
    from typing import Any, Callable, List, Optional, Sequence, TypeAlias, Union
Suggested change:
    - from typing import Any, Callable, List, Optional, Sequence, TypeAlias, Union
    + from typing import Any, Callable, Optional, Sequence, TypeAlias, Union
    gpu_block_size_3d: Optional[Sequence[int | str] | str] = None,
    gpu_maxnreg: Optional[int] = None,
    blocking_dim: Optional[gtx_common.Dimension] = None,
    blocking_dims: Optional[List[gtx_common.Dimension]] = None,
Suggested change:
    - blocking_dims: Optional[List[gtx_common.Dimension]] = None,
    + blocking_dims: Optional[Sequence[gtx_common.Dimension]] = None,
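A short sketch of what the `Sequence` annotation buys: a parameter that is only read can accept both lists and tuples without a type error. `count_blocking_dims` is a hypothetical helper, and plain strings stand in for `gtx_common.Dimension`.

```python
from typing import Optional, Sequence


def count_blocking_dims(blocking_dims: Optional[Sequence[str]] = None) -> int:
    """Hypothetical helper: it only reads the argument, so Sequence suffices."""
    return 0 if blocking_dims is None else len(blocking_dims)


from_list = count_blocking_dims(["K"])
from_tuple = count_blocking_dims(("K",))
from_default = count_blocking_dims()
print(from_list, from_tuple, from_default)
```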
    gpu_map.gpu_block_size = tuple(block_size)
    if self.maxnreg is not None:
    if self.maxnreg is not None and gpu_map.gpu_maxnreg == 0:
Just out of curiosity, what is the intention behind this change?
    dtype=str,
    blocking_parameters = dace_properties.Property(
    dtype=list,
    allow_none=True,
Suggested change:
    - allow_none=True,
    + blocking_parameters = dace_properties.ListProperty(
    + dtype=str,
    @@ -52,6 +54,8 @@ class LoopBlocking(dace_transformation.SingleStateTransformation):
    blocking_parameter: On which parameter should we block.
Needs to be updated as well.
    _ = sdfg.reset_cfg_list()
    dace_sdutils.canonicalize_memlet_trees_for_map(state=state, map_node=outer_map_entry)
    dace_propagation.propagate_memlets_map_scope(sdfg, state, outer_map_entry)
I am not sure if Memlet propagation is needed here.
    self._populate_memlet_to_promote(matched_blocking_var, state, outer_map_entry)
    # Below checks are necessary for MyPy
    if self._memlet_to_promote and len(self._memlet_to_promote) == 0:
Suggested change:
    - if self._memlet_to_promote and len(self._memlet_to_promote) == 0:
    + if len(self._memlet_to_promote) == 0:
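The reason for the suggestion can be checked with a small plain-Python sketch: for a list, the original condition can never be true, because an empty list is already falsy. `old_check` and `new_check` are hypothetical names used only for this illustration.

```python
def old_check(memlets):
    # An empty list is falsy, so the `and` short-circuits before len() is evaluated.
    return bool(memlets and len(memlets) == 0)


def new_check(memlets):
    return len(memlets) == 0


results_old = [old_check([]), old_check(["edge"]), old_check(None)]
results_new = [new_check([]), new_check(["edge"])]
print(results_old)
print(results_new)
```

Note that `new_check(None)` would raise a `TypeError`, so if the attribute can be `None`, a separate explicit `None` check is still needed.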
    for in_edge in self._memlet_to_promote:
        if isinstance(in_edge.dst, dace_nodes.AccessNode):
            raise NotImplementedError(
This case should be rejected already in the populate_memlet_to_promote() function.
    corresponding_inner_map_out_edges = list(
        state.out_edges_by_connector(in_edge.dst, "OUT_" + in_edge.dst_conn[3:])
    )
You only have connectors starting with IN_ at Map nodes.
It can even be None.
So doing this operation, dst_conn[3:], unprotected by an if isinstance(node, MapEntry) is most certainly wrong.
This function should probably look like:
    inode = add_new_intermediate_storage()
    add_edge(outer_map, inode, modified_memlet1())
    e = add_edge(inode, original_edge.dst, dcpy(original_edge.data))
    for mtree in state.memlet_tree(e).traverse(include_self=True):
        apply_correction(mtree.edge)
I think that the best idea would be to copy the logic of MapFusionVertical::compute_reduced_intermediate().
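The connector concern can be illustrated with a guarded helper in plain Python. `matching_out_connector` is a hypothetical function written for this example, not part of the PR or of DaCe; it returns `None` whenever the connector does not follow the `IN_` convention.

```python
from typing import Optional


def matching_out_connector(dst_conn: Optional[str]) -> Optional[str]:
    """Hypothetical helper: map "IN_x" to "OUT_x", otherwise return None."""
    if dst_conn is None or not dst_conn.startswith("IN_"):
        return None
    return "OUT_" + dst_conn[len("IN_"):]


guarded_map = matching_out_connector("IN_field")
guarded_none = matching_out_connector(None)
guarded_other = matching_out_connector("__inp")

# The unguarded slice silently produces a wrong name for a non-Map connector.
unguarded_other = "OUT_" + "__inp"[3:]
print(guarded_map, guarded_none, guarded_other, unguarded_other)
```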
        return
    assert self._memlet_to_promote is not None

    for in_edge in self._memlet_to_promote:
I took the liberty of trying to come up with something different: iomaganaris#2
Feel free to modify.
Added the following features to the LoopBlocking pass:
- Memlets that have independent data are promoted in the outer Map so that they only have to be read once.
- blocking_independent_node_threshold: LoopBlocking is only going to be applied to Maps that have more than the threshold number of independent variables.