fix[next]: Deterministic dace auto optimize #2568
Conversation
philip-paul-mueller
left a comment
Generally looks okay, there are some improvements possible.
```
TODO: introduce new config var that prints the hash instead of hard-coding it.
- Execute the program in question twice and compare the output.
- Set a conditional breakpoint at the beginning of the `apply` method of the first pass where the SDFG
```
`apply()` is specific to `PatternTransformation`. Not all transformations use it (though the majority do). The most general interface is `Pass`, which defines the `apply_pass()` method. But, as mentioned above, plain functions can also act as transformations.
```python
with open(file) as f:
    data = json.load(f)
sdfg = dace.SDFG.from_json(data)
print_sdfg_hash(sdfg)
```
I looked at this file and it looks okay, although the O(N) deletions are not nice. One could rework the algorithm to get rid of them; however, I would not do this. There is PR #2531, which updates this file and also tries to avoid indeterminism. While it still uses sets, it sorts the nodes once in a deterministic way (depending on the node order upon input).
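The approach described for PR #2531 can be sketched in plain Python: record each node's position once, in input order, and sort any intermediate set by that recorded position (the names below are illustrative, not the PR's actual identifiers).

```python
# Record the position of each node once, in the order it arrived.
nodes = ["c", "a", "b", "d"]  # illustrative input order
position = {n: i for i, n in enumerate(nodes)}

# A plain set has no guaranteed iteration order...
working_set = {"b", "d", "a"}
# ...but sorting by the recorded input position restores determinism.
deterministic_nodes = sorted(working_set, key=position.__getitem__)
assert deterministic_nodes == ["a", "b", "d"]
```

This keeps the set semantics where they matter while making every iteration over the set reproducible across runs.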
```diff
-first_map_params = set(first_map.params)
-second_map_params = set(second_map.params)
+# TODO(tehrengruber): The structure here looks a little funky. We just use an ordered set for
+# now, but likely no sets are needed at all.
+first_map_params = OrderedSet(first_map.params)
+second_map_params = OrderedSet(second_map.params)
 if first_map_params != second_map_params:
     return None
```
Sets are needed here, at least for the check that follows, because a Map with parameters `["i", "j"]` is the same as one with parameters `["j", "i"]`. However, you can do something like:

```python
first_map_params = sorted(first_map.params)
second_map_params = sorted(second_map.params)
```

Then you can also remove the `sorted()` calls below.
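The order-insensitivity point can be checked in isolation; a small self-contained sketch, not the actual transformation code:

```python
first_map_params = ["i", "j"]
second_map_params = ["j", "i"]

# Comparing the raw lists would wrongly treat the two maps as different:
assert first_map_params != second_map_params
# Comparing as sets (or as sorted lists) recognizes the same parameter set:
assert set(first_map_params) == set(second_map_params)
assert sorted(first_map_params) == sorted(second_map_params)
# sorted() additionally yields a deterministic order for later iteration.
```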
```diff
 # Now we will reroute the edges that went through the inner map, through the
 # inner access node instead.
-for old_inner_edge in list(
+for old_inner_edge in list(  # TODO(tehrengruber): Why all these list comprehensions everywhere?
```
It is needed because you replace the edges (removing the old one and adding a new one). Furthermore, `by_connector()` gives you back an iterator, so modifying the edges while iterating would change your iteration source, which corrupts the traversal.
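The reason for the `list(...)` materialization can be reproduced in plain Python; here a dict stands in for the SDFG's edge container:

```python
edges = {"e0": ("a", "b"), "e1": ("b", "c")}  # stand-in for the edge container

# Mutating the container while iterating over a live view raises at runtime:
failed = False
try:
    for name in edges:
        edges["e_new"] = ("a", "c")
except RuntimeError:  # "dictionary changed size during iteration"
    failed = True
assert failed

# Materializing the iterator first -- the list(...) in the diff -- is safe:
for name in list(edges):
    edges[name + "_rerouted"] = edges[name]
assert "e0_rerouted" in edges
```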
```
- Enable printing each transformation step, e.g. using
  dace.Config.set("progress", value=True)
```
This will only give you ~95% of the cases, as transformations can also run through means other than the pattern matcher, or can be simple functions that do things.
In order to avoid editing the code and adding this line, you can export an environment variable:

```
export DACE_progress=1
```

(Note that the upper/lower case unfortunately matters.)
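To the best of my understanding, DaCe derives the environment variable name from the config key by prefixing `DACE_` and replacing dots with underscores, preserving case. The helper below is hypothetical, purely to illustrate the naming convention; it is not part of DaCe's API:

```python
def dace_config_env_var(key: str) -> str:
    # Hypothetical helper: DaCe reads config overrides from environment
    # variables named DACE_<key, with dots replaced by underscores>.
    # Case is preserved, which is why DACE_progress is recognized while
    # DACE_PROGRESS would not be.
    return "DACE_" + key.replace(".", "_")

assert dace_config_env_var("progress") == "DACE_progress"
assert dace_config_env_var("compiler.cpu.executable") == "DACE_compiler_cpu_executable"
```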
```python
sdfg.apply_transformations_repeated(
    transformation(
        ignore_upstream_blocks=False,
```
This should be `True` for correct operation.
```python
sdfg=copy.deepcopy(node.sdfg),
inputs=set(node.in_connectors.keys()),
outputs=set(node.out_connectors.keys()),
# TODO(tehrengruber): What is the performance optimization from Philip about?
```
What do you mean by my performance optimization?
What could be a problem is that deepcopying `node.sdfg` might also copy the surrounding SDFG, because nested SDFGs hold a reference to their parent SDFG.
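The parent-copying pitfall can be reproduced with plain objects: `copy.deepcopy` follows the back-reference unless the memo dict is pre-seeded. This is a generic Python sketch, not dace code:

```python
import copy

class Node:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent  # back-reference, like a nested SDFG's parent

outer = Node("outer")
inner = Node("inner", parent=outer)

# Naive deepcopy follows the back-reference and clones the parent too:
naive = copy.deepcopy(inner)
assert naive.parent is not outer  # the surrounding object was duplicated

# Pre-seeding the memo keeps the parent shared instead of copied:
careful = copy.deepcopy(inner, memo={id(outer): outer})
assert careful.parent is outer
```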
```python
inputs={k: None for k in node.in_connectors.keys()},
outputs={k: None for k in node.out_connectors.keys()},
```
This should be equivalent, since the data types should not change:

```diff
-inputs={k: None for k in node.in_connectors.keys()},
-outputs={k: None for k in node.out_connectors.keys()},
+inputs=node.in_connectors.copy(),
+outputs=node.out_connectors.copy(),
```

If you want to play it safe, ignore this suggestion.
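The equivalence claim is easy to see with plain dicts (illustrative connector names and dtypes, not real dace objects): the comprehension discards the stored dtypes, while `.copy()` preserves them.

```python
in_connectors = {"in1": "float64", "in2": "int32"}  # connector -> dtype (illustrative)

# The comprehension keeps the keys but throws the dtypes away:
assert {k: None for k in in_connectors} == {"in1": None, "in2": None}
# .copy() preserves them, which only matters if the types could change:
assert in_connectors.copy() == in_connectors
assert in_connectors.copy() is not in_connectors  # still an independent dict
```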
```diff
     **kwargs: Any,
 ) -> dace.SDFG:
-    with gtx_wfdcommon.dace_context(device_type=self.device_type):
+    with gtx_wfdcommon.dace_context(device_type=self.device_type), dace.sdfg.nodes.reset_node_id_counter():
```
I kind of understand why it is used here, although I do not like it.
```python
# NOTE: Each thread maintains its own set of configuration, i.e. `dace.Config` is
# a thread local variable. This means it is safe to set values that are different
# for each thread.
dace.Config.set("progress", value=True)
```
This is not meant to be merged, right?
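The thread-locality argument in the NOTE can be demonstrated with `threading.local`; this is a generic sketch of the mechanism, not dace's actual implementation:

```python
import threading

config = threading.local()  # each thread gets its own attribute namespace

def worker(value, results, idx):
    config.progress = value   # only visible to this thread
    results[idx] = config.progress

results = [None, None]
threads = [
    threading.Thread(target=worker, args=(val, results, i))
    for i, val in enumerate([True, False])
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Each thread saw only its own value; no cross-thread interference.
assert results == [True, False]
```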
This PR is a first vertical slice to make `auto_optimize` deterministic. The general idea is to use ordered data structures everywhere order matters (e.g. when iterating over elements of a set) and to provide an easy-to-follow workflow to debug and resolve indeterminism (see here). In many places this just means replacing `set` with `OrderedSet` from https://github.com/rspeer/ordered-set and using deterministic naming schemes. In addition to the changes here, GridTools/dace#17 is needed, which also introduces an `id` property in all dace nodes. The `id` property is computed from a deterministic counter incremented whenever a new node is constructed. Together with the `reset_node_id_counter` context manager, each node gets a unique id that is stable across runs. This was initially meant as a way to order output from networkx algorithms used in dace that return sets, but surprisingly this was not needed for the stencil used to test this PR. Since this can differ for other stencils, and since the id value is also very useful to quickly recognize that order has changed, it is kept for now.

Open discussion points:
- Is `OrderedSet` the right package?

The program used to test this PR is:
icon4py @ 0918c3d with
model/atmosphere/dycore/tests/dycore/stencil_tests/test_compute_advection_in_vertical_momentum_equation.py::TestFusedVelocityAdvectionStencilVMomentum::test_TestFusedVelocityAdvectionStencilVMomentum[compile_time_domain]
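Regarding the open question of whether `OrderedSet` is the right package: since Python 3.7, dicts preserve insertion order, so a minimal ordered set can be built on `dict` alone. The sketch below is a greatly simplified, illustrative stand-in for what the ordered-set package provides, not a drop-in replacement:

```python
class MiniOrderedSet:
    """Greatly simplified stand-in for ordered_set.OrderedSet."""

    def __init__(self, iterable=()):
        self._items = dict.fromkeys(iterable)  # dict preserves insertion order

    def add(self, item):
        self._items[item] = None

    def discard(self, item):
        self._items.pop(item, None)

    def __contains__(self, item):
        return item in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def __eq__(self, other):
        # Like a set: equality ignores order.
        return set(self._items) == set(other)

s = MiniOrderedSet(["j", "i", "k"])
assert list(s) == ["j", "i", "k"]            # deterministic iteration order
assert s == MiniOrderedSet(["i", "j", "k"])  # but set-like equality
```

Whether the real package's extra features (indexing, slicing, set operators) are needed here is exactly the open question above.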