@yfguo yfguo commented Sep 2, 2025

Pull Request Description

JSON format

This PR creates a new JSON format as part of the collective selection refactoring. The focus of this new format is correctness, completeness, and validation.

There are a few rules that the JSON file must meet (and that will be validated):

  1. The JSON file must be complete: every decision path must lead to an algorithm. No path may end in an empty node, which would require calling a "fallback" function at runtime.
  2. The decision path that leads to an algorithm must check all the requirements and restrictions of that algorithm.
  3. The JSON file can only contain conditions and algorithms that are defined in the given MPICH version.

The top level object in JSON will be:

{
    "version": "5.0",
    "bcast": {},
    ......
}

The version defines the compatibility of the JSON file. Version 5.0 means the file is compatible with MPICH 5.0 and later.

The binary conditions in JSON will be like this:

{
    "condition": {
        "true": {},
        "false": {}
    }
}

This covers all the binary checks, such as pof2?, is_commutative?, is_sendbuf_inplace?, is_intracomm?, and is_builtin_op?. Neither branch can be empty.

The range conditions in JSON will be like this:

{
    "comm_size": {
        "threshold_1": {},
        "threshold_2": {},
        "max": {}
    }
}

This covers checks for comm_size, avg_msg_size, etc. All thresholds are checked with a less-than-or-equal comparison. The "max" branch is required and acts as the catch-all for values above the largest threshold.

The algorithm in JSON will be like this:

{
    "algorithm": {
        "name": "MPIR_Bcast_intra_smp",
        "param_name": "param_value"
    }
}

It contains the name and the required parameters of an algorithm. The object for a composition will use a similar format.

Binary condition nodes and range condition nodes can appear in the tree in any order.
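As an illustration of rule 1 (completeness), the check could be sketched in Python roughly as follows. This is not the validation code in this PR. It assumes that a node is an object with exactly one key, that a binary condition is recognized by its "true"/"false" branches, and that any other condition is a range condition with integer thresholds plus "max". The function name `validate` and the path strings are made up for the example. Rules 2 and 3 need MPICH's own tables of algorithms and conditions and are not covered.

```python
def validate(node, path="root"):
    """Return a list of problems under node; an empty list means it passed."""
    if not isinstance(node, dict) or len(node) != 1:
        return [f"{path}: expected exactly one condition or algorithm"]
    (key, body), = node.items()
    if not isinstance(body, dict) or not body:
        return [f"{path}/{key}: empty body"]
    if key == "algorithm":
        return [] if "name" in body else [f"{path}/{key}: missing name"]

    problems = []
    branches = set(body)
    if branches <= {"true", "false"}:
        # Binary condition: both branches must be present.
        for missing in sorted({"true", "false"} - branches):
            problems.append(f"{path}/{key}: missing '{missing}' branch")
    else:
        # Range condition: integer thresholds plus the required "max".
        if "max" not in branches:
            problems.append(f"{path}/{key}: missing 'max' branch")
        for name in sorted(branches - {"max"}):
            if not name.lstrip("-").isdigit():
                problems.append(f"{path}/{key}: bad threshold '{name}'")
    # Every branch must itself lead to an algorithm.
    for name, child in body.items():
        problems += validate(child, f"{path}/{key}/{name}")
    return problems


good = {
    "intra-comm": {
        "true": {
            "comm_size": {
                "7": {"algorithm": {"name": "MPIR_Bcast_intra_binomial"}},
                "max": {"algorithm": {"name": "MPIR_Bcast_intra_scatter_ring_allgather"}},
            }
        },
        "false": {"algorithm": {"name": "MPIR_Bcast_inter_remote_send_local_bcast"}},
    }
}
# The "false" branch is an empty node, which rule 1 forbids.
bad = {
    "pof2": {
        "true": {"algorithm": {"name": "MPIR_Bcast_intra_binomial"}},
        "false": {},
    }
}
print(validate(good))
print(validate(bad))
```

Returning a list of problems instead of stopping at the first one lets a tool report every incomplete path in one pass.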

JSON tools

[WIP] There is a new tool for manipulating the JSON file, which helps ensure that the file meets all the requirements.
[WIP] There is a script for converting existing JSON files to the new format.

Author Checklist

  • Provide Description
    Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
  • Commits Follow Good Practice
    Commits are self-contained and do not do two things at once.
    Commit message is of the form: module: short description
    Commit message explains what's in the commit.
  • Passes All Tests
    Whitespace checker. Warnings test. Additional tests via comments.
  • Contribution Agreement
    For non-Argonne authors, check contribution agreement.
    If necessary, request an explicit comment from your company's PR approval manager.

yfguo added 11 commits September 1, 2025 17:04
Creating a CSEL constants array for the string names of the collectives
and comm hierarchy. These string values will be used when parsing the
JSON file and when printing CSEL tree nodes.

Separating the implementation details of the CSEL tree printing function
for ease of maintenance.

Consolidate the POSIX coll algorithm enum definition under MPII. The
JSON parsing no longer needs separate functions for them.

Consolidate the CH4 coll algorithm enum definition under MPII. The
JSON parsing no longer needs separate functions for them.

Consolidate the OFI coll algorithm enum definition under MPII. The
JSON parsing no longer needs separate functions for them.

Creating internal functions for creating, freeing, and updating CSEL
tree nodes. They will be used to manipulate the tree structure in future
CSEL optimizations.

MPIR_CVAR_COLLECTIVE_SELECTION_REPORT controls how MPICH shows the
collective selection logic during init. It is turned off by default.
The user can choose to print the CSEL in tree format or summary
format (later commit).

The ANY node is needed in CSEL as the logical catch-all condition.
Keeping the actual ANY node in the tree adds an extra pointer
dereference with no benefit. This commit squashes all ANY nodes
during CSEL tree initialization.
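The idea behind squashing ANY nodes can be sketched in Python as below. The `Node` class, its `success`/`failure` fields, and the kind labels are hypothetical stand-ins for illustration and are not taken from the CSEL source. The sketch assumes that an ANY node always matches, so it can be replaced by the subtree on its success side.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    kind: str                          # hypothetical labels: "ANY", "LE", "ALGORITHM"
    value: object = None
    success: Optional["Node"] = None   # followed when the condition matches
    failure: Optional["Node"] = None   # followed when it does not


def squash_any(node):
    """Return the tree with every ANY node replaced by its success child."""
    # An ANY node always matches, so skip straight to what it leads to.
    while node is not None and node.kind == "ANY":
        node = node.success
    if node is None:
        return None
    node.success = squash_any(node.success)
    node.failure = squash_any(node.failure)
    return node


tree = Node(
    "LE", 7,
    success=Node("ANY", success=Node("ALGORITHM", "MPIR_Bcast_intra_binomial")),
    failure=Node("ANY", success=Node("ANY", success=Node(
        "ALGORITHM", "MPIR_Bcast_intra_scatter_ring_allgather"))),
)
tree = squash_any(tree)
print(tree.success.kind, tree.failure.kind)
```

After the pass, each lookup follows one pointer fewer for every ANY node that used to sit on its path.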

The LT conditions are internally converted to less-than-or-equal (LE)
conditions. The LT conditions are redundant since they can always be
represented as LE conditions. Having both complicates the logic and can
create holes in the matching ranges. LT is deprecated in this PR and is
only kept for backward compatibility.
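For integer-valued conditions, x < n is the same test as x <= n - 1, which is how "comm_size<8" becomes the threshold "7" in the converted Bcast example in this PR. A small Python sketch of that rewrite follows. The helper name `lt_to_le` and the regular expression are assumptions for illustration, not the parser used in MPICH, and the sketch assumes every compared value is an integer.

```python
import re


def lt_to_le(condition):
    """Turn 'name<N' or 'name<=N' into (name, inclusive_upper_bound)."""
    m = re.fullmatch(r"(\w+)(<=|<)(-?\d+)", condition)
    if m is None:
        raise ValueError(f"not an LT/LE condition: {condition!r}")
    name, op, bound = m.group(1), m.group(2), int(m.group(3))
    if op == "<":
        bound -= 1  # x < n is equivalent to x <= n - 1 for integers
    return name, bound


print(lt_to_le("comm_size<8"))
print(lt_to_le("avg_msg_size<=12288"))
```

With only LE bounds, consecutive thresholds partition the value range without gaps, which is the property the range-set relies on.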

Merging multiple LE condition nodes into a single node for matching.
LE conditions are stored in a sorted range-set, which enables binary
search when matching a value against these conditions.
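A sorted range-set with binary search could look roughly like the Python sketch below. The `RangeSet` class is a hypothetical illustration rather than the data structure added in this commit. It assumes integer bounds and a separate default target for values above the largest bound, and it uses the standard `bisect` module.

```python
import bisect


class RangeSet:
    """Map a value to the target of the smallest bound b with value <= b."""

    def __init__(self, branches, default):
        pairs = sorted(branches.items())
        self._bounds = [bound for bound, _ in pairs]
        self._targets = [target for _, target in pairs]
        self._default = default  # used above the largest bound

    def match(self, value):
        # bisect_left returns the first index whose bound is >= value.
        i = bisect.bisect_left(self._bounds, value)
        return self._targets[i] if i < len(self._bounds) else self._default


avg_msg_size = RangeSet(
    {
        0: "MPIR_Bcast_intra_smp",
        12288: "MPIR_Bcast_intra_binomial",
        524288: "MPIR_Bcast_intra_scatter_recursive_doubling_allgather",
    },
    default="MPIR_Bcast_intra_scatter_ring_allgather",
)
print(avg_msg_size.match(4096))
```

A lookup costs O(log n) comparisons for n thresholds, instead of walking a chain of n separate LE nodes.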

Using versioned JSON with the new node structure for generic.json.
Other files are not converted, as they will be removed in the future.

yfguo commented Sep 2, 2025

Current JSON for Bcast:

{
    "collective=bcast":
    {
        "comm_type=intra":
        {
            "comm_size<8":
            {
                "comm_hierarchy=parent":
                {
                    "avg_msg_size<=0":
                    {
                        "algorithm=MPIR_Bcast_intra_smp":{}
                    },
                    "avg_msg_size=any":
                    {
                        "algorithm=MPIR_Bcast_intra_binomial":{}
                    }
                },
                "comm_hierarchy=any":
                {
                    "avg_msg_size=any":
                    {
                        "algorithm=MPIR_Bcast_intra_binomial":{}
                    }
                }
            },
            "comm_size=pow2":
            {
                "comm_hierarchy=parent":
                {
                    "avg_msg_size<=0":
                    {
                        "algorithm=MPIR_Bcast_intra_smp":{}
                    },
                    "avg_msg_size<=12288":
                    {
                        "algorithm=MPIR_Bcast_intra_binomial":{}
                    },
                    "avg_msg_size<=524288":
                    {
                        "algorithm=MPIR_Bcast_intra_scatter_recursive_doubling_allgather":{}
                    },
                    "avg_msg_size=any":
                    {
                        "algorithm=MPIR_Bcast_intra_scatter_ring_allgather":{}
                    }
                },
                "comm_hierarchy=any":
                {
                    "avg_msg_size<=12288":
                    {
                        "algorithm=MPIR_Bcast_intra_binomial":{}
                    },
                    "avg_msg_size<=524288":
                    {
                        "algorithm=MPIR_Bcast_intra_scatter_recursive_doubling_allgather":{}
                    },
                    "avg_msg_size=any":
                    {
                        "algorithm=MPIR_Bcast_intra_scatter_ring_allgather":{}
                    }
                }
            },
            "comm_size=any":
            {
                "comm_hierarchy=parent":
                {
                    "avg_msg_size<=0":
                    {
                        "algorithm=MPIR_Bcast_intra_smp":{}
                    },
                    "avg_msg_size<=12288":
                    {
                        "algorithm=MPIR_Bcast_intra_binomial":{}
                    },
                    "avg_msg_size=any":
                    {
                        "algorithm=MPIR_Bcast_intra_scatter_ring_allgather":{}
                    }
                },
                "comm_hierarchy=any":
                {
                    "avg_msg_size<=12288":
                    {
                        "algorithm=MPIR_Bcast_intra_binomial":{}
                    },
                    "avg_msg_size=any":
                    {
                        "algorithm=MPIR_Bcast_intra_scatter_ring_allgather":{}
                    }
                }
            }
        },
        "comm_type=inter":
        {
            "algorithm=MPIR_Bcast_inter_remote_send_local_bcast":{}
        }
    }
}

is converted to:

{
    "version": "5.0",
    "bcast": {
        "intra-comm": {
            "true": {
                "comm_size": {
                    "7": {
                        "parent-comm": {
                            "true": {
                                "avg_msg_size": {
                                    "0": {
                                        "algorithm": {
                                            "name": "MPIR_Bcast_intra_smp"
                                        }
                                    },
                                    "max": {
                                        "algorithm": {
                                            "name": "MPIR_Bcast_intra_binomial"
                                        }
                                    }
                                }
                            },
                            "false": {
                                "avg_msg_size": {
                                    "max": {
                                        "algorithm": {
                                            "name": "MPIR_Bcast_intra_binomial"
                                        }
                                    }
                                }
                            }
                        }
                    },
                    "max": {
                        "pof2": {
                            "true": {
                                "parent-comm": {
                                    "true": {
                                        "avg_msg_size": {
                                            "0": {
                                                "algorithm": {
                                                    "name": "MPIR_Bcast_intra_smp"
                                                }
                                            },
                                            "12288": {
                                                "algorithm": {
                                                    "name": "MPIR_Bcast_intra_binomial"
                                                }
                                            },
                                            "524288": {
                                                "algorithm": {
                                                    "name": "MPIR_Bcast_intra_scatter_recursive_doubling_allgather"
                                                }
                                            },
                                            "max": {
                                                "algorithm": {
                                                    "name": "MPIR_Bcast_intra_scatter_ring_allgather"
                                                }
                                            }
                                        }
                                    },
                                    "false": {
                                        "avg_msg_size": {
                                            "12288": {
                                                "algorithm": {
                                                    "name": "MPIR_Bcast_intra_binomial"
                                                }
                                            },
                                            "524288": {
                                                "algorithm": {
                                                    "name": "MPIR_Bcast_intra_scatter_recursive_doubling_allgather"
                                                }
                                            },
                                            "max": {
                                                "algorithm": {
                                                    "name": "MPIR_Bcast_intra_scatter_ring_allgather"
                                                }
                                            }
                                        }
                                    }
                                }
                            },
                            "false": {
                                "parent-comm": {
                                    "true": {
                                        "avg_msg_size": {
                                            "0": {
                                                "algorithm": {
                                                    "name": "MPIR_Bcast_intra_smp"
                                                }
                                            },
                                            "12288": {
                                                "algorithm": {
                                                    "name": "MPIR_Bcast_intra_binomial"
                                                }
                                            },
                                            "max": {
                                                "algorithm": {
                                                    "name": "MPIR_Bcast_intra_scatter_ring_allgather"
                                                }
                                            }
                                        }
                                    },
                                    "false": {
                                        "avg_msg_size": {
                                            "12288": {
                                                "algorithm": {
                                                    "name": "MPIR_Bcast_intra_binomial"
                                                }
                                            },
                                            "max": {
                                                "algorithm": {
                                                    "name": "MPIR_Bcast_intra_scatter_ring_allgather"
                                                }
                                            }
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            },
            "false": {
                "algorithm": {
                    "name": "MPIR_Bcast_inter_remote_send_local_bcast"
                }
            }
        }
    }
}
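
To show how a tree in the new format is meant to be read, here is a rough Python sketch of a selection walk. It is not the MPICH runtime code. The `select` function, the `props` dictionary, and the reduced `bcast` tree (same shape as the converted tree, with the pof2 and parent-comm checks left out for brevity) are assumptions made for this example.

```python
def select(node, props):
    """Walk the decision tree and return the algorithm object for props."""
    while True:
        (key, body), = node.items()
        if key == "algorithm":
            return body
        value = props[key]
        if isinstance(value, bool):
            # Binary condition: follow the "true" or "false" branch.
            node = body["true" if value else "false"]
        else:
            # Range condition: first threshold with value <= threshold,
            # falling back to the required "max" branch.
            bounds = sorted((int(k), k) for k in body if k != "max")
            branch = next((k for b, k in bounds if value <= b), "max")
            node = body[branch]


# Reduced tree for illustration only, not the full tree from this PR.
bcast = {
    "intra-comm": {
        "true": {
            "comm_size": {
                "7": {"algorithm": {"name": "MPIR_Bcast_intra_binomial"}},
                "max": {
                    "avg_msg_size": {
                        "12288": {"algorithm": {"name": "MPIR_Bcast_intra_binomial"}},
                        "max": {"algorithm": {"name": "MPIR_Bcast_intra_scatter_ring_allgather"}},
                    }
                },
            }
        },
        "false": {"algorithm": {"name": "MPIR_Bcast_inter_remote_send_local_bcast"}},
    }
}

props = {"intra-comm": True, "comm_size": 64, "avg_msg_size": 1 << 20}
print(select(bcast, props)["name"])
```

Because every path ends in an algorithm and every range has a "max" branch, the walk never needs a fallback.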
