add support for MPMD (multiple program, multiple data) launch mode #2

Open: wants to merge 1 commit into base: main

guoyejun

the following command:
mpiexec -genv WORLD_SIZE 2 ... \
-np 1 -host localhost ... -env RANK 0 final_cmd : \
-np 1 -host localhost ... -env RANK 1 final_cmd

is converted into:
mpitx -genv WORLD_SIZE 2 ... -- \
-np 1 -host localhost ... -env RANK 0 final_cmd : \
-np 1 -host localhost ... -env RANK 1 final_cmd

and finally calls:
mpiexec -genv WORLD_SIZE 2 ... \
-np 1 -host localhost ... -env RANK 0 mpitx "mpitx_child" final_cmd : \
-np 1 -host localhost ... -env RANK 1 mpitx "mpitx_child" final_cmd
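The conversion above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the code in this pull request: the function name `convert` and the table `LOCAL_OPTION_ARITY` are hypothetical, and the table lists only the three local options that appear in the example, with the argument counts the example implies.

```python
# Hypothetical table: how many arguments each local option consumes.
# Only the options visible in the example are listed; a real launcher has more.
LOCAL_OPTION_ARITY = {"-np": 1, "-host": 1, "-env": 2}


def convert(args):
    """Turn `mpitx <global> -- <prog> : <prog> ...` arguments into an mpiexec argv."""
    sep = args.index("--")  # the first `--` ends the global part
    global_part, programs = args[:sep], args[sep + 1:]
    out = ["mpiexec"] + global_part

    i = 0
    in_options = True  # True while still reading a segment's local options
    while i < len(programs):
        arg = programs[i]
        if arg == ":":
            # A colon starts the next program segment.
            out.append(arg)
            in_options = True
            i += 1
        elif in_options and arg in LOCAL_OPTION_ARITY:
            # Copy a known option together with its arguments.
            n = 1 + LOCAL_OPTION_ARITY[arg]
            out.extend(programs[i:i + n])
            i += n
        elif in_options:
            # First token that is not a known option: treat it as the command.
            out.extend(["mpitx", "mpitx_child", arg])
            in_options = False
            i += 1
        else:
            # Arguments of the command are passed through unchanged.
            out.append(arg)
            i += 1
    return out
```

Any local option missing from the table would be mistaken for the command, so this approach depends on knowing the option set of the MPI implementation in use.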
@guoyejun guoyejun mentioned this pull request Jun 25, 2023
@s417-lama
Owner

I really appreciate your efforts put into mpitx.

I have a concern about the usage of the separator --. This separator is commonly used to mark the end of options. In your usage, however, it separates the global options from the local options, which might be a little confusing.

I think -- should be used to separate between the local options and each command, which looks like this:

mpitx -genv WORLD_SIZE 2 ... \
-np 1 -host localhost ... -env RANK 0 -- final_cmd : \
-np 1 -host localhost ... -env RANK 1 -- final_cmd

Then, you will not need to perform ad-hoc parsing for -env or other options, which are specific to an MPI implementation.

I think we don't have to separate global and local options ourselves if the separator -- is always given by the user. The above command can easily be converted to the following form, I guess:

mpiexec -genv WORLD_SIZE 2 ... \
-np 1 -host localhost ... -env RANK 0 mpitx "mpitx_child" final_cmd : \
-np 1 -host localhost ... -env RANK 1 mpitx "mpitx_child" final_cmd
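A minimal Python sketch of this conversion, assuming every `:`-separated segment contains a `--` before its command. The function name `wrap_commands` is hypothetical and this is not mpitx's actual implementation. The first `--` in each segment is replaced by the wrapper, and everything before it is passed through untouched, so no knowledge of option names or argument counts is needed.

```python
def wrap_commands(args, wrapper=("mpitx", "mpitx_child")):
    """Replace the first `--` of each `:`-separated segment with the wrapper."""
    out = ["mpiexec"]
    seen_separator = False  # has the current segment's `--` been consumed?
    for arg in args:
        if arg == ":":
            if not seen_separator:
                raise ValueError("segment has no `--` before its command")
            out.append(arg)
            seen_separator = False  # a new segment begins
        elif arg == "--" and not seen_separator:
            # Options end here; the wrapper goes directly in front of the command.
            out.extend(wrapper)
            seen_separator = True
        else:
            # Options (global or local) and command arguments pass through as-is.
            out.append(arg)
    if not seen_separator:
        raise ValueError("segment has no `--` before its command")
    return out
```

Global options such as `-genv` simply ride along in the first segment, which is why they never have to be told apart from local ones.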

@guoyejun
Author

Thanks for the comment. One concern is that there might be tens, hundreds, or possibly thousands of 'final_cmd' in the command line, so it is not easy for the user to add '--' for each of them.

'--' can be considered as the separator of 'mpiexec global part' and 'programs part'.

@s417-lama
Owner

I understand your concern, but I don't think it is a good idea to break the standard meaning of --.

According to man bash,

A -- signals the end of options and disables further option processing. Any arguments after the -- are treated as filenames and arguments.
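The same convention shows up outside bash. As a small illustration (Python's standard `getopt` module, not mpitx or bash itself), option parsing stops at `--` and everything after it is returned as a plain argument, even if it starts with a dash. The option letters `n` and `x` here are made up for the example.

```python
import getopt

# `-n` takes one argument, `-x` takes none (hypothetical option letters).
argv = ["-n", "1", "--", "-x", "final_cmd"]
opts, rest = getopt.getopt(argv, "n:x")

# Parsing stopped at `--`, so `-x` was not treated as an option.
print(opts)  # [('-n', '1')]
print(rest)  # ['-x', 'final_cmd']
```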

Another problem is that, if we do not use -- to separate local options and commands, we end up writing a full option parser ourselves. You already wrote a parser for -env and other options, but this is specific to one MPI implementation. The option names and numbers of arguments depend on the specific MPI implementation (e.g., Open MPI uses -x foo=bar). To cover all cases, we would have to write a full option parser for every MPI implementation.

This is why mpitx requires the user to separate the options and commands with -- in the first place. This makes things very simple.

Another benefit in using -- as a separator between local options and commands is that, even if we replace mpitx with mpiexec, the command remains valid. For example, using the above example,

mpiexec -genv WORLD_SIZE 2 ... \
-np 1 -host localhost ... -env RANK 0 -- final_cmd : \
-np 1 -host localhost ... -env RANK 1 -- final_cmd

is still a valid command line. If we put -- between the global and local options, it will not be a valid command.

@guoyejun
Author

Agreed on the usage of --, though I'll keep the change locally for my specific case.

@s417-lama
Owner

I totally understand. Thank you for your time and efforts.
