Problem: A common use case for tools is to launch a coprocess alongside a job (one process per node or job shell) that monitors the job or node and emits output that should be collected and captured in a file or elsewhere. Currently, implementing this as a shell plugin in Lua is awkward (see #7679 for an example), and one C plugin per tool seems like overkill and an onerous burden on tool providers.
A builtin, generic, config-driven approach could be useful here.
Some requirements:
- simple config format which allows site-defined or drop-in tool configuration
- easily enabled via a shell option, e.g. `-o gpumon`
- coprocesses launched at `shell.start` and auto-terminated at shell exit
- collection of coprocess output on rank 0 with output to file
- optional: support coprocess launch on subset of ranks
- multiple tool/coprocess configs can be selected and managed at the same time
- minimal overhead
- easy deployment
A C plugin could implement the generic parts here and read a config file, or perhaps Lua drop-ins could create a list of config tables. For each configured tool, the C plugin would create an independent `flux_rexec()` using the shell builtin rexec server, process its output, etc.
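
To make the requirements above more concrete, here is a minimal sketch of what a per-tool config table and the selection logic might look like. It is written in Python purely for illustration and is not a proposed implementation; the field names (`name`, `command`, `ranks`, `output`), the helper functions, and the example commands are all hypothetical. It only models which coprocesses would start on a given shell rank, not the launch through the rexec server or the output collection.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class ToolConfig:
    """Hypothetical per-tool config table; every field name is an assumption."""
    name: str                        # shell option that enables the tool, e.g. "gpumon"
    command: List[str]               # coprocess command line (made up for illustration)
    ranks: Optional[Set[int]] = None # shell ranks to launch on; None means every rank
    output: str = "{name}.out"       # output file template, written on rank 0


def select_launches(configs, shell_options, shell_rank):
    """Return the configs whose coprocess should be started on this shell rank."""
    selected = []
    for cfg in configs:
        if cfg.name not in shell_options:
            continue  # tool was not requested with -o NAME
        if cfg.ranks is not None and shell_rank not in cfg.ranks:
            continue  # tool is restricted to a subset of ranks
        selected.append(cfg)
    return selected


def output_path(cfg):
    """Expand the output file template for one tool."""
    return cfg.output.format(name=cfg.name)


# Two drop-in style configs; the commands are placeholders, not real tools.
EXAMPLE_CONFIGS = [
    ToolConfig(name="gpumon", command=["gpumon-daemon", "--interval", "5"]),
    ToolConfig(name="netmon", command=["netmon-daemon"], ranks={0}),
]
```

With both tools enabled, `select_launches(EXAMPLE_CONFIGS, {"gpumon", "netmon"}, shell_rank)` would select both on rank 0 and only `gpumon` on other ranks, which covers the "subset of ranks" and "multiple tools at once" requirements in one table-driven pass.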