idea: generic job shell coprocess management for tools #7680

@grondo

Problem: A common use case for tools is to launch a coprocess alongside a job (one process per node or job shell) that monitors the job or node and emits output to be collected and captured in a file or elsewhere. Currently, implementing this as a shell plugin in Lua is awkward (see #7679 for an example), and requiring one C plugin per tool seems like overkill and an onerous burden on tool providers.

A built-in, generic, config-driven approach could be useful here.

Some requirements:

  • simple config format which allows site-defined or drop-in tool configuration
  • easily enabled via a shell option, e.g. -o gpumon
  • coprocesses launched at shell.start and auto-terminated at shell exit
  • collection of coprocess output on rank 0 with output to file
  • optional: support coprocess launch on subset of ranks
  • multiple tool/coprocess configs can be selected and managed at the same time
  • minimal overhead
  • easy deployment
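
As a rough illustration of the requirements above, here is a minimal sketch in C of how a per-tool config entry and the launch decision might look. Everything in it is an assumption made for illustration: the struct `coproc_config`, its fields, the `rank_stride` way of expressing "subset of ranks", and the helper names are hypothetical and are not part of flux-core or the shell plugin API. It only models selection by shell option and rank, not process launch or output collection.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-tool config entry (field names are illustrative only) */
struct coproc_config {
    const char *name;     /* shell option that enables the tool, e.g. "gpumon" */
    const char *command;  /* command to launch as the coprocess */
    const char *output;   /* file to which rank 0 writes collected output */
    int rank_stride;      /* launch where shell_rank % rank_stride == 0 (1 = all) */
};

/* Return true if 'name' is among the shell options set on the job */
static bool option_enabled (const char *name, const char **opts, size_t nopts)
{
    for (size_t i = 0; i < nopts; i++) {
        if (strcmp (opts[i], name) == 0)
            return true;
    }
    return false;
}

/* Decide whether this shell rank should launch the tool described by 'cfg' */
static bool coproc_should_launch (const struct coproc_config *cfg,
                                  int shell_rank,
                                  const char **opts,
                                  size_t nopts)
{
    assert (cfg != NULL && cfg->rank_stride > 0);
    if (!option_enabled (cfg->name, opts, nopts))
        return false;
    return shell_rank % cfg->rank_stride == 0;
}

/* Store pointers to the configs selected for this shell rank in 'out'
 * (capacity 'max') and return how many were stored. Several tools may
 * be selected at the same time.
 */
static size_t coproc_select (const struct coproc_config *cfgs,
                             size_t ncfgs,
                             int shell_rank,
                             const char **opts,
                             size_t nopts,
                             const struct coproc_config **out,
                             size_t max)
{
    size_t count = 0;
    for (size_t i = 0; i < ncfgs && count < max; i++) {
        if (coproc_should_launch (&cfgs[i], shell_rank, opts, nopts))
            out[count++] = &cfgs[i];
    }
    return count;
}
```

With a table holding a `gpumon` entry (stride 1) and a second, made-up entry with stride 2, a job submitted with both options enabled would select both tools on shell rank 0 and only `gpumon` on shell rank 1. A real implementation would presumably populate such a table from the config file or Lua drop-ins rather than hard-coding it.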

A C plugin could implement the generic parts here. It could read a config file, or perhaps Lua drop-ins could create a list of config tables. For each configured tool, the C plugin would issue an independent flux_rexec() using the shell's builtin rexec server, process the output, etc.
