olupton (Collaborator) commented Apr 28, 2025

  • Add --{passing,failing}-commits option, which allows the commit-level search to be manually restricted in scope
    • This also allows single-container triage, as you can pass --{passing,failing}-container and --{failing,passing}-commits
  • With the Pyxis backend, re-use the same container instance within a single triage tool process
    • This reduces the amount of time spent in container creation
  • Support multi-node/multi-process triage with the Pyxis backend
    • This is implemented by annotating each command run inside the containers as once (run once, in one container instance), once_per_container (run once per container instance, i.e. once per node), or default (run without extra srun arguments; the caller must make this do the right thing, e.g. by passing --ntasks-per-node to salloc).
      • once example: getting the JAX commit from a container
      • once_per_container example: building JAX
      • default example: running the test case
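
The three annotations above can be illustrated with a minimal sketch. This is not the code from this PR: the names Policy, srun_args and build_command are hypothetical, and the mapping from each annotation to specific srun flags (--nodes, --ntasks, --ntasks-per-node) is an assumption about one plausible way to get the described behaviour.

```python
from enum import Enum


class Policy(Enum):
    """Hypothetical labels mirroring the three annotations in the description."""

    ONCE = "once"
    ONCE_PER_CONTAINER = "once_per_container"
    DEFAULT = "default"


def srun_args(policy: Policy) -> list[str]:
    """Return extra srun arguments for a policy (assumed mapping, not the PR's)."""
    if policy is Policy.ONCE:
        # One task on one node: the command runs in a single container instance.
        return ["--nodes=1", "--ntasks=1"]
    if policy is Policy.ONCE_PER_CONTAINER:
        # One task on every allocated node, i.e. once per container instance.
        return ["--ntasks-per-node=1"]
    # DEFAULT: add nothing and inherit whatever the caller's allocation specifies.
    return []


def build_command(command: list[str], policy: Policy = Policy.DEFAULT) -> list[str]:
    """Prefix a command with srun and the extra arguments for its policy."""
    return ["srun", *srun_args(policy), *command]


if __name__ == "__main__":
    # The commands below are placeholders chosen to match the three examples.
    print(build_command(["git", "rev-parse", "HEAD"], Policy.ONCE))
    print(build_command(["./build-jax.sh"], Policy.ONCE_PER_CONTAINER))
    print(build_command(["./run-test-case.sh"]))
```

Keeping the policy as an explicit argument means each call site states how widely a command should run, while the default path leaves task placement entirely to the caller's allocation.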

gpupuck (Contributor) commented May 5, 2025

Is there anything else that needs to be addressed?

olupton force-pushed the olupton/multi-node-triage branch from 060f5b3 to 15cef45 on May 7, 2025 08:23
olupton marked this pull request as ready for review on May 7, 2025 08:24
olupton requested a review from gpupuck on May 7, 2025 14:34

gpupuck (Contributor) left a comment

This has been tested in practice. Let's ship it!

olupton merged commit b28cc73 into main on May 7, 2025
98 of 111 checks passed
olupton deleted the olupton/multi-node-triage branch on May 7, 2025 16:32