ACTL sampleworks image + run_experiments script by manzuoni-astera · Pull Request #240 · diff-use/sampleworks

manzuoni-astera · 2026-05-21T16:16:53Z

Summary

This PR makes the ACTL Sampleworks experiment workflow usable from the prebuilt image pixi-with-checkpoints. The user-facing entry point is run_experiments: it reads TOML presets and launches the right grid-search jobs, pixi environments, GPU assignments, logs, shared data paths, result directories, and MSA cache paths. The dir /home/dev/workspace is synced to the local dir that actl is launched from, which should be this sampleworks repo.

What changed

Add standalone model presets for boltz, boltz1, boltz2, boltz2_xrd, boltz2_md, rf3, and protenix, plus comparison presets.
Add gpu_count = N preset support so jobs auto-assign visible GPUs in order; single-job presets default to all 8 GPUs.
Move experiment presets into experiments/ and simplify target syntax (run_experiments rf3, run_experiments boltz, run_experiments full_8gpu --jobs rf3,protenix).
Prefer baked pixi env Python in ACTL images, with clear failures instead of slow runtime reinstalls.
Harden ACTL defaults for shared storage, checkpoints, source path selection, cache dirs, dry-run behavior, and runner process cleanup.
Clarify worker GPU logs by showing both local CUDA index and selected CUDA_VISIBLE_DEVICES entry.
Add focused loader, CLI, and runner tests.

Using `run_experiments`

In an ACTL Sampleworks pod, from /home/dev/workspace:

run_experiments --list        # show available presets
run_experiments --show boltz  # inspect resolved commands
run_experiments --dry-run rf3 # print commands without running
run_experiments boltz         # run Boltz2 X-ray + MD
run_experiments rf3           # run RF3 on all 8 GPUs by default
run_experiments protenix      # run Protenix on all 8 GPUs by default

Presets live in experiments/*.toml. The default full run can be subset by job name:

run_experiments full_8gpu --jobs rf3,protenix

Useful launch-time overrides:

run_experiments rf3 --set jobs.rf3.gpu_count=4
run_experiments rf3 --set jobs.rf3.args.gradient-weights="0.0 0.01 0.02"

By default, data comes from /mnt/diffuse-shared/raw/sampleworks, results go under /mnt/diffuse-shared/results/sampleworks/<pod>/<target>/, and MSA cache goes under /mnt/diffuse-shared/cache/sampleworks/msa. Override with DATA_DIR, RESULTS_DIR, or MSA_CACHE_DIR.

Adds a new `sampleworks-runs` CLI under `src/sampleworks/runs/` that launches parallel `run_grid_search.py` jobs from a single TOML preset. Five bundled presets (`all_models`, `rf3_partial`, `rf3_partial_chiral_off`, `protenix_dual`, `rf3_protenix`) cover the canonical multi-model sweeps. Each preset declares its jobs (pixi env, GPU assignment, args); the runner sets `CUDA_VISIBLE_DEVICES`, shells out via `pixi run -e <env>`, tees per-job logs, and aggregates exit codes. Dotted-path `--set` overrides let users sweep parameters without editing TOML, e.g.: sampleworks-runs rf3_partial --set jobs.rf3.args.gradient-weights="0.0 0.01" Pure stdlib (tomllib + dataclasses + argparse + subprocess), no new deps. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Ruff auto-fixed import ordering (I001) and unused imports (F401); two remaining were a leftover unused local `rf3 = preset.job("rf3")` (F841) and one over-100-char assertion (E501) extracted into a local. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

CI's lint job runs both `ruff check` and `ruff format --check`. The prior commit fixed only the former. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- cli.py: `--only` filter now preserves `shared_args`; previously the filtered Preset dropped them, silently losing every job's shared flags. - loader.py: `--set` now rejects unknown top-level keys. A typo like `--set job.rf3.gpus=0` (note missing 's') used to auto-create an unused `job` dict and silently no-op; now it raises KeyError with the valid keys listed. - runner.py: handle partial spawn failures. If one job fails to spawn midway, already-launched jobs are terminated and joined instead of orphaned. Also wraps Popen in try/close to avoid log-file handle leak if the subprocess fails to start. - presets/rf3_partial_chiral_off.toml: change `output_subdir = "."` (write to RESULTS_DIR root) to `"rf3"` (subdir under RESULTS_DIR) for consistency with the other RF3 presets and collision safety when RESULTS_DIR is overridden to a shared location. Skipped (faithful to bash original): rf3_protenix.toml asymmetric partial-diffusion-step — the source `run_rf3_protenix_mdc_actl.sh` deliberately set it only for the Protenix job, not RF3. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Aligns the new sampleworks.runs module with project style policy (AGENTS.md L7 and L338): - Add NumPy-style docstrings (summary, Parameters, Returns, Raises) to every public and private function/class across schema.py, loader.py, runner.py, and cli.py. - Mark Job, Preset, JobInvocation, and _RunningJob as @DataClass(frozen=True). Behavior is preserved: Preset.effective_args already returns a fresh dict, and the runner mutates that local copy rather than the Preset itself. All 32 unit tests still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Match the actual mounted layout on the ACTL pod: /mnt/diffuse-shared/raw/sampleworks/ ├── initial_dataset_40/ # DATA_DIR for all_models ├── initial_dataset_40_occ_sweeps/ # DATA_DIR for occ-sweep presets ├── actl_msa_cache/ # MSA_CACHE_DIR for all presets └── actl_results/<preset_name>/ # RESULTS_DIR namespaced per preset Per-preset RESULTS_DIR avoids cross-preset collisions when multiple presets share output_subdirs (e.g. all_models and rf3_partial both have a job named "rf3"). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The proteins.csv files in the shared dataset reference /data/inputs/ paths that only resolve after the pod-init script (actl_setup_sampleworks_paths.sh) creates the canonical symlinks. Point presets at those canonical paths so everything stays consistent: DATA_DIR /data/input | /data/inputs RESULTS_DIR /data/results/<preset_name> MSA_CACHE_DIR /root/.sampleworks /data/results is namespaced by the pod-init script via $SAMPLEWORKS_ACTL_RUN_NAME (defaults to $HOSTNAME), so per-preset subdirs sit inside a per-session root. README documents the one-time setup script step. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

run_grid_search.py writes its work-queue pickle (wjq_*.pkl) directly into --output-dir without creating it first, so the orchestrator must. Previously only results_dir itself was mkdir'd, which broke any preset whose output_subdir didn't already exist on disk. JobInvocation now carries the resolved output_dir explicitly so _spawn can mkdir it alongside the log directory. --dry-run still doesn't touch the filesystem. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-05-21T16:17:41Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Adds a TOML-driven preset system with a CLI and runner to launch parallel multi-job grid-searches on ACTL, bundled presets, shell wrappers delegating to the Python runner, GPU/checkpoint validation and env handling, Boltz I/O defaults, Docker/packaging updates, docs, and comprehensive tests.

Changes

TOML Preset Orchestration System

Layer / File(s)	Summary
Job and Preset data model `src/sampleworks/runs/schema.py`	Frozen `Job` and `Preset` dataclasses with validation for environment membership, non-empty GPU/output strings, and unique job names; `Preset` provides job lookup and merged argument resolution.
TOML preset loading and variable interpolation `src/sampleworks/runs/loader.py`	Bundled and filesystem preset discovery, TOML parsing, CLI dotted-path overrides with coercion, and `${VAR}` fixed-point substitution using env/defaults precedence.
Job invocation construction and parallel execution `src/sampleworks/runs/runner.py`	Builds per-job subprocess invocations (CUDA env, direct pixi Python or `pixi run`), manages parallel spawning, per-job binary logs with tee threads, failure-safe cleanup, and aggregate exit codes.
CLI entry point and user interface `src/sampleworks/runs/cli.py`, `src/sampleworks/runs/__init__.py`	Argument parsing for preset selection, `--list`/`--show`, `--dry-run`, job filtering (`--only`), `--set` overrides, results-dir resolution, and dispatch to runner.
Bundled TOML preset configurations `src/sampleworks/runs/presets/*.toml`	Preset templates (`all_models`, `protenix_dual`, `rf3_partial`, `rf3_partial_chiral_off`, `rf3_protenix`) with defaults, shared args, and per-job GPU/checkpoint/sweep parameters.
Bash wrapper rewritten to delegate to Python CLI `run_all_models.sh`, `run_experiments`, `run_experiments.sh`	Replaces Docker-based launcher with ACTL-native wrapper resolving presets, default `/mnt/diffuse-shared` paths, checkpoint discovery, per-run proteins manifest rewriting, and `exec` of the Python CLI (or `pixi run` fallback).
Worker process direct Python and pixi environment support `run_grid_search.py`	Allows direct baked pixi-env Python execution with constructed PATH/CONDA/CUDA env or falls back to `pixi run`; subprocess env is consistently passed; `detect_gpus` validates requested ordinals.
Boltz in-process data loading defaults `src/sampleworks/models/boltz/wrapper.py`	Sets `num_workers` defaults to `0` across config and wrappers to keep Boltz preprocessing in-process and avoid `/dev/shm` limits.
Model checkpoint environment variable precedence `src/sampleworks/utils/guidance_script_arguments.py`	Adds per-model checkpoint env-var overrides, expands candidate checkpoint locations, and improves not-found error messages to enumerate checked candidates and env-var hints.
Boltz output directory for side-effect artifacts `src/sampleworks/utils/guidance_script_utils.py`	Passes `out_dir=args.output_dir` into Boltz preprocessing so manifest/NPZ/MSA side effects are scoped per-job.
Dockerfile checkpoint base and packaging entry `Dockerfile`, `pyproject.toml`	Updates Dockerfile to use Harbor-based checkpoint image, copies run scripts into image, registers `sampleworks-runs` console script, and includes presets in wheel packaging.
Documentation and ACTL sync config `.actlignore`, `AGENTS.md`, `README.md`, `GRID_SEARCH.md`, `docker-entrypoint.sh`	Adds `.actlignore` to exclude generated outputs, updates AGENTS.md layout, documents ACTL/TOML preset workflows in README and GRID_SEARCH.md, and updates entrypoint help text.
Comprehensive test suite `tests/runs/conftest.py`, `tests/runs/test_.py`, `tests/models/boltz/`	End-to-end and unit tests for preset discovery/loading, variable/override precedence, CLI behaviors, runner invocation construction, dry-run semantics, baked-env Python usage, and Boltz default expectations.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

diff-use/sampleworks#91: Related container/runtime orchestration and run_all_models/run_experiments workflow edits.
diff-use/sampleworks#147: Similar checkpoint auto-resolution and guidance arg changes.
diff-use/sampleworks#143: Dockerfile checkpoint baking and external checkpoints image changes.

Suggested reviewers

marcuscollins
k-chrispens
xraymemory

Poem

🐰 I rewired scripts and TOML bright,

Presets march forward, jobs take flight,
GPUs hum, logs stitched neat at night,
Checkpoints found, and tests hold tight,
A rabbit cheers: experiments take flight!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 64.52% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Title check	✅ Passed	The title 'ACTL sampleworks image + run_experiments script' accurately summarizes the main changes: baking the ACTL-ready sampleworks image and adding the run_experiments entrypoint script.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

manzuoni-astera · 2026-05-21T16:20:41Z

CI is currently blocked waiting for gpu-testing environment approval before the GPU test jobs can start. GitHub lists @marcuscollins / @k-chrispens as eligible reviewers for the pending deployment. The baked-image smoke verification already passed on ACTL with run_all_models.sh --dry-run against harbor.astera.sh/library/sampleworks:cuda12.4-2026-05-21-pr236-scientist3.

coderabbitai

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

Dockerfile (1)
107-108: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update stale source comment to match Harbor.

Line 107 says Docker Hub, but the actual source is Harbor. Keeping this aligned avoids operational confusion during image maintenance.
Suggested change
-# Bake in model checkpoints from pre-built base image on Docker Hub
+# Bake in model checkpoints from pre-built Harbor image
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Dockerfile` around lines 107 - 108, Update the stale comment that says
"Docker Hub" to reference "Harbor" so the source note is accurate; edit the
comment block containing the line "# Bake in model checkpoints from pre-built
base image on Docker Hub" (in the Dockerfile) to say "Harbor" instead of "Docker
Hub" to keep documentation aligned with the actual image source.

🧹 Nitpick comments (14)

README.md (1)
162-162: ⚡ Quick win

Use a generic pod name in the example.

The example uses "sampleworks-pr236", which is tied to a specific PR number and may confuse users. Consider a more generic name like "my-sampleworks-pod" or just "sampleworks".
📝 Suggested improvement
-actl pod up sampleworks-pr236 --profile 8x --image sampleworks --storage shared --pvc-size 200Gi --mount diffuse-shared --yes
+actl pod up my-sampleworks-pod --profile 8x --image sampleworks --storage shared --pvc-size 200Gi --mount diffuse-shared --yes
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@README.md` at line 162, Replace the PR-specific pod name in the example
command (the string "actl pod up sampleworks-pr236 --profile 8x --image
sampleworks --storage shared --pvc-size 200Gi --mount diffuse-shared --yes")
with a generic name such as "my-sampleworks-pod" or simply "sampleworks" so the
README example is not tied to a specific PR number.
src/sampleworks/runs/loader.py (1)
93-102: ⚡ Quick win

Read TOML as UTF-8 via binary mode + tomllib.load()

tomllib.load() expects a binary file object (open(..., "rb")), and TOML files are UTF-8 by spec. Using Path.read_text() decodes with the locale encoding by default, which can cause UnicodeDecodeError or mojibake on non-UTF-8 systems when presets contain non-ASCII.
♻️ Suggested change
-    if path.suffix == ".toml" and path.exists():
-        return tomllib.loads(path.read_text())
+    if path.suffix == ".toml" and path.exists():
+        with path.open("rb") as f:
+            return tomllib.load(f)
     bundled = resources.files(_BUNDLED_PRESETS_PACKAGE) / f"{name_or_path}.toml"
     if not bundled.is_file():
         raise FileNotFoundError(
             f"No preset {name_or_path!r}. Bundled: {list_bundled_presets()}. "
             f"Or pass a path to a .toml file."
         )
-    return tomllib.loads(bundled.read_text())
+    with bundled.open("rb") as f:
+        return tomllib.load(f)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/sampleworks/runs/loader.py` around lines 93 - 102, The current loader
reads TOML via Path.read_text() and tomllib.loads which uses the locale codec
and can mis-decode non-ASCII; instead open the real file in binary and call
tomllib.load() on the file object for both local files (Path(name_or_path) when
path.suffix == ".toml") and bundled files (the resources.files(...) /
f"{name_or_path}.toml" object), ensuring you replace uses of Path.read_text()
and tomllib.loads with binary-open + tomllib.load(file_obj) while keeping the
same FileNotFoundError behavior and references to list_bundled_presets()
unchanged.
run_grid_search.py (1)
222-246: ⚡ Quick win

Worker-side cmd/env selection is correct, but duplicates the runner's env resolution.

The branch between env_python/pixi run and the env construction in get_pixi_env_process_env mirror runner._pixi_env_python and runner._job_env exactly. See the corresponding comment on src/sampleworks/runs/runner.py:115-143 — hoisting these into a shared helper would let run_guidance_queue_script and the new runner stay in sync as we evolve the activation contract (e.g. if we add LD_LIBRARY_PATH or PIXI_PROJECT_MANIFEST).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@run_grid_search.py` around lines 222 - 246, The env/command selection in
run_guidance_queue_script duplicates activation logic already in
runner._pixi_env_python and runner._job_env; extract a shared helper (e.g.,
create_pixi_process_cmd_env or similar) that encapsulates the logic using
get_pixi_env, get_pixi_env_python and get_pixi_env_process_env, then replace the
duplicated branch in run_guidance_queue_script to call this helper so both
run_guidance_queue_script and the runner use the same implementation and remain
in sync as activation requirements evolve.
run_all_models.sh (1)
78-87: 💤 Low value

runner_env is hardcoded to rf3 by default — confirm intent.

The runner module itself only orchestrates subprocess launches and never imports model code, so any of the three pixi envs would work. Defaulting to rf3 is fine on the ACTL image (it's always baked), but if a downstream consumer slims the image and ships only boltz, the wrapper would unexpectedly fall through to pixi run -e rf3 ... and fail. A brief comment or selecting any-available env would make the intent clearer.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@run_all_models.sh` around lines 78 - 87, The defaulting of runner_env to
"rf3" is brittle; update the logic in run_all_models.sh around runner_env,
SAMPLEWORKS_RUNNER_ENV, pixi_project_dir and runner_python so it either
documents the intent with a concise comment or dynamically picks an available
env instead of hardcoding "rf3" — e.g., if SAMPLEWORKS_RUNNER_ENV is unset, scan
$pixi_project_dir/.pixi/envs for available env names (or iterate a preferred
list like rf3, boltz, <other>) and set runner_env to the first existing one;
ensure runner_python continues to be constructed from the chosen runner_env and
add a one-line comment explaining why a specific env is selected or why any
available env is acceptable.
src/sampleworks/runs/runner.py (2)
115-143: ⚡ Quick win

_pixi_env_python is duplicated in run_grid_search.py (get_pixi_env_python).

This helper, along with _job_env/get_pixi_env_process_env, is byte-for-byte equivalent to the worker-side copy added in run_grid_search.py. Having two copies of the env resolution rules (including the SAMPLEWORKS_FORCE_PIXI opt-out and SAMPLEWORKS_<ENV>_PYTHON overrides) means they will drift. Consider hoisting both helpers into a shared module (e.g. sampleworks.runs.pixi_env or sampleworks.utils.pixi_env) and importing from both call sites.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/sampleworks/runs/runner.py` around lines 115 - 143, The helper
_pixi_env_python (and the related _job_env/get_pixi_env_process_env) is
duplicated in run_grid_search.py as get_pixi_env_python; extract these shared
functions into a new shared module (e.g. create sampleworks.runs.pixi_env or
sampleworks.utils.pixi_env) and move the logic for SAMPLEWORKS_FORCE_PIXI and
SAMPLEWORKS_<ENV>_PYTHON overrides there, then update both runner.py (use
_pixi_env_python import) and run_grid_search.py (use get_pixi_env_python import)
to import the centralized helpers so the env resolution rules are implemented in
one place.
230-252: 💤 Low value

Redundant build_invocations call.

build_invocations is invoked at line 232 and again at line 242 with identical arguments. The first result is only used to derive pixi_envs, then discarded. The rebuild is presumably defensive in case _prepare_pixi_env materializes new env binaries that change the result of _pixi_env_python, but this is implicit and wasteful (re-runs os.environ lookups and os.access probes per job). Consider either dropping the rebuild or explicitly commenting why it's needed.
♻️ Possible simplification
-    invocations = build_invocations(preset, results_dir=results_dir)
-
-    if dry_run:
+    if dry_run:
+        invocations = build_invocations(preset, results_dir=results_dir)
         for inv in invocations:
             _print_dry_run(inv)
         return 0

-    pixi_envs = sorted({inv.job.env for inv in invocations})
+    pixi_envs = sorted({job.env for job in preset.jobs})
     for pixi_env in pixi_envs:
         _prepare_pixi_env(pixi_env)
     invocations = build_invocations(preset, results_dir=results_dir)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/sampleworks/runs/runner.py` around lines 230 - 252, The code calls
build_invocations(preset, results_dir=results_dir) twice—first to compute
pixi_envs then again to get invocations—wasting work; either remove the second
call and reuse the original invocations for launching, or (if _prepare_pixi_env
can change invocation resolution) add an explicit comment and a clear
conditional rebuild: compute invocations once, derive pixi_envs from
invocations, call _prepare_pixi_env for each pixi_env, then if _prepare_pixi_env
may alter what build_invocations would return re-run build_invocations only when
a deterministic flag indicates envs were mutated (or compare before/after
state), otherwise reuse the initial invocations; reference functions
build_invocations, _prepare_pixi_env, and the pixi_envs variable when making the
change.
src/sampleworks/runs/cli.py (1)
101-133: 💤 Low value

_filter_only preserves preset order, not user-specified order.

keep = [j for j in preset.jobs if j.name in names] iterates preset.jobs, so the --only flag silently ignores the order given on the command line. This is fine for parallel runs (order is irrelevant), but worth noting in the docstring so users don't expect e.g. --only protenix,rf3 to influence log/launch ordering.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/sampleworks/runs/cli.py` around lines 101 - 133, The _filter_only
implementation currently builds keep = [j for j in preset.jobs if j.name in
names], so the returned Preset preserves preset.jobs order rather than the order
specified in the --only string; update the docstring of _filter_only to
explicitly state that the resulting Preset preserves the original preset job
ordering (preset.jobs) and does not reorder jobs to match the command-line
--only sequence (mention names and keep to make the connection), so users know
that --only filters but does not control launch/log order.
tests/runs/test_runner.py (4)
137-147: 💤 Low value

Dry-run assertion is one-sided — also assert the rf3 dir is the only thing missing.

The test only checks that results_dir/"rf3" doesn't exist. If a future change accidentally creates results_dir/<other_job> (e.g. when dry-run is requested for all_models), this test wouldn't catch it. Consider either:

Asserting results_dir.exists() and list(results_dir.iterdir()) == [] (no per-job subdirs created), or

Parametrising over multiple presets so any per-job mkdir leak is caught.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/runs/test_runner.py` around lines 137 - 147, The test
test_dry_run_does_not_create_directories only asserts that (results_dir / "rf3")
does not exist, which misses other unexpected per-job dirs; update the assertion
after calling runner.run(preset, results_dir=results_dir, dry_run=True) to
confirm results_dir exists (since run creates it for logs) and that it contains
no per-job subdirectories (e.g., assert results_dir.exists() and
list(results_dir.iterdir()) == []), or alternatively parametrize the test over
multiple presets so runner.run never creates any per-job subdirs during dry_run;
ensure you reference the existing variables/presets (results_dir, preset) and
the test function name test_dry_run_does_not_create_directories when making the
change.
42-92: 💤 Low value

Missing NumPy-style docstrings on several tests.

test_argv_omits_false_bool_flags (line 42), test_explicit_output_dir_in_args_wins_over_subdir_default (line 51), test_all_models_has_four_jobs_with_distinct_gpus (line 67), test_protenix_dual_uses_different_checkpoints (line 78), and test_rf3_partial_chiral_off_flag_present (line 87) all lack docstrings while neighbouring tests have them. A one-liner explaining the contract under test is sufficient.

As per coding guidelines: "Always include NumPy-style docstrings for every function and class."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/runs/test_runner.py` around lines 42 - 92, Several test functions
(test_argv_omits_false_bool_flags,
test_explicit_output_dir_in_args_wins_over_subdir_default,
test_all_models_has_four_jobs_with_distinct_gpus,
test_protenix_dual_uses_different_checkpoints,
test_rf3_partial_chiral_off_flag_present) are missing NumPy-style docstrings;
add a single-line NumPy-style docstring under each def that briefly describes
the contract being tested (e.g., what input/state and what behavior/assertion is
expected) so they match the project's docstring guideline and neighboring tests'
style.
150-163: 💤 Low value

Helper is correct but brittle on flags whose value starts with --.

The single-dash check (startswith("--")) means a future flag whose value happens to begin with -- (e.g. a passthrough string, or a path containing --) silently becomes a boolean. Not a current bug given the presets, but worth a comment so the next reader understands the precondition, or upgrade the assertion to raise on unexpected shapes.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/runs/test_runner.py` around lines 150 - 163, The helper _argv_to_dict
silently treats a value that begins with "--" as a boolean flag; change it to
explicitly fail instead and/or document the precondition: inside _argv_to_dict
(and around the loop using variables flag and tail), when you see that i+1 <
len(tail) and tail[i+1].startswith("--"), raise an AssertionError or ValueError
with a clear message like "unexpected token starting with '--' as value for
{flag}", or alternatively add a docstring comment above _argv_to_dict stating
the precondition that values must not begin with "--" so callers know this
limitation. Ensure the error message references the offending flag/token for
easier debugging.
42-92: ⚡ Quick win

Tests can flake based on shell env vars — clear loader inputs in a fixture.

Only test_argv_for_rf3_partial_matches_bash clears DATA_DIR/RESULTS_DIR, yet the other tests assert hard-coded paths the loader interpolates from environment (e.g. /data/inputs/proteins.csv at line 28 via shared default, /checkpoints/rf3_foundry_01_24_latest.ckpt at line 34, /extra_checkpoints/protenix_* at lines 83–84). If a developer or CI worker has any of DATA_DIR, RESULTS_DIR, CHECKPOINTS_DIR, EXTRA_CHECKPOINTS_DIR, SAMPLEWORKS_FORCE_PIXI, SAMPLEWORKS_PIXI_PROJECT_DIR, or SAMPLEWORKS_GRID_SEARCH_SCRIPT exported, these assertions break in a confusing way.

Centralise the cleanup in an autouse fixture (or in conftest.py) so every test starts from a known env.
♻️ Suggested fixture
+@pytest.fixture(autouse=True)
+def _isolate_runner_env(monkeypatch: pytest.MonkeyPatch) -> None:
+    """Clear env vars consumed by loader/runner so tests are hermetic."""
+    for var in (
+        "DATA_DIR",
+        "RESULTS_DIR",
+        "CHECKPOINTS_DIR",
+        "EXTRA_CHECKPOINTS_DIR",
+        "SAMPLEWORKS_FORCE_PIXI",
+        "SAMPLEWORKS_PIXI_PROJECT_DIR",
+        "SAMPLEWORKS_GRID_SEARCH_SCRIPT",
+    ):
+        monkeypatch.delenv(var, raising=False)
+    monkeypatch.setenv("HOME", "/home/test")
Then drop the per-test setenv("HOME", ...) and the ad-hoc delenv calls.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/runs/test_runner.py` around lines 42 - 92, Tests flake because
environment variables like DATA_DIR/RESULTS_DIR/CHECKPOINTS_DIR/etc. can be set
externally; add an autouse pytest fixture that clears or sets to known values
the problematic env vars before each test to ensure loader.load_preset and
runner.build_invocations see a deterministic environment. Implement a fixture
(in this test module or conftest.py) named e.g. clear_loader_env or
reset_loader_env with `@pytest.fixture`(autouse=True) that pops/overrides
DATA_DIR, RESULTS_DIR, CHECKPOINTS_DIR, EXTRA_CHECKPOINTS_DIR,
SAMPLEWORKS_FORCE_PIXI, SAMPLEWORKS_PIXI_PROJECT_DIR,
SAMPLEWORKS_GRID_SEARCH_SCRIPT (and any other loader interpolated vars) and
ensures HOME is set if needed; then remove the ad-hoc monkeypatch.setenv("HOME",
...) calls from the test functions (e.g. test_argv_omits_false_bool_flags,
test_all_models_has_four_jobs_with_distinct_gpus,
test_protenix_dual_uses_different_checkpoints,
test_rf3_partial_chiral_off_flag_present) so all tests consistently use the
fixture-controlled environment when calling loader.load_preset and
runner.build_invocations.
src/sampleworks/models/boltz/wrapper.py (2)
1034-1099: ⚡ Quick win

Update docstring to reflect changed default.

The num_workers parameter docstring states "by default 2" but the signature now defaults to 0. As per coding guidelines, NumPy-style docstrings should accurately document all parameters.
📝 Suggested docstring fix
         num_workers: int, optional
-            Number of parallel workers for input data processing, by default 2
+            Number of parallel workers for input data processing, by default 0
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/sampleworks/models/boltz/wrapper.py` around lines 1034 - 1099, The
docstring for _setup_data_module incorrectly states num_workers defaults to 2
while the function signature defaults it to 0; update the NumPy-style docstring
for the num_workers parameter to reflect the actual default of 0 (or remove the
explicit default text), and ensure the description matches the behavior used
when constructing BoltzInferenceDataModule (refer to parameter num_workers and
its usage in BoltzInferenceDataModule instantiation).
569-642: ⚡ Quick win

Update docstring to reflect changed default.

The num_workers parameter docstring states "by default 8" but the signature now defaults to 0. As per coding guidelines, NumPy-style docstrings should accurately document all parameters.
📝 Suggested docstring fix
         num_workers: int, optional
-            Number of parallel workers for input data processing, by default 8
+            Number of parallel workers for input data processing, by default 0
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/sampleworks/models/boltz/wrapper.py` around lines 569 - 642, The
docstring for _setup_data_module incorrectly states the default for num_workers
is 8 while the function signature defaults it to 0; update the NumPy-style
docstring for the num_workers parameter in _setup_data_module to reflect the
actual default (0) or remove the default mention so it matches the signature and
coding guidelines, making sure the parameter description aligns with the
function's behavior.
src/sampleworks/utils/guidance_script_arguments.py (1)
45-76: ⚡ Quick win

Update docstring to document environment variable precedence.

The docstring states "Tries baked-in Docker paths first" but the implementation now checks model-specific environment variables (e.g., BOLTZ1_CHECKPOINT) before falling back to baked-in candidates. As per coding guidelines, NumPy-style docstrings should accurately document function behavior.
📝 Suggested docstring update
 def _resolve_checkpoint(model_key: str) -> str:
-    """Return the first checkpoint path that exists on disk for *model_key*.
+    """Return the first checkpoint path that exists on disk for *model_key*.
 
-    Tries baked-in Docker paths first (``/checkpoints/``), then falls back to
-    legacy development paths.  If none are found the first candidate is returned
-    so that downstream validation produces a clear error message.
+    Checks model-specific environment variables (e.g., ``BOLTZ1_CHECKPOINT``)
+    first, then tries baked-in Docker paths (``/checkpoints/``), ACTL shared
+    storage, and finally legacy development paths. If none are found, raises
+    ValueError with the list of checked candidates and environment variable hint.
+
+    Parameters
+    ----------
+    model_key : str
+        Model identifier (e.g., "boltz1", "rf3", "protenix").
+
+    Returns
+    -------
+    str
+        Resolved checkpoint path.
+
+    Raises
+    ------
+    ValueError
+        If no checkpoint candidates exist or model_key is unknown.
     """
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/sampleworks/utils/guidance_script_arguments.py` around lines 45 - 76, The
docstring for _resolve_checkpoint is outdated: it claims baked-in Docker paths
are tried first but the function actually prefers a model-specific environment
variable (from _CHECKPOINT_ENV_VARS) before falling back to
_CHECKPOINT_CANDIDATES; update the NumPy-style docstring of _resolve_checkpoint
to state the actual precedence (check model-specific env var if set, then
candidate baked-in paths, then legacy/development paths) and note that if no
candidates exist an empty string triggers a ValueError; reference
_CHECKPOINT_ENV_VARS and _CHECKPOINT_CANDIDATES in the docstring to make the
precedence explicit.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@Dockerfile`:
- Line 111: The Dockerfile currently copies the checkpoints image using the
mutable tag in the COPY instruction (COPY
--from=harbor.astera.sh/library/sampleworks-checkpoints:latest /checkpoints/
/checkpoints/); replace the :latest tag with the image's immutable digest
(`@sha256`:...) to make builds deterministic. Obtain the digest for
harbor.astera.sh/library/sampleworks-checkpoints (e.g., via docker pull then
docker inspect --format='{{index .RepoDigests 0}}' or via the registry API /
manifest) and update the COPY --from reference to use that `@sha256` digest so the
build always uses the exact same artifact.

In `@src/sampleworks/runs/loader.py`:
- Around line 369-374: The _expand fixed-point loop can hang on cyclic variable
references (prev/current never stabilize); add a hard iteration cap (e.g.,
MAX_EXPAND_ITERATIONS = 1000) inside the _expand function and increment a
counter each loop iteration, breaking and raising a clear error (ValueError) if
the cap is exceeded; locate the loop using the symbols prev, current and the
call _VAR_PATTERN.sub(repl, current) and ensure the raised error message
explains a cyclic variable expansion was detected and suggests inspecting the
variable map/overrides.
- Around line 1-7: Update the module docstring to reflect the actual resolution
order used by load_preset: state that CLI --set overrides (e.g., --set
defaults.DATA_DIR=...) are applied before variable interpolation, and then
${VAR} references are resolved against the process environment with the preset's
[defaults] filling in any unset keys; reference load_preset as the function
enforcing this order so readers know where the behavior originates.

In `@src/sampleworks/runs/runner.py`:
- Around line 255-269: The _terminate_all function can hang because
proc.terminate() sends SIGTERM and proc.wait() is called without a timeout;
change _terminate_all (and handling of _RunningJob.proc and
_RunningJob.tee_thread) to wait with a bounded timeout per-process, catch
subprocess.TimeoutExpired, call proc.kill() if the terminate wait times out,
then wait again (with timeout) and proceed to join tee_thread also with a
timeout; ensure you import and handle subprocess.TimeoutExpired and avoid
infinite waits by applying these timeouts for both terminate and final wait
paths.
- Around line 384-406: The _spawn sequence creates log_file, starts Popen,
constructs the tee thread and then starts it, but the current try/except only
covers Popen so failures after Popen (e.g. Thread construction or the assert)
can leak log_file and leave the child process running without a reader; update
_spawn to widen the protection so that the try covers creation of log_file,
subprocess.Popen, thread creation and thread.start(), and on any exception
before the tee thread is running explicitly close log_file and terminate/kill
the proc (proc.kill() or proc.terminate() then wait) to avoid orphaned
processes, and replace the assert proc.stdout is not None with an explicit guard
that cleans up and raises a clear exception; keep references to _tee, inv, and
_RunningJob to locate where to implement these changes.

In `@tests/runs/conftest.py`:
- Around line 8-11: Update the autouse fixture force_pixi_argv to also
neutralize SAMPLEWORKS_GRID_SEARCH_SCRIPT and any SAMPLEWORKS_<ENV>_PYTHON
environment variables so tests are deterministic regardless of developer
environment; specifically, in force_pixi_argv use monkeypatch.delenv or
monkeypatch.setenv to clear SAMPLEWORKS_GRID_SEARCH_SCRIPT and iterate current
env keys that match the pattern r"^SAMPLEWORKS_.*_PYTHON$" (or explicitly list
known variants) and delete or unset them, leaving SAMPLEWORKS_FORCE_PIXI set to
"1" as before; this ensures runner._grid_search_script() and _pixi_env_python()
read a clean environment during tests.

---

Outside diff comments:
In `@Dockerfile`:
- Around line 107-108: Update the stale comment that says "Docker Hub" to
reference "Harbor" so the source note is accurate; edit the comment block
containing the line "# Bake in model checkpoints from pre-built base image on
Docker Hub" (in the Dockerfile) to say "Harbor" instead of "Docker Hub" to keep
documentation aligned with the actual image source.

---

Nitpick comments:
In `@README.md`:
- Line 162: Replace the PR-specific pod name in the example command (the string
"actl pod up sampleworks-pr236 --profile 8x --image sampleworks --storage shared
--pvc-size 200Gi --mount diffuse-shared --yes") with a generic name such as
"my-sampleworks-pod" or simply "sampleworks" so the README example is not tied
to a specific PR number.

In `@run_all_models.sh`:
- Around line 78-87: The defaulting of runner_env to "rf3" is brittle; update
the logic in run_all_models.sh around runner_env, SAMPLEWORKS_RUNNER_ENV,
pixi_project_dir and runner_python so it either documents the intent with a
concise comment or dynamically picks an available env instead of hardcoding
"rf3" — e.g., if SAMPLEWORKS_RUNNER_ENV is unset, scan
$pixi_project_dir/.pixi/envs for available env names (or iterate a preferred
list like rf3, boltz, <other>) and set runner_env to the first existing one;
ensure runner_python continues to be constructed from the chosen runner_env and
add a one-line comment explaining why a specific env is selected or why any
available env is acceptable.

In `@run_grid_search.py`:
- Around line 222-246: The env/command selection in run_guidance_queue_script
duplicates activation logic already in runner._pixi_env_python and
runner._job_env; extract a shared helper (e.g., create_pixi_process_cmd_env or
similar) that encapsulates the logic using get_pixi_env, get_pixi_env_python and
get_pixi_env_process_env, then replace the duplicated branch in
run_guidance_queue_script to call this helper so both run_guidance_queue_script
and the runner use the same implementation and remain in sync as activation
requirements evolve.

In `@src/sampleworks/models/boltz/wrapper.py`:
- Around line 1034-1099: The docstring for _setup_data_module incorrectly states
num_workers defaults to 2 while the function signature defaults it to 0; update
the NumPy-style docstring for the num_workers parameter to reflect the actual
default of 0 (or remove the explicit default text), and ensure the description
matches the behavior used when constructing BoltzInferenceDataModule (refer to
parameter num_workers and its usage in BoltzInferenceDataModule instantiation).
- Around line 569-642: The docstring for _setup_data_module incorrectly states
the default for num_workers is 8 while the function signature defaults it to 0;
update the NumPy-style docstring for the num_workers parameter in
_setup_data_module to reflect the actual default (0) or remove the default
mention so it matches the signature and coding guidelines, making sure the
parameter description aligns with the function's behavior.

In `@src/sampleworks/runs/cli.py`:
- Around line 101-133: The _filter_only implementation currently builds keep =
[j for j in preset.jobs if j.name in names], so the returned Preset preserves
preset.jobs order rather than the order specified in the --only string; update
the docstring of _filter_only to explicitly state that the resulting Preset
preserves the original preset job ordering (preset.jobs) and does not reorder
jobs to match the command-line --only sequence (mention names and keep to make
the connection), so users know that --only filters but does not control
launch/log order.

In `@src/sampleworks/runs/loader.py`:
- Around line 93-102: The current loader reads TOML via Path.read_text() and
tomllib.loads which uses the locale codec and can mis-decode non-ASCII; instead
open the real file in binary and call tomllib.load() on the file object for both
local files (Path(name_or_path) when path.suffix == ".toml") and bundled files
(the resources.files(...) / f"{name_or_path}.toml" object), ensuring you replace
uses of Path.read_text() and tomllib.loads with binary-open +
tomllib.load(file_obj) while keeping the same FileNotFoundError behavior and
references to list_bundled_presets() unchanged.

In `@src/sampleworks/runs/runner.py`:
- Around line 115-143: The helper _pixi_env_python (and the related
_job_env/get_pixi_env_process_env) is duplicated in run_grid_search.py as
get_pixi_env_python; extract these shared functions into a new shared module
(e.g. create sampleworks.runs.pixi_env or sampleworks.utils.pixi_env) and move
the logic for SAMPLEWORKS_FORCE_PIXI and SAMPLEWORKS_<ENV>_PYTHON overrides
there, then update both runner.py (use _pixi_env_python import) and
run_grid_search.py (use get_pixi_env_python import) to import the centralized
helpers so the env resolution rules are implemented in one place.
- Around line 230-252: The code calls build_invocations(preset,
results_dir=results_dir) twice—first to compute pixi_envs then again to get
invocations—wasting work; either remove the second call and reuse the original
invocations for launching, or (if _prepare_pixi_env can change invocation
resolution) add an explicit comment and a clear conditional rebuild: compute
invocations once, derive pixi_envs from invocations, call _prepare_pixi_env for
each pixi_env, then if _prepare_pixi_env may alter what build_invocations would
return re-run build_invocations only when a deterministic flag indicates envs
were mutated (or compare before/after state), otherwise reuse the initial
invocations; reference functions build_invocations, _prepare_pixi_env, and the
pixi_envs variable when making the change.

In `@src/sampleworks/utils/guidance_script_arguments.py`:
- Around line 45-76: The docstring for _resolve_checkpoint is outdated: it
claims baked-in Docker paths are tried first but the function actually prefers a
model-specific environment variable (from _CHECKPOINT_ENV_VARS) before falling
back to _CHECKPOINT_CANDIDATES; update the NumPy-style docstring of
_resolve_checkpoint to state the actual precedence (check model-specific env var
if set, then candidate baked-in paths, then legacy/development paths) and note
that if no candidates exist an empty string triggers a ValueError; reference
_CHECKPOINT_ENV_VARS and _CHECKPOINT_CANDIDATES in the docstring to make the
precedence explicit.

In `@tests/runs/test_runner.py`:
- Around line 137-147: The test test_dry_run_does_not_create_directories only
asserts that (results_dir / "rf3") does not exist, which misses other unexpected
per-job dirs; update the assertion after calling runner.run(preset,
results_dir=results_dir, dry_run=True) to confirm results_dir exists (since run
creates it for logs) and that it contains no per-job subdirectories (e.g.,
assert results_dir.exists() and list(results_dir.iterdir()) == []), or
alternatively parametrize the test over multiple presets so runner.run never
creates any per-job subdirs during dry_run; ensure you reference the existing
variables/presets (results_dir, preset) and the test function name
test_dry_run_does_not_create_directories when making the change.
- Around line 42-92: Several test functions (test_argv_omits_false_bool_flags,
test_explicit_output_dir_in_args_wins_over_subdir_default,
test_all_models_has_four_jobs_with_distinct_gpus,
test_protenix_dual_uses_different_checkpoints,
test_rf3_partial_chiral_off_flag_present) are missing NumPy-style docstrings;
add a single-line NumPy-style docstring under each def that briefly describes
the contract being tested (e.g., what input/state and what behavior/assertion is
expected) so they match the project's docstring guideline and neighboring tests'
style.
- Around line 150-163: The helper _argv_to_dict silently treats a value that
begins with "--" as a boolean flag; change it to explicitly fail instead and/or
document the precondition: inside _argv_to_dict (and around the loop using
variables flag and tail), when you see that i+1 < len(tail) and
tail[i+1].startswith("--"), raise an AssertionError or ValueError with a clear
message like "unexpected token starting with '--' as value for {flag}", or
alternatively add a docstring comment above _argv_to_dict stating the
precondition that values must not begin with "--" so callers know this
limitation. Ensure the error message references the offending flag/token for
easier debugging.
- Around line 42-92: Tests flake because environment variables like
DATA_DIR/RESULTS_DIR/CHECKPOINTS_DIR/etc. can be set externally; add an autouse
pytest fixture that clears or sets to known values the problematic env vars
before each test to ensure loader.load_preset and runner.build_invocations see a
deterministic environment. Implement a fixture (in this test module or
conftest.py) named e.g. clear_loader_env or reset_loader_env with
`@pytest.fixture`(autouse=True) that pops/overrides DATA_DIR, RESULTS_DIR,
CHECKPOINTS_DIR, EXTRA_CHECKPOINTS_DIR, SAMPLEWORKS_FORCE_PIXI,
SAMPLEWORKS_PIXI_PROJECT_DIR, SAMPLEWORKS_GRID_SEARCH_SCRIPT (and any other
loader interpolated vars) and ensures HOME is set if needed; then remove the
ad-hoc monkeypatch.setenv("HOME", ...) calls from the test functions (e.g.
test_argv_omits_false_bool_flags,
test_all_models_has_four_jobs_with_distinct_gpus,
test_protenix_dual_uses_different_checkpoints,
test_rf3_partial_chiral_off_flag_present) so all tests consistently use the
fixture-controlled environment when calling loader.load_preset and
runner.build_invocations.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c8ce1b8b-2a8d-4612-b59c-10c4b006cfd2

📥 Commits

Reviewing files that changed from the base of the PR and between 9fc7ca9 and 2da1e2d.

📒 Files selected for processing (25)

.actlignore
AGENTS.md
Dockerfile
README.md
pyproject.toml
run_all_models.sh
run_grid_search.py
src/sampleworks/models/boltz/wrapper.py
src/sampleworks/runs/__init__.py
src/sampleworks/runs/cli.py
src/sampleworks/runs/loader.py
src/sampleworks/runs/presets/all_models.toml
src/sampleworks/runs/presets/protenix_dual.toml
src/sampleworks/runs/presets/rf3_partial.toml
src/sampleworks/runs/presets/rf3_partial_chiral_off.toml
src/sampleworks/runs/presets/rf3_protenix.toml
src/sampleworks/runs/runner.py
src/sampleworks/runs/schema.py
src/sampleworks/utils/guidance_script_arguments.py
src/sampleworks/utils/guidance_script_utils.py
tests/runs/__init__.py
tests/runs/conftest.py
tests/runs/test_cli.py
tests/runs/test_loader.py
tests/runs/test_runner.py

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@run_experiments`:
- Around line 63-97: The script currently always rewrites manifest_proteins_csv
and sets PROTEINS_CSV before checking for dry-run flags; change the control flow
so the manifest rewrite (the block that reads source_proteins_csv, creates
manifest_dir, replaces legacy_data_dir and writes manifest_proteins_csv, and
exports PROTEINS_CSV) only runs when needs_runtime_paths=1 (i.e., after the
for‑arg loop that sets needs_runtime_paths). Concretely, move or guard the
entire manifest creation block with a conditional like: if [[
$needs_runtime_paths -eq 1 ]]; then ... fi (referencing source_proteins_csv,
manifest_proteins_csv, legacy_data_dir, PROTEINS_CSV and needs_runtime_paths) so
--dry-run/--show/--list do not perform filesystem writes.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: be4004e6-0e54-498a-8c97-7ffa7976036e

📥 Commits

Reviewing files that changed from the base of the PR and between 3d771dc and 6869dcb.

📒 Files selected for processing (7)

Dockerfile
GRID_SEARCH.md
README.md
run_all_models.sh
run_experiments
run_experiments.sh
tests/models/boltz/test_boltz_wrapper.py

✅ Files skipped from review due to trivial changes (2)

run_experiments.sh
GRID_SEARCH.md

marcuscollins

Thanks for the changes. Approved.

mag-astera and others added 9 commits May 18, 2026 18:38

style(runs): apply ruff format

c3f0d12

CI's lint job runs both `ruff check` and `ruff format --check`. The prior commit fixed only the former. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

fix(runs): make ACTL sampleworks image self-contained

2da1e2d

manzuoni-astera had a problem deploying to gpu-testing May 21, 2026 16:17 — with GitHub Actions Error

coderabbitai Bot reviewed May 21, 2026

View reviewed changes

Comment thread Dockerfile

Comment thread src/sampleworks/runs/loader.py

Comment thread src/sampleworks/runs/loader.py Outdated

Comment thread src/sampleworks/runs/runner.py Outdated

Comment thread src/sampleworks/runs/runner.py

Comment thread tests/runs/conftest.py

Merge branch 'main' into feat-runs-yaml-presets

3d771dc

xraymemory had a problem deploying to gpu-testing May 21, 2026 17:27 — with GitHub Actions Error

fix(runs): align experiment entrypoint and boltz tests

6869dcb

manzuoni-astera had a problem deploying to gpu-testing May 21, 2026 19:30 — with GitHub Actions Error

coderabbitai Bot reviewed May 21, 2026

View reviewed changes

Comment thread run_experiments Outdated

fix(runs): prefer synced workspace source

e56d1f1

Copilot AI review requested due to automatic review settings May 22, 2026 15:11

manzuoni-astera had a problem deploying to gpu-testing May 22, 2026 15:11 — with GitHub Actions Error

Copilot started reviewing on behalf of manzuoni-astera May 22, 2026 15:12 View session

feat(runs): add standalone model presets

e99e5f2

manzuoni-astera had a problem deploying to gpu-testing May 23, 2026 18:16 — with GitHub Actions Error

feat(runs): auto-assign preset GPUs

d86a299

manzuoni-astera had a problem deploying to gpu-testing May 23, 2026 18:34 — with GitHub Actions Error

manzuoni-astera had a problem deploying to gpu-testing May 23, 2026 18:53 — with GitHub Actions Error

manzuoni-astera force-pushed the feat-runs-yaml-presets branch from fd644fd to d86a299 Compare May 23, 2026 18:58

manzuoni-astera had a problem deploying to gpu-testing May 23, 2026 18:58 — with GitHub Actions Error

fix(runs): clear GPU override counterpart

b508c51

manzuoni-astera had a problem deploying to gpu-testing May 23, 2026 21:25 — with GitHub Actions Error

ci(docker): tag Harbor sampleworks image

797600c

manzuoni-astera had a problem deploying to gpu-testing May 25, 2026 18:26 — with GitHub Actions Error

marcuscollins requested changes May 25, 2026

View reviewed changes

fix(runs): address ACTL review feedback

866c4da

manzuoni-astera had a problem deploying to gpu-testing May 25, 2026 23:18 — with GitHub Actions Error

docs(runs): document runtime pixi opt-in

e1f8341

marcuscollins approved these changes May 26, 2026

View reviewed changes

Conversation

manzuoni-astera commented May 21, 2026 • edited by xraymemory Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What changed

Using run_experiments

Uh oh!

coderabbitai Bot commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

manzuoni-astera commented May 21, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

marcuscollins left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

manzuoni-astera commented May 21, 2026 •

edited by xraymemory

Loading

Using `run_experiments`

coderabbitai Bot commented May 21, 2026 •

edited

Loading