ACTL sampleworks image + run_experiments script#240
Conversation
Adds a new `sampleworks-runs` CLI under `src/sampleworks/runs/` that launches parallel `run_grid_search.py` jobs from a single TOML preset. Five bundled presets (`all_models`, `rf3_partial`, `rf3_partial_chiral_off`, `protenix_dual`, `rf3_protenix`) cover the canonical multi-model sweeps. Each preset declares its jobs (pixi env, GPU assignment, args); the runner sets `CUDA_VISIBLE_DEVICES`, shells out via `pixi run -e <env>`, tees per-job logs, and aggregates exit codes. Dotted-path `--set` overrides let users sweep parameters without editing TOML, e.g.: sampleworks-runs rf3_partial --set jobs.rf3.args.gradient-weights="0.0 0.01" Pure stdlib (tomllib + dataclasses + argparse + subprocess), no new deps. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Ruff auto-fixed import ordering (I001) and unused imports (F401); two
remaining were a leftover unused local `rf3 = preset.job("rf3")` (F841)
and one over-100-char assertion (E501) extracted into a local.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI's lint job runs both `ruff check` and `ruff format --check`. The prior commit fixed only the former. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- cli.py: `--only` filter now preserves `shared_args`; previously the filtered Preset dropped them, silently losing every job's shared flags. - loader.py: `--set` now rejects unknown top-level keys. A typo like `--set job.rf3.gpus=0` (note missing 's') used to auto-create an unused `job` dict and silently no-op; now it raises KeyError with the valid keys listed. - runner.py: handle partial spawn failures. If one job fails to spawn midway, already-launched jobs are terminated and joined instead of orphaned. Also wraps Popen in try/close to avoid log-file handle leak if the subprocess fails to start. - presets/rf3_partial_chiral_off.toml: change `output_subdir = "."` (write to RESULTS_DIR root) to `"rf3"` (subdir under RESULTS_DIR) for consistency with the other RF3 presets and collision safety when RESULTS_DIR is overridden to a shared location. Skipped (faithful to bash original): rf3_protenix.toml asymmetric partial-diffusion-step — the source `run_rf3_protenix_mdc_actl.sh` deliberately set it only for the Protenix job, not RF3. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Aligns the new sampleworks.runs module with project style policy (AGENTS.md L7 and L338): - Add NumPy-style docstrings (summary, Parameters, Returns, Raises) to every public and private function/class across schema.py, loader.py, runner.py, and cli.py. - Mark Job, Preset, JobInvocation, and _RunningJob as @DataClass(frozen=True). Behavior is preserved: Preset.effective_args already returns a fresh dict, and the runner mutates that local copy rather than the Preset itself. All 32 unit tests still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Match the actual mounted layout on the ACTL pod:
/mnt/diffuse-shared/raw/sampleworks/
├── initial_dataset_40/ # DATA_DIR for all_models
├── initial_dataset_40_occ_sweeps/ # DATA_DIR for occ-sweep presets
├── actl_msa_cache/ # MSA_CACHE_DIR for all presets
└── actl_results/<preset_name>/ # RESULTS_DIR namespaced per preset
Per-preset RESULTS_DIR avoids cross-preset collisions when multiple
presets share output_subdirs (e.g. all_models and rf3_partial both have
a job named "rf3").
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The proteins.csv files in the shared dataset reference /data/inputs/ paths that only resolve after the pod-init script (actl_setup_sampleworks_paths.sh) creates the canonical symlinks. Point presets at those canonical paths so everything stays consistent: DATA_DIR /data/input | /data/inputs RESULTS_DIR /data/results/<preset_name> MSA_CACHE_DIR /root/.sampleworks /data/results is namespaced by the pod-init script via $SAMPLEWORKS_ACTL_RUN_NAME (defaults to $HOSTNAME), so per-preset subdirs sit inside a per-session root. README documents the one-time setup script step. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
run_grid_search.py writes its work-queue pickle (wjq_*.pkl) directly into --output-dir without creating it first, so the orchestrator must. Previously only results_dir itself was mkdir'd, which broke any preset whose output_subdir didn't already exist on disk. JobInvocation now carries the resolved output_dir explicitly so _spawn can mkdir it alongside the log directory. --dry-run still doesn't touch the filesystem. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a TOML-driven preset system with a CLI and runner to launch parallel multi-job grid-searches on ACTL, bundled presets, shell wrappers delegating to the Python runner, GPU/checkpoint validation and env handling, Boltz I/O defaults, Docker/packaging updates, docs, and comprehensive tests. ChangesTOML Preset Orchestration System
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
CI is currently blocked waiting for |
There was a problem hiding this comment.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
Dockerfile (1)
107-108:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winUpdate stale source comment to match Harbor.
Line 107 says Docker Hub, but the actual source is Harbor. Keeping this aligned avoids operational confusion during image maintenance.
Suggested change
-# Bake in model checkpoints from pre-built base image on Docker Hub +# Bake in model checkpoints from pre-built Harbor image🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Dockerfile` around lines 107 - 108, Update the stale comment that says "Docker Hub" to reference "Harbor" so the source note is accurate; edit the comment block containing the line "# Bake in model checkpoints from pre-built base image on Docker Hub" (in the Dockerfile) to say "Harbor" instead of "Docker Hub" to keep documentation aligned with the actual image source.
🧹 Nitpick comments (14)
README.md (1)
162-162: ⚡ Quick winUse a generic pod name in the example.
The example uses "sampleworks-pr236", which is tied to a specific PR number and may confuse users. Consider a more generic name like "my-sampleworks-pod" or just "sampleworks".
📝 Suggested improvement
-actl pod up sampleworks-pr236 --profile 8x --image sampleworks --storage shared --pvc-size 200Gi --mount diffuse-shared --yes +actl pod up my-sampleworks-pod --profile 8x --image sampleworks --storage shared --pvc-size 200Gi --mount diffuse-shared --yes🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@README.md` at line 162, Replace the PR-specific pod name in the example command (the string "actl pod up sampleworks-pr236 --profile 8x --image sampleworks --storage shared --pvc-size 200Gi --mount diffuse-shared --yes") with a generic name such as "my-sampleworks-pod" or simply "sampleworks" so the README example is not tied to a specific PR number.src/sampleworks/runs/loader.py (1)
93-102: ⚡ Quick winRead TOML as UTF-8 via binary mode +
tomllib.load()
tomllib.load()expects a binary file object (open(..., "rb")), and TOML files are UTF-8 by spec. UsingPath.read_text()decodes with the locale encoding by default, which can causeUnicodeDecodeErroror mojibake on non-UTF-8 systems when presets contain non-ASCII.♻️ Suggested change
- if path.suffix == ".toml" and path.exists(): - return tomllib.loads(path.read_text()) + if path.suffix == ".toml" and path.exists(): + with path.open("rb") as f: + return tomllib.load(f) bundled = resources.files(_BUNDLED_PRESETS_PACKAGE) / f"{name_or_path}.toml" if not bundled.is_file(): raise FileNotFoundError( f"No preset {name_or_path!r}. Bundled: {list_bundled_presets()}. " f"Or pass a path to a .toml file." ) - return tomllib.loads(bundled.read_text()) + with bundled.open("rb") as f: + return tomllib.load(f)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/sampleworks/runs/loader.py` around lines 93 - 102, The current loader reads TOML via Path.read_text() and tomllib.loads which uses the locale codec and can mis-decode non-ASCII; instead open the real file in binary and call tomllib.load() on the file object for both local files (Path(name_or_path) when path.suffix == ".toml") and bundled files (the resources.files(...) / f"{name_or_path}.toml" object), ensuring you replace uses of Path.read_text() and tomllib.loads with binary-open + tomllib.load(file_obj) while keeping the same FileNotFoundError behavior and references to list_bundled_presets() unchanged.run_grid_search.py (1)
222-246: ⚡ Quick winWorker-side
cmd/envselection is correct, but duplicates the runner's env resolution.The branch between
env_python/pixi runand the env construction inget_pixi_env_process_envmirrorrunner._pixi_env_pythonandrunner._job_envexactly. See the corresponding comment onsrc/sampleworks/runs/runner.py:115-143— hoisting these into a shared helper would letrun_guidance_queue_scriptand the new runner stay in sync as we evolve the activation contract (e.g. if we addLD_LIBRARY_PATHorPIXI_PROJECT_MANIFEST).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@run_grid_search.py` around lines 222 - 246, The env/command selection in run_guidance_queue_script duplicates activation logic already in runner._pixi_env_python and runner._job_env; extract a shared helper (e.g., create_pixi_process_cmd_env or similar) that encapsulates the logic using get_pixi_env, get_pixi_env_python and get_pixi_env_process_env, then replace the duplicated branch in run_guidance_queue_script to call this helper so both run_guidance_queue_script and the runner use the same implementation and remain in sync as activation requirements evolve.run_all_models.sh (1)
78-87: 💤 Low value
runner_envis hardcoded torf3by default — confirm intent.The runner module itself only orchestrates subprocess launches and never imports model code, so any of the three pixi envs would work. Defaulting to
rf3is fine on the ACTL image (it's always baked), but if a downstream consumer slims the image and ships onlyboltz, the wrapper would unexpectedly fall through topixi run -e rf3 ...and fail. A brief comment or selecting any-available env would make the intent clearer.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@run_all_models.sh` around lines 78 - 87, The defaulting of runner_env to "rf3" is brittle; update the logic in run_all_models.sh around runner_env, SAMPLEWORKS_RUNNER_ENV, pixi_project_dir and runner_python so it either documents the intent with a concise comment or dynamically picks an available env instead of hardcoding "rf3" — e.g., if SAMPLEWORKS_RUNNER_ENV is unset, scan $pixi_project_dir/.pixi/envs for available env names (or iterate a preferred list like rf3, boltz, <other>) and set runner_env to the first existing one; ensure runner_python continues to be constructed from the chosen runner_env and add a one-line comment explaining why a specific env is selected or why any available env is acceptable.src/sampleworks/runs/runner.py (2)
115-143: ⚡ Quick win
_pixi_env_pythonis duplicated inrun_grid_search.py(get_pixi_env_python).This helper, along with
_job_env/get_pixi_env_process_env, is byte-for-byte equivalent to the worker-side copy added inrun_grid_search.py. Having two copies of the env resolution rules (including theSAMPLEWORKS_FORCE_PIXIopt-out andSAMPLEWORKS_<ENV>_PYTHONoverrides) means they will drift. Consider hoisting both helpers into a shared module (e.g.sampleworks.runs.pixi_envorsampleworks.utils.pixi_env) and importing from both call sites.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/sampleworks/runs/runner.py` around lines 115 - 143, The helper _pixi_env_python (and the related _job_env/get_pixi_env_process_env) is duplicated in run_grid_search.py as get_pixi_env_python; extract these shared functions into a new shared module (e.g. create sampleworks.runs.pixi_env or sampleworks.utils.pixi_env) and move the logic for SAMPLEWORKS_FORCE_PIXI and SAMPLEWORKS_<ENV>_PYTHON overrides there, then update both runner.py (use _pixi_env_python import) and run_grid_search.py (use get_pixi_env_python import) to import the centralized helpers so the env resolution rules are implemented in one place.
230-252: 💤 Low valueRedundant
build_invocationscall.
build_invocationsis invoked at line 232 and again at line 242 with identical arguments. The first result is only used to derivepixi_envs, then discarded. The rebuild is presumably defensive in case_prepare_pixi_envmaterializes new env binaries that change the result of_pixi_env_python, but this is implicit and wasteful (re-runsos.environlookups andos.accessprobes per job). Consider either dropping the rebuild or explicitly commenting why it's needed.♻️ Possible simplification
- invocations = build_invocations(preset, results_dir=results_dir) - - if dry_run: + if dry_run: + invocations = build_invocations(preset, results_dir=results_dir) for inv in invocations: _print_dry_run(inv) return 0 - pixi_envs = sorted({inv.job.env for inv in invocations}) + pixi_envs = sorted({job.env for job in preset.jobs}) for pixi_env in pixi_envs: _prepare_pixi_env(pixi_env) invocations = build_invocations(preset, results_dir=results_dir)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/sampleworks/runs/runner.py` around lines 230 - 252, The code calls build_invocations(preset, results_dir=results_dir) twice—first to compute pixi_envs then again to get invocations—wasting work; either remove the second call and reuse the original invocations for launching, or (if _prepare_pixi_env can change invocation resolution) add an explicit comment and a clear conditional rebuild: compute invocations once, derive pixi_envs from invocations, call _prepare_pixi_env for each pixi_env, then if _prepare_pixi_env may alter what build_invocations would return re-run build_invocations only when a deterministic flag indicates envs were mutated (or compare before/after state), otherwise reuse the initial invocations; reference functions build_invocations, _prepare_pixi_env, and the pixi_envs variable when making the change.src/sampleworks/runs/cli.py (1)
101-133: 💤 Low value
_filter_onlypreserves preset order, not user-specified order.
keep = [j for j in preset.jobs if j.name in names]iteratespreset.jobs, so the--onlyflag silently ignores the order given on the command line. This is fine for parallel runs (order is irrelevant), but worth noting in the docstring so users don't expect e.g.--only protenix,rf3to influence log/launch ordering.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/sampleworks/runs/cli.py` around lines 101 - 133, The _filter_only implementation currently builds keep = [j for j in preset.jobs if j.name in names], so the returned Preset preserves preset.jobs order rather than the order specified in the --only string; update the docstring of _filter_only to explicitly state that the resulting Preset preserves the original preset job ordering (preset.jobs) and does not reorder jobs to match the command-line --only sequence (mention names and keep to make the connection), so users know that --only filters but does not control launch/log order.tests/runs/test_runner.py (4)
137-147: 💤 Low valueDry-run assertion is one-sided — also assert the rf3 dir is the only thing missing.
The test only checks that
results_dir/"rf3"doesn't exist. If a future change accidentally createsresults_dir/<other_job>(e.g. when dry-run is requested forall_models), this test wouldn't catch it. Consider either:
- Asserting
results_dir.exists()andlist(results_dir.iterdir()) == [](no per-job subdirs created), or- Parametrising over multiple presets so any per-job mkdir leak is caught.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/runs/test_runner.py` around lines 137 - 147, The test test_dry_run_does_not_create_directories only asserts that (results_dir / "rf3") does not exist, which misses other unexpected per-job dirs; update the assertion after calling runner.run(preset, results_dir=results_dir, dry_run=True) to confirm results_dir exists (since run creates it for logs) and that it contains no per-job subdirectories (e.g., assert results_dir.exists() and list(results_dir.iterdir()) == []), or alternatively parametrize the test over multiple presets so runner.run never creates any per-job subdirs during dry_run; ensure you reference the existing variables/presets (results_dir, preset) and the test function name test_dry_run_does_not_create_directories when making the change.
42-92: 💤 Low valueMissing NumPy-style docstrings on several tests.
test_argv_omits_false_bool_flags(line 42),test_explicit_output_dir_in_args_wins_over_subdir_default(line 51),test_all_models_has_four_jobs_with_distinct_gpus(line 67),test_protenix_dual_uses_different_checkpoints(line 78), andtest_rf3_partial_chiral_off_flag_present(line 87) all lack docstrings while neighbouring tests have them. A one-liner explaining the contract under test is sufficient.As per coding guidelines: "Always include NumPy-style docstrings for every function and class."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/runs/test_runner.py` around lines 42 - 92, Several test functions (test_argv_omits_false_bool_flags, test_explicit_output_dir_in_args_wins_over_subdir_default, test_all_models_has_four_jobs_with_distinct_gpus, test_protenix_dual_uses_different_checkpoints, test_rf3_partial_chiral_off_flag_present) are missing NumPy-style docstrings; add a single-line NumPy-style docstring under each def that briefly describes the contract being tested (e.g., what input/state and what behavior/assertion is expected) so they match the project's docstring guideline and neighboring tests' style.
150-163: 💤 Low valueHelper is correct but brittle on flags whose value starts with
--.The single-dash check (
startswith("--")) means a future flag whose value happens to begin with--(e.g. a passthrough string, or a path containing--) silently becomes a boolean. Not a current bug given the presets, but worth a comment so the next reader understands the precondition, or upgrade the assertion to raise on unexpected shapes.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/runs/test_runner.py` around lines 150 - 163, The helper _argv_to_dict silently treats a value that begins with "--" as a boolean flag; change it to explicitly fail instead and/or document the precondition: inside _argv_to_dict (and around the loop using variables flag and tail), when you see that i+1 < len(tail) and tail[i+1].startswith("--"), raise an AssertionError or ValueError with a clear message like "unexpected token starting with '--' as value for {flag}", or alternatively add a docstring comment above _argv_to_dict stating the precondition that values must not begin with "--" so callers know this limitation. Ensure the error message references the offending flag/token for easier debugging.
42-92: ⚡ Quick winTests can flake based on shell env vars — clear loader inputs in a fixture.
Only
test_argv_for_rf3_partial_matches_bashclearsDATA_DIR/RESULTS_DIR, yet the other tests assert hard-coded paths the loader interpolates from environment (e.g./data/inputs/proteins.csvat line 28 via shared default,/checkpoints/rf3_foundry_01_24_latest.ckptat line 34,/extra_checkpoints/protenix_*at lines 83–84). If a developer or CI worker has any ofDATA_DIR,RESULTS_DIR,CHECKPOINTS_DIR,EXTRA_CHECKPOINTS_DIR,SAMPLEWORKS_FORCE_PIXI,SAMPLEWORKS_PIXI_PROJECT_DIR, orSAMPLEWORKS_GRID_SEARCH_SCRIPTexported, these assertions break in a confusing way.Centralise the cleanup in an autouse fixture (or in
conftest.py) so every test starts from a known env.♻️ Suggested fixture
+@pytest.fixture(autouse=True) +def _isolate_runner_env(monkeypatch: pytest.MonkeyPatch) -> None: + """Clear env vars consumed by loader/runner so tests are hermetic.""" + for var in ( + "DATA_DIR", + "RESULTS_DIR", + "CHECKPOINTS_DIR", + "EXTRA_CHECKPOINTS_DIR", + "SAMPLEWORKS_FORCE_PIXI", + "SAMPLEWORKS_PIXI_PROJECT_DIR", + "SAMPLEWORKS_GRID_SEARCH_SCRIPT", + ): + monkeypatch.delenv(var, raising=False) + monkeypatch.setenv("HOME", "/home/test")Then drop the per-test
setenv("HOME", ...)and the ad-hocdelenvcalls.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/runs/test_runner.py` around lines 42 - 92, Tests flake because environment variables like DATA_DIR/RESULTS_DIR/CHECKPOINTS_DIR/etc. can be set externally; add an autouse pytest fixture that clears or sets to known values the problematic env vars before each test to ensure loader.load_preset and runner.build_invocations see a deterministic environment. Implement a fixture (in this test module or conftest.py) named e.g. clear_loader_env or reset_loader_env with `@pytest.fixture`(autouse=True) that pops/overrides DATA_DIR, RESULTS_DIR, CHECKPOINTS_DIR, EXTRA_CHECKPOINTS_DIR, SAMPLEWORKS_FORCE_PIXI, SAMPLEWORKS_PIXI_PROJECT_DIR, SAMPLEWORKS_GRID_SEARCH_SCRIPT (and any other loader interpolated vars) and ensures HOME is set if needed; then remove the ad-hoc monkeypatch.setenv("HOME", ...) calls from the test functions (e.g. test_argv_omits_false_bool_flags, test_all_models_has_four_jobs_with_distinct_gpus, test_protenix_dual_uses_different_checkpoints, test_rf3_partial_chiral_off_flag_present) so all tests consistently use the fixture-controlled environment when calling loader.load_preset and runner.build_invocations.src/sampleworks/models/boltz/wrapper.py (2)
1034-1099: ⚡ Quick winUpdate docstring to reflect changed default.
The
num_workersparameter docstring states "by default 2" but the signature now defaults to0. As per coding guidelines, NumPy-style docstrings should accurately document all parameters.📝 Suggested docstring fix
num_workers: int, optional - Number of parallel workers for input data processing, by default 2 + Number of parallel workers for input data processing, by default 0🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/sampleworks/models/boltz/wrapper.py` around lines 1034 - 1099, The docstring for _setup_data_module incorrectly states num_workers defaults to 2 while the function signature defaults it to 0; update the NumPy-style docstring for the num_workers parameter to reflect the actual default of 0 (or remove the explicit default text), and ensure the description matches the behavior used when constructing BoltzInferenceDataModule (refer to parameter num_workers and its usage in BoltzInferenceDataModule instantiation).
569-642: ⚡ Quick winUpdate docstring to reflect changed default.
The
num_workersparameter docstring states "by default 8" but the signature now defaults to0. As per coding guidelines, NumPy-style docstrings should accurately document all parameters.📝 Suggested docstring fix
num_workers: int, optional - Number of parallel workers for input data processing, by default 8 + Number of parallel workers for input data processing, by default 0🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/sampleworks/models/boltz/wrapper.py` around lines 569 - 642, The docstring for _setup_data_module incorrectly states the default for num_workers is 8 while the function signature defaults it to 0; update the NumPy-style docstring for the num_workers parameter in _setup_data_module to reflect the actual default (0) or remove the default mention so it matches the signature and coding guidelines, making sure the parameter description aligns with the function's behavior.src/sampleworks/utils/guidance_script_arguments.py (1)
45-76: ⚡ Quick winUpdate docstring to document environment variable precedence.
The docstring states "Tries baked-in Docker paths first" but the implementation now checks model-specific environment variables (e.g.,
BOLTZ1_CHECKPOINT) before falling back to baked-in candidates. As per coding guidelines, NumPy-style docstrings should accurately document function behavior.📝 Suggested docstring update
def _resolve_checkpoint(model_key: str) -> str: - """Return the first checkpoint path that exists on disk for *model_key*. + """Return the first checkpoint path that exists on disk for *model_key*. - Tries baked-in Docker paths first (``/checkpoints/``), then falls back to - legacy development paths. If none are found the first candidate is returned - so that downstream validation produces a clear error message. + Checks model-specific environment variables (e.g., ``BOLTZ1_CHECKPOINT``) + first, then tries baked-in Docker paths (``/checkpoints/``), ACTL shared + storage, and finally legacy development paths. If none are found, raises + ValueError with the list of checked candidates and environment variable hint. + + Parameters + ---------- + model_key : str + Model identifier (e.g., "boltz1", "rf3", "protenix"). + + Returns + ------- + str + Resolved checkpoint path. + + Raises + ------ + ValueError + If no checkpoint candidates exist or model_key is unknown. """🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/sampleworks/utils/guidance_script_arguments.py` around lines 45 - 76, The docstring for _resolve_checkpoint is outdated: it claims baked-in Docker paths are tried first but the function actually prefers a model-specific environment variable (from _CHECKPOINT_ENV_VARS) before falling back to _CHECKPOINT_CANDIDATES; update the NumPy-style docstring of _resolve_checkpoint to state the actual precedence (check model-specific env var if set, then candidate baked-in paths, then legacy/development paths) and note that if no candidates exist an empty string triggers a ValueError; reference _CHECKPOINT_ENV_VARS and _CHECKPOINT_CANDIDATES in the docstring to make the precedence explicit.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@Dockerfile`:
- Line 111: The Dockerfile currently copies the checkpoints image using the
mutable tag in the COPY instruction (COPY
--from=harbor.astera.sh/library/sampleworks-checkpoints:latest /checkpoints/
/checkpoints/); replace the :latest tag with the image's immutable digest
(`@sha256`:...) to make builds deterministic. Obtain the digest for
harbor.astera.sh/library/sampleworks-checkpoints (e.g., via docker pull then
docker inspect --format='{{index .RepoDigests 0}}' or via the registry API /
manifest) and update the COPY --from reference to use that `@sha256` digest so the
build always uses the exact same artifact.
In `@src/sampleworks/runs/loader.py`:
- Around line 369-374: The _expand fixed-point loop can hang on cyclic variable
references (prev/current never stabilize); add a hard iteration cap (e.g.,
MAX_EXPAND_ITERATIONS = 1000) inside the _expand function and increment a
counter each loop iteration, breaking and raising a clear error (ValueError) if
the cap is exceeded; locate the loop using the symbols prev, current and the
call _VAR_PATTERN.sub(repl, current) and ensure the raised error message
explains a cyclic variable expansion was detected and suggests inspecting the
variable map/overrides.
- Around line 1-7: Update the module docstring to reflect the actual resolution
order used by load_preset: state that CLI --set overrides (e.g., --set
defaults.DATA_DIR=...) are applied before variable interpolation, and then
${VAR} references are resolved against the process environment with the preset's
[defaults] filling in any unset keys; reference load_preset as the function
enforcing this order so readers know where the behavior originates.
In `@src/sampleworks/runs/runner.py`:
- Around line 255-269: The _terminate_all function can hang because
proc.terminate() sends SIGTERM and proc.wait() is called without a timeout;
change _terminate_all (and handling of _RunningJob.proc and
_RunningJob.tee_thread) to wait with a bounded timeout per-process, catch
subprocess.TimeoutExpired, call proc.kill() if the terminate wait times out,
then wait again (with timeout) and proceed to join tee_thread also with a
timeout; ensure you import and handle subprocess.TimeoutExpired and avoid
infinite waits by applying these timeouts for both terminate and final wait
paths.
- Around line 384-406: The _spawn sequence creates log_file, starts Popen,
constructs the tee thread and then starts it, but the current try/except only
covers Popen so failures after Popen (e.g. Thread construction or the assert)
can leak log_file and leave the child process running without a reader; update
_spawn to widen the protection so that the try covers creation of log_file,
subprocess.Popen, thread creation and thread.start(), and on any exception
before the tee thread is running explicitly close log_file and terminate/kill
the proc (proc.kill() or proc.terminate() then wait) to avoid orphaned
processes, and replace the assert proc.stdout is not None with an explicit guard
that cleans up and raises a clear exception; keep references to _tee, inv, and
_RunningJob to locate where to implement these changes.
In `@tests/runs/conftest.py`:
- Around line 8-11: Update the autouse fixture force_pixi_argv to also
neutralize SAMPLEWORKS_GRID_SEARCH_SCRIPT and any SAMPLEWORKS_<ENV>_PYTHON
environment variables so tests are deterministic regardless of developer
environment; specifically, in force_pixi_argv use monkeypatch.delenv or
monkeypatch.setenv to clear SAMPLEWORKS_GRID_SEARCH_SCRIPT and iterate current
env keys that match the pattern r"^SAMPLEWORKS_.*_PYTHON$" (or explicitly list
known variants) and delete or unset them, leaving SAMPLEWORKS_FORCE_PIXI set to
"1" as before; this ensures runner._grid_search_script() and _pixi_env_python()
read a clean environment during tests.
---
Outside diff comments:
In `@Dockerfile`:
- Around line 107-108: Update the stale comment that says "Docker Hub" to
reference "Harbor" so the source note is accurate; edit the comment block
containing the line "# Bake in model checkpoints from pre-built base image on
Docker Hub" (in the Dockerfile) to say "Harbor" instead of "Docker Hub" to keep
documentation aligned with the actual image source.
---
Nitpick comments:
In `@README.md`:
- Line 162: Replace the PR-specific pod name in the example command (the string
"actl pod up sampleworks-pr236 --profile 8x --image sampleworks --storage shared
--pvc-size 200Gi --mount diffuse-shared --yes") with a generic name such as
"my-sampleworks-pod" or simply "sampleworks" so the README example is not tied
to a specific PR number.
In `@run_all_models.sh`:
- Around line 78-87: The defaulting of runner_env to "rf3" is brittle; update
the logic in run_all_models.sh around runner_env, SAMPLEWORKS_RUNNER_ENV,
pixi_project_dir and runner_python so it either documents the intent with a
concise comment or dynamically picks an available env instead of hardcoding
"rf3" — e.g., if SAMPLEWORKS_RUNNER_ENV is unset, scan
$pixi_project_dir/.pixi/envs for available env names (or iterate a preferred
list like rf3, boltz, <other>) and set runner_env to the first existing one;
ensure runner_python continues to be constructed from the chosen runner_env and
add a one-line comment explaining why a specific env is selected or why any
available env is acceptable.
In `@run_grid_search.py`:
- Around line 222-246: The env/command selection in run_guidance_queue_script
duplicates activation logic already in runner._pixi_env_python and
runner._job_env; extract a shared helper (e.g., create_pixi_process_cmd_env or
similar) that encapsulates the logic using get_pixi_env, get_pixi_env_python and
get_pixi_env_process_env, then replace the duplicated branch in
run_guidance_queue_script to call this helper so both run_guidance_queue_script
and the runner use the same implementation and remain in sync as activation
requirements evolve.
In `@src/sampleworks/models/boltz/wrapper.py`:
- Around line 1034-1099: The docstring for _setup_data_module incorrectly states
num_workers defaults to 2 while the function signature defaults it to 0; update
the NumPy-style docstring for the num_workers parameter to reflect the actual
default of 0 (or remove the explicit default text), and ensure the description
matches the behavior used when constructing BoltzInferenceDataModule (refer to
parameter num_workers and its usage in BoltzInferenceDataModule instantiation).
- Around line 569-642: The docstring for _setup_data_module incorrectly states
the default for num_workers is 8 while the function signature defaults it to 0;
update the NumPy-style docstring for the num_workers parameter in
_setup_data_module to reflect the actual default (0) or remove the default
mention so it matches the signature and coding guidelines, making sure the
parameter description aligns with the function's behavior.
In `@src/sampleworks/runs/cli.py`:
- Around line 101-133: The _filter_only implementation currently builds keep =
[j for j in preset.jobs if j.name in names], so the returned Preset preserves
preset.jobs order rather than the order specified in the --only string; update
the docstring of _filter_only to explicitly state that the resulting Preset
preserves the original preset job ordering (preset.jobs) and does not reorder
jobs to match the command-line --only sequence (mention names and keep to make
the connection), so users know that --only filters but does not control
launch/log order.
In `@src/sampleworks/runs/loader.py`:
- Around line 93-102: The current loader reads TOML via Path.read_text() and
tomllib.loads which uses the locale codec and can mis-decode non-ASCII; instead
open the real file in binary and call tomllib.load() on the file object for both
local files (Path(name_or_path) when path.suffix == ".toml") and bundled files
(the resources.files(...) / f"{name_or_path}.toml" object), ensuring you replace
uses of Path.read_text() and tomllib.loads with binary-open +
tomllib.load(file_obj) while keeping the same FileNotFoundError behavior and
references to list_bundled_presets() unchanged.
In `@src/sampleworks/runs/runner.py`:
- Around line 115-143: The helper _pixi_env_python (and the related
_job_env/get_pixi_env_process_env) is duplicated in run_grid_search.py as
get_pixi_env_python; extract these shared functions into a new shared module
(e.g. create sampleworks.runs.pixi_env or sampleworks.utils.pixi_env) and move
the logic for SAMPLEWORKS_FORCE_PIXI and SAMPLEWORKS_<ENV>_PYTHON overrides
there, then update both runner.py (use _pixi_env_python import) and
run_grid_search.py (use get_pixi_env_python import) to import the centralized
helpers so the env resolution rules are implemented in one place.
- Around line 230-252: The code calls build_invocations(preset,
results_dir=results_dir) twice—first to compute pixi_envs then again to get
invocations—wasting work; either remove the second call and reuse the original
invocations for launching, or (if _prepare_pixi_env can change invocation
resolution) add an explicit comment and a clear conditional rebuild: compute
invocations once, derive pixi_envs from invocations, call _prepare_pixi_env for
each pixi_env, then if _prepare_pixi_env may alter what build_invocations would
return re-run build_invocations only when a deterministic flag indicates envs
were mutated (or compare before/after state), otherwise reuse the initial
invocations; reference functions build_invocations, _prepare_pixi_env, and the
pixi_envs variable when making the change.
In `@src/sampleworks/utils/guidance_script_arguments.py`:
- Around line 45-76: The docstring for _resolve_checkpoint is outdated: it
claims baked-in Docker paths are tried first but the function actually prefers a
model-specific environment variable (from _CHECKPOINT_ENV_VARS) before falling
back to _CHECKPOINT_CANDIDATES; update the NumPy-style docstring of
_resolve_checkpoint to state the actual precedence (check model-specific env var
if set, then candidate baked-in paths, then legacy/development paths) and note
that if no candidates exist an empty string triggers a ValueError; reference
_CHECKPOINT_ENV_VARS and _CHECKPOINT_CANDIDATES in the docstring to make the
precedence explicit.
In `@tests/runs/test_runner.py`:
- Around line 137-147: The test test_dry_run_does_not_create_directories only
asserts that (results_dir / "rf3") does not exist, which misses other unexpected
per-job dirs; update the assertion after calling runner.run(preset,
results_dir=results_dir, dry_run=True) to confirm results_dir exists (since run
creates it for logs) and that it contains no per-job subdirectories (e.g.,
assert results_dir.exists() and list(results_dir.iterdir()) == []), or
alternatively parametrize the test over multiple presets so runner.run never
creates any per-job subdirs during dry_run; ensure you reference the existing
variables/presets (results_dir, preset) and the test function name
test_dry_run_does_not_create_directories when making the change.
- Around line 42-92: Several test functions (test_argv_omits_false_bool_flags,
test_explicit_output_dir_in_args_wins_over_subdir_default,
test_all_models_has_four_jobs_with_distinct_gpus,
test_protenix_dual_uses_different_checkpoints,
test_rf3_partial_chiral_off_flag_present) are missing NumPy-style docstrings;
add a single-line NumPy-style docstring under each def that briefly describes
the contract being tested (e.g., what input/state and what behavior/assertion is
expected) so they match the project's docstring guideline and neighboring tests'
style.
- Around line 150-163: The helper _argv_to_dict silently treats a value that
begins with "--" as a boolean flag; change it to explicitly fail instead and/or
document the precondition: inside _argv_to_dict (and around the loop using
variables flag and tail), when you see that i+1 < len(tail) and
tail[i+1].startswith("--"), raise an AssertionError or ValueError with a clear
message like "unexpected token starting with '--' as value for {flag}", or
alternatively add a docstring comment above _argv_to_dict stating the
precondition that values must not begin with "--" so callers know this
limitation. Ensure the error message references the offending flag/token for
easier debugging.
- Around line 42-92: Tests flake because environment variables like
DATA_DIR/RESULTS_DIR/CHECKPOINTS_DIR/etc. can be set externally; add an autouse
pytest fixture that clears or sets to known values the problematic env vars
before each test to ensure loader.load_preset and runner.build_invocations see a
deterministic environment. Implement a fixture (in this test module or
conftest.py) named e.g. clear_loader_env or reset_loader_env with
`@pytest.fixture`(autouse=True) that pops/overrides DATA_DIR, RESULTS_DIR,
CHECKPOINTS_DIR, EXTRA_CHECKPOINTS_DIR, SAMPLEWORKS_FORCE_PIXI,
SAMPLEWORKS_PIXI_PROJECT_DIR, SAMPLEWORKS_GRID_SEARCH_SCRIPT (and any other
loader interpolated vars) and ensures HOME is set if needed; then remove the
ad-hoc monkeypatch.setenv("HOME", ...) calls from the test functions (e.g.
test_argv_omits_false_bool_flags,
test_all_models_has_four_jobs_with_distinct_gpus,
test_protenix_dual_uses_different_checkpoints,
test_rf3_partial_chiral_off_flag_present) so all tests consistently use the
fixture-controlled environment when calling loader.load_preset and
runner.build_invocations.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c8ce1b8b-2a8d-4612-b59c-10c4b006cfd2
📒 Files selected for processing (25)
.actlignoreAGENTS.mdDockerfileREADME.mdpyproject.tomlrun_all_models.shrun_grid_search.pysrc/sampleworks/models/boltz/wrapper.pysrc/sampleworks/runs/__init__.pysrc/sampleworks/runs/cli.pysrc/sampleworks/runs/loader.pysrc/sampleworks/runs/presets/all_models.tomlsrc/sampleworks/runs/presets/protenix_dual.tomlsrc/sampleworks/runs/presets/rf3_partial.tomlsrc/sampleworks/runs/presets/rf3_partial_chiral_off.tomlsrc/sampleworks/runs/presets/rf3_protenix.tomlsrc/sampleworks/runs/runner.pysrc/sampleworks/runs/schema.pysrc/sampleworks/utils/guidance_script_arguments.pysrc/sampleworks/utils/guidance_script_utils.pytests/runs/__init__.pytests/runs/conftest.pytests/runs/test_cli.pytests/runs/test_loader.pytests/runs/test_runner.py
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@run_experiments`:
- Around line 63-97: The script currently always rewrites manifest_proteins_csv
and sets PROTEINS_CSV before checking for dry-run flags; change the control flow
so the manifest rewrite (the block that reads source_proteins_csv, creates
manifest_dir, replaces legacy_data_dir and writes manifest_proteins_csv, and
exports PROTEINS_CSV) only runs when needs_runtime_paths=1 (i.e., after the
for‑arg loop that sets needs_runtime_paths). Concretely, move or guard the
entire manifest creation block with a conditional like: if [[
$needs_runtime_paths -eq 1 ]]; then ... fi (referencing source_proteins_csv,
manifest_proteins_csv, legacy_data_dir, PROTEINS_CSV and needs_runtime_paths) so
--dry-run/--show/--list do not perform filesystem writes.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: be4004e6-0e54-498a-8c97-7ffa7976036e
📒 Files selected for processing (7)
DockerfileGRID_SEARCH.mdREADME.mdrun_all_models.shrun_experimentsrun_experiments.shtests/models/boltz/test_boltz_wrapper.py
✅ Files skipped from review due to trivial changes (2)
- run_experiments.sh
- GRID_SEARCH.md
fd644fd to
d86a299
Compare
marcuscollins
left a comment
There was a problem hiding this comment.
Thanks for the changes. Approved.
Summary
This PR makes the ACTL Sampleworks experiment workflow usable from the prebuilt image
pixi-with-checkpoints. The user-facing entry point isrun_experiments: it reads TOML presets and launches the right grid-search jobs, pixi environments, GPU assignments, logs, shared data paths, result directories, and MSA cache paths. The dir/home/dev/workspaceis synced to the local dir that actl is launched from, which should be this sampleworks repo.What changed
boltz,boltz1,boltz2,boltz2_xrd,boltz2_md,rf3, andprotenix, plus comparison presets.gpu_count = Npreset support so jobs auto-assign visible GPUs in order; single-job presets default to all 8 GPUs.experiments/and simplify target syntax (run_experiments rf3,run_experiments boltz,run_experiments full_8gpu --jobs rf3,protenix).CUDA_VISIBLE_DEVICESentry.Using
run_experimentsIn an ACTL Sampleworks pod, from
/home/dev/workspace:Presets live in
experiments/*.toml. The default full run can be subset by job name:Useful launch-time overrides:
run_experiments rf3 --set jobs.rf3.gpu_count=4 run_experiments rf3 --set jobs.rf3.args.gradient-weights="0.0 0.01 0.02"By default, data comes from
/mnt/diffuse-shared/raw/sampleworks, results go under/mnt/diffuse-shared/results/sampleworks/<pod>/<target>/, and MSA cache goes under/mnt/diffuse-shared/cache/sampleworks/msa. Override withDATA_DIR,RESULTS_DIR, orMSA_CACHE_DIR.