The alpha release pipeline. End-to-end NDLAr charge-light matching: front-stage track/shower placement, Phase 2 large-cluster scan, V2 light rescue (the phase25_trial2_v_alpha_test module), Phase 3 small-cluster matrix association — and a per-file .pt output with the schema documented in config.yaml.
This is intended to be the same version integrated into flow. Everything needed to reproduce a run lives in this folder except the perceiver charge-light-relation weights (~490 MB), which are too big to commit to git and ship instead as a GitHub Release asset. The variance-prediction model is optional (the pipeline runs with a constant-std fallback when absent).
# 1. Pick where you want the install to live, then clone.
INSTALL_DIR=/path/to/where/you/want/it
mkdir -p "$INSTALL_DIR" && cd "$INSTALL_DIR"
git clone https://github.com/MadivB/CLMatching_AlphaRelease.git v_alpha_test
cd v_alpha_test
# 2. Preflight check (no GPU needed): tells you exactly what's missing
# and how to fix it.
python scripts/check_install.py
# Expected on a fresh clone: perceiver MISSING (required), pulse OK,
# variance optional. Exit code 1.
# 3. Download the perceiver weights (~490 MB; ~30 s on a fast network)
# into the path paths.yaml expects:
mkdir -p NewMLSection/runs/ndfull_run_distributed
curl -L -o NewMLSection/runs/ndfull_run_distributed/checkpoint.pt \
https://github.com/MadivB/CLMatching_AlphaRelease/releases/download/v0.1.0/checkpoint.pt
# 4. Verify the SHA matches release.yaml (paranoia check; recommended).
sha256sum NewMLSection/runs/ndfull_run_distributed/checkpoint.pt
# Expected:
# 38655cca2b50f2caa643ef572fb80c77332611eafd3a831215cbe0f117473ac5 ...
# 5. Re-verify the install (should now be all green).
python scripts/check_install.py
# Expected: "All required assets are present.", exit code 0.
# 6. Optional: edit paths.yaml if any of your assets/data live somewhere
# other than the defaults. paths.yaml is the single source of truth.
# 7. Run the single-file smoke test on a NERSC GPU node (~6-10 min wall clock,
# 8 workers across 4 GPUs; auto-aggregates per-event NPZ -> per-file .pt).
salloc -A dune -q interactive -C gpu --gpus-per-node=4 -N 1 -t 30 \
srun -N1 -n1 --gpus-per-node=4 \
bash scripts/run_v_alpha_test_pt_one_file.sh
# 8. Inspect the result.
python scripts/inspect_pt.py output/test_one_file/pt_outputs/*.v_alpha_test.pt

The launcher writes outputs to output/test_one_file/ inside your clone (override with OUT_DIR=...).
Expected coverage on the default test file: ~98.7% of prompt hits get a finite t0 in calib_hit_t0_reco.
If check_install.py reports a missing required asset, it prints the exact path it tried, the download URL, the copy-pasteable download command, and the paths.yaml key to edit.
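
The checksum comparison in step 4 can also be scripted. The following is a minimal sketch, not part of the shipped scripts: the helper names `sha256_of` and `checkpoint_matches` are hypothetical, and the digest is the one quoted in step 4 (release.yaml remains the authoritative source).

```python
import hashlib
from pathlib import Path

# Digest quoted in step 4 of the quick start; release.yaml is authoritative.
EXPECTED_SHA256 = "38655cca2b50f2caa643ef572fb80c77332611eafd3a831215cbe0f117473ac5"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checkpoint_matches(path, expected=EXPECTED_SHA256):
    """True if the file exists and its digest equals the expected value."""
    path = Path(path)
    return path.is_file() and sha256_of(path) == expected.lower()
```

For example, `checkpoint_matches("NewMLSection/runs/ndfull_run_distributed/checkpoint.pt")` should return True after a complete download. Reading in chunks keeps memory use flat for the ~490 MB file.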
Once the salloc allocation is granted, follow the worker logs from another login shell:
cd "$INSTALL_DIR"/v_alpha_test # same install dir as above
tail -f output/test_one_file/parallel8_logs/worker*.log

To process 10 files instead of one, use the parallel launcher:
salloc -A dune -q interactive -C gpu --gpus-per-node=4 -N 1 -t 90 \
srun -N1 -n1 --gpus-per-node=4 \
bash scripts/run_v_alpha_test_pt_parallel8.sh

The pipeline can be driven three ways, all sharing the same engine and output schema:
| # | mode | script | when to use |
|---|---|---|---|
| 1 | batch submission | scripts/submit_production_robust.sh [N] | mass production; launches N preemption-robust SLURM chains that self-resubmit and cooperate via atomic file claims |
| 2 | interactive folder | scripts/run_interactive_forward_0000000.sh | run inside an existing salloc GPU node; processes a whole folder forward, cooperating with any batch chains |
| 3 | single file | scripts/process_one_flow_file.sh <flow.hdf5> [out_dir] | run inside an existing salloc GPU node; process exactly one FLOW file |
All three use 8 workers (2 per GPU × 4 GPUs) and auto-aggregate per-event NPZ shards into one per-file .pt.
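
The README does not show how the "atomic file claims" in mode 1 are implemented. As an illustration only, one common way to let independent workers share a folder without processing a file twice is exclusive file creation. In this sketch the function names and the `.claim` marker naming are hypothetical, and the production scripts may use a different mechanism.

```python
import os
from pathlib import Path


def try_claim(claim_dir, file_name, owner):
    """Try to claim `file_name`; return True only for the first caller.

    O_CREAT | O_EXCL makes creation fail if the marker already exists, so two
    workers racing for the same file cannot both succeed.
    """
    claim_dir = Path(claim_dir)
    claim_dir.mkdir(parents=True, exist_ok=True)
    marker = claim_dir / (file_name + ".claim")  # hypothetical naming scheme
    try:
        fd = os.open(marker, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False  # another worker got there first
    with os.fdopen(fd, "w") as fh:
        fh.write(owner + "\n")  # record who holds the claim, for debugging
    return True


def next_unclaimed(files, claim_dir, owner):
    """Return the first file this worker manages to claim, or None."""
    for name in files:
        if try_claim(claim_dir, Path(name).name, owner):
            return name
    return None
```

Each chain or interactive worker would call `next_unclaimed(...)` in a loop until it returns None. Whether exclusive creation is reliable on a given shared filesystem is worth checking before depending on it.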
Example for mode 3 (already on an interactive GPU node):
bash scripts/process_one_flow_file.sh \
/global/cfs/cdirs/dunepro/people/abooth/nd-production/output/MiniProdN5/run-ndlar-flow/MiniProdN5p1_NDComplex_FHC.flow.full.sanddrift/FLOW/0000000/MiniProdN5p1_NDComplex_FHC.flow.full.sanddrift.0000123.FLOW.hdf5
# -> output/single/<basename>/pt_outputs/<basename>.v_alpha_test.pt

Every external file the pipeline loads is listed in paths.yaml:
| asset | required? | default location |
|---|---|---|
| perceiver_charge_light_relation | yes | NewMLSection/runs/ndfull_run_distributed/checkpoint.pt (download from GitHub Release) |
| pulse_template | yes | assets/avg_pulse.npy (bundled, ~4 KB) |
| variance_prediction | optional | NewMLSection/var_prediction/runs/.../best_model.pt (constant-std fallback if missing) |
| input_data.default_data_dir | optional | NERSC default; override via CLI or paths.yaml |
Each path: entry can be absolute or repo-relative. A path_candidates: entry lets you list multiple fallbacks.
You can also point the resolver at a different YAML via V_ALPHA_TEST_PATHS_YAML=/path/to/your.yaml.
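
The resolution rules above can be sketched as follows. This is not the code in asset_resolver.py: the function names, the dictionary layout of an entry, and the choice to try `path` before `path_candidates` are all assumptions made for illustration.

```python
import os
from pathlib import Path


def resolve_asset(entry, repo_root):
    """Return the first existing path for one asset entry, or None.

    `entry` is assumed to look like
    {"path": "...", "path_candidates": ["...", "..."]}; either key may be absent.
    """
    candidates = []
    if entry.get("path"):
        candidates.append(entry["path"])  # assumption: `path` is tried first
    candidates.extend(entry.get("path_candidates") or [])
    for raw in candidates:
        p = Path(os.path.expanduser(raw))
        if not p.is_absolute():
            p = Path(repo_root) / p  # repo-relative paths hang off the clone root
        if p.exists():
            return p
    return None


def paths_yaml_location(repo_root, environ=os.environ):
    """Honor V_ALPHA_TEST_PATHS_YAML if set, else use <repo_root>/paths.yaml."""
    override = environ.get("V_ALPHA_TEST_PATHS_YAML")
    return Path(override) if override else Path(repo_root) / "paths.yaml"
```

A caller would treat a None result as "missing" and then decide, based on the required/optional column above, whether to abort or fall back.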
v_alpha_test/
├── README.md # this file
├── paths.yaml # USER-EDITABLE asset paths (perceiver, pulse, variance)
├── config.yaml # per-file .pt output schema + field provenance
├── release.yaml # release manifest (sha256s, asset URLs, distribution)
├── assets/
│ └── avg_pulse.npy # bundled pulse template (4 KB)
├── M5p1/ # M5p1 python package (front stage + V2 + Phase 3 + resolver)
│ └── first_stage_matching/
│ └── asset_resolver.py # reads paths.yaml, validates, friendly errors
├── NewMLSection/ # perceiver model code (weights downloaded separately)
└── scripts/
├── check_install.py # validates paths.yaml; exits 1 on missing required assets
├── aggregate_to_pt.py # per-event NPZ shards -> per-file .pt
├── inspect_pt.py # peek at a per-file .pt
├── run_v_alpha_test_pt_one_file.sh # 8-worker single-file launcher (auto-aggregates)
└── run_v_alpha_test_pt_parallel8.sh # 8-worker 10-file launcher (auto-aggregates)
The launcher scripts auto-detect the repo location from their own path — they work from any clone, no editing needed.
See config.yaml for the full schema. Highlights:
Per-prompt-hit fields (size n_calib_hits):
| field | dtype | filled by |
|---|---|---|
| calib_hit_t0_reco | float32 | full pipeline (Front + Phase 2 + V2 + Phase 3); hit_timestamps_post_phase3 scattered via event.hit_refs |
| prompt_hit_t_cluster_id | int16 | front-stage labels_global re-labeled by every V2 spatial+light move (each move yields a brand-new id past the original cluster count) |
Per-merged-hit fields (size n_calib_final_hits, vBeta3-compatible):
| field | dtype | filled by |
|---|---|---|
| calib_final_hit_t0_reco | float32 | aggregator: calib_hit_t0_reco[prompt_idx[i]] where prompt_idx = charge/calib_prompt_hits/ref/charge/calib_final_hits/ref[:, 0] |
| calib_final_hit_cluster_id | int16 | aggregator: same prompt-index lookup against prompt_hit_t_cluster_id |
| calib_final_hit_prompt_index | int64 | aggregator: the column-0 ref above |
Counts + metadata:
| field | type | filled by |
|---|---|---|
| n_calib_hits, n_assigned, n_unassigned | int | aggregator |
| n_calib_final_hits, n_calib_final_assigned, n_calib_final_unassigned | int | aggregator |
| processed_event_ids, all_event_ids | int64 | aggregator |
| event_summaries, failed_events | list[dict] | aggregator |
| version, algorithm, input_file, calib_final_hit_source | str | aggregator |
Sentinels: unassigned prompt and merged hits have *_t0_reco = -1.0 and *_cluster_id = -1.
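
The prompt-index lookup that fills the per-merged-hit fields can be illustrated with NumPy fancy indexing. This is a sketch of the idea rather than the code in aggregate_to_pt.py: the function name is hypothetical, and counting a hit as "assigned" when its cluster id differs from the sentinel is an assumption.

```python
import numpy as np

UNASSIGNED_ID = np.int16(-1)  # cluster-id sentinel from the schema above


def propagate_to_final_hits(calib_hit_t0_reco, prompt_hit_t_cluster_id, prompt_index):
    """Copy per-prompt-hit results onto merged hits via their prompt index."""
    prompt_index = np.asarray(prompt_index, dtype=np.int64)
    # Fancy indexing: merged hit i takes the value of prompt hit prompt_index[i].
    t0 = np.asarray(calib_hit_t0_reco, dtype=np.float32)[prompt_index]
    cid = np.asarray(prompt_hit_t_cluster_id, dtype=np.int16)[prompt_index]
    # Assumption: "assigned" means the cluster id is not the sentinel.
    assigned = cid != UNASSIGNED_ID
    return {
        "calib_final_hit_t0_reco": t0,
        "calib_final_hit_cluster_id": cid,
        "calib_final_hit_prompt_index": prompt_index,
        "n_calib_final_hits": int(prompt_index.size),
        "n_calib_final_assigned": int(assigned.sum()),
        "n_calib_final_unassigned": int((~assigned).sum()),
    }
```

Because sentinel values are copied through unchanged, an unassigned prompt hit yields an unassigned merged hit without any special casing.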
python scripts/inspect_pt.py output/test_one_file/pt_outputs/*.v_alpha_test.pt

If you ran the batch but didn't auto-aggregate, run the aggregator separately:
python scripts/aggregate_to_pt.py \
--shard-dir output/test_one_file \
--output-dir output/test_one_file/pt_outputs

The default paths in paths.yaml and the launchers are NERSC-friendly out of the box. Run on an interactive allocation of one 4-GPU node:
salloc -A dune -q interactive -C gpu --gpus-per-node=4 -N 1 -t 30 \
srun -N1 -n1 --gpus-per-node=4 \
bash scripts/run_v_alpha_test_pt_one_file.sh

For 10 files / ~130 events in ~30 min:
salloc -A dune -q interactive -C gpu --gpus-per-node=4 -N 1 -t 90 \
srun -N1 -n1 --gpus-per-node=4 \
bash scripts/run_v_alpha_test_pt_parallel8.sh

Two assets are distributed outside the git repository:
- The perceiver charge-light relation weights (checkpoint.pt, ~490 MB): GitHub Release asset
- The variance-prediction .pt (when produced): GitHub Release asset, optional
Both are loaded by M5p1.first_stage_matching.load_first_stage_models using the paths from paths.yaml. Missing required assets trigger a friendly error with download instructions.