Charge Light Matching Alpha Release

This is the alpha release pipeline for end-to-end NDLAr charge-light matching: front-stage track/shower placement, Phase 2 large-cluster scan, V2 light rescue (the phase25_trial2_v_alpha_test module), and Phase 3 small-cluster matrix association. It writes a per-file .pt output whose schema is documented in config.yaml.

This is intended to be the same version integrated into flow. Everything needed to reproduce a run lives in this folder except the perceiver charge-light-relation weights (~490 MB), which are too big to commit to git and ship instead as a GitHub Release asset. The variance-prediction model is optional (the pipeline runs with a constant-std fallback when absent).

Quick start (any machine, any path)

# 1. Pick where you want the install to live, then clone.
INSTALL_DIR=/path/to/where/you/want/it
mkdir -p "$INSTALL_DIR" && cd "$INSTALL_DIR"
git clone https://github.com/MadivB/CLMatching_AlphaRelease.git v_alpha_test
cd v_alpha_test

# 2. Preflight check (no GPU needed): tells you exactly what's missing
#    and how to fix it.
python scripts/check_install.py
#    Expected on a fresh clone: perceiver MISSING (required), pulse OK,
#                               variance optional.  Exit code 1.

# 3. Download the perceiver weights (~490 MB; ~30 s on a fast network)
#    into the path paths.yaml expects:
mkdir -p NewMLSection/runs/ndfull_run_distributed
curl -L -o NewMLSection/runs/ndfull_run_distributed/checkpoint.pt \
  https://github.com/MadivB/CLMatching_AlphaRelease/releases/download/v0.1.0/checkpoint.pt

# 4. Verify the SHA matches release.yaml (paranoia check; recommended).
sha256sum NewMLSection/runs/ndfull_run_distributed/checkpoint.pt
#    Expected:
#    38655cca2b50f2caa643ef572fb80c77332611eafd3a831215cbe0f117473ac5  ...

# 5. Re-verify the install (should now be all green).
python scripts/check_install.py
#    Expected: "All required assets are present.", exit code 0.

# 6. Optional: edit paths.yaml if any of your assets/data live somewhere
#    other than the defaults.  paths.yaml is the single source of truth.

# 7. Run the single-file smoke test on a GPU node (assumes NERSC; ~6-10 min wall clock,
#    8 workers across 4 GPUs, auto-aggregates per-event NPZ -> per-file .pt).
salloc -A dune -q interactive -C gpu --gpus-per-node=4 -N 1 -t 30 \
  srun -N1 -n1 --gpus-per-node=4 \
    bash scripts/run_v_alpha_test_pt_one_file.sh

# 8. Inspect the result.
python scripts/inspect_pt.py output/test_one_file/pt_outputs/*.v_alpha_test.pt

The launcher writes outputs to output/test_one_file/ inside your clone (override with OUT_DIR=...). Expected coverage on the default test file: ~98.7% of prompt hits get a finite t0 in calib_hit_t0_reco.

If check_install.py reports a missing required asset, it prints the exact path it tried, the download URL, the copy-pasteable download command, and the paths.yaml key to edit.
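
The checksum comparison in step 4 can also be done from Python, which helps on systems without sha256sum. The following is a minimal sketch, not part of this repository: the helper names sha256_of and verify_checkpoint are illustrative. It hashes the file in chunks so the ~490 MB checkpoint is never held in memory at once.

```python
import hashlib
from pathlib import Path

# Expected digest copied from step 4 of the quick start (also in release.yaml).
EXPECTED_SHA256 = "38655cca2b50f2caa643ef572fb80c77332611eafd3a831215cbe0f117473ac5"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checkpoint(path, expected=EXPECTED_SHA256):
    """True if the file exists and its digest matches the expected value."""
    path = Path(path)
    return path.is_file() and sha256_of(path) == expected.lower()


if __name__ == "__main__":
    ckpt = "NewMLSection/runs/ndfull_run_distributed/checkpoint.pt"
    print("checkpoint OK" if verify_checkpoint(ckpt) else "checkpoint missing or corrupted")
```

Run it from the repo root after the download in step 3; a mismatch usually means a truncated download.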

Watch progress (separate terminal)

After the salloc lands, in another login shell:

cd "$INSTALL_DIR"/v_alpha_test    # same install dir as above
tail -f output/test_one_file/parallel8_logs/worker*.log

Alternative: 10-file production run (~30-50 min wall)

salloc -A dune -q interactive -C gpu --gpus-per-node=4 -N 1 -t 90 \
  srun -N1 -n1 --gpus-per-node=4 \
    bash scripts/run_v_alpha_test_pt_parallel8.sh

Three run modes

The pipeline can be driven three ways, all sharing the same engine and output schema:

#	mode	script	when to use
1	batch submission	`scripts/submit_production_robust.sh [N]`	mass production; launches N preemption-robust SLURM chains that self-resubmit and cooperate via atomic file claims
2	interactive folder	`scripts/run_interactive_forward_0000000.sh`	run inside an existing `salloc` GPU node; processes a whole folder forward, cooperating with any batch chains
3	single file	`scripts/process_one_flow_file.sh <flow.hdf5> [out_dir]`	run inside an existing `salloc` GPU node; process exactly one FLOW file

All three use 8 workers (2 per GPU × 4 GPUs) and auto-aggregate per-event NPZ shards into one per-file .pt.
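
To make the "atomic file claims" and the 2-workers-per-GPU layout concrete, here is a hedged sketch of one common way to implement them. It is not the repository's implementation: try_claim, the ".claim" marker suffix, and the worker-to-GPU rule in gpu_for_worker are assumptions made for illustration.

```python
import os
from pathlib import Path

WORKERS_PER_GPU = 2
N_GPUS = 4


def gpu_for_worker(worker_id):
    """Map worker 0..7 onto GPU 0..3, two workers per GPU (assumed layout)."""
    if not 0 <= worker_id < WORKERS_PER_GPU * N_GPUS:
        raise ValueError(f"worker_id out of range: {worker_id}")
    return worker_id // WORKERS_PER_GPU


def try_claim(input_file, claim_dir, owner):
    """Atomically claim an input file; return True only for the first caller."""
    claim_dir = Path(claim_dir)
    claim_dir.mkdir(parents=True, exist_ok=True)
    marker = claim_dir / (Path(input_file).name + ".claim")
    try:
        # O_CREAT | O_EXCL fails if the marker already exists, so at most one
        # process succeeds in creating it.
        fd = os.open(marker, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as fh:
        fh.write(owner + "\n")  # record who holds the claim, for debugging
    return True
```

A chain or interactive session would call try_claim before processing each file and skip any file for which it returns False. Whether exclusive creation is truly atomic on a shared filesystem depends on that filesystem, so treat this as a sketch of the idea only.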

Example for mode 3 (already on an interactive GPU node):

bash scripts/process_one_flow_file.sh \
  /global/cfs/cdirs/dunepro/people/abooth/nd-production/output/MiniProdN5/run-ndlar-flow/MiniProdN5p1_NDComplex_FHC.flow.full.sanddrift/FLOW/0000000/MiniProdN5p1_NDComplex_FHC.flow.full.sanddrift.0000123.FLOW.hdf5
# -> output/single/<basename>/pt_outputs/<basename>.v_alpha_test.pt

paths.yaml — single source of truth for external paths

Every external file the pipeline loads is listed in paths.yaml:

asset	required?	default location
`perceiver_charge_light_relation`	yes	`NewMLSection/runs/ndfull_run_distributed/checkpoint.pt` (download from GitHub Release)
`pulse_template`	yes	`assets/avg_pulse.npy` (bundled, ~4 KB)
`variance_prediction`	optional	`NewMLSection/var_prediction/runs/.../best_model.pt` (constant-std fallback if missing)
`input_data.default_data_dir`	optional	NERSC default; override via CLI or paths.yaml

Each `path:` can be absolute or repo-relative. `path_candidates:` lets you list multiple fallbacks.

You can also point the resolver at a different YAML via V_ALPHA_TEST_PATHS_YAML=/path/to/your.yaml.
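
The resolution rules above (absolute or repo-relative paths, ordered fallbacks, an environment-variable override for the YAML location) can be sketched as follows. This is an illustration under stated assumptions, not the code in asset_resolver.py: the function names and the dict layout with "required", "path" and "path_candidates" keys are hypothetical, and the real paths.yaml structure may differ.

```python
import os
from pathlib import Path


def paths_yaml_location(repo_root, env=os.environ):
    """Use V_ALPHA_TEST_PATHS_YAML if set, else <repo_root>/paths.yaml."""
    override = env.get("V_ALPHA_TEST_PATHS_YAML")
    return Path(override) if override else Path(repo_root) / "paths.yaml"


def resolve_asset(entry, repo_root):
    """Return the first existing candidate path for one asset entry, or None."""
    candidates = []
    if entry.get("path"):
        candidates.append(entry["path"])
    candidates.extend(entry.get("path_candidates", []))

    for raw in candidates:
        path = Path(raw)
        if not path.is_absolute():
            path = Path(repo_root) / path  # repo-relative entry
        if path.exists():
            return path
    return None


def missing_required(assets, repo_root):
    """Return the names of required assets that could not be resolved."""
    return [
        name
        for name, entry in assets.items()
        if entry.get("required", False) and resolve_asset(entry, repo_root) is None
    ]
```

With this shape, an optional asset such as variance_prediction simply resolves to None, and the caller can switch to the constant-std fallback.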

Layout

v_alpha_test/
├── README.md                                    # this file
├── paths.yaml                                   # USER-EDITABLE asset paths (perceiver, pulse, variance)
├── config.yaml                                  # per-file .pt output schema + field provenance
├── release.yaml                                 # release manifest (sha256s, asset URLs, distribution)
├── assets/
│   └── avg_pulse.npy                            # bundled pulse template (4 KB)
├── M5p1/                                        # M5p1 python package (front stage + V2 + Phase 3 + resolver)
│   └── first_stage_matching/
│       └── asset_resolver.py                    # reads paths.yaml, validates, friendly errors
├── NewMLSection/                                # perceiver model code (weights downloaded separately)
└── scripts/
    ├── check_install.py                         # validates paths.yaml; exits 1 on missing required assets
    ├── aggregate_to_pt.py                       # per-event NPZ shards -> per-file .pt
    ├── inspect_pt.py                            # peek at a per-file .pt
    ├── run_v_alpha_test_pt_one_file.sh          # 8-worker single-file launcher (auto-aggregates)
    └── run_v_alpha_test_pt_parallel8.sh         # 8-worker 10-file launcher (auto-aggregates)

The launcher scripts auto-detect the repo location from their own path — they work from any clone, no editing needed.
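
The launchers are shell scripts, but the same self-location idea looks like this in Python. This is a sketch only; repo_root_from_script is a hypothetical helper that assumes the script lives directly under scripts/ as in the layout above.

```python
from pathlib import Path


def repo_root_from_script(script_path):
    """Return the repo root for a script stored in <repo>/scripts/."""
    # resolve() makes the path absolute and follows symlinks, so the result
    # does not depend on the current working directory.
    return Path(script_path).resolve().parent.parent


if __name__ == "__main__":
    # Inside a real script, pass __file__ to locate paths.yaml next to README.md.
    print(repo_root_from_script(__file__) / "paths.yaml")
```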

Output: per-file `.pt` schema (vBeta3-compatible + new field)

See config.yaml for the full schema. Highlights:

Per-prompt-hit fields (size n_calib_hits):

field	dtype	filled by
`calib_hit_t0_reco`	float32	full pipeline (Front + Phase 2 + V2 + Phase 3); `hit_timestamps_post_phase3` scattered via `event.hit_refs`
`prompt_hit_t_cluster_id`	int16	front-stage `labels_global` re-labeled by every V2 spatial+light move (each move yields a brand-new id past the original cluster count)

Per-merged-hit fields (size n_calib_final_hits, vBeta3-compatible):

field	dtype	filled by
`calib_final_hit_t0_reco`	float32	aggregator: `calib_hit_t0_reco[prompt_idx[i]]` where `prompt_idx = charge/calib_prompt_hits/ref/charge/calib_final_hits/ref[:, 0]`
`calib_final_hit_cluster_id`	int16	aggregator: same prompt-index lookup against `prompt_hit_t_cluster_id`
`calib_final_hit_prompt_index`	int64	aggregator: the column-0 ref above

Counts + metadata:

field	type	filled by
`n_calib_hits`, `n_assigned`, `n_unassigned`	int	aggregator
`n_calib_final_hits`, `n_calib_final_assigned`, `n_calib_final_unassigned`	int	aggregator
`processed_event_ids`, `all_event_ids`	int64	aggregator
`event_summaries`, `failed_events`	list[dict]	aggregator
`version`, `algorithm`, `input_file`, `calib_final_hit_source`	str	aggregator

Sentinels: unassigned prompt and merged hits have *_t0_reco = -1.0 and *_cluster_id = -1.
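
The prompt-index lookup and the sentinel convention can be illustrated with a small example. Plain Python lists stand in for the tensors stored in the .pt file, the values are invented, and the helper names are hypothetical. Treating a hit as "assigned" exactly when its t0 differs from the -1.0 sentinel is an assumption about how the counts are defined.

```python
T0_SENTINEL = -1.0      # unassigned *_t0_reco
CLUSTER_SENTINEL = -1   # unassigned *_cluster_id


def gather_to_final_hits(prompt_values, prompt_index):
    """Look up a per-prompt-hit value for every merged (final) hit."""
    return [prompt_values[i] for i in prompt_index]


def count_assigned(t0_values):
    """Return (n_assigned, n_unassigned) based on the -1.0 sentinel."""
    n_unassigned = sum(1 for t in t0_values if t == T0_SENTINEL)
    return len(t0_values) - n_unassigned, n_unassigned


# Toy data: four prompt hits, one of them unassigned.
calib_hit_t0_reco = [12.5, -1.0, 40.0, 12.5]
prompt_hit_t_cluster_id = [0, -1, 3, 0]

# Three merged hits, each pointing at one prompt hit (the column-0 ref).
calib_final_hit_prompt_index = [0, 2, 1]

calib_final_hit_t0_reco = gather_to_final_hits(
    calib_hit_t0_reco, calib_final_hit_prompt_index
)
calib_final_hit_cluster_id = gather_to_final_hits(
    prompt_hit_t_cluster_id, calib_final_hit_prompt_index
)
```

Here the third merged hit inherits both sentinels from the unassigned prompt hit it points to, so a mask on -1.0 or -1 selects the same hits in either field.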

Inspect a result

python scripts/inspect_pt.py output/test_one_file/pt_outputs/*.v_alpha_test.pt

Manual aggregation

If you ran the batch but didn't auto-aggregate, run the aggregator separately:

python scripts/aggregate_to_pt.py \
    --shard-dir output/test_one_file \
    --output-dir output/test_one_file/pt_outputs

On NERSC

The default paths in paths.yaml and the launchers work on NERSC out of the box. Run on an interactive allocation of one 4-GPU node:

salloc -A dune -q interactive -C gpu --gpus-per-node=4 -N 1 -t 30 \
  srun -N1 -n1 --gpus-per-node=4 \
    bash scripts/run_v_alpha_test_pt_one_file.sh

For 10 files (~130 events) in ~30-50 min wall clock:

salloc -A dune -q interactive -C gpu --gpus-per-node=4 -N 1 -t 90 \
  srun -N1 -n1 --gpus-per-node=4 \
    bash scripts/run_v_alpha_test_pt_parallel8.sh

Excluded from this folder (too large for git)

The perceiver charge-light relation weights (checkpoint.pt, ~490 MB) — GitHub Release asset
The variance-prediction .pt (when produced) — GitHub Release asset, optional

Both are loaded by M5p1.first_stage_matching.load_first_stage_models using the paths from paths.yaml. Missing required assets trigger a friendly error with download instructions.
