sldsc_postprocessing_pipeline: align sd_annot names to polyfun .results categories (covers --snp-list mode) by al4225 · Pull Request #488 · StatFunGen/pecotmr

al4225 · 2026-05-08T14:59:53Z

Summary

sldsc_postprocessing_pipeline() matches sd_annot_full (named after the .annot.gz columns) against the Category column of polyfun's .results file. The previous code only worked when polyfun preserved the original annot column name. This PR makes the matching robust to all four pipeline configurations (single or joint annotation × with or without --snp-list).

Background — the naming asymmetry

Polyfun appends _<file_idx> to LD score column names when writing the .results Category column (the target annotation is file_idx=0). The LD score column name itself depends on which polyfun script wrote the file, and the new pipeline branches on --snp-list:

| Branch | polyfun script | LD score column | `.results` target Category |
|---|---|---|---|
| no `--snp-list`, single | `compute_ldscores.py` | preserved `ANNOT` | `ANNOT_0` |
| `--snp-list`, single | `ldsc.py --l2` | hardcoded `L2` (`ldsc.py:317`) | `L2_0` |
| no `--snp-list`, joint (N) | `compute_ldscores.py` | preserved `A1, A2, …` | `A1_0, A2_0, …` |
| `--snp-list`, joint (N) | `ldsc.py --l2` | `<annot>L2` per column | `A1L2_0, A2L2_0, …` |
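The naming rule in the table can be written down as a small function. This is a Python sketch for illustration only. The PR's code is R, and the PR deliberately avoids encoding this rule on the pipeline side (see "Why this approach"). The function name `expected_target_categories` is hypothetical, and "single" is assumed to mean a one-column annotation.

```python
def expected_target_categories(annot_cols: list[str], snp_list: bool) -> list[str]:
    """Hypothetical helper: .results target Category names for the target
    annotation file (file_idx=0), following the branch table."""
    suffix = "_0"  # target annotation file has file_idx=0
    if not snp_list:
        # compute_ldscores.py keeps the .annot.gz column names as-is.
        ld_cols = list(annot_cols)
    elif len(annot_cols) == 1:
        # ldsc.py --l2, single annotation: LD score column is hardcoded "L2".
        ld_cols = ["L2"]
    else:
        # ldsc.py --l2, joint annotations: "<annot>L2" per column.
        ld_cols = [f"{c}L2" for c in annot_cols]
    return [c + suffix for c in ld_cols]


# The four branches from the table:
print(expected_target_categories(["ANNOT"], snp_list=False))     # ['ANNOT_0']
print(expected_target_categories(["ANNOT"], snp_list=True))      # ['L2_0']
print(expected_target_categories(["A1", "A2"], snp_list=False))  # ['A1_0', 'A2_0']
print(expected_target_categories(["A1", "A2"], snp_list=True))   # ['A1L2_0', 'A2L2_0']
```

The `--snp-list` branches show why a fixed suffix rule applied to .annot.gz names cannot work. The LD score column name changes, not just the suffix.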

compute_sldsc_annot_sd() only reads .annot.gz, so the names it returns are .annot.gz column names and never reflect polyfun's renaming. Without alignment, intersect() returned an empty set whenever --snp-list was used. That left target_categories empty and caused a cryptic downstream failure.
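A small Python sketch reproduces the empty match. `r_intersect` is a hypothetical stand-in for R's `intersect(x, y)`, which returns the unique elements of `x` found in `y` and keeps the order of `x`. The baseline category name `baseL2_1` is made up for illustration.

```python
def r_intersect(x, y):
    """Order-preserving analogue of R's intersect(x, y)."""
    y_set = set(y)
    seen = set()
    out = []
    for item in x:
        if item in y_set and item not in seen:
            seen.add(item)
            out.append(item)
    return out


# Hypothetical .results Category columns (target rows first, then baseline).
no_snp_list_cats = ["ANNOT_0", "baseL2_1"]
snp_list_cats = ["L2_0", "baseL2_1"]

annot_names = ["ANNOT"]                      # names as read from .annot.gz
suffixed = [n + "_0" for n in annot_names]   # names with the target suffix

print(r_intersect(annot_names, snp_list_cats))  # [] (no alignment)
print(r_intersect(suffixed, no_snp_list_cats))  # ['ANNOT_0']
print(r_intersect(suffixed, snp_list_cats))     # [] (suffix alone is not enough)
```

An empty result here corresponds to the empty target_categories described for --snp-list runs.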

Fix

Two-stage match in sldsc_postprocessing_pipeline():

1. Append "_0" (via paste0()) to the names of sd_annot_full and is_binary_full. This covers the no-snp-list branches.
2. If intersect() is still empty, take the first length(sd_annot_full) rows of the .results Category column as targets and rename sd_annot_full / is_binary_full positionally. This covers the snp-list branches. Polyfun puts the file_idx=0 rows first in .results, so positional alignment is safe across all four branches.
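The two stages can be sketched in Python. This is not the R code in R/sldsc_wrapper.R; it only mirrors the logic described in the list. The sketch makes the following assumptions:

- Named R vectors are modeled as dicts in .annot.gz column order.
- is_binary_full has the same order as sd_annot_full.
- The function name `align_to_results` is hypothetical.
- The length guard exists only in this sketch.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("sldsc_sketch")


def align_to_results(sd_annot_full, is_binary_full, result_categories):
    """Hypothetical two-stage alignment of annotation names to .results
    categories. Returns (sd_annot, is_binary, target_categories)."""
    # Stage 1: append the target file-index suffix (no --snp-list case).
    sd = {f"{k}_0": v for k, v in sd_annot_full.items()}
    binary = {f"{k}_0": v for k, v in is_binary_full.items()}
    cat_set = set(result_categories)
    targets = [k for k in sd if k in cat_set]
    if targets:
        return sd, binary, targets

    # Stage 2: positional fallback (--snp-list case). Polyfun writes the
    # file_idx=0 rows first, so the first n categories are the targets.
    n = len(sd_annot_full)
    if n > len(result_categories):  # guard added for this sketch only
        raise ValueError("fewer .results categories than annotation columns")
    targets = list(result_categories[:n])
    old_names = list(sd_annot_full)
    sd = dict(zip(targets, sd_annot_full.values()))
    binary = dict(zip(targets, is_binary_full.values()))
    # Log old/new names and the baseline count for traceability.
    log.info("positional rename %s -> %s (%d baseline categories)",
             old_names, targets, len(result_categories) - n)
    return sd, binary, targets


# Usage with hypothetical .results categories:
_, _, t = align_to_results({"ANNOT": 0.42}, {"ANNOT": True}, ["ANNOT_0", "baseL2_1"])
print(t)  # ['ANNOT_0'] (stage 1)
_, _, t = align_to_results({"ANNOT": 0.42}, {"ANNOT": True}, ["L2_0", "baseL2_1"])
print(t)  # ['L2_0'] (stage 2)
_, _, t = align_to_results({"A1": 0.3, "A2": 0.7}, {"A1": True, "A2": False},
                           ["A1L2_0", "A2L2_0", "baseL2_1"])
print(t)  # ['A1L2_0', 'A2L2_0'] (stage 2, joint)
```

Stage 2 never inspects names, only row order. The fallback therefore keeps working if the LD score naming rule changes, as long as target rows stay first.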

Why this approach

Pipeline-side renaming would have to reproduce ldsc.py's hardcoded L2 / <annot>L2 naming rule, which is brittle.
Passing a use_snp_list flag from the pipeline to pecotmr would couple the two repos.
Trusting polyfun's row ordering (target before baseline because file_idx=0 < 1) needs no flag and works for any future polyfun script that follows the same convention.

Validation

Standalone matching test on real .results files: ADSP allm (ANNOT_0, stage 1) and 1000G allm_snplist (L2_0, stage 2) both resolve correctly.
MWE end-to-end (validate_pecotmr_fix.sh): COMPLETED 0:0 / 9:41; the stage 1 path is unchanged.
Production (1000G allm_snplist + m50_snplist, 6 contexts × 96 traits): all 6 contexts produced <ctx>.sldsc_postprocess.rds (per_trait[96] + meta{tau_star, enrichment, enrichstat}) and 5 <group>.meta.rds files per context, confirming stage 2 at scale.

Test plan

Reproduce MWE: bash xqtl-protocol/test/scripts/validate_pecotmr_fix.sh
Run on a --snp-list .results (target = L2_0) and confirm the fallback message appears in the logs
Run on a no-snp-list .results (target = ANNOT_0) and confirm no fallback is triggered
Joint smoke test: 2-column annot + --snp-list; verify target_categories = <a>L2_0, <b>L2_0

Files

R/sldsc_wrapper.R (+32 lines, single function modified)

🤖 Generated with Claude Code

…ts categories

Polyfun appends "_<file_idx>" to LD score column names when writing the .results Category column, where the target annotation is file_idx=0. Two cases are now handled:

1. compute_ldscores.py path (no --snp-list): preserves .annot.gz column names, so the .results target Category = "<annot_col>_0". Add paste0("_0") to sd_annot_full / is_binary_full names so intersect() with the .results categories matches.
2. ldsc.py --l2 path (with --snp-list): hardcodes the LD score column to "L2" (single) or "<annot_col>L2" (joint), so paste0("_0") on .annot.gz names gives the wrong key. Fall back to a positional rename. Take the first length(sd_annot_full) rows of the .results Category column as target_categories (polyfun puts target before baseline because file_idx=0 < 1). Rename sd_annot_full / is_binary_full to those names. Emit an INFO message with the old/new names and the baseline count for traceability.

Without this, postprocess silently produced empty target_categories whenever the pipeline ran with --snp-list, breaking downstream meta-analysis.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

al4225 and others added 2 commits May 8, 2026 10:13

Update documentation

547a929

al4225 mentioned this pull request May 8, 2026

sldsc_enrichment: fix snp-list mode + meta_subset tau_star handling StatFunGen/xqtl-protocol#1319

Merged

gaow merged commit 326fc90 into StatFunGen:main May 9, 2026
5 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sldsc_postprocessing_pipeline: align sd_annot names to polyfun .results categories (covers --snp-list mode)#488

sldsc_postprocessing_pipeline: align sd_annot names to polyfun .results categories (covers --snp-list mode)#488
gaow merged 2 commits into StatFunGen:main from al4225:main
