sldsc_enrichment: fix snp-list mode + meta_subset tau_star handling by al4225 · Pull Request #1319 · StatFunGen/xqtl-protocol

al4225 · 2026-05-08T15:02:03Z

Summary

Two independent fixes to code/enrichment/sldsc_enrichment.ipynb to make the new pipeline runnable at production scale:

[make_annotation_files_ldscore]: reshape annot dataframe so ldsc.py --print-snps accepts it (snp-list mode).
[meta_subset]: split tau_star output into tau_star_single / tau_star_joint and project per-trait summaries through the view helper, so meta_sldsc_random finds the bare column names it expects.

Fix 1 — `[make_annotation_files_ldscore]` snp-list mode (commit `d89e3918`)

When --snp-list is set, Step C invokes polyfun/ldsc.py --l2 --print-snps (instead of compute_ldscores.py). ldsc.py reads columns strictly by position: cols 0..3 as CHR/BP/SNP/CM and cols 4+ as numeric annotations. It also requires the .annot SNP set to equal the .bim SNP set, in identical row order. The previous Step A output violated both constraints:

| Constraint | Old behavior | Symptom |
|---|---|---|
| Cols 4+ must be numeric | Wrote `A1/A2/MAF` as cols 4-6 | `TypeError: can't multiply sequence by non-int of type 'float'` (1000G) |
| `.annot` rows == `.bim` rows | Used merged-input row set | `ValueError: shapes (634887,) (1698778,) not broadcastable` (ADSP) |
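To make the two constraints concrete, here is a small pre-flight check that would flag both failure modes before ldsc.py is invoked. The function name `check_annot_for_print_snps` and the row-of-strings input format are hypothetical. As described, the PR does not add such a check, and ldsc.py performs its own parsing.

```python
# Hypothetical pre-flight check mirroring the two ldsc.py --print-snps
# constraints: cols 4+ numeric, and .annot SNPs == .bim SNPs in the same order.
# annot_rows: data rows of a whitespace-delimited .annot file (header excluded),
# each a list of strings; bim_snps: the SNP ID column of the matching .bim.

def _is_number(value: str) -> bool:
    try:
        float(value)
        return True
    except ValueError:
        return False


def check_annot_for_print_snps(annot_rows, bim_snps):
    """Return a list of problems; an empty list means both constraints hold."""
    problems = []
    # Constraint 1: every column from index 4 onward must parse as a number.
    for i, row in enumerate(annot_rows):
        bad = [v for v in row[4:] if not _is_number(v)]
        if bad:
            problems.append(f"row {i}: non-numeric annotation value(s) {bad}")
            break  # report the first offending row only
    # Constraint 2: the SNP column (index 2) must match the .bim, row for row.
    annot_snps = [row[2] for row in annot_rows]
    if len(annot_snps) != len(bim_snps):
        problems.append(f"row count {len(annot_snps)} != .bim count {len(bim_snps)}")
    elif annot_snps != list(bim_snps):
        problems.append("SNP order differs from .bim")
    return problems
```

An old-style row such as `["1", "100", "rs1", "0", "A", "G", "0.3", "1"]` trips the first check because `A` and `G` sit in annotation positions. A row set smaller than the .bim trips the second check, which is the same mismatch ADSP reported.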

Add a small normalize_for_ldsc() helper, applied to both single and joint annot dataframes before fwrite when use_print_snps is true:

drops A1/A2/MAF/CM (CM is re-sourced),
left-joins to .bim SNP set, fills 0 for missing SNPs,
takes CM from .bim (authoritative; ADSP .bim has CM=0, 1000G has real cM),
reorders to CHR-BP-SNP-CM-<annot…> matching .bim row order.
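The notebook's normalize_for_ldsc() is R code operating on data frames written with fwrite. The following is only a Python transliteration of the four steps listed above, using plain dicts. It assumes the left join is keyed on the SNP ID and that .bim rows carry CHR, SNP, CM and BP fields. Both assumptions, and the dict layout, are illustrative and not taken from the notebook.

```python
# Python sketch of the four normalize_for_ldsc() steps (not the notebook's R code).
# Assumptions: annot rows are dicts keyed by column name; bim rows are dicts
# with CHR, SNP, CM, BP; the join key is the SNP ID.

DROP = {"A1", "A2", "MAF", "CM"}   # step 1: CM is dropped here, re-sourced below
KEY = ("CHR", "BP", "SNP")


def normalize_for_ldsc(annot_rows, bim_rows):
    # Annotation columns: everything that is neither a key nor a dropped column,
    # kept in first-seen order.
    annot_cols = []
    for row in annot_rows:
        for col in row:
            if col not in DROP and col not in KEY and col not in annot_cols:
                annot_cols.append(col)
    by_snp = {row["SNP"]: row for row in annot_rows}
    out = []
    for b in bim_rows:                      # step 4: .bim order drives output order
        src = by_snp.get(b["SNP"], {})      # step 2: left join onto the .bim SNP set
        rec = {"CHR": b["CHR"], "BP": b["BP"], "SNP": b["SNP"],
               "CM": b["CM"]}               # step 3: CM taken from the .bim
        for col in annot_cols:
            rec[col] = src.get(col, 0)      # step 2: SNPs missing from annot get 0
        out.append(rec)
    return out, ["CHR", "BP", "SNP", "CM"] + annot_cols
```

The returned column list fixes the CHR-BP-SNP-CM-<annot> order, so a writer only has to emit the rows in that order for ldsc.py's positional parser to accept them.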

No-op when --snp-list is not set (compute_ldscores.py path is unchanged).

Fix 2 — `[meta_subset]` view helper for tau_star (commit `c08438db`)

postprocess writes per_trait[i]$summary with wide column names so a single per-trait list can hold both modes:

target, is_binary,
tau_single, tau_se_single, tau_star_single, tau_star_se_single,
enrichment_single, enrichment_se_single, enrichment_p_single,
enrichstat_single, enrichstat_se_single,
tau_joint, tau_se_joint, tau_star_joint, tau_star_se_joint,
enrichment_joint, ...

But meta_sldsc_random looks up bare names (tau_star, tau_star_se, etc.). The previous meta_subset cell passed subset_per_trait directly to meta_sldsc_random(..., "tau_star") → no bare tau_star column → all 96 traits skipped → out$tau_star was a list of NA.
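The failure mode can be reproduced in a few lines. The collector below is hypothetical and stands in for meta_sldsc_random's column lookup only; it is not its implementation. It looks up a bare `<stat>` / `<stat>_se` pair and skips any trait that lacks them. Because the wide summaries only carry suffixed names, every trait is skipped.

```python
# Hypothetical stand-in for a bare-name lookup (not meta_sldsc_random itself).
# Field names follow the wide per-trait summary layout listed in the PR.

def collect_for_meta(per_trait, stat):
    """Gather (estimate, se) pairs for `stat`; count traits lacking the bare names."""
    values, skipped = [], 0
    for summary in per_trait:
        if stat not in summary or f"{stat}_se" not in summary:
            skipped += 1          # bare column missing: trait silently dropped
            continue
        values.append((summary[stat], summary[f"{stat}_se"]))
    return values, skipped


wide_summary = {"target": "t1", "is_binary": False,
                "tau_star_single": 0.10, "tau_star_se_single": 0.02,
                "tau_star_joint": 0.05, "tau_star_se_joint": 0.03}
values, skipped = collect_for_meta([wide_summary] * 96, "tau_star")
```

With 96 wide summaries, `values` comes back empty and `skipped` is 96, matching the "all 96 traits skipped" symptom.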

Project subset_per_trait through pecotmr:::.sldsc_view_for_meta() once per mode (single | joint) before calling meta_sldsc_random. Output structure now mirrors postprocess for the "all" group:

out$tau_star_single   (per-target meta over single-tau)
out$tau_star_joint    (per-target meta over joint-tau)
out$enrichment        (single only — joint enrichment isn't well-defined)
out$enrichstat        (single only)
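The per-mode projection can be sketched as a suffix-stripping view. This is a hypothetical Python analogue of what pecotmr:::.sldsc_view_for_meta() is described as doing; its actual R code is not shown in this PR. It assumes wide names follow the `<stat>_<mode>` pattern shown above (e.g. `tau_star_se_single`) and that unsuffixed columns such as `target` and `is_binary` are shared by both modes.

```python
# Hypothetical per-mode view over a wide per-trait summary, in the spirit of
# pecotmr:::.sldsc_view_for_meta(). Assumes wide names are <stat>_<mode>.

MODES = ("single", "joint")


def view_for_meta(summary, mode):
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}")
    suffix = f"_{mode}"
    view = {}
    for col, value in summary.items():
        if col.endswith(suffix):
            view[col[: -len(suffix)]] = value   # tau_star_single -> tau_star
        elif not any(col.endswith(f"_{m}") for m in MODES):
            view[col] = value                    # shared columns kept as-is
        # columns belonging to the other mode are dropped from this view
    return view
```

Projecting once per mode yields summaries with the bare `tau_star` / `tau_star_se` names the meta step expects. Following the output structure above, only the `single` view would feed the enrichment and enrichstat meta-analyses.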

Validation

MWE end-to-end (test/scripts/validate_pecotmr_fix.sh): job state COMPLETED, exit code 0:0, in 9:41 (full make_annotation → get_heritability → postprocess → meta_subset run for category1).
Production (1000G allm_snplist + 1000G m50_snplist, 6 contexts × 96 traits, ROSMAP_eQTL_{Ast,Inh,Mic}_mega): all 6 jobs produced complete <ctx>.sldsc_postprocess.rds and 5 <group>.meta.rds per context (brain / blood / brain_neurodegenerative / brain_psychiatric / brain_imaging).

Dependencies

This PR depends on the matching pecotmr PR (StatFunGen/pecotmr#488) for sd_annot ↔ polyfun .results category alignment. Without that, postprocess fails on snp-list-mode .results (target Category = L2_0/<annot>L2_0 instead of the .annot.gz column name).

Files

code/enrichment/sldsc_enrichment.ipynb (2 commits, +33 / -3 lines total)

🤖 Generated with Claude Code

…in snp-list mode When --snp-list is set, Step C invokes polyfun's ldsc.py --l2 --print-snps (rather than compute_ldscores.py). ldsc.py strict-positionally reads cols 0..3 as CHR/BP/SNP/CM and cols 4+ as numeric annotations, and requires the .annot SNP set to equal the .bim SNP set in identical order. Add a small normalize_for_ldsc() helper, applied to both single and joint annot dataframes before fwrite when use_print_snps is true. No-op otherwise. Without this, snp-list-flavored settings (ADSP allm_snplist / m50_snplist, 1000G allm_snplist / m50_snplist) failed Step C with TypeError on 1000G (A1/A2 strings parsed as numeric) or ValueError on ADSP (annot vs bim shape mismatch). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…view helper postprocess writes per_trait[i]$summary with wide names (tau_star_single, tau_star_joint, ...) so a single per-trait list can hold both modes. But meta_sldsc_random looks up bare names (tau_star, tau_star_se), so passing subset_per_trait directly returned NULLs for all 96 traits, leaving meta output empty. Project subset_per_trait through pecotmr:::.sldsc_view_for_meta() once per mode (single | joint) before calling meta_sldsc_random. Output structure now mirrors postprocess for the "all" group: tau_star_single, tau_star_joint, enrichment, enrichstat. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

al4225 and others added 2 commits May 8, 2026 10:46

gaow merged commit 2e5034a into StatFunGen:main May 10, 2026

gaow merged 2 commits into StatFunGen:main from al4225:main

al4225 commented May 8, 2026
