sldsc: fix compute_sldsc_M_ref reference panel; add target_labels relabeling#489

Merged
gaow merged 1 commit into StatFunGen:main from al4225:sldsc-mref-fix-and-target-labels
May 13, 2026

al4225 commented May 12, 2026

Summary

Two changes to the sLDSC post-processing in R/sldsc_wrapper.R:

1. compute_sldsc_M_ref — count the reference panel, not the regression set

M_ref is the number of reference-panel SNPs over which partitioned heritability is defined:
h²(C) = Σ_{j=1}^{M_ref} a_C(j) · Σ_{C'} τ_{C'} · a_{C'}(j), with j running over the reference-panel SNPs. It is panel-defined (it matches polyfun's .l2.M for allm and .l2.M_5_50 for m50), not the regression (HapMap3) SNP set (~1M SNPs).
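To make the role of M_ref concrete, here is a minimal Python sketch (not the package's R code) that evaluates h²(C) by summing over every row of a per-SNP annotation matrix. The names `partitioned_h2`, `annot`, and `tau` are hypothetical and the toy numbers are invented; the only point is that the sum runs over all reference-panel SNPs, so feeding it a subset (such as the regression SNPs) gives a smaller total.

```python
def partitioned_h2(annot, tau, c):
    """Evaluate h2(C) = sum_j a_C(j) * sum_{C'} tau_{C'} * a_{C'}(j).

    annot -- one row of annotation values per reference-panel SNP
             (len(annot) plays the role of M_ref)
    tau   -- per-annotation coefficients tau_{C'}
    c     -- column index of the annotation C of interest
    """
    total = 0.0
    for row in annot:
        # Per-SNP heritability under the S-LDSC model: sum_{C'} tau_{C'} a_{C'}(j)
        per_snp = sum(t * a for t, a in zip(tau, row))
        # Weight it by the SNP's value in annotation C
        total += row[c] * per_snp
    return total


# Toy panel: column 0 is a base annotation (all 1), column 1 a binary target.
annot = [[1, 1], [1, 0], [1, 1], [1, 0]]
tau = [0.1, 0.2]

print(partitioned_h2(annot, tau, 1))      # sums over all 4 panel SNPs
print(partitioned_h2(annot[:2], tau, 1))  # sums over a 2-SNP subset only
```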

Previously, when maf_cutoff == 0, the function counted the .l2.ldscore rows of target_anno_dir. For snplist runs that directory holds the HM3 regression set (~1M SNPs), not the reference panel (~8M SNPs), so allm + snplist runs reported an M_ref about 8× too small and therefore a τ* / EnrichStat about 8× too small (e.g. meta τ* 0.018 instead of 0.137).
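The ~8× figure follows from how M_ref enters the standardized effect size. Assuming the usual definition τ*_C = M_ref · sd_C · τ_C / h²_g (sd_C being the standard deviation of annotation C and h²_g the SNP heritability; this form is an assumption about what the pipeline computes, not read from sldsc_wrapper.R), τ* is linear in M_ref when everything else is fixed, so the correction is simply the ratio of the two counts. The helper names below are hypothetical.

```python
def tau_star(tau_c, sd_c, h2g, m_ref):
    """Standardized effect size, assuming tau*_C = M_ref * sd_C * tau_C / h2_g."""
    return m_ref * sd_c * tau_c / h2g


def m_ref_correction(m_old, m_new):
    """With tau_C, sd_C and h2_g held fixed, tau* scales by m_new / m_old."""
    return m_new / m_old


# M_ref counts quoted in this PR for ADSP_allm_snplist (rounded)
factor = m_ref_correction(1.04e6, 8.13e6)
print(f"correction factor ~ {factor:.2f}")          # 7.82
print(f"0.018 rescaled    ~ {0.018 * factor:.3f}")  # 0.141
```

The rescaled 0.141 is in line with the reported 0.137, given that 0.018 is itself a rounded value.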

Now it counts the reference panel from the .frq files:

  • maf_cutoff == 0 → all rows (matches polyfun .l2.M)
  • maf_cutoff > 0 → MAF > cutoff rows (matches polyfun .l2.M_5_50)
  • target_anno_dir is kept only as a fallback (with a warning) when no .frq dir is supplied.
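The counting rule above can be sketched in Python as follows. This is a hypothetical illustration, not the R implementation: it assumes whitespace-delimited PLINK-style .frq files (header with a MAF column, optionally gzipped) in the frq directory, and .l2.ldscore files with one header line for the fallback; `count_m_ref` and its parameters are invented names.

```python
import glob
import gzip
import os
import warnings


def _open_text(path):
    # Panel files are often gzipped; handle both plain and .gz text.
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def _find(directory, suffix):
    # Match both "*<suffix>" and "*<suffix>.gz", in a stable order.
    return sorted(glob.glob(os.path.join(directory, f"*{suffix}"))
                  + glob.glob(os.path.join(directory, f"*{suffix}.gz")))


def _count_frq_rows(path, maf_cutoff):
    """Count SNPs in one .frq file: all rows if maf_cutoff == 0, else MAF > cutoff."""
    n = 0
    with _open_text(path) as fh:
        maf_col = fh.readline().split().index("MAF")
        for line in fh:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            if maf_cutoff == 0 or float(fields[maf_col]) > maf_cutoff:
                n += 1
    return n


def count_m_ref(frq_dir=None, maf_cutoff=0.0, target_anno_dir=None):
    """Hypothetical sketch: count M_ref from the reference panel's .frq files."""
    if frq_dir is not None:
        paths = _find(frq_dir, ".frq")
        if not paths:
            raise FileNotFoundError(f"no .frq files in {frq_dir}")
        return sum(_count_frq_rows(p, maf_cutoff) for p in paths)

    if target_anno_dir is None:
        raise ValueError("need frq_dir (preferred) or target_anno_dir (fallback)")
    # Fallback only: these rows may be the regression set, not the panel.
    # This sketch ignores maf_cutoff in the fallback path.
    warnings.warn("no .frq dir supplied; counting .l2.ldscore rows instead, "
                  "which may undercount M_ref")
    total = 0
    for p in _find(target_anno_dir, ".l2.ldscore"):
        with _open_text(p) as fh:
            next(fh)  # skip the header line
            total += sum(1 for line in fh if line.strip())
    return total
```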

Enrichment is unaffected (polyfun computes it independently of M_ref); only τ* / EnrichStat standardization changes. m50 / m50_snplist runs were already correct.

2. sldsc_postprocessing_pipeline — new optional target_labels argument

When given (same length and order as the resolved target_categories), every target column and tau*-block column name in the output is renamed to those labels; params$target_categories then holds the labels, and params$target_categories_orig keeps the original polyfun .results names (ANNOT_0, L2_0, …). When NULL (the default), nothing is renamed, preserving the original behaviour. This lets downstream output show, e.g., quantile_eQTL instead of ANNOT_0.
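A minimal Python sketch of the relabeling contract described above (the real argument lives in the R function sldsc_postprocessing_pipeline). For illustration only, it assumes that output column names equal the category names exactly and that params is a dict; `relabel_targets` is a hypothetical name.

```python
def relabel_targets(columns, params, target_labels=None):
    """Rename target columns to user labels; keep the original names in params.

    columns       -- output column names (targets assumed to match exactly)
    params        -- dict holding "target_categories" (polyfun .results names)
    target_labels -- None (no renaming) or labels in the same order and length
    """
    if target_labels is None:
        return list(columns), dict(params)  # default: original behaviour

    orig = list(params["target_categories"])
    if len(target_labels) != len(orig):
        raise ValueError("target_labels must have the same length and order "
                         "as target_categories")

    mapping = dict(zip(orig, target_labels))
    new_columns = [mapping.get(col, col) for col in columns]

    new_params = dict(params)
    new_params["target_categories"] = list(target_labels)
    new_params["target_categories_orig"] = orig
    return new_columns, new_params


cols, p = relabel_targets(["trait", "ANNOT_0"],
                          {"target_categories": ["ANNOT_0"]},
                          target_labels=["quantile_eQTL"])
print(cols)  # ['trait', 'quantile_eQTL']
print(p["target_categories"], p["target_categories_orig"])  # ['quantile_eQTL'] ['ANNOT_0']
```

Only names change and values are never touched, so the relabeling is cosmetic.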

Testing

End-to-end on the updated_pipeline_by_gao/test MWE (postprocess + meta_subset), both with and without --target-categories-label:

  • with label → params$target_categories = quantile_eQTL, target_categories_orig = ANNOT_0, all target columns relabeled, meta_subset inherits the label;
  • without label → unchanged (ANNOT_0, no target_categories_orig);
  • numeric values (enrichment, τ*) are identical between the two runs; the relabel is cosmetic only.
  • M_ref fix verified on ADSP_allm_snplist: M_ref 1.04M → 8.13M, meta τ* 0.018 → 0.137, Enrichment unchanged.

man/compute_sldsc_M_ref.Rd and man/sldsc_postprocessing_pipeline.Rd regenerated via roxygen2.

🤖 Generated with Claude Code

…abeling

compute_sldsc_M_ref:
  M_ref is the REFERENCE-PANEL SNP count over which heritability is partitioned
  (h2(C) = sum_{j in M_ref} a_C(j) sum_{C'} tau_{C'} a_{C'}(j)) — it is panel-
  defined, not the regression/HapMap3 SNP set. Previously, under maf_cutoff == 0
  it counted the .l2.ldscore rows of target_anno_dir, which is the HM3 regression
  set (~1M) for snplist runs instead of the reference panel (~8M) — making
  "allm + snplist" runs report a ~8x too-small M_ref and hence a ~8x too-small
  tau*. Now it counts the reference panel from the .frq files: all rows when
  maf_cutoff == 0 (matches polyfun's .l2.M), MAF > cutoff rows when
  maf_cutoff > 0 (matches polyfun's .l2.M_5_50). target_anno_dir is kept only as
  a fallback (with a warning) when no .frq dir is given. Enrichment is unchanged
  (polyfun computes it without M_ref); only tau* / EnrichStat standardization is
  affected. m50 / m50_snplist were already correct.

sldsc_postprocessing_pipeline: new optional `target_labels` argument. When given
  (same length & order as the resolved target_categories), every "target" column
  and tau*-block column name in the output is renamed to those labels;
  params$target_categories then holds the labels and
  params$target_categories_orig keeps the original polyfun .results names. When
  NULL (default) nothing is renamed — original behaviour.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
gaow merged commit 177735b into StatFunGen:main on May 13, 2026
5 checks passed