sldsc: fix compute_sldsc_M_ref reference panel; add target_labels relabeling #489
Merged
…abeling
compute_sldsc_M_ref:
M_ref is the REFERENCE-PANEL SNP count over which heritability is partitioned
(h2(C) = sum_{j in M_ref} a_C(j) sum_{C'} tau_{C'} a_{C'}(j)) — it is panel-
defined, not the regression/HapMap3 SNP set. Previously, under maf_cutoff == 0
it counted the .l2.ldscore rows of target_anno_dir, which is the HM3 regression
set (~1M) for snplist runs instead of the reference panel (~8M) — making
"allm + snplist" runs report a ~8x too-small M_ref and hence a ~8x too-small
tau*. Now it counts the reference panel from the .frq files: all rows when
maf_cutoff == 0 (matches polyfun's .l2.M), MAF > cutoff rows when
maf_cutoff > 0 (matches polyfun's .l2.M_5_50). target_anno_dir is kept only as
a fallback (with a warning) when no .frq dir is given. Enrichment is unchanged
(polyfun computes it without M_ref); only tau* / EnrichStat standardization is
affected. m50 / m50_snplist were already correct.
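
The counting rule described above can be sketched as follows. This is a minimal illustration, not the package's `compute_sldsc_M_ref`: the helper name `count_ref_panel_snps`, the toy file names, and the assumption that each `.frq` file is a whitespace-delimited PLINK frequency table with a `MAF` column are all hypothetical.

```r
# Hypothetical helper (not the package code): count reference-panel SNPs
# across per-chromosome .frq files.
count_ref_panel_snps <- function(frq_files, maf_cutoff = 0) {
  per_file <- vapply(frq_files, function(path) {
    # Assumes a header row that includes a MAF column.
    frq <- read.table(path, header = TRUE, stringsAsFactors = FALSE)
    if (maf_cutoff > 0) {
      as.numeric(sum(frq$MAF > maf_cutoff, na.rm = TRUE))  # MAF > cutoff rows
    } else {
      as.numeric(nrow(frq))                                # all rows
    }
  }, numeric(1))
  sum(per_file)
}

# Toy data: two tiny per-chromosome files in a temporary directory.
tmp <- tempfile("frq_demo")
dir.create(tmp)
chr1 <- data.frame(CHR = 1, SNP = paste0("rs", 1:4), A1 = "A", A2 = "G",
                   MAF = c(0.01, 0.06, 0.20, 0.50), NCHROBS = 1000)
chr2 <- data.frame(CHR = 2, SNP = paste0("rs", 5:7), A1 = "C", A2 = "T",
                   MAF = c(0.03, 0.04, 0.30), NCHROBS = 1000)
write.table(chr1, file.path(tmp, "demo.1.frq"), quote = FALSE, row.names = FALSE)
write.table(chr2, file.path(tmp, "demo.2.frq"), quote = FALSE, row.names = FALSE)
frq_files <- list.files(tmp, pattern = "\\.frq$", full.names = TRUE)

m_all  <- count_ref_panel_snps(frq_files, maf_cutoff = 0)     # every panel SNP
m_5_50 <- count_ref_panel_snps(frq_files, maf_cutoff = 0.05)  # common SNPs only
```

With the toy files, the first call counts all seven rows and the second counts only rows with MAF above 0.05, mirroring the `.l2.M` versus `.l2.M_5_50` distinction.
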
sldsc_postprocessing_pipeline: new optional `target_labels` argument. When given
(same length & order as the resolved target_categories), every "target" column
and tau*-block column name in the output is renamed to those labels;
params$target_categories then holds the labels and
params$target_categories_orig keeps the original polyfun .results names. When
NULL (default) nothing is renamed — original behaviour.
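
The relabeling behaviour can be illustrated with a small sketch. The function `apply_target_labels`, the `results` / `tau_block` toy objects, and their column layout are hypothetical stand-ins for illustration; only the `params$target_categories` / `params$target_categories_orig` convention and the `ANNOT_0` to `quantile_eQTL` example come from the description.

```r
# Hypothetical sketch of the relabeling step (not the pipeline's actual code).
apply_target_labels <- function(results, tau_block, target_categories,
                                target_labels = NULL) {
  params <- list(target_categories = target_categories)
  if (is.null(target_labels)) {
    # Default: nothing is renamed and no *_orig entry is created.
    return(list(results = results, tau_block = tau_block, params = params))
  }
  # Labels must match the resolved categories in length and order.
  stopifnot(length(target_labels) == length(target_categories))
  lookup <- setNames(target_labels, target_categories)

  # Rename entries of the "target" column.
  hit <- results$target %in% target_categories
  results$target[hit] <- unname(lookup[results$target[hit]])

  # Rename tau*-block columns that are named after a category.
  col_hit <- colnames(tau_block) %in% target_categories
  colnames(tau_block)[col_hit] <- unname(lookup[colnames(tau_block)[col_hit]])

  params$target_categories <- target_labels
  params$target_categories_orig <- target_categories
  list(results = results, tau_block = tau_block, params = params)
}

# Toy inputs; the numeric value is only a placeholder.
results   <- data.frame(target = "ANNOT_0", tau_star = 0.137,
                        stringsAsFactors = FALSE)
tau_block <- data.frame(ANNOT_0 = 0.137)

labeled   <- apply_target_labels(results, tau_block, "ANNOT_0", "quantile_eQTL")
unlabeled <- apply_target_labels(results, tau_block, "ANNOT_0")
```

In this sketch only names change; the numeric columns are passed through untouched, which is the "cosmetic only" property the relabel is meant to have.
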
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
## Summary

Two changes to the sLDSC post-processing in `R/sldsc_wrapper.R`:

### 1. `compute_sldsc_M_ref`: count the reference panel, not the regression set

`M_ref` is the reference-panel SNP count over which partitioned heritability is defined: `h²(C) = Σ_{j∈M_ref} a_C(j)·Σ_{C'} τ_{C'}·a_{C'}(j)`. It is panel-defined (it matches polyfun's `.l2.M` for `allm` / `.l2.M_5_50` for `m50`), not the regression / HapMap3 SNP set (~1M).

Previously, under `maf_cutoff == 0` the function counted the `.l2.ldscore` rows of `target_anno_dir`. For snplist runs that directory holds the HM3 regression set (~1M), not the reference panel (~8M), so `allm + snplist` runs reported a ~8× too-small `M_ref`, and therefore a ~8× too-small `τ*` / `EnrichStat` (e.g. meta `τ*` 0.018 instead of 0.137).

Now it counts the reference panel from the `.frq` files:

- `maf_cutoff == 0` → all rows (matches polyfun `.l2.M`)
- `maf_cutoff > 0` → `MAF > cutoff` rows (matches polyfun `.l2.M_5_50`)

`target_anno_dir` is kept only as a fallback (with a warning) when no `.frq` dir is supplied.

`Enrichment` is unaffected (polyfun computes it independently of `M_ref`); only `τ*` / `EnrichStat` standardization changes. `m50` / `m50_snplist` runs were already correct.

### 2. `sldsc_postprocessing_pipeline`: new optional `target_labels` argument

When given (same length & order as the resolved `target_categories`), every `target` column and `tau*`-block column name in the output is renamed to those labels; `params$target_categories` then holds the labels and `params$target_categories_orig` keeps the original polyfun `.results` names (`ANNOT_0`, `L2_0`, …). When `NULL` (default) nothing is renamed, which is the original behaviour. This lets downstream output read e.g. `quantile_eQTL` instead of `ANNOT_0`.

## Testing

End-to-end on the `updated_pipeline_by_gao/test` MWE (postprocess + meta_subset), both with and without `--target-categories-label`:

- With the label: `params$target_categories = quantile_eQTL`, `target_categories_orig = ANNOT_0`, all `target` columns relabeled, `meta_subset` inherits the label;
- Without the label: original names kept (`ANNOT_0`, no `target_categories_orig`);
- Numeric results (including `τ*`) identical between the two; the relabel is cosmetic only.

`M_ref` fix verified on `ADSP_allm_snplist`: `M_ref` 1.04M → 8.13M, meta `τ*` 0.018 → 0.137, Enrichment unchanged.

`man/compute_sldsc_M_ref.Rd` and `man/sldsc_postprocessing_pipeline.Rd` regenerated via roxygen2.

🤖 Generated with Claude Code
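
As a quick consistency check on the reported numbers, the sketch below assumes `τ*` scales linearly with `M_ref` (which is what the "~8× too-small `M_ref`, hence ~8× too-small `τ*`" statement implies) and rescales the old `τ*` by the ratio of the two `M_ref` values. The variable names are illustrative; the inputs are the rounded figures quoted in this PR.

```r
# Rounded values quoted in the PR for ADSP_allm_snplist.
m_ref_old    <- 1.04e6   # .l2.ldscore rows of the HM3 regression set
m_ref_new    <- 8.13e6   # reference-panel rows from the .frq files
tau_star_old <- 0.018    # meta tau* before the fix
tau_star_new <- 0.137    # meta tau* after the fix

# Assumption: tau* is proportional to M_ref, everything else held fixed.
m_ref_ratio       <- m_ref_new / m_ref_old
tau_star_expected <- tau_star_old * m_ref_ratio

# Ratio actually observed between the two reported tau* values.
tau_star_ratio <- tau_star_new / tau_star_old
```

The two ratios agree to within the rounding of the quoted `τ*` values, which is consistent with the fix changing only the `M_ref` scaling factor.
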