sldsc_postprocessing_pipeline: align sd_annot names to polyfun .results categories (covers --snp-list mode) #488
Merged
Polyfun appends "_<file_idx>" to LD score column names when writing .results
Category, where the target annotation is file_idx=0. Two cases now handled:
1. compute_ldscores.py path (no --snp-list): preserves .annot.gz column names,
so .results target Category = "<annot_col>_0". Add paste0("_0") to
sd_annot_full / is_binary_full names so intersect() with .results categories
matches.
2. ldsc.py --l2 path (with --snp-list): hardcodes LD score col to "L2" (single)
or "<annot_col>L2" (joint), so paste0("_0") on .annot.gz names gives the
wrong key. Fall back to positional rename: take the first
length(sd_annot_full) rows of .results Category as target_categories
(polyfun puts target before baseline because file_idx=0 < 1), rename
sd_annot_full / is_binary_full to those names, and emit an INFO message
with old/new names and the baseline count for traceability.
Without this, postprocess silently produced empty target_categories whenever
the pipeline ran with --snp-list, breaking downstream meta-analysis.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
## Summary

`sldsc_postprocessing_pipeline()` matches `sd_annot_full` (named after `.annot.gz` columns) against polyfun's `.results.Category`. The previous code only worked when polyfun preserved the original annot column name. This PR makes the matching robust to all four pipeline configurations (single | joint × snp-list | no-snp-list).

## Background: the naming asymmetry
Polyfun appends `_<file_idx>` to LD score column names when writing `.results.Category` (target = `file_idx=0`). The LD score column name itself depends on which polyfun script wrote the file, and the new pipeline branches on `--snp-list`:

| Configuration | polyfun script | LD score column | `.results` target |
| --- | --- | --- | --- |
| no `--snp-list`, single | `compute_ldscores.py` | `ANNOT` | `ANNOT_0` |
| `--snp-list`, single | `ldsc.py --l2` | `L2` (`ldsc.py:317`) | `L2_0` |
| no `--snp-list`, joint (N) | `compute_ldscores.py` | `A1, A2, …` | `A1_0, A2_0, …` |
| `--snp-list`, joint (N) | `ldsc.py --l2` | `<annot>L2` per col | `A1L2_0, A2L2_0, …` |

`compute_sldsc_annot_sd()` only reads `.annot.gz`, so its return names never see the polyfun side. Without alignment, `intersect()` was empty whenever `--snp-list` was used → empty `target_categories` → cryptic downstream failure.

## Fix
Two-stage match in `sldsc_postprocessing_pipeline`:

1. Apply `paste0("_0")` on `sd_annot_full` / `is_binary_full` names (covers no-snp-list).
2. If `intersect()` is empty, take the first `length(sd_annot_full)` rows of `.results.Category` as targets and rename positionally (covers snp-list).

Polyfun puts `file_idx=0` rows first in `.results`, so position alignment is safe across all 4 branches.

## Why this approach
- Hardcoding the `L2` / `<annot>L2` rule: brittle.
- Passing a `use_snp_list` flag from pipeline → pecotmr would couple the two repos.
- Position alignment (`file_idx=0 < 1`) needs no flag and works for any future polyfun script with the same convention.

## Validation
- `.results`: ADSP allm (`ANNOT_0`, stage 1) and 1000G allm_snplist (`L2_0`, stage 2) both resolve correctly.
- `validate_pecotmr_fix.sh`: COMPLETED 0:0 / 9:41; stage 1 path unchanged.
- `<ctx>.sldsc_postprocess.rds` (`per_trait[96]` + `meta{tau_star, enrichment, enrichstat}`) and 5 `<group>.meta.rds` per context; stage 2 confirmed at scale.

## Test plan
- `bash xqtl-protocol/test/scripts/validate_pecotmr_fix.sh`
- Run on a `--snp-list` `.results` (target = `L2_0`) and confirm fallback message in logs
- Run on a no-snp-list `.results` (target = `ANNOT_0`) and confirm no fallback
- Joint with `--snp-list`: verify `target_categories = <a>L2_0, <b>L2_0`

## Files
- `R/sldsc_wrapper.R` (+32 lines, single function modified)

🤖 Generated with Claude Code
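
For reviewers who want to reason about the two-stage match without opening the diff, the logic described under Fix can be sketched in Python. This is only an illustration of the matching rule, not the R code in `R/sldsc_wrapper.R`: the function name `align_annot_names`, its return shape, and the baseline category names in the examples are hypothetical.

```python
from typing import List, Sequence, Tuple


def align_annot_names(
    annot_names: Sequence[str],
    results_categories: Sequence[str],
) -> Tuple[List[str], bool]:
    """Map .annot.gz column names onto .results Category names.

    Returns (aligned_names, used_fallback). Hypothetical helper that
    mirrors the two stages described in this PR.
    """
    categories = set(results_categories)

    # Stage 1: polyfun appends "_<file_idx>" and the target is file_idx=0,
    # so try "<annot_col>_0" first (the no-snp-list path).
    suffixed = [f"{name}_0" for name in annot_names]
    if any(name in categories for name in suffixed):
        return suffixed, False

    # Stage 2: no overlap, so fall back to position. Target rows
    # (file_idx=0) are assumed to precede baseline rows (file_idx=1).
    n = len(annot_names)
    if len(results_categories) < n:
        raise ValueError("fewer .results categories than annotations")
    return list(results_categories[:n]), True


# No --snp-list, single annotation: stage 1 matches, no fallback.
stage1 = align_annot_names(["ANNOT"], ["ANNOT_0", "base_1"])

# --snp-list, single annotation: stage 1 misses, positional rename.
stage2 = align_annot_names(["ANNOT"], ["L2_0", "baseL2_1"])

# --snp-list, joint annotations: first two rows become the targets.
joint = align_annot_names(["A1", "A2"], ["A1L2_0", "A2L2_0", "baseL2_1"])
```

The fallback is triggered only when stage 1 finds no overlap at all, matching the "`intersect()` is empty" condition in the description; how a partial overlap should be handled is not stated in this PR and is left out of the sketch.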