Chain export_top_loci 1+2+3; fix chr-prefix bug; drop "export" jargon in mnm_postprocessing.ipynb #1323

Open
rfeng2023 wants to merge 2 commits into StatFunGen:main from rfeng2023:main

@rfeng2023
Contributor

Summary

  • One sos run ... export_top_loci now does export + top_loci + combine via SOS DAG (was two SLURM array scripts).
  • --combine auto|yes|no controls the combine step; auto-skips on --region-name / --region-list.
  • Output filenames drop the .export. / .export_sumstats. / toploci jargon, so legacy files can be found for purging by pattern (e.g. *.export.*).
  • Fix chr-prefix bug introduced by a pecotmr update: tabix_region now requires literal chr:start-end; three call sites updated.
  • --job_size default 50 → 1 (old default exceeded SLURM max_mem).
  • Refreshed example docs: removed the example written for the old fine-mapping output, added a placeholder legend to the export_top_loci examples, and added a modality reference for --fsusie/--metaQTL/--mnm.
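
The `--combine auto|yes|no` behaviour described above can be sketched as a small decision function. This is a minimal illustration of the stated rule only, not the notebook's actual code: the function name `should_combine` and its parameter names are hypothetical.

```python
from typing import Optional, Sequence


def should_combine(
    combine: str = "auto",
    region_names: Optional[Sequence[str]] = None,  # stands in for --region-name (hypothetical name)
    region_list: Optional[str] = None,             # stands in for --region-list (hypothetical name)
) -> bool:
    """Return True if the combine step (step 3) should run."""
    if combine == "yes":
        return True
    if combine == "no":
        return False
    if combine == "auto":
        # auto: skip the combine step whenever the run is restricted to a subset of regions
        subset_requested = bool(region_names) or bool(region_list)
        return not subset_requested
    raise ValueError(f"--combine must be auto, yes, or no; got {combine!r}")
```

Under this reading, a full run combines by default, a subset run does not, and `yes`/`no` override either case.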

Migration

  • Consumers of *.cis_results_db.export.rds / *.export_sumstats.rds → repoint to *.cis_results_db.rds / *.cis_results_db.sumstats.rds.
  • export_top_loci's --prefix / --suffix renamed to --export_prefix / --export_suffix.
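
For downstream consumers, the filename renames can be applied mechanically. The sketch below is a hypothetical helper (the name `migrate_filename` is not part of the PR) and assumes the glob patterns listed in this PR are literal suffix replacements, including `*.toploci.bed.gz` → `*.top_loci.bed.gz`; check it against real file names before relying on it.

```python
# Legacy suffix -> new suffix, taken from the rename patterns listed in this PR.
# Assumption: each pattern is a plain suffix swap on the file name.
RENAMES = [
    (".cis_results_db.export.rds", ".cis_results_db.rds"),
    (".export_sumstats.rds", ".cis_results_db.sumstats.rds"),
    (".toploci.bed.gz", ".top_loci.bed.gz"),
]


def migrate_filename(legacy: str) -> str:
    """Map a legacy output file name to its new name; unknown names pass through unchanged."""
    for old_suffix, new_suffix in RENAMES:
        if legacy.endswith(old_suffix):
            return legacy[: -len(old_suffix)] + new_suffix
    return legacy
```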

Testing

  • Dry run: full run, subset run, and subset run with --combine yes.
  • End-to-end SLURM run on 5 RDS files; all three steps green.

Feng added 2 commits May 13, 2026 15:59
…gon; fix chr-prefix bug

Previously, producing per-study top_loci files required two SLURM
array scripts run sequentially (one calling cis_results_export
--step1_only, then one calling export_top_loci per region). This
patch chains both stages plus a final combine into one sos run
invocation, fixes a latent chr-prefix bug surfaced by the chained
flow, and removes "export"/"exported" jargon from output filenames.

Changes:

* Alias export_top_loci_1 onto the cis_results_export_1 step body
  so `sos run ... export_top_loci` triggers export (step 1) ->
  top_loci extraction (step 2) -> combine (step 3) in one DAG.

* [export_top_loci_2]: glob-based input over step-1 RDS outputs;
  parameters renamed to export_prefix/export_suffix to avoid CLI
  collision with step-1 --prefix/--suffix. No --region required.

* New [export_top_loci_3]: combines per-region top_loci.bed.gz
  into {cwd}/summary/{name}.{qtl_type}[.{variant_tag}].top_loci.bed.gz.
  Bash combine tolerates empty per-region outputs. --combine
  auto|yes|no controls whether step 3 runs (auto-skips on
  --region-name / --region-list; override with yes/no).

* Drop "export"/"exported" jargon from output filenames so legacy
  outputs can be purged by pattern (e.g. *.export.*):
    *.cis_results_db.export.rds -> *.cis_results_db.rds
    *.export_sumstats.rds       -> *.cis_results_db.sumstats.rds
    *.toploci.bed.gz            -> *.top_loci.bed.gz

* Fix chr-prefix handling broken by a pecotmr update. tabix_region
  now requires the literal "chr:start-end" format (its parse_region
  rejects bare "10:..."). align_to_genoref was stripping the chr
  before the call and getting an empty tibble back, which surfaced
  as `object 'chr' not found`. Replace gsub("chr","",chr) with
  sub("^(chr)?","chr",chr) at three call sites.

* Lower global --job_size default 50 -> 1. With the old default,
  one SLURM task bundled 50 substeps x per-substep mem in parallel
  (e.g. 50 x 6G = 300G), exceeding typical max_mem caps. Users who
  want bundling pass --job_size N explicitly.

* Example Docs: consolidate four cis_results_export variant cells into a
  single workflow/modality reference; rewrite the export_top_loci
  examples with a placeholder legend explaining the
  {prefix}.{region}.{suffix} input filename pattern.
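
To illustrate the chr-prefix fix, here is a Python analogue of the two R expressions from the commit message. It is only a sketch of the string logic: the function names are hypothetical, and the real call sites are the R code in the notebook.

```python
import re


def strip_chr(chrom: str) -> str:
    """Old behaviour, like gsub("chr", "", chr): removes the prefix, yielding a bare "10"."""
    return chrom.replace("chr", "")


def ensure_chr(chrom: str) -> str:
    """New behaviour, like sub("^(chr)?", "chr", chr): always yields a single chr prefix."""
    return re.sub(r"^(chr)?", "chr", chrom, count=1)


def region_string(chrom: str, start: int, end: int) -> str:
    """Build the literal chr:start-end region string the PR says tabix_region now requires."""
    return f"{ensure_chr(chrom)}:{start}-{end}"
```

The replacement is idempotent, so it is safe whether or not the input already carries the prefix: both `ensure_chr("10")` and `ensure_chr("chr10")` give `"chr10"`.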