add replicates, perform QC on individual replicates before merge#138
add replicates, perform QC on individual replicates before merge#138ljwharbers merged 7 commits intodevfrom
Conversation
|
There was a problem hiding this comment.
Pull request overview
This PR introduces replicate awareness to the lrsomatic Nextflow pipeline, aiming to run QC on individual replicates before merging them for downstream analysis.
Changes:
- Extend samplesheet/schema parsing to carry
tumor_replicate/normal_replicateand propagate a unifiedmeta.replicate. - Run CRAMINO QC earlier (intended per-replicate) and publish CRAMINO_PRE outputs into replicate-specific directories.
- Update nf-test snapshot outputs to reflect replicate-specific QC directories and an added multi-replicate sample case.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| workflows/lrsomatic.nf | Adds replicate to meta, moves CRAMINO_PRE earlier, attempts to strip replicate before merge/concat. |
| subworkflows/local/utils_nfcore_lrsomatic_pipeline/main.nf | Parses new replicate columns and populates meta.replicate. |
| conf/modules.config | Updates CRAMINO_PRE publishDir to include replicate in the output path. |
| assets/schema_input.json | Adds tumor_replicate and normal_replicate fields with defaults. |
| tests/default.nf.test.snap | Updates expected outputs for replicate-specific QC directories and new sample outputs. |
Comments suppressed due to low confidence (1)
workflows/lrsomatic.nf:188
- After stripping
replicatefrommeta, the channel is not regrouped, so replicate rows will continue through as separate items (andSAMTOOLS_CATwill never seebam.size() > 1). This can lead to duplicate downstream processing and incorrect tumor/normal pairing in latergroupTuple(size: 2)logic. Regroup by the replicate-free meta (e.g.,groupTuple()then flatten) before branching/concatenation so replicates are merged into a single BAM list/file per sample/type.
ch_samplesheet
.map{ meta, bam ->
def new_meta = meta.subMap('id',
'paired_data',
'type',
'platform',
'sex',
'fiber',
'clair3_model',
'clairS_model',
'clairSTO_model',
'kinetics')
return[new_meta, bam]
}
.set{ch_samplesheet_no_rep}
// ch_samplesheet -> meta: [id, paired_data, platform, sex, type, fiber, basecall_model]
// bam: list of unaligned bams
ch_split = ch_samplesheet_no_rep
.branch { meta, bam ->
single: bam.size() == 1
multiple: bam.size() > 1
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
workflows/lrsomatic.nf
Outdated
|
|
||
| downloaded_clair3_models = PREPARE_REFERENCE_FILES.out.downloaded_clair3_models | ||
|
|
||
| ch_samplesheet.view() |
There was a problem hiding this comment.
ch_samplesheet.view() looks like a debugging statement and will spam logs / potentially expose input paths in normal runs. Consider removing it or gating behind a dedicated debug/trace parameter.
| ch_samplesheet.view() | |
| if (params.trace_samplesheet) { | |
| ch_samplesheet.view() | |
| } |
workflows/lrsomatic.nf
Outdated
| .mix ( ch_split.single ) | ||
| .set { ch_cat_ubams } | ||
|
|
||
| ch_cat_ubams.view() |
There was a problem hiding this comment.
ch_cat_ubams.view() looks like a debugging statement and will spam logs / potentially expose input paths in normal runs. Consider removing it or gating behind a dedicated debug/trace parameter.
| ch_cat_ubams.view() | |
| if( params.debug ) { | |
| ch_cat_ubams.view() | |
| } |
workflows/lrsomatic.nf
Outdated
| include { MINIMAP2_INDEX } from '../modules/nf-core/minimap2/index/main' | ||
| include { MINIMAP2_ALIGN } from '../modules/nf-core/minimap2/align/main' | ||
| include { CRAMINO as CRAMINO_PRE } from '../modules/local/cramino/main' | ||
| include { CRAMINO as CRAMINO_PRE_REPLICATES } from '../modules/local/cramino/main' |
There was a problem hiding this comment.
CRAMINO_PRE_REPLICATES is imported but never used. Please remove the unused include or switch the per-replicate QC invocation to use this alias (and keep CRAMINO_PRE for any non-replicate use) to avoid confusion and potential lint failures.
| include { CRAMINO as CRAMINO_PRE_REPLICATES } from '../modules/local/cramino/main' |
| .map { meta, bam_tumor, bam_normal -> | ||
| [ meta, bam_tumor.flatten(), bam_normal.flatten() ] | ||
| } | ||
| .view() |
There was a problem hiding this comment.
The .view() call in PIPELINE_INITIALISATION appears to be left-over debugging and will print samplesheet entries (including file paths) to stdout for every run. Consider removing it or gating it behind a debug/trace parameter.
| .view() |
PR checklist
nf-core pipelines lint).nextflow run . -profile test,docker --outdir <OUTDIR>).nextflow run . -profile debug,test,docker --outdir <OUTDIR>).docs/usage.mdis updated.docs/output.mdis updated.CHANGELOG.mdis updated.README.mdis updated (including new tool citations and authors/contributors).