When `cnvkit call --purity` rescaled observed BAFs to estimate tumor BAFs, the formula could produce values outside [0, 1] -- and the negative or >1 results were written straight into the .cns "baf" column. This happens when the input VCF violates the heterozygous-germline-SNP assumption (e.g. somatic-only variants) or the purity estimate is too low.

`rescale_baf` now clamps output to [0, 1] and logs a warning with the count of affected segments, suggesting the likely causes. A small tolerance avoids spurious warnings for values within float-arithmetic noise of the boundary.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
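The clamping behaviour described in this commit message can be sketched roughly as follows. This is an illustrative sketch, not the code merged in the PR: the function name `rescale_baf_clamped`, the tolerance value `tol=1e-6`, and the warning text are assumptions. The rescaling formula itself assumes the observed BAF is a purity-weighted mix of the tumor BAF and a normal BAF of 0.5.

```python
import logging

import numpy as np

logger = logging.getLogger(__name__)


def rescale_baf_clamped(purity, observed_baf, normal_baf=0.5, tol=1e-6):
    """Estimate tumor BAF from observed BAF, clamped to [0, 1].

    Hypothetical sketch: `tol` and the warning wording are assumptions.
    Model: observed = tumor * purity + normal * (1 - purity)
    """
    observed = np.asarray(observed_baf, dtype=float)
    tumor_baf = (observed - normal_baf * (1 - purity)) / purity

    # Count only values that are outside [0, 1] by more than float noise.
    # NaN compares False on both sides, so missing values are not counted.
    out_of_range = (tumor_baf < -tol) | (tumor_baf > 1 + tol)
    n_bad = int(np.count_nonzero(out_of_range))
    if n_bad:
        logger.warning(
            "%d rescaled BAF value(s) fell outside [0, 1] and were clamped; "
            "the VCF may lack germline heterozygous SNPs, or the purity "
            "estimate may be too low",
            n_bad,
        )
    # np.clip leaves NaN untouched and pins everything else into [0, 1].
    return np.clip(tumor_baf, 0.0, 1.0)
```

With a purity of 0.5, an observed BAF of 0.1 rescales to -0.3 before clamping, which is the kind of value that previously reached the .cns output.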
Issue #616 noted that the VCF, BAF rescaling, and LOH detection parts of the docs were obscure. Restructure the docs and add load-time sanity checks that catch the most common misuses before they produce nonsense.

Docs:
* `doc/baf.rst` becomes the canonical guide -- expanded with sections on VCF preparation, sample identification (-i/-n, PEDIGREE), BAF rescaling math (with equations), allele-specific copy number / LOH, and a troubleshooting section that documents the load-time and rescale-time warnings.
* `doc/fileformats.rst` and `doc/pipeline.rst` now defer to baf.rst rather than duplicating VCF-prep guidance.

Code:
* `cnvlib/cmdutil.py` -- new `_warn_if_baf_input_suspicious` helper invoked from `load_het_snps`, warning when no heterozygous variants survive filtering or when the median allele frequency is far from 0.5 (the latter catches misuse like wrong sample IDs or somatic-only VCFs lacking a SOMATIC INFO tag).

Tests:
* `test/test_commands.py` adds `LoadHetSnpsTests` with seven cases covering empty inputs, skewed/balanced distributions, the small-set silence rule, the empty-VCF integration path, the legitimate-VCF silence path, and the missing-sample-IDs warning.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
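The load-time sanity check described under "Code" might look something like the sketch below. The function name `baf_input_warnings`, the thresholds `min_count=10` and `max_median_offset=0.15`, and the message text are all assumptions made for illustration. The real `_warn_if_baf_input_suspicious` helper may take different inputs and use different cutoffs.

```python
import logging

import numpy as np

logger = logging.getLogger(__name__)


def baf_input_warnings(alt_freqs, min_count=10, max_median_offset=0.15):
    """Log and return warnings about suspicious het-SNP allele frequencies.

    Hypothetical sketch: both thresholds are illustrative assumptions.
    """
    freqs = np.asarray(alt_freqs, dtype=float)
    freqs = freqs[~np.isnan(freqs)]  # ignore missing frequencies
    messages = []
    if freqs.size == 0:
        messages.append(
            "No heterozygous variants survived filtering; "
            "check the sample IDs and the VCF contents"
        )
    elif freqs.size >= min_count:
        # Small sets stay silent: their median is too noisy to judge.
        median = float(np.median(freqs))
        if abs(median - 0.5) > max_median_offset:
            messages.append(
                f"Median allele frequency {median:.3f} is far from 0.5; "
                "the VCF may be somatic-only or the wrong sample was selected"
            )
    for msg in messages:
        logger.warning(msg)
    return messages
```

Returning the messages as well as logging them is a choice made here so the check is easy to assert on in tests; the source only states that the helper warns.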
@claude review
Claude finished @etal's task

Code Review for PR #1047: Issues 601, 616

Reviewing changes for negative BAF fixes and VCF documentation updates.

Summary

This PR addresses issues #601 and #616 by implementing proper clamping of BAF values to [0, 1] and adding comprehensive VCF documentation. The changes are well-implemented with excellent test coverage.

Detailed Review

✅ Core Logic Changes (call.py)

Lines 25-28, 565-578: The addition of
Lines 186-187: Integration with the main pipeline is clean and maintains backward compatibility.

✅ VCF Input Validation (cmdutil.py)

Lines 24-29, 93-125: Excellent addition of input validation for heterozygous SNPs:
Lines 82-86: Integration into

✅ Test Coverage

test_commands.py (lines +69): The
test_properties.py (+50 lines): Outstanding property-based tests for

✅ Documentation Updates

doc/baf.rst: Significantly improved with clearer structure:
doc/fileformats.rst & doc/pipeline.rst: Appropriate cross-reference updates maintain consistency.

Code Quality Assessment

Style & Conventions: ✅
Performance: ✅
Security: ✅
Maintainability: ✅

Suggestions for Improvement

Clinical Impact Assessment

Final Assessment

Strong approval recommended. This is a well-crafted fix that:

The implementation demonstrates thoughtful engineering with comprehensive edge case handling and clear communication to users about potential issues.
Codecov Report

✅ All modified and coverable lines are covered by tests.

@@ Coverage Diff @@
## master #1047 +/- ##
==========================================
+ Coverage 66.41% 66.54% +0.12%
==========================================
Files 72 72
Lines 7409 7425 +16
Branches 1303 1307 +4
==========================================
+ Hits 4921 4941 +20
+ Misses 2049 2048 -1
+ Partials 439 436 -3
Addresses #601 and #616:
* Ensure negative BAF values aren't produced, even from VCFs that break assumptions/expectations.
* Update and clarify documentation around VCF handling.