only using target question #992

HuZhuocheng · 2025-08-18T04:40:09Z

HuZhuocheng
Aug 18, 2025

Hello and thank you for the software developed by you and your team. I currently have an issue that I haven't found answers to elsewhere.

My data is NGS sequencing data of target hybrid panel, with only 9 mismatched normal samples. After using cnvkit to split the original bed file for normal samples, I deleted some segments based on depth and the sequencing quality of those segments (for example, if a bed segment was divided into three parts, I might have only deleted the middle part due to sequencing quality issues). Then, I reconstructed the reference using the bed file after deleting some segments and proceeded with subsequent steps. However, in the resulting reference, the average value of log2 ratios greater than 0 is only 0.04075515, while the average value of log2 ratios less than 0 is -0.1360878. I believe this leads to very low log2 > 0 values in tumor sample detection, which prevents me from setting a threshold to obtain amplicon results.

It's worth noting that I only used the target and did not use the antitarget. My reasoning is that there are many mapping failures and poor-quality segments in the non-target regions. Could you please advise whether I should still use the antitarget? Or should I correct the log2 ratios in the reference?

Best regards,

HuZhuocheng · 2025-08-18T04:50:23Z

HuZhuocheng
Aug 18, 2025
Author

And the BAM file I used has already undergone GATK's BQSR and had duplicates marked.

0 replies

HuZhuocheng · 2025-08-18T05:09:40Z

HuZhuocheng
Aug 18, 2025
Author

Among them, I checked the maximum value (0.31) of the part where log2 is greater than 0 in the reference.cnn file, which also looks strange. Below is the IGV screenshot of this site (the top is a normal sample, and the bottom is a cancer sample). This was originally a large bed segment, but after splitting, I removed many abnormal segments, so only three segments remain in this region.

0 replies

HuZhuocheng · 2025-08-18T05:28:58Z

HuZhuocheng
Aug 18, 2025
Author

I tried to add antitarget, but it seems that the changes in the reference.cnn are not significant. The average log2 value of the part where log2 is greater than 0 is still around 0.05, while the average log2 value of the part where log2 is less than 0 is greater than -0.13. Should I discard the reference built with only 9 mismatched normal samples and use flat.cnn instead?
Thank you very much.

0 replies

etal · 2025-08-20T21:49:14Z

etal
Aug 20, 2025
Maintainer

Here's a starting point in the docs:
https://cnvkit.readthedocs.io/en/stable/tumor.html

The skew in log2 values is not surprising; there tend to be more genomic regions with poor sequencing coverage than with abnormally high sequencing coverage. The "--drop-low-coverage" option may help with that.

There are a lot of other reasons why coverage depth can vary across the genome; reference log2 values drifting away from 0.0 is not necessarily incorrect if the bias is consistent across sequenced samples and not due to real copy number variation in the control samples. You may also try masking out problematic genomic regions with access -x and rely less on manual removal of bins in the constructed reference.

0 replies

HuZhuocheng · 2025-08-22T12:32:57Z

HuZhuocheng
Aug 22, 2025
Author

Thank you very much for your reply. I think I have successfully manually processed the bed file after autobin, and I have ensured that the average depth of fragments is greater than 150x.
My current concern is whether not using the antitarget method would result in better or worse outcomes for my NGS target hybrid sequencing data (considering that I believe the antitarget contains a lot of noisy situations with MQ=0 and other issues).
I noticed that you mentioned when analysing Targeted Amplicon Sequencing (TAS) data, can only use target. I'm not sure if it is working well in target hybrid data.
I have already used the results from cnvkit to run pureCN; would this affect pureCN? (If you don't mind answering.)
In any case, thank you very much for always responding to questions on GitHub. I have gained a lot of insights from other people's questions, and I am very grateful for your efforts and responses.

0 replies

etal · 2026-02-03T20:37:19Z

etal
Feb 3, 2026
Maintainer

Tough to say. I'd recommend trying it both ways and reviewing the results to see if either approach is clearly better for your data/assay. Many labs doing target enrichment (with baits, not TAS) will skip antitargets if they have very high on-target rates, since the remaining off-target reads are so thin the antitarget bins would have to be huge. But -- it's comples, so try an empirical approach.

Generally, if you have stable results from CNVkit they should work well with PureCN, but I can't speak to the details.

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

only using target question #992

Uh oh!

{{title}}

Uh oh!

Replies: 6 comments

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

only using target question #992

Uh oh!

HuZhuocheng Aug 18, 2025

Replies: 6 comments

Uh oh!

HuZhuocheng Aug 18, 2025 Author

Uh oh!

HuZhuocheng Aug 18, 2025 Author

Uh oh!

HuZhuocheng Aug 18, 2025 Author

Uh oh!

etal Aug 20, 2025 Maintainer

Uh oh!

HuZhuocheng Aug 22, 2025 Author

Uh oh!

etal Feb 3, 2026 Maintainer

HuZhuocheng
Aug 18, 2025

HuZhuocheng
Aug 18, 2025
Author

HuZhuocheng
Aug 18, 2025
Author

HuZhuocheng
Aug 18, 2025
Author

etal
Aug 20, 2025
Maintainer

HuZhuocheng
Aug 22, 2025
Author

etal
Feb 3, 2026
Maintainer