Improving Detection of Genome Structural Variation
from MassGenomics by Dan Koboldt
Large-scale structural variation (SV) is pervasive in the human genome, both in healthy individuals and in tumor cells. Numerous methods have been developed to detect such variants, most of which rely on the information provided by molecularly paired reads. Even the most sophisticated methods, however, still generate numerous false positives. A new study in Nature Genetics describes an innovative, population-based method to improve the accuracy of SV calling. In their introduction, Handsaker et al. offer four main causes underlying false positives in SV calls:
- Sequencing errors, which occur more frequently in next-generation sequencing data and show both random and platform-specific error patterns.
- Chimeric molecules, in which read pairs linking two non-contiguous segments of DNA masquerade as SVs. Sequencing libraries can contain millions of such fragments, which represent ~1% of sequence reads.
- Read depth variation, which fluctuates across the genome for both biological and technical reasons.
- Genome repeats, which confound most short-read aligners even when read pairing information is available.
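
To make the chimera problem concrete, here is a minimal Python sketch. It is not the method from Handsaker et al.; it only illustrates the arithmetic behind the ~1% figure above and one common-sense filter: requiring several independent read pairs to support the same candidate event. The function names, the 1 kb bin size, and the support threshold of 3 are all assumptions chosen for illustration.

```python
from collections import defaultdict

# ~1% of reads, the figure quoted in the list above.
CHIMERA_RATE = 0.01


def expected_chimeric_pairs(total_pairs, rate=CHIMERA_RATE):
    """Expected number of chimeric read pairs in a library of total_pairs."""
    return total_pairs * rate


def supported_candidates(discordant_pairs, bin_size=1000, min_support=3):
    """Keep candidate events backed by at least min_support discordant pairs.

    discordant_pairs is an iterable of (left_pos, right_pos) coordinates for
    read pairs whose mapping already looks inconsistent with the reference.
    Pairs are grouped by coarse (left bin, right bin) coordinates; bin_size
    and min_support are hypothetical values, not taken from the paper.
    """
    counts = defaultdict(int)
    for left_pos, right_pos in discordant_pairs:
        key = (left_pos // bin_size, right_pos // bin_size)
        counts[key] += 1
    # A lone chimeric pair rarely shares its bin pair with other pairs,
    # so it falls below the support threshold and is dropped.
    return {key: n for key, n in counts.items() if n >= min_support}


if __name__ == "__main__":
    # Hypothetical library of 100 million read pairs.
    print(expected_chimeric_pairs(100_000_000))

    pairs = [(1000, 50000), (1100, 50200), (1250, 50900), (7000, 900000)]
    print(supported_candidates(pairs))
```

For a hypothetical library of 100 million read pairs, a 1% rate works out to roughly one million chimeric pairs, which is why a single discordant pair is weak evidence on its own. Fixed-size binning is a deliberate simplification: pairs that straddle a bin boundary are counted separately, so a real caller would cluster by distance instead.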
References
Handsaker RE, Korn JM, Nemesh J, & McCarroll SA (2011). Discovery and genotyping of genome structural polymorphism by sequencing on a population scale. Nature Genetics, 43(3), 269-276. PMID: 21317889