Bioinformatics. 2012 Apr 2. [Epub ahead of print]
CONTRA: copy number analysis for targeted resequencing.
Li J, Lupat R, Amarasinghe KC, Thompson ER, Doyle MA, Ryland GL,
Tothill RW, Halgamuge SK, Campbell IG, Gorringe KL.
Bioinformatics Core Facility, Victorian Breast Cancer Research
Consortium Cancer Genetics Laboratory and Molecular Genomics Core
Facility, Peter MacCallum Cancer Centre, VIC 3002, Australia, Dept.
of Mechanical Engineering, Sir Peter MacCallum Dept. of Oncology, and
Dept. of Pathology, University of Melbourne, Parkville, VIC 3010,
In light of the increasing adoption of targeted resequencing as a
cost-effective strategy to identify disease-causing variants, a robust
method for copy number variation (CNV) analysis is needed to maximize
the value of this promising technology.
We present a method for CNV detection for targeted resequencing data,
including whole-exome capture data. Our method calls copy number gains
and losses for each target region based on normalized depth of
coverage. Our key strategies include the use of base-level log-ratios
to remove GC-content bias, correction for an imbalanced library size
effect on log-ratios, and the estimation of log-ratio variations via
binning and interpolation. Our methods are made available via CONTRA
(COpy Number Targeted Resequencing Analysis), a software package that
takes standard alignment formats (BAM/SAM) and outputs in variant call
format (VCF4.0), for easy integration with other next-generation
sequencing analysis packages. We assessed our methods using samples
from seven different target enrichment assays, and evaluated our
results using simulated data and real germline data with known CNV
genotypes.
Availability and implementation: Source code and sample data
are freely available under the GNU General Public License (GPLv3) at
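
The core idea described above, base-level log-ratios adjusted for library size and summarized per target region, can be illustrated with a minimal Python sketch. This is not CONTRA's implementation: the function names, the pseudocount, and the fixed call threshold are assumptions made for illustration, and the fixed threshold is a stand-in for the variation estimate obtained via binning and interpolation. GC-content handling is not modelled here.

```python
import math


def base_log_ratios(test_depth, control_depth, test_total, control_total, pseudo=1.0):
    """Per-base log2 ratios of test vs. control depth (hypothetical helper).

    test_total and control_total are the library sizes (e.g. total mapped
    reads); the test depth is rescaled so both samples are comparable.
    The pseudocount (an assumption) avoids division by zero at uncovered bases.
    """
    scale = control_total / test_total
    return [
        math.log2((t * scale + pseudo) / (c + pseudo))
        for t, c in zip(test_depth, control_depth)
    ]


def region_log_ratio(test_depth, control_depth, test_total, control_total, pseudo=1.0):
    """Summarize one target region as the mean of its base-level log-ratios."""
    ratios = base_log_ratios(test_depth, control_depth, test_total, control_total, pseudo)
    return sum(ratios) / len(ratios)


def call_region(log_ratio, threshold=0.3):
    """Label a region as gain, loss, or neutral.

    The fixed threshold is an illustrative assumption, not the
    variance-based estimate described in the abstract.
    """
    if log_ratio >= threshold:
        return "gain"
    if log_ratio <= -threshold:
        return "loss"
    return "neutral"


if __name__ == "__main__":
    # Toy depths for a four-base region; both libraries have the same size.
    test = [40, 42, 38, 41]
    control = [20, 21, 19, 20]
    lr = region_log_ratio(test, control, 1_000_000, 1_000_000)
    print(call_region(lr))
```

Averaging base-level log-ratios, rather than taking the log of a ratio of region means, is the design choice this sketch is meant to show; the library-size rescaling prevents a deeper-sequenced test sample from appearing as a uniform gain.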