Wednesday, 31 October 2012

Rare variant discovery and calling by sequencing pooled samples with overlaps. Wang W, Yin X, Pyon YS, Hayes M, Li J.


1. Bioinformatics. 2012 Oct 27. [Epub ahead of print]



Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, OH 44106, USA.



For many complex traits/diseases, it is believed that rare variants account for some of the missing heritability that cannot be explained by common variants. Sequencing a large number of samples through DNA pooling is a cost-effective strategy to discover rare variants and to investigate their associations with phenotypes. Overlapping pool designs provide a further benefit because they can potentially identify variant carriers, which is important for downstream association analysis of rare variants. However, existing algorithms for analyzing sequence data from overlapping pools are limited.


We propose a complete data analysis framework for overlapping pool designs, with novelties in all three major steps: variant pool and variant locus identification, variant allele frequency estimation, and variant sample decoding. The framework can be used in combination with any design matrix. We have investigated its performance on two different overlapping designs and compared it with three state-of-the-art methods, both by simulating targeted sequencing and by pooling real sequence data. Results on both datasets show that our algorithm improves significantly on existing ones. In conclusion, successful discovery of rare variants and identification of variant carriers using overlapping pool strategies depends critically on many steps, from the generation of design matrices to the decoding algorithms. The proposed framework, in combination with design matrices generated based on the Chinese remainder theorem, achieves the best overall results.
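To make the idea of an overlapping design and carrier decoding concrete, here is a minimal Python sketch. It is not the VIP program and not the authors' algorithm. It assumes the common modular construction for Chinese-remainder-style designs, in which each sample is placed in one pool per modulus according to its index modulo that modulus, and it uses a naive decoder that flags a sample only if every pool containing it is positive. The function names (crt_design, positive_pools, decode), the moduli (5, 7, 8) and the sample count (30) are hypothetical choices for illustration, and the pool calls are assumed to be error-free.

```python
from itertools import combinations
from math import gcd


def crt_design(n_samples, moduli):
    """Build pools for a modular (Chinese-remainder-style) design.

    For each modulus m there are m pools; sample s goes into pool (s mod m).
    Returns a list of sets of sample indices.
    """
    for a, b in combinations(moduli, 2):
        if gcd(a, b) != 1:
            raise ValueError("moduli must be pairwise coprime")
    pools = []
    for m in moduli:
        for r in range(m):
            pools.append({s for s in range(n_samples) if s % m == r})
    return pools


def positive_pools(pools, carriers):
    """Idealized, error-free readout: a pool is positive if it holds a carrier."""
    return [bool(pool & set(carriers)) for pool in pools]


def decode(n_samples, pools, positive):
    """Naive decoding: keep samples whose pools are all positive."""
    candidates = []
    for s in range(n_samples):
        if all(pos for pool, pos in zip(pools, positive) if s in pool):
            candidates.append(s)
    return candidates


if __name__ == "__main__":
    n = 30                      # hypothetical number of samples
    pools = crt_design(n, (5, 7, 8))
    readout = positive_pools(pools, {17})
    print(len(pools), "pools; candidate carriers:", decode(n, pools, readout))
```

With these parameters, 30 samples are covered by 20 pools instead of 30 individual libraries. Because the product of any two moduli is at least 30, two distinct samples share at most one pool, so a single carrier is recovered unambiguously. With several carriers or with noisy pool calls, this naive decoder can report false positives, which is the kind of difficulty a more careful decoding step has to address.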


Source code of the program, termed VIP for Variant Identification by Pooling, is available at


PMID: 23104896 [PubMed - as supplied by publisher]
