Bioinformatics. 2012 Sep 1. [Epub ahead of print]
Rainbow: an integrated tool for efficient clustering and assembling RAD-seq reads.
Chong Z, Ruan J, Wu CI.
Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, Graduate University of Chinese Academy of Sciences, Beijing 100049, People's Republic of China and Department of Ecology and Evolution, University of Chicago, Chicago, IL 60637.
The Restriction site Associated DNA sequencing (RAD-seq) method takes full advantage of next-generation sequencing technology. By clustering paired-end short reads into groups, each with its own unique tag, the RAD-seq assembly problem is divided into subproblems. Clustering and assembling millions of RAD-seq reads quickly and accurately in the presence of sequencing errors, varying levels of heterozygosity, and repetitive sequences remains a challenging problem.
Rainbow was developed to provide an ultra-fast and memory-efficient solution for clustering and assembling short reads produced by RAD-seq. First, Rainbow clusters reads using a spaced seed method. Then, it applies a strategy similar to heterozygote calling to divide potential groups into haplotypes in a top-down manner. Next, along a guided tree, it iteratively merges sibling leaves in a bottom-up manner if they are similar enough, where similarity is defined by comparing the second reads of a RAD segment. This approach aims to collapse heterozygotes while discriminating repetitive sequences. Finally, Rainbow uses a greedy algorithm to locally assemble the merged reads into contigs. Rainbow outputs not only the optimal but also suboptimal assembly results. Based on simulated data and a real guppy RAD-seq dataset, we show that Rainbow is more competent than other tools in dealing with RAD-seq data.
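To illustrate the general idea behind spaced seed clustering, the following is a minimal Python sketch. It is not Rainbow's implementation: the function names `spaced_key` and `cluster_reads`, the three example masks, and the use of a union-find structure are all assumptions made for illustration. Each mask keeps the bases at positions marked `1` and ignores those marked `0`, so two reads that differ only at an ignored position produce the same key and are placed in the same cluster.

```python
from collections import defaultdict


def spaced_key(read, mask):
    """Keep only the bases at positions where the mask has '1'."""
    return "".join(base for base, keep in zip(read, mask) if keep == "1")


def cluster_reads(reads, masks):
    """Group reads that share a spaced-seed key under at least one mask.

    Hypothetical helper: assumes every read has the same length as the masks.
    Returns a sorted list of clusters, each a list of read indices.
    """
    parent = list(range(len(reads)))

    def find(i):
        # Follow parent links to the cluster root, shortening the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    for mask in masks:
        first_seen = {}  # spaced key -> index of first read with that key
        for idx, read in enumerate(reads):
            key = spaced_key(read, mask)
            if key in first_seen:
                union(first_seen[key], idx)
            else:
                first_seen[key] = idx

    clusters = defaultdict(list)
    for idx in range(len(reads)):
        clusters[find(idx)].append(idx)
    return sorted(clusters.values())


# Three shifted masks (assumed for this example): every position is
# ignored by exactly one mask, so any single mismatch is tolerated.
MASKS = ["110110110110", "101101101101", "011011011011"]

reads = [
    "ACGTACGTACGT",
    "ACGTACCTACGT",  # one mismatch relative to read 0
    "TTTTGGGGCCCC",
    "TTTTGGGGCCCA",  # one mismatch relative to read 2
]
clusters = cluster_reads(reads, MASKS)
```

In this toy input, reads 0 and 1 fall into one cluster and reads 2 and 3 into another. A real tool would also need to handle variable read lengths, multiple mismatches, and memory-efficient key storage, none of which this sketch attempts.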
The source code of Rainbow, written in C, is freely available at http://sourceforge.net/projects/bio-rainbow/files/
PMID: 22942077 [PubMed - as supplied by publisher]