It is very interesting to turn the assembly problem backwards, though I suppose it has limited applications outside of phylogenomics, since you need to have the protein sequences available in the first place.
I am not sure if there are tools that can easily extract mini-assemblies from BAM files, i.e. extract the aligned reads in their entirety instead of trimmed to the region you specify.
That would be useful when looking at assemblies region by region and trying to add new reads or information to them. (Do we need a phrap/consed for NGS de novo assembly?)
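To make the "whole reads for a region" idea concrete, here is a minimal sketch in plain Python. It assumes the alignments are available as SAM text (for instance what `samtools view -h` prints for a BAM file); the function names `reference_end` and `reads_overlapping` are my own and hypothetical, not part of any existing tool. It keeps every mapped read whose alignment overlaps the region and returns the full stored sequence, soft-clipped bases included.

```python
import re

# CIGAR operations that consume reference bases (per the SAM specification).
REF_CONSUMING = set("MDN=X")
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_end(pos, cigar):
    """Return the 1-based inclusive reference end of an alignment."""
    span = sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in REF_CONSUMING)
    return pos + span - 1


def reads_overlapping(sam_lines, chrom, start, end):
    """Yield (read_name, full_sequence) for mapped reads overlapping chrom:start-end.

    Coordinates are 1-based and inclusive, as in SAM. The sequence is the
    whole SEQ field, so it is not trimmed to the region. Note that SAM stores
    reverse-strand reads reverse-complemented, and hard-clipped bases are
    not present in SEQ at all.
    """
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        name, flag, rname = fields[0], int(fields[1]), fields[2]
        pos, cigar, seq = int(fields[3]), fields[5], fields[9]
        if flag & 0x4 or rname != chrom or cigar == "*":  # unmapped or elsewhere
            continue
        if pos <= end and reference_end(pos, cigar) >= start:
            yield name, seq
```

The collected reads could then be written out as FASTA and handed to whatever local assembler you prefer; that part is left out here on purpose.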
Mol Phylogenet Evol. 2012 Sep 18. pii: S1055-7903(12)00364-8. doi: 10.1016/j.ympev.2012.09.007. [Epub ahead of print]
Next-generation Phylogenomics Using a Target Restricted Assembly Method.
Source
Illinois Natural History Survey, University of Illinois, 1816 South Oak Street, Champaign, IL 61820, USA. Electronic address: kjohnson@inhs.uiuc.edu.
Abstract
Next-generation sequencing technologies are revolutionizing the field of phylogenetics by making available genome scale data for a fraction of the cost of traditional targeted sequencing. One challenge will be to make use of these genomic level data without necessarily resorting to full-scale genome assembly and annotation, which is often time and labor intensive. Here we describe a technique, the Target Restricted Assembly Method (TRAM), in which the typical process of genome assembly and annotation is in essence reversed. Protein sequences of phylogenetically useful genes from a species within the group of interest are used as targets in tblastn searches of a data set from a lane of Illumina reads for a related species. Resulting blast hits are then assembled locally into contigs and these contigs are then aligned against the reference "cDNA" sequence to remove portions of the sequences that include introns. We illustrate the Target Restricted Assembly Method using genomic scale datasets for 20 species of lice (Insecta: Psocodea) to produce a test phylogenetic data set of 10 nuclear protein coding gene sequences. Given the advantages of using DNA instead of RNA, this technique is very cost effective and feasible given current technologies.
Copyright © 2012. Published by Elsevier Inc.
PMID: 23000819 [PubMed - as supplied by publisher]
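The first step described in the abstract (tblastn searches with target proteins against the reads, then collecting the hits for local assembly) could be sketched roughly as below. This is not the authors' code. It assumes the tblastn results were saved in BLAST's standard 12-column tabular format (`-outfmt 6`), with the protein as query and the read as subject; the function name `reads_per_target` and the default e-value cutoff of 1e-5 are my own assumptions, not values taken from the paper.

```python
from collections import defaultdict


def reads_per_target(blast_lines, max_evalue=1e-5):
    """Group read IDs by target protein from tabular BLAST output.

    Expects the default -outfmt 6 columns: qseqid, sseqid, pident, length,
    mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
    The max_evalue default is an illustrative assumption.
    """
    hits = defaultdict(set)
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):  # blank or comment line
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 12:  # not a complete hit record
            continue
        target, read_id, evalue = cols[0], cols[1], float(cols[10])
        if evalue <= max_evalue:
            hits[target].add(read_id)
    return dict(hits)
```

Each resulting set of read IDs would then be pulled from the read file and assembled locally, one target gene at a time, before aligning the contigs back to the reference coding sequence.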