Kevin's GATTACA World: alignment


Friday, 15 July 2011

WGsim and bowtie paired end mapping

I wanted to simulate paired-end (PE) reads with wgsim, but by using the default params I didn't realise that I was creating reads with a 500 bp insert (wgsim's default outer distance, -d). That doesn't go well with Bowtie's default maximum insert size for PE alignment (-X) of 250 bp, which, to cut a long story short, resulted in low pairing rates.

I only found this out by googling Biostar. Of course, there is the obligatory discussion of the merits of using Bowtie versus BWA, and there are some nice graphs in there that illustrate the problem at hand.
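To put a rough number on the mismatch, here is a minimal sketch. It assumes that wgsim's fragment lengths are approximately normally distributed around -d with standard deviation -s (default 50), and that Bowtie compares -X against the full fragment length, mates included. The function name and the 650 bp value are only illustrative.

```python
import math

def fraction_within_maxins(mean, sd, maxins):
    """Fraction of fragments no longer than maxins, under a normal model (assumption)."""
    z = (maxins - mean) / sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# wgsim defaults: -d 500 (outer distance), -s 50 (standard deviation)
default_fit = fraction_within_maxins(500, 50, 250)  # Bowtie default -X 250
raised_fit = fraction_within_maxins(500, 50, 650)   # illustrative -X 650

print(f"-X 250: {default_fit:.2e} of simulated pairs fit the insert limit")
print(f"-X 650: {raised_fit:.4f} of simulated pairs fit the insert limit")
```

Under these assumptions almost no simulated pair fits Bowtie's default limit, so the fix is either to lower -d when running wgsim or to raise -X when running Bowtie.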

Monday, 4 July 2011

BWA SOLiD Paired Ends mapping

The short answer seems to be 'no': BWA can't do it yet.
(I went through steps like the ones here to find that out in the end; the only difference is that I used a modified solid2fastq.pl to process the F5 reads correctly.)
The reason is that bwa sampe expects the orientation to be the same as SOLiD mate pair; see http://biostar.stackexchange.com/questions/9086/paired-end-mapping-what-is-bwa-solid-paired-end-default-direction-bwa-sampe
While reverse complementing the F5 reads might work, that itself is problematic due to the colorspace nature of SOLiD reads.
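A small sketch of why reverse complementing is awkward in colorspace, using the standard SOLiD two-base encoding (each color is the XOR of the 2-bit codes of adjacent bases). The helper names and the toy sequence are mine, not taken from solid2fastq.pl or BWA. The colors of the reverse complement are just the original colors reversed, but the reversed read needs a new starting base, and that base can only be obtained by decoding the whole read, where a single color error changes every downstream base.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = "ACGT"

def encode(seq):
    """Colors for each pair of adjacent bases (XOR of 2-bit base codes)."""
    return [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]

def decode(first_base, colors):
    """Rebuild the bases from a known first base and the color calls."""
    bases = [first_base]
    for c in colors:
        bases.append(BASE[CODE[bases[-1]] ^ c])
    return "".join(bases)

def revcomp(seq):
    """Reverse complement in base space."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

seq = "ATGGCA"  # toy sequence, for illustration only
colors = encode(seq)

# Complementing leaves colors unchanged, so the reverse complement
# has the same colors in reverse order.
rc_colors = encode(revcomp(seq))

# Introduce one miscalled color and decode from the known first base:
# every base after the error is wrong, including the last one, which is
# the base a reversed read would have to start from.
bad_colors = list(colors)
bad_colors[1] = (bad_colors[1] + 1) % 4
decoded_bad = decode(seq[0], bad_colors)

print(colors, rc_colors)
print(seq, decoded_bad)
```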

Your options?
Bioscope :(
or Bowtie (if you don't need indels); see http://bowtie-bio.sourceforge.net/manual.shtml#paired-end-colorspace-alignment

Tuesday, 12 October 2010

SRMA: tool for Improved variant discovery through local re-alignment of short-read next-generation sequencing data

Have a look at this tool: http://genomebiology.com/2010/11/10/R99/abstract
It is a realigner for NGS reads that doesn't use a lot of RAM. I'm not too sure how it compares to GATK's local realignment around indels, as that is not mentioned, but the authors used reads that were aligned with the popular BWA or BFAST as input. (Bowtie was left out, though.)

Excerpted

SRMA was able to improve the ultimate variant calling using a variety of measures on the simulated data from two different popular aligners (BWA and BFAST). These aligners were selected based on their sensitivity to insertions and deletions (BFAST and BWA), since a property of SRMA is that it produces a better consensus around indel positions. The initial alignments from BFAST allow local SRMA re-alignment using the original color sequence and qualities to be assessed as BFAST retains this color space information. This further reduces the bias towards calling the reference allele at SNP positions in ABI SOLiD data, and reduces the false discovery rate of new variants. Thus, local re-alignment is a powerful approach to improving genomic sequencing with next generation sequencing technologies.

The alignments to the reference genome were implicitly split into 1Mb regions and processed in parallel on a large computer cluster; the re-alignments from each region were then merged in a hierarchical fashion. This allows for the utilization of multi-core computers, with one re-alignment per core, as well as parallelization across a computer cluster or a cloud. The average peak memory utilization per process was 876Mb (on a single-core), with a maximum peak memory utilization of 1.25GB. On average, each 1Mb region required approximately 2.58 minutes to complete, requiring approximately 86.17 hours total running time for the whole U87MG genome. SRMA also supports re-alignment within user-specified regions for efficiency, so that only regions of interest need to be re-aligned. This is particularly useful for exome-sequencing or targeted re-sequencing data.

Sunday, 30 May 2010

Paper: Comparison of Multiple Genome Sequence Alignment Methods

Comparison of Multiple Genome Sequence Alignment Methods
Chen and Tompa, Nature Biotechnology
Xiaoyu Chen and Martin Tompa at the University of Washington in Seattle present their "comparative assessment of methods for aligning multiple genome sequences." In evaluating the level of agreement among the four ENCODE alignments, the team shows that Pecan "produces the most accurate or nearly most accurate alignment in all species and genomic location categories, while still providing coverage comparable to or better than that of the other alignments in the placental mammals."
