Showing posts with label bowtie. Show all posts

Friday, 15 July 2011

WGsim and bowtie paired end mapping

I wanted to simulate paired-end (PE) reads with wgsim, but by using the default parameters I didn't realise that I was creating reads with a 500 bp insert. That doesn't go well with Bowtie's default PE maximum insert size of 250 bp (-X/--maxins) and, to cut a long story short, resulted in low pairing rates.

I only found this out by googling Biostar. Of course, there is the obligatory discussion of the merits of using bowtie versus bwa (or vice versa), and there are some nice graphs in there that illustrate the problem at hand.
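To make the mismatch concrete, here is a minimal Python sketch that derives a Bowtie maximum insert size from the wgsim settings and builds the two command lines without running them. The file names and the "mean plus three standard deviations" rule are my own assumptions for illustration; the flags used are wgsim's -d (outer distance, default 500) and -s (standard deviation, default 50), and bowtie's -X (maximum insert size, default 250).

```python
def suggested_maxins(mean_outer_distance, std_dev, n_sigma=3):
    """Pick a bowtie -X value that covers most simulated fragments.

    The n_sigma=3 margin is an assumption, not a bowtie or wgsim recommendation.
    """
    return int(mean_outer_distance + n_sigma * std_dev)


def build_commands(ref_fasta, index_prefix, out_prefix, mean=500, std=50):
    """Return (wgsim_cmd, bowtie_cmd) as argument lists; nothing is executed."""
    reads1 = f"{out_prefix}_1.fq"
    reads2 = f"{out_prefix}_2.fq"
    maxins = suggested_maxins(mean, std)
    # wgsim [options] <in.ref.fa> <out.read1.fq> <out.read2.fq>
    wgsim_cmd = ["wgsim", "-d", str(mean), "-s", str(std),
                 ref_fasta, reads1, reads2]
    # bowtie [options] <ebwt> -1 <m1> -2 <m2> [<hit>]; -S asks for SAM output
    bowtie_cmd = ["bowtie", "-S", "-X", str(maxins), index_prefix,
                  "-1", reads1, "-2", reads2, f"{out_prefix}.sam"]
    return wgsim_cmd, bowtie_cmd


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for cmd in build_commands("ref.fa", "ref_index", "sim"):
        print(" ".join(cmd))
```

The alternative is to go the other way and lower wgsim's -d so that the simulated fragments fit inside Bowtie's default of 250 bp.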

Monday, 4 July 2011

BWA SOLiD Paired Ends mapping

The short answer seems to be 'no': BWA can't do it yet.
(I went through steps like the one here to find that out in the end; the only difference is that I used a modified solid2fastq.pl to process the F5 reads correctly.)
The reason is that
   bwa sampe
expects the orientation to be the same as for SOLiD mate pairs;
see http://biostar.stackexchange.com/questions/9086/paired-end-mapping-what-is-bwa-solid-paired-end-default-direction-bwa-sampe
While reverse complementing the F5 reads might work, that itself is problematic due to the colorspace nature of SOLiD reads.
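A small Python sketch of why reverse complementing is awkward in colorspace, under simplifying assumptions: it uses the standard two-base encoding (A=0, C=1, G=2, T=3, color = XOR of adjacent bases) and treats the leading base as part of the read, whereas real SOLiD reads start with a primer base. The function names are mine. The colors of the reverse complement are just the original colors reversed, but the new leading base is only known after decoding the whole read, so a single color error anywhere changes it.

```python
BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def encode(seq):
    """Return (leading_base, colors) for a nucleotide sequence."""
    colors = [BASE_TO_CODE[a] ^ BASE_TO_CODE[b] for a, b in zip(seq, seq[1:])]
    return seq[0], colors


def decode(leading_base, colors):
    """Walk the colors from the leading base to recover the nucleotides."""
    bases = [leading_base]
    for color in colors:
        bases.append(CODE_TO_BASE[BASE_TO_CODE[bases[-1]] ^ color])
    return "".join(bases)


def revcomp(seq):
    """Ordinary reverse complement in base space."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def revcomp_colorspace(leading_base, colors):
    """Reverse complement in colorspace.

    Complementing leaves every color unchanged, so the colors are simply
    reversed. The new leading base is the complement of the LAST base of
    the original read, which requires decoding every color first.
    """
    last_base = decode(leading_base, colors)[-1]
    return COMPLEMENT[last_base], colors[::-1]


if __name__ == "__main__":
    seq = "ACGGTCA"
    # Both routes should give the same colorspace read.
    print(encode(revcomp(seq)))
    print(revcomp_colorspace(*encode(seq)))
```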

Your options?
Bioscope :(
or bowtie (if you don't need indels) see http://bowtie-bio.sourceforge.net/manual.shtml#paired-end-colorspace-alignment

Thursday, 17 March 2011

Common numbers / statistics for Uniquely mapped reads?

I was asked whether there are commonly reported numbers for
uniquely mapped reads (which is troublesome to define with bowtie)
vs
total mapped reads.

I'm also not sure whether the numbers differ between applications,
e.g.
WGS
exome reseq

or between human and other organisms.
I got this figure from a 2009 paper. Not sure if anyone collates data like this.
http://bioinformatics.oxfordjournals.org/content/25/7/969.full.pdf

Wednesday, 15 September 2010

Myrna-calculate differential gene expression on Elastic MapReduce or local Hadoop

The software, termed “Myrna”, was funded in part by Amazon Web Services (in addition to the Bloomberg School of Public Health and the National Institutes of Health) and, not surprisingly, made use of compute resources from Amazon. In order to test Myrna, researchers rented time and storage resources from AWS and were able to realize solid performance and cost savings. According to the study's authors, “Myrna calculated differential expression from 1.1 billion RNA sequence reads in less than two hours at a cost of about $66.”

Note:
Myrna is a cloud computing tool for calculating differential gene expression in large RNA-seq datasets. Myrna uses Bowtie for short read alignment and R/Bioconductor for interval calculations, normalization, and statistical testing. These tools are combined in an automatic, parallel pipeline that runs in the cloud (Elastic MapReduce in this case), on a local Hadoop cluster, or on a single computer, exploiting multiple computers and CPUs wherever possible.

also see

Cloud computing method greatly increases gene analysis

Friday, 9 April 2010

bowtie build time statistics / benchmark

I recently built a colorspace index for hg19 (from ftp://ftp.sanger.ac.uk/pub/1000genomes/tk2/main_project_reference/)

AMD Phenom II X4 955 chip
8 GB RAM

SATA2 HDD

Wrote 822714402 bytes to primary EBWT file: hg19.rev.1.ebwt
Wrote 358098108 bytes to secondary EBWT file: hg19.rev.2.ebwt
Re-opening _in1 and _in2 as input streams
Returning from Ebwt constructor
Headers:
    len: 2864784823
    bwtLen: 2864784824
    sz: 716196206
    bwtSz: 716196206
    lineRate: 6
    linesPerSide: 1
    offRate: 5
    offMask: 0xffffffe0
    isaRate: -1
    isaMask: 0xffffffff
    ftabChars: 10
    eftabLen: 20
    eftabSz: 80
    ftabLen: 1048577
    ftabSz: 4194308
    offsLen: 89524526
    offsSz: 358098104
    isaLen: 0
    isaSz: 0
    lineSz: 64
    sideSz: 64
    sideBwtSz: 56
    sideBwtLen: 224
    numSidePairs: 6394609
    numSides: 12789218
    numLines: 12789218
    ebwtTotLen: 818509952
    ebwtTotSz: 818509952
Total time for backward call to driver() for mirror index: 01:24:19

Gosh, and I thought it would take hours!
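As a sanity check on the header dump, most of the values can be re-derived from len and the rate parameters. The Python sketch below uses relationships I inferred from the numbers in this log (they all reproduce exactly), not formulas taken from the bowtie source, so treat them as assumptions. The function and key names are mine.

```python
def ceil_div(a, b):
    """Integer division rounded up."""
    return -(-a // b)


def derive_header(length, off_rate=5, ftab_chars=10, line_rate=6,
                  side_bwt_sz=56):
    """Re-derive header fields from the reference length and rates.

    Assumes 2 bits per base (4 bases per byte), 4-byte table entries,
    and linesPerSide = 1, as in the log above.
    """
    bwt_len = length + 1                       # bwtLen
    side_sz = 1 << line_rate                   # sideSz: 64 bytes per side
    side_bwt_len = side_bwt_sz * 4             # sideBwtLen: bases per side
    num_side_pairs = ceil_div(bwt_len, 2 * side_bwt_len)
    num_sides = 2 * num_side_pairs
    offs_len = ceil_div(bwt_len, 1 << off_rate)
    ftab_len = 4 ** ftab_chars + 1
    return {
        "bwtLen": bwt_len,
        "sz": ceil_div(length, 4),
        "offMask": (0xFFFFFFFF << off_rate) & 0xFFFFFFFF,
        "ftabLen": ftab_len,
        "ftabSz": ftab_len * 4,
        "offsLen": offs_len,
        "offsSz": offs_len * 4,
        "sideBwtLen": side_bwt_len,
        "numSidePairs": num_side_pairs,
        "numSides": num_sides,
        "ebwtTotLen": num_sides * side_sz,
    }


if __name__ == "__main__":
    for name, value in derive_header(2864784823).items():
        print(f"{name}: {value}")
```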
