Showing posts with label bowtie.
Friday, 15 July 2011
WGsim and bowtie paired end mapping
I wanted to simulate paired-end (PE) reads with wgsim, but by using the default parameters I didn't realise that I was creating reads with a 500 bp insert. That doesn't go well with Bowtie's default maximum insert size of 250 bp for PE alignment which, to cut a long story short, resulted in low pairing rates.
I only found this out by googling Biostar. Of course, there is the obligatory discussion of the merits of using bowtie versus bwa or vice versa, and there are some nice graphs in there that illustrate the problem at hand.
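To put a rough number on the wgsim/bowtie mismatch, here is a minimal sketch. It assumes wgsim draws the outer distance from a normal distribution with its defaults (-d 500, -s 50) and that bowtie rejects pairs longer than its default -X 250; both defaults are worth checking against the usage messages of your installed versions. The function name is made up for illustration.

```python
import math

def fraction_within_max_insert(mean, sd, max_insert):
    """Fraction of fragments no longer than max_insert, assuming
    fragment lengths are normally distributed (an approximation)."""
    z = (max_insert - mean) / (sd * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))

# Assumed wgsim defaults: outer distance -d 500, standard deviation -s 50
WGSIM_MEAN, WGSIM_SD = 500, 50

# 250 is the assumed bowtie default for -X; the other values are alternatives to try
for max_insert in (250, 500, 700):
    frac = fraction_within_max_insert(WGSIM_MEAN, WGSIM_SD, max_insert)
    print(f"-X {max_insert}: {frac:.6f} of simulated pairs fit")
```

Under these assumptions almost no simulated pair fits inside the 250 bp limit, so either lower wgsim's -d or raise bowtie's -X so that the two agree.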
Monday, 4 July 2011
BWA SOLiD Paired Ends mapping
The short answer seems to be 'no': BWA can't do it yet.
(I went through steps like the ones here to find that out in the end; the only difference is that I used a modified solid2fastq.pl to process the F5 reads correctly.)
This is because
bwa sampe
expects the read orientation to be the same as for SOLiD mate pairs;
see http://biostar.stackexchange.com/questions/9086/paired-end-mapping-what-is-bwa-solid-paired-end-default-direction-bwa-sampe
While reverse complementing the F5 reads might work, that itself is problematic due to the colorspace nature of SOLiD reads.
Your options?
Bioscope :(
or bowtie (if you don't need indels); see http://bowtie-bio.sourceforge.net/manual.shtml#paired-end-colorspace-alignment
Thursday, 17 March 2011
Common numbers / statistics for Uniquely mapped reads?
Was asked if there are commonly reported numbers for
uniquely mapped reads (which is troublesome to define with bowtie)
vs
total mapped reads
Also not sure whether the numbers differ between applications
e.g.
WGS
exome reseq
human vs other organisms.
Got this figure from a 2009 paper. Not sure if anyone collates data like this.
http://bioinformatics.oxfordjournals.org/content/25/7/969.full.pdf
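For comparing such numbers across runs, here is a minimal sketch of how the two counts could be pulled out of SAM output. The definition of "unique" used here (MAPQ at or above a threshold) is only an assumption, since aligners assign MAPQ differently and bowtie in particular makes uniqueness hard to pin down; the function name, the threshold and the example records are made up.

```python
def mapping_summary(sam_lines, min_unique_mapq=10):
    """Count total, mapped and 'unique' reads from SAM records.

    'Unique' is ASSUMED to mean MAPQ >= min_unique_mapq; adjust this to
    whatever definition suits your aligner and settings.
    """
    total = mapped = unique = 0
    for line in sam_lines:
        if line.startswith("@"):      # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, mapq = int(fields[1]), int(fields[4])
        if flag & 0x900:              # skip secondary/supplementary records
            continue
        total += 1
        if flag & 0x4:                # read is unmapped
            continue
        mapped += 1
        if mapq >= min_unique_mapq:
            unique += 1
    return {"total": total, "mapped": mapped, "unique": unique}

# Made-up SAM records, for illustration only
example = [
    "@HD\tVN:1.0",
    "r1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tchr1\t200\t0\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
summary = mapping_summary(example)
print(summary)
```

Reporting both counts alongside the threshold used makes figures from different aligners at least roughly comparable.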
Labels:
bowtie,
bwa,
journal,
NGS,
statistics,
uniquely mapped
Wednesday, 15 September 2010
Myrna-calculate differential gene expression on Elastic MapReduce or local Hadoop
The software, termed “Myrna”, was funded in part by Amazon Web Services (in addition to the Bloomberg School of Public Health and the National Institutes of Health) and, not surprisingly, makes use of compute resources from Amazon. In order to test Myrna, researchers rented time and storage resources from AWS and were able to realize solid performance and cost savings. According to the study's authors, “Myrna calculated differential expression from 1.1 billion RNA sequence reads in less than two hours at a cost of about $66.”
Note:
Myrna is a cloud computing tool for calculating differential gene expression in large RNA-seq datasets. Myrna uses Bowtie for short read alignment and R/Bioconductor for interval calculations, normalization, and statistical testing. These tools are combined in an automatic, parallel pipeline that runs in the cloud (Elastic MapReduce in this case), on a local Hadoop cluster, or on a single computer, exploiting multiple computers and CPUs wherever possible.
also see
Cloud computing method greatly increases gene analysis
Labels:
Amazon Web Services,
AWS,
Bioconductor,
bioinformatics,
bowtie,
cloud,
Hadoop,
news,
Next Generation Sequencing,
software
Friday, 9 April 2010
bowtie build time statistics / benchmark
Recently built a colorspace index for hg19 (from ftp://ftp.sanger.ac.uk/pub/1000genomes/tk2/main_project_reference/) on the following machine:
AMD Phenom II X4 955 chip
8 GB RAM
SATA2 HDD
Wrote 822714402 bytes to primary EBWT file: hg19.rev.1.ebwt
Wrote 358098108 bytes to secondary EBWT file: hg19.rev.2.ebwt
Re-opening _in1 and _in2 as input streams
Returning from Ebwt constructor
Headers:
len: 2864784823
bwtLen: 2864784824
sz: 716196206
bwtSz: 716196206
lineRate: 6
linesPerSide: 1
offRate: 5
offMask: 0xffffffe0
isaRate: -1
isaMask: 0xffffffff
ftabChars: 10
eftabLen: 20
eftabSz: 80
ftabLen: 1048577
ftabSz: 4194308
offsLen: 89524526
offsSz: 358098104
isaLen: 0
isaSz: 0
lineSz: 64
sideSz: 64
sideBwtSz: 56
sideBwtLen: 224
numSidePairs: 6394609
numSides: 12789218
numLines: 12789218
ebwtTotLen: 818509952
ebwtTotSz: 818509952
Total time for backward call to driver() for mirror index: 01:24:19
Gosh, and I thought it would take hours!
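As a sanity check on the header dump, most of the values can be recomputed from just four of them (len, lineRate, offRate, ftabChars). The sketch below does that; the relationships are inferred from the numbers in the log, not taken from bowtie's source code, and the function names are made up.

```python
def ceil_div(a, b):
    """Integer division rounded up."""
    return -(-a // b)

def derived_headers(length, line_rate, off_rate, ftab_chars, lines_per_side=1):
    """Recompute index header values; formulas are inferred from the log."""
    bwt_len = length + 1
    line_sz = 1 << line_rate                 # bytes per line
    side_sz = line_sz * lines_per_side       # bytes per side
    side_bwt_sz = side_sz - 8                # 8 bytes per side assumed reserved
    side_bwt_len = side_bwt_sz * 4           # 4 bases per byte (2 bits each)
    num_side_pairs = ceil_div(bwt_len, 2 * side_bwt_len)
    num_sides = 2 * num_side_pairs
    offs_len = ceil_div(bwt_len, 1 << off_rate)
    ftab_len = 4 ** ftab_chars + 1
    return {
        "bwtLen": bwt_len,
        "sz": ceil_div(length, 4),
        "bwtSz": ceil_div(bwt_len, 4),
        "offMask": (0xFFFFFFFF << off_rate) & 0xFFFFFFFF,
        "eftabLen": 2 * ftab_chars,
        "ftabLen": ftab_len,
        "ftabSz": 4 * ftab_len,              # 4 bytes per entry
        "offsLen": offs_len,
        "offsSz": 4 * offs_len,
        "sideBwtSz": side_bwt_sz,
        "sideBwtLen": side_bwt_len,
        "numSidePairs": num_side_pairs,
        "numSides": num_sides,
        "ebwtTotLen": num_sides * side_sz,
    }

# Inputs copied from the log above
headers = derived_headers(2864784823, line_rate=6, off_rate=5, ftab_chars=10)
for name, value in headers.items():
    print(f"{name}: {value}")
```

With the inputs from this build, every recomputed value matches the corresponding line of the log, which also shows how a different offRate would change the size of the offsets array.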