# Install the zlib headers first; otherwise make fails with
# "utils.c:33:18: fatal error: zlib.h: No such file or directory"
$ sudo apt-get install zlib1g-dev
# Download the bwa 0.7.12 source and compile it
$ wget http://downloads.sourceforge.net/project/bio-bwa/bwa-0.7.12.tar.bz2
$ tar jxvf bwa-0.7.12.tar.bz2
$ cd bwa-0.7.12/
$ make
Showing posts with label bwa. Show all posts
Wednesday, 9 November 2016
Compiling BWA on Ubuntu 16.04.1 LTS
Friday, 15 July 2011
WGsim and bowtie paired end mapping
I wanted to simulate PE reads with wgsim, but by using the default params I didn't realise that I was creating reads with a 500 bp insert. That doesn't go well with Bowtie's PE default maximum insert of 250 bp (-X) and, to cut a long story short, resulted in low pairing rates.
I only found this out by googling Biostar. Of course there is the obligatory discussion of the merits of using bowtie versus bwa or vice versa, and there are some nice graphs in there that illustrate the problem at hand.
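One way to avoid the mismatch is to derive bowtie's maximum insert size from the wgsim settings. Here is a minimal dry-run sketch, assuming wgsim's -d (outer distance) and -s (standard deviation) options and bowtie's -X (maximum insert size). The file names, the index name, the read count and the three-standard-deviation margin are my own hypothetical choices, and the script only prints the commands rather than running them:

```shell
#!/usr/bin/env bash
# Dry-run sketch: keep the simulated insert size and bowtie's -X in agreement.
# All file names and the 3*SD margin are illustrative assumptions.
INSERT=500                       # wgsim -d default (outer distance)
SD=50                            # wgsim -s default (standard deviation)
MAXINS=$((INSERT + 3 * SD))      # leave room for three standard deviations

wgsim_cmd="wgsim -d $INSERT -s $SD -N 100000 -1 70 -2 70 ref.fa sim_1.fq sim_2.fq"
bowtie_cmd="bowtie -S -X $MAXINS ref_index -1 sim_1.fq -2 sim_2.fq sim.sam"

# Print the commands so they can be reviewed before being run for real.
echo "$wgsim_cmd"
echo "$bowtie_cmd"
```

Alternatively, lowering INSERT below 250 keeps bowtie's default usable without passing -X at all.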
Thursday, 7 July 2011
BGI Announces Cloud Genome Assembly Service
I am very excited about cloud solutions for de novo assembly, as it is quite computationally intensive and, with parameter tweaking, you have a massively parallel problem that just begs for compute cores. I do wonder if there's a need for a cloud solution for resequencing pipelines, especially when they involve BWA, which can be run rather efficiently on desktops or in-house clusters. Only whole-genome reseq might require more compute hours, but I would think that any center that does WGS regularly would have, at the very minimum, a reseq-capable cluster just to store the data before it is analyzed.
Anyway let's see if BGI will change the computational cloud scene ...
By Allison Proffitt
July 6, 2011 | SHENZHEN, CHINA - At the BGI Bioinformatics Software Release Conference today, researchers announced two new Cloud-based software-as-a-service offerings for next-gen data analysis. Hecate and Gaea (named for Greek gods) are “flexible computing” solutions for de novo assembly and genome resequencing.
These are “cloud-based services for genetic researchers” so that researchers don’t need to “purchase your own cloud clusters,” said Evan Xiang, part of the flexible computing group at BGI Shenzhen. Hecate will do de novo assembly, and Gaea will run the SOAP2, BWA, Samtools, DIndel, and BGI’s realSFS algorithms. Xiang expects an updated version of Gaea to be released later this year with more algorithms available.
Labels:
BGI,
bioinformatics,
bwa,
cloud,
cloud computing,
Next Generation Sequencing,
NGS
Monday, 4 July 2011
BWA SOLiD Paired Ends mapping
The short answer seems to be 'no': BWA can't do it yet.
(I went through steps like the one here to find that out in the end; the only difference is that I used a modified solid2fastq.pl to process the F5 correctly.)
This is because bwa sampe expects the orientation to be the same as SOLiD mate pair; see http://biostar.stackexchange.com/questions/9086/paired-end-mapping-what-is-bwa-solid-paired-end-default-direction-bwa-sampe
While reverse complementing the F5 might work, that itself is problematic due to the colorspace nature of SOLiD reads.
Your options?
Bioscope :(
or bowtie (if you don't need indels) see http://bowtie-bio.sourceforge.net/manual.shtml#paired-end-colorspace-alignment
Friday, 10 June 2011
BWA to support multiple hits as separate lines in SAM with addon pl script
This is the reason why I love open source communities / software.
After a brief discussion and a request for BWA to also report multiple hits as separate entries in SAM/BAM files, the author of BWA (Li Heng) promptly released an add-on Perl script to allow for this feature.
Commercial providers: try to beat that for speed of new feature release!
Anyway, if you are interested in the usage:
A new script xa2multi.pl is added to convert XA:Z tag to multiple lines.
bwa samse ref.fa reads.sai reads.fq.gz | xa2multi.pl > out.sam
A related question was also posted on biostars
Question: How to force 'bwa samse' to output multiple hits in .sam format?
http://www.biostars.org/p/45430/
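To see what the XA:Z tag holds before it is expanded, the alternative hits can be listed with a few lines of awk. This is only an illustrative sketch and not what xa2multi.pl does internally. It assumes the documented XA layout (chr,pos,CIGAR,NM; repeated, with the strand given as the sign of pos), and the function name list_xa_hits is hypothetical:

```shell
# list_xa_hits: read SAM on stdin and print one line per XA alternative hit:
# read name, chromosome, strand, position, CIGAR, edit distance (NM).
list_xa_hits() {
  awk -F'\t' '
    /^@/ { next }                                # skip SAM header lines
    {
      for (i = 12; i <= NF; i++) {               # optional tags start at column 12
        if ($i ~ /^XA:Z:/) {
          n = split(substr($i, 6), hits, ";")    # drop the "XA:Z:" prefix
          for (j = 1; j <= n; j++) {
            if (hits[j] == "") continue          # trailing ";" leaves an empty field
            split(hits[j], f, ",")
            strand = substr(f[2], 1, 1)          # "+" or "-"
            pos = substr(f[2], 2)
            print $1 "\t" f[1] "\t" strand "\t" pos "\t" f[3] "\t" f[4]
          }
        }
      }
    }'
}
```

It could be used as `bwa samse ref.fa reads.sai reads.fq.gz | list_xa_hits > alt_hits.tsv`, where alt_hits.tsv is an arbitrary output name.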
Thursday, 17 March 2011
Common numbers / statistics for Uniquely mapped reads?
I was asked if there are commonly reported numbers for
uniquely mapped reads (which is troublesome to define with bowtie)
vs
total mapped reads
Not sure also if the numbers differ for applications
e.g.
WGS
exome reseq
human vs other organisms.
Got this figure from a 2009 paper. Not sure if anyone collates data like this
http://bioinformatics.oxfordjournals.org/content/25/7/969.full.pdf
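For BWA output at least, one workable definition is to count the mapped reads that carry the XT:A:U tag written by bwa samse/sampe. A rough sketch, assuming SAM on standard input; the function name count_unique is hypothetical, and other aligners (bowtie included) do not write this tag, so the definition does not carry over:

```shell
# count_unique: read SAM on stdin and report total mapped reads and the
# subset flagged as unique by BWA (XT:A:U).
count_unique() {
  awk -F'\t' '
    /^@/ { next }                       # skip SAM header lines
    int($2 / 4) % 2 == 0 {              # FLAG bit 0x4 unset means the read is mapped
      mapped++
      for (i = 12; i <= NF; i++)
        if ($i == "XT:A:U") { unique++; break }
    }
    END { printf "mapped=%d unique=%d\n", mapped, unique }'
}
```

For example, `count_unique < Sample.bwa.hg18.sam` prints both counts on one line, from which the unique fraction follows directly.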
Labels:
bowtie,
bwa,
journal,
NGS,
statistics,
uniquely mapped
Wednesday, 25 August 2010
howto do BWA mapping in colorspace
Here's what I use for bwa alignment (without removing PCR dups).
You can replace the paths with your own and put the commands into a bash script for automation.
Comments or corrections welcome!
#Visit kevin-gattaca.blogspot.com to see updates of this template!
#http://kevin-gattaca.blogspot.com/2010/08/howto-do-bwa-mapping-in-colorspace.html
#updated 16th Mar 2011
#Creates colorspace index
bwa index -a bwtsw -c hg18.fasta
#convert to fastq.gz
perl /opt/bwa-0.5.7/solid2fastq.pl Sample-input-prefix-name Sample
#aln using 4 threads
#-l 25 seed length
#-k 2 mismatches allowed in seed
#-n 10 total mismatches allowed
bwa aln -c -t 4 -l 25 -k 2 -n 10 /data/public/bwa-color-index/hg18.fasta Sample.single.fastq.gz > Sample.bwa.hg18.sai
#for bwa samse
bwa samse /data/public/bwa-color-index/hg18.fasta Sample.bwa.hg18.sai Sample.single.fastq.gz > Sample.bwa.hg18.sam
#creates bam file from pre-generated .fai file
samtools view -bt /data/public/hg18.fasta.fai -o Sample.bwa.hg18.sam.bam Sample.bwa.hg18.sam
#sorts bam file
samtools sort Sample.bwa.hg18.sam.bam{,.sorted}
#From a sorted BAM alignment, raw SNP and indel calls are acquired by:
samtools pileup -vcf /data/public/bwa-color-index/hg18.fasta Sample.bwa.hg18.sam.bam.sorted.bam > Sample.bwa.hg18.sam.bam.sorted.bam.raw.pileup
#resultant output should be further filtered by:
/opt/samtools/misc/samtools.pl varFilter Sample.bwa.hg18.sam.bam.sorted.bam.raw.pileup | awk '$6>=20' > Sample.bwa.hg18.sam.bam.sorted.bam.raw.pileup.final.pileup
#new section using mpileup and bcftools to generate vcf files
samtools mpileup -ugf hg18.fasta Sample.bwa.hg18.sam.bam.sorted.bam | bcftools view -bvcg - > var.raw.bcf
bcftools view var.raw.bcf | vcfutils.pl varFilter -D100 > var.flt.vcf
Do note the helpful comments below! Reposted here for clarity.
Different anon here. But try -n 3 and -e 10 and see how that works for you. Then filter out low quality alignments (MAPQ < 10) before you do any variant calling.
Also, depending on your task, you might consider disabling seeding altogether to get an even more sensitive alignment. -l 1000 should do that.
Also:
1) bwa is a global aligner with respect to reads, so consider trimming low-quality bases off the end of your reads with "bwa aln -q 10".
2) For user comprehension, it's easier if you replace "samtools view -bt /data/public/hg18.fasta.fai ..." with "samtools view -bT /data/public/hg18.fasta ..."
The T option handles reference files directly rather than having to deal with a .fai index file (which you haven't told people how to create in this guide).
3) Use "samtools view -F 4 -q 10" to get rid of unaligned reads (which are still in double-encoded color space) and dodgy alignments.
4) Use "samtools calmd" to correct MD and NM tags. (However, I'm not sure if this is necessary/helpful.)
5) Use Picard's SortSam and MarkDuplicates to take care of PCR duplicates.
6) View the alignments with samtools tview.
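The template and the commenters' suggestions can be folded into one parameterised script. What follows is a dry-run sketch only: it prints the commands instead of running them, uses just the options already mentioned in this post (-q 10 for trimming, -bT for the reference, -F 4 -q 10 for filtering), and keeps the old samtools sort "output prefix" syntax used above. The variable names are my own, and SAMPLE, REF and THREADS are assumed to be overridden from the environment:

```shell
#!/usr/bin/env bash
# Dry-run sketch of the colorspace mapping steps; nothing is executed.
SAMPLE="${SAMPLE:-Sample}"                                # sample prefix (assumption)
REF="${REF:-/data/public/bwa-color-index/hg18.fasta}"     # colorspace-indexed reference
THREADS="${THREADS:-4}"

# Derive every intermediate file name from the sample prefix.
FQ="$SAMPLE.single.fastq.gz"
SAI="$SAMPLE.bwa.hg18.sai"
SAM="$SAMPLE.bwa.hg18.sam"
BAM="$SAM.bam"

# Print the pipeline so it can be inspected or piped into a shell.
cat <<EOF
bwa aln -c -q 10 -t $THREADS -l 25 -k 2 -n 10 $REF $FQ > $SAI
bwa samse $REF $SAI $FQ > $SAM
samtools view -bT $REF -F 4 -q 10 -o $BAM $SAM
samtools sort $BAM $BAM.sorted
EOF
```

Saved under a hypothetical name such as map_colorspace.sh, it could be run as `SAMPLE=MyRun bash map_colorspace.sh | bash` once the printed commands look right.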
Labels:
bwa,
colorspace,
howto,
mapping,
NGS,
opensource,
software,
SOLiD,
tutorial
Tuesday, 10 August 2010
BWA sai files are useless.
If you know what sai means in a particular Chinese dialect, you would have known that BWA sai files are redundant. Well, it took a bit of googling for me to learn this from SEQanswers:
"No, sai is a fast changing format and does not guarantee backward compatibility at all. One should not keep sai files. I always delete them when I get SAM output."
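Following the advice to delete sai files once the SAM output exists, the cleanup can be scripted. A small sketch, assuming each X.sai has its SAM written as X.sam in the same directory (as with the Sample.bwa.hg18.sai / Sample.bwa.hg18.sam naming in the colorspace howto); the function name clean_sai is hypothetical:

```shell
# clean_sai: delete each .sai file in a directory once a non-empty .sam
# with the same base name exists next to it.
clean_sai() {
  dir="${1:-.}"
  for sai in "$dir"/*.sai; do
    [ -e "$sai" ] || continue          # glob matched nothing
    sam="${sai%.sai}.sam"
    if [ -s "$sam" ]; then             # only delete when the SAM is non-empty
      rm -- "$sai"
      echo "removed $sai"
    fi
  done
}
```

Calling `clean_sai /path/to/run` leaves any .sai without a matching SAM untouched, so an interrupted samse step can still be rerun.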