Showing posts with label bwa. Show all posts

Wednesday, 9 November 2016

Compiling BWA on Ubuntu 16.04.1 LTS

# install the zlib headers first, otherwise make fails with:
#   utils.c:33:18: fatal error: zlib.h: No such file or directory
$ sudo apt-get install zlib1g-dev

# download the 0.7.12 source tarball and compile
$ wget http://downloads.sourceforge.net/project/bio-bwa/bwa-0.7.12.tar.bz2
$ tar jxvf bwa-0.7.12.tar.bz2
$ cd bwa-0.7.12/
$ make 

Friday, 15 July 2011

WGsim and bowtie paired end mapping

I wanted to simulate PE reads with wgsim, but by using the default params I didn't realise that I was creating reads with a 500 bp insert. That doesn't go well with Bowtie's PE default maximum insert size of 250 bp (-X) and, to cut a long story short, resulted in low pairing rates.

I only found this out by googling Biostar. Of course, there is the obligatory discussion of the merits of using bowtie versus bwa or vice versa, and there are some nice graphs in there that illustrate the problem at hand.
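
One way to avoid the mismatch is to pick a single insert size and derive both tools' settings from it. Below is a minimal sketch, assuming wgsim's -d (outer distance) and -s (standard deviation) options and bowtie's -X (maximum insert size). The file names, the index name and the mean-plus-three-SD margin are placeholders of mine, not something taken from the Biostar thread. It only prints the commands so they can be checked before running.

```shell
#!/bin/sh
# Hypothetical values: set these to match the library you want to simulate.
INSERT=500   # wgsim's default outer distance (-d)
SD=50        # wgsim's default standard deviation (-s)

# Let bowtie accept pairs up to mean + 3 SD (an assumed margin, adjust to taste).
MAXINS=$((INSERT + 3 * SD))

# ref.fa, ref_index and the sim_* file names are placeholders.
WGSIM_CMD="wgsim -d $INSERT -s $SD ref.fa sim_1.fq sim_2.fq"
BOWTIE_CMD="bowtie -S -X $MAXINS ref_index -1 sim_1.fq -2 sim_2.fq sim.sam"

# Print rather than execute, so the parameters can be inspected first.
echo "$WGSIM_CMD"
echo "$BOWTIE_CMD"
```

Going the other way works too: keep bowtie's default and simulate with a smaller -d in wgsim.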

Thursday, 7 July 2011

BGI Announces Cloud Genome Assembly Service

I am very excited about cloud solutions for de novo assembly, as they are quite computationally intensive and, with parameter tweaking, you have a massive parallelization problem that just begs for compute cores. I do wonder if there's a need for a cloud solution for resequencing pipelines, especially when it involves BWA, which can be run rather efficiently on desktops or in-house clusters. Only whole genome reseq might require more compute hours, but I would think that any center that does WGS regularly would at least have a genome reseq capable cluster, at the very minimum just to store the data before it is analyzed.

Anyway let's see if BGI will change the computational cloud scene ...


By Allison Proffitt 
July 6, 2011 | SHENZHEN, CHINA - At the BGI Bioinformatics Software Release Conference today, researchers announced two new Cloud-based software-as-a-service offerings for next-gen data analysis. Hecate and Gaea (named for Greek gods) are “flexible computing” solutions for de novo assembly and genome resequencing.
These are “cloud-based services for genetic researchers” so that researchers don’t need to “purchase your own cloud clusters,” said Evan Xiang, part of the flexible computing group at BGI Shenzhen. Hecate will do de novo assembly, and Gaea will run the SOAP2, BWA, Samtools, DIndel, and BGI’s realSFS algorithms. Xiang expects an updated version of Gaea to be released later this year with more algorithms available. ... full article

Monday, 4 July 2011

BWA SOLiD Paired Ends mapping

The short answer seems to be 'no': BWA can't do it yet.
(I went through steps like the one here to find that out in the end; the only difference is that I used a modified solid2fastq.pl to process the F5 reads correctly.)
The reason is that
   bwa sampe
expects the orientation to be the same as SOLiD mate pair;
see http://biostar.stackexchange.com/questions/9086/paired-end-mapping-what-is-bwa-solid-paired-end-default-direction-bwa-sampe
While reverse complementing the F5 reads might work, that itself is problematic due to the colorspace nature of SOLiD reads.

Your options?
Bioscope :(
or bowtie (if you don't need indels) see http://bowtie-bio.sourceforge.net/manual.shtml#paired-end-colorspace-alignment

Friday, 10 June 2011

BWA to support multiple hits as separate lines in SAM with addon pl script

This is the reason why I love open source communities / software.
After a brief discussion and a request for BWA to also report multiple hits as separate entries in sam/bam files, the author of BWA (Li Heng) promptly released an addon perl script to allow for this feature.

commercial providers: try to beat that for speed for new feature release!

Anyway, if you are interested in the usage:

A new script xa2multi.pl is added to convert XA:Z tag to multiple lines.

   
  bwa samse ref.fa reads.sai reads.fq.gz | xa2multi.pl > out.sam


A related question was also posted on biostars

Question: How to force 'bwa samse' to output multiple hits in .sam format?
http://www.biostars.org/p/45430/
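
For a feel of what the script has to unpack: BWA stores alternative hits in the XA:Z tag as semicolon-terminated entries of the form chr,±pos,CIGAR,NM. The awk sketch below is not xa2multi.pl and does not emit valid SAM lines; it only lists the hits. The function names and the SAM record in it are made up for illustration.

```shell
#!/bin/sh
# A made-up, tab-separated SAM record with two alternative hits in XA:Z.
xa_example() {
    printf 'read1\t0\tchr1\t100\t0\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0\tXA:Z:chr2,+200,4M,1;chr3,-300,4M,2;\n'
}

# Print the primary hit, then one line per XA entry:
# read name, chromosome, strand, position, CIGAR, and a label or edit distance.
list_hits() {
    awk -F'\t' '
    /^@/ { next }                                   # skip header lines
    {
        strand = (int($2 / 16) % 2) ? "-" : "+"     # flag bit 0x10 means reverse strand
        print $1, $3, strand, $4, $6, "primary"
        for (i = 12; i <= NF; i++) {
            if (substr($i, 1, 5) != "XA:Z:") continue
            n = split(substr($i, 6), hits, ";")
            for (j = 1; j <= n; j++) {
                if (hits[j] == "") continue         # the trailing ";" leaves an empty entry
                split(hits[j], f, ",")
                print $1, f[1], substr(f[2], 1, 1), substr(f[2], 2), f[3], "NM=" f[4]
            }
        }
    }'
}

xa_example | list_hits
```

On real data, use xa2multi.pl as shown above; it produces proper SAM records rather than this summary listing.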


Thursday, 17 March 2011

Common numbers / statistics for Uniquely mapped reads?

I was asked whether there are commonly reported numbers for
uniquely mapped reads (which are troublesome to define with bowtie)
vs
total mapped reads

I'm also not sure whether the numbers differ between applications
e.g.
WGS
exome reseq

or between human and other organisms.
I got this figure from a 2009 paper. I'm not sure if anyone collates data like this
http://bioinformatics.oxfordjournals.org/content/25/7/969.full.pdf
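
For BWA output at least, one workable definition can be read off the SAM itself: bwa aln/samse tags each mapped read with XT:A (U for unique, R for repeat), and flag bit 0x4 marks unmapped reads. Here is a rough sketch under that assumption, run on made-up records with function names of my own choosing. Bowtie does not write XT:A, so this does not settle the definition question above.

```shell
#!/bin/sh
# Made-up SAM records: two unique hits, one repeat hit, one unmapped read.
xt_example() {
    printf '@SQ\tSN:chr1\tLN:1000\n'
    printf 'r1\t0\tchr1\t10\t37\t4M\t*\t0\t0\tACGT\tIIII\tXT:A:U\n'
    printf 'r2\t16\tchr1\t50\t0\t4M\t*\t0\t0\tACGT\tIIII\tXT:A:R\n'
    printf 'r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n'
    printf 'r4\t0\tchr1\t90\t37\t4M\t*\t0\t0\tACGT\tIIII\tXT:A:U\n'
}

# Count all reads, mapped reads (flag bit 0x4 unset) and reads tagged XT:A:U.
count_mapped() {
    awk -F'\t' '
    /^@/ { next }                                   # skip header lines
    {
        total++
        if (int($2 / 4) % 2 == 0) mapped++          # bit 0x4 unset means mapped
        for (i = 12; i <= NF; i++) if ($i == "XT:A:U") unique++
    }
    END { printf "total=%d mapped=%d unique=%d\n", total, mapped, unique }'
}

xt_example | count_mapped
```

On a real file the same function can be fed with "samtools view -h in.bam | count_mapped", where in.bam is a placeholder name.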

Wednesday, 25 August 2010

howto do BWA mapping in colorspace

Here's what I use for bwa alignment (without removing PCR dups).
You can replace the paths with your own and put the commands into a bash script for automation.
Comments or corrections welcome!


#Visit kevin-gattaca.blogspot.com to see updates of this template!
#http://kevin-gattaca.blogspot.com/2010/08/howto-do-bwa-mapping-in-colorspace.html
#updated 16th Mar 2011
#Creates colorspace index
bwa index -a bwtsw -c hg18.fasta

#convert to fastq.gz
perl /opt/bwa-0.5.7/solid2fastq.pl Sample-input-prefix-name Sample

#aln using 4 threads
#-l 25        seed length
#-k 2         mismatches allowed in seed
#-n 10      total mismatches allowed

bwa aln -c -t 4 -l 25 -k 2 -n 10 /data/public/bwa-color-index/hg18.fasta Sample.single.fastq.gz > Sample.bwa.hg18.sai

#for bwa samse
bwa samse /data/public/bwa-color-index/hg18.fasta Sample.bwa.hg18.sai Sample.single.fastq.gz > Sample.bwa.hg18.sam

#creates bam file from pre-generated .fai file

samtools view -bt /data/public/hg18.fasta.fai -o Sample.bwa.hg18.sam.bam Sample.bwa.hg18.sam

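#sorts bam file (samtools 0.1.x syntax: input BAM, then output prefix)
#the bash brace expansion below expands to:
#  samtools sort Sample.bwa.hg18.sam.bam Sample.bwa.hg18.sam.bam.sorted
#which writes Sample.bwa.hg18.sam.bam.sorted.bam

samtools sort Sample.bwa.hg18.sam.bam{,.sorted}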

#From a sorted BAM alignment, raw SNP and indel calls are acquired by:

samtools pileup -vcf /data/public/bwa-color-index/hg18.fasta Sample.bwa.hg18.sam.bam.sorted.bam > Sample.bwa.hg18.sam.bam.sorted.bam.raw.pileup

#resultant output should be further filtered by:

/opt/samtools/misc/samtools.pl varFilter Sample.bwa.hg18.sam.bam.sorted.bam.raw.pileup | awk '$6>=20' > Sample.bwa.hg18.sam.bam.sorted.bam.raw.pileup.final.pileup



#new section using mpileup and bcftools to generate vcf files
samtools mpileup -ugf hg18.fasta Sample.bwa.hg18.sam.bam.sorted.bam | bcftools view -bvcg - > var.raw.bcf
bcftools view var.raw.bcf | vcfutils.pl varFilter -D100 > var.flt.vcf


Do note the helpful comments below! Reposted here for clarity.

Different anon here. But try -n 3 and -e 10 and see how that works for you. Then filter out low quality alignments (MAPQ < 10) before you do any variant calling.




Also, depending on your task, you might consider disabling seeding altogether to get an even more sensitive alignment. -l 1000 should do that.


Also:


1) bwa is a global aligner with respect to reads, so consider trimming low-quality bases off the end of your reads with "bwa aln -q 10".

2) For user comprehension, it's easier if you replace "samtools view -bt /data/public/hg18.fasta.fai ..." with "samtools view -bT /data/public/hg18.fasta ..."
The T option handles reference files directly rather than having to deal with a .fai index file (which you haven't told people how to create in this guide).

3) Use "samtools view -F 4 -q 10" to get rid of unaligned reads (which are still in double-encoded color space) and dodgy alignments.

4) Use "samtools calmd" to correct MD and NM tags. (However, I'm not sure if this is necessary/helpful.)

5) Use Picard's SortSam and MarkDuplicates to take care of PCR duplicates.

6) View the alignments with samtools tview.
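
To make the "samtools view -F 4 -q 10" suggestion concrete, here is what that filter keeps, written as a plain awk sketch over SAM text so it can be tried without samtools. The records and function names are made up, and header lines are passed through (as samtools does with -h). On real data the samtools command itself is the thing to use.

```shell
#!/bin/sh
# Made-up SAM records: MAPQ 37, MAPQ 3, unmapped, MAPQ 10.
mapq_example() {
    printf '@SQ\tSN:chr1\tLN:1000\n'
    printf 'r1\t0\tchr1\t10\t37\t4M\t*\t0\t0\tACGT\tIIII\n'
    printf 'r2\t16\tchr1\t50\t3\t4M\t*\t0\t0\tACGT\tIIII\n'
    printf 'r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n'
    printf 'r4\t0\tchr1\t90\t10\t4M\t*\t0\t0\tACGT\tIIII\n'
}

# Rough equivalent of "samtools view -h -F 4 -q MINQ" on SAM text:
# keep headers, drop unmapped reads (flag bit 0x4) and reads with MAPQ below MINQ.
filter_sam() {
    awk -F'\t' -v minq="${1:-10}" '
    /^@/ { print; next }
    int($2 / 4) % 2 == 0 && ($5 + 0) >= (minq + 0) { print }'
}

mapq_example | filter_sam 10
```

With the threshold at 10, r2 (MAPQ 3) and r3 (unmapped) are dropped, while r4 (MAPQ exactly 10) is kept.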

Tuesday, 10 August 2010

BWA sai files are useless.

If you know what sai means in a particular Chinese dialect, you would have known that BWA sai files are redundant. Well, it took a bit of googling for me to learn this from SEQanswers:

"No, sai is a fast changing format and does not guarantee backward compatibility at all. One should not keep sai files. I always delete them when I get SAM output."
