Tuesday, 12 October 2010

SRMA: a tool for improved variant discovery through local re-alignment of short-read next-generation sequencing data

Have a look at this tool: http://genomebiology.com/2010/11/10/R99/abstract
It is a realigner for NGS reads that doesn't use a lot of RAM. I'm not sure how it compares to GATK's local realignment around indels, as GATK is not mentioned, but the authors used reads aligned with the popular BWA or BFAST as input. (Bowtie was left out, though.)

 SRMA was able to improve the ultimate variant calling using a variety of measures on the simulated data from two different popular aligners (BWA and BFAST). These aligners were selected based on their sensitivity to insertions and deletions (BFAST and BWA), since a property of SRMA is that it produces a better consensus around indel positions. The initial alignments from BFAST allow local SRMA re-alignment using the original color sequence and qualities to be assessed as BFAST retains this color space information. This further reduces the bias towards calling the reference allele at SNP positions in ABI SOLiD data, and reduces the false discovery rate of new variants. Thus, local re-alignment is a powerful approach to improving genomic sequencing with next generation sequencing technologies.

 The alignments to the reference genome were implicitly split into 1Mb regions and processed in parallel on a large computer cluster; the re-alignments from each region were then merged in a hierarchical fashion. This allows for the utilization of multi-core computers, with one re-alignment per core, as well as parallelization across a computer cluster or a cloud.

 The average peak memory utilization per process was 876Mb (on a single-core), with a maximum peak memory utilization of 1.25GB. On average, each 1Mb region required approximately 2.58 minutes to complete, requiring approximately 86.17 hours total running time for the whole U87MG genome. SRMA also supports re-alignment within user-specified regions for efficiency, so that only regions of interest need to be re-aligned. This is particularly useful for exome-sequencing or targeted re-sequencing data.
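
To make the region-based scheme in the excerpt a bit more concrete, here is a rough Python sketch of cutting chromosomes into 1 Mb windows and merging per-region results pairwise, round by round. It is only an illustration and not SRMA's own code: the function names, the samtools-style `chrom:start-end` region string, and the idea of passing a `merge_pair` callback are all my assumptions, and the actual re-alignment step is left out.

```python
from typing import Callable, Dict, Iterator, List, Tuple, TypeVar

T = TypeVar("T")
Region = Tuple[str, int, int]  # (chromosome, 1-based start, inclusive end)

REGION_SIZE = 1_000_000  # 1 Mb windows, the size quoted in the excerpt


def split_into_regions(chrom_lengths: Dict[str, int],
                       region_size: int = REGION_SIZE) -> Iterator[Region]:
    """Yield consecutive, non-overlapping windows covering each chromosome."""
    for chrom, length in chrom_lengths.items():
        for start in range(1, length + 1, region_size):
            end = min(start + region_size - 1, length)
            yield chrom, start, end


def region_string(region: Region) -> str:
    """Format a region as 'chrom:start-end' (samtools-style; an assumption here)."""
    chrom, start, end = region
    return f"{chrom}:{start}-{end}"


def merge_hierarchically(items: List[T], merge_pair: Callable[[T, T], T]) -> T:
    """Merge neighbours pairwise in rounds until a single result remains."""
    if not items:
        raise ValueError("nothing to merge")
    level = list(items)
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                next_level.append(merge_pair(level[i], level[i + 1]))
            else:
                # Odd one out is carried up to the next round unchanged.
                next_level.append(level[i])
        level = next_level
    return level[0]


if __name__ == "__main__":
    # Hypothetical toy chromosome, 2.5 Mb long.
    for region in split_into_regions({"chr1": 2_500_000}):
        print(region_string(region))
```

Each region string could then be handed to one re-alignment job per core, with `merge_hierarchically` standing in for whatever tool actually merges the per-region output files.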


  1. I've just compared it with GATK and the "raw" bwa output.
    Looking at samtools pileup results, 96% of GATK SNPs also appear in raw bwa (although that's only ~80% of raw bwa SNPs). For srma, however, the overlap is only 50%! Don't even know what to think...

  2. Update: I contacted Nils Homer (the author of srma) about this, and it turned out to be a bug, which he has apparently now fixed (as of today's release, 0.1.11).

  3. Thanks Mikhail and Nils! Lovely prompt action by both! I had meant to forward the comment but got bogged down by work.

  4. Hi,
    @Mikhail: I would be interested if you would like to share your pipeline from SAM format to SNPs/indels using GATK/SRMA. I am currently working on 50 bp SOLiD data and comparing BFAST and SHRiMP to perform the alignment.
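
For anyone wanting to repeat the kind of SNP overlap comparison described in the first comment, a minimal Python sketch might look like the one below. It is not the commenter's pipeline: it just assumes each call set is a tab-delimited file whose first two columns are chromosome and position (as in samtools pileup output), and the function names and toy positions are made up for illustration.

```python
from typing import Iterable, Set, Tuple

Site = Tuple[str, int]  # (chromosome, 1-based position)


def load_sites(lines: Iterable[str]) -> Set[Site]:
    """Collect (chrom, pos) pairs from tab-delimited lines, skipping blanks and '#' lines."""
    sites = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        sites.add((fields[0], int(fields[1])))
    return sites


def overlap_fraction(calls_a: Set[Site], calls_b: Set[Site]) -> float:
    """Fraction of sites in calls_a that are also present in calls_b."""
    if not calls_a:
        return 0.0
    return len(calls_a & calls_b) / len(calls_a)


if __name__ == "__main__":
    # Made-up toy call sets, not real data.
    gatk = load_sites(["chr1\t100\tA\tG", "chr1\t200\tC\tT", "chr2\t50\tG\tA", "chr2\t75\tT\tC"])
    raw = load_sites(["chr1\t100\tA\tG", "chr1\t200\tC\tT", "chr2\t50\tG\tA",
                      "chr1\t300\tA\tC", "chr2\t900\tG\tT"])
    print(f"GATK SNPs also in raw: {overlap_fraction(gatk, raw):.0%}")
    print(f"raw SNPs also in GATK: {overlap_fraction(raw, gatk):.0%}")
```

Note that the overlap is not symmetric, which is why the comment quotes two different percentages for the same pair of call sets. This sketch matches on position only; comparing the called allele as well would be stricter.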


Datanami, Woe be me