Wednesday, 9 November 2011

snpEff: SNP effect predictor a 2nd look, how do I analyze the variants?

I am taking a 2nd look at snpEff as an annotation tool for variants. I love programs that post a bash script how-to, since it makes it dead easy to tell whether I can use them for my data or not.
E.g. in the example below only the last step is for snpEff, but I have been generating variants using the exact same preceding steps. Has anyone else used snpEff before? Want to post comments / reviews?
This is an extremely simplified version of how to analyze the data from scratch (this is not meant to be a tutorial on sequencing analysis).
Let's assume you have sequence data in FASTQ format (file "s.fastq") and your reference genome is dm5.34.
# Download the genome (download command not shown here), then uncompress and rename it
gunzip dmel-all-chromosome-r5.34.fasta.gz 
mv dmel-all-chromosome-r5.34.fasta dm5.34.fasta 
# Create a genome index (we assume you installed BWA)
bwa index -a bwtsw dm5.34.fasta
# Map sequences to the genome: Create SAI file 
bwa aln dm5.34.fasta s.fastq > s.sai
# Map sequences to the genome: Create SAM file 
bwa samse dm5.34.fasta s.sai s.fastq > s.sam 
# Create BAM file (we assume you installed SamTools)
samtools view -S -b s.sam > s.bam 
# Sort BAM file (will create s_sort.bam) 
samtools sort s.bam s_sort 
# Create VCF file (BcfTools is part of samtools distribution) 
samtools mpileup -uf dm5.34.fasta s_sort.bam | bcftools view -vcg - > s.vcf 
# Analyze variants using snpEff 
java -Xmx4g -jar snpEff.jar -vcf4 dm5.34 s.vcf > s_snpeff.txt     
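
Before running the snpEff step, it can help to check that the VCF actually contains variants. The sketch below is my own addition, not part of the snpEff how-to: count_variants is a hypothetical helper name, and it only relies on the fact that header and metadata lines in a VCF start with '#', so every other line is one variant record.

```shell
# Count variant records in a VCF file.
# Header and metadata lines start with '#'; every other line is one record.
# (count_variants is a hypothetical helper, not part of snpEff or samtools.)
count_variants() {
    # grep -c exits non-zero when the count is 0, so ignore its exit status
    grep -vc '^#' "$1" || true
}

# Usage, assuming s.vcf is the file produced by the mpileup step above
if [ -f s.vcf ]; then
    echo "s.vcf contains $(count_variants s.vcf) variant records"
fi
```

If the count is 0, the problem is likely upstream (mapping or variant calling), and there is nothing for snpEff to annotate.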

1 comment:

  1. Hi, something is wrong with the formatting. The code block doesn't display properly: it shows up as one long line of code.

