There's a new tool for SNP calling.
This post is useful for understanding the challenge of separating bona fide mutations from false positives and background noise in NGS data. For the wet lab, running a variety of SNP callers and validating only the calls present in all of them gives you a higher validation rate. But having a bona fide mutation doesn't mean you have hit on the mutation actually associated with your disease ...
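The multi-caller validation idea above can be sketched as a simple set intersection: keep only the variants that every caller reports. This is an illustrative sketch, not any specific pipeline's code; the caller outputs here are made-up examples represented as (chrom, pos, ref, alt) tuples.

```python
# Keep only variants called by every SNP caller (consensus calls).
# Each caller's output is modeled as a set of (chrom, pos, ref, alt)
# tuples; the values below are hypothetical examples.

caller_a = {("chr1", 1000, "A", "G"), ("chr2", 500, "C", "T")}
caller_b = {("chr1", 1000, "A", "G"), ("chr3", 42, "G", "A")}
caller_c = {("chr1", 1000, "A", "G"), ("chr2", 500, "C", "T")}

# Intersection keeps only calls shared by all three callers.
consensus = caller_a & caller_b & caller_c
print(consensus)  # {('chr1', 1000, 'A', 'G')}
```

Calls in the intersection are the ones worth prioritizing for wet-lab validation, since agreement across callers raises the validation rate.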
Excerpted from
http://www.massgenomics.org/2011/12/somatic-mutation-detection-in-whole-genome-sequencing-data.html
SomaticSniper adds to the growing arsenal of tools developed by our group to address the significant challenges presented by next-generation sequencing data analysis.
Filtering Out the Noise
No matter how good the mutation caller, there are going to be some false positives. This is because you're looking for a one-in-a-million event, a true somatic mutation. Raw SomaticSniper calls therefore undergo a series of Maq-inspired filters. Sites are retained if they meet these criteria:
- Covered by at least 3 reads
- Consensus quality of at least 20
- Called a SNP in the tumor sample with SNP quality of at least 20
- Maximum mapping quality of at least 40
- No high-quality predicted indel within 10 bp
- No more than 2 other SNVs called within 10 bp
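The retention criteria above can be expressed as a single predicate over a called site. This is a minimal sketch, not SomaticSniper's actual implementation; the field names on `site` are assumptions for illustration.

```python
def passes_filters(site):
    """Return True if a raw somatic call meets the Maq-inspired
    retention criteria (field names are hypothetical)."""
    return (
        site["depth"] >= 3                   # covered by at least 3 reads
        and site["consensus_qual"] >= 20     # consensus quality of at least 20
        and site["snp_qual"] >= 20           # called a SNP in tumor, SNP quality >= 20
        and site["max_mapping_qual"] >= 40   # maximum mapping quality of at least 40
        and not site["indel_within_10bp"]    # no high-quality indel within 10 bp
        and site["nearby_snvs"] <= 2         # no more than 2 other SNVs within 10 bp
    )

# A hypothetical site that satisfies every criterion.
example = {
    "depth": 12, "consensus_qual": 35, "snp_qual": 28,
    "max_mapping_qual": 60, "indel_within_10bp": False, "nearby_snvs": 1,
}
print(passes_filters(example))  # True
```

A site failing any one criterion is dropped; the filters are conjunctive.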
Frequent Sources of False Positives
Even sites that pass the filters above are vulnerable to certain sequencing and alignment artifacts that produce false positive calls. A detailed study revealed (as many in the field already know) a few common sources of false positives: strand bias, homopolymer sequences, paralogous reads (reads deriving from a paralogous region of the genome but mapped to the wrong region, usually carrying three or more substitutions), and the read position of the predicted variant. The last type of artifact is something new: it turned out that variants seen only near the "effective" 3′ end of reads (the start of soft-trimmed bases, or the actual end of the read if untrimmed) were more likely to be false positives. This may be a combination of sequencing error, which is higher at the 3′ end of reads, and alignment bias favoring mismatches over gaps near the ends of reads. In any case, false positives deriving from these common causes tend to have certain properties enabling them to be identified and removed while maintaining sensitivity for true mutations.
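The read-position artifact described above can be checked mechanically: compute how far the variant sits from each supporting read's effective 3′ end, and flag the variant if no read places it comfortably mid-read. This is a hedged sketch under assumed conventions; the 10 bp threshold and the function names are illustrative, not taken from SomaticSniper.

```python
def effective_3prime_distance(variant_offset, read_len, soft_clip_start=None):
    """Distance (in bases) from the variant to the read's effective 3' end:
    the start of soft-trimmed bases, or the actual read end if untrimmed."""
    end = soft_clip_start if soft_clip_start is not None else read_len
    return end - variant_offset

def likely_3prime_artifact(distances, min_distance=10):
    """Flag a variant as a likely artifact if every supporting read
    places it within `min_distance` bases of the effective 3' end."""
    return all(d <= min_distance for d in distances)

# Hypothetical supporting reads: (variant_offset, read_len, soft_clip_start).
reads = [(97, 100, None), (95, 100, 98)]
distances = [effective_3prime_distance(*r) for r in reads]
print(distances)                          # [3, 3]
print(likely_3prime_artifact(distances))  # True: only seen near the 3' end

# If even one read also shows the variant mid-read, it is retained.
print(likely_3prime_artifact([3, 3, 48]))  # False
```

Requiring at least one mid-read observation removes this artifact class while keeping sensitivity for true mutations, which tend to appear at all read positions.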