Thursday 14 February 2013

Comparison of Sequencing Platforms for Single Nucleotide Variant Calls in a Human Sample

I saw Figure 2 from this paper in Stephan Schuster's talks way back, and his point about using different platforms/chemistries to reduce bias has always been in the back of my head when handling single-platform data.

Great work finally getting this published.

His criteria for variant calling should also be a good starting reference point.

"We used SAMtools version 0.1.16 to call the variants in the Illumina reads. We required a minimum coverage of 4, a maximum coverage of 60 and a minimum quality of 20 for the SNPs and indels that were found to be on the autosomes. We reduced the maximum coverage requirement to 45 for the sex chromosomes and increased it to 10,000 for the mitochondrial DNA. Only homozygous SNP and indels calls were kept from the sex chromosomes and mtDNA."
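To make those thresholds concrete, here is a rough Python sketch of how the quoted filter could be applied to calls that have already been parsed. It is not the authors' pipeline. The `Call` record, the function names, and the chromosome naming (chr1, chrX, chrM) are my own assumptions. I have also assumed the limits are inclusive, that "minimum quality" refers to the variant quality score, and that any contig not named X, Y, M or MT is treated as an autosome.

```python
from dataclasses import dataclass

# Thresholds taken from the quoted methods text.
MIN_DEPTH = 4
MIN_QUAL = 20
MAX_DEPTH = {"autosome": 60, "sex": 45, "mito": 10_000}

# Assumed naming conventions; adjust to match your reference.
SEX_CHROMS = {"X", "Y"}
MITO_CHROMS = {"M", "MT"}


@dataclass
class Call:
    """Hypothetical minimal record for one SNP or indel call."""
    chrom: str
    depth: int
    qual: float
    genotype: str  # VCF-style, e.g. "0/1" or "1/1"


def chrom_class(chrom: str) -> str:
    """Classify a chromosome name as 'autosome', 'sex' or 'mito'."""
    name = chrom[3:] if chrom.lower().startswith("chr") else chrom
    name = name.upper()
    if name in SEX_CHROMS:
        return "sex"
    if name in MITO_CHROMS:
        return "mito"
    return "autosome"  # assumption: everything else counts as autosomal


def is_homozygous(genotype: str) -> bool:
    """True when all alleles in a VCF-style genotype string are identical."""
    alleles = genotype.replace("|", "/").split("/")
    return len(set(alleles)) == 1


def passes(call: Call) -> bool:
    """Apply the coverage, quality and zygosity criteria to one call."""
    cls = chrom_class(call.chrom)
    if call.depth < MIN_DEPTH or call.depth > MAX_DEPTH[cls]:
        return False
    if call.qual < MIN_QUAL:
        return False
    # Only homozygous calls are kept on the sex chromosomes and mtDNA.
    if cls != "autosome" and not is_homozygous(call.genotype):
        return False
    return True
```

For example, `passes(Call("chrX", 40, 50, "0/1"))` is rejected because heterozygous calls are dropped on the sex chromosomes, while the same call on chr1 would be kept.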

http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0055089

PLoS One. 2013;8(2):e55089. doi: 10.1371/journal.pone.0055089. Epub 2013 Feb 6.

Comparison of sequencing platforms for single nucleotide variant calls in a human sample.

Source

Center for Comparative Genomics and Bioinformatics, Pennsylvania State University, University Park, Pennsylvania, United States of America.

Abstract

Next-generation sequencing platforms coupled with advanced bioinformatic tools enable re-sequencing of the human genome at high-speed and large cost savings. We compare sequencing platforms from Roche/454 (GS FLX), Illumina/HiSeq (HiSeq 2000), and Life Technologies/SOLiD (SOLiD 3 ECC) for their ability to identify single nucleotide substitutions in whole genome sequences from the same human sample. We report on significant GC-related bias observed in the data sequenced on Illumina and SOLiD platforms. The differences in the variant calls were investigated with regards to coverage, and sequencing error. Some of the variants called by only one or two of the platforms were experimentally tested using mass spectrometry; a method that is independent of DNA sequencing. We establish several causes why variants remained unreported, specific to each platform. We report the indel called using the three sequencing technologies and from the obtained results we conclude that sequencing human genomes with more than a single platform and multiple libraries is beneficial when high level of accuracy is required.

PMID: 23405114 [PubMed - in process]
