Monday, 6 September 2010

Evaluation of next generation sequencing platforms for population targeted sequencing studies

I came across this paper earlier but didn't have time to blog much about it.
Papers that compare sequencing platforms are getting rarer as the hype around NGS dies down and people become more interested in the next-next-gen sequencing machines (usually termed single-molecule sequencing).
Targeted resequencing is a popular use of NGS, as the price of human whole-genome resequencing is still not within reach for most. (See Exome sequencing: the sweet spot before whole genomes.)

There are inherent biases that people should be aware of before they jump right into it.

1) The NGS technologies generate a large amount of sequence but, for the platforms that produce short-sequence reads, greater than half of this sequence is not usable.
  • On average, 55% of the Illumina GA reads pass quality filters, of which approximately 77% align to the reference sequence.
  • For ABI SOLiD, approximately 35% of the reads pass quality filters, and subsequently 96% of the filtered reads align to the reference sequence.
  • In contrast to the platforms generating short-read lengths, approximately 95% of the Roche 454 reads uniquely align to the target sequence.
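
To see where "greater than half is not usable" comes from, the two percentages for each short-read platform can simply be multiplied together. This is my own back-of-the-envelope sketch, not something from the paper; the dictionary layout and the function name usable_fraction are made up for illustration, and only the four percentages come from the bullets above. Roche 454 is left out because no filter pass rate is quoted for it.

```python
# Fractions quoted in the bullets above (Illumina GA and ABI SOLiD only).
platforms = {
    "Illumina GA": {"pass_filter": 0.55, "aligned_of_passed": 0.77},
    "ABI SOLiD": {"pass_filter": 0.35, "aligned_of_passed": 0.96},
}


def usable_fraction(pass_filter, aligned_of_passed):
    """Fraction of raw reads that both pass the quality filter and align."""
    return pass_filter * aligned_of_passed


# Hypothetical helper structure: platform name -> usable share of raw reads.
usable = {name: usable_fraction(**rates) for name, rates in platforms.items()}

for name, frac in usable.items():
    print(f"{name}: about {frac:.0%} of raw reads are usable")
```

That works out to roughly 42% for Illumina GA and 34% for ABI SOLiD, so in both cases less than half of the raw reads end up aligned.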

Admittedly, these numbers have changed now that Illumina has longer read lengths (the paper tested 36 bp vs 35 bp reads).

2) For PCR-based targeted sequencing, they observed that the mapped sequences corresponding to the 50 bp at the ends and the overlapping intervals of the amplicons have extremely high coverage.
  • These regions, representing about 2.3% (approximately 6 kb) of the targeted intervals, account for up to 56% of the sequenced base pairs for Illumina GA technology.
  • For the ABI SOLiD platform an amplicon end depletion protocol was employed to remove the overrepresented amplicon ends; this was partially successful and resulted in the ends accounting for up to 11% of the sequenced base pairs.  
  • For the Roche 454 technology, overrepresentation of amplicon ends versus internal bases is substantially less, with the ends composing only 5% of the total sequenced bases; this is likely due to library preparation process differences between Roche 454 and the short-read length platforms.
The overrepresentation of amplicon end sequences is not only wasteful for the sequencing yield but also decreases the expected average coverage depth across the targeted intervals. Therefore, to accurately assess the consequences of sequence coverage on data quality, the authors removed the 50 bp at the ends of the amplicons from subsequent analyses.
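
A rough way to picture how much the amplicon ends eat into coverage is to compare each region's share of sequenced bases with its share of the target. The sketch below is my own arithmetic, not the paper's analysis. It assumes the 2.3% figure applies to all three platforms and treats the "up to" percentages as point values, so the results are worst-case estimates; relative_depth and the variable names are hypothetical.

```python
# Share of the targeted intervals made up by amplicon ends and overlaps.
END_TARGET_SHARE = 0.023

# Share of sequenced bases falling in those ends ("up to" values from the text,
# treated here as point values, which is an assumption).
end_base_share = {"Illumina GA": 0.56, "ABI SOLiD": 0.11, "Roche 454": 0.05}


def relative_depth(share_of_bases, share_of_target):
    """Per-base depth of a region relative to the mean depth over the whole target."""
    return share_of_bases / share_of_target


depths = {}
for name, share in end_base_share.items():
    ends = relative_depth(share, END_TARGET_SHARE)
    interior = relative_depth(1 - share, 1 - END_TARGET_SHARE)
    depths[name] = (ends, interior)
    print(f"{name}: ends {ends:.1f}x mean depth, interior {interior:.2f}x mean depth")
```

Under these assumptions the Illumina GA amplicon ends sit at roughly 24 times the mean depth, while the interior bases (the part you actually care about) get less than half of it.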

I am not sure if this has changed since.

Note: Will update my thoughts when I have more time.

Other Interesting papers
WGS vs exome seq
Whole-exome sequencing identifies recessive WDR62 mutations in severe brain malformations.
Identification by whole-genome resequencing of gene defect responsible for severe hypercholesterolemia.

Exome sequencing: the sweet spot before whole genomes.
Whole human exome capture for high-throughput sequencing.
Screening the human exome: a comparison of whole genome and whole transcriptome sequencing.
Novel multi-nucleotide polymorphisms in the human genome characterized by whole genome and exome sequencing. 

Family-based analysis and exome seq
Molecular basis of a linkage peak: exome sequencing and family-based analysis identify a rare genetic variant in the ADIPOQ gene in the IRAS Family Study.
