Wednesday, 15 August 2012

Sequencing low diversity libraries on Illumina MiSeq

Move over 454

Sadly, development of the 454 platform has stagnated in recent years, since the Titanium upgrade in 2008. The long-promised upgrade to GS FLX+ "1kb reads" arrived late and under-delivered, with reads closer to 700-800 bases, and some users have reported dissatisfaction with it. Disappointingly, the long-read protocol is not supported for unidirectional Lib-A sequencing, which dramatically limits its potential market. Nor is it available on the benchtop 454 GS Junior, although this may change in future.

But most critical is Roche management's apparent blind spot regarding the rapidly falling cost of sequencing on competing platforms. The 454 has simply priced itself out of the market: per megabase, it is one to two orders of magnitude more expensive than the Illumina and Life Technologies platforms.

New Platforms for Amplicon Sequencing

So for microbiologists wishing to do 16S sequencing, whether driven by cost-cutting or by a desire to sequence more samples more deeply, it is now time to look at the alternatives. The MiSeq and the Ion Torrent PGM are both promising platforms for 16S analysis, given their competitive price points and increasingly long reads (MiSeq 2x150bp, PGM 200bp, going to 2x250bp and 400bp respectively by the end of the year).

Sequencing low diversity libraries on Illumina MiSeq

We are moving to the Illumina MiSeq locally for 16S sequencing. For about £750 we generate over 5 million reads per run. Using 2x150 base paired-end sequencing, we can design amplicons a little under 300 bases long, so that the two reads of each pair overlap and can be merged into a single, longer pseudo-read. The error model also compares favourably with 454: it does not suffer from frequent indel errors, so there is less need for expensive denoising steps such as PyroNoise.
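To make the overlapping idea concrete, here is a minimal Python sketch of merging one read pair into a pseudo-read. It is an illustration under stated assumptions, not the pipeline we use: the function names are hypothetical, reads are plain strings with lists of Phred scores, the amplicon is assumed to be longer than a single read but shorter than both reads combined (so there is no read-through into adapter), and disagreements in the overlap are resolved simply by keeping the higher-quality base call. Dedicated read-merging tools handle these cases far more carefully.

```python
# Hypothetical sketch: merge an overlapping Illumina read pair into one pseudo-read.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(r1, q1, r2, q2, min_overlap=20, max_mismatch_frac=0.1):
    """Merge forward read r1 and reverse read r2 (as sequenced) into one read.

    q1 and q2 are per-base Phred scores. Assumes the reverse complement of r2
    starts inside r1 and ends at or beyond r1's end (no adapter read-through).
    Returns (sequence, qualities), or None if no overlap of at least
    min_overlap bases passes the mismatch limit.
    """
    r2rc = reverse_complement(r2)
    q2rc = q2[::-1]
    # Slide r2rc along r1, trying the largest possible overlap first.
    for start in range(max(0, len(r1) - len(r2rc)), len(r1) - min_overlap + 1):
        overlap = len(r1) - start
        mismatches = sum(a != b for a, b in zip(r1[start:], r2rc[:overlap]))
        if mismatches > max_mismatch_frac * overlap:
            continue
        seq, qual = list(r1[:start]), list(q1[:start])
        for i in range(overlap):
            # In the overlap, keep whichever base call has the higher quality.
            if q1[start + i] >= q2rc[i]:
                seq.append(r1[start + i])
                qual.append(q1[start + i])
            else:
                seq.append(r2rc[i])
                qual.append(q2rc[i])
        # Append the part of the reverse read that extends beyond r1.
        seq.extend(r2rc[overlap:])
        qual.extend(q2rc[overlap:])
        return "".join(seq), qual
    return None
```

For a 270-base amplicon read as 2x150, the reads share 30 bases, and the merged pseudo-read spans the whole amplicon; a low-quality error in one read's overlap is outvoted by the other read's higher-quality call.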

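However, there is a fly in the ointment. Amplicon sequencing on the Illumina platform has traditionally been problematic for so-called "low diversity" libraries such as 16S amplicons, giving lower yields and lower per-base quality scores than more random libraries, e.g. those made from genomic DNA. Because the reads all begin with the same primer sequence, nearly every cluster shows the same base in each early cycle, whereas the instrument software relies on a balanced mix of bases in those cycles to locate clusters and calibrate its base calling.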
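One way to see what "low diversity" means is to measure the base composition at each sequencing cycle across a set of reads. The sketch below, with a hypothetical function name, uses Shannon entropy per cycle: 0 bits when every read has the same base, 2 bits for an even A/C/G/T mix. This is an illustrative metric chosen here, not something the MiSeq software reports.

```python
import math
from collections import Counter


def per_cycle_entropy(reads):
    """Shannon entropy (in bits) of the base composition at each cycle.

    0.0 means every read has the same base at that cycle; 2.0 is an even
    mix of A, C, G and T. Only cycles covered by every read are scored.
    """
    if not reads:
        raise ValueError("need at least one read")
    n_cycles = min(len(r) for r in reads)
    entropies = []
    for cycle in range(n_cycles):
        counts = Counter(read[cycle] for read in reads)
        total = sum(counts.values())
        # Sum of -p * log2(p) over the bases observed at this cycle.
        entropies.append(sum(-(c / total) * math.log2(c / total)
                             for c in counts.values()))
    return entropies
```

For amplicon reads that all begin with the same primer, every cycle covering the primer scores exactly 0 bits, whereas reads from randomly sheared genomic DNA stay close to the 2-bit maximum throughout.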

The good folks of SEQanswers have discussed this at length, and various work-arounds have been suggested. One common approach is to spike in a higher-diversity genomic library such as PhiX. The more PhiX spiked in, the better the results, but at the expense of the number of amplicon reads generated. A second option is to add a stretch of random (N) bases upstream of the 16S primer, so that the first cycles read diverse sequence. This, however, reduces the effective read length.
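The costs of the two work-arounds can be put into rough numbers. This sketch uses hypothetical helper names and simple arithmetic only, and assumes that PhiX takes up reads in proportion to its spike-in fraction, that the random padding is added in front of both primers (so each read of a pair loses the same number of usable bases), and that the amplicon length includes the primers.

```python
def reads_left_for_amplicons(total_reads, phix_fraction):
    """Reads remaining for amplicons when a fraction of clusters is PhiX."""
    if not 0 <= phix_fraction < 1:
        raise ValueError("phix_fraction must be in [0, 1)")
    return round(total_reads * (1 - phix_fraction))


def cost_per_million_reads(run_cost, reads):
    """Run cost divided by the number of useful reads, in millions."""
    return run_cost / (reads / 1_000_000)


def overlap_after_padding(read_length, amplicon_length, n_random):
    """Overlap (bases) between the two reads of a pair.

    Assumes each read starts with n_random padding bases before the primer,
    and amplicon_length includes both primers. A negative result means the
    reads no longer overlap and cannot be merged.
    """
    usable = read_length - n_random
    return 2 * usable - amplicon_length
```

Taking 5 million reads and £750 per run as round figures, a hypothetical 50% PhiX spike-in halves the amplicon yield to 2.5 million reads and doubles the cost from £150 to £300 per million amplicon reads. Likewise, an illustrative 280-base amplicon read with 2x150 reads overlaps by 20 bases, but only by 10 once five random bases are added in front of each primer.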
