Sunday, 6 May 2012

@pathogenomics First look at Ion Torrent data: De novo assembly

Check out Nick's post, in which he runs an array of de novo assemblers on Ion Torrent PGM data.

So, the results themselves. They aren't great if you compare them with a 454 Titanium run or an Illumina paired-end run, but they are pretty good if you consider that this is the first dataset available from the Ion Torrent, run on the lowest-specification chip. In pure sequencing costs, it's a $500 bacterial whole genome, and the sequencing itself takes only 2 hours. So it's quite exciting.

And we must remember much of the software is not yet properly optimised for these data.

The assemblies are certainly good enough to predict and detect most of the genes in K-12. What I have not assessed yet is assembly quality, so the assemblers with the longest N50s may or may not be the best choice if you need accuracy (and you probably do).
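For readers unfamiliar with N50, here is a minimal sketch of how the statistic is usually computed from a list of contig lengths: sort the contigs from longest to shortest and report the length at which the running total first reaches half of the total assembled bases. The function name `n50` and the example lengths are illustrative assumptions and do not come from the assemblies discussed here.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly, given its contig lengths in bases.

    Illustrative helper (not taken from any assembler): N50 is the length
    of the contig at which the cumulative length, counted from the longest
    contig downwards, first reaches half of the total assembly length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig length")

    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Made-up contig lengths, purely for illustration.
example = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(example))
```

Note that N50 only summarises contiguity. A misassembled contig can raise it, which is why a long N50 on its own says nothing about accuracy.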

I intend to take a close look at the extent of misassemblies, erroneous consensus base calls and issues with homopolymeric tracts in a future blog post.

So for now, the take-home messages are:

  • bacterial de novo assembly is possible with Ion Torrent data, even with data from a single 314 chip
  • CLC Genomics Workbench is the fastest assembler and produces some of the longest contigs with default settings (but costs $5000 for a license)
  • Newbler also produces long contigs but takes days to run for some reason
  • MIRA and Ray are both promising open-source options

