Wednesday 24 July 2013

Illumina produces 3k of 8500 bp reads on HiSeq using Moleculo Technology

In Jan 2013, Keith blogged about how super-long-read sequencing methods would be a threat to Illumina. Today, Illumina can openly acknowledge the shortcomings of its short reads for applications like
  • assembly of complex genomes (polyploid, containing excessive long repeat regions, etc.), 
  • accurate transcript assembly, 
  • metagenomics of complex communities, 
  • and phasing of long haplotype blocks.


The reason?
This latest set of data released on BaseSpace:
Read length distribution of synthetic long reads for a D. melanogaster library
The data set, available as a single project in BaseSpace, can be accessed here.

image source: http://blog.basespace.illumina.com/2013/07/22/first-data-set-from-fasttrack-long-reads-early-access-service/

With the integration of Moleculo, they have managed to generate ~30 Gb of raw sequence data. They have refrained from talking about the 'key analysis metrics' that are available in the pdf report. Perhaps it's much easier to let the blogosphere and data scientists dissect the new data themselves.

I am wondering when the 454 versus Illumina Long Reads side-by-side comparison will pop up.

UPDATE:

I can't find the 'key analysis metrics' in the pdf report files. Perhaps they are still being uploaded? *shrugs*
So please update me if you see them; otherwise I will just have to run something on the data myself.


These are the files that I have now

total 512M
 259M Jul 18 01:01 mol-32-2832.fastq.gz
  44K Jul 24  2013 FastTrackLongReads_dmelanogaster_281c.pdf
 149K Jul 24  2013 mol-32-281c-scaffolds.txt
  44K Jul 24  2013 FastTrackLongReads_dmelanogaster_2832.pdf
 151K Jul 24  2013 mol-32-2832-scaffolds.txt
 253M Jul 24  2013 mol-32-281c.fastq.gz
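
Since the key metrics are not in the reports, here is a minimal sketch of how one could summarise the read lengths directly from one of the fastq.gz files. It assumes plain 4-line FASTQ records (no wrapped sequence lines), which I have not checked for these files. The function names and the `min_length` cut-off are my own, purely for illustration.

```python
import gzip


def read_lengths(fastq_path):
    """Yield the sequence length of each record in a FASTQ file.

    Assumes strict 4-line records (header, sequence, '+', qualities).
    Files ending in .gz are opened with gzip, anything else as plain text.
    """
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    with opener(fastq_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:  # second line of each record is the sequence
                yield len(line.rstrip("\r\n"))


def summarise(lengths, min_length):
    """Return basic length statistics for an iterable of read lengths."""
    lengths = sorted(lengths, reverse=True)
    total = sum(lengths)

    # N50: length of the read at which the running sum of the
    # longest-first lengths reaches half of all bases.
    n50 = 0
    running = 0
    for length in lengths:
        running += length
        if running * 2 >= total:
            n50 = length
            break

    return {
        "reads": len(lengths),
        "bases": total,
        "longest": lengths[0] if lengths else 0,
        "n50": n50,
        "reads_at_least_min_length": sum(1 for n in lengths if n >= min_length),
    }
```

For example, `summarise(read_lengths("mol-32-281c.fastq.gz"), min_length=8500)` would report how many reads are at least 8500 bp long, together with the total bases and N50. The whole list of lengths is held in memory, which should be fine for files of this size.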

md5sums
6845fc3a4da9f93efc3a52f288e2d7a0  FastTrackLongReads_dmelanogaster_281c.pdf
02f5de4f7e15bbcd96ada6e78f659fdb  FastTrackLongReads_dmelanogaster_2832.pdf
586599bb7fca3c20ba82a82921e8ba3f  mol-32-281c-scaffolds.txt
b25010e9e5e13dc7befc43b5dff8c3d6  mol-32-281c.fastq.gz
6822cfbd3eb2a535a38a5022c1d3c336  mol-32-2832-scaffolds.txt
873f09080cdf59ed37b3676cddcbe26f  mol-32-2832.fastq.gz
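
If you download the same files, a quick way to compare them against the md5sums above is something like the sketch below (running `md5sum -c` on a checksum file does the same job on Linux). The checksums are copied from the list above; the function names and the "ok"/"MISMATCH"/"missing" labels are just my own choices.

```python
import hashlib
import os

# Checksums copied from the md5sums listing in this post.
EXPECTED_MD5 = {
    "FastTrackLongReads_dmelanogaster_281c.pdf": "6845fc3a4da9f93efc3a52f288e2d7a0",
    "FastTrackLongReads_dmelanogaster_2832.pdf": "02f5de4f7e15bbcd96ada6e78f659fdb",
    "mol-32-281c-scaffolds.txt": "586599bb7fca3c20ba82a82921e8ba3f",
    "mol-32-281c.fastq.gz": "b25010e9e5e13dc7befc43b5dff8c3d6",
    "mol-32-2832-scaffolds.txt": "6822cfbd3eb2a535a38a5022c1d3c336",
    "mol-32-2832.fastq.gz": "873f09080cdf59ed37b3676cddcbe26f",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory, expected=EXPECTED_MD5):
    """Map each expected file name to 'ok', 'MISMATCH' or 'missing'."""
    results = {}
    for name, wanted in expected.items():
        path = os.path.join(directory, name)
        if not os.path.exists(path):
            results[name] = "missing"
        elif md5_of_file(path) == wanted:
            results[name] = "ok"
        else:
            results[name] = "MISMATCH"
    return results
```

Calling `verify(".")` in the download directory returns a dict you can print to see which files match.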


I have run FastQC (v0.10.1) on both samples; the images below are from 281c.
You can download the full HTML reports here:
https://www.dropbox.com/sh/5unu3zba9u21ywj/JT4HdkzfOP/mol-32-281c_fastqc.zip
https://www.dropbox.com/s/mpxa5wx51iqmiz3/mol-32-2832_fastqc.zip

Reading about the Moleculo sample prep method, it seems like a rather ingenious way to stitch barcoded short reads into a single long contig. If that is the case, then I am not sure the base quality scores here are still meaningful, since each read is really a mini-assembly. I presume this also removes any quantitative value from the number of reads, so accurate quantification of long RNA molecules or splice variants isn't possible. Nevertheless, it's an interesting development on the Illumina platform. Looking forward to seeing more news about it.













Other links

Illumina Long-Read Sequencing Service
Moleculo technology: synthetic long reads for genome phasing, de novo sequencing
CoreGenomics: Genome partitioning: my moleculo-esque idea
Moleculo and Haplotype Phasing - The Next Generation Technologist
Abstract: Production Of Long (1.5kb – 15.0kb), Accurate, DNA Sequencing Reads Using An Illumina HiSeq2000 To Support De Novo Assembly Of The Blue Catfish Genome (Plant and Animal Genome XXI Conference)
http://www.moleculo.com/ (no info on this page though)
Illumina Announces Phasing Analysis Service for Human Whole-Genome Sequencing - MarketWatch
Patent information on the Long Read technology
https://docs.google.com/viewer?url=patentimages.storage.googleapis.com/pdfs/US20130079231.pdf









Friday 5 July 2013

Windows 8.1 Preview ISOs are available for download

Ah! I didn't know there's Windows 8.1
The ISOs should be helpful if you wish to 'futureproof' your spanking new application for the latest Windows, or to test existing apps to see if they might break in the new Windows 8.1.

and *cough*usingtheisosasVMsinyourpreferredLinuxenvbutyoukindaneedawindozemachinetodothosetasksthatyoucan'tdoinlinuxcosotherprogrammershaven'theardofbuildingformultiplatformmachines*cough*

Well, another good reason to use it is that I am pretty sure this ain't happening on Mac or Linux:

Microsoft is adding native support for 3D printing as part of the Windows 8.1 update, making it possible to print directly from an app to a 3D printer. The company is announcing the new feature this morning, working with partners including MakerBot Industries, 3D Systems, Afinia, AutoDesk, Netfabb and others.
http://www.geekwire.com/2013/dimension-windows-microsoft-adds-3d-printing-support/


:)

Go http://msdn.microsoft.com/en-us/windows/apps/bg182409 now!
loving the 1.5 Mb/s download here

Wednesday 3 July 2013

The $1000 myth | opiniomics

*chuckles* I guess only people outside of sequencing need to be educated on this ...


Now, it's possible Broad, BGI, Sanger etc can get below $1000 for the reagents due to sheer economies of scale and special deals they have with sequencing companies – but then remember they have to add in those extra charges (2-5) above.

Obviously, Illumina don't charge themselves list price for reagents, and nor do LifeTech, so it's possible that they themselves can sequence 30x human genomes and just pay whatever it costs to make the reagents and build the machines; but this is not reality and it's not really how sequencing is done today.  These guys want to sell machines and reagents, they don't want to be sequencing facilities, plus they still have to pay the staff, pay the bills, make a profit and return money to investors.

http://biomickwatson.wordpress.com/2013/06/18/the-1000-myth/

Tuesday 2 July 2013

A cloud platform for virtual screening?

Chanced on a blog post inspired by https://www.quantconnect.com/ . Overington suggests the possibility of adopting the model for virtual screening informatics.

For our field, I think an SV (structural variant) detection platform would be most 'lucrative'. I would argue that SVs contribute even more to human diversity and disease than the relatively-easier-to-characterize SNPs. I think that workflows/pipelines might be able to provide a basic framework for people to build on when trying to characterize SVs across the multiple sources of publicly available sequence data. The problem is who will host the data and the platform.


The ChEMBL-og - Open Data For Drug Discovery: A cloud platform for virtual screening?
Anyway, have a look, it looks really nice, and got me thinking that it would be a great model for a virtual screening/drug repositioning informatics biotech, where the platform 'brokers' a tournament of approaches coded against data held on a data/compute cloud

Datanami, Woe be me