- assembly of complex genomes (polyploid, containing excessive long repeat regions, etc.),
- accurate transcript assembly,
- metagenomics of complex communities,
- and phasing of long haplotype blocks.
The reason?
The latest data set released on BaseSpace.
Read length distribution of synthetic long reads for a D. melanogaster library
The data set, available as a single project in BaseSpace, can be accessed here.
image source: http://blog.basespace.illumina.com/2013/07/22/first-data-set-from-fasttrack-long-reads-early-access-service/
With the integration of Moleculo, they have managed to generate ~30 Gb of raw sequence data. They have refrained from talking about the 'key analysis metrics' that are available in the PDF report. Perhaps it's much easier to let the blogosphere and data scientists dissect the new data themselves.
I am wondering when the 454 versus Illumina long reads side-by-side comparison will pop up.
UPDATE:
I can't find the 'key analysis metrics' in the PDF report files. Perhaps they are still being uploaded? *shrugs* Please update me if you see them; otherwise I will just have to run something on the data myself.
These are the files that I have now:
total 512M
259M Jul 18 01:01 mol-32-2832.fastq.gz
44K Jul 24 2013 FastTrackLongReads_dmelanogaster_281c.pdf
149K Jul 24 2013 mol-32-281c-scaffolds.txt
44K Jul 24 2013 FastTrackLongReads_dmelanogaster_2832.pdf
151K Jul 24 2013 mol-32-2832-scaffolds.txt
253M Jul 24 2013 mol-32-281c.fastq.gz
md5sums:
6845fc3a4da9f93efc3a52f288e2d7a0 FastTrackLongReads_dmelanogaster_281c.pdf
02f5de4f7e15bbcd96ada6e78f659fdb FastTrackLongReads_dmelanogaster_2832.pdf
586599bb7fca3c20ba82a82921e8ba3f mol-32-281c-scaffolds.txt
b25010e9e5e13dc7befc43b5dff8c3d6 mol-32-281c.fastq.gz
6822cfbd3eb2a535a38a5022c1d3c336 mol-32-2832-scaffolds.txt
873f09080cdf59ed37b3676cddcbe26f mol-32-2832.fastq.gz
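If you grab the same files, it is worth checking them against the md5sums above before running anything. Here is a minimal sketch in Python, assuming the six files sit in the current directory under the names listed; the function names `md5_of_file` and `verify_downloads` are my own, not part of any Illumina or BaseSpace tool.

```python
import hashlib
from pathlib import Path

# Checksums copied from the md5sums listing above.
EXPECTED_MD5 = {
    "FastTrackLongReads_dmelanogaster_281c.pdf": "6845fc3a4da9f93efc3a52f288e2d7a0",
    "FastTrackLongReads_dmelanogaster_2832.pdf": "02f5de4f7e15bbcd96ada6e78f659fdb",
    "mol-32-281c-scaffolds.txt": "586599bb7fca3c20ba82a82921e8ba3f",
    "mol-32-281c.fastq.gz": "b25010e9e5e13dc7befc43b5dff8c3d6",
    "mol-32-2832-scaffolds.txt": "6822cfbd3eb2a535a38a5022c1d3c336",
    "mol-32-2832.fastq.gz": "873f09080cdf59ed37b3676cddcbe26f",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory=".", expected=EXPECTED_MD5):
    """Map each expected file name to 'ok', 'mismatch' or 'missing'."""
    results = {}
    for name, checksum in expected.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of_file(path) == checksum:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results


if __name__ == "__main__":
    for name, status in verify_downloads().items():
        print(f"{status:8s} {name}")
```

Reading in chunks keeps memory use flat for the ~250M fastq.gz files.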
I have run FastQC (v0.10.1) on both samples; the images below are from 281c.
You can download the full HTML reports here:
https://www.dropbox.com/sh/5unu3zba9u21ywj/JT4HdkzfOP/mol-32-281c_fastqc.zip
https://www.dropbox.com/s/mpxa5wx51iqmiz3/mol-32-2832_fastqc.zip
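FastQC gives a sequence length distribution plot, but for long reads I would also like the raw numbers. Below is a rough sketch for pulling read lengths and an N50 straight out of the fastq.gz files. It assumes plain four-line FASTQ records (no wrapped sequence lines), which I have not verified for these files, and the helper names `read_lengths`, `n50` and `summarize` are my own.

```python
import gzip
import os


def read_lengths(fastq_gz_path):
    """Collect sequence lengths, assuming strict 4-line FASTQ records."""
    lengths = []
    with gzip.open(fastq_gz_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:  # second line of each record is the sequence
                lengths.append(len(line.rstrip("\r\n")))
    return lengths


def n50(lengths):
    """Length L such that reads of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


def summarize(lengths):
    """Basic read-length statistics as a dictionary."""
    if not lengths:
        return {"reads": 0, "total_bases": 0, "min": 0, "max": 0, "mean": 0.0, "n50": 0}
    total = sum(lengths)
    return {
        "reads": len(lengths),
        "total_bases": total,
        "min": min(lengths),
        "max": max(lengths),
        "mean": total / len(lengths),
        "n50": n50(lengths),
    }


if __name__ == "__main__":
    # File names as in the listing above; skipped if not present locally.
    for name in ("mol-32-281c.fastq.gz", "mol-32-2832.fastq.gz"):
        if os.path.exists(name):
            print(name, summarize(read_lengths(name)))
```

This holds every length in memory, which should be fine for files of this size; a histogram counter would be the obvious swap for anything much bigger.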
Reading about the Moleculo sample prep method, it seems to be a rather ingenious way to stitch barcoded short reads together into a single long contig. If that is the case, I am not sure the base quality scores here are still meaningful, since each long read is really a mini-assembly. I presume this also removes any quantitative value from the number of reads, so accurate quantification of long RNA molecules or splice variants isn't possible. Nevertheless, it's an interesting development on the Illumina platform. Looking forward to seeing more news about it.
Other links
Illumina Long-Read Sequencing Service
Moleculo technology: synthetic long reads for genome phasing, de novo sequencing
CoreGenomics: Genome partitioning: my moleculo-esque idea
Moleculo and Haplotype Phasing - The Next Generation Technologist
Abstract: Production Of Long (1.5kb – 15.0kb), Accurate, DNA Sequencing Reads Using An Illumina HiSeq2000 To Support De Novo Assembly Of The Blue Catfish Genome (Plant and Animal Genome XXI Conference)
http://www.moleculo.com/ (no info on this page though)
Illumina Announces Phasing Analysis Service for Human Whole-Genome Sequencing - MarketWatch
Illumina Announces Moleculo Long Read Technology and Phasing As Service
First publication using Long Read Seq (LRseq): The genome sequence of the colonial chordate, Botryllus schlosseri | eLife. Contains a diagram explaining the LRseq protocol. This experiment yielded ~1000 6.3kb fragments.
Patent information on the Long Read technology:
https://docs.google.com/viewer?url=patentimages.storage.googleapis.com/pdfs/US20130079231.pdf