Showing posts with label SOLiD. Show all posts

Tuesday, 6 December 2011

Complete Khoisan and Bantu genomes from southern Africa : Article : Nature

http://www.nature.com/nature/journal/v463/n7283/full/nature08795.html

Just attended a very good lecture by Stephan Schuster, entitled "African Genomes: Charting Human Diversity"
He offered unbiased views and charts on the platform differences between 454, GAIIx, HiSeq and SOLiD for NGS sequencing coverage (which I think I should not repeat here). They point to the need to sequence on two different platforms to get a more accurate SNP list.
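The "two platforms" idea boils down to intersecting call sets. A minimal sketch in Python; the SNP tuples below are made up for illustration:

```python
# Sketch: intersect SNP calls from two platforms to reduce
# platform-specific artifacts. Each call is a (chrom, pos, alt) tuple.
solid_calls = {("chr1", 100, "A"), ("chr1", 250, "T"), ("chr2", 50, "G")}
hiseq_calls = {("chr1", 100, "A"), ("chr2", 50, "G"), ("chr3", 7, "C")}

# SNPs supported by both platforms form the high-confidence set
consensus = solid_calls & hiseq_calls

# Platform-private calls are candidates for manual review
solid_only = solid_calls - hiseq_calls

print(len(consensus))  # 2
```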

He also gave compelling reasons for getting 20x coverage of the human genome done on 454 to actually complete the human genome (457 gaps in hg19).

Yes, I often forget that the media / lay person thinks that the human genome is 'complete'. It's an often-ignored 'secret' that it actually isn't. Maybe the next marketing ploy (I mean strategy) for emerging sequencing platforms would be to be THE ONE that actually finishes the human genome.


Saturday, 4 June 2011

Posting of Ion Torrent protocols online is a violation of Terms and Conditions

http://seqanswers.com/forums/showthread.php?t=10400

Just got to know of this rather disturbing fact: seqanswers admins were (nicely) asked to take down Ion Torrent protocols that had been posted online.

I had wished to post the adaptor sequences for RNA multiplex libraries online as a help for bioinformaticians who might have gotten their data from a service provider, or who have problems getting a prompt response from the ever-friendly FAS. I mean, if it's online, I need not bother them, yeah?

Now I wonder if I might be violating terms and conditions somewhere out there.


I would argue for posting protocols online.
Lab protocols are meant to be optimised in every lab.
Case in point? You promote active discussion of the product, and once you have that, you have an active support community that beats a whole army of FASes with trained responses to problems in protocols.
See this imaginary conversation:

Researcher A: Making that incubation step longer by 10 secs improves your yield? Good for you! But it didn't work for me; any advice on where else I can tweak?
Researcher B: Yeah, sure. You see page 15, step 8A? Don't overdo that step as it affects yield. Be warned it might affect the quality of the final output, but let's solve one problem at a time. I tried that last week!

Friday, 4 March 2011

Guide/tutorial for the analysis of RNA-seq data

link in seqanswers

Excellent starting point for those confused about the RNA-seq data analysis procedure.

Hello,

I've written a guide to the analysis of RNA-seq data, for the purpose of differential expression analysis. It currently lives on our internal wiki that can't be viewed outside of our division, although printouts have been used at workshops. It is by no means perfect and very much a work in progress, but a number of people have found it helpful, so I thought it would be useful to have it somewhere more publicly accessible.

I've attached a pdf version of the guide, although really what I was hoping was that someone here could suggest somewhere where it could be publicly hosted as a wiki. This area is so multifaceted and fast-moving that the only way such a guide can remain useful is if it can be constantly extended and updated.

If anyone has any suggestions about potential hosting, they can contact me at myoung @wehi.edu.au

Cheers

Matt

Update: I've put a few extra things on our local wiki, and seeing as people here seem to be finding this useful, I thought I'd post an updated version. I'm also an author on a review paper on differential expression using RNA-seq, which people who find the guide useful might also find relevant...

RNA-seq Review

Saturday, 15 January 2011

DNAvision offers Human WGS for 7,500 euros

I had imagined exome sequencing would still have a good run for the next 2-3 years, but seeing how commercial service providers are throwing caution to the wind and offering whole genome sequencing at ever-decreasing costs, I think many will soon revert to WGS instead. Exome sequencing kits will have a lot of catching up to do in terms of price and useful data if they are to match quickly plummeting WGS prices.
DNAVision to Offer $10K Human Genome Sequencing Services; Purchases Four SOLiDs
Landed: First Illumina HiSeq Machines Advertised (By Nick Loman on February 10, 2010)

Tuesday, 16 November 2010

Uniqueome a uniquely ... omics word

Spotted this post on the Tree of Life blog

Another good paper, but bad omics word of the day: uniqueome

From "The Uniqueome: A mappability resource for short-tag sequencing",
Ryan Koehler, Hadar Issac, Nicole Cloonan, and Sean M. Grimmond. Bioinformatics (2010) doi: 10.1093/bioinformatics
 
Paper does look interesting though!
 
Summary: Quantification applications of short-tag sequencing data (such as CNVseq and RNAseq) depend on knowing the uniqueness of specific genomic regions at a given threshold of error. Here we present the “uniqueome”, a genomic resource for understanding the uniquely mappable proportion of genomic sequences. Pre-computed data is available for human, mouse, fly, and worm genomes in both color-space and nucleotide-space, and we demonstrate the utility of this resource as applied to the quantification of RNAseq data.
Availability: Files, scripts, and supplementary data is available from http://grimmond.imb.uq.edu.au/uniqueome/; the ISAS uniqueome aligner is freely available from http://www.imagenix.com/
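The idea of a mappability resource is simple to sketch. A toy Python version, assuming "uniquely mappable" means a k-mer occurs exactly once in the genome (the real uniqueome also handles mismatches and color space):

```python
# Toy "uniqueome": for a genome string, mark which k-mer start positions
# are uniquely mappable (i.e. that k-mer occurs exactly once).
from collections import Counter

def unique_positions(genome, k):
    kmers = [genome[i:i+k] for i in range(len(genome) - k + 1)]
    counts = Counter(kmers)
    return [counts[km] == 1 for km in kmers]

genome = "ACGTACGTTT"   # 'ACGT' occurs twice, so positions 0 and 4 are not unique
print(unique_positions(genome, 4))
```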

Exome Sequencing hints at sporadic mutations as cause for mental retardation.

1st spotted off Genomeweb
NEW YORK (GenomeWeb News) – De novo mutations that spring up in children but are absent in their parents are likely the culprit in many unexplained cases of mental retardation, according to a Dutch team.
Using exome sequencing in 10 parent-child trios, the researchers found nine non-synonymous mutations in children with mental retardation that were not found in their parents, including half a dozen mutations that appear to be pathogenic. The research, which appeared online yesterday in Nature Genetics, hints at an under-appreciated role for sporadic mutations in mental retardation — and underscores the notion that mental retardation can stem from changes in a wide variety of genes.

I think it's fascinating to find so many new mutations and changes in DNA that may affect one's quality of life, simply by sequencing the coding regions (and not even all of them, if I may add). This paper is fascinating as it raises the question of whether deleterious sporadic mutations are under-appreciated culprits for a whole variety of diseases that have a genetic risk.
It is certainly more likely that such an event will occur in coding regions, but I do not doubt that for some diseases the non-coding regions (that play a regulatory role) might have the same effect. If it were a clear-cut mutation that results in a dysfunctional protein, and there's no redundancy in the system, it is likely the system will crash; whereas if it were changes in expression levels, it might lead to a slightly wobbly system that just doesn't function as well.

While everyone else is waiting for whole genome sequencing to drop in price, there are groups already publishing with exome data. I think in 6 months' time we will see more WGS papers coming up... It's an exciting time for genomics science!

See the full paper below

A de novo paradigm for mental retardation Nature Genetics | Letter

 

Tuesday, 9 November 2010

SOLiD™ BioScope™ Software v1.3 releasing soon

v1.3 is due for release soon! How do I know, other than the fact that you can register for v1.3 video tutorials, e.g. SOLiD™ Targeted ReSeq Data Analysis featuring BioScope 1.3 (1 hour)?
The clue comes from new documentation being uploaded to solidsoftwaretools.com.


BioScope™ Software v1.3 adds/enhances support for the following:
  •     Targeted Resequencing analysis (enrichment statistics and target filtering)
  •     BFAST integration
  •     Annotation, reporting and statistics generation
  •     Methylation analysis
  •     75 bp read length support
  •     Mapping and Pairing speed improvements

It also fixes a long list of bugs; I won't repeat all of them here, but the important ones are:

  •     Bug – Pairing: In the BAM file, readPaired and firstOfPair/secondOfPair flags are set incorrectly for reads with missing mates.
  •     Bug – diBayes: Defunct Java processes remain when BioScope exits.
  •     Bug – Mapping: When the last batch of processing has fewer reads than the value of the key mapping.np.per.node, the ma file contains duplicated entries.

Have fun playing with the new version when it's up!
Here are some important notes:


  It is advised that a user runs BioScope using the user's own user
  account. Then if Control-C is used to interrupt bioscope.sh, which
  spawns many other processes, the user can use the following OS commands
  to find the PIDs of the left-over processes and clean them up.
  ps -efl | grep bioscope.sh | grep username
  ps -efl | grep java_app.sh | grep username
  ps -efl | grep map | grep username
  ps -efl | grep java | grep username
  ps -efl | grep mapreads | grep username
  ps -efl | grep pairing | grep username
  kill -9 PID


Oh, but I would use the fourth command (ps -efl | grep java) carefully, as it basically matches every process with 'java' in its name.

My suggestion to the team is to have a db table to keep the PIDs of launched processes instead of depending on non-unique process names. Ensembl's pipeline uses Perl with less overhead to track jobs, and it is much cleaner to clean up.
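A minimal sketch of that suggestion, using sqlite purely for illustration (the table and column names are invented, not anything BioScope actually has):

```python
# Sketch: record the PID of every spawned process in a small table, so
# cleanup can kill exactly those PIDs instead of grepping for
# non-unique process names like 'java'.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE jobs (pid INTEGER PRIMARY KEY, name TEXT)")

def register(pid, name):
    # call this whenever the pipeline forks a worker
    db.execute("INSERT INTO jobs VALUES (?, ?)", (pid, name))

def pids_to_kill():
    # on Control-C, kill exactly these instead of 'grep java'
    return [row[0] for row in db.execute("SELECT pid FROM jobs")]

register(1234, "mapreads")
register(1235, "pairing")
print(pids_to_kill())  # [1234, 1235]
```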

Monday, 8 November 2010

Trimming adaptor seq in colorspace (SOLiD)

Needed to do research on small RNA-seq using SOLiD.
Wasn't clear on the adaptor trimming procedure (it's dead easy with base-space fastq files, but SOLiD has directionality, and read lengths don't really matter for small RNA).

Novoalign suggests cutadapt as a colorspace adaptor trimming tool; I was going to script one in Python if it didn't exist.
Check their wiki page.

Sadly, on CentOS I most probably will get this:

If you get this error:
   File "./cutadapt", line 62
    print("# There are %7d sequences in this data set." % stats.n, file=outfile)
                                                                       ^
SyntaxError: invalid syntax
Then your Python is too old. At least Python 2.6 is needed for cutadapt.

Have to dig up how to have two versions of Python on a CentOS box...

Wednesday, 3 November 2010

Life Technologies Launches New SOLiD Sequencer to Drive Advances in Cancer Biology and Genetic Disease Research

It's official! The web is crawling with the news reports. Read their press release here 
My previous coverage on the preview launch is here
There's a discussion in the seqanswers forum on the new machine.

The Life Tech cmsXXX.pdfs with the useful specs are out too; you can Google them or search on the website.
The specs:
solid.appliedbiosystems.com/solid5500

Monday, 25 October 2010

AB releases 4 HQ and PI as 5500xl and 5500 SOLiD

Was lucky to be part of the 1st group to view the specs and info on the new SOLiD 4 hq.
For reasons unknown, they have renamed it: the 5500xl and 5500 SOLiD systems are your familiar 4 HQ and PI.
Or if you prefer formulas:
5500xl = 4 hq
5500 = PI

One can only fathom their obsession with these 4 digits judging by similarly named instruments, the
AB Sciex Triple Quad 5500 and the AB Sciex QTrap 5500.



Honestly the 5500 numbers are of no numerical significance AFAIK.


Looks-wise, both resemble the PI system.
I DO NOT see the computer cluster anymore; that's something I am curious about.

Finally, we are at 75 bp though.
Of notable importance, there is a new Exact Call Chemistry (ECC) module, which promises 99.99% accuracy; it is optional, as it increases the run time.
The new SOLiD system is co-developed with Hitachi High-Technologies.
Instead of the familiar slides, they use 'flowchips' now, with six individual lanes to allow for more mixing of samples of different reads.
For the 5500xl:
throughput per day is 20-30 Gb
per run you have 180 Gb or 2.8 B tags (paired ends or mate pairs)


Contrary to most rumours, the 5500xl is upgradeable from SOLiD 4, although I suspect it is a trade-in program. No mention of the 5500 (which I guess is basically a downgrade).


The specs should be up soon
solid.appliedbiosystems.com/solid5500 




Update from seqanswers from truthseqr 
http://seqanswers.com/forums/showthread.php?t=6761&goto=newpost

Here is the message that has just been posted:
***************
AB is premiering two new instruments at ASHG next week.

Mobile ASHG calendar: http://m.appliedbiosystems.com/ashg/ (http://solid.community.appliedbiosystems.com/)

Twitter account: @SOLiDSequencing (http://twitter.com/SOLiDSequencing)

SOLiD Community: http://solid.community.appliedbiosystems.com/

More info soon at: solid.appliedbiosystems.com/solid5500/ (http://solid.appliedbiosystems.com/solid5500)

Tuesday, 12 October 2010

SRMA: tool for Improved variant discovery through local re-alignment of short-read next-generation sequencing data

Have a look at this tool http://genomebiology.com/2010/11/10/R99/abstract
It is a realigner for NGS reads that doesn't use a lot of RAM. Not too sure how it compares to GATK's local realignment around indels, as that is not mentioned, but the authors used reads aligned with the popular BWA or BFAST as input. (Bowtie was left out though.)

Excerpted:

 SRMA was able to improve the ultimate variant calling using a variety of measures on the simulated data from two different popular aligners (BWA and BFAST). These aligners were selected based on their sensitivity to insertions and deletions (BFAST and BWA), since a property of SRMA is that it produces a better consensus around indel positions. The initial alignments from BFAST allow local SRMA re-alignment using the original color sequence and qualities to be assessed, as BFAST retains this color space information. This further reduces the bias towards calling the reference allele at SNP positions in ABI SOLiD data, and reduces the false discovery rate of new variants. Thus, local re-alignment is a powerful approach to improving genomic sequencing with next generation sequencing technologies.

 The alignments to the reference genome were implicitly split into 1Mb regions and processed in parallel on a large computer cluster; the re-alignments from each region were then merged in a hierarchical fashion. This allows for the utilization of multi-core computers, with one re-alignment per core, as well as parallelization across a computer cluster or a cloud. The average peak memory utilization per process was 876Mb (on a single core), with a maximum peak memory utilization of 1.25GB. On average, each 1Mb region required approximately 2.58 minutes to complete, requiring approximately 86.17 hours total running time for the whole U87MG genome. SRMA also supports re-alignment within user-specified regions for efficiency, so that only regions of interest need to be re-aligned. This is particularly useful for exome-sequencing or targeted re-sequencing data.
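The 1Mb-region split described in the excerpt is easy to sketch; the chromosome lengths below are made up for illustration:

```python
# Sketch: divide each chromosome into 1 Mb windows so re-alignment can
# run one window per core, then be merged afterwards.
def windows(chrom_lengths, size=1_000_000):
    out = []
    for chrom, length in chrom_lengths.items():
        for start in range(0, length, size):
            out.append((chrom, start, min(start + size, length)))
    return out

regions = windows({"chr1": 2_500_000, "chr2": 900_000})
print(len(regions))  # chr1 -> 3 windows, chr2 -> 1 window, so 4
```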

Monday, 11 October 2010

Installing SOLiD™ System de novo Accessory Tools 2.0 with Velvet and MUMmer

Howto install on CentOS 5.4:

 wget http://solidsoftwaretools.com/gf/project/denovo/ #just to keep a record
 wget http://www.ebi.ac.uk/~zerbino/velvet/velvet_0.7.55.tgz
 wget http://downloads.sourceforge.net/project/mummer/mummer/3.22/MUMmer3.22.tar.gz
 tar zxvf denovo2.tgz
 cp velvet_0.7.55.tgz denovo2 #you can use mv if you don’t mind downloading again
 cp MUMmer3.22.tar.gz denovo2
 cd denovo2
 tar zxvf velvet_0.7.55.tgz
 tar zxvf MUMmer3.22.tar.gz
 cd MUMmer3.22/src/kurtz #this was the part where I deviated from instructions
 gmake mummer #Might be redundant but running gmake at root dir gave no binary
 gmake |tee gmake-install.log
Next step:
download the example data to run through the pipeline
http://solidsoftwaretools.com/gf/project/ecoli50x50/
http://download.solidsoftwaretools.com/denovo/ecoli_600x_F3.csfasta.gz
http://download.solidsoftwaretools.com/denovo/ecoli_600x_F3.qual.gz
http://download.solidsoftwaretools.com/denovo/ecoli_600x_R3.csfasta.gz
http://download.solidsoftwaretools.com/denovo/ecoli_600x_R3.qual.gz

Description
This is a 50x50 mate-pair library from DH10B produced by the SOLiD™ system. The set includes .csfasta and .qual files for F3 and R3. The insert size of the library is 1300bp, and it is about 600x coverage of the DH10B genome. The results from the MP library in the de novo documents are generated from this dataset.



YMMV



pitfalls for SAET for de novo assembly

Spotted in manual for

SOLiD™ System de novo Accessory Tools 2.0


Usage of pre-assembly error correction: This is an optional tool which was
demonstrated to increase contig length in de novo assembly by a factor of 2 to 3. Do not use this tool if coverage is less than 20x. Over-correction and under-correction are equally bad for de novo assembly; therefore use a balanced number of local and global rounds of error correction. For example, the pipeline will use 1 global and 3 local rounds if reads are 25bp long, and 2 global and 5 local rounds if reads are 50bp long.


Is it just me? I would think it is trivial to implement the correction tool to correct only when the coverage is > 20x. Not sure why you would need human intervention.
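What I have in mind is something like this: a hedged sketch, with the round counts taken from the manual text above and the function name invented:

```python
# Sketch: gate SAET on coverage automatically instead of relying on the
# user to read the manual, and pick round counts by read length.
def saet_rounds(coverage, read_len):
    if coverage < 20:
        return None  # manual says: do not run SAET below 20x
    if read_len <= 25:
        return {"global": 1, "local": 3}   # 25 bp reads
    return {"global": 2, "local": 5}       # e.g. 50 bp reads

print(saet_rounds(15, 50))   # None
print(saet_rounds(600, 50))  # {'global': 2, 'local': 5}
```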

Tuesday, 21 September 2010

Whole transcriptome of a single cell using NGS

I think the holy grail of gene profiling has to be single-molecule sequencing from a single cell. Imagine the number of de novo transcriptomics projects that will spring up when that becomes an eventuality!
Posted by Alison Leon on Aug 12, 2010 2:00:42 PM
Are you struggling with conducting gene expression analysis from limited sample amounts? Or perhaps you're trying to keep up with new developments in stem cell research. If so, you might be interested in a recent Cell Stem Cell publication discussing single-cell RNA-Seq analysis (Tang et al., Cell Stem Cell, 2010). In their research, Tang and colleagues trace the conversion of mouse embryonic stem cells from the inner cell mass (ICM) to pluripotent embryonic stem cells (ESCs), revealing molecular changes in the process. This is a follow-on to two previous papers, in which the proof of concept (Tang et al., Nature Methods, 2009) and protocols (Tang et al., Nature Protocols, 2010) for these experiments were detailed.

Using the SOLiD™ System (16-plex, 50-base reads) and whole transcriptome software tools, researchers performed whole transcriptome analysis at the single-cell level (an unprecedented resolution!) to determine gene expression levels and identify novel splice junctions. 385 genes in 74 individual cells were monitored during their transition from ICM to ESC. Validation with TaqMan® assays corroborated this method’s high sensitivity and reproducibility.

According to Tang et al., this research could form the basis for future studies involving regulation and differentiation of stem cells in adults. In addition, further knowledge about developing stem cells could lead to information about how disease tissues, including cancers, develop.

Wednesday, 25 August 2010

howto do BWA mapping in colorspace

Here's what I use for bwa alignment (without removing PCR dups).
You can replace the paths with your own and put the commands into a bash script for automation.
Comments or corrections welcome!


#Visit kevin-gattaca.blogspot.com to see updates of this template!
#http://kevin-gattaca.blogspot.com/2010/08/howto-do-bwa-mapping-in-colorspace.html
#updated 16th Mar 2011
#Creates colorspace index
bwa index -a bwtsw -c hg18.fasta

#convert to fastq.gz
perl /opt/bwa-0.5.7/solid2fastq.pl Sample-input-prefix-name Sample

#aln using 4 threads
#-l 25        seed length
#-k 2         mismatches allowed in seed
#-n 10      total mismatches allowed

bwa aln -c -t 4 -l 25 -k 2 -n 10 /data/public/bwa-color-index/hg18.fasta Sample.single.fastq.gz > Sample.bwa.hg18.sai

#for bwa samse
bwa samse /data/public/bwa-color-index/hg18.fasta Sample.bwa.hg18.sai Sample.single.fastq.gz > Sample.bwa.hg18.sam

#creates bam file from pre-generated .fai file

samtools view -bt /data/public/hg18.fasta.fai -o Sample.bwa.hg18.sam.bam Sample.bwa.hg18.sam

#sorts bam file

samtools sort Sample.bwa.hg18.sam.bam Sample.bwa.hg18.sam.bam.sorted

#From a sorted BAM alignment, raw SNP and indel calls are acquired by:

samtools pileup -vcf /data/public/bwa-color-index/hg18.fasta Sample.bwa.hg18.sam.bam.sorted.bam > Sample.bwa.hg18.sam.bam.sorted.bam.raw.pileup

#resultant output should be further filtered by:

/opt/samtools/misc/samtools.pl varFilter Sample.bwa.hg18.sam.bam.sorted.bam.raw.pileup | awk '$6>=20' > Sample.bwa.hg18.sam.bam.sorted.bam.raw.pileup.final.pileup



#new section using mpileup and bcftools to generate vcf files
samtools mpileup -ugf hg18.fasta Sample.bwa.hg18.sam.bam.sorted.bam | bcftools view -bvcg - > var.raw.bcf
bcftools view var.raw.bcf | vcfutils.pl varFilter -D100 > var.flt.vcf


Do note the helpful comments below! Reposted here for clarity.

Different anon here. But try -n 3 and -e 10 and see how that works for you. Then filter out low quality alignments (MAPQ < 10) before you do any variant calling.




Also, depending on your task, you might consider disabling seeding altogether to get an even more sensitive alignment. -l 1000 should do that.


Also:


1) bwa is a global aligner with respect to reads, so consider trimming low-quality bases off the end of your reads with "bwa aln -q 10".


2) For user comprehension, it's easier if you replace "samtools view -bt /data/public/hg18.fasta.fai ..." with "samtools view -bT /data/public/hg18.fasta ..."


The T option handles reference files directly rather than having to deal with a .fai index file (which you haven't told people how to create in this guide).


3) Use "samtools view -F 4 -q 10" to get rid of unaligned reads (which are still in double-encoded color space) and dodgy alignments.


4) Use "samtools calmd" to correct MD and NM tags. (However, I'm not sure if this is necessary/helpful.)


5) Use Picard's SortSam and MarkDuplicates to take care of PCR duplicates.


6) View the alignments with samtools tview.
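The unaligned-read/MAPQ filter mentioned above is worth internalizing. A pure-Python sketch of what "samtools view -F 4 -q 10" keeps, operating on toy SAM lines (no header):

```python
# SAM columns: QNAME, FLAG, RNAME, POS, MAPQ, ... FLAG bit 0x4 means
# the read is unmapped; MAPQ below the threshold means a dodgy alignment.
def keep(sam_line, min_mapq=10):
    fields = sam_line.rstrip("\n").split("\t")
    flag, mapq = int(fields[1]), int(fields[4])
    return not (flag & 0x4) and mapq >= min_mapq

lines = [
    "r1\t0\tchr1\t100\t37\t50M\t*\t0\t0\tACGT\tIIII",       # good
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",               # unmapped
    "r3\t16\tchr2\t200\t5\t50M\t*\t0\t0\tACGT\tIIII",       # MAPQ 5
]
print([keep(l) for l in lines])  # [True, False, False]
```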

Monday, 16 August 2010

ANNOVAR: an easy way to automate the variant reduction procedure AKA exome sequencing or whole genome sequencing

The authors' description of the software is that
"ANNOVAR is an efficient software tool to utilize up-to-date information to functionally annotate genetic variants detected from diverse genomes."

Whilst I am not a good enough programmer to comment on how efficient the code is, in the spirit of 'why reinvent the wheel', ANNOVAR is definitely an efficient way to get started on exome / whole genome resequencing for variant discovery.

Basically, it is a collection of Perl scripts that can
1) take in variant information from popular tools like samtools pileup, Complete Genomics, GFF3-SOLiD (?), etc.
2) do annotation, that is

The other nice thing is that the download already comes with excellent examples, so you would be able to get going fast.
There are also annotation datasets available for download from the developers.
I only see hg18 for now though.

This page is a nice summary for beginners doing exome / whole genome resequencing.
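For a flavour of what a variant reduction funnel does, here is a hedged Python sketch; the variant records and field names are invented for illustration and are not ANNOVAR's actual format:

```python
# Sketch of a variant reduction funnel like the one ANNOVAR automates:
# successively discard variants that are non-exonic, synonymous, or
# already known in dbSNP, leaving novel candidates.
variants = [
    {"id": "v1", "region": "exonic", "effect": "nonsynonymous", "in_dbsnp": False},
    {"id": "v2", "region": "intronic", "effect": None, "in_dbsnp": False},
    {"id": "v3", "region": "exonic", "effect": "synonymous", "in_dbsnp": False},
    {"id": "v4", "region": "exonic", "effect": "nonsynonymous", "in_dbsnp": True},
]

survivors = [v for v in variants
             if v["region"] == "exonic"
             and v["effect"] == "nonsynonymous"
             and not v["in_dbsnp"]]
print([v["id"] for v in survivors])  # ['v1']
```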


note:
UPDATE (2010Aug11): Users have reported a bug (double-counting of splicing) in the auto_annovar.pl program in the 2010Aug06 version. An updated version is provided here.

Wednesday, 14 July 2010

Shiny new tool to index NGS reads G-SQZ

This is a long over due tool for those trying to do non-typical analysis with your reads.
Finally you can index and compress your NGS reads

http://www.ncbi.nlm.nih.gov/pubmed/20605925

Bioinformatics. 2010 Jul 6. [Epub ahead of print]
G-SQZ: Compact Encoding of Genomic Sequence and Quality Data.

Tembe W, Lowey J, Suh E.

Translational Genomics Research Institute, 445 N 5th Street, Phoenix, AZ 85004, USA.
Abstract

SUMMARY: Large volumes of data generated by high-throughput sequencing instruments present non-trivial challenges in data storage, content access, and transfer. We present G-SQZ, a Huffman coding-based sequencing-reads specific representation scheme that compresses data without altering the relative order. G-SQZ has achieved from 65% to 81% compression on benchmark datasets, and it allows selective access without scanning and decoding from start. This paper focuses on describing the underlying encoding scheme and its software implementation, and a more theoretical problem of optimal compression is out of scope. The immediate practical benefits include reduced infrastructure and informatics costs in managing and analyzing large sequencing data. AVAILABILITY: http://public.tgen.org/sqz Academic/non-profit: Source: available at no cost under a non-open-source license by requesting from the web-site; Binary: available for direct download at no cost. For-Profit: Submit request for for-profit license from the web-site. CONTACT: Waibhav Tembe (wtembe@tgen.org).

read the discussion thread in seqanswers for more tips and benchmarks

I am not affiliated with the author btw.
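For a feel of the core idea, Huffman coding assigns shorter bit codes to frequent symbols. A toy sketch; the real G-SQZ encodes (base, quality) pairs and adds a seekable index on top:

```python
# Toy Huffman coder over a sequence of symbols: repeatedly merge the two
# least frequent subtrees, prefixing '0'/'1' to the codes on each side.
import heapq
import itertools
from collections import Counter

def huffman_codes(symbols):
    tie = itertools.count()  # tie-breaker so the heap never compares dicts
    heap = [(n, next(tie), {s: ""}) for s, n in Counter(symbols).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)
        n2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (n1 + n2, next(tie), merged))
    return heap[0][2]

data = "AAAACCGT"
codes = huffman_codes(data)
encoded = "".join(codes[s] for s in data)
# the frequent 'A' gets a 1-bit code, so we beat 2 bits/symbol here
print(len(encoded) < 2 * len(data))  # True
```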

the nuts and bolts behind ABI's SAET

I really do not like to use tools when I have no idea what they are trying to do.
ABI's SOLiD™ Accuracy Enhancer Tool (SAET) is one example that had extremely brief documentation beyond what it promised to do:

  • The SOLiD™ Accuracy Enhancer Tool (SAET) uses raw data generated by SOLiD™ Analyzer to correct miscalls within reads prior to mapping or contig assembly.  
  • Use of SAET, on various datasets of whole or sub-genomes of < 200 Mbp in size and of varying complexities, readlengths, and sequence coverages, has demonstrated improvements in mapping, SNP calling, and de novo assembly results.
  • For denovo applications, the tool reduces miscall rate substantially

    Recently attended an ABI talk, and finally someone explained it with a nice diagram. It is akin to SoftGenetics' condensation tool (I made the link). Basically, it groups reads by similarity, and where it finds a mismatch that is not supported by high-quality reads, it corrects the low-quality read to reach a 'consensus'. I see it as a batch correction of sequencing errors which one can typically do by eye (for small regions). This correction isn't without its flaws; I now understand why such an error correction isn't implemented on the instrument and is instead presented as a user choice. My rough experience with this tool is that it increases mapping by ~10%; how this 10% would affect your results is debatable.
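As I understand the diagram, the correction is a per-column majority vote gated on quality. A toy sketch with invented reads, qualities, and thresholds (not SAET's actual parameters):

```python
# Sketch: stack similar reads; where a low-quality base disagrees with a
# well-supported column consensus, overwrite it with the consensus base.
from collections import Counter

def correct(reads, quals, min_support=3, low_q=10):
    reads = [list(r) for r in reads]
    for col in range(len(reads[0])):
        counts = Counter(r[col] for r in reads)
        base, support = counts.most_common(1)[0]
        for r, q in zip(reads, quals):
            if r[col] != base and support >= min_support and q[col] < low_q:
                r[col] = base  # low-quality minority call -> consensus
    return ["".join(r) for r in reads]

reads = ["ACGT", "ACGT", "ACGT", "ACTT"]
quals = [[30]*4, [30]*4, [30]*4, [30, 30, 5, 30]]
print(correct(reads, quals))  # last read corrected to 'ACGT'
```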

Friday, 11 June 2010

More Documentation for Bioscope v1.2 ?

Bioinformatics For Dummies
Jacob, whom I presume is a Bioscope developer, blogs about how he intends to write a book entitled "Bioscope for Dummies". The title is so ripe with double meanings that I think they should work it into a campaign slogan!
Jokes for those in the know aside, Bioscope is sorely lacking in documentation. While the idea has been to make it single click = analysed data, I feel that there are a lot of scripts and binaries within the software framework that can be used on their own or in combination with other tools. For example, its mapper (named mapreads, I believe) might be a good standalone colorspace mapping tool that one might use instead of BWA or BFAST or Bowtie if one were so inclined. But actually I am more interested in using the downstream SNP reporting tools after I have gotten my BAM file from another aligner. Sadly, I only just realised I might not be able to use Bioscope that way.
Limiting who gets the software, and giving those who do get it no documentation to use it the way they want, just creates frustration for users trying to use SOLiD for its accuracy. Not a bright thing to do when you are jostling for 2nd position in the game.

Sunday, 30 May 2010

Cofactor genomics on the different NGS platforms

Original post here

They are a commercial company that offers NGS on ABI and Illumina platforms, and since this is on their company page, I guess it's their official stand on what rocks on each platform.

Excerpted.

Applied Biosystems SOLiD 3

The Applied Biosystems SOLiD 3 has the shortest but also the highest quantity of reads. The SOLiD produces up to 240 million 50bp reads per slide per end. As with the Illumina, Mate-Pairs produce double the output by duplicating the read length on each end, and the SOLiD supports a variety of insert lengths like the 454. The SOLiD can also run 2 slides at once to again double the output. SOLiD has the lowest *raw* base qualities but the highest processed base qualities when using a reference due to its 2-base encoding. Because of the number of reads and more advanced library types, we recommend the SOLiD for all RNA and bisulfite sequencing projects.

Solexa/Illumina

The Solexa/Illumina generates shorter reads at 36-75bp but produces up to 160 million reads per run.  All reads are of similar length.  The Illumina has the highest *raw* quality scores and its errors are mostly base substitutions. Paired-end reads with ~200 bp inserts are possible with high efficiency and double the output of the machine by duplicating the read length on each end. Paired-end Illumina reads are suitable for de novo assemblies, especially in combination with 454. The large number of reads makes the Illumina appropriate for de novo transcriptome studies with simultaneous discovery and quantification of RNAs at qRT-PCR accuracy.

Roche/454 FLX

The Roche/454 FLX with Titanium chemistry generates the longest reads (350-500bp) and the most contiguous assemblies, can phase SNPs or other features into blocks, and has the shortest run times. However, 454 also produces the fewest total reads (~1 million) at the highest cost per base. Read lengths are variable. Errors occur mostly at the ends of long same-nucleotide stretches. Libraries can be constructed with many insert sizes (8kb - 20kb) but at half of the read length for each end and with low efficiency.
