Monday, 12 December 2011

How much coverage / throughput for my RNA-seq?

One of the earliest questions to bug anyone planning an RNA-seq experiment has to be the throughput (how many reads do I need?)

If you are dealing with human samples, you have the benefit of extensive publications with example coverages and some papers that test the limits of detection. All of this info is nicely summarised here in experimental design considerations in RNA-Seq.

Bashir et al. have  concluded that more than 90% of the transcripts in human samples are adequately covered with just one million  sequence reads.  Wang et al. showed that 8 million reads are sufficient to reach RNA-Seq saturation for most  samples

The ENCODE consortium also has published a Guidelines for Experiments within you can read RNA Standards v1.0 (May 2011) and also RNA-seq Best Practices (2009)

Experiments whose purpose is to evaluate the similarity between the
transcriptional profiles of two polyA+ samples may require only modest depths of
sequencing (e.g. 30M pair-end reads of length > 30NT, of which 20-25M are
mappable to the genome or known transcriptome, Experiments whose purpose is
discovery of novel transcribed elements and strong quantification of known
transcript isoforms requires more extensive sequencing. The ability to detect
reliably low copy number transcripts/isoforms depends upon the depth of
sequencing and on a sufficiently complex library.

RNA-seq blog also covers this issue in How Many Reads are Enough? Where they cited an article on RNA-seq in chicken lungs

The analysis from the current study demonstrated that 30 M (75 bp) reads is sufficient to detect all annotated genes in chicken lungs. Ten million (75 bp) reads could detect about 80% of annotated chicken genes.

There are also papers that showed that RNA-seq gives reproducible results when sequenced from the same RNA-seq library which means that if coverage isn't enough, it is possible to sequence more using the same library and not have it affect your results. The real issue then becomes whether  you have planned for additional sequencing with your budget.

Au, K.F., Jiang, H., Lin, L., Xing, Y. & Wong, W.H. Detection of splice junctions from paired-end RNA-seq data by  SpliceMap. Nucleic acids research 38, 4570-4578 (2010).

Maher, C.A., Palanisamy, N., Brenner, J.C., Cao, X., Kalyana-Sundaram, S., Luo, S., Khrebtukova, I., Barrette, T.R.,  Grasso, C., Yu, J., Lonigro, R.J., Schroth, G., Kumar-Sinha, C. & Chinnaiyan, A.M. Chimeric transcript discovery by  paired-end transcriptome sequencing. Proceedings of the National Academy of Sciences of the United States of America   106, 12353-12358 (2009).

Bashir, A., Bansal, V. & Bafna, V. Designing deep sequencing experiments: detecting structural variation and estimating  transcript abundance. BMC genomics 11, 385 (2010).

Wang, Z., Gerstein, M. & Snyder, M. RNA-Seq: a revolutionary tool for transcriptomics. Nature reviews 10, 57-63 (2009).

Wang Y, Ghaffari N, Johnson CD, Braga-Neto UM, Wang H, Chen R, Zhou H. (2011) Evaluation of the coverage and depth of transcriptome by RNA-Seq in chickens. BMC Bioinformatics Proceedings of the Eighth Annual MCBIOS Conference. Computational Biology and Bioinformatics for a New Decade, College Station, TX, USA. 1-2 April 2011. [article

No comments:

Post a Comment

Datanami, Woe be me