
Monday, 12 December 2011

How much coverage / throughput for my RNA-seq?

One of the earliest questions to bug anyone planning an RNA-seq experiment has to be throughput: how many reads do I need?

If you are dealing with human samples, you have the benefit of extensive publications with example coverages, as well as some papers that test the limits of detection. All of this info is nicely summarised in "Experimental design considerations in RNA-Seq".

Bashir et al. concluded that more than 90% of the transcripts in human samples are adequately covered with just one million sequence reads. Wang et al. showed that 8 million reads are sufficient to reach RNA-Seq saturation for most samples.
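Both figures depend on how abundant a transcript is. A rough way to see why is a simple sampling model: if reads are drawn uniformly at random, the number landing on one transcript is roughly Poisson with mean equal to total reads times that transcript's share of the library. The sketch below assumes exactly that; the function names, the 1-in-a-million transcript and the 5-read detection cutoff are hypothetical, and this is not the model used by Bashir et al. or Wang et al.

```python
import math


def expected_reads(total_reads: int, fraction_of_reads: float) -> float:
    """Expected reads on a transcript that makes up `fraction_of_reads`
    of the library, assuming uniform random sampling of reads."""
    return total_reads * fraction_of_reads


def prob_detected(total_reads: int, fraction_of_reads: float, min_reads: int = 1) -> float:
    """Poisson approximation of P(at least `min_reads` reads hit the transcript)."""
    lam = expected_reads(total_reads, fraction_of_reads)
    # P(X >= k) = 1 - sum_{i < k} e^-lam * lam^i / i!
    below = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(min_reads))
    return 1.0 - below


# Hypothetical rare transcript: 1 in a million reads, "detected" at >= 5 reads
for depth in (1_000_000, 8_000_000, 30_000_000):
    print(depth, round(prob_detected(depth, 1e-6, min_reads=5), 3))
```

Under this model a transcript at that abundance is almost never seen 5 times with one million reads but usually is with eight million, which is why "enough reads" always depends on how rare the transcripts you care about are.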

The ENCODE consortium has also published Guidelines for Experiments, where you can read the RNA Standards v1.0 (May 2011) and the RNA-seq Best Practices (2009):

Experiments whose purpose is to evaluate the similarity between the transcriptional profiles of two polyA+ samples may require only modest depths of sequencing (e.g. 30M pair-end reads of length > 30NT, of which 20-25M are mappable to the genome or known transcriptome). Experiments whose purpose is discovery of novel transcribed elements and strong quantification of known transcript isoforms requires more extensive sequencing. The ability to detect reliably low copy number transcripts/isoforms depends upon the depth of sequencing and on a sufficiently complex library.
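The ENCODE example implies that only about two thirds to five sixths of the sequenced pairs end up mappable, so the number you order has to be inflated accordingly. A small sketch of that arithmetic; the function name and the 80% mapping rate in the usage line are assumptions for illustration, not ENCODE recommendations.

```python
import math


def raw_reads_needed(target_mappable: int, mapping_rate: float) -> int:
    """Raw read pairs to order so that about `target_mappable` pairs map,
    given an assumed fraction of pairs that align (hypothetical helper)."""
    if not 0 < mapping_rate <= 1:
        raise ValueError("mapping_rate must be in (0, 1]")
    return math.ceil(target_mappable / mapping_rate)


# Mapping rate implied by the ENCODE example: 20-25M mappable out of 30M pairs
low_rate, high_rate = 20e6 / 30e6, 25e6 / 30e6
print(f"implied mapping rate: {low_rate:.0%}-{high_rate:.0%}")

# Raw pairs to order for 20M mappable pairs if ~80% of pairs map (assumed rate)
print(raw_reads_needed(20_000_000, 0.8))
```

Your own mapping rate depends on the organism, annotation and library quality, so it is worth measuring it on a pilot run rather than assuming it.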

RNA-Seq Blog also covers this issue in "How Many Reads are Enough?", where they cite an article on RNA-seq in chicken lungs:

The analysis from the current study demonstrated that 30 M (75 bp) reads is sufficient to detect all annotated genes in chicken lungs. Ten million (75 bp) reads could detect about 80% of annotated chicken genes.
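Statements like "ten million reads detect about 80% of genes" are usually obtained by subsampling a deep run and counting how many genes are still detected at each depth. The sketch below shows that general idea on simulated data only; the 2,000-gene log-normal library, the seed and the function name are hypothetical, and the denominator here is genes seen in the full data set rather than all annotated genes, so it does not reproduce the chicken study's method.

```python
import random
from collections import Counter


def saturation_curve(read_genes, depths, min_reads=1):
    """For each depth, the fraction of genes seen in the full read set that
    still reach `min_reads` reads when only the first `depth` reads are kept.
    `read_genes` must already be in random order (one gene ID per mapped read),
    so each prefix behaves like a random subsample."""
    all_genes = set(read_genes)
    curve = []
    for depth in depths:
        counts = Counter(read_genes[:depth])
        detected = sum(1 for c in counts.values() if c >= min_reads)
        curve.append((depth, detected / len(all_genes)))
    return curve


# Hypothetical library: 2,000 genes with log-normally distributed expression
rng = random.Random(42)
genes = [f"gene{i}" for i in range(2000)]
weights = [rng.lognormvariate(0, 2) for _ in genes]
reads = rng.choices(genes, weights=weights, k=200_000)  # drawn in random order

for depth, frac in saturation_curve(reads, [10_000, 50_000, 100_000, 200_000]):
    print(f"{depth:>7} reads: {frac:.1%} of detected genes")
```

The curve flattens as depth grows; where it flattens for your tissue is what tells you whether extra reads would still buy new genes.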

There are also papers showing that RNA-seq gives reproducible results when the same library is sequenced again. This means that if coverage isn't enough, you can sequence more from the same library without affecting your results. The real issue then becomes whether your budget allows for additional sequencing.
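If a top-up run of the same library behaves like a further random draw from the same pool of molecules, per-gene counts from the two runs can simply be added. A minimal sketch under that assumption; the gene names and counts are hypothetical, and in practice it is worth checking the runs against each other for lane or batch effects before pooling.

```python
from collections import Counter


def pool_runs(*runs: dict) -> Counter:
    """Sum per-gene read counts from several sequencing runs of the SAME library.
    Assumes each run is an independent random draw from that library."""
    pooled = Counter()
    for run in runs:
        pooled.update(run)  # Counter.update adds counts from a mapping
    return pooled


run1 = {"geneA": 12, "geneB": 0, "geneC": 3}  # hypothetical counts, first lane
run2 = {"geneA": 9, "geneC": 4, "geneD": 1}   # hypothetical counts, top-up lane
print(pool_runs(run1, run2))
```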

References
Au, K.F., Jiang, H., Lin, L., Xing, Y. & Wong, W.H. Detection of splice junctions from paired-end RNA-seq data by SpliceMap. Nucleic Acids Research 38, 4570-4578 (2010).

Maher, C.A., Palanisamy, N., Brenner, J.C., Cao, X., Kalyana-Sundaram, S., Luo, S., Khrebtukova, I., Barrette, T.R., Grasso, C., Yu, J., Lonigro, R.J., Schroth, G., Kumar-Sinha, C. & Chinnaiyan, A.M. Chimeric transcript discovery by paired-end transcriptome sequencing. Proceedings of the National Academy of Sciences of the United States of America 106, 12353-12358 (2009).

Bashir, A., Bansal, V. & Bafna, V. Designing deep sequencing experiments: detecting structural variation and estimating transcript abundance. BMC Genomics 11, 385 (2010).

Wang, Z., Gerstein, M. & Snyder, M. RNA-Seq: a revolutionary tool for transcriptomics. Nature Reviews Genetics 10, 57-63 (2009).

Wang, Y., Ghaffari, N., Johnson, C.D., Braga-Neto, U.M., Wang, H., Chen, R. & Zhou, H. Evaluation of the coverage and depth of transcriptome by RNA-Seq in chickens. BMC Bioinformatics, Proceedings of the Eighth Annual MCBIOS Conference: Computational Biology and Bioinformatics for a New Decade, College Station, TX, USA, 1-2 April 2011 (2011).

Friday, 12 August 2011

Are 12 million 90 bp transcriptome reads enough for transcriptome assembly?

I posted a PubMed link recently in which the authors "report the use of next-generation massively parallel sequencing technologies and de novo transcriptome assembly to gain a comprehensive overview of the H. brasiliensis transcriptome. The sequencing output generated more than 12 million reads with an average length of 90 nt. In total 48,768 unigenes (mean size = 436 bp, median size = 328 bp) were assembled through de novo transcriptome assembly."

Do you think such an assembly is truly useful for research, or would higher coverage have been better?
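A back-of-the-envelope check using only the numbers in the abstract: total sequenced bases divided by total unigene length gives an average fold coverage. This assumes every read maps back onto the unigenes, which will not be true, and an average hides how unevenly reads spread across transcripts (highly expressed genes soak up most of them), so treat it as an upper-bound sketch rather than a real coverage estimate.

```python
reads = 12_000_000       # "more than 12 million reads" (from the abstract)
read_len = 90            # average read length, nt
unigenes = 48_768        # assembled unigenes
mean_unigene_len = 436   # mean unigene size, bp

sequenced_bases = reads * read_len             # total bases sequenced
assembly_size = unigenes * mean_unigene_len    # total length of all unigenes
fold = sequenced_bases / assembly_size         # average fold coverage, if all reads map

print(f"{sequenced_bases / 1e9:.2f} Gb over {assembly_size / 1e6:.1f} Mb -> ~{fold:.0f}x")
```

Roughly 1.08 Gb over about 21.3 Mb of unigenes works out to around 51x on average, which sounds generous until you remember that the rare transcripts get a small fraction of it.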

Tuesday, 12 July 2011

Pros and Cons of RNA-seq as spelt out by RNA-seq Blog

The Magic of RNA-Seq

from RNA-Seq Blog

Monday, 6 June 2011

Technical variability is too high to ignore | RNA-Seq Blog

OK, this is disturbing.
I will research it more when I have time.
Technical variability is too high to ignore. Technical variability results in inconsistent detection of exons at low levels of coverage. Further, the estimate of the relative abundance of a transcript can substantially disagree, even when coverage levels are high. This may be due to the low sampling fraction and if so, it will persist as an issue needing to be addressed in experimental design even as the next wave of technology produces larger numbers of reads. Practical recommendations for dealing with the technical variability, without dramatic cost increases, are provided. McIntyre, L.M. et al. (2011) RNA-seq: technical variability and sampling. BMC Genomics [Epub ahead of print].
http://rna-seqblog.com/publications/technical-variability-is-too-high-to-ignore/
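To get a feel for how much two technical replicates disagree from counting noise alone, you can simulate both as independent Poisson draws with the same mean and compare them. This sketch models only that sampling component; it is a baseline for comparing real replicates, not a model of the effects McIntyre et al. describe. The Knuth-style Poisson sampler, the pseudocount of 1, the gene count and the seed are all assumptions chosen for illustration.

```python
import math
import random


def poisson(rng: random.Random, lam: float) -> int:
    """Knuth's multiplication method for a Poisson draw; adequate for the
    modest means used here (it gets slow and underflows for very large lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def mean_abs_log2_ratio(lam: float, n_genes: int = 2000, seed: int = 1) -> float:
    """Average |log2((a + 1) / (b + 1))| between two simulated technical
    replicates whose counts are independent Poisson draws with mean `lam`."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_genes):
        a, b = poisson(rng, lam), poisson(rng, lam)
        total += abs(math.log2((a + 1) / (b + 1)))
    return total / n_genes


for lam in (2, 20, 200):
    print(lam, round(mean_abs_log2_ratio(lam), 3))
```

Pure sampling noise shrinks steadily as the expected count rises, so large replicate disagreements that persist at high coverage are the part worth worrying about.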

RNA-seq : technical variability and sampling.

BMC Genomics. 2011 Jun 6;12(1):293. [Epub ahead of print]


McIntyre LM, Lopiano KK, Morse AM, Amin V, Oberg AL, Young LJ, Nuzhdin SV.

Abstract


BACKGROUND:

RNA-seq is revolutionizing the way we study transcriptomes. mRNA can be surveyed without prior knowledge of gene transcripts. Alternative splicing of transcript isoforms and the identification of previously unknown exons are being reported. Initial reports of differences in exon usage, and splicing between samples as well as quantitative differences among samples are beginning to surface. Biological variation has been reported to be larger than technical variation. In addition, technical variation has been reported to be in line with expectations due to random sampling. However, strategies for dealing with technical variation will differ depending on the magnitude. The size of technical variance, and the role of sampling are examined in this manuscript.

RESULTS:

In this study three independent Solexa/Illumina experiments containing technical replicates are analyzed. When coverage is low, large disagreements between technical replicates are apparent. Exon detection between technical replicates is highly variable when the coverage is less than 5 reads per nucleotide, and estimates of gene expression are more likely to disagree when coverage is low, although large disagreements in the estimates of expression are observed at all levels of coverage.

CONCLUSIONS:

Technical variability is too high to ignore. Technical variability results in inconsistent detection of exons at low levels of coverage. Further, the estimate of the relative abundance of a transcript can substantially disagree, even when coverage levels are high. This may be due to the low sampling fraction and if so, it will persist as an issue needing to be addressed in experimental design even as the next wave of technology produces larger numbers of reads. We provide practical recommendations for dealing with the technical variability, without dramatic cost increases.

PMID: 21645359
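The "5 reads per nucleotide" threshold is easy to check for your own exons if you take coverage to mean average depth: aligned bases divided by exon length. A minimal sketch assuming that definition (McIntyre et al. may compute it differently) and ignoring reads that only partly overlap the exon; the function names and the 150 bp exon with 20 reads of 36 bp are hypothetical.

```python
def per_nt_coverage(n_reads: int, read_len: int, exon_len: int) -> float:
    """Average number of reads covering each nucleotide of an exon:
    total aligned bases divided by exon length."""
    return n_reads * read_len / exon_len


def low_coverage(n_reads: int, read_len: int, exon_len: int, threshold: float = 5.0) -> bool:
    """Flag exons below ~5 reads/nt, the level below which McIntyre et al.
    report highly variable exon detection between technical replicates."""
    return per_nt_coverage(n_reads, read_len, exon_len) < threshold


# Hypothetical 150 bp exon with 20 aligned 36 bp reads: 20 * 36 / 150 = 4.8 reads/nt
print(per_nt_coverage(20, 36, 150), low_coverage(20, 36, 150))
```

By this measure, short reads on short exons fall under the threshold quickly, which is one more argument for more depth or longer reads when exon-level calls matter.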

Tuesday, 21 September 2010

Whole transcriptome of a single cell using NGS

I think the holy grail of gene profiling has to be single-molecule sequencing of a single cell. Imagine the number of de novo transcriptomics projects that will spring up when that becomes a reality!

Posted by Alison Leon on Aug 12, 2010 2:00:42 PM

Are you struggling with conducting gene expression analysis from limited sample amounts? Or perhaps you're trying to keep up with new developments in stem cell research. If so, you might be interested in a recent Cell Stem Cell publication discussing single-cell RNA-Seq analysis (Tang et al., Cell Stem Cell, 2010). In their research, Tang and colleagues trace the conversion of cells from the mouse inner cell mass (ICM) into pluripotent embryonic stem cells (ESCs), revealing molecular changes in the process. This is a follow-on to two previous papers, in which the proof of concept (Tang et al., Nature Methods, 2009) and protocols (Tang et al., Nature Protocols, 2010) for these experiments were detailed.

Using the SOLiD™ System (16-plex, 50-base reads) and whole transcriptome software tools, researchers performed whole transcriptome analysis at the single-cell level (an unprecedented resolution!) to determine gene expression levels and identify novel splice junctions. 385 genes in 74 individual cells were monitored during their transition from ICM to ESC. Validation with TaqMan® assays corroborated this method’s high sensitivity and reproducibility.

According to Tang et al., this research could form the basis for future studies involving regulation and differentiation of stem cells in adults. In addition, further knowledge about developing stem cells could lead to information about how disease tissues, including cancers, develop.

Kevin's GATTACA World