Kevin's GATTACA World: next-generation

Sunday, 11 September 2011

Differential expression in RNA-seq: A matter of d... [Genome Res. 2011] - PubMed - NCBI

http://www.ncbi.nlm.nih.gov/pubmed/21903743
Abstract: Next Generation Sequencing (NGS) technologies are revolutionizing genome research and, in particular, their application to transcriptomics (RNA-seq) is increasingly being used for gene expression profiling as a replacement for microarrays. However, the properties of RNA-seq data have not yet been fully established and additional research is needed for understanding how these data respond to differential expression analysis. In this work we set out to gain insights into the characteristics of RNA-seq data analysis by studying an important parameter of this technology: the sequencing depth. We have analyzed how sequencing depth affects the detection of transcripts and their identification as differentially expressed, looking at aspects such as transcript biotype, length, expression level and fold-change. We have evaluated different algorithms available for the analysis of RNA-seq and proposed a novel approach (NOISeq) that differs from existing methods in that it is data-adaptive and non-parametric. Our results reveal that most existing methodologies suffer from a strong dependency on sequencing depth for their differential expression calls and that this results in a considerable number of false positives that increases as the number of reads grows. In contrast, our proposed method models the noise distribution from the actual data, can therefore better adapt to the size of the dataset and is more effective in controlling the rate of false discoveries. This work discusses the true potential of RNA-seq for studying regulation at low expression ranges, the noise within RNA-seq data and the issue of replication.

PMID: 21903743 [PubMed - as supplied by publisher]
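
The abstract describes a method that "models the noise distribution from the actual data" but gives no algorithmic detail. The sketch below is only a toy illustration of that general idea, not the NOISeq implementation: the choice of statistics (log2 ratio and absolute difference), the pseudocount, the function names and the example counts are all assumptions made here for illustration.

```python
import math

def m_d(x, y, pseudo=0.5):
    """Return (log2 ratio, absolute difference) for two expression values.

    The pseudocount is an assumption to avoid division by zero.
    """
    return math.log2((x + pseudo) / (y + pseudo)), abs(x - y)

def noise_distribution(rep1, rep2):
    """Build an empirical noise set from two replicates of the SAME condition."""
    return [(abs(m), d) for m, d in (m_d(a, b) for a, b in zip(rep1, rep2))]

def de_score(x, y, noise):
    """Fraction of noise points that are smaller than the gene in both statistics."""
    m, d = m_d(x, y)
    hits = sum(1 for nm, nd in noise if nm < abs(m) and nd < d)
    return hits / len(noise)

# Hypothetical counts for four genes in two replicates of one condition.
rep1 = [100, 50, 10, 200]
rep2 = [110, 45, 12, 190]
noise = noise_distribution(rep1, rep2)

score_changed = de_score(400, 100, noise)  # large change between conditions
score_flat = de_score(100, 102, noise)     # change within the noise range
```

Because the score is a rank against observed replicate-to-replicate variation rather than a fit to a parametric distribution, it needs no model of how counts scale with sequencing depth.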

Saturday, 30 April 2011

Evaluation of next-generation sequencing software in mapping and assembly.

J Hum Genet. 2011 Apr 28;

Authors: Bao S, Jiang R, Kwan W, Wang B, Ma X, Song YQ

Next-generation high-throughput DNA sequencing technologies have advanced progressively in sequence-based genomic research and novel biological applications with the promise of sequencing DNA at unprecedented speed. These new non-Sanger-based technologies feature several advantages when compared with traditional sequencing methods in terms of higher sequencing speed, lower per run cost and higher accuracy. However, reads from next-generation sequencing (NGS) platforms, such as 454/Roche, ABI/SOLiD and Illumina/Solexa, are usually short, thereby restricting the applications of NGS platforms in genome assembly and annotation. We presented an overview of the challenges that these novel technologies meet and particularly illustrated various bioinformatics attempts on mapping and assembly for problem solving. We then compared the performance of several programs in these two fields, and further provided advice on selecting suitable tools for specific biological applications. Journal of Human Genetics advance online publication, 28 April 2011; doi:10.1038/jhg.2011.43.

PMID: 21525877 [PubMed - as supplied by publisher]

Monday, 7 March 2011

Next-Generation Sequencing without a Reference: Interview w/ Frank You of U.C. Davis/USDA-ARS

Chanced upon this interview in the SOLiD community, but the thread went missing two days later. Oh well, here's the Google cache link.
P.S. If there's a valid reason why it was taken down, please let me know and I will do the same.
Update: the page is up again.

Recently we had the chance to talk with Frank You from the Department of Plant Sciences at U.C. Davis and the Genomics and Gene Discovery Research Unit of the USDA-ARS about his publication in BMC Genomics, Annotation-based genome-wide SNP discovery in the large and complex Aegilops tauschii genome using next-generation sequencing without a reference genome sequence.*

You are discovering genome wide SNPs in plant species. What is the reason you want to discover these SNPs?

Genome-wide SNPs are important resources for marker-assisted selection in breeding and for constructing the high-density genetic maps required for map-based cloning and whole-genome sequencing. In our study, we discovered genome-wide SNPs in Aegilops tauschii, the diploid ancestor of the wheat D genome, with a genome size of 4.02 Gb, of which 90% is repetitive sequence. We are using these SNPs to construct a high-density genetic map for wheat D genome sequencing.

Please briefly explain the challenges you faced, and the solution you came to, in order to do this in the absence of a reference.

We have no reference sequences available for Ae. tauschii SNP discovery. For short reads generated by next-generation sequencing platforms, especially SOLiD™ and Solexa, the major challenge is mapping errors, when short reads in one genotype are mapped to short reads in another genotype in highly repetitive, complex genomes. Our idea is to reduce the complexity of the genome.

It is assumed that most genes are in a single-copy dose in a genome, and sequences of duplicated genes are usually diverged to such an extent that most of their reads do not cluster together. Therefore, the read depth (number of reads of the same nucleotide position) mapped to coding sequences of known genes estimates the expected read depth of all single-copy sequences in a genome. Sequences showing greater read depth are assumed to be from duplicated or repeated sequences. To implement this rationale, shallow genome coverage by long Roche 454 sequences is used to identify genic sequences by homology search against gene databases. Multiple genome coverages of short SOLiD™ or Solexa sequences are then used to estimate the read depth of genic sequences in a population of SOLiD™ or Solexa reads. The estimate is in turn used to identify (annotate) the remaining single-copy Roche 454 reads.
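
The depth-based annotation described above can be sketched roughly as follows. This is a minimal illustration, not the pipeline from the paper: the use of the median as the estimator, the `max_ratio` cutoff, the function names and the example depths are all assumptions made here.

```python
from statistics import median

def expected_single_copy_depth(genic_depths):
    """Estimate single-copy read depth from sequences identified as genic.

    The median is an assumed estimator, chosen here for robustness to outliers.
    """
    return median(genic_depths)

def annotate_by_depth(depths, expected, max_ratio=2.0):
    """Label sequences whose depth greatly exceeds the expectation as repeats.

    max_ratio is a hypothetical cutoff, not a value from the paper.
    """
    return {
        seq_id: "single-copy" if depth <= max_ratio * expected else "repeat"
        for seq_id, depth in depths.items()
    }

# Hypothetical short-read depths over 454 reads that hit known genes.
genic = [9, 10, 11, 10, 12]
expected = expected_single_copy_depth(genic)

# Hypothetical depths over 454 reads that had no gene-database hit.
unannotated = {"read_a": 8, "read_b": 11, "read_c": 240}
labels = annotate_by_depth(unannotated, expected)
```

With these made-up numbers, `read_c` is flagged as a repeat because its depth is far above what the genic sequences predict for a single-copy region.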

This combination of Roche 454 and SOLiD™ or Solexa platforms combines the long length of Roche 454 reads with the high coverage of the SOLiD™/Solexa sequencing platforms, thus reducing costs associated with the development of reference sequence. Short SOLiD™ or Solexa reads are mapped and aligned to the Roche 454 reads or contigs with short-read mapping tools. After the annotation of all sequences, SNPs are called and filtered.

An important part of your pipeline is the ability to call SNPs in repeat junctions. What is this, and why is it important?

Transposable elements (TEs) make up large proportions of many eukaryotic genomes. For example, they represent ~35% of the rice genome and ~90% of the hexaploid wheat genome, and they contribute significantly to the size, organization and evolution of plant genomes. Repeat junctions (RJs) are created by insertions of TEs into each other, into genes, or into other DNA sequences. Previous studies showed that these repeat junctions are commonly unique and genome-specific. They can therefore be treated as single-copy markers in the genome.

The genome specificity of TE junction-based markers makes them particularly useful for mapping polyploid species, including many important crops such as wheat and cotton. Because repeat junctions are also abundant and randomly distributed along chromosomes, they have great potential for developing genome-wide molecular markers for high-throughput mapping and diversity studies in large and complex genomes.

In this paper you used both base space and color space data. Did you find any challenges with mixing these data types?

We had difficulty using base space and color space data together in read mapping. No academic command-line-based programs for hybrid read mapping are currently available. Thus, this is still a challenge for hybrid data mapping. Instead, we can perform short read mapping separately for color-space and base-space data, and then merge the results in the pipeline.
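
The interview does not say how the separately mapped results are merged. One possible way, sketched below, assumes each mapping run has already been reduced to per-site allele counts against the same Roche 454 reads or contigs; the data layout, function name and counts are hypothetical.

```python
from collections import Counter, defaultdict

def merge_allele_counts(*runs):
    """Sum per-site allele counts from independently mapped read sets.

    Each run maps (reference_id, position) -> {allele: count}.
    """
    merged = defaultdict(Counter)
    for run in runs:
        for site, counts in run.items():
            merged[site].update(counts)  # Counter.update adds counts
    return {site: dict(counts) for site, counts in merged.items()}

# Hypothetical per-site counts from the two separate mapping runs.
color_space = {("contig1", 120): {"A": 6, "G": 1}, ("contig1", 300): {"C": 4}}
base_space = {("contig1", 120): {"A": 3, "G": 5}, ("contig2", 15): {"T": 7}}

merged = merge_allele_counts(color_space, base_space)
```

Merging at the level of counts rather than reads sidesteps the lack of a hybrid mapper, since each data type is aligned by a tool that understands its own encoding.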

Given the highly repetitive sequence you were working with, will this method work even better with less repetitive genomes?

Yes. I expect that the method proposed in this paper will work even better with less repetitive genomes.

*You FM, Huo N, Deal KR, Gu YQ, Luo MC, McGuire PE, Dvorak J, Anderson OD. Annotation-based genome-wide SNP discovery in the large and complex Aegilops tauschii genome using next-generation sequencing without a reference genome sequence. BMC Genomics 2011, 12:59 (25 January 2011).
