Showing posts with label journal.

Tuesday, 28 September 2021

Reproducible, scalable, and shareable analysis pipelines with bioinformatics workflow managers | Nature Methods


 


https://github.com/GoekeLab/bioinformatics-workflows    


Friday, 28 June 2013

Lessons learned from implementing a national infrastructure in Sweden for storage and analysis of next-generation sequencing data

Each time I change jobs, I go through the adventure (and sometimes pain) of relearning the computing resources available to me: my own machine, the lab's small shared pool, and the entire institute/company/school cluster (usually not enough to go around).
Depending on the scope of the job, the number of cores it needs, and how long it runs, I then decide which of the three resources to use.
Sometimes grant money appears magically and my boss asks me what I need to buy (OK, to be honest, this is rare). Hence it's always nice to keep a lookout on what's available on the market and who's using what to do what, so that one day when grant money magically appears, I won't be stumped for an answer.

Excerpted from the provisional PDF are three points I agree with fully:

Three GiB of RAM per core is not enough
You wouldn't believe the number of things I tried to outsmart the 'system' just to squeeze enough RAM for my jobs: hunting for parallel queues, which often have a bigger RAM allocation, or test-running small jobs to make sure they ran OK before scaling up, only to have the full job fail after two days due to insufficient RAM (a small pilot-run sketch for estimating memory follows below).
MPI is not widely used in NGS analysis
A lot of the queues in the university's shared resource had ample capacity for my jobs but were reserved for MPI jobs, so I couldn't touch those at all.
A central file system helps keep redundancy to a minimum
Balancing RAM against compute cores to make job splitting efficient was one thing. The other pain in the aXX was having to move files off the compute node as soon as a job finished and to clear all intermediate files. There were times when the job might have failed, but because I had deleted the intermediate files in the last step of the pipeline bash script, I couldn't be sure it had run to completion. In the end I had to rerun the job and keep the intermediate files.
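Side note: a cheap way I could have saved myself those two-day failures is to pilot-run one pipeline step on a subsampled input and record the peak memory of its child processes. A minimal sketch (Unix-only; the command line you pass it is whatever step you want to profile):

import resource
import subprocess
import sys

def peak_child_rss_mb(cmd):
    # Run the command to completion, then ask the OS for the peak
    # resident set size reached by child processes.
    subprocess.run(cmd, check=True)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    # ru_maxrss is reported in kilobytes on Linux (bytes on macOS).
    return usage.ru_maxrss / 1024.0

if __name__ == "__main__":
    # e.g. python pilot_rss.py mytool --input subsample.fastq
    print("peak RSS of children: %.1f MB" % peak_child_rss_mb(sys.argv[1:]))

Scale the estimate up by the full-input size with a generous safety margin; memory rarely scales perfectly linearly.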


Anyway, for more info, check out the links below:

http://www.gigasciencejournal.com/content/2/1/9/abstract

Lessons learned from implementing a national infrastructure in Sweden for storage and analysis of next-generation sequencing data

Samuel Lampa, Martin Dahlö, Pall I Olason, Jonas Hagberg and Ola Spjuth
GigaScience 2013, 2:9 doi:10.1186/2047-217X-2-9
Published: 25 June 2013

Abstract (provisional)

Analyzing and storing data and results from next-generation sequencing (NGS) experiments is a challenging task, hampered by ever-increasing data volumes and frequent updates of analysis methods and tools. Storage and computation have grown beyond the capacity of personal computers and there is a need for suitable e-infrastructures for processing. Here we describe UPPNEX, an implementation of such an infrastructure, tailored to the needs of data storage and analysis of NGS data in Sweden serving various labs and multiple instruments from the major sequencing technology platforms. UPPNEX comprises resources for high-performance computing, large-scale and high-availability storage, an extensive bioinformatics software suite, up-to-date reference genomes and annotations, a support function with system and application experts as well as a web portal and support ticket system. UPPNEX applications are numerous and diverse, and include whole genome-, de novo- and exome sequencing, targeted resequencing, SNP discovery, RNASeq, and methylation analysis. There are over 300 projects that utilize UPPNEX and include large undertakings such as the sequencing of the flycatcher and Norwegian spruce. We describe the strategic decisions made when investing in hardware, setting up maintenance and support, allocating resources, and illustrate major challenges such as managing data growth. We conclude with summarizing our experiences and observations with UPPNEX to date, providing insights into the successful and less successful decisions made.


Monday, 10 September 2012

[pub]: Genome-Wide Association Analysis of Imputed Rare Variants: Application to Seven Common Complex Diseases


http://onlinelibrary.wiley.com/doi/10.1002/gepi.21675/abstract;jsessionid=0E67E391238867DA8CC7EDD1FAABCE88.d03t01

Genet Epidemiol. 2012 Sep 5. doi: 10.1002/gepi.21675. [Epub ahead of print]

Genome-Wide Association Analysis of Imputed Rare Variants: Application to Seven Common Complex Diseases.

Source: Estonian Genome Centre, University of Tartu, Tartu, Estonia.

Abstract

Genome-wide association studies have been successful in identifying loci contributing effects to a range of complex human traits. The majority of reproducible associations within these loci are with common variants, each of modest effect, which together explain only a small proportion of heritability. It has been suggested that much of the unexplained genetic component of complex traits can thus be attributed to rare variation. However, genome-wide association study genotyping chips have been designed primarily to capture common variation, and thus are underpowered to detect the effects of rare variants. Nevertheless, we demonstrate here, by simulation, that imputation from an existing scaffold of genome-wide genotype data up to high-density reference panels has the potential to identify rare variant associations with complex traits, without the need for costly re-sequencing experiments. By application of this approach to genome-wide association studies of seven common complex diseases, imputed up to publicly available reference panels, we identify genome-wide significant evidence of rare variant association in PRDM10 with coronary artery disease and multiple genes in the major histocompatibility complex (MHC) with type 1 diabetes. The results of our analyses highlight that genome-wide association studies have the potential to offer an exciting opportunity for gene discovery through association with rare variants, conceivably leading to substantial advancements in our understanding of the genetic architecture underlying complex human traits.
© 2012 Wiley Periodicals, Inc.

Tuesday, 4 September 2012

PLoS ONE: Adaptive Ridge Regression for Rare Variant Detection

http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0044173

Abstract

It is widely believed that both common and rare variants contribute to the risks of common diseases or complex traits and the cumulative effects of multiple rare variants can explain a significant proportion of trait variances. Advances in high-throughput DNA sequencing technologies allow us to genotype rare causal variants and investigate the effects of such rare variants on complex traits. We developed an adaptive ridge regression method to analyze the collective effects of multiple variants in the same gene or the same functional unit. Our model focuses on continuous trait and incorporates covariate factors to remove potential confounding effects. The proposed method estimates and tests multiple rare variants collectively but does not depend on the assumption of same direction of each rare variant effect. Compared with the Bayesian hierarchical generalized linear model approach, the state-of-the-art method of rare variant detection, the proposed new method is easy to implement, yet it has higher statistical power. Application of the new method is demonstrated using the well-known data from the Dallas Heart Study.
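The paper has its own estimation and testing procedure; purely to illustrate the general idea of adaptive (iteratively reweighted) ridge regression, here is a toy numpy sketch in which each variant's penalty weight is re-estimated from the current coefficients, so variants with persistently small effects get penalized harder. It ignores covariates and significance testing, so treat it as a sketch of the concept, not the authors' method.

import numpy as np

def adaptive_ridge(X, y, lam=1.0, n_iter=20, eps=1e-6):
    # X: (samples x variants) genotype matrix, y: continuous trait.
    _, p = X.shape
    w = np.ones(p)                      # per-variant penalty weights
    beta = np.zeros(p)
    for _ in range(n_iter):
        # Solve (X'X + lam * diag(w)) beta = X'y
        A = X.T @ X + lam * np.diag(w)
        beta = np.linalg.solve(A, X.T @ y)
        # Heavier penalty on coefficients that stay near zero.
        w = 1.0 / (beta ** 2 + eps)
        w *= p / w.sum()                # keep the total penalty comparable
    return beta

# Toy usage with simulated rare variants (allele frequency 1%):
rng = np.random.default_rng(0)
X = rng.binomial(2, 0.01, size=(500, 30)).astype(float)
beta_true = np.zeros(30); beta_true[:3] = [1.5, -1.0, 2.0]
y = X @ beta_true + rng.normal(size=500)
print(np.round(adaptive_ridge(X, y), 2)[:5])

The reweighting is what lets effects of either sign survive while noise variants shrink, which matches the abstract's point that no same-direction assumption is needed.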

Thursday, 2 August 2012

Geneticists eye the potential of arXiv

Population biologists turn to pre-publication server to gain wider readership and rapid review of results.
Excerpted:

The preprint server arXiv.org is perhaps best known as the preserve of theoretical physicists and astrophysicists. But 2008 saw an influx of submissions of unpublished manuscripts, or preprints, by condensed-matter physicists who wanted to stake claims to the fast-moving subject of iron-based superconductors called pnictides. Now the life sciences may be on the cusp of their own ‘pnictide moment’, with population geneticists leading the charge.
In the past month, leading research groups have posted to arXiv high-profile papers on the genetic history of southern Africans1 and Europeans2. Other prominent population geneticists have submitted methods-based papers to the server, which is hosted by Cornell University in Ithaca, New York. The number of biology papers on the server is still small in comparison with physical-sciences preprints (see ‘Biology opens up’), but Paul Ginsparg, a theoretical physicist at Cornell who founded arXiv in 1991 (ref. 3), welcomes what he hopes could be a sea change.
“It’s wonderful if biologists are belatedly joining the late twentieth century,” he quips. “Welcome to the party; better late than never.”
...
Another attention-grabbing submission by prominent geneticists, posted on 23 July, compares genomic variation in 22 African populations to suggest an ancient genetic link between people in southern and eastern Africa1. One of the paper’s senior authors, geneticist David Reich of Harvard Medical School in Boston, Massachusetts, publishes routinely in Nature and the Public Library of Science journals, and co-author Carlos Bustamante, of Stanford University School of Medicine in California, is a leader in the field. Reich says that first author Joseph Pickrell, also at Harvard Medical School, suggested using arXiv. Reich and the other co-authors saw no good reason not to post the manuscript there. “It could be an example of the younger generation coming in and finding this sort of thing natural,” says Ginsparg....

http://www.nature.com/news/geneticists-eye-the-potential-of-arxiv-1.11091

Tuesday, 29 May 2012

How Not To Be A Bioinformatician

How Not To Be A Bioinformatician
Source Code for Biology and Medicine 2012, 7:3 doi:10.1186/1751-0473-7-3

Abstract
Although published material exists about the skills required for a successful bioinformatics career, strangely enough no work to date has addressed the matter of how to excel at not being a bioinformatician. A set of basic guidelines and a code of conduct is hereby presented to re-address that imbalance for fellow practitioners whose aim is not to succeed in their chosen bioinformatics field. By scrupulously following these guidelines one can be sure to regress at a highly satisfactory rate.

http://www.scfbm.org/content/pdf/1751-0473-7-3.pdf


LMAO

"Be unreachable and isolated. Configure your contact email to either bounce back or
permanently set it to vacation. Miss key meetings or seminars where other colleagues may be presenting their seminal results and never, ever make any attempt at remembering their names or where they work. Reinvent the wheel. Do not keep up with the literature on current methods of research if you possibly can. "


Was this even necessary to include in the paper?

Tuesday, 15 May 2012

NATURE BIOTECHNOLOGY | Performance comparison of benchtop high-throughput sequencing platforms


Performance comparison of benchtop high-throughput sequencing platforms

Nature Biotechnology 30, 434–439 (2012). doi:10.1038/nbt.2198

Abstract

Three benchtop high-throughput sequencing instruments are now available. The 454 GS Junior (Roche), MiSeq (Illumina) and Ion Torrent PGM (Life Technologies) are laser-printer sized and offer modest set-up and running costs. Each instrument can generate data required for a draft bacterial genome sequence in days, making them attractive for identifying and characterizing pathogens in the clinical setting. We compared the performance of these instruments by sequencing an isolate of Escherichia coli O104:H4, which caused an outbreak of food poisoning in Germany in 2011. The MiSeq had the highest throughput per run (1.6 Gb/run, 60 Mb/h) and lowest error rates. The 454 GS Junior generated the longest reads (up to 600 bases) and most contiguous assemblies but had the lowest throughput (70 Mb/run, 9 Mb/h). Run in 100-bp mode, the Ion Torrent PGM had the highest throughput (80–100 Mb/h). Unlike the MiSeq, the Ion Torrent PGM and 454 GS Junior both produced homopolymer-associated indel errors (1.5 and 0.38 errors per 100 bases, respectively).
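Quick sanity arithmetic on the quoted throughputs (instrument-quoted run times differ in practice, since runs don't proceed at a constant rate):

# Back-of-envelope run times implied by the abstract's numbers.
runs = {"MiSeq": (1600, 60), "454 GS Junior": (70, 9)}  # (Mb/run, Mb/h)
for name, (mb_per_run, mb_per_h) in runs.items():
    print(f"{name}: ~{mb_per_run / mb_per_h:.0f} h per run")
# MiSeq: ~27 h per run; 454 GS Junior: ~8 h per run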


Thursday, 16 February 2012

How Identical are Identical Twins? | Read Through Transcription

How Identical are Identical Twins?
December 19, 2011 by Ramesh Hariharan
We're looking at exome sequencing data on whole peripheral blood DNA of monozygotic twins (this data was generated by our collaborators, Jan Dumanski and his group at Uppsala University in Sweden). Monozygotic twins were earlier thought to be genetically identical; now we know that isn't completely true. How does one identify small mutations (SNPs and small InDels) that are present in one of the twins but not in the other? Or, in general, how does one compare two different samples, for instance to find somatic mutations that are present in a tumor sample but not in the paired normal sample?
...
The 1000 Genomes Project estimated that a child has only around 50 new mutations relative to its parents. Monozygotic twins ought to be closer than that. And we are observing only the exomes (and some neighborhood) of these twins. So the real answer probably lies close to the bottom of the above table. However, as Jan Dumanski points out, much of the 1000 Genomes effort involved sequencing of oligoclonal/monoclonal lymphoblastoid cell lines, not quite directly comparable with whole peripheral blood.

http://blog.avadis-ngs.com/2011/12/how-identical-are-identical-twins/


Gosh, who knew calling SNPs on identical twins could be such a complicated task?
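For the naive first pass (which, as the post implies, is exactly what does not work well), comparing the raw call sets of two single-sample VCFs looks something like the sketch below. The file names are hypothetical, and a site present in only one file is merely a candidate: it may just be low coverage in the other twin rather than a real difference, which is why real pipelines go back to the pileups at discordant sites.

def load_calls(vcf_path):
    # Map (chrom, pos) -> (ref, alt) for every non-header VCF line.
    calls = {}
    with open(vcf_path) as fh:
        for line in fh:
            if line.startswith("#"):
                continue
            chrom, pos, _id, ref, alt = line.split("\t")[:5]
            calls[(chrom, pos)] = (ref, alt)
    return calls

twin_a = load_calls("twinA.vcf")  # hypothetical file names
twin_b = load_calls("twinB.vcf")
only_a = {k: v for k, v in twin_a.items() if k not in twin_b}
only_b = {k: v for k, v in twin_b.items() if k not in twin_a}
print(len(only_a), "candidate sites unique to twin A,",
      len(only_b), "unique to twin B")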


Update:
Related links I found:
Different sides of the same coin; twins and epigenetics
http://blogs.dnalc.org/2011/09/23/different-sides-of-the-same-coin-twins-and-epigenetics/


Tuesday, 12 July 2011

A memory-efficient data structure representing exact-match overlap graphs with application for next-generation DNA assembly


Motivation: Exact-match overlap graphs have been broadly used in the context of DNA assembly and the shortest superstring problem, where the number of strings n ranges from thousands to billions. The length of the strings is from 25 to 1000, depending on the DNA sequencing technologies. However, many DNA assemblers using overlap graphs suffer from the need for too much time and space in constructing the graphs. It is nearly impossible for these DNA assemblers to handle the huge amount of data produced by the next-generation sequencing technologies, where the number n of strings could be several billions. If the overlap graph is explicitly stored, it would require Ω(n²) memory, which could be prohibitive in practice when n is greater than a hundred million. In this article, we propose a novel data structure using which the overlap graph can be compactly stored. This data structure requires only linear time to construct and linear memory to store.
Results: For a given set of input strings (also called reads), we can informally define an exact-match overlap graph as follows. Each read is represented as a node in the graph and there is an edge between two nodes if the corresponding reads overlap sufficiently. A formal description follows. The maximal exact-match overlap of two strings x and y, denoted by ovmax(x, y), is the longest string which is a suffix of x and a prefix of y. The exact-match overlap graph of n given strings of length ℓ is an edge-weighted graph in which each vertex is associated with a string and there is an edge (x, y) of weight ω = ℓ − |ovmax(x, y)| if and only if ω ≤ λ, where |ovmax(x, y)| is the length of ovmax(x, y) and λ is a given threshold. In this article, we show that the exact-match overlap graphs can be represented by a compact data structure that can be stored using at most (2λ − 1)(2 log n + log λ)n bits with a guarantee that the basic operation of accessing an edge takes O(log λ) time. We also propose two algorithms for constructing the data structure for the exact-match overlap graph. The first algorithm runs in O(λn log n) worst-case time and requires O(λ) extra memory. The second one runs in O(λn) time and requires O(n) extra memory. Our experimental results on a huge amount of simulated data from sequence assembly show that the data structure can be constructed efficiently in time and memory.
Availability: Our DNA sequence assembler that incorporates the data structure is freely available on the web at http://www.engr.uconn.edu/~htd06001/assembler/leap.zip
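To make the definition concrete, here is a direct quadratic construction of the overlap edges; this is precisely the naive O(n²) approach whose cost the paper's compact data structure avoids, so it's only a toy illustration of the definitions above.

def ov_max(x, y):
    # Longest suffix of x that is also a prefix of y.
    for k in range(min(len(x), len(y)), 0, -1):
        if x[-k:] == y[:k]:
            return k
    return 0

def overlap_edges(reads, lam):
    # Edge (i, j) with weight w = len(reads[i]) - |ov_max| kept
    # when w <= lam, i.e. the reads overlap sufficiently.
    edges = []
    for i, x in enumerate(reads):
        for j, y in enumerate(reads):
            if i != j:
                w = len(x) - ov_max(x, y)
                if w <= lam:
                    edges.append((i, j, w))
    return edges

reads = ["ACGTAC", "GTACGG", "ACGGTT"]
print(overlap_edges(reads, lam=2))
# [(0, 1, 2), (1, 2, 2)] -- e.g. "ACGTAC" overlaps "GTACGG" by 4 bases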

Saturday, 30 April 2011

Evaluation of next-generation sequencing software in mapping and assembly.


J Hum Genet. 2011 Apr 28;

Authors: Bao S, Jiang R, Kwan W, Wang B, Ma X, Song YQ

Next-generation high-throughput DNA sequencing technologies have advanced progressively in sequence-based genomic research and novel biological applications with the promise of sequencing DNA at unprecedented speed. These new non-Sanger-based technologies feature several advantages when compared with traditional sequencing methods in terms of higher sequencing speed, lower per-run cost and higher accuracy. However, reads from next-generation sequencing (NGS) platforms, such as 454/Roche, ABI/SOLiD and Illumina/Solexa, are usually short, thereby restricting the applications of NGS platforms in genome assembly and annotation. We presented an overview of the challenges that these novel technologies meet and particularly illustrated various bioinformatics attempts on mapping and assembly for problem solving. We then compared the performance of several programs in these two fields, and further provided advice on selecting suitable tools for specific biological applications. Journal of Human Genetics advance online publication, 28 April 2011; doi:10.1038/jhg.2011.43.

PMID: 21525877 [PubMed - as supplied by publisher]




Thursday, 14 April 2011

Tumour evolution inferred by single-cell sequencing

Nature 472, 7341 (2011). doi:10.1038/nature09807
Authors: Nicholas Navin, Jude Kendall, Jennifer Troge, Peter Andrews, Linda Rodgers, Jeanne McIndoo, Kerry Cook, Asya Stepansky, Dan Levy, Diane Esposito, Lakshmi Muthuswamy, Alex Krasnitz, W. Richard McCombie, James Hicks & Michael Wigler
Genomic analysis provides insights into the role of copy number variation in disease, but most methods are not designed to resolve mixed populations of cells. In tumours, where genetic heterogeneity is common, very important information may be lost that would be useful for reconstructing evolutionary history. Here we show that with flow-sorted nuclei, whole genome amplification and next generation sequencing we can accurately quantify genomic copy number within an individual nucleus. We apply single-nucleus sequencing to investigate tumour population structure and evolution in two human breast cancer cases. Analysis of 100 single cells from a polygenomic tumour revealed three distinct clonal subpopulations that probably represent sequential clonal expansions. Additional analysis of 100 single cells from a monogenomic primary tumour and its liver metastasis indicated that a single clonal expansion formed the primary tumour and seeded the metastasis. In both primary tumours, we also identified an unexpectedly abundant subpopulation of genetically diverse ‘pseudodiploid’ cells that do not travel to the metastatic site. In contrast to gradual models of tumour progression, our data indicate that tumours grow by punctuated clonal expansions with few persistent intermediates.

Thursday, 17 March 2011

Common numbers / statistics for Uniquely mapped reads?

Was asked if there are commonly reported numbers for
uniquely mapped reads (which are troublesome to define with bowtie)
vs
total mapped reads.

Not sure if the numbers differ across applications,
e.g.
WGS
exome reseq

or human vs other organisms.
Got this figure from a 2009 paper; not sure if anyone collates data like this:
http://bioinformatics.oxfordjournals.org/content/25/7/969.full.pdf
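For what it's worth, here's a rough sketch of how one could count the two numbers from a SAM file, with "uniquely mapped" approximated by a MAPQ cutoff. The cutoff is an aligner-specific heuristic (bowtie's MAPQ semantics differ from BWA's), which is exactly why the number is so troublesome to define.

import sys

def mapping_counts(sam_path, mapq_cutoff=10):
    # Count total, mapped, and "uniquely" mapped records in a SAM file.
    total = mapped = unique = 0
    with open(sam_path) as fh:
        for line in fh:
            if line.startswith("@"):          # header lines
                continue
            fields = line.rstrip("\n").split("\t")
            flag, mapq = int(fields[1]), int(fields[4])
            total += 1
            if flag & 4:                      # FLAG bit 0x4 = unmapped
                continue
            mapped += 1
            if mapq >= mapq_cutoff:           # heuristic for "unique"
                unique += 1
    return total, mapped, unique

total, mapped, unique = mapping_counts(sys.argv[1])
print(f"mapped {mapped}/{total}, 'unique' (MAPQ>=10) {unique}/{total}")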

Saturday, 12 March 2011

Quality control and preprocessing of metagenomic datasets



Summary: Here, we present PRINSEQ for easy and rapid quality control and data preprocessing of genomic and metagenomic datasets. Summary statistics of FASTA (and QUAL) or FASTQ files are generated in tabular and graphical form and sequences can be filtered, reformatted and trimmed by a variety of options to improve downstream analysis.
Availability and Implementation: This open-source application was implemented in Perl and can be used as a stand-alone version or accessed online through a user-friendly web interface. The source code, user help and additional information are available at http://prinseq.sourceforge.net/
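PRINSEQ itself is a Perl tool with many more options; purely to illustrate the kind of filtering it performs, here is a minimal pass that keeps FASTQ reads above a length and mean-quality cutoff (assuming Phred+33 quality encoding and hypothetical file names, not PRINSEQ's actual code):

def filter_fastq(in_path, out_path, min_len=50, min_mean_q=20):
    kept = seen = 0
    with open(in_path) as fin, open(out_path, "w") as fout:
        while True:
            record = [fin.readline() for _ in range(4)]  # one FASTQ record
            if not record[0]:
                break
            seen += 1
            seq, qual = record[1].strip(), record[3].strip()
            mean_q = sum(ord(c) - 33 for c in qual) / len(qual)
            if len(seq) >= min_len and mean_q >= min_mean_q:
                fout.writelines(record)
                kept += 1
    print(f"kept {kept} of {seen} reads")

filter_fastq("reads.fastq", "reads.filtered.fastq")  # hypothetical paths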

Improving Detection of Genome Structural Variation



Friday, 11 March 2011

A comparison of single molecule and amplification based sequencing of cancer transcriptomes.

PLoS One. 2011 Mar 1;6(3):e17305.

Sam LT, Lipson D, Raz T, Cao X, Thompson J, Milos PM, Robinson D, Chinnaiyan AM, Kumar-Sinha C, Maher CA.

Michigan Center for Translational Pathology, University of Michigan, Ann Arbor, Michigan, United States of America.

The second wave of next generation sequencing technologies, referred to as single-molecule sequencing (SMS), carries the promise of profiling samples directly without employing polymerase chain reaction steps used by amplification-based sequencing (AS) methods. To examine the merits of both technologies, we examine mRNA sequencing results from single-molecule and amplification-based sequencing in a set of human cancer cell lines and tissues. We observe a characteristic coverage bias towards high abundance transcripts in amplification-based sequencing. A larger fraction of AS reads cover highly expressed genes, such as those associated with translational processes and housekeeping genes, resulting in relatively lower coverage of genes at low and mid-level abundance. In contrast, the coverage of high abundance transcripts plateaus off using SMS. Consequently, SMS is able to sequence lower-abundance transcripts more thoroughly, including some that are undetected by AS methods; however, these include many more mapping artifacts. A better understanding of the technical and analytical factors introducing platform-specific biases in high throughput transcriptome sequencing applications will be critical in cross-platform meta-analytic studies.


PMID: 21390249 [PubMed - in process]

Wednesday, 2 March 2011

Papers on Comparison of microRNA profiling platforms

Systematic Evaluation of Three microRNA Profiling Platforms: Microarray, Beads Array, and Quantitative Real-Time PCR Array

Background

A number of gene-profiling methodologies have been applied to microRNA research. The diversity of the platforms and analytical methods makes the comparison and integration of cross-platform microRNA profiling data challenging. In this study, we systematically analyze three representative microRNA profiling platforms: Locked Nucleic Acid (LNA) microarray, beads array, and TaqMan quantitative real-time PCR Low Density Array (TLDA).


Systematic comparison of microarray profiling, real-time PCR, and next-generation sequencing technologies for measuring differential microRNA expression

Abstract
RNA abundance and DNA copy number are routinely measured in high-throughput using microarray and next-generation sequencing (NGS) technologies, and the attributes of different platforms have been extensively analyzed. Recently, the application of both microarrays and NGS has expanded to include microRNAs (miRNAs), but the relative performance of these methods has not been rigorously characterized. We analyzed three biological samples across six miRNA microarray platforms and compared their hybridization performance. We examined the utility of these platforms, as well as NGS, for the detection of differentially expressed miRNAs. We then validated the results for 89 miRNAs by real-time RT-PCR and challenged the use of this assay as a “gold standard.” Finally, we implemented a novel method to evaluate false-positive and false-negative rates for all methods in the absence of a reference method.

Friday, 25 February 2011

Analyzing and minimizing PCR amplification bias in Illumina sequencing libraries

In GenomeWeb, February 24, 2011

Broad Team IDs, Improves PCR Amplification Bias in Illumina Sequencing Libraries

Analyzing and minimizing PCR amplification bias in Illumina sequencing libraries

Daniel Aird, Michael G Ross, Wei-Sheng Chen, Maxwell Danielsson, Timothy Fennell, Carsten Russ, David B Jaffe, Chad Nusbaum and Andreas Gnirke
Genome Biology 2011, 12:R18 doi:10.1186/gb-2011-12-2-r18
Published: 21 February 2011
 
Despite the ever-increasing output of Illumina sequencing data, loci with extreme base compositions are often under-represented or absent. To evaluate sources of base-composition bias, we traced genomic sequences ranging from 6% to 90% GC through the process by qPCR. We identified PCR during library preparation as a principal source of bias and optimized the conditions. Our improved protocol significantly reduces amplification bias and minimizes the previously severe effects of PCR instrument and temperature ramp rate.
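The GC fraction they stratify loci by (6% to 90%) is simple to compute; a minimal sketch for FASTA input, with a hypothetical file name and assuming plain A/C/G/T sequences:

def gc_fraction(seq):
    # Fraction of G or C bases in an uppercase sequence.
    gc = sum(1 for b in seq if b in "GC")
    return gc / len(seq) if seq else 0.0

def per_record_gc(fasta_path):
    # Yield (record name, GC fraction) for each FASTA record.
    name, chunks = None, []
    with open(fasta_path) as fh:
        for line in fh:
            if line.startswith(">"):
                if name is not None:
                    yield name, gc_fraction("".join(chunks))
                name, chunks = line[1:].split()[0], []
            else:
                chunks.append(line.strip().upper())
    if name is not None:
        yield name, gc_fraction("".join(chunks))

for name, gc in per_record_gc("loci.fasta"):  # hypothetical file
    print(f"{name}\t{gc:.2%}")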
