Wednesday, 31 August 2011

PLoS Ghostwriting Collection

Ghostwriting occurs when someone has made substantial contributions to writing a manuscript but this role is unacknowledged. In medicine, ghostwriting is problematical because it often involves pharmaceutical companies (or the medical communication companies that work for them) producing articles that promote the benefits of their health-care products while playing down their harm, and then masking their involvement in the development of the articles by recruiting academic "guest authors" to lend false credibility and independence. Because ghostwriting misrepresents authorship credit and accountability, it is considered to be unethical, dishonest, and a threat to the integrity of the medical literature. Fortunately, this previously "hidden" problem has been the focus of increasing research and commentary, including potential solutions to the problem of ghostwriting. Much of this research and commentary has appeared in PLoS journals, which we collect below.

More information can be found at the Wyeth ghostwriting archive, which was developed after PLoS Medicine and The New York Times intervened in 2009 in litigation brought against the pharmaceutical company Wyeth by thousands of women who developed breast cancer while taking hormone therapy drugs, resulting in the public release of 1500 documents extensively detailing the company's ghostwriting.

In addition, the topic of ghostwriting is frequently covered on PLoS Medicine's blogsite, Speaking of Medicine.

We will update the Collection with new content periodically.

Sunday, 28 August 2011

Parallelized short read assembly of large... [BMC Bioinformatics. 2011] - PubMed - NCBI

Next-generation sequencing technologies have given rise to the explosive increase in DNA sequencing throughput, and have promoted the recent development of de novo short read assemblers. However, existing assemblers require high execution times and a large amount of compute resources to assemble large genomes from quantities of short reads. We present PASHA, a parallelized short read assembler using de Bruijn graphs, which takes advantage of hybrid computing architectures consisting of both shared-memory multi-core CPUs and distributed-memory compute clusters to gain efficiency and scalability. Evaluation using three small-scale real paired-end datasets shows that PASHA is able to produce more contiguous high-quality assemblies in shorter time compared to three leading assemblers: Velvet, ABySS and SOAPdenovo. PASHA's scalability for large genome datasets is demonstrated with human genome assembly. Compared to ABySS, PASHA achieves competitive assembly quality with faster execution speed on the same compute resources, yielding an NG50 contig size of 503 with the longest correct contig size of 18,252, and an NG50 scaffold size of 2,294. Moreover, the human assembly is completed in about 21 hours with only modest compute resources. Developing parallel assemblers for large genomes has been garnering significant research efforts due to the explosive size growth of high-throughput short read datasets. By employing hybrid parallelism consisting of multi-threading on multi-core CPUs and message passing on compute clusters, PASHA is able to assemble the human genome with high quality and in reasonable time using modest compute resources.
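For readers unfamiliar with the NG50 figures quoted above, here is a minimal Python sketch of how the metric is usually defined: sort the contigs from longest to shortest and report the length of the contig at which the running total first reaches half of the estimated genome size. The function name `ng50` and the toy contig lengths are made up for illustration; this is not PASHA's code or data.

```python
def ng50(contig_lengths, genome_size):
    """Return the NG50 of an assembly.

    NG50 is the length L such that contigs of length >= L together
    cover at least half of the (estimated) genome size. Returns None
    when the whole assembly covers less than half of the genome.
    """
    half_genome = genome_size / 2
    running_total = 0
    # Walk the contigs from longest to shortest.
    for length in sorted(contig_lengths, reverse=True):
        running_total += length
        if running_total >= half_genome:
            return length
    return None


# Hypothetical toy assembly (not real data).
toy_contigs = [100, 80, 60, 40, 20]
print(ng50(toy_contigs, genome_size=400))
```

Unlike N50, which uses the total assembly length as its yardstick, NG50 uses the genome size, so assemblies of different total length can be compared on the same footing.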

Friday, 26 August 2011

Data Production | USC Epigenome Center

This is an excellent FAQ page, written candidly, that explains what you should expect from your NGS run and what to do with the data.

Thursday, 25 August 2011

Samsung Genome SDS: Not content with just creating your mobile phone

Samsung has been in the news recently for being in a patent war with Apple. Now they are creating waves of a different kind.

They have just launched a web-based analysis service for Illumina and SOLiD data.

For RNA-seq, SOLiD data is analysed with BioScope and Illumina data with TopHat/Cufflinks.

One edge they have over competitors is the ability to transfer data over the web at 10x the speed of FTP.

They have some preliminary benchmark data here

SOLiD 5500 and 454 data are notably missing from the supported platforms. Still worth a browse, I guess.

Wednesday, 24 August 2011

Ion Torrent public data 250+ bp! but protocol won't be out till Oct :(

Keith blogs about the publicly available 314 chip data with tantalizing 250+ bp read lengths.

Ion Throws A Long Punch At MiSeq

The benchtop sequencer wars are heating up!  Illumina and Life are engaged in a fierce war of pamphlets and datasets to convince the world that they have the edge.  I won't attempt to give a complete play-by-play, but hit on the latest developments, which includes Ion releasing a dataset of 250+ bp reads.
Ion is coming out slugging with a new 314 dataset and application note, showing read lengths of over 250 bases (they assure me that it will work also on the 316, but nothing would beat a dataset!).  This is long awaited; as I've noted before for applications such as amplicon sequencing the current fragment sizes of <150 bases and read lengths mostly in the 70s are challenging to work with.  For my own applications, 250 is actually a very good number. When working with formalin-fixed, paraffin-embedded cancer samples, the recovered DNA is generally already sheared to about 450 bases, so you need to design smaller amplicons for good luck.  A bit of 200 will cover many exons quite well (some common longer ones will require two amplicons; there are of course monsters out there that are special cases requiring many) and still leave room for a barcode.  In my analysis of this dataset, Ion has an aligned read depth of over 200K at position 240 (below).


Genome sequence and global sequence variation map with 5.5 million SNPs in Chinese rhesus macaque.

Genome Biol. 2011 Jul 6;12(7):R63. [Epub ahead of print]




Rhesus macaque (Macaca mulatta) is the most widely used nonhuman primate animal in biomedical research. A global map of genetic variations in rhesus macaque is valuable for both evolutionary and functional studies.


Using next-generation sequencing technology, we sequenced a Chinese rhesus macaque genome with 11.56-fold coverage. In total, 96% of the reference Indian macaque genome was covered by at least one read, and we identified 2.56 million homozygous and 2.94 million heterozygous SNPs. We also detected a total of 125,150 structural variations, of which 123,610 were deletions with a median length of 184 bp (ranging from 25bp to 10kb); 63% of these deletions were located in intergenic regions and 35% in intronic regions. We further annotated 5,187 and 962 nonsynonymous SNPs to the macaque orthologs of human disease and drug-target genes, respectively. Finally, we set up a genome-wide genetic variation database with the use of Gbrowse.


Genome sequencing and construction of a global sequence variation map in Chinese rhesus macaque with the concomitant database provide applicable resources for evolutionary and biomedical research.

[PubMed - as supplied by publisher]

Comparative analysis of algorithms for next-genera... [Bioinformatics. 2011] - PubMed result

The advent of next-generation sequencing (NGS) techniques presents many novel opportunities for many applications in life sciences. The vast number of short reads produced by these techniques, however, poses significant computational challenges. The first step in many types of genomic analysis is the mapping of short reads to a reference genome, and several groups have developed dedicated algorithms and software packages to perform this function. As the developers of these packages optimize their algorithms with respect to various considerations, the relative merits of different software packages remain unclear. However, for scientists who generate and use NGS data for their specific research projects, an important consideration is choosing the software that is most suitable for their application. With a view to comparing existing short read alignment software, we develop a simulation and evaluation suite, Seal, which simulates NGS runs for different configurations of various factors, including sequencing error, indels, and coverage. We also develop criteria to compare the performances of software with disparate output structure (e.g., some packages return a single alignment while some return multiple possible alignments). Using these criteria, we comprehensively evaluate the performances of Bowtie, BWA, mr- and mrsFAST, Novoalign, SHRiMP and SOAPv2, with regard to accuracy and runtime. Conclusion: We expect that the results presented here will be useful to investigators in choosing the alignment software that is most suitable for their specific research aims. Our results also provide insights into the factors that should be considered to use alignment results effectively. Seal can also be used to evaluate the performance of algorithms that use deep sequencing data for various purposes (e.g., identification of genomic variants). Seal is available as open source. PMID: 21856737 [PubMed - as supplied by publisher]

LOCAS - A Low Coverage Assembly Tool for Resequenc... [PLoS One. 2011] - PubMed result

Next Generation Sequencing (NGS) is a frequently applied approach to detect sequence variations between highly related genomes. Recent large-scale re-sequencing studies such as the Human 1000 Genomes Project utilize NGS data of low coverage to afford sequencing of hundreds of individuals. Here, SNPs and micro-indels can be detected by applying an alignment-consensus approach. However, computational methods capable of discovering other variations such as novel insertions or highly diverged sequence from low coverage NGS data are still lacking. We present LOCAS, a new NGS assembler particularly designed for low coverage assembly of eukaryotic genomes using a mismatch sensitive overlap-layout-consensus approach. LOCAS assembles homologous regions in a homology-guided manner while it performs de novo assemblies of insertions and highly polymorphic target regions subsequent to an alignment-consensus approach. LOCAS has been evaluated in homology-guided assembly scenarios with low sequence coverage of Arabidopsis thaliana strains sequenced as part of the Arabidopsis 1001 Genomes Project. While assembling the same amount of long insertions as state-of-the-art NGS assemblers, LOCAS showed best results regarding contig size, error rate and runtime. LOCAS produces excellent results for homology-guided assembly of eukaryotic genomes with short reads and low sequencing depth, and therefore appears to be the assembly tool of choice for the detection of novel sequence variations in this scenario. PMID: 21858125 [PubMed - in process]

Friday, 19 August 2011

Digital gene expression for non-model organisms.

Genome Res. 2011 Aug 15. [Epub ahead of print]

Department of Genetics, Stanford University;


Next-generation sequencing technologies offer new approaches for global measurements of gene expression, but are mostly limited to organisms for which a high-quality assembled reference genome sequence is available. We present a method for gene expression profiling called EDGE, or EcoP15I-tagged Digital Gene Expression, based on ultra high-throughput sequencing of 27 bp cDNA fragments that uniquely tag the corresponding gene, thereby allowing direct quantification of transcript abundance. We show that EDGE is capable of assaying for expression in >99% of genes in the genome and achieves saturation after 6-8 million reads. EDGE exhibits very little technical noise, reveals a large (10^6) dynamic range of gene expression, and is particularly suited for quantification of transcript abundance in non-model organisms where a high quality annotated genome is not available. In a direct comparison with RNA-seq, both methods provide similar assessments of relative transcript abundance, but EDGE does better at detecting gene expression differences for poorly expressed genes, and does not exhibit transcript length bias. Applying EDGE to laboratory mice, we show that a loss-of-function mutation in the Melanocortin 1 receptor (Mc1r), recognized as a Mendelian determinant of yellow hair color in many different mammals, also causes reduced expression of genes involved in the interferon response. To illustrate the application of EDGE to a non-model organism, we examine skin biopsy samples from a cheetah (Acinonyx jubatus), and identify genes likely to control differences in the color of spot vs. non-spotted regions.

Genetic basis behind immigration-delay disease ..

I've got you under my finger … or not
We usually think of fingerprints in terms of their use in forensics: The unique character of each person's fingerprints serves as a useful means of identification whether it be for security purposes or for reliably connecting a suspect with fingerprint evidence left at a crime scene. But, why did we evolve to have fingerprints, and what are the biological processes involved in their creation? Insight into addressing these questions may be provided by studying a group of individuals who do not have fingerprints. This condition, known as adermatoglyphia, can be isolated or accompanied by additional syndromic features. In this issue, Nousbeck and colleagues report that a mutation in SMARCAD1 causes adermatoglyphia in the affected individuals of a large Swiss family.

The fingerprint research was published August 12 in the American Journal of Human Genetics.

Also featured on NatGeo

Thursday, 18 August 2011

Are similarity- or phylogeny-based methods more appropriate for classifying internal transcribed spacer (ITS) metagenomic amplicons?

New Phytol. 2011 Aug 2. doi: 10.1111/j.1469-8137.2011.03838.x. [Epub ahead of print]


Department of Biology, McMaster University, Hamilton, ON L8S 4L8, Canada.


• The internal transcribed spacer (ITS) of the nuclear ribosomal DNA region is a widely used species marker for plants and fungi. Recent metagenomic studies using next-generation sequencing, however, generate only partial ITS sequences. Here we compare the performance of partial and full-length ITS sequences with several classification methods.
• We compiled a full-length ITS data set and created short fragments to simulate the read lengths commonly recovered from current next-generation sequencing platforms. We compared recovery, erroneous recovery, and coverage for the following methods: best BLAST hit classification, MEGAN classification, and automated phylogenetic assignment using the Statistical Assignment Program (SAP).
• We found that summarizing results with more inclusive taxonomic ranks increased recovery and reduced erroneous recovery. The similarity-based methods BLAST and MEGAN performed consistently across most fragment lengths. Using a phylogeny-based method, SAP runs with queries 400 bp or longer worked best. Overall, BLAST had the highest recovery rates and MEGAN had the lowest erroneous recovery rates.
• A high-throughput ITS classification method should be selected, taking into consideration read length, an acceptable tradeoff between maximizing the total number of classifications and minimizing the number of erroneous classifications, and the computational speed of the assignment method.
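As an illustration of the kind of simulation described above, the Python sketch below cuts a full-length sequence into fixed-length windows to mimic short reads. It is only an assumption about how such fragments could be generated, not the authors' actual procedure; the function name `make_fragments`, the window length and the step size are all hypothetical.

```python
def make_fragments(seq, frag_len, step):
    """Cut seq into windows of frag_len bases, advancing by step bases.

    Windows that would run past the end of the sequence are dropped,
    so every returned fragment has exactly frag_len bases.
    """
    last_start = len(seq) - frag_len
    return [seq[i:i + frag_len] for i in range(0, last_start + 1, step)]


# Hypothetical toy sequence standing in for a full-length ITS region.
toy_its = "ACGTACGTAC"
print(make_fragments(toy_its, frag_len=4, step=3))
```

Running each classifier on fragments of several lengths, and comparing against the result for the full-length sequence, is one way to see how read length affects recovery.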

No claim to original US government works. New Phytologist © 2011 New Phytologist Trust.

Deep sequencing of small RNAs from human skin reveals major alterations in the psoriasis miRNAome.

Hum Mol Genet. 2011 Aug 12. [Epub ahead of print]


Department of Genetics.


Psoriasis is a chronic and complex inflammatory skin disease with lesions displaying dramatically altered mRNA expression profiles. However, much less is known about the expression of small RNAs. Here, we describe a comprehensive analysis of the normal and psoriatic skin miRNAome with next-generation sequencing in a large patient cohort. We generated 6.7 × 10^8 small RNA reads representing 717 known and 284 putative novel microRNAs (miRNAs). We also observed widespread expression of isomiRs and miRNA*s derived from known and novel miRNA loci, and a low frequency of miRNA editing in normal and psoriatic skin. The expression and processing of selected novel miRNAs were confirmed with qRT-PCR in skin and other human tissues or cell lines. Eighty known and 18 novel miRNAs were 2- to 42-fold differentially expressed in psoriatic skin. Of particular significance was the 2.7-fold upregulation of a validated novel miRNA derived from the antisense strand of the miR-203 locus, which plays a role in epithelial differentiation. Other differentially expressed miRNAs included hematopoietic-specific miRNAs such as miR-142-3p and miR-223/223*, and angiogenic miRNAs such as miR-21, miR-378, miR-100 and miR-31, which was the most highly upregulated miRNA in psoriatic skin. The functions of these miRNAs are consistent with the inflammatory and hyperproliferative phenotype of psoriatic lesions. In situ hybridization of differentially expressed miRNAs revealed stratified epidermal expression of an uncharacterized keratinocyte-derived miRNA, miR-135b, as well as the epidermal infiltration of the hematopoietic-specific miRNA, miR-142-3p, in psoriatic lesions. This study lays a critical framework for functional characterization of miRNAs in healthy and diseased skin.

Overview of the transcriptome profiles identified in hagfish, shark, and bichir: current issues arising from some nonmodel vertebrate taxa.

J Exp Zool B Mol Dev Evol. 2011 Aug 1. doi: 10.1002/jez.b.21427. [Epub ahead of print]


Laboratory for Evolutionary Morphology, Center for Developmental Biology, RIKEN, Chuo-ku, Kobe, Japan.


Because of their crucial phylogenetic positions, hagfishes, sharks, and bichirs are recognized as key taxa in our understanding of vertebrate evolution. The expression patterns of the regulatory genes involved in developmental patterning have been analyzed in the context of evolutionary developmental studies. However, in a survey of public sequence databases, we found that the large-scale sequence data for these taxa are still limited. To address this deficit, we used conventional Sanger DNA sequencing and a next-generation sequencing technology based on 454 GS FLX sequencing to obtain expressed sequence tags (ESTs) of the Japanese inshore hagfish (Eptatretus burgeri; 161,482 ESTs), cloudy catshark (Scyliorhinus torazame; 165,819 ESTs), and gray bichir (Polypterus senegalus; 34,336 ESTs). We deposited the ESTs in a newly constructed database, designated the "Vertebrate TimeCapsule." The ESTs include sequences from genes that can be effectively used in evolutionary developmental studies; for instance, several encode cartilaginous extracellular matrix proteins, which are central to an understanding of the ways in which evolutionary processes affected the skeletal elements, whereas others encode regulatory genes involved in craniofacial development and early embryogenesis. Here, we discuss how hagfishes, sharks, and bichirs contribute to our understanding of vertebrate evolution, we review the current status of the publicly available sequence data for these three taxa, and we introduce our EST projects and newly developed database. © 2011 Wiley-Liss, Inc.

Marijuana DNA Sequenced by Startup

Kevin McKernan was leading Life Technologies Corp.'s Ion Torrent DNA-sequencing research when a new business opportunity caught his eye: marijuana.

A year later, McKernan, 38, has quit his job, formed a startup run from his house in Marblehead, Massachusetts, and announced today that the company had sequenced the entire genome of the cannabis plant.

The project, which cost about $200,000, may lead to the development of treatments for cancer, pain and inflammatory diseases, he said. McKernan's company, Medicinal Genomics, is making the data public using Amazon.com Inc. (AMZN)'s EC2 cloud-computing system. McKernan called the work a "draft assembly," and it hasn't yet been published in a peer-reviewed academic journal.

Torrent Suite Software Mobile Browser


Working with Reference Genomes in Torrent Suite Software 1.4

Wednesday, 17 August 2011

Application of next-generation sequencing technology to profile the circulating microRNAs in the serum of preeclampsia versus normal pregnant women.

Clin Chim Acta. 2011 Aug 5. [Epub ahead of print]



Circulating miRNAs, a new family of miRNAs found in plasma and serum, have shown great potential as novel body-fluid biomarkers for non-invasive diagnosis and prognosis in many settings, such as cancer and prenatal screening.

In the present study, we analyzed the expression profiles of circulating miRNAs in the serum of four pregnant women with preeclampsia (PE) and one normal pregnant control using next-generation sequencing technology.

By annotating the raw sequence reads against miRNA, genome and other small RNA databases, we found miRNA to be the major component of the small-RNA-annotated reads. In the serum circulating miRNA profiles, up to 573 distinct miRNAs were annotated to miRBase. The biological features of circulating miRNAs in serum were consistent with those of the tissue- and cell-based miRNAs in the database. Notably, 22 miRNAs were found to be dysregulated in PE. Compared with the normal control, 15 miRNAs were up-regulated and 7 were down-regulated in each of the four PE samples. Among these 22 miRNAs, 3 have been reported to be dysregulated in the placentas of PE pregnancies.

The results showed that circulating miRNAs in the serum of pregnant women could be detected more comprehensively by next-generation sequencing technology. They also suggest that the PE-related miRNAs obtained in this study might be used as biomarkers for the diagnosis and prognosis of PE.

Copyright © 2011. Published by Elsevier B.V.

ChimeraScan: A tool for identifying chimeric transcription in sequencing data.

Bioinformatics. 2011 Aug 11. [Epub ahead of print]


Michigan Center for Translational Pathology, University of Michigan Medical School, Ann Arbor, MI 48109, USA.



Next Generation Sequencing (NGS) technologies have enabled de novo gene fusion discovery that could reveal candidates with therapeutic significance in cancer. Here we present an open-source software package, ChimeraScan, for the discovery of chimeric transcription between two independent transcripts in high-throughput transcriptome sequencing data.



Christopher A. Maher,


Available at Bioinformatics online.

Tuesday, 16 August 2011

microRNA Profiling: Platform Comparison

Found this gem of a presentation online ...

microRNA Profiling: Platform Comparison
ABRF Microarray Research Group
Don Baldwin – Penn Microarray Facility, University of Pennsylvania

Examine multiple microarray and next-gen sequencer platforms for performance in miRNA
•  Provide information on sensitivity, reproducibility, and concordance among platforms
•  Make data available for reference in selecting & running miRNA profiling assays

Study begun as 2009 MARG project in collaboration with
•  Preliminary report on microarray component of study and single miRNA seq result presented at ABRF 2009
•  Over last year, additional sequencing data has been generated on Illumina and ABI SOLiD platforms in MARG member labs


Friday, 12 August 2011

Time for humour

How do people in science see each other ...

A friend commented that bioinformaticians are in the technicians' shoes in this region.

Are 12 million 90 bp transcriptome reads enough for transcriptome assembly?

I posted a PubMed link recently. The authors "report the use of next-generation massively parallel sequencing technologies and de novo transcriptome assembly to gain a comprehensive overview of the H. brasiliensis transcriptome. The sequencing output generated more than 12 million reads with an average length of 90 nt. In total 48,768 unigenes (mean size = 436 bp, median size = 328 bp) were assembled through de novo transcriptome assembly."

Do you think such an assembly is truly useful for research, or would higher coverage have been better?
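One rough way to frame the question is a back-of-envelope depth estimate from the numbers quoted above. The Python sketch below assumes, optimistically, that every read ends up in the assembly and that coverage is spread evenly across unigenes. Real transcriptomes are dominated by a few highly expressed genes, so rare transcripts will sit far below this average. The helper name `mean_depth` is hypothetical.

```python
def mean_depth(n_reads, read_len, n_contigs, mean_contig_len):
    """Average sequencing depth over an assembly, assuming all reads
    are used and are spread evenly (both are simplifications)."""
    total_read_bases = n_reads * read_len
    assembled_bases = n_contigs * mean_contig_len
    return total_read_bases / assembled_bases


# Figures quoted from the abstract; "more than 12 million" is treated as 12 million.
reads = 12_000_000
read_length = 90
unigenes = 48_768
mean_unigene_length = 436

print(reads * read_length)             # total sequenced bases
print(unigenes * mean_unigene_length)  # total assembled bases
print(round(mean_depth(reads, read_length, unigenes, mean_unigene_length), 1))
```

So the average depth over the assembled unigenes is on the order of 50x, which sounds comfortable until you remember how uneven expression is. The short mean unigene size (436 bp) hints that many transcripts are fragmented, which is what you would expect where depth runs thin.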

Wednesday, 10 August 2011

Pauline Ng expects a genome analysis to cost $500.

Pauline Ng is planning open source, open access analytics for the genomes to come.
By Allison Proffitt
August 2, 2011 | SINGAPORE—Pauline Ng’s office is the Genome building of the Biopolis science park in Singapore, a fitting home for one of the authors of the first published personal genome, that of J. Craig Venter, published in 2007 while Ng was a senior scientist at the J. Craig Venter Institute.
Now Ng leads an expanding group of three bioinformaticists (she’s hiring!) at the Genome Institute of Singapore (GIS). Before her stint at the Venter Institute, Ng worked for Illumina as well as the Fred Hutchinson Cancer Center in Seattle, where she wrote the powerful SIFT algorithm, a widely used tool to predict the effect of a given amino acid substitution on protein function.
But sequencing and analysis—today at least—cost the same. “The problem is that right now, companies like Knome are actually charging the same amount for bioinformatics as they are for sequencing. If you sequence more individuals, I’d expect the bioinformatics to go down, but it’s the same price. That means the price is double! If we can make these tools online, accessible for free or at least at cost, I think I can get it to a tenth of the cost.”
Ng plans to do the computation on the Amazon Cloud and, at today’s rates, expects a genome analysis to cost $500. She hopes that these price points will enable doctors and individuals to use genomics. “If we could say, OK, outsource [the sequencing] to these companies. You’re going to get a hard disk. Mail it to Amazon and get your results in a week.”
Ng is not promising a magic cure, and doesn’t even think that this model should be the only one. She just hopes to drive prices down and open the market. “There’s never a guarantee of an answer,” she says. “Even with the software we write, there may not be a guarantee of an answer, but at least…” she pauses and begins again, emphatically. “We can definitely give you the basic annotation and provide the tools that everyone uses. And if it doesn’t work, then you go to an expensive company that really uses the same tools as the academics but with a couple of more bells and whistles. If you try our stuff first, at least you’ve invested only $500 instead of $5,000.” 

"fresh mutations" in DNA are involved in at least half of schizophrenia cases, when there is no family history of the illness

A report in the journal Nature Genetics showed that "fresh mutations" in DNA are involved in at least half of schizophrenia cases, when there is no family history of the illness.

Researchers found mutations in 40 different genes.

Lead researcher Dr Maria Karayiorgou said: "The fact that the mutations are all from different genes is particularly fascinating.

"It suggests that many more mutations than we suspected may contribute to schizophrenia. This is probably because of the complexity of the neural circuits that are affected by the disease; many genes are needed for their development and function."

The report argues that this "provides a plausible explanation for both the high global incidence and the persistence of schizophrenia despite extremely variable environmental factors."

Brain may still be living a minute after decapitation

Web edition : Friday, July 22nd, 2011
Electrodes capture a large wave of activity moving through rat brains about 50 seconds after decapitation. (C. van Rijn et al./PLoS One 2011)

Almost a minute after a rat's head is severed from its body, an eerie shudder of activity ripples through the animal's brain. Some researchers think this post-decapitation wave marks the border between life and death. But the phenomenon can be explained by electrical changes that, in some cases, are reversible, researchers report online July 13 in PLoS ONE.

Clinical significance is not the same as statistical significance

A great example of why details and context always, always matter, from the surgeon/blogger at The Skeptical Scalpel:

Twelve patients who served as their own controls wore compression stockings for a week and then no stockings for a week alternating. The stockings lowered the amount of fluid in the neck by 60%, a statistically significant difference. So far, so good.

This resulted in another highly statistically significant finding, which was a 36% reduction in episodes of apnea [cessation of breathing] and hypopnea [inadequate breathing]. Sounds good, right? The problem is that the average number of episodes of apnea/hypopnea decreased from 48 per hour to 31 per hour. Patients experiencing more than 30 episodes of apnea/hypopnea per hour are classified as having severe obstructive sleep apnea. This means that the treatment only put the patients in the low range of severe obstructive sleep apnea. They still would require maximum therapy.

Via Ivan Oransky
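To make the arithmetic in the quoted example concrete, here is a small Python sketch using only the rounded averages given above (48 and 31 episodes per hour) and the stated cutoff of more than 30 episodes per hour for severe obstructive sleep apnea. The function names are hypothetical. Note that the rounded averages give a reduction of about 35%, slightly under the 36% quoted, presumably because the study worked from unrounded values.

```python
SEVERE_CUTOFF = 30  # episodes per hour, as stated in the quoted post


def relative_reduction(before, after):
    """Fractional drop from before to after (0.25 means a 25% drop)."""
    return (before - after) / before


def is_severe(episodes_per_hour, cutoff=SEVERE_CUTOFF):
    """True when the rate is above the severe-apnea cutoff."""
    return episodes_per_hour > cutoff


before, after = 48, 31  # rounded averages from the quoted post
print(f"reduction: {relative_reduction(before, after):.1%}")
print("severe before:", is_severe(before))
print("severe after:", is_severe(after))
```

The effect is statistically real, yet the patients land on the same side of the clinical cutoff both before and after treatment, which is exactly the point of the post.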

Tuesday, 9 August 2011

Affymetrix Commercializes Next-Generation Transcriptome Array for Large-Scale Clinical Studies - MarketWatch

In typical large-scale studies of 5,000 samples, Stanford researchers estimated it would take RNA-Seq 10 times longer to analyze one percent of the number of genes processed by the new array and 20 times longer to analyze one-half percent of exons. Moreover, to achieve the same level of reproducibility as the new array, RNA-Seq would require 150 million mappable reads for genes and 200 million for exons (3711).(1) Based on this level of power and performance, the researchers concluded the Human Transcriptome Array is more reproducible, faster, and cost-effective than RNA-Seq for detecting and characterizing low-level expression changes of clinically relevant transcripts.

Monday, 8 August 2011

Braintrust: What Neuroscience Tells Us about Morality.

What can science tell us about morality?
While the slippery and subjective nature of morality makes it a troubling specimen, it remains a crucial part of our lives—and therefore a topic ripe for scientific research.
However, scientists are skilled at describing what is—the circumstances under which people are more likely to lie, for instance—which is not the same as describing how we ought to live our lives, like when it’s OK to lie. So it’s not entirely clear what scientists can offer here without overstepping their bounds.
Yet in Braintrust: What Neuroscience Tells Us about Morality, Patricia Churchland carefully leads the reader through scientific findings with implications for morality and ethics, well aware of the pitfalls and rewards she may encounter along the way. Churchland, a professor of philosophy at the University of California, San Diego, quickly informs the reader that science cannot tell us what we ought to do to be moral, but that a review of findings from psychology and biology may explain how or why we do it. Her goal is to draw on these findings to build an objective framework in which to understand human morality. 

I think the challenge is to actually link human genomics with neurochemistry, although I am not sure anyone is prepared to face the ramifications of such studies.
Full review article here

Identification and Differential Expression of MicroRNAs during Metamorphosis of the Japanese Flounder (Paralichthys olivaceus).

PLoS One. 2011;6(7):e22957. Epub 2011 Jul 27.

Fu Y, Shi Z, Wu M, Zhang J, Jia L, Chen X.


Key Laboratory of Exploration and Utilization of Aquatic Genetic Resources, Shanghai Ocean University, Ministry of Education, Shanghai, People's Republic of China.



MicroRNAs (miRNAs) are a class of endogenous small non-coding RNAs of 20-25 nucleotides that play a key role in diverse biological processes. Japanese flounder undergo dramatic metamorphosis in their early development. The metamorphosis is characterized by morphological transformation from a bilaterally symmetrical to an asymmetrical body shape concomitant with extensive morphological and physiological remodeling of organs. So far, only a few miRNAs have been identified in fish and there are very few reports about the Japanese flounder miRNA.


Solexa sequencing technology was used to perform high throughput sequencing of the small RNA library from the metamorphic period of Japanese flounder. Subsequently, aligning these sequencing data with metazoan known miRNAs, we characterized 140 conserved miRNAs and 57 miRNA: miRNA* pairs from the small RNA library. Among these 57 miRNA: miRNA* pairs, twenty flounder miRNA precursors were amplified from genomic DNA. We also demonstrated evolutionary conservation of Japanese flounder miRNAs and miRNA* in the animal evolution process. Using miRNA microarrays, we identified 66 differentially expressed miRNAs at two metamorphic stages (17 and 29 days post hatching) of Japanese flounder. The results show that miRNAs might play a key role in regulating gene expression during Japanese flounder metamorphosis.


We identified a large number of miRNAs during flounder metamorphosis, some of which are differentially expressed at two different metamorphic stages. The study provides an opportunity for further understanding of miRNA function in the regulation of flounder metamorphosis and gives us clues for further studies of the mechanisms of metamorphosis in Japanese flounder.

[PubMed - in process]

Thursday, 4 August 2011

Pubmed: RNA-Seq analysis and de novo transcriptome assembly of Hevea brasiliensis.

1. RNA-Seq analysis and de novo transcriptome assembly of Hevea brasiliensis.
Xia Z, Xu H, Zhai J, Li D, Luo H, He C, Huang X.
Plant Mol Biol. 2011 Aug 3. [Epub ahead of print]
PMID: 21811850 [PubMed - as supplied by publisher]

Hainan Key Laboratory for Sustainable Utilization of Tropical Bioresources/Institute of BioScience and Technology, College of Agriculture, Hainan University, Haikou, 570228, People's Republic of China.


Hevea brasiliensis, being the only source of commercial natural rubber, is an extremely economically important crop. In an effort to facilitate biological, biochemical and molecular research in rubber biosynthesis, here we report the use of next-generation massively parallel sequencing technologies and de novo transcriptome assembly to gain a comprehensive overview of the H. brasiliensis transcriptome. The sequencing output generated more than 12 million reads with an average length of 90 nt. In total 48,768 unigenes (mean size = 436 bp, median size = 328 bp) were assembled through de novo transcriptome assembly. Out of 13,807 H. brasiliensis cDNA sequences deposited in GenBank of the National Center for Biotechnology Information (NCBI) (as of Feb 2011), 11,746 sequences (84.5%) could be matched with the assembled unigenes through nucleotide BLAST. The assembled sequences were annotated with gene descriptions, Gene Ontology (GO) and Clusters of Orthologous Group (COG) terms. In all, 37,432 unigenes were successfully annotated, of which 24,545 (65.5%) aligned to Ricinus communis proteins. Furthermore, the annotated unigenes were functionally classified according to the GO, COG and Kyoto Encyclopedia of Genes and Genomes databases. Our data provides the most comprehensive sequence resource available for the study of rubber trees as well as demonstrates effective use of Illumina sequencing and de novo transcriptome assembly in a species lacking genomic information.

Wednesday, 3 August 2011

SOLiD™ Sequencing - single cancer cell Life Tech Grand Challenge

SOLiD™ Sequencing - single cancer cell

Next Generation Sequencing (NGS) technology has allowed scientific researchers to exponentially reduce both the cost of sequencing and the amount of sample / nucleic acids being interrogated. This technological revolution has allowed researchers to propose and execute experiments at the single cell level that could not be attempted before. Along this journey, researchers have gained insights into cell regulation and repair mechanisms, along with cellular responses when processes are disrupted.

Although scientists have successfully sequenced the entire transcriptome of a single murine cell using the SOLiD™ System, as documented in the May 7, 2010 issue of Cell Stem Cell, they have yet to sequence the entire genome of one cell. The SOLiD Single Human Cancer Cell Grand Challenge is asking researchers to do just that – sequence the genome and all RNA content derived from a single cancer cell using the 5500 Series SOLiD™ Sequencers.

Successful achievement of this latest Grand Challenge will, therefore, double what is currently possible by sequencing both the entire genome and all RNA, including mRNA, microRNAs and other types of RNA molecules expressed in a single cancer cell, using the SOLiD System.  Results must be validated using alternative techniques, such as capillary electrophoresis sequencing and quantitative PCR.

The output of the SOLiD Single Human Cancer Cell Grand Challenge should be a detailed protocol that accomplishes the following:

  • Isolates a single cell from a human solid or liquid tumor (i.e. not cell culture)
  • Extracts genomic DNA and total RNA from that single cancer cell
  • Sequences the genomic DNA in a single run using either the 5500 Series SOLiD™ System or the SOLiD™ 4 System
  • Analyzes the genomic DNA data using LifeScope™ Genetic Analysis Software in conjunction with other analysis & visualization software
  • Sequences the total RNA in a single run using either the 5500 Series SOLiD™ System or the SOLiD™ 4 System
  • Analyzes the total RNA data using LifeScope™ Genetic Analysis Software in conjunction with other analysis & visualization software

Cancer accounts for nearly one out of every four deaths in the U.S., as reported by the American Cancer Society.  Variation in DNA and RNA sequence between tumor cells can dramatically affect how individual cells respond to therapies. Life Technologies is proud to be at the forefront of the genomics revolution, and is pleased to support scientific progress in cancer research via the SOLiD Single Human Cancer Cell Grand Challenge.

True single-molecule DNA sequencing of a Pleistocene horse bone.

Genome Res. 2011 Jul 29;

Authors: Orlando L, Ginolhac A, Raghavan M, Vilstrup J, Rasmussen M, Magnussen K, Steinmann K, Kapranov P, Thompson JF, Zazula G, Froese D, Moltke I, Shapiro B, Hofreiter M, Al-Rasheid KA, Gilbert MT, Willerslev E

Second-generation sequencing platforms have revolutionized the field of ancient DNA, opening access to complete genomes of past individuals and extinct species. However, these platforms are dependent on library construction and amplification steps that may result in sequences that do not reflect the original DNA template composition. This is particularly true for ancient DNA, where templates have undergone extensive damage post-mortem. Here, we report the results of the first "true single molecule sequencing" of ancient DNA. We generated 115.9Mb and 76.9Mb of DNA sequences from a permafrost-preserved Pleistocene horse bone using the Helicos HeliScope and Illumina GAIIx platforms, respectively. We find that the percentage of endogenous DNA sequences derived from the horse is higher among the Helicos data than Illumina data. This result indicates that the molecular biology tools used to generate sequencing libraries of ancient DNA molecules as required for second-generation sequencing introduce biases into the data that reduce the efficiency of the sequencing process and limit our ability to fully explore the molecular complexity of ancient DNA extracts. We demonstrate that simple modifications to the standard Helicos DNA template preparation protocol further increase the proportion of horse DNA for this sample by 3-fold. Comparison of Helicos-specific biases and sequence errors in modern DNA with those in ancient DNA also reveals extensive cytosine deamination damage at the 3' ends of ancient templates, indicating the presence of 3'-sequence overhangs. Our results suggest that paleogenomes could be sequenced in an unprecedented manner by combining current second- and third-generation sequencing approaches.

PMID: 21803858 [PubMed - as supplied by publisher]


Genetic key found to the Elephant Man's condition, a century later.

Genetic key found to the Elephant Man's condition

A century after the death of the Elephant Man, whose hideous disfigurement turned him into a medical curiosity, scientists believe they may have solved the puzzle of what caused his condition.

American researchers have identified a genetic mutation that causes Proteus syndrome, a rare disorder in which tissue and bone grow massively out of proportion. About 500 people are known to be living with the condition, which stems from a spontaneous mutation in the embryo during pregnancy. Sufferers endure gross deformity, typically to their head, hands and feet, and become excluded from society and lead isolated lives as a result.

Scientists from the National Institutes of Health in Washington who made the discovery say they hope it will lead to the development of treatments for the condition. It could also yield agents effective against cancer, which is caused by the overgrowth of cells in different regions of the body.

Now they plan to test DNA from the skeleton of Joseph Merrick, who briefly gained celebrity and earned his living by being displayed across England and Europe, to establish whether Proteus syndrome was the cause of his deformity. His story gained wide public attention in 1980 through the play and film The Elephant Man. Mr Merrick's abnormalities appeared early in childhood in the form of thick and lumpy skin, enlarged lips and a bony protuberance on his forehead. One of his arms and both feet became enlarged and by adulthood he had to sleep sitting up because of the size and weight of his head.

Tuesday, 2 August 2011

[BioRuby] rq: Zero configuration job scheduler for computer clusters and multi-core

Spotted this 'gem' on the BioRuby mailing list, so I'm sharing it here.

---------- Forwarded message ----------
From: Pjotr Prins <>
Date: Sun, Jul 24, 2011 at 7:09 PM
Subject: [BioRuby] rq: Zero configuration job scheduler for computer clusters and multi-core

Use those cores!

I just created a functional gem for rq, a job scheduler created by
Ara Howard. It allows running jobs in parallel, without any
configuration (just a shared directory between processes/machines).


 install rq using rubygems, after installing sqlite 2.x (dev version,
 on Debian apt-get install libsqlite0-dev)

   gem1.8 install rq-ruby1.8

 the binary is in /var/lib/gems/1.8/bin/, so add that to the path,
 or create a symbolic link

   ln -sf `gem1.8 contents rq-ruby1.8|grep bin/rq$` /usr/local/bin/rq

 now rq should work

   rq --help

 run the integration test


 set up a directory for your queue - this can be a local, or an NFS/sshfs
 mounted drive:

   rq dir create

 on every node create a queue runner, specifying the number of cores (here 8)

   rq dir feed --daemon --log=rq.log --max_feed=8

 submit two jobs - shell style

   rq dir submit 'sleep 10'
   rq dir submit 'sleep 9'

 check status

   rq dir status


         pending: 0
         holding: 0
         running: 2
         finished: 0
         dead: 0
         total: 2
           min: {2: 00h00m03.49s}
           max: {1: 00h00m03.60s}
         avg_time_per_job: 00h00m00.00s
           1: 0
           12: 0
           24: 0
         successes: 0
         failures: 0
         ok: 0

 Now, that was easy!!

rq will be a standard feature of the BioLinux VMs.

I also ported the code to Ruby1.9 - it is sitting in the ruby1.9 branch on github:

it would be good if others were to test that in some production setup,
before I push that to rubygems.
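
The submit step in the walkthrough above can be scripted when there are many jobs to queue. The following is a minimal Python sketch, not part of rq itself: it only assembles the same `rq dir submit '<command>'` invocations shown in the message, and the helper names (`build_submit_commands`, `submit_all`) and the ten `sleep` jobs are hypothetical. It assumes a queue directory named `dir` already exists and defaults to a dry run that just prints the commands.

```python
import shlex
import shutil
import subprocess

QUEUE_DIR = "dir"  # queue directory name used in the walkthrough above


def build_submit_commands(jobs, queue_dir=QUEUE_DIR):
    """Return one `rq <queue> submit <job>` argument list per shell job."""
    return [["rq", queue_dir, "submit", job] for job in jobs]


def submit_all(jobs, queue_dir=QUEUE_DIR, dry_run=True):
    """Print each submit command; run it only if dry_run is False and rq is on PATH."""
    commands = build_submit_commands(jobs, queue_dir)
    for argv in commands:
        print(shlex.join(argv))  # shell-quoted form, as typed in the walkthrough
        if not dry_run and shutil.which("rq"):
            subprocess.run(argv, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical workload: ten short sleeps standing in for real analysis jobs.
    submit_all([f"sleep {n}" for n in range(1, 11)])
```

Passing `dry_run=False` would hand each command to rq; the queue runners started with `rq dir feed` then pick the jobs up as cores become free.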


Nick Loman blogs PGM 316 1st Impressions

 While eagerly waiting for our own PGM to be installed for 316 chip support, I came across Nick's review of their own 316 run, which has pretty impressive output!

"Our first two runs of 316 chips yielded an impressive 251Mb and 209Mb respectively! Mean read length was about 110bp."

Other interesting factoids for the impatient:

we've loaded the chips way higher than we are used to with the 314 – densities of 76-82%

This reflects a change to the protocol – when we were running 314 chips we were told to load fewer beads to get better coverage – and from our trials when we loaded at 41, 43 and 46% density on the 314 chip the 41% run did do best. The 314 chip has about 1.2m wells, so we were filling about 550k wells. About two-thirds of those wells were live spheres (meaning they have DNA on them) and out of those about two-thirds pass the quality filter – about 200k reads in all (~20Mb data).

The 316 chip has 6.3m wells and we're filling about 5m of these. A little under half are passing the quality filters, meaning we're getting about 2.25m reads.
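
As a rough check on the arithmetic in the quoted figures above, here is a small Python sketch. The function names are hypothetical, and the 0.45 pass rate is my own reading of "a little under half", chosen because it reproduces the quoted 2.25m reads; the other inputs come straight from the quote.

```python
def reads_from_wells(filled_wells, *fractions):
    """Multiply the filled-well count by each successive surviving fraction."""
    reads = float(filled_wells)
    for fraction in fractions:
        reads *= fraction
    return reads


def yield_mb(reads, mean_read_length_bp):
    """Convert a read count and mean read length (bp) to megabases."""
    return reads * mean_read_length_bp / 1e6


# 314 chip: ~550k filled wells, two-thirds live, two-thirds of those pass the filter.
reads_314 = reads_from_wells(550_000, 2 / 3, 2 / 3)

# 316 chip: ~5m filled wells; 0.45 is an assumed value for "a little under half".
reads_316 = reads_from_wells(5_000_000, 0.45)
mb_316 = yield_mb(reads_316, 110)  # mean read length of about 110bp from the post

print(f"314 chip: about {reads_314:,.0f} reads")
print(f"316 chip: about {reads_316:,.0f} reads, about {mb_316:.1f} Mb")
```

The 314 estimate comes out near 244k reads, a little above the "about 200k" in the quote, so the two-thirds figures are evidently rounded. The 316 estimate of roughly 247.5 Mb sits close to the reported 251Mb run.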


Do hop on over to the original post to see FASTQC plots of the reads.

Ion Torrent 316 First Impressions

Datanami, Woe be me