Tuesday, 1 March 2011

NCBI To Discontinue Sequence Read Archive and Peptidome

About SRA

Due to budget constraints, NCBI will be discontinuing its Sequence Read Archive (SRA) and Trace Archive repositories for high-throughput sequence data. Closure of the databases will occur in phases. SRA and Trace will stop accepting some types of submissions in the coming weeks, and all submissions within the next 12 months. Over the next several months, NCBI will be working with staff from NIH Institutes that fund large-scale sequencing efforts to develop an approach for future access to and storage of the existing data. NCBI will continue to support and develop information resources for biological data derived from next-generation sequencing such as genotypes, common variations, rare variations, sequence assemblies and gene expression data. We therefore encourage the research community to continue submissions of these data to the applicable databases, including:

RNA-Seq and epigenomic data to GEO
Variants, genotypes, phased haplotypes, and polymorphisms to dbVar, dbGaP and dbSNP
Genomic assemblies to GenBank/WGS
Transcript assemblies to GenBank/TSA
16S ribosomal RNA and other targeted locus survey assemblies to GenBank

NCBI expects new applications will continue to emerge for next generation technology. We are excited to work with the community to develop strategies for archiving other summary experimental measures that are informative, efficient, and valuable to the biomedical research community.
For further information about submissions, contact NCBI's Help Desk.

About Peptidome

Due to budgetary constraints NCBI will be discontinuing the Peptidome Repository. Over the next few weeks, we will phase out the online browser, query, and display interfaces.
All existing data and metadata files will continue to be made available from our ftp server (ftp://ftp.ncbi.nih.gov/pub/peptidome/) indefinitely. Those files are named according to their Peptidome accession numbers, allowing cited data to still be identified and downloaded. Furthermore, we will endeavor to deposit all Peptidome data in a different public mass spectrometry repository; information about this effort will follow soon.
For those datasets that have been accessioned, but have not yet been made public, submitters have the option of withdrawing the data now and moving it to another repository. If we retain the data, it will move to the Peptidome FTP site on the date at which it is currently designated to go public.
If you have any specific questions, please feel free to contact us at peptidome@ncbi.nlm.nih.gov.

Thursday, 24 February 2011

Maybe we have to sequence everybody! Every fish! BGI Cloud

Bio-IT World ran an interesting article with this quote:

“The data are growing so fast, the biologists have no idea how to handle this data,” says Li. “I think the Cloud will be the solution. We have to sequence more and more data. Maybe we have to sequence everybody! Every fish! The data keep growing and we need a lot of compute power to process.”
For Chen, there are three priorities for BGI Cloud:

Connectivity: With partners across China and the world, “we’ve connected all the people and resources—the sequencers, the samples, the ideas, the compute power, and the storage together to make a greater contribution.”
Scalability: Calling the explosion in next-gen sequencing (NGS) a “data tsunami,” Chen says BGI aims to provide the parallel computing resources to help users manage and process these datasets. “If you can’t do the analysis, it’s pointless. We use distributed computing technology in the bioinformatics area. We’re confident we can solve the scalability problem.”
Reproducibility: Chen says bioinformatics researchers are happy to show their data and their pet program—SOAP, BWA, and so on. “That’s fine. But analysis is very complicated. The methodology he is actually using is a homemade pipeline. It’s very difficult to reproduce that result. We built this platform not only to solve the capability and connectivity of computing, we want to resolve the problems in reproducing designs and procedures.”

With new NGS gene assembly and SNP calling programs such as Hecate and Gaea about to be released (see, “In the Name of Gods”), Li says it was essential to develop a “run-time environment, a Web-based platform for Cloud storage and reference data, with a feature-rich GUI, and effective bioinformatics analysis software.”

Kevin: It will be interesting to see how Amazon and other cloud providers, together with Galaxy (usegalaxy.org), respond to BGI's offering for reproducible data analysis (commercial software providers aside). The offering also comes at a strange time, just as NCBI is discontinuing SRA. Might BGI Cloud fill the void that SRA leaves?

Everyone is trying to come up with a 'standard' workflow that everyone will adopt but I feel that the ecology of bioinformatics is that there's always another 'better' way to tweak that analysis. Custom analysis is a pet phrase of a lot of bench biologists.

Every bioinformatician knows and remembers their treasure trove of throwaway scripts that worked beautifully, but only once, for that one set of data.

Friday, 18 February 2011

It's official: NCBI to discontinue SRA

What started as floating rumours at AGBT and in posts by various bloggers is now in NCBI news: http://www.ncbi.nlm.nih.gov/About/news/16feb2011

Saturday, 15 January 2011

DNAvision offers Human WGS for 7,500 euros

I had imagined exome sequencing would still have a good run for the next 2-3 years, but seeing how commercial service providers are throwing caution to the wind and offering whole genome sequencing at ever-decreasing costs, I think many will soon switch to WGS instead. Exome sequencing kits have a lot of catching up to do in price and useful data if they are to keep pace with quickly plummeting WGS prices.
DNAVision to Offer $10K Human Genome Sequencing Services; Purchases Four SOLiDs
Landed: First Illumina HiSeq Machines Advertised (By Nick Loman on February 10, 2010)

Thursday, 13 January 2011

The $1,000 Genome Debate is 'Already ... Irrelevant'

excerpted from GenomeWeb
Matthew Herper and Daniel MacArthur are at odds over the $1,000 genome. Forbes' Herper argues that even though sequencing is becoming cheaper, analyzing a genome still costs much more than $1,000. Over at Genetic Future, MacArthur responds that as sequencing costs continue to fall, "a substantial niche will develop for innovators providing affordable, intuitive, accurate interpretation tools."

My thoughts on this later..

Monday, 20 December 2010

As R&D Budgets Shrink and Data Grows, Bioinformatics Service Providers Could Gain in Popularity

After this article in Nature, "Singapore's salad days are over", there's another article in GenomeWeb that talks about shrinking R&D budgets and bioinformatics outsourcing. Are labs worldwide facing budget cuts? Or is it just year-end panic? Hmmmm

Friday, 3 December 2010

When playing games is working (if you are a biologist, that is)

Check out this Flash game, Phylo.
If you are thinking it's related to phylogenetics, then bingo. Kudos for an excellent idea, excellent graphics and interface, but I wish they had a better name and a less verbose introduction for laymen.
Waiting eagerly for the iPhone/iPod version to be out.

from http://phylo.cs.mcgill.ca/eng/about.html

What's Phylo all about?
Though it may appear to be just a game, Phylo is actually a framework for harnessing the computing power of mankind to solve a common problem; Multiple Sequence Alignments.

What is a Multiple Sequence Alignment? A sequence alignment is a way of arranging the sequences of D.N.A, R.N.A or protein to identify regions of similarity. These similarities may be consequences of functional, structural, or evolutionary relationships between the sequences.
From such an alignment, biologists may infer shared evolutionary origins, identify functionally important sites, and illustrate mutation events. More importantly, biologists can trace the source of certain genetic diseases.

The Problem Traditionally, multiple sequence alignment algorithms use computationally complex heuristics to align the sequences.
Unfortunately, the use of heuristics do not guarantee global optimization as it would be prohibitively computationally expensive to achieve an optimal alignment. This is due in part to the sheer size of the genome, which consists of roughly three billion base pairs, and the increasing computational complexity resulting from each additional sequence in an alignment.

Our Approach Humans have evolved to recognize patterns and solve visual problems efficiently.
By abstracting multiple sequence alignment to manipulating patterns consisting of coloured shapes, we have adapted the problem to benefit from human capabilities.
By taking data which has already been aligned by a heuristic algorithm, we allow the user to optimize where the algorithm may have failed.

The Data All alignments were generously made available through UCSC Genome Browser.
In fact, all alignments contain sections of human DNA which have been speculated to be linked to various genetic disorders, such as breast cancer.
Every alignment is received, analyzed, and stored in a database, where it will eventually be re-introduced back into the global alignment as an optimization.
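
To make the idea of "optimizing" an alignment concrete, here is a minimal sketch of sum-of-pairs scoring, one common way to score a multiple sequence alignment. The match, mismatch and gap values are illustrative assumptions and are not taken from Phylo; the function names are hypothetical.

```python
from itertools import combinations

# Illustrative scores only; these are assumptions, not Phylo's actual scheme.
MATCH, MISMATCH, GAP = 1, -1, -2


def pair_score(a: str, b: str) -> int:
    """Score one pair of aligned characters ('-' marks a gap)."""
    if a == "-" and b == "-":
        return 0  # two gaps facing each other carry no information
    if a == "-" or b == "-":
        return GAP
    return MATCH if a == b else MISMATCH


def sum_of_pairs(alignment: list[str]) -> int:
    """Sum pair_score over every pair of rows in every column."""
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all rows must have the same length")
    total = 0
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            total += pair_score(a, b)
    return total
```

Under this scheme, sliding a gap so that more identical bases line up raises the score, which is roughly the kind of move a player makes when shifting coloured blocks. The number of row pairs per column grows quadratically with the number of sequences, which hints at why exact optimization gets expensive.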

Tuesday, 30 November 2010

Card Trick Leads to New Bound on Data Compression - Technology Review

http://www.technologyreview.com/blog/arxiv/26078/?ref=rss
Excerpted...
Here's a card trick to impress your friends. Give a deck of cards to a pal and ask him or her to cut the deck, draw six cards and list their colours. You then immediately name the cards that have been drawn.
Magic? Not quite. Instead, it's the next best thing: mathematics. The key is to arrange the deck in advance so that the sequence of the card colours follows a specific pattern called a binary De Bruijn cycle. A De Bruijn sequence is a set from an alphabet in which every possible subsequence appears exactly once.
So when a deck of cards meets this criteria, it uniquely defines any sequences of six consecutive cards. All you have to do to perform the trick is memorise the sequences.
Usually these kinds of tricks come about as the result of some new development in mathematical thinking. Today, Travis Gagie from the University of Chile in Santiago turns the tables. He says that this trick has led him to a new mathematical bound on data compression....
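
For the curious, the property behind the trick can be sketched in a few lines. The generator below is the standard recursive Lyndon-word construction of a De Bruijn sequence, not anything taken from Gagie's paper, and the names are my own. Note that a full binary cycle of order 6 has 2^6 = 64 positions, more than a 52-card deck, so this only illustrates the principle that every window of six colours pins down a position.

```python
def de_bruijn(k: int, n: int) -> list[int]:
    """Return a De Bruijn sequence over the alphabet {0..k-1} of order n."""
    a = [0] * (n + 1)
    sequence: list[int] = []

    def db(t: int, p: int) -> None:
        if t > n:
            if n % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return sequence


def window_lookup(sequence: list[int], n: int) -> dict[tuple[int, ...], int]:
    """Map every cyclic window of length n to the position where it starts."""
    size = len(sequence)
    return {
        tuple(sequence[(i + j) % size] for j in range(n)): i
        for i in range(size)
    }


# 0 = red, 1 = black (an arbitrary labelling for illustration)
cycle = de_bruijn(2, 6)
positions = window_lookup(cycle, 6)
```

Because each window of six colours occurs exactly once in the cycle, `positions` has one entry per starting point, and "memorising the sequences" amounts to memorising this table.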

Neat!! I love how maths integrates with life.
I wonder how this will be used 5 years down the road.

The actual paper is here
Ref: arxiv.org/abs/1011.4609: Bounds from a Card Trick

Tuesday, 16 November 2010

Uniqueome a uniquely ... omics word

Spotted this post on the Tree of Life blog

Another good paper, but bad omics word of the day: uniqueome

From "The Uniqueome: A mappability resource for short-tag sequencing", Ryan Koehler, Hadar Issac, Nicole Cloonan, and Sean M. Grimmond. Bioinformatics (2010) doi: 10.1093/bioinformatics

Paper does look interesting though!

Summary: Quantification applications of short-tag sequencing data (such as CNVseq and RNAseq) depend on knowing the uniqueness of specific genomic regions at a given threshold of error. Here we present the “uniqueome”, a genomic resource for understanding the uniquely mappable proportion of genomic sequences. Pre-computed data is available for human, mouse, fly, and worm genomes in both color-space and nucleotide-space, and we demonstrate the utility of this resource as applied to the quantification of RNAseq data.

Availability: Files, scripts, and supplementary data is available from http://grimmond.imb.uq.edu.au/uniqueome/; the ISAS uniqueome aligner is freely available from http://www.imagenix.com/
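
As a toy illustration of what "uniquely mappable" means, the sketch below marks which k-mer start positions in a sequence occur exactly once. It is a deliberate simplification and an assumption on my part: exact matches only, forward strand only, nucleotide space only, whereas the paper's resource works at a given error threshold and also covers color space. The function names are hypothetical.

```python
from collections import Counter


def unique_kmer_mask(sequence: str, k: int) -> list[bool]:
    """True at position i if the k-mer starting at i occurs once in sequence."""
    kmers = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    counts = Counter(kmers)
    return [counts[kmer] == 1 for kmer in kmers]


def unique_fraction(sequence: str, k: int) -> float:
    """Fraction of k-mer start positions that are unique (0.0 if none exist)."""
    mask = unique_kmer_mask(sequence, k)
    return sum(mask) / len(mask) if mask else 0.0
```

A read that starts at a position marked False could have come from more than one place, which is why quantification tools need to know where those positions are.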

Pending release of Contrail, Hadoop de novo assembler?

Jermdemo on Twitter

Just noticed the source code for Contrail, the first Hadoop based de-novo assembler, has been uploaded http://bit.ly/96pSbw 26 days ago

Oh the suspense!

Exome Sequencing hints at sporadic mutations as cause for mental retardation.

First spotted on GenomeWeb
NEW YORK (GenomeWeb News) – De novo mutations that spring up in children but are absent in their parents are likely the culprit in many unexplained cases of mental retardation, according to a Dutch team.
Using exome sequencing in 10 parent-child trios, the researchers found nine non-synonymous mutations in children with mental retardation that were not found in their parents, including half a dozen mutations that appear to be pathogenic. The research, which appeared online yesterday in Nature Genetics, hints at an under-appreciated role for sporadic mutations in mental retardation — and underscores the notion that mental retardation can stem from changes in a wide variety of genes.
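
The core of a trio analysis can be sketched as a set difference: keep the variants seen in the child but in neither parent. This is only a minimal sketch under my own assumptions. The variant tuples and function name are hypothetical, and a real pipeline would also need genotype quality, coverage checks in the parents, and validation before calling anything de novo.

```python
# A variant is (chromosome, position, reference allele, alternate allele).
Variant = tuple[str, int, str, str]


def candidate_de_novo(
    child: set[Variant], mother: set[Variant], father: set[Variant]
) -> list[Variant]:
    """Return variants present in the child but absent from both parents."""
    return sorted(child - mother - father)


# Made-up example data, for illustration only.
child = {("1", 1000, "A", "G"), ("2", 500, "C", "T"), ("X", 42, "G", "A")}
mother = {("1", 1000, "A", "G")}
father = {("X", 42, "G", "A")}
```

Here `candidate_de_novo(child, mother, father)` leaves only the chromosome 2 variant, the one neither parent carries.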

I think it's fascinating to find so many new mutations and changes in DNA that may affect one's quality of life, simply by sequencing the coding regions (and not all of them, if I may add). This paper is fascinating as it raises the question of whether deleterious sporadic mutations are culprits in a whole variety of diseases that have a genetic risk.
It is certainly more likely that such an event will occur in coding regions, but I do not doubt that for some diseases, the non-coding regions (that play a regulatory role) might have the same effect. If it is a clear-cut mutation that results in a dysfunctional protein, and there's no redundancy in the system, it is likely the system will crash. If instead it is a change in expression levels, it might lead to a slightly wobbly system that just doesn't function as well.

While everyone else is waiting for whole genome sequencing to drop in price, there are groups already publishing with exome data. I think in 6 months' time we will see more WGS papers coming up... It's an exciting time for genomics!

See the full paper below

A de novo paradigm for mental retardation Nature Genetics | Letter

Thursday, 11 November 2010

BioScope 1.3 is a whopping 6.6 GB!! Officially released for download

Downloading v1.3 now. Gosh, it is a whopping 6.6 GB download (270 MB for v1.21).

Not sure where the bloat comes from. Guessing it's example data, hope the server doesn't crash under the load.

btw reason no. 5 for using Bioscope v 1.3 sounds quite flaky...

UPDATE: Argh. the md5sums match my download but I got this error
error [4462069.zip]: start of central directory not found;
zipfile corrupt.
(please check that you have transferred or created the zipfile in the
appropriate BINARY mode and that you have compiled UnZip properly)

UPDATE 2: Finally unzipped the 6.6 GB file on an XP box using 7-Zip (apparently Linux unzip is finicky with files > 4 GB).
Guess what's inside? Gzipped tar files. Oh what fun to transfer them back to a Linux box!
BioScope-1.3-9.tar.gz           (Regular, application/x-compressed-tar) size 217743781 mode 0744
BioScope-1.3.rBS130-51653_20101021190735.examples.tar.gz                (Regular, application/x-compressed-tar) size 4206422209 mode 0744
BS130-resources.tar.gz          (Regular, application/x-compressed-tar) size 2632156337 mode 0744
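
One likely explanation for the unzip error (my assumption, not something confirmed above) is an unzip build without ZIP64 support, which archives over 4 GB require. A possible workaround that stays on the Linux box is Python's standard `zipfile` module, which reads ZIP64 archives. A minimal sketch, with a hypothetical function name:

```python
import zipfile


def extract_archive(archive_path: str, dest_dir: str) -> list[str]:
    """Check member CRCs, extract everything, and return the member names."""
    with zipfile.ZipFile(archive_path) as zf:
        bad_member = zf.testzip()  # first corrupt member name, or None
        if bad_member is not None:
            raise zipfile.BadZipFile(f"corrupt member: {bad_member}")
        zf.extractall(dest_dir)
        return zf.namelist()
```

Calling `extract_archive("4462069.zip", "bioscope")` would unpack the download into a `bioscope` directory; the destination name is just an example.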

UPDATE 3:
ABI has updated the downloads to a more reasonable size:
208 MB Nov 25 04:24 bioscope1.3.1installer_4464106.tar.gz
md5 checksum is b688a8ae7b620d7b2dc7f68c6ca41783
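
For anyone who wants to verify the download without `md5sum`, here is a small sketch using Python's `hashlib`. It reads the file in chunks so a multi-gigabyte installer does not have to fit in memory. The function names are my own.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in 1 MiB chunks by default."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path: str, expected: str) -> bool:
    """Compare a file's MD5 against a published checksum, ignoring case."""
    return md5_of_file(path) == expected.strip().lower()
```

A matching checksum only shows the file arrived intact. As the first update shows, it says nothing about whether your unzip tool can read the archive.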

Dear Valued Customer,

It is with great pleasure and excitement that I announce the release and immediate availability of BioScope v1.3

BioScope, the modular SOLiD™ data analysis bioinformatics tool, is designed specifically to optimize the accuracy of your SOLiD™ colorspace data. In addition to streamlining the construction and maintenance of your SOLiD™ pipelines, BioScope provides a simple web interface allowing non command line users the power of running sophisticated NGS data analysis.

SOLiD™ BioScope provides workflow applications including:

Improved MaxMapper Mapping and Pairing
BFAST integration
Improved SAET Accuracy Enhancement
Resequencing Pipelines

SNP/diBayes
Inversion
CNV
Small Indel
Large Indel

Whole Transcriptome
Fusion Transcript and Splicing Detection
Target Resequencing
Support for ChIPSeq
Support for Methyl Miner
Annotation and Reporting
Improved BAM file compatibility
Improved BioScope™ Users Guide

Additional details can be found at the following blog:

http://solid.community.appliedbiosystems.com/community/about_solid/blog/2010/10/25/the-top-5-reasons-to-use-solid-bioscope-software-13

Also attached is an in-depth article about our new Target Resequencing pipeline in BioScope™.

Please coordinate with your IT admin, bioinformatician, lab manager, and PI to have BioScope v1.3 installed at your site.

To get your free copy of SOLiDBioScope please go to:

http://solidsoftwaretools.com/gf/project/bioscope

Please ensure that you have an activated account on solidsoftwaretools.com before downloading. If you have problems downloading, please contact Rupert.Yip@lifetech.com.

If this is your first time installing BioScope, we strongly recommend working with the BioScope software installation team to ensure a proper installation and configuration of BioScope. Please contact Rupert.Yip@lifetech.com to inquire about our free BioScope software installation services.

For information on BioScope training please contact your local bioinformatics FAS or go to http://learn.appliedbiosystems.com/solid

Monday, 8 November 2010

At ASHG, Ion Torrent Drums Up Interest; Provides Preliminary Specs for PGM

WASHINGTON, DC – Ion Torrent revealed some preliminary specs for its Personal Genome Machine, due to be launched later this year, as the Life Technologies business unit presented the instrument to potential customers at its booth at the American Society of Human Genetics meeting this week. The speed of the instrument (a run takes approximately two hours, and several runs can be performed in a day) is what appears to be most attractive to potential customers, Maneesh Jain, Ion Torrent's vice president of marketing and business development, told In Sequence.
The first version of the PGM will sell for $49,500, plus a $16,500 server to analyze the data.
Initially, the machine will produce about 10 megabases of data per run, or about 100,000 reads of 100 base pairs each, using the so-called 314 chip, which has about 1.5 million wells and will cost $250. Reagent kits for template preparation, library preparation, and sequencing will cost another $250, bringing the total consumables cost per run to approximately $500.
In the first half of 2011, Ion Torrent plans to launch the 316 chip, with about 6 million wells, which will increase the output per run to 100 megabases and which will cost about twice as much as the 314. Additional chip upgrades will follow, with details to be revealed next year.
Sample prep, which Jain said takes about a day and can be done in batches of six to eight samples, requires an emulsion PCR protocol, which will be simplified over time. "We focused on the sequencing initially," he said, adding that the next step will be to optimize the sample prep. Life Technologies said previously that sample prep for the PGM would eventually be able to use the EZ Bead system, which was originally developed for the SOLiD system.
Read full article here
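
The per-run numbers quoted above are easy to sanity-check. The sketch below only redoes the arithmetic from the excerpt (100,000 reads of 100 bp, a $250 chip, $250 of reagents). The human genome size of roughly three billion base pairs and the single-fold coverage are my own simplifying assumptions, and the function names are hypothetical.

```python
import math


def run_stats(reads: int, read_length_bp: int,
              chip_cost: float, reagent_cost: float) -> tuple[float, float, float]:
    """Return (megabases per run, consumables cost per run, cost per megabase)."""
    megabases = reads * read_length_bp / 1e6
    cost_per_run = chip_cost + reagent_cost
    return megabases, cost_per_run, cost_per_run / megabases


def runs_needed(target_megabases: float, megabases_per_run: float) -> int:
    """Number of whole runs needed to produce the target amount of sequence."""
    return math.ceil(target_megabases / megabases_per_run)


# 314 chip, figures as quoted in the article
mb_314, cost_314, cost_per_mb_314 = run_stats(100_000, 100, 250, 250)

# Assumption: ~3,000 Mb human genome at 1x coverage, far below what real
# resequencing needs, so this is a lower bound on the number of runs.
human_runs_314 = runs_needed(3_000, mb_314)
```

That works out to 10 Mb and $500 per run, or $50 per megabase, and 300 runs just to cover a human genome once. At these specs the first PGM is clearly aimed at small targets rather than whole genomes.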

Wednesday, 3 November 2010

Life Technologies Launches New SOLiD Sequencer to Drive Advances in Cancer Biology and Genetic Disease Research

It's official! The web is crawling with the news reports. Read their press release here
My previous coverage on the preview launch is here
There's a discussion in the seqanswers forum on the new machine.

The Life Tech cmsXXX.pdf files with the useful specs are out too. You can Google them or search on the website.
The specs
solid.appliedbiosystems.com/solid5500

Monday, 1 November 2010

NIH has 4 gene patents!

My voracious reading has led me to a blog post describing the recent events on gene patents.
Interesting snippets include the NIH holding 4 gene patents (hmmm, I wonder which ones those are...).

For years, the U.S. Patent Office has taken the position that extracted genes, or “isolated DNA,” can be patented. And, in fact, it has issued thousands of patents on human genes, with perhaps one of every five human genes now under patent. Patent rights to a gene, of course, give the owner the exclusive right to study, test and experiment on the gene to see how its natural characteristics work.
It has been more than 20 years since the Patent Office began approving patents for human genes in the form of “isolated” DNA. Prior to that, the Office had issued patents for synthetic DNA, but then moved on to grant monopoly rights on the natural material when extracted directly from the body and not modified. The Obama Administration, in the brief it filed late Friday in the Federal Circuit, is not challenging patents on synthetic DNA, or on the process of extracting DNA, but only on unmodified genes themselves.
The Patent Office’s long-running approach to genetic patents was challenged in a lawsuit filed in May 2009 by the American Civil Liberties Union and the Public Patent Foundation, contending that locking up genes in the monopoly rights of a patent would inhibit research by other scientists on diseases that might be flagged by the coding or mutations of the genes. The lawsuit targeted both the Patent Office and Myriad Genetics, specifically because of patents that company was issued on human genes that have been labeled “BRCA1” and “BRCA2.” Mutations of those two genes are associated with significantly higher risks of breast cancer and ovarian cancer. .....

"U.S. government is actually the co-owner of four of the seven patents that are involved in the case. It has granted Myriad an exclusive license under those patents — contrary, it said, to NIH’s usual practice of not granting exclusive licenses under DNA patents for “diagnostic applications.” In the past, NIH and other government agencies have sought and obtained patents for human genes in the form of “isolated genomic DNA,” according to the brief.

The brief did not say which claims under the four patents co-owned by NIH would be invalid under its theory of patentability."

Thursday, 28 October 2010

1000 Genomes Pilot Paper Published

27 OCTOBER 2010

The 1000 Genomes Project Consortium has published the results of the pilot project analysis in the journal Nature in an article appearing on line today. The paper A map of human genome variation from population-scale sequencing is available from the Nature web site and is distributed under the terms of the Creative Commons Attribution-Non-Commercial-Share Alike licence to ensure wide distribution. Please share our paper appropriately.

Tuesday, 26 October 2010

Throwing the baby out with the bathwater: Non-Synonymous and Synonymous Coding SNPs Show Similar Likelihood and Effect Size of Human Disease Association

I was literally having an 'oh shoot' moment when I saw this news in GenomeWeb

Synonymous SNPs Shouldn't Be Discounted in Disease, Study Finds

NEW YORK (GenomeWeb News) – Synonymous SNPs that don't change the amino acid sequence encoded by a gene appear just as likely to influence human disease as non-synonymous SNPs that do, according to a paper appearing online recently in PLoS ONE by researchers from Stanford University and the Lucile Packard Children's Hospital.

From the abstract of the paper:

The enrichment of disease-associated SNPs around the 80th base in the first introns might provide an effective way to prioritize intronic SNPs for functional studies. We further found that the likelihood of disease association was positively associated with the effect size across different types of SNPs, and SNPs in the 3′ untranslated regions, such as the microRNA binding sites, might be under-investigated. Our results suggest that sSNPs are just as likely to be involved in disease mechanisms, so we recommend that sSNPs discovered from GWAS should also be examined with functional studies.
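
For readers whose pipelines drop synonymous SNPs, this is roughly what that filter boils down to. The sketch below classifies a codon change using the standard genetic code. It is an illustration under my own assumptions, not the paper's method. It ignores strand, splicing and alternative transcripts, and the function name is hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon: str, alt_codon: str) -> str:
    """Return 'synonymous' if both codons encode the same amino acid."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    return "synonymous" if ref_aa == alt_aa else "non-synonymous"
```

For example, CTG to CTA keeps leucine and is labelled synonymous, while GAG to GTG swaps glutamate for valine. If the paper is right, a pipeline that discards everything in the first category before association testing is throwing away real signal.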

Hmmmm how is this going to affect your carefully crafted pipeline now?

Monday, 25 October 2010

AB on Ion Torrent

There was a brief mention of the Ion Torrent at the 5500 presentation as well, but nothing of great significance. I do wish the marketing fellows would push Ion Torrent out faster, but I think they are trying to streamline production by testing whether Invitrogen kits can replace the ones at Ion Torrent. I do hope they do not sacrifice performance for compatibility.

For Bioscope, they are going to include base space support (hurray?) presumably so that they can use the same pipeline for analysis of their SMS and Ion Torrent technologies.

Stay Tuned!

AB releases 4 HQ and PI as 5500xl and 5500 SOLiD

I was lucky to be part of the first group to view the specs and info on the new SOLiD 4 HQ.
For reasons unknown, they have renamed the systems to the 5500xl and 5500 SOLiD, which are your familiar 4 HQ and PI.
Or, if you prefer formulas:
5500xl = 4 HQ
5500 = PI

One can only guess at their obsession with these 4 digits, judging by the similarly named instruments
AB Sciex Triple Quad 5500 and the AB Sciex QTrap 5500

Honestly the 5500 numbers are of no numerical significance AFAIK.

Appearance-wise, both look like the PI system.
I DO NOT see the computer cluster anymore; that's something I am curious about.

Finally we are at 75 bp though.
Of notable importance, there is a new Exact Call Chemistry (ECC) module which promises 99.99% accuracy. It is optional, as it increases the run time.
The new SOLiD system is co-developed with Hitachi High-Technologies.
Instead of the familiar slides, they now use 'flowchips' with 6 individual lanes, allowing more mixing of samples with different reads.
For the 5500xl:
throughput per day is 20-30 Gb
per run you have 180 Gb or 2.8 B tags (paired ends or mate pairs)

Contrary to most rumours, the 5500xl is upgradeable from SOLiD 4, although I suspect it is a trade-in program. No mention of the 5500 (which I guess is basically a downgrade).

The specs should be up soon
solid.appliedbiosystems.com/solid5500

Update from seqanswers from truthseqr
http://seqanswers.com/forums/showthread.php?t=6761&goto=newpost

Here is the message that has just been posted:
***************
AB is premiering two new instruments at ASHG next week.

Mobile ASHG calendar: http://m.appliedbiosystems.com/ashg/ (http://solid.community.appliedbiosystems.com/)

Twitter account: @SOLiDSequencing (http://twitter.com/SOLiDSequencing)

SOLiD Community: http://solid.community.appliedbiosystems.com/

More info soon at: solid.appliedbiosystems.com/solid5500/ (http://solid.appliedbiosystems.com/solid5500)

Tuesday, 19 October 2010

After a Decade, JGI Retires the Last of Its Sanger Sequencers

Time and tide wait for no man ...