Showing posts with label BGI. Show all posts

Thursday, 7 July 2011

BGI Announces Cloud Genome Assembly Service

I am very excited about cloud solutions for de novo assembly, as it is quite computationally intensive and, with parameter tweaking, you have a massively parallel problem that just begs for compute cores. I do wonder whether there's a need for a cloud solution for resequencing pipelines, especially when they involve BWA, which can be run rather efficiently on a desktop or an in-house cluster. Only whole-genome resequencing might require more compute hours, but I would think that any center that does WGS regularly would have, at the very minimum, a cluster capable of genome resequencing, if only to store the data before it is analyzed.
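To make the "begs for cores" point concrete, here is a minimal Python sketch of a parameter sweep. Everything in it is an assumption for illustration: `run_assembly` is a hypothetical stand-in that returns a toy score instead of calling a real assembler, and the k-mer and coverage values are arbitrary. It only shows that each parameter combination is an independent job, which is what makes the problem easy to spread over many cores or cloud nodes.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def run_assembly(params):
    """Hypothetical stand-in for one assembler run.

    A real version would launch the assembler as an external process
    and parse a quality metric (for example N50) from its output.
    Here a toy score is returned so the sketch runs on its own.
    """
    kmer, min_cov = params
    score = -abs(kmer - 31) - abs(min_cov - 3)  # toy metric, peaks at (31, 3)
    return params, score


def sweep(kmers, min_covs, workers=4):
    """Run every (kmer, min_cov) combination and return the best one."""
    grid = list(product(kmers, min_covs))
    # The runs do not depend on each other, so they can all go out at once.
    # Threads are enough when each task just waits on an external process.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(run_assembly, grid))
    return max(results, key=lambda r: r[1])


if __name__ == "__main__":
    best_params, best_score = sweep(range(21, 64, 10), [1, 2, 3, 5])
    print(best_params, best_score)
```

The grid grows multiplicatively with every parameter you add, so even a modest sweep turns into dozens of full assemblies, which is where rented compute starts to look attractive.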

Anyway let's see if BGI will change the computational cloud scene ...


By Allison Proffitt 
July 6, 2011 | SHENZHEN, CHINA—At the BGI Bioinformatics Software Release Conference today, researchers announced two new Cloud-based software-as-a-service offerings for next-gen data analysis. Hecate and Gaea (named for Greek gods) are “flexible computing” solutions for de novo assembly and genome resequencing.
These are “cloud-based services for genetic researchers” so that researchers don’t need to “purchase your own cloud clusters,” said Evan Xiang, part of the flexible computing group at BGI Shenzhen. Hecate will do de novo assembly, and Gaea will run the SOAP2, BWA, Samtools, DIndel, and BGI’s realSFS algorithms. Xiang expects an updated version of Gaea to be released later this year with more algorithms available.  .......full article

Thursday, 24 February 2011

Maybe we have to sequence everybody! Every fish! BGI Cloud

Bio-IT World ran an interesting article with this quote:


“The data are growing so fast, the biologists have no idea how to handle this data,” says Li. “I think the Cloud will be the solution. We have to sequence more and more data. Maybe we have to sequence everybody! Every fish! The data keep growing and we need a lot of compute power to process.”
For Chen, there are three priorities for BGI Cloud:
  • Connectivity: With partners across China and the world, “we’ve connected all the people and resources—the sequencers, the samples, the ideas, the compute power, and the storage together to make a greater contribution.”
  • Scalability: Calling the explosion in next-gen sequencing (NGS) a “data tsunami,” Chen says BGI aims to provide the parallel computing resources to help users manage and process these datasets. “If you can’t do the analysis, it’s pointless. We use distributed computing technology in the bioinformatics area. We’re confident we can solve the scalability problem.”
  • Reproducibility: Chen says bioinformatics researchers are happy to show their data and their pet program—SOAP, BWA, and so on. “That’s fine. But analysis is very complicated. The methodology he is actually using is a homemade pipeline. It’s very difficult to reproduce that result. We built this platform not only to solve the capability and connectivity of computing, we want to resolve the problems in reproducing designs and procedures.”
With new NGS gene assembly and SNP calling programs such as Hecate and Gaea about to be released (see, “In the Name of Gods”), Li says it was essential to develop a “run-time environment, a Web-based platform for Cloud storage and reference data, with a feature-rich GUI, and effective bioinformatics analysis software.”


Kevin: It would be interesting to see how Amazon and other cloud providers, together with Galaxy (usegalaxy.org), respond to BGI's offering of reproducible data analysis (commercial software providers aside). Their offering also comes at a strange time, when NCBI is discontinuing SRA. Might BGI Cloud fill the void that SRA leaves behind?
Everyone is trying to come up with a 'standard' workflow that everyone will adopt, but I feel that the ecology of bioinformatics is such that there's always another 'better' way to tweak an analysis. 'Custom analysis' is a pet phrase of a lot of bench biologists.
Every bioinformatician will know and remember their treasure trove of throwaway scripts that worked beautifully, but only once, for one set of data.

Datanami, Woe be me