The 60 Genomes dataset is part of the public data that Bionimbus makes available to researchers. With the Bionimbus Community Cloud, the data is available over both the commodity Internet and high-performance research networks, such as National LambdaRail and Internet2.
The genomes in the dataset have, on average, more than 55x mapped read coverage, and the sequencing of these 60 genomes generated more than 12.2 terabases (Tb) of total mapped reads. This dataset will complement other publicly available whole-genome datasets, such as the 1000 Genomes Project's recent publication of six high-coverage and 179 low-coverage human genomes. Forty of the sixty genomes are available now, and the remainder will be available at the end of March.
The 60 genomes included in this dataset were drawn from two resources housed at the Coriell Institute for Medical Research: the National Institute of General Medical Sciences (NIGMS) Human Genetic Repository and the NHGRI Sample Repository for Human Genetic Research. Included in the release is a 17-member, three-generation CEPH pedigree from the NIGMS Repository and ethnically diverse samples from the NHGRI Repository that represent nine different populations. The samples selected are unrelated, with the exception of the three-generation CEPH pedigree, a Yoruba trio and a Puerto Rican trio. The majority of these samples have been previously analyzed as part of the International HapMap Project or 1000 Genomes Project.
We have just made a beta release of version 1.7 of Bionimbus. If you would like to host and operate your own Bionimbus cloud, you should consider this release. We expect to release version 1.8 in March/April, which will provide several additional features, including improved project management and the ability to edit an experiment's metadata.
A virtual machine image with common peak calling pipelines has been made available on the Amazon Elastic Compute Cloud (EC2). Upon boot, it fetches the pipeline library data, providing everything needed to process a user's data.
Amazon EC2 ID: ami-aead58c7
Startup command: ec2-run-instances -n 1 -t m1.large ami-aead58c7
Upon connecting to your instance, wait for the /READY-PIPELINE-DATA file to appear before starting any pipelines. This file signifies that the pipeline data libraries were installed successfully on your instance.
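The wait described above can be automated with a small polling loop. The following is only a minimal sketch: the function name `wait_for_file`, the 30-minute timeout, and the 10-second polling interval are assumptions chosen for illustration and are not part of the machine image.

```python
import os
import time


def wait_for_file(path="/READY-PIPELINE-DATA", timeout=1800.0, poll_interval=10.0):
    """Return True once `path` exists, or False if `timeout` seconds pass first.

    The timeout and polling interval are illustrative defaults, not values
    documented for the Bionimbus machine image.
    """
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Sleep for one interval, but never past the deadline.
        time.sleep(min(poll_interval, remaining))
```

On the instance, a script could call `wait_for_file()` and start the pipelines only if it returns True.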
For more information see the Bionimbus Machine Images section of the Using Bionimbus page.
The modENCODE Fly data produced by the White Lab is now available in the Bionimbus Simple Persistent Storage (BSPS) in the directory /glusterfs/fly.
All the data in BSPS is accessible to any virtual machine launched in the Bionimbus Elastic Compute Cloud (BEC2).
The Fly data produced by the White Lab can also be browsed, accessed and downloaded in bulk from Cistrack.
If you would like data added to BSPS, please send an email to support at bionimbus.org.
The Bionimbus Workspace (BWS) is a storage space that we have set up for those in the modENCODE fly/worm joint analysis group who would like to exchange data but do not want to use BEC2 and its associated storage. BWS is accessed via FTP.
Here is a link to a tutorial about how to use BWS.
BWS is synced daily and on demand to the Bionimbus Simple Persistent Storage (BSPS), one of the storage services available to all Bionimbus virtual machines running in the Bionimbus Elastic Compute Cloud (BEC2). In other words, data moved to BWS by FTP can be analyzed within BEC2 using any of the Bionimbus-supported machine images.
Please note that data in BSPS is not synced back to BWS. However, any user with write permission to the target directory can write data to BWS manually.
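The one-way direction of the sync can be illustrated with a toy model on two local directories. This is not the mechanism Bionimbus uses, which is not described here. The function `sync_one_way` and the directory roles are hypothetical and show only that files flow from the BWS side to the BSPS side and never in reverse.

```python
import shutil
from pathlib import Path


def sync_one_way(source, target):
    """Copy new or updated files from `source` into `target`.

    Toy model only: `source` stands in for BWS and `target` for BSPS.
    Nothing is ever written back into `source`.
    Returns the relative paths of the files that were copied.
    """
    source, target = Path(source), Path(target)
    copied = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = target / rel
        # Copy when the file is missing on the target or older than the source.
        if not dst.exists() or src.stat().st_mtime > dst.stat().st_mtime:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)
            copied.append(rel.as_posix())
    return copied
```

In this model, a result file created on the target side stays there, so it has to be written to the source side by hand if it should be shared, which mirrors the note above.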
To set up an account, please send an email to support at bionimbus.org.
Two of the donated racks will be used by Bionimbus, which is part of the OCC Open Science Data Cloud.