Saturday 29 September 2018

Koala Genome assembled on AWS



Excerpted from AWS blog 
 
Five years ago, a research team led by Dr. Rebecca Johnson (Director of the Australian Museum Research Institute) set out to learn more about koala populations, genetics, and diseases. Because the koala is a biologically unique animal with a limited appetite, maintaining a healthy and genetically diverse population is a key element of any conservation plan. In addition to characterizing the genetic diversity of koala populations, the team wanted to strengthen Australia’s ability to lead large-scale genome sequencing projects.
Inside the Koala Genome
Last month the team published their results in Nature Genetics. Their paper (Adaptation and Conservation Insights from the Koala Genome) identifies the genomic basis for the koala’s unique biology. 


This work was performed on AWS. The research team used cfnCluster to create multiple clusters, each with 500 to 1000 vCPUs and each running Falcon from Pacific Biosciences. All in all, the team used 3 million EC2 core hours, most of them on EC2 Spot Instances.

Tuesday 11 September 2018

BioBloom tools: fast, accurate and memory-efficient host species sequence screening using bloom filters

https://academic.oup.com/bioinformatics/article/30/23/3402/207237
 

Bioinformatics, Volume 30, Issue 23, 1 December 2014, Pages 3402–3404, https://doi.org/10.1093/bioinformatics/btu558
Published: 20 August 2014
 

Abstract

Large datasets can be screened for sequences from a specific organism, quickly and with low memory requirements, by a data structure that supports time- and memory-efficient set membership queries. Bloom filters offer such queries but require that false positives be controlled. We present BioBloom Tools, a Bloom filter-based sequence-screening tool that is faster than BWA, Bowtie 2 (popular alignment algorithms) and FACS (a membership query algorithm). It delivers accuracies comparable with these tools, controls false positives and has low memory requirements.
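
To make the idea in the abstract concrete, here is a minimal Python sketch of Bloom filter screening. It is not the BioBloom Tools implementation: the class and function names (BloomFilter, build_filter, hit_fraction), the use of SHA-256 with double hashing, and the k-mer size are all assumptions chosen for illustration. The filter is sized from the expected number of k-mers and a target false positive rate using the standard formulas, which is one way false positives can be kept under control.

```python
import hashlib
import math


class BloomFilter:
    """A small Bloom filter over strings, backed by a bytearray bit vector."""

    def __init__(self, expected_items, false_positive_rate):
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        n = max(1, expected_items)
        bits_needed = -n * math.log(false_positive_rate) / math.log(2) ** 2
        self.num_bits = max(8, math.ceil(bits_needed))
        self.num_hashes = max(1, round(self.num_bits / n * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing (an illustrative choice): derive every bit position
        # from two 64-bit halves of a single SHA-256 digest.
        digest = hashlib.sha256(item.encode("ascii")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True may be a false positive; False is always correct.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def kmers(sequence, k):
    """Return every overlapping substring of length k."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def build_filter(reference, k, false_positive_rate=0.01):
    """Insert all distinct k-mers of a reference sequence into a new filter."""
    words = set(kmers(reference, k))
    bloom = BloomFilter(len(words), false_positive_rate)
    for word in words:
        bloom.add(word)
    return bloom


def hit_fraction(read, bloom, k):
    """Fraction of the read's k-mers that the filter reports as present."""
    words = kmers(read, k)
    if not words:
        return 0.0
    return sum(word in bloom for word in words) / len(words)
```

A read would then be assigned to the host species when hit_fraction exceeds some cutoff (the cutoff is a hypothetical tuning parameter here). Because a Bloom filter never produces false negatives, a read copied exactly from the reference always scores 1.0, while an unrelated read scores above zero only through false positives.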
 

Datanami, Woe be me