Wednesday 14 November 2012

Article: Implementation of "Space-efficient and exact de Bruijn graph representation based on a Bloom filter"

Impressively, it turned an ABySS job that needed 337 GB of RAM into one that needs only 5.7 GB. To top it off, it used only one core!

The number of final contigs differs, though.


implementation of the "Space-efficient and exact de Bruijn graph representation based on a Bloom filter" article
http://minia.genouest.org/


Article: Space-efficient and exact de Bruijn graph representation based on a Bloom filter

Abstract

The de Bruijn graph data structure is widely used in next-generation sequencing (NGS). Many programs, e.g. de novo assemblers, rely on in-memory representation of this graph. However, current techniques for representing the de Bruijn graph of a human genome require a large amount of memory (> 30 GB).

We propose a new encoding of the de Bruijn graph, which occupies an order of magnitude less space than current representations. The encoding is based on a Bloom filter, with an additional structure to remove critical false positives. An assembly software implementing this structure, Minia, performed a complete de novo assembly of human genome short reads using 5.7 GB of memory in 23 hours.
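
To make the idea in the abstract concrete, here is a minimal Python sketch of a Bloom filter paired with a set of "critical false positives". It is not Minia's code. The names (`BloomFilter`, `build`, `is_node`, `neighbours`) are made up for illustration, reverse complements are ignored, and the full k-mer set is held in memory while building, which a memory-frugal implementation would presumably avoid. The assumption is that a critical false positive is a k-mer that is not in the graph but is a neighbour of a real node and still passes the Bloom filter.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter over strings (illustrative, not tuned)."""

    def __init__(self, n_bits, n_hashes):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, item):
        # One salted hash per seed gives n_hashes bit positions.
        for seed in range(self.n_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.n_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, item):
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(item))


def kmers(read, k):
    """All overlapping substrings of length k in a read."""
    return (read[i:i + k] for i in range(len(read) - k + 1))


def neighbours(kmer):
    """The up to eight k-mers that overlap kmer by k-1 bases."""
    for base in "ACGT":
        yield kmer[1:] + base    # possible successor
        yield base + kmer[:-1]   # possible predecessor


def build(reads, k, n_bits, n_hashes=3):
    """Return (bloom, critical_fp, true_nodes) for the given reads."""
    true_nodes = {km for read in reads for km in kmers(read, k)}
    bloom = BloomFilter(n_bits, n_hashes)
    for km in true_nodes:
        bloom.add(km)
    # Neighbours of real nodes that the filter wrongly accepts.
    critical_fp = {
        nb
        for km in true_nodes
        for nb in neighbours(km)
        if nb in bloom and nb not in true_nodes
    }
    return bloom, critical_fp, true_nodes


def is_node(kmer, bloom, critical_fp):
    """Membership test that is exact for neighbours of real nodes."""
    return kmer in bloom and kmer not in critical_fp


if __name__ == "__main__":
    # A deliberately tiny filter, so false positives are likely.
    bloom, cfp, nodes = build(["ACGTACGTGACC", "TTGACCGTAC"], k=4, n_bits=64)
    print(len(nodes), "real k-mers,", len(cfp), "critical false positives")
```

The point of the second structure is that a traversal which starts at a real node and only ever queries neighbours never steps onto a false positive, so only the Bloom filter and the (hopefully small) false-positive set are needed at query time.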

