Thursday, 17 January 2013

Article: DSK: k-mer counting with very low memory usage


http://bioinformatics.oxfordjournals.org/content/early/2013/01/16/bioinformatics.btt020.short?buffer_share=64cbf&rss=1


We present a new streaming algorithm for k-mer counting, called DSK (disk streaming of k-mers), which only requires a fixed, user-defined amount of memory and disk space. This approach realizes a trade-off between memory, time and disk usage. The multi-set of all k-mers present in the reads is partitioned, and the partitions are saved to disk. Then, each partition is separately loaded into memory in a temporary hash table. The k-mer counts are returned by traversing each hash table. Low-abundance k-mers are optionally filtered.
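The partition-then-count idea in the abstract can be illustrated with a minimal Python sketch. This is not the DSK implementation: the function name `count_kmers_partitioned`, its parameters, the CRC32 hash, and the plain-text partition files are all assumptions made for illustration. It also makes a single pass over the reads and ignores reverse complements, which a real k-mer counter would likely handle differently.

```python
import os
import tempfile
import zlib
from collections import Counter


def count_kmers_partitioned(reads, k, n_partitions=4, min_count=1):
    """Yield (kmer, count) pairs, counting one on-disk partition at a time.

    Hypothetical illustration: names and file layout are not taken from DSK.
    """
    with tempfile.TemporaryDirectory() as tmp:
        paths = [os.path.join(tmp, f"part_{i}.txt") for i in range(n_partitions)]
        files = [open(p, "w") for p in paths]
        try:
            # Step 1: stream the reads and write each k-mer to the partition
            # chosen by its hash, so equal k-mers always share a partition.
            for read in reads:
                for i in range(len(read) - k + 1):
                    kmer = read[i:i + k]
                    p = zlib.crc32(kmer.encode("ascii")) % n_partitions
                    files[p].write(kmer + "\n")
        finally:
            for f in files:
                f.close()

        # Step 2: load one partition at a time into a temporary hash table,
        # so peak memory depends on the largest partition, not the whole set.
        for path in paths:
            table = Counter()
            with open(path) as f:
                for line in f:
                    table[line.rstrip("\n")] += 1
            # Step 3: traverse the table, optionally dropping rare k-mers.
            for kmer, count in table.items():
                if count >= min_count:
                    yield kmer, count
```

Calling `dict(count_kmers_partitioned(["ACGTACGT", "CGTA"], k=4))` collects the counts for a toy input. Raising `n_partitions` shrinks each in-memory table at the cost of more files on disk, which mirrors the memory and disk trade-off the abstract describes.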

DSK is the first approach that is able to count all the 27-mers of a human genome dataset using only 4.0 GB of memory and moderate disk space (160 GB), in 17.9 hours. DSK can replace a popular k-mer counting software (Jellyfish) on small-memory servers.

Availability: http://minia.genouest.org/dsk


