Sunday 13 March 2011

script 4 filter to unique FASTQ reads using a bloom-filter in front of a python set

from the hackmap blog 

A simple script that filters to unique FASTQ reads using a bloom-filter in front of a python set. Basically, only reads that the bloom-filter flags as already seen are added to the set, so the set holds just the likely duplicates rather than every read. This trades speed for memory: it iterates over the file 3 times. The amount of memory is tuneable via the specified error-rate. It's not pretty, but it should be simple enough to demonstrate what's going on. It only reads from stdin and writes to stdout, with some information about total reads and the number of false positives in the bloom-filter sent to stderr.
Usage looks like:

python fastq-unique.py < in.fastq > out.unique.fastq
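
To make the idea concrete, here is a rough sketch of a bloom-filter in front of a python set. It is not the original fastq-unique.py: the BloomFilter class, read_fastq, unique_reads, and the choice to read from a file path instead of stdin are all assumptions made for illustration, and the three passes shown (count, flag candidates, emit) are just one plausible way to arrange them.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter; a hypothetical stand-in for a real library."""

    def __init__(self, capacity, error_rate=0.01):
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.nbits = max(8, int(-capacity * math.log(error_rate) / math.log(2) ** 2))
        self.nhashes = max(1, round(self.nbits / capacity * math.log(2)))
        self.bits = bytearray((self.nbits + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive k bit positions from two halves of one digest.
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.nbits for i in range(self.nhashes)]

    def add(self, item):
        """Set the item's bits; return True if all of them were already set."""
        seen = True
        for pos in self._positions(item):
            byte, mask = pos >> 3, 1 << (pos & 7)
            if not self.bits[byte] & mask:
                seen = False
                self.bits[byte] |= mask
        return seen


def read_fastq(handle):
    """Yield (header, seq, plus, qual) tuples, assuming 4 lines per record."""
    while True:
        record = [handle.readline().rstrip("\n") for _ in range(4)]
        if not record[0]:
            return
        yield tuple(record)


def unique_reads(path, error_rate=0.01):
    """Yield the first record seen for each distinct sequence in the file."""
    # Pass 1: count records so the filter can be sized for the error rate.
    with open(path) as fh:
        total = sum(1 for _ in read_fastq(fh))
    bloom = BloomFilter(max(total, 1), error_rate)

    # Pass 2: only sequences the filter claims to have seen go into the set.
    # These are the true duplicates plus a few false positives.
    candidates = set()
    with open(path) as fh:
        for _, seq, _, _ in read_fastq(fh):
            if bloom.add(seq):
                candidates.add(seq)

    # Pass 3: anything outside the candidate set occurs exactly once;
    # candidates are emitted only on their first occurrence.
    emitted = set()
    with open(path) as fh:
        for record in read_fastq(fh):
            seq = record[1]
            if seq in candidates:
                if seq in emitted:
                    continue
                emitted.add(seq)
            yield record
```

Because a bloom-filter never gives false negatives, every repeated sequence ends up in the candidate set, and a false positive only costs one extra set entry. A lower error_rate means a bigger filter but a smaller set, which is the memory knob described above.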

