Thursday, 19 July 2012

Slim-Filter: An interactive Windows-based application for Illumina Genome Analyzer data assessment and manipulation


G. Golovko1,2, K. Khanipov1, M. Rojas1,2, A. Martinez-Alcántara1, J. J. Howard1, E. Ballesteros1, S. Gupta1, W. Widger1,3, and Y. Fofanov1,2,3

1Center for BioMedical and Environmental Genomics, University of Houston, Houston, TX, USA. Department of Computer Science2 and the Department of Biology and Biochemistry3, University of Houston, Houston, TX, USA 77204


    The emergence of Next Generation Sequencing technologies has made it possible for individual investigators to generate gigabases of sequencing data per week.  Effective analysis and manipulation of these data are limited by large file sizes, so even simple tasks such as data filtration and quality assessment have to be performed in several steps.  This requires (potentially problematic) interaction between the investigator and a bioinformatics/computational service provider.  Furthermore, such services are often performed using specialized computational facilities.

    We present a Windows-based application, Slim-Filter, designed to interactively examine the statistical properties of sequencing reads produced by the Illumina Genome Analyzer and to perform a broad spectrum of data manipulation tasks, including: filtration of low quality and low complexity reads; filtration of reads containing undesired subsequences (such as parts of adapters and PCR primers used during sample and sequencing library preparation); exclusion of duplicated reads (while keeping each read's copy number information in a specialized data format); and sorting of reads by copy number, allowing for easy access to and manual editing of the resulting files.  Slim-Filter is organized as a sequence of windows summarizing the statistical properties of the reads.  Each data manipulation step has roll-back abilities, allowing for return to previous steps of the data analysis process.  Slim-Filter is written in C++ and is compatible with FASTA, FASTQ, and specialized AS file formats.
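The filtration and duplicate-handling tasks listed above can be illustrated with a minimal Python sketch. This is not Slim-Filter's C++ implementation: the function names, the thresholds, the Phred offset of 33 (older Genome Analyzer pipelines used 64), the "most frequent base" complexity criterion, and the example adapter fragment are all assumptions made for illustration.

```python
from collections import Counter


def mean_quality(quality, offset=33):
    """Mean Phred score of a FASTQ quality string (the offset is an assumption)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def is_low_complexity(seq, max_base_fraction=0.8):
    """Hypothetical criterion: a single base makes up most of the read."""
    most_common_count = Counter(seq).most_common(1)[0][1]
    return most_common_count / len(seq) > max_base_fraction


def filter_reads(reads, min_quality=20, unwanted=(), offset=33):
    """Yield sequences passing the quality, complexity and subsequence filters.

    `reads` is an iterable of (sequence, quality_string) pairs.
    """
    for seq, quality in reads:
        if not seq or len(seq) != len(quality):
            continue  # skip malformed records
        if mean_quality(quality, offset) < min_quality:
            continue  # low quality read
        if is_low_complexity(seq):
            continue  # low complexity read
        if any(sub in seq for sub in unwanted):
            continue  # contains an undesired subsequence
        yield seq


def collapse_and_sort(sequences):
    """Collapse duplicates into (sequence, copy_number) pairs,
    sorted by decreasing copy number, then alphabetically."""
    counts = Counter(sequences)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Toy input; "GATCGGAAG" stands in for an adapter fragment.
reads = [
    ("ACGTACGTAC", "IIIIIIIIII"),  # passes all filters
    ("ACGTACGTAC", "IIIIIIIIII"),  # duplicate of the first read
    ("AAAAAAAAAA", "IIIIIIIIII"),  # low complexity
    ("ACGTTTGACC", "##########"),  # low quality
    ("GATCGGAAGA", "IIIIIIIIII"),  # contains the unwanted fragment
    ("TTGACCAGTA", "IIIIIIIIII"),  # passes all filters
]
result = collapse_and_sort(filter_reads(reads, unwanted=("GATCGGAAG",)))
print(result)
```

In this sketch the copy number is kept alongside each unique read, so duplicates can be excluded without losing abundance information.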

Slim-Filter performance was estimated using the following computer configurations:
OS   Windows 2008 Server SP1                           CentOS 5.6
CPU  Dual Quad Core Intel™ Xeon W5590 3.33GHz, 8M L3   AMD Magny Cours 6128 8-Core Processor, 2.0 GHz, 12MB Cache
RAM  128GB, DDR3 RDIMM, 1066MHz, ECC                   512 GB DDR3 1333MHz ECC
HDD  2TB SATA 3.0Gb/s, 7200 RPM                        2TB SATA 3.0Gb/s, 7200 RPM

Number of         RAM required      RAM required      Time to         Time to apply all
reads 36 bases    to perform        to perform        apply all       possible filter
                  for Linux (MB)    Windows (MB)      settings in     settings in Linux
1,000,000         300               600-800           6640
