Tuesday, 16 March 2010

Filtering errors from SOLiD output: a collection of Perl scripts

Summary: Here, we report the development of a filtering framework designed for efficient identification of both polyclonal and independent errors within SOLiD sequence data. The filtering utilizes the quality values reported by SOLiD's primary analysis for the identification of the two different types of errors. The filtering framework facilitates the passage of high-quality data into a variety of functional genomics applications, including de novo assemblers and sequence matching programs for SNP calling, improving the output quality and reducing resources necessary for analysis.
Availability: This error analysis framework is written in Perl and runs on Mac OS and Linux/Unix systems. The filter, documentation and sample Excel files for quality analysis are available at http://hts.rutgers.edu/filter and are distributed as Open Source software under the GPLv3.0.


  1. Isn't the raw read accuracy of SOLiD 99.94%? Why do you need to filter?

  2. Firstly, do not equate accuracy with the quality of the reads.
    There are going to be reads of low quality,
    and some are possibly PCR artifacts as well.

    But back to the 99.94%:
    say you have 1 Gbase of sequencing reads. The remaining 0.06% is still about 600,000 wrong nucleotides.
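
To make the arithmetic concrete, here is a minimal Python sketch. It is not part of the filtering framework; the function names are made up for illustration. The second function also assumes a read length of 50 bases and errors that are independent and evenly spread across reads, which real SOLiD data will not strictly follow.

```python
def expected_errors(total_bases: int, accuracy_percent: float) -> float:
    """Expected number of miscalled bases for a given per-base accuracy (in percent)."""
    error_rate = 1.0 - accuracy_percent / 100.0
    return total_bases * error_rate


def fraction_reads_with_error(read_length: int, accuracy_percent: float) -> float:
    """Fraction of reads carrying at least one error.

    Assumption (not from the post): errors are independent and uniform,
    so a read is error-free with probability accuracy ** read_length.
    """
    per_base_accuracy = accuracy_percent / 100.0
    return 1.0 - per_base_accuracy ** read_length


if __name__ == "__main__":
    total_bases = 1_000_000_000  # 1 Gbase, as in the example above
    print(round(expected_errors(total_bases, 99.94)))  # 600000

    # Hypothetical read length of 50 bases, chosen only for illustration.
    print(f"{fraction_reads_with_error(50, 99.94):.2%}")
```

Under those assumptions roughly 3% of 50-base reads would contain at least one wrong call, which is why a high headline accuracy still leaves plenty for a filter to do.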

