Kevin's GATTACA World: Filtering error from SOLiD Output collection of Perl Scripts

Tuesday, 16 March 2010


Abstract
Summary: Here, we report the development of a filtering framework designed for efficient identification of both polyclonal and independent errors within SOLiD sequence data. The filtering uses the quality values reported by SOLiD's primary analysis to identify the two different types of errors. The filtering framework facilitates the passage of high-quality data into a variety of functional genomics applications, including de novo assemblers and sequence matching programs for SNP calling, improving the output quality and reducing the resources necessary for analysis.
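The abstract does not spell out the filtering rules, so here is a minimal Python sketch of how a two-stage, quality-value-based filter of this kind could work. It assumes a SOLiD-style .qual file with one header line (">read_id") followed by one line of space-separated QVs per read. The window size, cutoffs and the names `parse_qual`, `classify` and `filter_reads` are hypothetical placeholders, not the published defaults; the actual tool is a set of Perl scripts whose rules may differ.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical thresholds for illustration only; NOT the Rutgers filter's defaults.
POLY_WINDOW = 10     # leading color calls inspected for a polyclonal signature
POLY_QV_CUTOFF = 25  # a QV below this counts as "low" inside the leading window
POLY_MAX_LOW = 3     # more low QVs than this in the window -> flag as polyclonal
ERR_QV_CUTOFF = 10   # a QV below this counts as a likely independent error
ERR_MAX_LOW = 5      # more such QVs than this across the read -> flag as error


def parse_qual(lines: Iterable[str]) -> Iterator[Tuple[str, List[int]]]:
    """Yield (read_id, qvs) pairs, assuming one QV line per header line."""
    read_id = None
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and '#' comment headers
        if line.startswith(">"):
            read_id = line[1:]
        elif read_id is not None:
            yield read_id, [int(q) for q in line.split()]
            read_id = None


def classify(qvs: List[int]) -> str:
    """Return 'polyclonal', 'error' or 'pass' for one read's quality values."""
    # Stage 1: count low QVs in the leading window only.
    if sum(q < POLY_QV_CUTOFF for q in qvs[:POLY_WINDOW]) > POLY_MAX_LOW:
        return "polyclonal"
    # Stage 2: count very low QVs anywhere in the read.
    if sum(q < ERR_QV_CUTOFF for q in qvs) > ERR_MAX_LOW:
        return "error"
    return "pass"


def filter_reads(lines: Iterable[str]) -> Dict[str, str]:
    """Classify every read in a .qual text stream."""
    return {rid: classify(qvs) for rid, qvs in parse_qual(lines)}


if __name__ == "__main__":
    sample = [
        "# hypothetical example data",
        ">read_1", "30 31 29 28 33 30 27 31 30 29 28 30",
        ">read_2", "5 8 12 30 9 31 7 30 29 28 30 31",
        ">read_3", "30 31 29 28 33 30 27 31 30 29 4 3 2 6 8 9",
    ]
    print(filter_reads(sample))
    # {'read_1': 'pass', 'read_2': 'polyclonal', 'read_3': 'error'}
```

Checking the leading window first reflects an assumption that a polyclonal bead degrades the read from its first calls, whereas independent errors show up as scattered low QVs counted over the full length.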
Availability: This error analysis framework is written in Perl and runs on Mac OS and Linux/Unix systems. The filter, documentation and sample Excel files for quality analysis are available at http://hts.rutgers.edu/filter and are distributed as Open Source software under the GPLv3.0.

2 comments:

Anonymous, 13 April 2010 at 03:00
Isn't the raw read accuracy of SOLiD 99.94%? Why do you need to filter?
Kevin, 13 April 2010 at 23:26
Firstly, do not equate accuracy with the quality of the reads. There are going to be reads that are of low quality, and some are possibly PCR artifacts as well.

But back to the 99.94%: let's say you have 1 Gbase of sequencing reads. An error rate of 0.06% still means a lot of wrong nucleotides.
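To put a number on that, a quick back-of-the-envelope calculation, taking 1 Gbase as 10^9 bases and treating 99.94% as a uniform per-base accuracy (a simplifying assumption):

```latex
N_{\text{wrong}} = 10^{9}\ \text{bases} \times (1 - 0.9994) = 10^{9} \times 6 \times 10^{-4} = 6 \times 10^{5}
```

That is roughly 600,000 incorrect bases per gigabase.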