Monday 12 December 2011

Would you use SOLiD for de novo assembly?

http://seqanswers.com/forums/showthread.php?t=15422

There's an interesting 'debate' in the thread above about recommendations for de novo assembly, with some useful tidbits of info, e.g.
PE 5500xl transcriptome datasets are available through solidsoftwaretools (http://solidsoftwaretools.com/gf/project/5500wtdataset/). 
(but don't get too excited, it's mapped data only)
The data provided is the mapped colorspace output from LifeScope™ v2.0 using default parameters. This PDF contains a high level summary of the data.
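
For anyone who hasn't worked with colorspace before, here is a minimal Python sketch of the standard SOLiD two-base encoding, where each colour (0-3) describes the transition between two adjacent bases. The function names are my own for illustration and are not part of LifeScope or any other tool. It is also simplified: the read starts with its own first base, whereas real reads start with the last adapter base.

```python
# 2-bit codes chosen so that the colour of a dinucleotide is the XOR of its two bases
# (AA/CC/GG/TT -> 0, AC/CA/GT/TG -> 1, AG/GA/CT/TC -> 2, AT/TA/CG/GC -> 3).
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def encode_colorspace(seq):
    """Return the first base followed by one colour (0-3) per adjacent base pair."""
    colors = [str(BASE_TO_BITS[a] ^ BASE_TO_BITS[b]) for a, b in zip(seq, seq[1:])]
    return seq[0] + "".join(colors)


def decode_colorspace(read):
    """Rebuild the bases from a known leading base plus the colour calls."""
    bases = [read[0]]
    for color in read[1:]:
        # Each base depends on the previous decoded base and the current colour.
        bases.append(BITS_TO_BASE[BASE_TO_BITS[bases[-1]] ^ int(color)])
    return "".join(bases)


if __name__ == "__main__":
    read = encode_colorspace("ATGGC")
    print(read)                        # A3103
    print(decode_colorspace(read))     # ATGGC
    # A single wrong colour call changes every base after it once decoded.
    print(decode_colorspace("A3203"))  # ATCCG
```

The last line is the part that matters for assembly: one miscalled colour corrupts the rest of the decoded read, which is why colorspace reads are usually kept in colorspace rather than naively translated to bases first.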

There's some discussion on whether to go for paired ends, ECC (the 5500's Exact Call Chemistry) and the like, plus comments on how the kit's chemistry introduces biases that are fine for RNA-seq but might hurt assembly.

My personal experience with SOLiD 4 data is that the short reads come with a high degree of 'noise': reads that wouldn't map anyway even if you had a reference genome. This creates a big problem for assemblers, especially if you have limited RAM to work with.
Newer assemblers that use less RAM might help in resolving SOLiD assemblies. SOLiD is brilliant for RNA-seq quantification, though, as there is a nice kit that allows for strand specificity.

454 or Sanger data would undoubtedly be helpful for de novo assembly, but realistically cost is always an issue.

Your choice still boils down to a combination of factors. Combining data from different platforms would be best, but it might not be viable until sequencing costs drop.

