Friday, 26 February 2010

Illumina assembly using Velvet

UC Davis has a wiki on this.
It covers both single-end and paired-end reads.

Wednesday, 10 February 2010

The sequence and de novo assembly of the giant panda genome


Using next-generation sequencing technology alone, we have successfully generated and assembled a draft sequence of the giant panda genome. The assembled contigs (2.25 gigabases (Gb)) cover approximately 94% of the whole genome, and the remaining gaps (0.05 Gb) seem to contain carnivore-specific repeats and tandem repeats. Comparisons with the dog and human showed that the panda genome has a lower divergence rate. The assessment of panda genes potentially underlying some of its unique traits indicated that its bamboo diet might be more dependent on its gut microbiome than its own genetic composition. We also identified more than 2.7 million heterozygous single nucleotide polymorphisms in the diploid genome. Our data and analyses provide a foundation for promoting mammalian genetic research, and demonstrate the feasibility for using next-generation sequencing technologies for accurate, cost-effective and rapid de novo assembly of large eukaryotic genomes.

 

SOAPdenovo uses the de Bruijn graph algorithm and applies a stepwise strategy to make it feasible to assemble the panda genome using a supercomputer (32 cores and 512 Gb random access memory (RAM))

Amazing... 
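
For anyone unfamiliar with the de Bruijn graph approach mentioned above, here is a minimal toy sketch of the idea in Python. It is not SOAPdenovo's implementation: the function names, the three made-up reads and k=4 are all my own assumptions, and it ignores sequencing errors, reverse complements, in-degree checks and repeats, which are exactly what make real assemblers hard and memory-hungry.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the list of (k-1)-mers that follow it in some read."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer is an edge from its prefix to its suffix.
            graph[kmer[:-1]].append(kmer[1:])
    return graph


def walk_unambiguous(graph, start):
    """Extend a contig from `start` while the current node has exactly one
    distinct successor, stopping at a branch, a dead end or a cycle."""
    contig = start
    node = start
    seen = {start}
    while True:
        successors = set(graph.get(node, []))
        if len(successors) != 1:
            break
        node = successors.pop()
        if node in seen:
            break
        seen.add(node)
        contig += node[-1]
    return contig


# Hypothetical overlapping reads, invented for illustration only.
reads = ["ATGGCGT", "GGCGTGC", "GTGCAAT"]
graph = build_de_bruijn(reads, k=4)
contig = walk_unambiguous(graph, "ATG")
print(contig)
```

Every distinct (k-1)-mer becomes a node held in memory, which gives a rough feel for why a mammalian-sized genome needs so much RAM.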

See full paper

Monday, 1 February 2010

CloudBurst and Contrail: using cloud computing to speed up your NGS data analysis

I have been having all sorts of problems trying to do de novo assembly of transcriptome data on my cluster. Insufficient RAM may be the cause; apparently BGI has machines with 512 GB of RAM.

I think it might be more worthwhile to explore algorithmic changes than hardware upgrades.
After all, there comes a point when the cost far exceeds the "worthiness" of an experiment.

Contrail: Assembly of Large Genomes using Cloud Computing

[excerpt .... Preliminary results show Contrail’s contigs are of similar size and quality to those generated by Velvet when applied to small (bacterial) genomes, but provides vastly superior scaling capabilities when applied to large genomes....]

CloudBurst: Highly Sensitive Short Read Mapping with MapReduce
[excerpt ...CloudBurst's running time scales linearly with the number of reads mapped, and with near linear speedup as the number of processors increases. In a 24-processor core configuration, CloudBurst is up to 30 times faster than RMAP executing on a single core...]
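
To get a feel for why read mapping fits MapReduce, here is a toy in-memory sketch of a seed-based map/shuffle/reduce in Python. It is only my illustration, not CloudBurst's code: the function names, the sequences and k=3 are assumptions, everything runs in one process instead of on Hadoop, and the alignment extension step that follows seeding is left out.

```python
from collections import defaultdict


def map_seeds(name, seq, k, is_reference):
    """Map step: emit (seed, (source, name, offset)) pairs for a sequence."""
    source = "ref" if is_reference else "read"
    # Assumption of this sketch: every overlapping k-mer of the reference,
    # but only non-overlapping k-mers of each read.
    step = 1 if is_reference else k
    for i in range(0, len(seq) - k + 1, step):
        yield seq[i:i + k], (source, name, i)


def shuffle(pairs):
    """Shuffle step: group all emitted values by their seed."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_seed(values):
    """Reduce step: pair each read hit with each reference hit on one seed
    and report the implied start position of the read on the reference."""
    refs = [(n, o) for s, n, o in values if s == "ref"]
    hits = [(n, o) for s, n, o in values if s == "read"]
    for read_name, read_off in hits:
        for ref_name, ref_off in refs:
            yield read_name, ref_name, ref_off - read_off


# Hypothetical data, invented for illustration only.
k = 3
pairs = list(map_seeds("chr", "ACGTTGCAGGTA", k, True))
pairs += list(map_seeds("r1", "TTGCAG", k, False))

candidates = set()
for seed, values in shuffle(pairs).items():
    # Each seed group is independent, so a cluster could reduce them in parallel.
    candidates.update(reduce_seed(values))
print(candidates)
```

Since each seed group can be handled on its own, adding more cores spreads the reduce work out, which is the kind of scaling the excerpt describes.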

Datanami, Woe be me