Friday, 30 April 2010

Dleon highlights tools worth a mention at ABRF 2010

from Dleon

Some of these are new to me as well!

It is a web-accessible database and LIMS to organize and track the generation of raw genomic data and associated downstream analysis.

It is a Genomic Annotation Publisher which stores information about each annotation in a database.

Integrated Genome Browser:
The Integrated Genome Browser is an interactive, zoomable, scrollable software program you can use to visualize and explore genome-scale data sets, such as tiling array data, next-generation sequencing results, genome annotations, microarray designs, and the sequence itself.

Galaxy allows you to do analyses without the need to install or download software. You can analyze multiple alignments, compare genomic annotations, profile metagenomic samples, etc.

eRNA yet another category of RNA molecules

I wonder if this new name will stick. Must find time to read this paper.
Full article here
read Michael Rhodes' commentary about the article here

BioTeam Inc. - Slides from HPC Trends Talk

BioITWorld – Slides from HPC Trends Talk

Love the storage war stories from 2009-2010! It's sooo true!
Excerpt from the slides:

#1 - Unchecked Enterprise Architects
 •  Scientist: “My work is priceless, I must be able to
    access it at all times”
 •  Storage Guru: “Hmmm…you want H/A, huh?”
 •  System delivered:
    -  Small (< 50TB) Enterprise FC SAN
    -  Asynchronous replication to remote DR site
    -  Can’t scale, can’t do NFS easily
    -  ~$500K/year in support & operational costs

Lessons learned
 •  Corporate storage architects may not fully
    understand the needs of HPC and research
    informatics users
 •  End-users may not be precise with terms:
    -  “Extremely reliable” means “no data loss”, not
       99.999% uptime at a cost of millions
 •  When true costs are explained:
    -  Many research users will trade a small amount of
       uptime or availability for more capacity or

BLAST on Intel vs. AMD CPUs

Interesting comparison!
In summary,
1) when clock speeds are normalized, performance differences are not large
2) physical cores beat mix of physical and virtual cores
3) AMD "provided comparable performance to the former [Intel] at roughly 1/3 the power usage per core"
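
Point 1 is easy to try on your own numbers. Here is a minimal Python sketch of what "normalizing by clock speed" could look like, assuming you simply divide measured throughput by clock frequency. The system names and figures are made up for illustration and are not taken from the white paper.

```python
def per_ghz(throughput, clock_ghz):
    """Return throughput per GHz of clock speed (a crude normalization)."""
    if clock_ghz <= 0:
        raise ValueError("clock_ghz must be positive")
    return throughput / clock_ghz


# Hypothetical measurements: (BLAST queries per hour, clock speed in GHz).
# These values are invented for illustration only.
systems = {
    "system_a": (1200.0, 3.0),
    "system_b": (1000.0, 2.5),
}

normalized = {
    name: per_ghz(throughput, ghz)
    for name, (throughput, ghz) in systems.items()
}

for name, value in sorted(normalized.items()):
    print(f"{name}: {value:.1f} queries/hour per GHz")
```

With these invented inputs the raw throughput differs by 20%, but the per-GHz figures come out the same, which is the kind of effect the summary describes.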

The results are not entirely surprising; in fact, some might even ask what the point is. Still, it's interesting that so few people benchmark bioinformatics software when gamers and overclockers run benchmarks so often.

Caveats: Systems tested were not entirely similar (but that's a tough thing to do when you ARE comparing different platforms)

For more details.

You can download the white paper here.

Wednesday, 28 April 2010

Would you be a lab rat to seq your personal genome?

Iddo Friedberg at Byte Size Biology asks his readers whether they would like to have their genomes sequenced for free if the conditions were that it must be licensed for public use under a "liberal CC no attribution-like license," and personal information - age, height, and sex, among other things - would accompany the data.

I think the world in general isn't ready to embrace WG seq of themselves. Do read the article, it's interesting!

Tuesday, 13 April 2010

ABI's pipeline Bioscope possible cloud service?

Hmmm saw an obscure reference to possible cloud hosting for ABI's Bioscope software from a poster on the main page.

BioScope™ 1.2: An Applications Framework for SOLiD™ Sequence Data Analysis. (PDF, 172 KB) Suri, P., et al. (AGBT 2010)

"...Similar performance has been observed for BioScope™software deployed on cloud ("

But googling for the link yielded nothing..

Friday, 9 April 2010

bowtie build time statistics / benchmark

Recently built a colorspace index for hg19 (from

AMD Phenom II X4 955 chip


Wrote 822714402 bytes to primary EBWT file: hg19.rev.1.ebwt
Wrote 358098108 bytes to secondary EBWT file: hg19.rev.2.ebwt
Re-opening _in1 and _in2 as input streams
Returning from Ebwt constructor
    len: 2864784823
    bwtLen: 2864784824
    sz: 716196206
    bwtSz: 716196206
    lineRate: 6
    linesPerSide: 1
    offRate: 5
    offMask: 0xffffffe0
    isaRate: -1
    isaMask: 0xffffffff
    ftabChars: 10
    eftabLen: 20
    eftabSz: 80
    ftabLen: 1048577
    ftabSz: 4194308
    offsLen: 89524526
    offsSz: 358098104
    isaLen: 0
    isaSz: 0
    lineSz: 64
    sideSz: 64
    sideBwtSz: 56
    sideBwtLen: 224
    numSidePairs: 6394609
    numSides: 12789218
    numLines: 12789218
    ebwtTotLen: 818509952
    ebwtTotSz: 818509952
Total time for backward call to driver() for mirror index: 01:24:19

Gosh, and I thought it would take hours!
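
Out of curiosity, most of the numbers in that dump can be re-derived from just `len` and the rate settings. The Python sketch below does the arithmetic. The relationships are my own reading of the printed values (for example, 2 bits per base, and 8 bytes of each side not holding BWT characters), not something taken from the Bowtie source, so treat them as assumptions.

```python
def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)


def ebwt_params(length, line_rate=6, lines_per_side=1, off_rate=5, ftab_chars=10):
    """Re-derive the index parameters printed in the log above.

    Formulas are inferred from the log output, not from Bowtie's code.
    """
    bwt_len = length + 1                   # one extra position in the BWT
    line_sz = 1 << line_rate               # bytes per line
    side_sz = line_sz * lines_per_side     # bytes per side
    side_bwt_sz = side_sz - 8              # assumed: 8 bytes per side are not BWT data
    side_bwt_len = side_bwt_sz * 4         # 4 bases per byte (2 bits each)
    num_side_pairs = ceil_div(bwt_len, 2 * side_bwt_len)
    num_sides = 2 * num_side_pairs
    offs_len = ceil_div(bwt_len, 1 << off_rate)  # one sampled offset per 2^offRate rows
    ftab_len = 4 ** ftab_chars + 1
    return {
        "bwtLen": bwt_len,
        "sz": ceil_div(length, 4),
        "bwtSz": ceil_div(bwt_len, 4),
        "offMask": (0xFFFFFFFF << off_rate) & 0xFFFFFFFF,
        "ftabLen": ftab_len,
        "ftabSz": ftab_len * 4,
        "eftabLen": 2 * ftab_chars,
        "offsLen": offs_len,
        "offsSz": offs_len * 4,            # 4 bytes per stored offset
        "lineSz": line_sz,
        "sideSz": side_sz,
        "sideBwtSz": side_bwt_sz,
        "sideBwtLen": side_bwt_len,
        "numSidePairs": num_side_pairs,
        "numSides": num_sides,
        "numLines": num_sides * lines_per_side,
        "ebwtTotLen": num_sides * side_sz,
    }


params = ebwt_params(2864784823)
for key, value in params.items():
    print(f"{key}: {hex(value) if key == 'offMask' else value}")
```

Running it with the `len` from the log reproduces the values printed above, which also shows why a larger `offRate` would shrink the second index file: `offsLen` halves each time `offRate` goes up by one.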

Wednesday, 7 April 2010

Next-Next Gen Seq

Rofl I like the title!

Read all about 3rd gen sequencing or single molecule sequencing at the methagora blog

"While the technology feature, “DNA sequencing: generation next-next”, was at press, Pacific Biosciences of Menlo Park, California stunned the community with their announcement of a single molecule sequencing technology they claim will provide a complete human genome in 15 minutes by the year 2013. Although Pacific Biosciences was founded in 2004, the company had been very ‘hush hush’ about their technology development. But that veil of secrecy was lifted during the Advances in Genome Biology and Technology meeting held February 6th to 9th at Marco Island, Florida where Stephen Turner, chief technology officer, presented the first preliminary data on the system."

Comparing NGS platforms, 454, Solexa, SOLiD

Inspired by Albert's work at

Please post discrepancies or views in comments

Datanami, Woe be me