Monday, 29 April 2013

Watch "Illumina on AWS - Customer Success Story" on YouTube

Illumina was featured in AWS customer success story recently.
I was initially surprised at the fact that Miseq data is the majority of the data analyzed by users of Basespace (it was mentioned in the video that Illumina is beginning to hook Hiseq machines to Basespace)
I was less surprised at their claims that 90% of DNA bases sequenced were performed on Illumina machines

Illumina, a California-based leading provider of DNA sequencing instruments, uses AWS to enable researchers to process and store massive amounts of data in the AWS Cloud. AWS offers the scalability and power in a secure environment that Illumina needs to help researchers collaborate while sequencing and analyzing large amounts of data.
Watch the Video »

Saturday, 27 April 2013 Bayesian Computation with R (Use R) eBook: Jim Albert: Kindle Store

The first edition Kindle ebook is free (source R bloggers)  but unfortunately not for Asia Pacific region

 The R scripts  from the book (2nd edition but there's a lot of overlap) can be obtained on Jim Albert's web site

Tuesday, 23 April 2013

Life Technologies hiring software engineers with cloud computing experience

Life Technologies is also moving into cloud computing. Would be interesting to see if they can bring anything new to the rather crowded table.

Thursday, 11 April 2013

Cufflinks 2.1.0 released

From: Cole Trapnell <cole at>
Date: 11 April, 2013 12:52:18 AM GMT+08:00
To: "bowtie-bio-announce
Subject: [Bowtie-bio-announce] Cufflinks 2.1.0 released

2.1.0 release - 4/10/2013

This release substantially improves the accuracy, speed, and memory footprint of Cufflinks and Cuffdiff. It is recommended for all users. Those who wish to see the impact of the accuracy improvements should look at the new benchmarking section. In addition to numerous bugfixes, the main changes are as follows:

  • Cuffdiff now includes a new statistical test. Prior versions used a delta method-based test, which lacked power for designs with more than a few replicates. The new test directly samples from the beta negative binomial model for each transcript in each condition in order to estimate the null distribution of its log fold change under the null hypothesis. This test is substantially more powerful, resulting in improved accuracy over all experimental designs, particularly those with more than three replicates. A similarly improved test is now used for isoform-switching. The benchmarking page shows the improvements in detail.
  • Prior versions of Cuffdiff reported the FPKM for each gene and transcript that maximizes the joint likelihood of the reads from all replicates pooled together. In version 2.1, Cuffdiff instead reports the mean of the maximum likelihood estimates from each replicate processed independently. As shown in the benchmarking section, these two methods report nearly identical values. However, the new method is faster and simpler to compute, and will enable new features for future releases.
  • The high and low confidence intervals reported by Cufflinks and Cuffdiff are now constructed from the samples generated from the beta negative binomial model, rather than estimated as twice the standard deviation. This better reflects the underlying distribution of the FPKM.
  • The library normalization system in Cuffdiff 2 has been overhauled, and several new normalization-related options have been added:
    • The new --library-norm-method option now sets which method should be used to compute scaling factors for the library sizes. The default method geometric is the same as prior releases of Cuffdiff (and the same as DESeq). The optional modes quartile and classic-fpkm are also supported.
    • The new --dispersion-method option controls how the variance model should be computed for each condition. The default mode pooled computes a mean-variance model for each condition that has multiple replicates, averages these models to generate a "pooled" average, and uses it for all conditions. This policy is borrowed from DESeq. Alternative models blind and per-condition are also supported. Prior versions of Cuffdiff used the method per-condition.
    • Several bugs for quartile normalization have been fixed.
    • Quartile normalization is no longer supported in Cufflinks, just in Cuffdiff. Cufflinks only supports the classic-fpkm mode.
    • All library size normalization is now conducted through the internal scaling factor. The external scaling factor should always be set to 1.0.
    • Library sizes and dispersions are now computed only on fragment counts from compatible fragments. Prior versions counted intronic and other structurally incompatible fragments in some sections of the code.
  • An optimized sampling procedure drastically improves running time for Cuffdiff. Cufflinks also benefits from this change. The improvements are particularly noticeable on deeply sequenced libraries.
  • The range of p-values that users should expect from Cuffdiff has changed. Because the test is now based on explicit sampling from the beta negative binomial, users will not see values less than 10^-5 by default. The test_stat field of Cuffdiff's output still contains the delta method's test statistic, but this test statistic is not used to compute p-values. It is preserved for backward compatibility with some functions in CummeRbund.
  • Some extraneous temporary output files have been removed after Cuffmerge runs.

Bowtie-bio-announce mailing list

Tuesday, 2 April 2013

TIL 'with open' in Python

I have always delayed learning about writing better code and what you don't know you often won't put into practice but when I do come across tips and optimizations, I use them heavily.
I am slightly embarrassed to say  that today I learnt about 'with open' for reading files in python from Zen of Python.

Read From a File

Use the with open syntax to read from files. This will automatically close files for you.
f = open('file.txt')
a =
print a
with open('file.txt') as f:
    for line in f:
        print line
The with statement is better because it will ensure you always close the file, even if an exception is raised.

There's some good reading on styles and preferred 'Pythonic' ways of coding if you follow the code style guide in The Hitchhiker's Guide to Python
One other good source is the plethora of online courses available now on Coursera or other sites e.g.

Learn to Program: Crafting Quality Code 
by Jennifer Campbell, Paul Gries

Datanami, Woe be me