Tuesday, 28 January 2014
Growth slows in cloud business
Tuesday, 21 January 2014
That "$1000 genome" is going to cost you $72M
Lol.
A tongue-in-cheek look at the lowered cost of sequencing: http://blog.allseq.com/1000-genome-72m
Wednesday, 13 November 2013
Consumer-Grade HDDs Are OK for Data Backup
The folks at Backblaze have published their own field report on HDD failure rates, which is interesting reading for anyone running a data center.
Earlier I had read Google's study suggesting that temperature doesn't affect HDD failure rates, and promptly removed the noisy HDD cooling fans from my Linux box.
Their latest blog post at http://blog.backblaze.com/2013/11/12/how-long-do-disk-drives-last/ has me thinking that colleagues elsewhere who run Backblaze-like setups should switch to consumer-grade HDDs to save on cost.
I do have an 80 GB Seagate HDD that has survived the years. Admittedly, I am not sure what to do with it anymore, as it is too small (80 GB) to be useful and too big (3.5") to be portable. It was my main HDD until its size rendered it obsolete, so now it sits in a USB HDD dock that I use occasionally.
Maybe you can find out a drive's age by looking up its serial number, but I use the SMART data that you can see in Ubuntu's Disk Utility.
[Image: My ancient 3.5" HDD]
[Image: Powered on only for 314 days!]
Hmm, pretty low mileage for an 80 GB HDD, eh?
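For anyone who prefers the command line to Disk Utility, a similar power-on figure can be pulled from the attribute table printed by smartmontools' `smartctl -A /dev/sdX`. This is a minimal sketch that assumes the common ATA layout, where attribute 9 is `Power_On_Hours` and the last column is the raw value in hours; some vendors encode this attribute differently, so treat the result as a rough check. `power_on_days` is a hypothetical helper name, and the sample line is made up for illustration, not read from the drives in this post.

```python
import re
from typing import Optional


def power_on_days(smartctl_output: str) -> Optional[float]:
    """Return power-on time in days from `smartctl -A` text, or None if absent."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Columns: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0] == "9" and fields[1] == "Power_On_Hours":
            # Raw value may look like "26280" or "26280h+05m+10.123s"; keep the leading hours.
            match = re.match(r"\d+", fields[9])
            if match:
                return int(match.group()) / 24
    return None


# Hypothetical sample output, not taken from the drives in this post.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   070   070   000    Old_age   Always       -       26280
"""

if __name__ == "__main__":
    print(power_on_days(SAMPLE))  # prints 1095.0 (days, i.e. about 3 years)
```

On a real machine you could feed it `subprocess.run(["smartctl", "-A", "/dev/sda"], capture_output=True, text=True).stdout` (usually as root). Keep in mind that this counter only advances while the drive is powered, so a drive that spends most of its life unplugged in a dock will show far fewer days than its calendar age.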
Check out this 320 GB IDE HDD:
Completely anecdotal, but I had three Seagate 1 TB HDDs die within a year in a five-drive software RAID array on CentOS. When I checked their power-on days, SMART said they had been running for 3 years. So I am
1) confused about how SMART data records HDD age
2) in agreement with Backblaze that HDDs have distinct failure phases (where usage patterns perhaps play less of a role)
3) guessing that most of the data at Backblaze is archival in nature, i.e. written once and forgotten until disaster strikes. So it would be great if Backblaze could 'normalize' HDD lifespan against each drive's data access pattern, to make the results more relevant to readers whose usage differs from pure data archival.
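The kind of normalization wished for in point 3 could look something like this: group drives by usage pattern and age band (a stand-in for the failure phases), then divide failures by drive-years of service to get an annualized failure rate (AFR) per group. This is a minimal sketch; the `DriveRecord` fields, workload labels, age bands, and numbers are hypothetical, not Backblaze data, and a real analysis would also have to track drives entering and leaving each band.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass
class DriveRecord:
    workload: str    # hypothetical usage label, e.g. "archival" or "busy"
    age_band: str    # hypothetical age bucket standing in for a failure phase, e.g. "0-1.5y"
    drive_days: int  # total days these drives spent in service within the band
    failures: int    # failures observed while in that band


def annualized_failure_rates(records: Iterable[DriveRecord]) -> Dict[Tuple[str, str], float]:
    """AFR per (workload, age band) = failures / drive-years, where drive-years = drive-days / 365."""
    days: Dict[Tuple[str, str], int] = defaultdict(int)
    fails: Dict[Tuple[str, str], int] = defaultdict(int)
    for r in records:
        key = (r.workload, r.age_band)
        days[key] += r.drive_days
        fails[key] += r.failures
    return {key: fails[key] / (days[key] / 365) for key in days if days[key] > 0}


if __name__ == "__main__":
    # Hypothetical records for illustration only.
    sample = [
        DriveRecord("archival", "0-1.5y", 36500, 2),
        DriveRecord("busy", "0-1.5y", 36500, 5),
        DriveRecord("busy", "0-1.5y", 36500, 1),
    ]
    for (workload, band), afr in sorted(annualized_failure_rates(sample).items()):
        print(f"{workload:>8} {band}: {afr:.1%} per drive-year")
```

Dividing by drive-years rather than counting drives is what makes groups of different sizes and ages comparable.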
That said, I think it's an excellent read if you are concerned about using consumer-grade HDDs. Kudos to the Backblaze team, who managed to 'shuck' 5.5 petabytes of raw HDD capacity out of external drives to weather the Thailand crisis (I wonder how that affected the economics of their consumer-grade HDD strategy).
As usual, YMMV. Feel free to use consumer-grade HDDs for your archival needs, but be sure to build redundancy and resilience into your system, like the folks at Backblaze.
Monday, 28 October 2013
Illumina Webinar Series: Sequencing Difficult Templates - Why Quality is Everything
Date: Wednesday, November 6
Time: 1:00 pm AEDT
Speaker: Josquin Tibbits, PhD, Senior Research Scientist, Dept of Environment and Primary Industries
Abstract
For most applications, sequence quality (low error rates, correct library size, even coverage, etc.) stands out as the key metric for the downstream utility of data from NGS platforms. I investigate the quality and utility of data generated from a range of platforms (454, HiSeq, MiSeq and PGM) for the reference-initiated assembly of homopolymer, repeat and low-complexity plant plastid genomes. These types of sequences are a good proxy for the more difficult sequence regions found when exploring larger genomes in both agricultural and human sequencing projects. The analysis will show in detail how the different platforms cope with these challenging regions.
Wednesday, 2 October 2013
Biome | Q&A with Rich Roberts on single-molecule sequencing technology
"As the previously rapid climb in cost efficiency brought about by next-generation sequencing plateaus, the failure of single-molecule sequencing to deliver might leave some genomics aficionados despondent about the prospects for their field. But a recent Correspondence article in Genome Biology saw Nobel laureate Richard Roberts, together with Cold Spring Harbor’s Mike Schatz and Mauricio Carneiro of the Broad Institute, argue that the latest iteration of Pacific Biosciences’ SMRT platform is a powerful tool, whose value should be reassessed by a skeptical community."
http://www.biomedcentral.com/biome/rich-roberts-discusses-single-molecule-sequencing-technology/
Correspondence

The advantages of SMRT sequencing
Roberts RJ, Carneiro MO and Schatz MC
Genome Biology 2013, 14:405

Wednesday, 28 August 2013
Case-Based Introduction to Biostatistics with Scott Zeger
https://www.coursera.org/course/casebasedbiostat
About the Course
The course is organized into 3 self-contained modules. Each module except the first is built around an important health problem. The first module reviews the scientific method and the role of experimentation and observation to generate data, or evidence, relevant to selecting among competing hypotheses about the natural world. Bayes theorem is used to quantify the concept of evidence. Then, we will discuss what is meant by the notion of “cause.”
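As a small illustration of using Bayes' theorem to quantify evidence (one common formulation, not necessarily the one the course uses), the odds form says posterior odds = likelihood ratio times prior odds, where the likelihood ratio, or Bayes factor, measures how much more likely the data are under one hypothesis than under the other. The function name and numbers below are hypothetical.

```python
def posterior_probability(prior_h1: float, p_data_given_h1: float, p_data_given_h0: float) -> float:
    """Probability of hypothesis H1 after seeing the data, for two competing hypotheses H1 and H0."""
    if not 0.0 < prior_h1 < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    if p_data_given_h0 <= 0.0:
        raise ValueError("P(data | H0) must be positive")
    prior_odds = prior_h1 / (1.0 - prior_h1)
    # Bayes factor: how many times more likely the data are under H1 than under H0.
    bayes_factor = p_data_given_h1 / p_data_given_h0
    posterior_odds = bayes_factor * prior_odds
    # Convert odds back to a probability.
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    # Hypothetical: even prior odds, data 4x more likely under H1.
    print(posterior_probability(0.5, 0.8, 0.2))  # prints 0.8
```

A Bayes factor of 1 leaves the prior unchanged, which is one way to say the data carry no evidence either way.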
In the second module, we use a national survey dataset to estimate the costs of smoking and smoking-caused disease in American society. The concepts of point and interval estimation are introduced. Students will master the use of confidence intervals to draw inferences about population means and differences of means. They will use stratification and weighted averages to compare subgroups that are otherwise similar in an attempt to estimate the effects of smoking and smoking-caused diseases on medical expenditures.
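To make the stratification idea concrete, here is a minimal sketch that takes a weighted average of within-stratum differences in mean expenditure (for example smokers vs non-smokers) and attaches a normal-approximation 95% confidence interval. It assumes independent strata and samples large enough for z = 1.96 to be reasonable; the `Stratum` fields, weights, and numbers are hypothetical and not from the course dataset.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Stratum:
    weight: float         # hypothetical stratum weight, e.g. its population share
    mean_exposed: float   # mean expenditure in the exposed group (e.g. smokers)
    sd_exposed: float
    n_exposed: int
    mean_unexposed: float
    sd_unexposed: float
    n_unexposed: int


def stratified_difference(strata: List[Stratum], z: float = 1.96) -> Tuple[float, float, float]:
    """Weighted average of within-stratum mean differences, with (estimate, lower, upper).

    Assumes strata are independent, so the variance of the weighted average is
    the sum of squared normalized weights times each stratum's variance."""
    total_w = sum(s.weight for s in strata)
    estimate = 0.0
    variance = 0.0
    for s in strata:
        w = s.weight / total_w
        diff = s.mean_exposed - s.mean_unexposed
        # Variance of a difference of two independent sample means.
        se2 = s.sd_exposed ** 2 / s.n_exposed + s.sd_unexposed ** 2 / s.n_unexposed
        estimate += w * diff
        variance += w ** 2 * se2
    half_width = z * math.sqrt(variance)
    return estimate, estimate - half_width, estimate + half_width


if __name__ == "__main__":
    # Hypothetical strata (e.g. two age groups), for illustration only.
    strata = [
        Stratum(0.6, 1200.0, 300.0, 100, 1000.0, 250.0, 120),
        Stratum(0.4, 2500.0, 600.0, 80, 2100.0, 500.0, 90),
    ]
    print(stratified_difference(strata))
```

Comparing within strata first, then averaging, keeps a difference in stratum make-up between the two groups from masquerading as an effect of smoking.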
In the final module, we will study what factors influence child-survival in Nepal using data from the Nepal Nutritional Intervention Study Sarlahi or NNIPPS. Students will estimate and obtain confidence intervals for infant survival rates, relative rates and odds ratios within strata defined by gestational period, singleton vs twin births, and parental characteristics.
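For the odds ratios mentioned above, a common textbook approach (Woolf's method; the course may present a different one) computes the odds ratio from a 2x2 table and builds the confidence interval on the log scale. Within each stratum, such as singleton vs twin births, you would fill a separate table. The function name and counts below are hypothetical, not data from the Nepal study.

```python
import math
from typing import Tuple


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio with a Woolf (log-scale) confidence interval for a 2x2 table:

                   died   survived
        exposed      a        b
        unexposed    c        d
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for the Woolf interval")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) is the square root of the summed reciprocal cell counts.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se_log), math.exp(log_or + z * se_log)


if __name__ == "__main__":
    # Hypothetical counts for one stratum.
    print(odds_ratio_ci(20, 80, 10, 90))
```

Working on the log scale keeps the interval asymmetric around the odds ratio and always positive, which matches how ratios behave.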