Tuesday, 28 January 2014

Growth slows in cloud business

Has the cloud hype died down? Seagate reportedly missed analysts' estimates as growth slowed in its cloud storage business. With the exception of Galaxy or BaseSpace, it's hard to get researchers to buy into the cloud analysis paradigm, even though their sporadic usage patterns would fit the economics of cloud computing nicely. Maybe the cost of storage space, or of moving files into the cloud, is the biggest hurdle. My two cents.
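Here is a back-of-the-envelope way to see the 'sporadic usage' argument. Renting is cheaper than owning whenever yearly usage falls below a break-even number of hours. The sketch below is purely illustrative. Every price in it is a hypothetical placeholder rather than a quote from any provider, and it deliberately leaves out the storage and transfer costs mentioned above.

```python
# Back-of-the-envelope cloud vs. owned-server comparison.
# All prices below are HYPOTHETICAL placeholders, not real provider quotes.

def annual_cloud_cost(hours_per_year: float, price_per_hour: float) -> float:
    """Pay only for the hours actually used."""
    return hours_per_year * price_per_hour


def annual_owned_cost(purchase_price: float, lifetime_years: float,
                      running_cost_per_year: float) -> float:
    """Amortise the hardware over its lifetime and add fixed running costs."""
    return purchase_price / lifetime_years + running_cost_per_year


def break_even_hours(price_per_hour: float, purchase_price: float,
                     lifetime_years: float, running_cost_per_year: float) -> float:
    """Usage (hours/year) above which owning becomes cheaper than renting."""
    owned = annual_owned_cost(purchase_price, lifetime_years, running_cost_per_year)
    return owned / price_per_hour


if __name__ == "__main__":
    # Hypothetical: a $2/hour cloud node vs. a $12,000 server kept for 4 years
    # with $1,000/year for power and admin.
    hours = break_even_hours(2.0, 12_000, 4, 1_000)
    print(f"Owning wins above ~{hours:.0f} hours of use per year")
```

With those made-up numbers, a lab that only crunches data a few hundred hours a year sits well below the line, which is the economic case for the cloud. Add per-GB storage and transfer charges and the line moves.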


Tuesday, 21 January 2014

That "$1000 genome" is going to cost you $72M

Lol.
A tongue-in-cheek look at the lowered cost of sequencing: http://blog.allseq.com/1000-genome-72m

Wednesday, 13 November 2013

I have been coding in C with no luck [joke]



Consumer Grade HDD are OK for Data backup

Ah, storage. Who doesn't need more of it? Cheaply, I might add.
The folks at Backblaze have published their own field report on HDD failure rates, which makes interesting reading for anyone running a data center.
I had earlier read Google's study showing that temperature has little effect on HDD failure rates, and promptly removed the noisy HDD cooling fans from my Linux box.
Backblaze's latest blog post at http://blog.backblaze.com/2013/11/12/how-long-do-disk-drives-last/ has me thinking that colleagues elsewhere who run Backblaze-like setups should switch to consumer-grade HDDs to save on cost.
I do have an 80 GB Seagate HDD that has survived the years. Admittedly I am not sure what to do with it anymore: it is too small (80 GB) to be useful and too big (3.5") to be portable. It was my main HDD until its size rendered it obsolete, so now it sits in a USB HDD dock that I use occasionally.
You may be able to find out a drive's age by looking up its serial number, but I use the SMART data shown in Ubuntu's Disk Utility.
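If you prefer the command line, smartmontools' `smartctl -A /dev/sdX` prints the same attribute table that Disk Utility shows. Here is a minimal sketch that pulls out attribute 9 (`Power_On_Hours`) and converts it to days. It assumes the common ATA table layout, with the raw value in the last column. Vendors encode raw values differently, so treat the result as an estimate. The sample line is invented for illustration.

```python
import re
import subprocess
from typing import Optional


def power_on_hours(smartctl_output: str) -> Optional[int]:
    """Return the raw Power_On_Hours value (SMART attribute 9), or None.

    Assumes the usual ATA table layout printed by `smartctl -A`:
    ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    for line in smartctl_output.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[0] == "9" and fields[1] == "Power_On_Hours":
            # Some drives append extra text (e.g. "4800h+12m"); keep the leading digits.
            match = re.match(r"\d+", fields[9])
            return int(match.group()) if match else None
    return None


def read_smart(device: str) -> str:
    """Run smartctl on a device (usually needs root).

    check=False because smartctl uses non-zero exit bits for warnings.
    """
    result = subprocess.run(["smartctl", "-A", device],
                            capture_output=True, text=True, check=False)
    return result.stdout


if __name__ == "__main__":
    # Invented sample line, for illustration only.
    sample = ("  9 Power_On_Hours          0x0032   097   097   000    "
              "Old_age   Always       -       4800\n")
    hours = power_on_hours(sample)
    print(f"{hours} hours ~ {hours / 24:.0f} days powered on")
```

On a real machine, something like `power_on_hours(read_smart("/dev/sda"))` should work, usually as root. `/dev/sda` is just an example device.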

My ancient 3.5" HDD




However, as you can see from the screenshot, the age is only an estimate of the number of days the drive has been powered on.
Powered on for only 314 days!

Hmm, pretty low mileage for an 80 GB HDD, eh?
Check out this 320 GB IDE HDD:



Even lower mileage! I'm not too sure of this drive's history, so I can't really comment here.

Completely anecdotal, but I had three Seagate 1 TB HDDs die within a year in a five-disk software RAID array on CentOS. When I checked the power-on days, SMART reported that they had been running for 3 years. So I am:
1) confused about how SMART records HDD age
2) in agreement with Backblaze that HDDs have distinct failure phases (where usage patterns perhaps play less of a role)
3) guessing that most of the data at Backblaze is archival in nature, i.e. written once and forgotten until disaster strikes. So it would be great if Backblaze could 'normalize' HDD lifespans against per-drive data access patterns, to make the results more relevant to a crowd whose usage differs somewhat from pure archival storage.
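The 'failure phases' idea can be sketched as a piecewise survival model. Each phase gets its own annual failure rate, and the fraction of drives still running is the product of the per-phase survival terms. The phase lengths and rates below are hypothetical placeholders, not Backblaze's published figures. Plug in theirs (or your own) to compare.

```python
from typing import List, Tuple

# (phase_length_years, annual_failure_rate); the values used below are HYPOTHETICAL.
Phase = Tuple[float, float]


def survival(years: float, phases: List[Phase]) -> float:
    """Fraction of drives expected to survive `years`, assuming a constant
    annual failure rate within each phase. The last phase extends indefinitely,
    so its length is ignored."""
    alive = 1.0
    remaining = years
    for i, (length, afr) in enumerate(phases):
        is_last = i == len(phases) - 1
        span = remaining if is_last else min(remaining, length)
        alive *= (1.0 - afr) ** span
        remaining -= span
        if remaining <= 0:
            break
    return alive


if __name__ == "__main__":
    phases = [(1.5, 0.05),   # early "infant mortality" phase
              (1.5, 0.015),  # quieter middle years
              (0.0, 0.12)]   # wear-out phase (length ignored: last phase)
    for y in (1, 3, 4):
        print(f"year {y}: {survival(y, phases):.1%} still running")
```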

That said, I think it's an excellent read if you are concerned about using consumer-grade HDDs. Kudos to the Backblaze team, who managed to 'shuck' 5.5 petabytes of raw HDD capacity to weather the Thailand flood crisis (I wonder how that affected the economics of using consumer-grade HDDs).

As usual, YMMV. Feel free to use consumer-grade HDDs for your archival needs, but be sure to build redundancy and resilience into your system like the folks at Backblaze.
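Here is a rough number for 'build in redundancy'. If each drive independently fails with probability p during some window, the chance of losing more drives than the array can tolerate follows the binomial distribution. This is a simplified sketch with a hypothetical p. Real failures are often correlated (same batch, same enclosure), and arrays are most exposed while rebuilding, so the true risk is likely higher than this estimate.

```python
from math import comb


def prob_array_loss(n_drives: int, tolerated_failures: int, p_fail: float) -> float:
    """P(more than `tolerated_failures` of `n_drives` fail), assuming each drive
    fails independently with probability `p_fail` over the same time window."""
    p_ok = sum(
        comb(n_drives, k) * p_fail**k * (1 - p_fail) ** (n_drives - k)
        for k in range(tolerated_failures + 1)
    )
    return 1.0 - p_ok


if __name__ == "__main__":
    p = 0.05  # HYPOTHETICAL per-drive failure probability for the window
    for parity in (0, 1, 2):  # e.g. striping only, single parity, double parity
        print(f"5 drives, tolerates {parity}: {prob_array_loss(5, parity, p):.4%}")
```

Even with toy numbers, going from zero to one or two tolerated failures cuts the loss probability dramatically. That is why Backblaze-style setups lean on redundancy rather than premium drives.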

Monday, 28 October 2013

Illumina Webinar Series: Sequencing Difficult Templates - Why Quality is Everything

http://mkt.illumina.com/index.php/email/emailWebview?mkt_tok=3RkMMJWWfF9wsRohuqzKZKXonjHpfsXw6ekoW6Gg38431UFwdcjKPmjr1YYGT8p0aPyQAgobGp5I5FEOSrDYRKV4t6wPXA%3D%3D



Date: Wednesday, November 6
Time: 1:00 pm AEDT
Speaker: Josquin Tibbits, PhD, Senior Research Scientist, Dept of Environment and Primary Industries


Abstract

For most applications, sequence quality (low error rates, correct library size, even coverage, etc.) stands out as the key metric for the downstream utility of data from NGS platforms. I investigate the quality and utility of data generated from a range of platforms (454, HiSeq, MiSeq and PGM) for the reference-initiated assembly of homopolymer, repeat and low-complexity plant plastid genomes. These types of sequences are a good proxy for the more difficult sequence regions found when exploring larger genomes in both agricultural and human sequencing projects. The analysis will show in detail how the different platforms cope with these challenging regions.
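For readers new to the jargon, a homopolymer is a run of one repeated base (e.g. AAAAAAA). Flow-based platforms such as 454 and PGM are known to miscount the length of long runs. Here is a minimal sketch that flags such runs in a sequence. The minimum run length of 6 is an arbitrary illustrative cutoff.

```python
import re
from typing import List, Tuple


def homopolymer_runs(seq: str, min_len: int = 6) -> List[Tuple[int, str, int]]:
    """Return (0-based start, base, length) for every run of a single base
    at least `min_len` long. min_len=6 is an arbitrary illustrative cutoff."""
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.group(1), len(m.group()))
            for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    # Toy sequence: a 7-base A run and an 8-base T run.
    toy = "GATTAC" + "A" * 7 + "GC" + "T" * 8 + "C"
    print(homopolymer_runs(toy))
```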

Wednesday, 2 October 2013

Biome | Q&A with Rich Roberts on single-molecule sequencing technology

For me, the exciting part about single-molecule sequencing was the ability to sequence low-abundance transcripts or obtain phased haplotypes in human sequencing. A high error rate nullifies any advantage in these areas. But I guess in the end it's a tool, and what matters is how you use it to get meaningful results.


"As the previously rapid climb in cost efficiency brought about by next-generation sequencing plateaus, the failure of single-molecule sequencing to deliver might leave some genomics aficionados despondent about the prospects for their field. But a recent Correspondence article in Genome Biology saw Nobel laureate Richard Roberts, together with Cold Spring Harbor’s Mike Schatz and Mauricio Carneiro of the Broad Institute, argue that the latest iteration of Pacific Biosciences’ SMRT platform is a powerful tool, whose value should be reassessed by a skeptical community.
In this Q&A, Roberts tells us why he thinks there’s a need for re-evaluation, and what sparked his interest in genomics in the first place."

http://www.biomedcentral.com/biome/rich-roberts-discusses-single-molecule-sequencing-technology/


Correspondence

The advantages of SMRT sequencing

Roberts RJ, Carneiro MO and Schatz MC
Genome Biology 2013, 14:405


Wednesday, 28 August 2013

Case-Based Introduction to Biostatistics with Scott Zeger

For those who might be keen, this course is on Coursera:
https://www.coursera.org/course/casebasedbiostat




About the Course

The course objective is to enable each student to enhance his or her quantitative scientific reasoning about problems related to human health. Biostatistics is about quantitative approaches - ideas and skills - to address bioscience and health problems. To achieve mastery of biostatistics skills, a student must “see one, do one, teach one.” Therefore, the course is organized to promote regular practice of new ideas and methods.

The course is organized into 3 self-contained modules. Each module except the first is built around an important health problem. The first module reviews the scientific method and the role of experimentation and observation to generate data, or evidence, relevant to selecting among competing hypotheses about the natural world. Bayes' theorem is used to quantify the concept of evidence. Then, we will discuss what is meant by the notion of “cause.”
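For anyone rusty on how Bayes' theorem quantifies evidence, the odds form is the cleanest statement. The likelihood ratio (Bayes factor) measures how strongly the data D favor hypothesis H_1 over H_2. This is the standard textbook identity, not necessarily the notation the course uses. It is followed by a toy example with made-up numbers: even prior odds and data four times as likely under H_1, with H_1 and H_2 the only two possibilities.

```latex
\frac{P(H_1 \mid D)}{P(H_2 \mid D)}
  = \underbrace{\frac{P(D \mid H_1)}{P(D \mid H_2)}}_{\text{likelihood ratio}}
    \times \frac{P(H_1)}{P(H_2)}

\text{posterior odds} = 4 \times 1 = 4
  \;\Rightarrow\; P(H_1 \mid D) = \frac{4}{1 + 4} = 0.8
```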
 
In the second module, we use a national survey dataset to estimate the costs of smoking and smoking-caused disease in American society. The concepts of point and interval estimation are introduced. Students will master the use of confidence intervals to draw inferences about population means and differences of means. They will use stratification and weighted averages to compare subgroups that are otherwise similar in an attempt to estimate the effects of smoking and smoking-caused diseases on medical expenditures.

In the final module, we will study what factors influence child-survival in Nepal using data from the Nepal Nutritional Intervention Study Sarlahi or NNIPPS. Students will estimate and obtain confidence intervals for infant survival rates, relative rates and odds ratios within strata defined by gestational period, singleton vs twin births, and parental characteristics.
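As a taste of the last module, here is a minimal sketch of an odds ratio with an approximate 95% confidence interval from a 2x2 table. It uses the common Woolf (log) method, which is one standard approach and not necessarily the one taught in the course. The counts are made up.

```python
import math
from typing import Tuple


def odds_ratio_ci(a: int, b: int, c: int, d: int,
                  z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio and Woolf (log-scale) confidence interval for a 2x2 table:

                  died   survived
        exposed     a        b
        unexposed   c        d

    All four cells must be positive for this interval to be defined.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for the Woolf interval")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    return (math.exp(log_or),
            math.exp(log_or - z * se),
            math.exp(log_or + z * se))


if __name__ == "__main__":
    # Made-up counts, purely for illustration.
    or_, lo, hi = odds_ratio_ci(30, 70, 15, 85)
    print(f"OR = {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```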

Recommended Background

Interest in the scientific method as broadly related to human health. Ability to reason precisely. Mathematics through pre-calculus.

Suggested Readings

The Mismeasure of Man by Stephen J. Gould is an outstanding resource, but it is not a required text for this course.

Datanami, Woe be me