Kevin's GATTACA World: coursera


Saturday, 12 September 2015

Mining Massive Datasets by Stanford University on Coursera.

Let the fun begin ...

In the next seven weeks, we will present to you many of the important tools for extracting information from very large datasets. Each week there will be a number of videos to watch, and one or more homeworks to do. The materials are backed by a free online textbook, "Mining of Massive Datasets," which is also published by Cambridge University Press. You can download the book at http://www.mmds.org

The first week is devoted to two topics:

MapReduce: A programming system for easily implementing parallel algorithms on commodity clusters. This material is in the first four videos available for the week.
Link Analysis: The remaining seven videos discuss the PageRank algorithm that made Google more effective than previous search engines.
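To make the two topics concrete, here is a minimal single-machine sketch, not the course's own code. The helper names (`map_reduce`, `word_count`, `pagerank`), the damping factor `beta=0.85`, and the tiny inputs are all assumptions chosen for illustration. The first part imitates the map, group-by-key, reduce pattern with a word count; the second runs PageRank by power iteration on a small link graph in which every link target also appears as a key.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Single-process imitation of the MapReduce pattern (illustrative only)."""
    groups = defaultdict(list)
    # Map phase: each record emits zero or more (key, value) pairs.
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Reduce phase: combine all values collected for each key.
    return {key: reducer(key, values) for key, values in groups.items()}


def word_count(lines):
    """Count word occurrences with the map/reduce helper above."""
    return map_reduce(
        lines,
        mapper=lambda line: ((word, 1) for word in line.split()),
        reducer=lambda word, counts: sum(counts),
    )


def pagerank(links, beta=0.85, iterations=50):
    """Power-iteration PageRank with uniform teleporting.

    links maps each page to the list of pages it links to; every target
    is assumed to appear as a key. beta is an assumed damping factor.
    """
    nodes = sorted(links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every page receives the teleport share first.
        new_rank = {node: (1.0 - beta) / n for node in nodes}
        for node in nodes:
            targets = links[node]
            # A dead end spreads its rank evenly over all pages.
            receivers = targets if targets else nodes
            share = beta * rank[node] / len(receivers)
            for target in receivers:
                new_rank[target] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    print(word_count(["a b a", "b c"]))
    print(pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]}))
```

A real MapReduce system distributes the map and reduce phases across a cluster and handles failures; the sketch only shows the data flow.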

There is also a single homework covering both topics. This homework is classified as "Basic." See below for an explanation of basic vs. advanced work, and the significance.

Wednesday, 28 August 2013

Case-Based Introduction to Biostatistics with Scott Zeger

For those who might be keen, this course is on Coursera:
https://www.coursera.org/course/casebasedbiostat

About the Course

The course objective is to enable each student to enhance his or her quantitative scientific reasoning about problems related to human health. Biostatistics is about quantitative approaches - ideas and skills - to address bioscience and health problems. To achieve mastery of biostatistics skills, a student must “see one, do one, teach one.” Therefore, the course is organized to promote regular practice of new ideas and methods.

The course is organized into three self-contained modules. Each module except the first is built around an important health problem. The first module reviews the scientific method and the role of experimentation and observation in generating data, or evidence, relevant to selecting among competing hypotheses about the natural world. Bayes' theorem is used to quantify the concept of evidence. Then, we will discuss what is meant by the notion of “cause.”
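As a rough illustration of how Bayes' theorem quantifies evidence, the sketch below updates the probability of a hypothesis after one observation. The function name `posterior` and the numbers (a 1% prior, 90% sensitivity, 95% specificity) are made up for the example and do not come from the course.

```python
def posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """Bayes' theorem: P(H | E) = P(E | H) P(H) / P(E)."""
    # Total probability of seeing the evidence under either hypothesis.
    evidence = p_evidence_if_true * prior + p_evidence_if_false * (1.0 - prior)
    return p_evidence_if_true * prior / evidence


if __name__ == "__main__":
    # Hypothetical diagnostic test: 1% prevalence, 90% sensitivity,
    # 95% specificity (so a 5% false-positive rate).
    print(posterior(prior=0.01, p_evidence_if_true=0.90, p_evidence_if_false=0.05))
```

With these assumed numbers, a positive result raises the probability of disease from 1% to roughly 15%, which shows how much weight the prior carries.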

In the second module, we use a national survey dataset to estimate the costs of smoking and smoking-caused disease in American society. The concepts of point and interval estimation are introduced. Students will master the use of confidence intervals to draw inferences about population means and differences of means. They will use stratification and weighted averages to compare subgroups that are otherwise similar in an attempt to estimate the effects of smoking and smoking-caused diseases on medical expenditures.
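The estimation ideas in this module can be sketched in a few lines. This is an illustration under stated assumptions, not the course's method: it uses a normal approximation (`z = 1.96` for roughly 95% coverage), weights each stratum by its size, and the function names and expenditure figures are hypothetical.

```python
import math
import statistics


def mean_ci(values, z=1.96):
    """Point estimate and approximate confidence interval for a mean."""
    mean = statistics.mean(values)
    standard_error = statistics.stdev(values) / math.sqrt(len(values))
    return mean, mean - z * standard_error, mean + z * standard_error


def difference_ci(group_a, group_b, z=1.96):
    """Approximate confidence interval for a difference of two means."""
    diff = statistics.mean(group_a) - statistics.mean(group_b)
    standard_error = math.sqrt(
        statistics.variance(group_a) / len(group_a)
        + statistics.variance(group_b) / len(group_b)
    )
    return diff, diff - z * standard_error, diff + z * standard_error


def stratified_difference(strata):
    """Size-weighted average of within-stratum mean differences.

    strata is a list of (exposed, unexposed) pairs of observations, one
    pair per stratum (for example, per age group).
    """
    weighted_sum = 0.0
    total_size = 0
    for exposed, unexposed in strata:
        size = len(exposed) + len(unexposed)
        diff = statistics.mean(exposed) - statistics.mean(unexposed)
        weighted_sum += size * diff
        total_size += size
    return weighted_sum / total_size


if __name__ == "__main__":
    # Hypothetical medical expenditures (arbitrary units) in two strata.
    strata = [([10, 12], [8, 10]), ([20, 22, 24], [15, 17, 19])]
    print(mean_ci([1, 2, 3, 4, 5]))
    print(difference_ci([10, 12], [8, 10]))
    print(stratified_difference(strata))
```

Comparing smokers and non-smokers within each stratum before averaging keeps the comparison between subgroups that are otherwise similar, which is the point of stratification.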

In the final module, we will study which factors influence child survival in Nepal using data from the Nepal Nutritional Intervention Study Sarlahi (NNIPPS). Students will estimate and obtain confidence intervals for infant survival rates, relative rates, and odds ratios within strata defined by gestational period, singleton vs. twin births, and parental characteristics.
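For a single stratum, these quantities can be computed from a 2x2 table of counts. The sketch below is an assumption-laden illustration: it uses the common log-scale (Woolf) approximation for the odds-ratio interval, a simple risk ratio as a stand-in for a relative rate, and invented counts rather than any NNIPPS data.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with an approximate confidence interval on the log scale.

    Table layout (all counts must be positive):
        a = group 1 with outcome, b = group 1 without outcome
        c = group 2 with outcome, d = group 2 without outcome
    """
    ratio = (a * d) / (b * c)
    # Standard error of log(odds ratio) under the Woolf approximation.
    log_se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ratio) - z * log_se)
    upper = math.exp(math.log(ratio) + z * log_se)
    return ratio, lower, upper


def risk_ratio(a, b, c, d):
    """Ratio of outcome proportions in group 1 versus group 2."""
    return (a / (a + b)) / (c / (c + d))


if __name__ == "__main__":
    # Hypothetical stratum: 10 of 100 deaths among twins, 5 of 100 among singletons.
    print(odds_ratio_ci(10, 90, 5, 95))
    print(risk_ratio(10, 90, 5, 95))
```

Running the same calculation separately within each stratum (gestational period, singleton vs. twin, and so on) gives the stratum-specific estimates the module describes.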

Recommended Background

Interest in the scientific method as broadly related to human health. Ability to reason precisely. Mathematics through pre-calculus.
