Showing posts with label rstat. Show all posts

Wednesday, 6 February 2013

Handling R packages Feb 2013 issue Linux Journal

The kind folks at http://www.linuxjournal.com/ have provided me with a copy of the Feb 2013 issue. I can't tell you how much Linux I have picked up from the magazine, with its easy prose and graphical howtos. In the Feb 2013 issue, they focus on the theme of system administration. There are definitely useful things inside for the starting bioinformatician who wishes to dabble with working directly off a *nix machine :)

Other topics in this issue include:


In the February 2013 issue:
  • Manage Your Virtual Deployment with ConVirt
  • Use Fabric for Sysadmin Tasks on Remote Machines
  • Spin up Linux VMs on Azure
  • Make Your Android Device Play with Your Linux Box
  • Create a Colocated Server with Raspberry Pi


You can check out a preview of the contents here

February 2013 Issue of Linux Journal: System Administration

Tuesday, 4 September 2012

R one-liners : do you have any to contribute?

Not sure if installing packages to use one-liners counts, but heck... anything to improve productivity, eh?

excerpted from Jeffrey's blog http://jeffreybreen.wordpress.com/2011/07/21/one-liners-twitter/



One-liners which make me love R: twitteR’s searchTwitter() #rstats


R reminds me a lot of English. It’s easy to get started, but very difficult to master. So for all those times I’ve spent… well, forever… trying to figure out the “R way” of doing something, I’m glad to share these quick wins.
My recent R tutorial on mining Twitter for consumer sentiment wouldn’t have been possible without Jeff Gentry’s amazing twitteR package (available on CRAN). It does so much of the behind-the-scenes heavy lifting to access Twitter’s REST APIs, that one line of code is all you need to perform a search and retrieve the (even paginated) results:
library(twitteR)
tweets = searchTwitter("#rstats", n=1500)
You can search for anything, of course, “#rstats” is just an example. (And if you’re really into that hashtag, the twitteR package even provides an Rtweets() function which hardcodes that search string for you.) The n=1500 specifies the maximum number of tweets supported by the Search API, though you may retrieve fewer as Twitter’s search indices contain only a couple of days’ tweets.


Tuesday, 21 August 2012

Book: Applied Statistical Genetics with R, with data & code!

Check this out! http://people.umass.edu/foulkes/asg/examples.html
This book is intended to provide fundamental statistical concepts and R tools relevant to the analysis of genetic data arising from population-based association studies. The statistical methods described are broadly relevant to the field of statistical genetics and include a large array of tools for a wide variety of medical and public health applications. Data analytic methods include approaches to handling multiplicity, ambiguity in haplotypic phase and underlying gene-gene and gene-environment interactions. Several publicly available data sets are used for illustration.
  • Chapter 1
      #  1.1: Identifying the minor allele and its frequency
  • Chapter 2
      #  2.1: Chi-squared test for association
      #  2.2: Fisher's exact test for association
      #  2.3: Cochran-Armitage (C-A) trend test for association
      #  2.4: Two-sample tests for association for a quantitative trait
      #  2.5: M-sample tests of association for a quantitative trait
      #  2.6: Linear regression
  • Chapter 3
      #  3.1: Measuring LD using D-prime
      #  3.2: Measuring LD for a group of SNPs
      #  3.3: Measuring LD based on r^2 and the \chi^2-statistic
      #  3.4: Determining average LD across multiple SNPs
      #  3.5: Population substructure and LD
      #  3.6: Testing for HWE using Pearson's \chi^2-test
      #  3.7: Testing for HWE using Fisher's exact test
      #  3.8: HWE and geographic origin
      #  3.9: Generating a similarity matrix
      #  3.10: Multidimensional scaling (MDS) for identifying population substructure
      #  3.11: Principal components analysis (PCA) for identifying population substructure
  • Chapter 4
      #  4.1: Bonferroni adjustment
      #  4.2: Tukey's single-step method
      #  4.3: Benjamini and Hochberg (B-H) adjustment
      #  4.4: Benjamini and Yekutieli (B-Y) adjustment
      #  4.5: Calculation of the q-value
      #  4.6: Free step down resampling adjustment
      #  4.7: Null unrestricted bootstrap approach
  • Chapter 5
      #  5.1: EM approach to haplotype frequency estimation
      #  5.2: Calculating posterior haplotype probabilities
      #  5.3: Testing hypotheses about haplotype frequencies within the EM framework
      #  5.4: Application of haplotype trend regression (HTR)
      #  5.5: Multiple imputation for haplotype effect estimation and testing
      #  5.6: EM for estimation and testing of haplotype-trait association
  • Chapter 6
      #  6.2: Creating a classification tree
      #  6.3: Generating a regression tree
      #  6.4: Categorical and ordinal predictors in a tree
      #  6.5: Cost-complexity pruning
  • Chapter 7
      #  7.1: An application of random forests
      #  7.2: RF with missing SNP data - single imputation
      #  7.3: RF with missing SNP data - multiple imputation
      #  7.4: MIRF
      #  7.5: Application of logic regression
      #  7.6: Monte Carlo logic regression
      #  7.7: An application of MARS
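
For a rough taste of the kind of thing covered in Chapter 1 (minor allele frequency) and Chapter 4 (multiplicity adjustments), here is a minimal base R sketch. This is not the book's code: the genotype vector and the p-values are made-up toy data, and all the variable names are my own. It only relies on base R functions such as table() and p.adjust().

```r
# Toy data only: genotypes for one SNP as two-letter strings (hypothetical values).
genotypes <- c("AA", "AG", "GG", "AG", "AA", "AA", "AG", "AA")

# Split each genotype into its two alleles and compute allele frequencies.
alleles <- unlist(strsplit(genotypes, ""))
allele_freq <- table(alleles) / length(alleles)

# The minor allele is the less frequent of the two.
minor_allele <- names(which.min(allele_freq))
maf <- as.numeric(min(allele_freq))

# Toy p-values standing in for several association tests (hypothetical values).
p_raw <- c(0.001, 0.012, 0.030, 0.045, 0.200)

# Compare a few multiplicity adjustments that ship with base R.
p_adj <- data.frame(
  raw        = p_raw,
  bonferroni = p.adjust(p_raw, method = "bonferroni"),
  BH         = p.adjust(p_raw, method = "BH"),
  BY         = p.adjust(p_raw, method = "BY")
)

print(minor_allele)
print(maf)
print(p_adj)
```

The book's own examples at the link above work on real data sets, so treat this only as a warm-up for the function calls involved.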

Getting Started with R and Hadoop

Getting Started with R and Hadoop: (This article was first published on Revolutions, and kindly contr... 


For newcomers to map-reduce programming with R and Hadoop, Jeffrey's presentation includes a step-by-step example of computing flight times from air traffic data. The last few slides cover some advanced features: how to work directly with files in HDFS from R with the rhdfs package; and how to simulate a Hadoop cluster on the local machine (useful for development, testing and learning RHadoop). Jeffrey also mentions that the RHadoop tutorial is a good resource for new users.
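
To get a feel for the map-reduce idea before touching a cluster, the map, shuffle and reduce steps can be mimicked in plain R. The sketch below is only an illustration under my own assumptions: it does not use the RHadoop packages or their API, the flight records are made-up toy values (not the air traffic data from the presentation), and names such as map_fn and reduce_fn are hypothetical.

```r
# Toy flight records (hypothetical values, not the data used in the talk).
flights <- data.frame(
  carrier  = c("AA", "AA", "UA", "UA", "UA", "DL"),
  air_time = c(120, 150, 90, 110, 100, 200),
  stringsAsFactors = FALSE
)

# Map: turn each record into a (key, value) pair.
map_fn <- function(row) list(key = row$carrier, value = row$air_time)
pairs <- lapply(seq_len(nrow(flights)), function(i) map_fn(flights[i, ]))

# Shuffle: group the values by key, as a framework would do between map and reduce.
keys    <- vapply(pairs, function(p) p$key, character(1))
values  <- vapply(pairs, function(p) p$value, numeric(1))
grouped <- split(values, keys)

# Reduce: collapse each key's values into one summary (mean air time here).
reduce_fn <- function(key, vals) mean(vals)
mean_air_time <- mapply(reduce_fn, names(grouped), grouped)

print(mean_air_time)
```

On a real cluster the grouping step is handled by Hadoop and the map and reduce functions run in parallel on chunks of the data; the structure of the two functions is the part that carries over.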
You can find Jeffrey's slides embedded below, and a video of the presentation is also available. You might also want to check out Jeffrey's older presentation Big Data Step-by-Step for tips on setting up a compute environment with Hadoop and R.



Datanami, Woe be me