Wednesday, 18 April 2012

R Function for Stratified Sampling « Adam On Analytics

I got this in my email. Without trying to sound snobbish, I realise that 'extremely large' can be a very relative descriptor for a dataset (30k data rows are considered extremely large here).

Feedback on R function for stratified sampling of extremely large datasets, with many groups to sample from.

Hey guys,

I am constantly trying to improve my R code. I ran into an issue today where I had to draw random samples from several groups, with equal size from each group, from an extremely large dataset. I tried several functions I found online and I received errors about R memory issues. Thus, I had to write my own function. I got it to work, but I'd appreciate any feedback on how to improve my code. I attached it via the link below.
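
For anyone curious what equal-size sampling per group can look like, here is a minimal base R sketch. It is not the function from the email; `stratified_sample`, its arguments, and the toy data are all hypothetical names made up for illustration. The idea is to split row indices (rather than the rows themselves) by group, sample `n` indices from each group, and subset the data frame only once at the end, which should keep memory use modest.

```r
# Hypothetical helper (not the emailed function): draw n random rows
# from each group of a data frame, without replacement.
stratified_sample <- function(df, group_col, n) {
  # Split row indices, not rows, so no per-group copies of the data are made.
  idx_by_group <- split(seq_len(nrow(df)), df[[group_col]], drop = TRUE)

  picked <- lapply(idx_by_group, function(idx) {
    if (length(idx) < n) {
      stop("a group has fewer than n rows")
    }
    # Sample positions within idx; this also behaves correctly
    # when a group contains a single row.
    idx[sample.int(length(idx), n)]
  })

  # Subset the data frame once, using all sampled indices together.
  df[unlist(picked, use.names = FALSE), , drop = FALSE]
}

# Toy data with three groups of unequal size (made up for this example).
set.seed(1)
toy <- data.frame(
  group = rep(c("a", "b", "c"), times = c(1000, 500, 200)),
  value = rnorm(1700)
)

s <- stratified_sample(toy, "group", 50)
table(s$group)
```

Whether this avoids the memory errors mentioned in the email depends on the actual data size and on what the other functions were doing; for data that does not fit comfortably in memory, a different approach would be needed.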
