Tuesday 11 September 2012

sqldf - SQL select on R data frames - Google Project Hosting

Love the quote! 
To write it, it took three months; to conceive it – three minutes; to collect the data in it – all my life. F. Scott Fitzgerald

https://code.google.com/p/sqldf/

sqldf is an R package for running SQL statements on R data frames, optimized for convenience. The user simply specifies an SQL statement in R, using data frame names in place of table names. Behind the scenes, a database with the appropriate table layouts/schema is automatically created, the data frames are loaded into it, the SQL statement is run, the result is read back into R, and the database is deleted. All of this happens automatically, so the database's existence is transparent to the user, who only specifies the SQL statement. Surprisingly, this can at times be even faster than the corresponding pure R calculation (although the purpose of the project is convenience, not speed). (There are some additional benchmarks here which suggest that sqldf might be faster than R on aggregates but slower on joins.) sqldf is free software published under the GNU General Public License and can be downloaded from CRAN.
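
The create, load, query, read back, delete cycle described above can be sketched outside R as well. The following is only a rough illustration in Python using the standard-library sqlite3 module, not sqldf's actual implementation; the helper name run_sql_on_tables and the sample sales table are made up for this example.

```python
import sqlite3


def run_sql_on_tables(sql, tables):
    """Run `sql` against in-memory tables, then discard the database.

    `tables` maps a table name to a (column_names, rows) pair.
    Hypothetical helper for illustration; this is not part of sqldf.
    """
    # Step 1: create a throwaway in-memory database.
    conn = sqlite3.connect(":memory:")
    try:
        # Step 2: create one table per input and load its rows.
        for name, (columns, rows) in tables.items():
            col_list = ", ".join(f'"{c}"' for c in columns)
            placeholders = ", ".join("?" for _ in columns)
            conn.execute(f'CREATE TABLE "{name}" ({col_list})')
            conn.executemany(
                f'INSERT INTO "{name}" VALUES ({placeholders})', rows
            )
        # Step 3: run the user's SQL statement and read the result back.
        cursor = conn.execute(sql)
        header = [d[0] for d in cursor.description]
        return header, cursor.fetchall()
    finally:
        # Step 4: closing the connection deletes the in-memory database.
        conn.close()


# Made-up sample data standing in for a data frame.
sales = (["region", "amount"], [("north", 10), ("south", 5), ("north", 7)])

header, result = run_sql_on_tables(
    "SELECT region, SUM(amount) AS total FROM sales "
    "GROUP BY region ORDER BY region",
    {"sales": sales},
)
print(header, result)
```

The caller only ever sees the SQL statement and the result; the database lives and dies inside the helper, which is the kind of transparency the sqldf description is talking about.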


sqldf supports (1) the SQLite backend database (by default), (2) the H2 Java database, (3) the PostgreSQL database and (4) MySQL, from sqldf 0.4-0 onwards. SQLite, H2, MySQL and PostgreSQL are free software. SQLite and H2 are embedded, serverless, zero-administration databases that are included right in the R driver packages.

