One thing that is done very well in the R language is saving "data sets" compactly while preserving metadata.
The basic R structure corresponding to a data set in, say, SPSS or SAS, a table in an SQL database, or a sheet in a spreadsheet is the data.frame.
Recently Hadley Wickham tweeted about building an R package containing the injury data from NEISS and Jenny Bryan asked about CSV files. This prompted a discussion about saving data frames in the .rds or .rda/.RData formats, as opposed to a quasi-format like a CSV file. (Die spreadsheets! Die!)
I completely agree that .rds or .rda is the way to go, which is why I wrote the read_rda function for the Julia DataFrames package and later, with others, wrote the RCall package.
To successfully install the RCall package you must have a version of R accessible but, as R is an open source project, that should not be an impediment.
To obtain the data from the neiss package for R as Julia data frames, install the R package as described at that repository, then use
In [1]:
using RCall
population = rcopy("neiss::population");
products = rcopy("neiss::products");
injuries = rcopy("neiss::injuries");
size(injuries)
Out[1]:
In [2]:
dump(injuries) # sort of like R's str function but not as polished
In [3]:
names(injuries)
Out[3]:
That's it.
If you want to be a little fancier, you can import the entire R package as a Julia module:
In [4]:
@rimport neiss
The objects in this module are Julia types corresponding to the underlying R objects called SEXPRECs.
In [5]:
whos(neiss) # reads like Santa Claus deciding who's naughty and ..
You can work with the R objects in Julia, calling R functions on them and so on, but most of the time it is simpler to copy the R data frame to Julia.
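To make the two approaches concrete, here is a minimal sketch that does not need the neiss package. It assumes a working R installation and uses the mtcars data frame from R's built-in datasets package as a stand-in. The R"..." string macro and the rcall function come from current RCall releases rather than from the version used above, so treat the exact calls as assumptions if you are on an older release. Loading DataFrames explicitly is also an assumption. It may not be required by every RCall version.

```julia
using RCall        # assumes R is installed and RCall can find it
using DataFrames   # assumed to be installed; loaded so the copied result is a DataFrame

# Evaluate an R expression. The result stays on the R side, wrapped in an RObject.
mtcars_r = R"datasets::mtcars"
typeof(mtcars_r)   # an RObject parameterized by the kind of SEXPREC it wraps

# Approach 1: keep the data in R, call an R function on it, and copy only the result.
nrows = rcopy(rcall(:nrow, mtcars_r))

# Approach 2: copy the whole R data frame to Julia and work with it natively.
mtcars = rcopy(mtcars_r)
size(mtcars)
names(mtcars)
```

The first approach avoids copying a large object when only a summary is needed. The second is usually more convenient for anything beyond a one-off call.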
A point of interest for those who may have tried something like this in other languages: there is no "glue" code written in C or C++ in the RCall package. It is all done in Julia.