Reading saved R data frames in Julia

One thing that is done very well in the R language is saving "data sets" compactly while preserving metadata.

The basic R structure corresponding to a data set in, say, SPSS or SAS, a table in an SQL database, or a sheet in a spreadsheet is the data.frame.

Recently Hadley Wickham tweeted about building an R package containing the injury data from NEISS, and Jenny Bryan asked about CSV files. This prompted a discussion about saving data frames in the .rds or .rda/.RData formats, as opposed to a quasi-format like a CSV file. (Die spreadsheets! Die!)

I completely agree that .rds or .rda is the way to go, which is why I wrote the read_rda function for the Julia DataFrames package and later, with others, wrote the RCall package.

To successfully install the RCall package you must have a version of R accessible but, as R is an open source project, that should not be an impediment.
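As a rough illustration of the .rds round trip, here is a minimal sketch that assumes a recent RCall release (where the R"..." string macro with `$` interpolation and rcopy are available, which differs from the string-based calls used in this post). The `iris` data set ships with R; the temporary file name is only for illustration.

```julia
using RCall

# Hypothetical temporary file, used only for this sketch.
path = tempname() * ".rds"

# Save a built-in R data frame in .rds format; column types and
# factor levels are stored along with the values.
R"saveRDS(iris, $path)"

# Read it back on the R side, then copy the result into a Julia DataFrame.
df = rcopy(R"readRDS($path)")

size(df)   # iris has 150 rows and 5 columns
```

Unlike a CSV file, nothing about the column types has to be guessed when the file is read back.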

To obtain the data from the neiss package for R as a Julia DataFrame, install the R package as described at that repository, then use


In [1]:
using RCall
population = rcopy("neiss::population");
products = rcopy("neiss::products");
injuries = rcopy("neiss::injuries");
size(injuries)


Out[1]:
(2332957,18)

In [2]:
dump(injuries)  # sort of like R's str function but not as polished


DataFrames.DataFrame  2332957 observations of 18 variables
  case_num: DataArrays.DataArray{Int32,1}(2332957) Int32[90101432,90101434,90101435,90101436]
  trmt_date: DataArrays.DataArray{Float64,1}(2332957) [14245.0,14245.0,14245.0,14245.0]
  psu: DataArrays.DataArray{Float64,1}(2332957) [61.0,61.0,61.0,61.0]
  weight: DataArrays.DataArray{Float64,1}(2332957) [15.3491,15.3491,15.3491,15.3491]
  stratum: DataArrays.DataArray{ASCIIString,1}(2332957) ASCIIString["V","V","V","V"]
  age: DataArrays.DataArray{Float64,1}(2332957) [5.0,51.0,2.0,20.0]
  sex: DataArrays.DataArray{ASCIIString,1}(2332957) ASCIIString["Male","Male","Female","Male"]
  race: DataArrays.DataArray{ASCIIString,1}(2332957) ASCIIString["Other / Mixed Race","White","White","White"]
  race_other: DataArrays.DataArray{ASCIIString,1}(2332957) ASCIIString["hispanic",NA,NA,NA]
  diag: DataArrays.DataArray{ASCIIString,1}(2332957) ASCIIString["Strain, Sprain","Contusion Or Abrasion","Laceration","Contusion Or Abrasion"]
  diag_other: DataArrays.DataArray{ASCIIString,1}(2332957) ASCIIString[NA,NA,NA,NA]
  body_part: DataArrays.DataArray{ASCIIString,1}(2332957) ASCIIString["Neck","Eyeball","Face","Toe"]
  disposition: DataArrays.DataArray{ASCIIString,1}(2332957) ASCIIString["Released","Released","Released","Released"]
  location: DataArrays.DataArray{ASCIIString,1}(2332957) ASCIIString["Home","Home","Home","Home"]
  fmv: DataArrays.DataArray{ASCIIString,1}(2332957) ASCIIString["No fire/flame/smoke","No fire/flame/smoke","No fire/flame/smoke","No fire/flame/smoke"]
  prod1: DataArrays.DataArray{Float64,1}(2332957) [1807.0,899.0,4057.0,1884.0]
  prod2: DataArrays.DataArray{Float64,1}(2332957) [NA,NA,NA,NA]
  narrative: DataArrays.DataArray{ASCIIString,1}(2332957) ASCIIString["5 YOM ROLLING ON FLOOR DOING A SOMERSAULT AND SUSTAINED A CERVICAL STRA IN","51 YOM C/O PAIN AND IRRITATION TO RIGHT EYE, HAD BEEN GRINDING METAL AT HOME AND POSSIBLY THE CAUSE, FOUND TO HAVE METAL IN EYE, CORNEAL ABRAS","2 YOF WAS RUNNING THROUGH HOUSE AND FELL INTO CORNER OF TABLE SUSTAININ G A LACERATION TO FACE NEAR INSIDE CORNER OF RIGHT EYE ALONGSIDE NOSE","20 YOM PUNCHED AND KICKED A WALL D/T DRINKING TOO MUCH LAST NIGHT, SUST AINED CONTUSIONS AND ABRASIONS TO RIGHT MIDDLE TOE, RIGHT HAND"]

In [3]:
names(injuries)


Out[3]:
18-element Array{Symbol,1}:
 :case_num   
 :trmt_date  
 :psu        
 :weight     
 :stratum    
 :age        
 :sex        
 :race       
 :race_other 
 :diag       
 :diag_other 
 :body_part  
 :disposition
 :location   
 :fmv        
 :prod1      
 :prod2      
 :narrative  

That's it.

If you want to be a little fancier, you can import the entire R package as a Julia module:


In [4]:
@rimport neiss

The objects in this module are Julia wrappers around the underlying R objects, which R's internals call SEXPRECs.


In [5]:
whos(neiss)    # reads like Santa Claus deciding who's naughty and ..


                      injuries      8 bytes  RCall.RObject{RCall.VecSxp}
                         neiss     53 bytes  Module
                    population      8 bytes  RCall.RObject{RCall.VecSxp}
                      products      8 bytes  RCall.RObject{RCall.VecSxp}

You can work with the R objects in Julia, calling R functions, etc., but most of the time it is simpler to copy the R data frame to Julia.
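To make the two styles concrete, here is a small sketch, again assuming a recent RCall release in which reval, rcall, and rcopy are exported. It uses R's built-in `mtcars` data frame rather than the neiss data so that it runs without installing anything extra.

```julia
using RCall

# Style 1: leave the data frame on the R side and call R functions on it.
cars = reval("mtcars")               # an RObject wrapping the R data frame
nrows = rcopy(rcall(:nrow, cars))    # only the scalar result is copied to Julia

# Style 2: copy the whole data frame into Julia and work with it there.
df = rcopy(cars)
size(df)                             # mtcars has 32 rows and 11 columns
```

The first style avoids copying a large object, while the second gives you an ordinary Julia DataFrame that the rest of your Julia code can use directly.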

A point of interest for those who may have tried something like this in other languages: there is no "glue" code written in C or C++ in the RCall package. It is all done in Julia.