RCall: Embedding R in Julia

Douglas Bates

February 24, 2015

Julia, a recently developed language for technical computing, has many favorable features. However, it is still in an early phase of development, and many techniques for statistical analysis are not yet implemented in Julia.

In contrast, R, the most widely used language in statistics, has a mature infrastructure, including access to several thousand user-contributed packages that implement a wide variety of statistical techniques. This wealth of software is not going to be reimplemented overnight, so it is desirable to be able to access R and R packages from within Julia.

Furthermore, one of the admirable properties of R is that it has a sophisticated system for storing data sets and their metadata compactly. It is expected that R packages will provide data sets to illustrate techniques and to provide reference results against which to compare other implementations. Even though a data archival format for Julia is available in the HDF5 package, it does not by itself provide access to all the data sets that already exist in R packages.

The RCall package for Julia provides the ability to run an embedded R within Julia. One basic usage is to create a copy in Julia of a dataset from an R package.

In addition to showing some basic usage of RCall, this notebook describes some of the implementation details.

Installing and attaching the RCall package

The RCall package is installed by running

Pkg.add("RCall")

In a typical installation the shell commands R RHOME and Rscript are called to determine the location of R's shared library and the values of some environment variables. A successful installation requires R and Rscript to be on the user's search path.

The RCall package can be configured to use another version of R by setting the environment variables R_HOME, R_DOC_DIR, R_INCLUDE_DIR, R_SHARE_DIR and LD_LIBRARY_PATH to appropriate values and then running

Pkg.build("RCall")

The RCall package is attached to the current Julia session with

using RCall

It is a good idea to also attach the DataArrays and DataFrames packages.


In [1]:
using DataArrays,DataFrames,RCall

Accessing data sets in R packages from Julia

The RCall package adds methods for DataFrame and DataArray that evaluate a Julia Symbol or ASCIIString in R and, if possible, create the indicated Julia type from the result.

For example


In [2]:
attenu = DataFrame(:attenu)


Out[2]:
    event  mag  station   dist  accel
 1    1.0  7.0      117   12.0  0.359
 2    2.0  7.4     1083  148.0  0.014
 3    2.0  7.4     1095   42.0  0.196
 4    2.0  7.4      283   85.0  0.135
 5    2.0  7.4      135  107.0  0.062
 6    2.0  7.4      475  109.0  0.054
 7    2.0  7.4      113  156.0  0.014
 8    2.0  7.4     1008  224.0  0.018
 9    2.0  7.4     1028  293.0   0.01
10    2.0  7.4     2001  359.0  0.004
11    2.0  7.4      117  370.0  0.004
12    3.0  5.3     1117    8.0  0.127
13    4.0  6.1     1438   16.1  0.411
14    4.0  6.1     1083   63.6  0.018
15    4.0  6.1     1013    6.6  0.509
16    4.0  6.1     1014    9.3  0.467
17    4.0  6.1     1015   13.0  0.279
18    4.0  6.1     1016   17.3  0.072
19    4.0  6.1     1095  105.0  0.012
20    4.0  6.1     1011  112.0  0.006
21    4.0  6.1     1028  123.0  0.003
22    5.0  6.6      270  105.0  0.018
23    5.0  6.6      280  122.0  0.048
24    5.0  6.6      116  141.0  0.011
25    5.0  6.6      266  200.0  0.007
26    5.0  6.6      117   45.0  0.142
27    5.0  6.6      113  130.0  0.031
28    5.0  6.6      112  147.0  0.006
29    5.0  6.6      130  187.0   0.01
30    5.0  6.6      475  197.0   0.01
 ⋮

creates a copy in Julia of the attenu dataset. It is possible to use a symbol here because the attenu data frame is in the R datasets package, which is, by default, loaded at startup.

To access data frames from an R package that is not loaded at startup, you can use the :: notation in a character string


In [3]:
DataFrame("ggplot2::diamonds")


Out[3]:
    carat  cut        color  clarity  depth  table  price     x     y     z
 1   0.23  Ideal      E      SI2       61.5   55.0    326  3.95  3.98  2.43
 2   0.21  Premium    E      SI1       59.8   61.0    326  3.89  3.84  2.31
 3   0.23  Good       E      VS1       56.9   65.0    327  4.05  4.07  2.31
 4   0.29  Premium    I      VS2       62.4   58.0    334   4.2  4.23  2.63
 5   0.31  Good       J      SI2       63.3   58.0    335  4.34  4.35  2.75
 6   0.24  Very Good  J      VVS2      62.8   57.0    336  3.94  3.96  2.48
 7   0.24  Very Good  I      VVS1      62.3   57.0    336  3.95  3.98  2.47
 8   0.26  Very Good  H      SI1       61.9   55.0    337  4.07  4.11  2.53
 9   0.22  Fair       E      VS2       65.1   61.0    337  3.87  3.78  2.49
10   0.23  Very Good  H      VS1       59.4   61.0    338   4.0  4.05  2.39
11    0.3  Good       J      SI1       64.0   55.0    339  4.25  4.28  2.73
12   0.23  Ideal      J      VS1       62.8   56.0    340  3.93   3.9  2.46
13   0.22  Premium    F      SI1       60.4   61.0    342  3.88  3.84  2.33
14   0.31  Ideal      J      SI2       62.2   54.0    344  4.35  4.37  2.71
15    0.2  Premium    E      SI2       60.2   62.0    345  3.79  3.75  2.27
16   0.32  Premium    E      I1        60.9   58.0    345  4.38  4.42  2.68
17    0.3  Ideal      I      SI2       62.0   54.0    348  4.31  4.34  2.68
18    0.3  Good       J      SI1       63.4   54.0    351  4.23  4.29   2.7
19    0.3  Good       J      SI1       63.8   56.0    351  4.23  4.26  2.71
20    0.3  Very Good  J      SI1       62.7   59.0    351  4.21  4.27  2.66
21    0.3  Good       I      SI2       63.3   56.0    351  4.26   4.3  2.71
22   0.23  Very Good  E      VS2       63.8   55.0    352  3.85  3.92  2.48
23   0.23  Very Good  H      VS1       61.0   57.0    353  3.94  3.96  2.41
24   0.31  Very Good  J      SI1       59.4   62.0    353  4.39  4.43  2.62
25   0.31  Very Good  J      SI1       58.1   62.0    353  4.44  4.47  2.59
26   0.23  Very Good  G      VVS2      60.4   58.0    354  3.97  4.01  2.41
27   0.24  Premium    I      VS1       62.5   57.0    355  3.97  3.94  2.47
28    0.3  Very Good  J      VS2       62.2   57.0    357  4.28   4.3  2.67
29   0.23  Very Good  D      VS2       60.5   61.0    357  3.96  3.97   2.4
30   0.23  Very Good  F      VS1       60.9   57.0    357  3.96  3.99  2.42
 ⋮

or first attach the package's namespace, then use a symbol.


In [4]:
rcopy("library(robustbase)")


Out[4]:
8-element Array{ASCIIString,1}:
 "robustbase"
 "stats"     
 "graphics"  
 "grDevices" 
 "utils"     
 "datasets"  
 "methods"   
 "base"      

In [5]:
coleman = DataFrame(:coleman)


Out[5]:
    salaryP  fatherWc  sstatus  teacherSc  motherLev      Y
 1     3.83     28.87      7.2       26.6       6.19  37.01
 2     2.89      20.1   -11.71       24.4       5.17  26.51
 3     2.86     69.05    12.32       25.7       7.04  36.51
 4     2.92      65.4    14.28       25.7        7.1   40.7
 5     3.06     29.59     6.31       25.4       6.15   37.1
 6     2.07     44.82     6.16       21.6       6.41   33.9
 7     2.52     77.37     12.7       24.9       6.86   41.8
 8     2.45     24.67    -0.17      25.01       5.78   33.4
 9     3.13     65.01     9.85       26.6       6.51  41.01
10     2.44      9.99    -0.05      28.01       5.57   37.2
11     2.09      12.2   -12.86      23.51       5.62   23.3
12     2.52     22.55     0.92       23.6       5.34   35.2
13     2.22      14.3     4.77      24.51        5.8   34.9
14     2.67     31.79    -0.96       25.8       6.19   33.1
15     2.71      11.6   -16.04       25.2       5.62   22.7
16     3.14     68.47    10.62      25.01       6.94   39.7
17     3.54     42.64     2.66      25.01       6.33   31.8
18     2.52      16.7   -10.99       24.8       6.01   31.7
19     2.68     86.27    15.03      25.51       7.51   43.1
20     2.37     76.73    12.77      24.51       6.96  41.01

The difference between using the :: operator in R and evaluating an expression like "library(robustbase)" is that the latter attaches the namespace of the package to the R search path. The :: operator accesses the package's contents but does not put it on the search path.

You can also check the search path explicitly


In [6]:
rcopy("search()")


Out[6]:
10-element Array{ASCIIString,1}:
 ".GlobalEnv"        
 "package:robustbase"
 "package:stats"     
 "package:graphics"  
 "package:grDevices" 
 "package:utils"     
 "package:datasets"  
 "package:methods"   
 "Autoloads"         
 "package:base"      

High-level evaluation of R expressions, rcopy

The names exported by the RCall package include the functions rcopy, reval, rparse, rprint and sexp, the datatype SEXPREC and the globalEnv object; whos lists all of them.


In [7]:
whos(RCall)


RCall                         Module
SEXPREC                       DataType
getAttrib                     Function
globalEnv                     EnvSxp
isFactor                      Function
isOrdered                     Function
isTs                          Function
libR                          ASCIIString
named                         Function
rcopy                         Function
reval                         Function
rparse                        Function
rprint                        Function
sexp                          Function

rcopy, rprint and globalEnv provide the high-level interface.

As seen in the last section, rcopy evaluates a symbol or a character string in the embedded R. It is called rcopy because it evaluates the expression and copies the contents of the return value to storage allocated by Julia.

This is the preferred way to evaluate an R expression, because the result then lives in storage controlled by Julia's garbage collector and is displayed by Julia. Some users prefer to write such calls as a Julia pipe so that the R expression comes first.


In [8]:
"search()" |> rcopy


Out[8]:
10-element Array{ASCIIString,1}:
 ".GlobalEnv"        
 "package:robustbase"
 "package:stats"     
 "package:graphics"  
 "package:grDevices" 
 "package:utils"     
 "package:datasets"  
 "package:methods"   
 "Autoloads"         
 "package:base"      

Sometimes the R expression to be evaluated contains a quoted string. It is easiest to use single quotes in the R expression because, in R, single quotes, ', are equivalent to double quotes, ". Inside a Julia string literal, double quotes must be escaped but single quotes need not be.


In [9]:
"exists('airmiles')" |> rcopy


Out[9]:
1-element Array{Int32,1}:
 1
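As a plain-Julia illustration (Julia 1.x syntax, no R involved), here are three ways to write the R expression from In[9] as a Julia string. The triple-quoted form, which accepts bare double quotes, is a Julia feature not used elsewhere in this notebook. The last two spellings produce an identical Julia string; the first differs only in the quote character, which R's parser treats as equivalent.

```julia
# Three Julia spellings of the R expression used in In[9].
single  = "exists('airmiles')"      # R treats ' and " alike; nothing to escape
escaped = "exists(\"airmiles\")"     # inside "...", a double quote needs a backslash
triple  = """exists("airmiles")"""  # triple-quoted strings accept bare double quotes

println(single)    # prints exists('airmiles')
println(escaped)   # prints exists("airmiles")

# The last two are the same Julia string; the first differs only in quote style.
@assert escaped == triple
@assert replace(escaped, '"' => '\'') == single
```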

The result of rcopy cannot be used as an argument to other functions in the C API for R. To do this we must preserve the intermediate representation returned by reval.

Also, because there is not a one-to-one correspondence between R objects and Julia objects, information is often lost when copying to Julia.

For example, the value of exists('airmiles') is a logical vector of length 1 in R, but R stores logical values as 32-bit integers, and rcopy returns a vector of that type (Int32).
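A minimal sketch of the conversion that rcopy leaves to the user, written in Julia 1.x and not part of RCall: it maps R's Int32 logical storage to Bool and treats typemin(Int32), R's NA for logicals as described in the section on missing values, as Julia's missing. The helper name lgl_to_julia is hypothetical.

```julia
# R stores logical values as Int32: 1 = TRUE, 0 = FALSE, typemin(Int32) = NA.
const R_NA_LOGICAL = typemin(Int32)

# Hypothetical helper (not part of RCall): map that storage to Bool,
# turning R's NA into Julia's `missing`.
lgl_to_julia(v::AbstractVector{Int32}) =
    [x == R_NA_LOGICAL ? missing : x != 0 for x in v]

stored = Int32[1, 0, typemin(Int32)]   # storage of R's c(TRUE, FALSE, NA)
conv = lgl_to_julia(stored)             # element type Union{Missing, Bool}
```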

Julia uses types to represent objects with data and structural metadata. In R the structural metadata, such as array dimensions, names or dimension names, are stored as attributes of the object. rcopy preserves array dimensions but drops other attributes.

For example, airmiles is an R time series.


In [10]:
"names(attributes(airmiles))" |> rcopy


Out[10]:
2-element Array{ASCIIString,1}:
 "tsp"  
 "class"

Access to the attributes is available from the result of reval, the low-level evaluation of R expressions, which is described below.

The rprint generic applies R's printing methods which, naturally, do use the information in the attributes.


In [11]:
rprint(:airmiles)


Time Series:
Start = 1937 
End = 1960 
Frequency = 1 
 [1]   412   480   683  1052  1385  1418  1634  2178  3362  5948  6109  5981
[13]  6753  8003 10566 12528 14760 16769 19819 22362 25340 25343 29269 30514

Assigning Julia objects in R

Top-level assignments in R create bindings in what is called the global environment. From Julia, you assign to names in this R environment by indexing the globalEnv object exported by the RCall package.


In [12]:
globalEnv[:x] = [1:10];
rprint(:x)


 [1]  1  2  3  4  5  6  7  8  9 10

In [13]:
"mean(x)" |> rcopy


Out[13]:
1-element Array{Float64,1}:
 5.5

In [14]:
"ls()" |> rcopy


Out[14]:
1-element Array{ASCIIString,1}:
 "x"

Typing globalEnv for each assignment can get tedious. Some users prefer to create a shorter alias, such as g.


In [15]:
const g = globalEnv;
g[:y] = "Hello world";
rprint(:y)


[1] "Hello world"
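The assignment syntax in In[12] and In[15] works because Julia lowers g[:y] = v to a call setindex!(g, v, :y); the traceback for In[16] shows RCall's own setindex! method in sexp.jl. The sketch below, in Julia 1.x, imitates only that dispatch mechanism with a hypothetical FakeEnv type backed by a Dict; it does not talk to R.

```julia
# Hypothetical stand-in for an R environment: a table of name => value bindings.
struct FakeEnv
    bindings::Dict{Symbol,Any}
end
FakeEnv() = FakeEnv(Dict{Symbol,Any}())

# `env[:name] = value` lowers to `setindex!(env, value, :name)`.
function Base.setindex!(env::FakeEnv, value, name::Symbol)
    env.bindings[name] = value
    return env
end
Base.getindex(env::FakeEnv, name::Symbol) = env.bindings[name]

const g = FakeEnv()      # short alias, as in In[15]
g[:x] = collect(1:10)
g[:y] = "Hello world"
```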

The conversion of a Julia object to an R object is performed by methods for the sexp generic. If no sexp method exists for the type of the Julia object, you will see an error of the form


In [16]:
g[:x] = 1:10


`sexp` has no method matching sexp(::UnitRange{Int64})
while loading In[16], in expression starting on line 1

 in setindex! at /home/bates/.julia/v0.3/RCall/src/sexp.jl:192

Please open an issue or create a pull request on the github repository if you have a reasonable way of representing the Julia type in R.
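A sketch of why In[16] fails, assuming nothing about RCall beyond what the error message shows: a conversion generic (here the hypothetical to_r, standing in for sexp) has methods only for concrete types such as Vector{Int}, so a UnitRange finds no method. Materializing the range with collect selects an existing method. Julia 1.x syntax; the type tags returned are illustrative only.

```julia
# Hypothetical conversion generic in the spirit of `sexp`: one method per
# Julia type with a known R counterpart. Tags are illustrative only.
to_r(v::Vector{Float64}) = (:REALSXP, copy(v))
to_r(v::Vector{Int})     = (:INTSXP, Int32.(v))   # R integers are 32-bit
to_r(s::String)          = (:STRSXP, [s])

# A UnitRange matches none of the methods, so dispatch raises a MethodError.
err = try
    to_r(1:10)
catch e
    e
end
@assert err isa MethodError

# Materializing the range first selects the Vector{Int} method.
tag, data = to_r(collect(1:10))
```

By the same reasoning, g[:x] = collect(1:10) should be accepted where g[:x] = 1:10 was not, since the [1:10] of In[12], which Julia 0.3 evaluates to a Vector{Int}, was accepted.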

Fitting models using R

Base R and various R packages provide a wide variety of model-fitting functions. Frequently these functions return an R list object with a class attribute. At present you can't count on rcopy being able to handle lists that contain language elements or functions; in the example below, the traceback shows the copy failing on a closure (ClosSxp) nested inside the list.


In [17]:
"m1 <- lmrob(Y ~ ., coleman)" |> rcopy


`rcopy` has no method matching rcopy(::ClosSxp)
while loading In[17], in expression starting on line 1

 in rcopy at no file
 in map_to! at abstractarray.jl:1311
 in map_to! at abstractarray.jl:1320
 in map at abstractarray.jl:1331
 in rcopy at /home/bates/.julia/v0.3/RCall/src/sexp.jl:129
 in rcopy at no file
 in map_to! at abstractarray.jl:1311
 in map_to! at abstractarray.jl:1320
 in map at abstractarray.jl:1331
 in rcopy at /home/bates/.julia/v0.3/RCall/src/sexp.jl:129
 in rcopy at /home/bates/.julia/v0.3/RCall/src/iface.jl:35
 in |> at operators.jl:178

Note that the failure occurs in copying the result of the model fit to Julia. The object m1 has been evaluated and assigned in R, so the usual R extractor functions can be applied to it. If they return simple objects, rcopy can be applied to those.


In [18]:
rprint(:m1)


Call:
lmrob(formula = Y ~ ., data = coleman)
 \--> method = "MM"
Coefficients:
(Intercept)      salaryP     fatherWc      sstatus    teacherSc    motherLev  
   30.50232     -1.66615      0.08425      0.66774      1.16778     -4.13657  


In [19]:
"coef(m1)" |> rprint


(Intercept)     salaryP    fatherWc     sstatus   teacherSc   motherLev 
30.50232037 -1.66614686  0.08425381  0.66773659  1.16777741 -4.13656908 

In [20]:
m1coef = rcopy("coef(m1)")


Out[20]:
6-element Array{Float64,1}:
 30.5023   
 -1.66615  
  0.0842538
  0.667737 
  1.16778  
 -4.13657  

In [21]:
"coef(summary(m1))" |> rprint


               Estimate Std. Error   t value     Pr(>|t|)
(Intercept) 30.50232037 6.71260433  4.544037 4.588913e-04
salaryP     -1.66614686 0.43128710 -3.863197 1.722163e-03
fatherWc     0.08425381 0.01467468  5.741440 5.100315e-05
sstatus      0.66773659 0.03385090 19.725816 1.296519e-11
teacherSc    1.16777741 0.10983312 10.632289 4.348638e-08
motherLev   -4.13656908 0.92083740 -4.492182 5.067196e-04

In [22]:
m1coefmat = rcopy("coef(summary(m1))")


Out[22]:
6x4 Array{Float64,2}:
 30.5023     6.7126      4.54404  0.000458891
 -1.66615    0.431287   -3.8632   0.00172216 
  0.0842538  0.0146747   5.74144  5.10032e-5 
  0.667737   0.0338509  19.7258   1.29652e-11
  1.16778    0.109833   10.6323   4.34864e-8 
 -4.13657    0.920837   -4.49218  0.00050672 

We see that rcopy drops the names and dimnames: Julia Array objects have no provision for names. The NamedArrays package for Julia does provide named dimensions, so methods that create a NamedArray may help.
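Until such methods exist, one workaround is to keep the dimnames next to the matrix. The sketch below (Julia 1.x) uses a hypothetical LabeledMatrix type, not NamedArrays, holding two rows and two columns of values copied from the coef(summary(m1)) output of In[21]; lookups by name then behave like R's m["salaryP", "Std. Error"].

```julia
# Hypothetical container pairing a matrix with R-style dimnames.
struct LabeledMatrix{T}
    data::Matrix{T}
    rownames::Vector{String}
    colnames::Vector{String}
end

# Look up an entry by row and column name instead of by position.
function Base.getindex(m::LabeledMatrix, r::AbstractString, c::AbstractString)
    i = findfirst(==(r), m.rownames)
    j = findfirst(==(c), m.colnames)
    (i === nothing || j === nothing) && throw(KeyError((r, c)))
    return m.data[i, j]
end

# A corner of coef(summary(m1)), values copied from the R output in In[21].
coefmat = LabeledMatrix(
    [30.50232037 6.71260433; -1.66614686 0.43128710],
    ["(Intercept)", "salaryP"],
    ["Estimate", "Std. Error"])

coefmat["salaryP", "Std. Error"]
```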

Missing values in R vector types

Missing values are indicated by a sentinel value in R vectors. The sentinel for Float64 vectors is one particular NaN bit pattern, and the sentinel for integer and logical vectors and for factors is typemin(Int32).


In [23]:
RCall.R_NaReal


Out[23]:
NaN

In [24]:
reinterpret(Uint64,ans)


Out[24]:
0x7ff00000000007a2

In [25]:
RCall.R_NaInt


Out[25]:
-2147483648

In [26]:
typemin(Int32)


Out[26]:
-2147483648
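The sentinel values above can be reproduced in plain Julia (1.x syntax, no R needed). The bit pattern 0x7ff00000000007a2 is the one displayed in Out[24]; its low 32 bits hold 1954 (0x7a2). The helper is_r_na is hypothetical, but to my understanding R's own C-level NA test likewise checks that low word, which is how R tells NA apart from an ordinary NaN.

```julia
# R's NA_real_: a NaN whose low 32 bits hold the payload 1954 (0x7a2).
const R_NA_REAL_BITS = 0x7ff00000000007a2
const R_NA_INTEGER = typemin(Int32)   # NA for integer, logical and factor vectors

na_real = reinterpret(Float64, R_NA_REAL_BITS)

# Hypothetical test separating R's NA from other NaNs by its payload.
is_r_na(x::Float64) = isnan(x) && (reinterpret(UInt64, x) & 0xffffffff) == 1954

@assert isnan(na_real)             # to Julia it is just a NaN...
@assert is_r_na(na_real)           # ...but the payload identifies it as NA
@assert !is_r_na(NaN)              # Julia's NaN (0x7ff8000000000000) has payload 0
@assert R_NA_INTEGER == -2147483648
```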

DataArray methods for R vectors create DataArray objects in which the missing values are converted to the representation used in the DataArrays package. In the process, R logical vectors are converted to Bool values and R factor objects are converted to PooledDataArray objects.


In [27]:
"c(1,2,NA,3)" |> rcopy


Out[27]:
4-element Array{Float64,1}:
   1.0
   2.0
 NaN  
   3.0

In [28]:
"c(1,2,NA,4)" |> reval |> DataArray  # check this result


Out[28]:
4-element DataArray{Float64,1}:
   1.0
   2.0
 NaN  
   4.0

In [29]:
"gl(2,4,labels=c('A','B'))" |> reval |> DataArray


Out[29]:
8-element PooledDataArray{ASCIIString,Uint8,1}:
 "A"
 "A"
 "A"
 "A"
 "B"
 "B"
 "B"
 "B"

When the implementation of Nullable types in Julia is more mature, it may make sense to convert to those types for the missing-data representation.
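In Julia 1.x, Nullable was dropped from Base and missing became the standard marker for absent values. Below is a minimal sketch, not RCall's behaviour, of converting a copied R double vector under that convention: only R's NA bit pattern becomes missing, while an ordinary NaN stays NaN. The function name real_to_julia is hypothetical.

```julia
const R_NA_REAL_BITS = 0x7ff00000000007a2

# Hypothetical converter: R's NA_real_ becomes `missing`; every other value,
# including an ordinary NaN, is kept as is.
function real_to_julia(v::AbstractVector{Float64})
    map(v) do x
        isnan(x) && (reinterpret(UInt64, x) & 0xffffffff) == 1954 ? missing : x
    end
end

na  = reinterpret(Float64, R_NA_REAL_BITS)
out = real_to_julia([1.0, 2.0, na, NaN, 4.0])   # Vector{Union{Missing, Float64}}
```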

Plotting in R

For many people, the ability to use R graphics packages such as ggplot2 would be a prime motivation for using the RCall package. Unfortunately, in my experience, initializing the graphics system in an embedded R always fails.


In [30]:
"pdf(file='/tmp/plt1.pdf'); plot(1:10,(1:10)^2); dev.off()" |> rcopy


Error in plot.new() : the base graphics system is not registered
Error occurred in R_tryEval
while loading In[30], in expression starting on line 1

 in reval at /home/bates/.julia/v0.3/RCall/src/iface.jl:5
 in reval at /home/bates/.julia/v0.3/RCall/src/iface.jl:12
 in reval at /home/bates/.julia/v0.3/RCall/src/iface.jl:16
 in rcopy at /home/bates/.julia/v0.3/RCall/src/iface.jl:35
 in |> at operators.jl:178

The low-level interface: reval and sexp

To understand the internal structures used by R, it helps to know a bit of the history of the language and implementation. In the mid-1990s Ross Ihaka and Robert Gentleman, both then at the University of Auckland, embarked on developing a language implementation that was "not unlike S", a language developed by John Chambers and others at AT&T Bell Labs. The Bell Labs S implementation and the commercial S-PLUS implementation were proprietary, closed-source code. Ross and Robert chose to base their open-source implementation internally on Scheme, as described in Abelson and Sussman's book "Structure and Interpretation of Computer Programs".

Those familiar with Lisp will recognize many terms and concepts in the C API for R. R objects are represented as a C struct called a SEXPREC or symbolic expression. This struct is actually a union of different representations for different types of R objects. The SEXP type in the API is a pointer to a SEXPREC. Most functions in the C API to R return such a pointer.

The Julia abstract type, also called SEXPREC, has several subtypes corresponding to the internal R structures.


In [31]:
subtypes(SEXPREC)


Out[31]:
24-element Array{Any,1}:
 AnySxp    
 BcodeSxp  
 BuiltinSxp
 CharSxp   
 ClosSxp   
 CplxSxp   
 DotSxp    
 EnvSxp    
 ExprSxp   
 ExtPtrSxp 
 IntSxp    
 LangSxp   
 LglSxp    
 ListSxp   
 NilSxp    
 PromSxp   
 RawSxp    
 RealSxp   
 S4Sxp     
 SpecialSxp
 StrSxp    
 SymSxp    
 VecSxp    
 WeakRefSxp

The reval function is the low-level version of rcopy. It evaluates a Julia Symbol or Julia String containing an R expression and returns one of the Julia SEXPREC types.


In [32]:
m1 = reval(:m1)


Out[32]:
VecSxp(-536870733,Ptr{Void} @0x0000000006b3d058,Ptr{Void} @0x00000000046ac770,Ptr{Ptr{None}} @0x00000000046ac798,22,0)

In [33]:
reval("coef(summary(m1))")


Out[33]:
RealSxp(-536870770,Ptr{Void} @0x000000000d967b70,Ptr{Void} @0x000000000d92f330,Ptr{Float64} @0x000000000d92f358,24,0)

Methods for the Julia sexp generic create SEXPREC representations of Julia types, copying the data from Julia into storage allocated by R. There is also an sexp(p::Ptr{Void}) method that takes an SEXP (a pointer to an R SEXPREC), usually the value returned by a ccall to a function in the R API, and converts it to the appropriate type of Julia SEXPREC.

As seen above, a Julia SEXPREC type like VecSxp or RealSxp consists of an integer tag, several pointers and, in some cases, other information like the length of a vector.


In [34]:
names(RCall.RealSxp)


Out[34]:
6-element Array{Symbol,1}:
 :info      
 :attrib    
 :p         
 :pv        
 :length    
 :truelength

In [35]:
names(RCall.VecSxp)


Out[35]:
6-element Array{Symbol,1}:
 :info      
 :attrib    
 :p         
 :pv        
 :length    
 :truelength
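The info field holds R's packed type information. Assuming the layout of R's sxpinfo structure at the time, in which the SEXPTYPE occupies the low 5 bits of that word, the type code can be decoded as below (Julia 1.x; the function names are hypothetical). The codes are those defined in R's Rinternals.h. Every info value printed in this notebook decodes to the type its Julia wrapper reports, which supports the assumption.

```julia
# SEXPTYPE codes as defined in R's Rinternals.h (11 and 12 are unused).
const SEXPTYPE_NAMES = Dict(
    0 => "NILSXP",     1 => "SYMSXP",      2 => "LISTSXP",   3 => "CLOSXP",
    4 => "ENVSXP",     5 => "PROMSXP",     6 => "LANGSXP",   7 => "SPECIALSXP",
    8 => "BUILTINSXP", 9 => "CHARSXP",    10 => "LGLSXP",   13 => "INTSXP",
   14 => "REALSXP",   15 => "CPLXSXP",    16 => "STRSXP",   17 => "DOTSXP",
   18 => "ANYSXP",    19 => "VECSXP",     20 => "EXPRSXP",  21 => "BCODESXP",
   22 => "EXTPTRSXP", 23 => "WEAKREFSXP", 24 => "RAWSXP",   25 => "S4SXP")

# Assumption: the type code sits in the low 5 bits of the info word.
sexptype(info::Integer) = Int(info & 0x1f)
sexptype_name(info::Integer) = SEXPTYPE_NAMES[sexptype(info)]

sexptype_name(-536870733)   # info field of m1 in Out[32]: "VECSXP"
```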

The Julia type is created by applying unsafe_load to the SEXP pointer returned by the R API function and then overwriting two of the pointers used by R's garbage collector. The p pointer is the original SEXP, and the pv pointer points to the contents of the vector. The numerical value of the pv pointer is p plus a fixed offset, which happens to be 40 bytes on a 64-bit system.


In [36]:
RCall.voffset


Out[36]:
0x0000000000000028

In [37]:
m1.pv - m1.p


Out[37]:
0x0000000000000028

In [38]:
int(m1.pv - m1.p)


Out[38]:
40
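To make the p/pv relationship concrete without R, the sketch below (Julia 1.x) lays out a fake "vector" in a byte buffer: a 40-byte header (the value of RCall.voffset above, assumed here rather than read from R) followed by three Float64 values, and reads the data through a pointer offset from the start. GC.@preserve keeps the buffer alive while raw pointers are in use.

```julia
const VOFFSET = 40   # header size in bytes, as reported by RCall.voffset

# A fake "R vector": 40 header bytes followed by three Float64 values.
buf = zeros(UInt8, VOFFSET + 3 * sizeof(Float64))
as_f64 = reinterpret(Float64, buf)               # same bytes viewed as Float64
as_f64[div(VOFFSET, 8) + 1:end] .= [30.5, -1.67, 0.084]

vals = GC.@preserve buf begin
    p  = pointer(buf)                  # plays the role of the SEXP `p`
    pv = Ptr{Float64}(p + VOFFSET)     # plays the role of `pv`: start of data
    [unsafe_load(pv, i) for i in 1:3]  # unsafe_load(ptr, i) reads element i
end
```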

Both p and pv are stored because pv carries an element type as well as an address. Converting pv with, say, pointer_to_array returns a vector of the appropriate Julia bitstype.


In [39]:
m1coef = reval("coef(m1)");
pointer_to_array(m1coef.pv,m1coef.length)


Out[39]:
6-element Array{Float64,1}:
 30.5023   
 -1.66615  
  0.0842538
  0.667737 
  1.16778  
 -4.13657  

or, equivalently,


In [40]:
vec(m1coef)


Out[40]:
6-element Array{Float64,1}:
 30.5023   
 -1.66615  
  0.0842538
  0.667737 
  1.16778  
 -4.13657  

Although this result looks like the result of rcopy(m1coef), there is an important difference between the two. The contents of the vec result are in storage allocated by R and may be overwritten or freed by the R garbage collector. The contents of the rcopy result have been copied to storage allocated by Julia and controlled by the Julia garbage collector.

This is why reval is a low-level interface. It gives you enough rope to hang yourself.
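The hazard can be shown without R. In this Julia 1.x sketch an ordinary array plays the role of R-owned storage; unsafe_wrap (the modern counterpart of pointer_to_array) creates an alias like the vec result, while copy creates independent storage like rcopy. When the owner overwrites its memory, only the alias changes. In RCall's case the owner is R, whose garbage collector can also free the memory outright.

```julia
# An ordinary array stands in for memory owned by R.
owner = [30.5, -1.67, 0.084]

aliased, copied = GC.@preserve owner begin
    p = pointer(owner)
    (unsafe_wrap(Array, p, 3),         # alias: shares owner's memory (like vec)
     copy(unsafe_wrap(Array, p, 3)))   # independent copy (like rcopy)
end

# The owner reuses its storage; only the alias sees the change.
# (Reading `aliased` is safe here only because `owner` is still alive.)
owner[1] = -999.0
```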

Another use of sexp is unravelling some of the fields in an SEXPREC. As mentioned previously, R objects can contain "attributes". The attrib field in a Julia SEXPREC type is the pointer to the attributes.


In [41]:
m1.attrib


Out[41]:
Ptr{Void} @0x0000000006b3d058

In [42]:
sexp(m1.attrib)


Out[42]:
ListSxp(2,Ptr{Void} @0x000000000876ece8,Ptr{Void} @0x0000000006b3d058,Ptr{Void} @0x0000000006b3d090,Ptr{Void} @0x000000000cad3360,Ptr{Void} @0x0000000006b3d020,Ptr{Void} @0x000000000876e5e8)

Somewhat confusingly, what is called a list in R is internally a VecSxp, or vector of SEXPs (pointers to SEXPRECs). The type of SEXPREC known as a ListSxp is a cons cell. The types that can occur in a chain of cons cells (a pairlist) are


In [43]:
RCall.PairList


Out[43]:
Union(ListSxp,NilSxp,LangSxp)

Except for NilSxp, which is the NULL sentinel value, these PairList types have car, cdr and tag fields.


In [44]:
names(RCall.ListSxp)


Out[44]:
7-element Array{Symbol,1}:
 :info     
 :attrib   
 :p        
 :genc_prev
 :car      
 :cdr      
 :tag      

(The genc_prev pointer field in this and other non-vector concrete SEXPREC types is an artifact of the constructor's use of unsafe_load. It is not used in Julia.)
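For readers less familiar with Lisp, here is a minimal Julia 1.x model of a pairlist: cons cells with car, cdr and tag fields, ending in one shared nil sentinel, mirroring ListSxp and NilSxp. The types Cons and Nil are hypothetical stand-ins that hold Julia values rather than R pointers. The example chain mimics the attribute list of m1 that is unravelled in In[57] through In[62], with the names vector truncated to two entries.

```julia
# Minimal model of an R pairlist: cons cells ending in a single shared sentinel.
struct Nil end
const NIL = Nil()                 # like R's unique NilSxp

struct Cons
    car::Any                      # the value held by this cell
    cdr::Union{Cons,Nil}          # the next cell, or NIL at the end
    tag::Symbol                   # the name attached to this cell
end

# Walk the chain from the first cell, collecting tag => car pairs.
function pairs_of(cell::Union{Cons,Nil})
    out = Pair{Symbol,Any}[]
    while cell isa Cons
        push!(out, cell.tag => cell.car)
        cell = cell.cdr
    end
    return out
end

# Attributes of m1: a "names" cell followed by a "class" cell (names truncated).
attrs = Cons(["coefficients", "scale"], Cons(["lmrob"], NIL, :class), :names)
pairs_of(attrs)
```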

The vector-like concrete Julia SEXPREC types are


In [45]:
RCall.RVector


Out[45]:
Union(CplxSxp,StrSxp,CharSxp,LglSxp,IntSxp,RealSxp,ExprSxp,VecSxp,RawSxp)

which are further divided into


In [46]:
RCall.VectorAtomic


Out[46]:
Union(CplxSxp,CharSxp,LglSxp,IntSxp,RealSxp,RawSxp)

for which vec returns an array of Numbers, and


In [47]:
RCall.VectorList


Out[47]:
Union(StrSxp,ExprSxp,VecSxp)

for which vec returns a vector of pointers.


In [48]:
vec(m1)


Out[48]:
22-element Array{Ptr{None},1}:
 Ptr{Void} @0x00000000040136c0
 Ptr{Void} @0x0000000007159c98
 Ptr{Void} @0x0000000006b18540
 Ptr{Void} @0x0000000007159e48
 Ptr{Void} @0x0000000007159db8
 Ptr{Void} @0x0000000007159ed8
 Ptr{Void} @0x0000000006b18470
 Ptr{Void} @0x0000000007cc7580
 Ptr{Void} @0x000000000ca6e9d0
 Ptr{Void} @0x000000000c545520
 Ptr{Void} @0x0000000004bea550
 Ptr{Void} @0x0000000007020d88
 Ptr{Void} @0x000000000927b050
 Ptr{Void} @0x0000000007ef30b0
 Ptr{Void} @0x000000000a579498
 Ptr{Void} @0x000000000a579498
 Ptr{Void} @0x0000000006b3aab0
 Ptr{Void} @0x00000000094a3448
 Ptr{Void} @0x000000000acde988
 Ptr{Void} @0x000000000926f318
 Ptr{Void} @0x000000000cae9c50
 Ptr{Void} @0x000000000a2aff10

It is usually more meaningful to map sexp over this vector of pointers


In [49]:
map(sexp,vec(m1))


Out[49]:
22-element Array{SEXPREC,1}:
 RealSxp(-2147483506,Ptr{Void} @0x0000000004014018,Ptr{Void} @0x00000000040136c0,Ptr{Float64} @0x00000000040136e8,6,0)                                                                           
 RealSxp(536871054,Ptr{Void} @0x000000000876ece8,Ptr{Void} @0x0000000007159c98,Ptr{Float64} @0x0000000007159cc0,1,0)                                                                             
 RealSxp(-536870770,Ptr{Void} @0x0000000004013b80,Ptr{Void} @0x0000000006b18540,Ptr{Float64} @0x0000000006b18568,20,0)                                                                           
 RealSxp(536871054,Ptr{Void} @0x000000000876ece8,Ptr{Void} @0x0000000007159e48,Ptr{Float64} @0x0000000007159e70,1,0)                                                                             
 LglSxp(536871050,Ptr{Void} @0x000000000876ece8,Ptr{Void} @0x0000000007159db8,Ptr{Int32} @0x0000000007159de0,1,0)                                                                                
 IntSxp(536871053,Ptr{Void} @0x000000000876ece8,Ptr{Void} @0x0000000007159ed8,Ptr{Int32} @0x0000000007159f00,1,0)                                                                                
 RealSxp(-536870770,Ptr{Void} @0x0000000004012b80,Ptr{Void} @0x0000000006b18470,Ptr{Float64} @0x0000000006b18498,20,0)                                                                           
 RealSxp(-536870770,Ptr{Void} @0x0000000009d26760,Ptr{Void} @0x0000000007cc7580,Ptr{Float64} @0x0000000007cc75a8,20,0)                                                                           
 VecSxp(-536870765,Ptr{Void} @0x0000000006631fb0,Ptr{Void} @0x000000000ca6e9d0,Ptr{Ptr{None}} @0x000000000ca6e9f8,32,0)                                                                          
 VecSxp(-2147483469,Ptr{Void} @0x000000000c545f58,Ptr{Void} @0x000000000c545520,Ptr{Ptr{None}} @0x000000000c545548,8,0)                                                                          
 VecSxp(1610612915,Ptr{Void} @0x0000000004be9848,Ptr{Void} @0x0000000004bea550,Ptr{Ptr{None}} @0x0000000004bea578,4,0)                                                                           
 IntSxp(536871053,Ptr{Void} @0x000000000876ece8,Ptr{Void} @0x0000000007020d88,Ptr{Int32} @0x0000000007020db0,1,0)                                                                                
 RealSxp(1610612878,Ptr{Void} @0x0000000004748438,Ptr{Void} @0x000000000927b050,Ptr{Float64} @0x000000000927b078,4,0)                                                                            
 RealSxp(-536870770,Ptr{Void} @0x0000000007ef3978,Ptr{Void} @0x0000000007ef30b0,Ptr{Float64} @0x0000000007ef30d8,36,0)                                                                           
 IntSxp(536871053,Ptr{Void} @0x000000000876ece8,Ptr{Void} @0x000000000a579498,Ptr{Int32} @0x000000000a5794c0,1,0)                                                                                
 IntSxp(536871053,Ptr{Void} @0x000000000876ece8,Ptr{Void} @0x000000000a579498,Ptr{Int32} @0x000000000a5794c0,1,0)                                                                                
 VecSxp(147,Ptr{Void} @0x0000000006b3aa40,Ptr{Void} @0x0000000006b3aab0,Ptr{Ptr{None}} @0x0000000006b3aad8,0,0)                                                                                  
 LangSxp(134,Ptr{Void} @0x000000000876ece8,Ptr{Void} @0x00000000094a3448,Ptr{Void} @0x00000000094a3480,Ptr{Void} @0x0000000007006510,Ptr{Void} @0x00000000094a3480,Ptr{Void} @0x000000000876ece8)
 LangSxp(166,Ptr{Void} @0x000000000cae9320,Ptr{Void} @0x000000000acde988,Ptr{Void} @0x000000000acde9c0,Ptr{Void} @0x00000000071d48c0,Ptr{Void} @0x000000000acde9c0,Ptr{Void} @0x000000000876ece8)
 IntSxp(1610612877,Ptr{Void} @0x000000000876ece8,Ptr{Void} @0x000000000926f318,Ptr{Int32} @0x000000000926f340,6,0)                                                                               
 VecSxp(-2147483469,Ptr{Void} @0x000000000cae8fa0,Ptr{Void} @0x000000000cae9c50,Ptr{Ptr{None}} @0x000000000cae9c78,6,0)                                                                          
 RealSxp(-536870770,Ptr{Void} @0x00000000072e9f10,Ptr{Void} @0x000000000a2aff10,Ptr{Float64} @0x000000000a2aff38,120,0)                                                                          

The SymSxp type is an R symbol. The CharSxp type is a character string represented as a vector of bytes. Although the byte vector is null-terminated, the length field is the string length (i.e. the number of bytes before the null terminator).


In [50]:
sexp(sexp(m1.attrib).tag)


Out[50]:
SymSxp(291504129,Ptr{Void} @0x000000000876ece8,Ptr{Void} @0x000000000876e5e8,Ptr{Void} @0x000000000a985760,Ptr{Void} @0x00000000087ddf58,Ptr{Void} @0x00000000071d58b0,Ptr{Void} @0x000000000876ece8)

In [51]:
names(RCall.SymSxp)


Out[51]:
7-element Array{Symbol,1}:
 :info     
 :attrib   
 :p        
 :genc_prev
 :pname    
 :value    
 :internal 

In [52]:
snm = sexp(sexp(sexp(m1.attrib).tag).pname)


Out[52]:
CharSxp(822108425,Ptr{Void} @0x000000000876ece8,Ptr{Void} @0x00000000087ddf58,Ptr{Uint8} @0x00000000087ddf80,5,7635907)

In [53]:
vec(snm)


Out[53]:
5-element Array{Uint8,1}:
 0x6e
 0x61
 0x6d
 0x65
 0x73

In [54]:
bytestring(vec(snm))


Out[54]:
"names"
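The same decoding can be reproduced in plain Julia 1.x, where String replaces the since-removed bytestring. The byte values are those of Out[53], plus the trailing NUL byte that, per the description of CharSxp above, R stores but does not count in the length field; reading with the stored length and reading up to the terminator give the same string.

```julia
# Bytes of the CharSxp in Out[53], plus the NUL terminator R stores after them.
bytes = UInt8[0x6e, 0x61, 0x6d, 0x65, 0x73, 0x00]
len = 5                                   # CharSxp length field: NUL not counted

by_length = String(bytes[1:len])          # modern replacement for bytestring
by_nul = GC.@preserve bytes unsafe_string(pointer(bytes))   # stop at the NUL
```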

Don't confuse a CharSxp with a "character" vector in R. The "character" type in the R REPL is a StrSxp, which internally is a vector of pointers to CharSxp objects. Because there are no scalars at the level of the R REPL, CharSxp objects cannot occur there.

It is worth examining R's "all data is stored in a vector" approach. A single numeric value is represented as a vector of length 1.


In [55]:
"1" |> reval


Out[55]:
RealSxp(536871054,Ptr{Void} @0x000000000876ece8,Ptr{Void} @0x000000000d96fd78,Ptr{Float64} @0x000000000d96fda0,1,0)

Also, we can see that R has the peculiar convention that numeric literals, even those without a decimal point, are converted to Float64 values. An integer literal value is written as


In [56]:
"1L" |> reval


Out[56]:
IntSxp(536871053,Ptr{Void} @0x000000000876ece8,Ptr{Void} @0x000000000d96fc88,Ptr{Int32} @0x000000000d96fcb0,1,0)

Returning to the unravelling of attributes, the car of the first cons cell is the vector of names of the R "list".


In [57]:
sexp(sexp(m1.attrib).car)


Out[57]:
StrSxp(-536870768,Ptr{Void} @0x000000000876ece8,Ptr{Void} @0x000000000cad3360,Ptr{Ptr{None}} @0x000000000cad3388,22,0)

In [58]:
sexp(sexp(m1.attrib).car) |> rcopy


Out[58]:
22-element Array{ASCIIString,1}:
 "coefficients"  
 "scale"         
 "residuals"     
 "loss"          
 "converged"     
 "iter"          
 "fitted.values" 
 "rweights"      
 "control"       
 "init.S"        
 "qr"            
 "rank"          
 "ostats"        
 "cov"           
 "df.residual"   
 "degree.freedom"
 "xlevels"       
 "call"          
 "terms"         
 "assign"        
 "model"         
 "x"             

The cdr of the first cons cell is a pointer to the next cons cell or to the NilSxp (there is only one NilSxp).


In [59]:
sexp(sexp(m1.attrib).cdr)


Out[59]:
ListSxp(2,Ptr{Void} @0x000000000876ece8,Ptr{Void} @0x0000000006b3d020,Ptr{Void} @0x0000000006b3d058,Ptr{Void} @0x0000000006b3bc28,Ptr{Void} @0x000000000876ece8,Ptr{Void} @0x000000000876eab8)

In [60]:
rcopy(sexp(sexp(m1.attrib).cdr).tag)


Out[60]:
:class

In [61]:
rcopy(sexp(sexp(m1.attrib).cdr).car)


Out[61]:
1-element Array{ASCIIString,1}:
 "lmrob"

In [62]:
sexp(sexp(sexp(m1.attrib).cdr).cdr)


Out[62]:
NilSxp(285212800,Ptr{Void} @0x000000000876ece8)

If you are looking for a particular attribute you can access it by name using getAttrib


In [63]:
getAttrib(m1,:class)


Out[63]:
StrSxp(536871056,Ptr{Void} @0x000000000876ece8,Ptr{Void} @0x0000000006b3bc28,Ptr{Ptr{None}} @0x0000000006b3bc50,1,0)

In [64]:
getAttrib(m1,:class) |> rcopy


Out[64]:
1-element Array{ASCIIString,1}:
 "lmrob"

If an attribute of the given name does not exist getAttrib returns NilSxp.


In [65]:
getAttrib(m1,:foo)


Out[65]:
NilSxp(285212800,Ptr{Void} @0x000000000876ece8)

Following the Jon Snow convention, rcopy returns the Julia nothing object for a NilSxp.


In [66]:
getAttrib(m1,:foo) |> rcopy
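A hypothetical Julia 1.x sketch of the lookup-by-name behaviour shown above: walk the cons cells until a tag matches, return the nil sentinel if none does, and let a conversion step map that sentinel to nothing, as rcopy does. The names Cons, Nil, get_attrib and to_julia are illustrative and are not RCall definitions.

```julia
struct Nil end
const NIL = Nil()                 # stands in for R's NilSxp

struct Cons
    car::Any
    cdr::Union{Cons,Nil}
    tag::Symbol
end

# getAttrib-style lookup: return the car whose tag matches, else NIL.
function get_attrib(cell::Union{Cons,Nil}, name::Symbol)
    while cell isa Cons
        cell.tag === name && return cell.car
        cell = cell.cdr
    end
    return NIL
end

# rcopy-style conversion: NIL becomes `nothing`, anything else passes through.
to_julia(::Nil) = nothing
to_julia(x) = x

attrs = Cons(["coefficients", "scale"], Cons(["lmrob"], NIL, :class), :names)
to_julia(get_attrib(attrs, :class))
```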

Currently the Base.names method for a SEXPREC returns the R names attribute. This is convenient if you know of the names function in R but may be changed in later versions of RCall. (It is in some ways a misuse of the Base.names generic which should return the names of fields from a Julia object.)


In [67]:
names(m1)


Out[67]:
22-element Array{Union(ASCIIString,UTF8String),1}:
 "coefficients"  
 "scale"         
 "residuals"     
 "loss"          
 "converged"     
 "iter"          
 "fitted.values" 
 "rweights"      
 "control"       
 "init.S"        
 "qr"            
 "rank"          
 "ostats"        
 "cov"           
 "df.residual"   
 "degree.freedom"
 "xlevels"       
 "call"          
 "terms"         
 "assign"        
 "model"         
 "x"             

length and size methods return the usual results


In [68]:
length(m1)


Out[68]:
22

In [69]:
size(m1)


Out[69]:
(22,)

In [70]:
length(reval("coef(summary(m1))"))


Out[70]:
24

In [71]:
size(reval("coef(summary(m1))"))


Out[71]:
(6,4)
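rcopy can preserve dimensions because an R matrix is just a vector with a dim attribute, and both R and Julia store matrices in column-major order. Here is a Julia 1.x sketch with made-up values 1 to 24 standing in for the 6 by 4 coefficient table; reshape reinterprets the flat data without copying it.

```julia
# An R matrix is a plain vector plus a "dim" attribute; both R and Julia
# use column-major order, so reshape recovers the matrix layout.
flat = collect(1.0:24.0)       # 24 values, matching length() of the 6x4 result
dim_attr = (6, 4)              # what R's dim attribute would hold

A = reshape(flat, dim_attr)    # shares memory with `flat`; no copy is made
```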