Calling R, Python and C++ from Julia

  • Douglas Bates
  • U. of Wisconsin - Madison
  • github: dmbates

In 2012 I started using Julia

Good things

  • What Viral said: multiple dispatch, JIT method compilation, extensible type system, interesting language constructs.

Bad things

  • Many things I knew how to do in R were not available or needed to be relearned.
  • Re-learning takes a while for things like data visualization systems
  • R packages often provide data sets for illustration/experimentation. As a rule Julia packages don't.

The RCall package (JuliaStats/RCall.jl)

  • Julia has the ccall function. Steven Johnson had written PyCall. Avik created JavaCall
  • So I started writing RCall. Most of the recent work has been done by Randy Lai and Simon Byrne.
  • Basic approach
    • Create Julia immutables to mirror R's SEXPREC struct (see types.jl )
    • Locate and dlopen libR (see setup.jl )
    • Start an embedded R process
    • Call R's API
  • Functions of interest are reval, rcall, rcopy, etc.

In [1]:
using RCall  # make the package available
whos(RCall)


                        @R_str   2163 bytes  Function
                         @rget   1576 bytes  Function
                      @rimport   1860 bytes  Function
                     @rlibrary   1296 bytes  Function
                         @rput   1542 bytes  Function
                       @rusing    452 bytes  Function
                      @var_str    259 bytes  Function
                       CharSxp    136 bytes  DataType
                       ClosSxp    148 bytes  DataType
                       CplxSxp    136 bytes  DataType
                        IntSxp    136 bytes  DataType
                        LglSxp    136 bytes  DataType
                        NilSxp    112 bytes  DataType
                         RCall    566 KB     Module
                       RObject    168 bytes  DataType
                       RealSxp    136 bytes  DataType
                        StrSxp    136 bytes  DataType
                           Sxp     92 bytes  DataType
                         anyNA   1377 bytes  Function
                     getAttrib   2722 bytes  Function
                      getNames    978 bytes  Function
                     globalEnv      8 bytes  RCall.RObject{RCall.EnvSxp}
                      isFactor   1066 bytes  Function
                          isNA   4223 bytes  Function
                     isOrdered   1067 bytes  Function
                         rcall   3149 bytes  Function
                         rcopy     39 KB     Function
                         reval   6289 bytes  Function
                         rlang   2229 bytes  Function
                        rparse    829 bytes  Function
                        rprint   4106 bytes  Function
                    setAttrib!   4453 bytes  Function
                     setNames!   1054 bytes  Function

In [2]:
?rcall


search: rcall RCall remotecall remotecall_wait remotecall_fetch UniformScaling

Out[2]:

Evaluate a function in the global environment. The first argument corresponds to the function to be called. It can be either a FunctionSxp type, a SymSxp or a Symbol.


In [3]:
?rcopy


search: rcopy precompile __precompile__ readchomp ProcessGroup

Out[3]:

Evaluate and convert the result of a string as an R expression.

rcopy copies the contents of an R object into a corresponding canonical Julia type.

rcopy(T,p) converts a pointer p to a Sxp object to a native Julia object of type T.

rcopy(p) performs a default conversion.


In [4]:
mtcars = rcopy("mtcars")


Out[4]:
mpgcyldisphpdratwtqsecvsamgearcarb
121.06.0160.0110.03.92.6216.460.01.04.04.0
221.06.0160.0110.03.92.87517.020.01.04.04.0
322.84.0108.093.03.852.3218.611.01.04.01.0
421.46.0258.0110.03.083.21519.441.00.03.01.0
518.78.0360.0175.03.153.4417.020.00.03.02.0
618.16.0225.0105.02.763.4620.221.00.03.01.0
714.38.0360.0245.03.213.5715.840.00.03.04.0
824.44.0146.762.03.693.1920.01.00.04.02.0
922.84.0140.895.03.923.1522.91.00.04.02.0
1019.26.0167.6123.03.923.4418.31.00.04.04.0
1117.86.0167.6123.03.923.4418.91.00.04.04.0
1216.48.0275.8180.03.074.0717.40.00.03.03.0
1317.38.0275.8180.03.073.7317.60.00.03.03.0
1415.28.0275.8180.03.073.7818.00.00.03.03.0
1510.48.0472.0205.02.935.2517.980.00.03.04.0
1610.48.0460.0215.03.05.42417.820.00.03.04.0
1714.78.0440.0230.03.235.34517.420.00.03.04.0
1832.44.078.766.04.082.219.471.01.04.01.0
1930.44.075.752.04.931.61518.521.01.04.02.0
2033.94.071.165.04.221.83519.91.01.04.01.0
2121.54.0120.197.03.72.46520.011.00.03.01.0
2215.58.0318.0150.02.763.5216.870.00.03.02.0
2315.28.0304.0150.03.153.43517.30.00.03.02.0
2413.38.0350.0245.03.733.8415.410.00.03.04.0
2519.28.0400.0175.03.083.84517.050.00.03.02.0
2627.34.079.066.04.081.93518.91.01.04.01.0
2726.04.0120.391.04.432.1416.70.01.05.02.0
2830.44.095.1113.03.771.51316.91.01.05.02.0
2915.88.0351.0264.04.223.1714.50.01.05.04.0
3019.76.0145.0175.03.622.7715.50.01.05.06.0
&vellip&vellip&vellip&vellip&vellip&vellip&vellip&vellip&vellip&vellip&vellip&vellip

A simpler interface

  • the character R followed by a string evaluates the string as an R expression. The behavior is defined in @R_str. See rstr.jl.
  • Julia allows multiline strings using triple quotes
  • string interpolation of Julia objects is allowed using $, only when it is not used as valid R syntax.

In [5]:
R"""
suppressMessages(library(lme4))
fm1 <- lmer(Yield ~ 1 + (1 | Batch), Dyestuff, REML = FALSE)
getME(fm1, "theta")
"""[1]


Out[5]:
0.7525806925786059

In [6]:
using DataFrames
brownian(N) = DataFrame(x = 1:N, y = cumsum(randn(N)))
srand(1234321)  # set random number seed
br = brownian(10_000);
R"""
suppressMessages(library(ggplot2))
ggplot($br, aes(x = x, y = y)) + geom_line()
"""


Out[6]:
RCall.RObject{RCall.VecSxp}

An even simpler interface

  • Most of base Julia, including the REPL, is written in Julia
  • The REPL allows for different, user-extensible, modes: julia, shell, help, ..
  • Modes can be switched by the first character you type at the prompt (? for help; ; for shell)
  • RCall adds an R REPL mode (magic char is $). Cxx adds a C++ REPL mode (magic is <).

RCall summary

  • There is no "glue" code written in a compiled language. The whole package is written in Julia.
  • It is feasible to emulate the internal R structures in Julia. The other direction would be very difficult.
  • ccall, cglobal, cfunction, Libdl.dlopen allow low-level access to a C API.
  • String macros and REPL modes allow for a familiar interface.

PyCall (stevengj/PyCall.jl)

  • the @pyimport macro functions like import in Python.

In [7]:
using PyCall
@pyimport numpy as np
@pyimport pandas as pd
@pyimport feather

In [8]:
whos(pd)


                   Categorical      8 bytes  PyCall.PyObject
              CategoricalIndex      8 bytes  PyCall.PyObject
                     DataFrame      8 bytes  PyCall.PyObject
                    DateOffset      8 bytes  PyCall.PyObject
                 DatetimeIndex      8 bytes  PyCall.PyObject
                     ExcelFile      8 bytes  PyCall.PyObject
                   ExcelWriter      8 bytes  PyCall.PyObject
                          Expr      8 bytes  PyCall.PyObject
                  Float64Index      8 bytes  PyCall.PyObject
                       Grouper      8 bytes  PyCall.PyObject
                      HDFStore      8 bytes  PyCall.PyObject
                         Index      8 bytes  PyCall.PyObject
                    IndexSlice      8 bytes  PyCall.PyObject
                    Int64Index      8 bytes  PyCall.PyObject
                    MultiIndex      8 bytes  PyCall.PyObject
                           NaT      8 bytes  PyCall.PyObject
                         Panel      8 bytes  PyCall.PyObject
                       Panel4D      8 bytes  PyCall.PyObject
                        Period      8 bytes  PyCall.PyObject
                   PeriodIndex      8 bytes  PyCall.PyObject
                    RangeIndex      8 bytes  PyCall.PyObject
                        Series      8 bytes  PyCall.PyObject
                   SparseArray      8 bytes  PyCall.PyObject
               SparseDataFrame      8 bytes  PyCall.PyObject
                    SparseList      8 bytes  PyCall.PyObject
                   SparsePanel      8 bytes  PyCall.PyObject
                  SparseSeries      8 bytes  PyCall.PyObject
              SparseTimeSeries      8 bytes  PyCall.PyObject
                          Term      8 bytes  PyCall.PyObject
                   TimeGrouper      8 bytes  PyCall.PyObject
                    TimeSeries      8 bytes  PyCall.PyObject
                     Timedelta      8 bytes  PyCall.PyObject
                TimedeltaIndex      8 bytes  PyCall.PyObject
                     Timestamp      8 bytes  PyCall.PyObject
                     WidePanel      8 bytes  PyCall.PyObject
                      __anon__     10 KB     Module
                         algos      8 bytes  PyCall.PyObject
                   bdate_range      8 bytes  PyCall.PyObject
                        compat      8 bytes  PyCall.PyObject
                   computation      8 bytes  PyCall.PyObject
                        concat      8 bytes  PyCall.PyObject
                          core      8 bytes  PyCall.PyObject
                      crosstab      8 bytes  PyCall.PyObject
                           cut      8 bytes  PyCall.PyObject
                    date_range      8 bytes  PyCall.PyObject
                      datetime      8 bytes  PyCall.PyObject
                     datetools      8 bytes  PyCall.PyObject
                    dependency     16 bytes  ASCIIString
               describe_option      8 bytes  PyCall.PyObject
                          eval      8 bytes  PyCall.PyObject
                          ewma      8 bytes  PyCall.PyObject
                       ewmcorr      8 bytes  PyCall.PyObject
                        ewmcov      8 bytes  PyCall.PyObject
                        ewmstd      8 bytes  PyCall.PyObject
                        ewmvar      8 bytes  PyCall.PyObject
                        ewmvol      8 bytes  PyCall.PyObject
               expanding_apply      8 bytes  PyCall.PyObject
                expanding_corr      8 bytes  PyCall.PyObject
               expanding_count      8 bytes  PyCall.PyObject
                 expanding_cov      8 bytes  PyCall.PyObject
                expanding_kurt      8 bytes  PyCall.PyObject
                 expanding_max      8 bytes  PyCall.PyObject
                expanding_mean      8 bytes  PyCall.PyObject
              expanding_median      8 bytes  PyCall.PyObject
                 expanding_min      8 bytes  PyCall.PyObject
            expanding_quantile      8 bytes  PyCall.PyObject
                expanding_skew      8 bytes  PyCall.PyObject
                 expanding_std      8 bytes  PyCall.PyObject
                 expanding_sum      8 bytes  PyCall.PyObject
                 expanding_var      8 bytes  PyCall.PyObject
                     factorize      8 bytes  PyCall.PyObject
                  fama_macbeth      8 bytes  PyCall.PyObject
                       formats      8 bytes  PyCall.PyObject
                   get_dummies      8 bytes  PyCall.PyObject
                    get_option      8 bytes  PyCall.PyObject
                     get_store      8 bytes  PyCall.PyObject
                       groupby      8 bytes  PyCall.PyObject
             hard_dependencies     65 bytes  Tuple{ASCIIString,ASCIIString,ASCI…
                     hashtable      8 bytes  PyCall.PyObject
                         index      8 bytes  PyCall.PyObject
                       indexes      8 bytes  PyCall.PyObject
                    infer_freq      8 bytes  PyCall.PyObject
                          info      8 bytes  PyCall.PyObject
                            io      8 bytes  PyCall.PyObject
                        isnull      8 bytes  PyCall.PyObject
                          json      8 bytes  PyCall.PyObject
                           lib      8 bytes  PyCall.PyObject
                      lreshape      8 bytes  PyCall.PyObject
                         match      8 bytes  PyCall.PyObject
                          melt      8 bytes  PyCall.PyObject
                         merge      8 bytes  PyCall.PyObject
          missing_dependencies      0 bytes  0-element Array{Any,1}
                       msgpack      8 bytes  PyCall.PyObject
                       notnull      8 bytes  PyCall.PyObject
                            np      8 bytes  PyCall.PyObject
                       offsets      8 bytes  PyCall.PyObject
                           ols      8 bytes  PyCall.PyObject
                option_context      8 bytes  PyCall.PyObject
                       options      8 bytes  PyCall.PyObject
                 ordered_merge      8 bytes  PyCall.PyObject
                        pandas      8 bytes  PyCall.PyObject
                        parser      8 bytes  PyCall.PyObject
                  period_range      8 bytes  PyCall.PyObject
                         pivot      8 bytes  PyCall.PyObject
                   pivot_table      8 bytes  PyCall.PyObject
                   plot_params    357 bytes  Dict{Any,Any} with 1 entry
                          pnow      8 bytes  PyCall.PyObject
                          qcut      8 bytes  PyCall.PyObject
                read_clipboard      8 bytes  PyCall.PyObject
                      read_csv      8 bytes  PyCall.PyObject
                    read_excel      8 bytes  PyCall.PyObject
                      read_fwf      8 bytes  PyCall.PyObject
                      read_gbq      8 bytes  PyCall.PyObject
                      read_hdf      8 bytes  PyCall.PyObject
                     read_html      8 bytes  PyCall.PyObject
                     read_json      8 bytes  PyCall.PyObject
                  read_msgpack      8 bytes  PyCall.PyObject
                   read_pickle      8 bytes  PyCall.PyObject
                      read_sas      8 bytes  PyCall.PyObject
                      read_sql      8 bytes  PyCall.PyObject
                read_sql_query      8 bytes  PyCall.PyObject
                read_sql_table      8 bytes  PyCall.PyObject
                    read_stata      8 bytes  PyCall.PyObject
                    read_table      8 bytes  PyCall.PyObject
                  reset_option      8 bytes  PyCall.PyObject
                 rolling_apply      8 bytes  PyCall.PyObject
                  rolling_corr      8 bytes  PyCall.PyObject
                 rolling_count      8 bytes  PyCall.PyObject
                   rolling_cov      8 bytes  PyCall.PyObject
                  rolling_kurt      8 bytes  PyCall.PyObject
                   rolling_max      8 bytes  PyCall.PyObject
                  rolling_mean      8 bytes  PyCall.PyObject
                rolling_median      8 bytes  PyCall.PyObject
                   rolling_min      8 bytes  PyCall.PyObject
              rolling_quantile      8 bytes  PyCall.PyObject
                  rolling_skew      8 bytes  PyCall.PyObject
                   rolling_std      8 bytes  PyCall.PyObject
                   rolling_sum      8 bytes  PyCall.PyObject
                   rolling_var      8 bytes  PyCall.PyObject
                rolling_window      8 bytes  PyCall.PyObject
                scatter_matrix      8 bytes  PyCall.PyObject
          set_eng_float_format      8 bytes  PyCall.PyObject
                    set_option      8 bytes  PyCall.PyObject
                 show_versions      8 bytes  PyCall.PyObject
                        sparse      8 bytes  PyCall.PyObject
                         stats      8 bytes  PyCall.PyObject
                          test      8 bytes  PyCall.PyObject
               timedelta_range      8 bytes  PyCall.PyObject
                   to_datetime      8 bytes  PyCall.PyObject
                    to_msgpack      8 bytes  PyCall.PyObject
                    to_numeric      8 bytes  PyCall.PyObject
                     to_pickle      8 bytes  PyCall.PyObject
                  to_timedelta      8 bytes  PyCall.PyObject
                         tools      8 bytes  PyCall.PyObject
                       tseries      8 bytes  PyCall.PyObject
                         tslib      8 bytes  PyCall.PyObject
                         types      8 bytes  PyCall.PyObject
                        unique      8 bytes  PyCall.PyObject
                          util      8 bytes  PyCall.PyObject
                  value_counts      8 bytes  PyCall.PyObject
                  wide_to_long      8 bytes  PyCall.PyObject

In [9]:
function brown(N::Number)
    v = zeros(N + 1)
    for i in 1:N
        v[i + 1] = v[i] + randn()
    end
    v
end


Out[9]:
brown (generic function with 1 method)

In [10]:
df = pd.DataFrame(Dict("br1" => pd.Series(brown(10_000)), "br2" => pd.Series(brown(10_000))))


Out[10]:
br1 br2
0 0.000000 0.000000
1 -0.773834 -0.006075
2 -0.537581 -0.281177
3 0.070239 0.989620
4 0.913922 0.272461
5 0.783253 0.943957
6 1.847161 0.796048
7 0.448132 1.478668
8 -2.358731 0.106409
9 -3.037559 -0.322209
10 -0.647172 -0.024637
11 -0.324446 0.887330
12 -1.087391 0.764816
13 -0.092598 1.635592
14 0.270325 1.103798
15 0.026771 0.601902
16 -1.469982 1.815822
17 -2.144328 0.957945
18 -3.344290 1.105669
19 -2.112758 1.303400
20 -3.072021 0.911289
21 -2.243848 1.488212
22 -1.276259 1.990742
23 -2.105055 2.864032
24 -1.909855 4.591759
25 -2.241574 4.498321
26 -2.712888 4.342676
27 -2.501568 4.245411
28 -2.737541 3.471929
29 -1.666890 3.263433
... ... ...
9971 -67.201469 -37.090684
9972 -68.372847 -37.603236
9973 -69.053534 -37.842471
9974 -67.648497 -38.373551
9975 -65.773006 -39.242039
9976 -66.520155 -39.173315
9977 -66.774692 -40.773012
9978 -67.839113 -39.828624
9979 -68.676012 -40.120377
9980 -68.364612 -38.445496
9981 -66.843212 -38.913253
9982 -66.390980 -38.201632
9983 -65.733951 -37.470297
9984 -65.099306 -38.119241
9985 -66.354152 -37.442763
9986 -68.651218 -38.013503
9987 -67.823224 -37.741410
9988 -68.053629 -38.639877
9989 -67.943860 -36.262430
9990 -66.721974 -35.792575
9991 -67.957720 -35.626207
9992 -68.531987 -34.892304
9993 -68.036708 -34.475120
9994 -69.109629 -34.392904
9995 -69.369302 -34.677581
9996 -69.953834 -36.628048
9997 -69.380250 -37.220133
9998 -69.382725 -36.422765
9999 -69.528264 -35.077929
10000 -68.787819 -35.410307

10001 rows × 2 columns


In [11]:
feather.write_dataframe(df, "/tmp/br.feather")

In [12]:
R"""
library(feather)
read_feather("/tmp/br.feather")
"""


Out[12]:
RCall.RObject{RCall.VecSxp}
Source: local data frame [10,001 x 2]

           br1          br2
         <dbl>        <dbl>
1   0.00000000  0.000000000
2  -0.77383429 -0.006074936
3  -0.53758071 -0.281177366
4   0.07023901  0.989619762
5   0.91392200  0.272461480
6   0.78325339  0.943957326
7   1.84716072  0.796047645
8   0.44813220  1.478668055
9  -2.35873065  0.106408928
10 -3.03755867 -0.322209156
..         ...          ...

In [ ]: