R-Python integration


In [5]:
from IPython.core.display import HTML

In [7]:
import collections
import cPickle
import gzip
import os
import sys
import time

import numpy
from numpy import dtype
import theano
import theano.tensor as T
from theano.tensor.shared_randomstreams import RandomStreams

In [37]:
from theano import tensor as T
from theano import function, shared
import numpy

# Two shared 2x3 integer matrices used in the slicing examples below.
x = shared(numpy.array([[0, 1, 2], [0, 1, 2]]))
z = shared(numpy.array([[0, 1, 1], [0, 1, 1]]))
size_of_x = 2  # number of leading columns to keep when slicing

In [39]:
x.get_value()


Out[39]:
array([[0, 1, 2],
       [0, 1, 2]])

In [41]:
y = theano.tensor.mean(x)
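
y is a symbolic expression, not a value. A minimal sketch (assuming the shared variable x defined above) of compiling it into a callable with theano.function:


In [ ]:
# Compile the symbolic mean into a callable and evaluate it.
f = function([], y)
f()  # mean of all entries of x -> array(1.0)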

In [71]:
train_y = numpy.array([0, 1, 2, 4, 5, 6, 7, 8, 9, 2, 8])

In [72]:
train_y_T = train_y[numpy.newaxis].T  # reshape to a column vector for OneHotEncoder

In [76]:
from sklearn.preprocessing import OneHotEncoder
enc = OneHotEncoder(n_values=10, dtype=theano.config.floatX, sparse=False)

In [74]:
encode_train_y = enc.fit_transform(train_y_T)

In [68]:
train_y_T


Out[68]:
array([[0],
       [1],
       [2],
       [4],
       [5],
       [6],
       [7],
       [8],
       [9],
       [2],
       [8]])

In [75]:
encode_train_y


Out[75]:
array([[ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.],
       [ 0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.]], dtype=float32)
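
Note the out-of-order execution counters: encode_train_y was produced by In [74], before the encoder cell shown above (In [76]) was last edited and re-run. The 9-column output is what the default n_values='auto' gives, since the digit 3 never occurs in train_y; re-running fit_transform with the n_values=10 encoder would produce a 10-column encoding with an all-zero column for 3.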

In [36]:
x[:,:size_of_x].get_value()


---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-36-82d6fdd42f26> in <module>()
----> 1 x[:,:size_of_x].get_value()

AttributeError: 'TensorVariable' object has no attribute 'get_value'
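
Slicing a shared variable returns a symbolic TensorVariable, which has no get_value(). A sketch of two ways to get the actual numbers instead:


In [ ]:
# Evaluate the symbolic slice directly...
x[:, :size_of_x].eval()
# ...or slice the numpy array held by the shared variable.
x.get_value()[:, :size_of_x]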

In [ ]:
# Per-example binary cross-entropy between x and reconstruction z
# (self.x in the original fragment is the shared variable x defined above).
cost = -T.mean(x[:, :size_of_x] * T.log(z[:, :size_of_x])
               + (1 - x[:, :size_of_x]) * T.log(1 - z[:, :size_of_x]), axis=1)

In [32]:
a = {}
a['s'] = 1
a['b'] = 2

In [33]:
a


Out[33]:
{'b': 2, 's': 1}

In [34]:
b = a.copy()
b['s'] = 3
b['c'] = 3
b


Out[34]:
{'b': 2, 'c': 3, 's': 3}

In [35]:
a


Out[35]:
{'b': 2, 's': 1}
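
dict.copy() is a shallow copy: the top-level mapping is independent (as the cells above show), but nested mutable values are still shared. A quick illustration:


In [ ]:
# A shallow copy shares nested mutables.
c = {'k': [1, 2]}
d = c.copy()
d['k'].append(3)
c['k']  # [1, 2, 3] -- the inner list is shared between c and d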

In [8]:
import pandas as pd
import numpy as np
import rpy2.robjects as robjects
import rpy2.robjects as ro

Running a simple command in R


In [5]:
pi = robjects.r('pi')
pi[0]


Out[5]:
3.141592653589793
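
robjects.r evaluates arbitrary R code and also exposes R objects by name. A minimal sketch of calling an R function directly from Python:


In [ ]:
# Look up R's sum() and call it on an R integer vector.
r_sum = robjects.r['sum']
r_sum(robjects.IntVector([1, 2, 3]))[0]  # 6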

Running code using the IPython R magic


In [10]:
%load_ext rmagic
%load_ext rpy2.ipython


The rmagic extension is already loaded. To reload it, use:
  %reload_ext rmagic
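
(In newer IPython/rpy2 versions the rmagic extension is deprecated; %load_ext rpy2.ipython alone provides the %R and %%R magics.)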

Create data in R, compute in R, return result to Python

Run linear regression in R, print out a summary, and pass the result variable error back to Python:


In [11]:
%%R -o error
set.seed(10)
y<-c(1:1000)
x1<-c(1:1000)*runif(1000,min=0,max=2)
x2<-(c(1:1000)*runif(1000,min=0,max=2))^2
x3<-log(c(1:1000)*runif(1000,min=0,max=2))

all_data<-data.frame(y,x1,x2,x3)
positions <- sample(nrow(all_data),size=floor((nrow(all_data)/4)*3))
training<- all_data[positions,]
testing<- all_data[-positions,]

lm_fit<-lm(y~x1+x2+x3,data=training)
print(summary(lm_fit))

predictions<-predict(lm_fit,newdata=testing)
error<-sqrt((sum((testing$y-predictions)^2))/nrow(testing))


Call:
lm(formula = y ~ x1 + x2 + x3, data = training)

Residuals:
    Min      1Q  Median      3Q     Max 
-379.34 -125.71  -29.88   87.58  732.59 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -5.234e+01  2.495e+01  -2.098   0.0363 *  
x1           2.414e-01  1.589e-02  15.188   <2e-16 ***
x2           1.553e-04  9.767e-06  15.900   <2e-16 ***
x3           6.404e+01  4.827e+00  13.267   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 166.4 on 746 degrees of freedom
Multiple R-squared: 0.6613,	Adjusted R-squared: 0.6599 
F-statistic: 485.5 on 3 and 746 DF,  p-value: < 2.2e-16 


In [12]:
print error


[1] 169.8533
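
Depending on the rpy2 version, the -o flag delivers error as an R vector (hence the R-style [1] prefix above) or a numpy array; either way, indexing yields the scalar:


In [ ]:
float(error[0])  # ~169.85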

Create data in R, compute in Python

First we create the data in R:


In [13]:
%%R -o training,testing
set.seed(10)
y<-c(1:1000)
x1<-c(1:1000)*runif(1000,min=0,max=2)
x2<-(c(1:1000)*runif(1000,min=0,max=2))^2
x3<-log(c(1:1000)*runif(1000,min=0,max=2))

all_data<-data.frame(y,x1,x2,x3)
positions <- sample(nrow(all_data),size=floor((nrow(all_data)/4)*3))
training<- all_data[positions,]
testing<- all_data[-positions,]

The variables training and testing are now available as numpy arrays in the Python namespace thanks to the -o flag in the cell above. We'll build pandas DataFrames from them:


In [8]:
tr = pd.DataFrame(dict(zip(['y', 'x1', 'x2', 'x3'], training)))
te = pd.DataFrame(dict(zip(['y', 'x1', 'x2', 'x3'], testing)))

tr.head()


Out[8]:
           x1              x2        x3    y
0  724.861370    19728.318211  6.430894  614
1  103.074180      928.821687  5.132348  108
2  606.561051  1050676.686068  6.564257  518
3  862.674044    91504.275820  4.670171  879
4  393.014599     1134.679888  5.721699  379

Create linear regression model, print a summary:


In [9]:
from statsmodels.formula.api import ols

lm = ols('y ~ x1 + x2 + x3', tr).fit()
lm.summary()


Out[9]:
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.661
Model:                            OLS   Adj. R-squared:                  0.660
Method:                 Least Squares   F-statistic:                     485.5
Date:                Sun, 05 May 2013   Prob (F-statistic):          7.53e-175
Time:                        12:06:08   Log-Likelihood:                -4898.0
No. Observations:                 750   AIC:                             9804.
Df Residuals:                     746   BIC:                             9823.
Df Model:                           3
==============================================================================
                 coef    std err          t      P>|t|     [95.0% Conf. Int.]
------------------------------------------------------------------------------
Intercept    -52.3400     24.950     -2.098      0.036      -101.321    -3.359
x1             0.2414      0.016     15.188      0.000         0.210     0.273
x2             0.0002   9.77e-06     15.900      0.000         0.000     0.000
x3            64.0431      4.827     13.267      0.000        54.567    73.520
==============================================================================
Omnibus:                       85.222   Durbin-Watson:                   1.999
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              112.468
Skew:                           0.898   Prob(JB):                     3.78e-25
Kurtosis:                       3.609   Cond. No.                     3.41e+06
==============================================================================
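
The coefficients and R-squared match the R lm fit above, as expected: set.seed(10) makes both pipelines operate on identical training data.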

Predict and compute RMSE:


In [10]:
pred = lm.predict(te)

error = np.sqrt((sum((te.y - pred)**2)) / len(te))
error


Out[10]:
169.85333821453432
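
This matches the RMSE of 169.8533 computed entirely in R earlier.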

Create data in Python, compute in R

First we create data (numpy array) in Python:


In [11]:
X = np.array([0,1,2,3,4])
Y = np.array([3,5,4,6,7])

We pass them into R using the -i flag, run linear regression in R, print a summary and the diagnostic plots, and send the result back to Python via -o:


In [12]:
%%R -i X,Y -o XYcoef
XYlm = lm(Y~X)
XYcoef = coef(XYlm)
print(summary(XYlm))
par(mfrow=c(2,2))
plot(XYlm)


Call:
lm(formula = Y ~ X)

Residuals:
   1    2    3    4    5 
-0.2  0.9 -1.0  0.1  0.2 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)   3.2000     0.6164   5.191   0.0139 *
X             0.9000     0.2517   3.576   0.0374 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 0.7958 on 3 degrees of freedom
Multiple R-squared:  0.81,	Adjusted R-squared: 0.7467 
F-statistic: 12.79 on 1 and 3 DF,  p-value: 0.03739 

We also get the model coefficients back from R as the variable XYcoef:


In [13]:
XYcoef


Out[13]:
array([ 3.2,  0.9])
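
A sketch of using the returned coefficients on the Python side, for example to compute fitted values for the original X:


In [ ]:
intercept, slope = XYcoef
intercept + slope * X  # array([ 3.2,  4.1,  5. ,  5.9,  6.8])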
