Calling R Functions in IPython

Installation Instructions on Windows

%rmagic can be difficult to install and set up. These are the steps that worked for me. Please feel free to suggest additions, subtractions and edits that make the process easier. I advise you to take the steps below in order if you can.
There can be discrepancies if the environment variables do not exist when rpy2 is installed.

1. Install R

http://www.r-project.org/

2. Set Environment Variables

  • Add the directory that contains R.dll to the environment variable PATH (C:\Program Files\R\R-3.0.3\bin\i386 in my case)
  • Add an environment variable R_HOME (C:\Program Files\R\R-3.0.3 in my case)
  • Add an environment variable R_USER (simply my username in Windows)
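
Before installing rpy2, it can help to confirm from Python that the three settings above are actually visible. The following is a minimal sketch and not part of the steps that I originally followed. The helper name check_r_environment is hypothetical, and it assumes that one PATH entry is the directory holding R.dll.

```python
import os

def check_r_environment(env=None):
    """Return a list of problems found with the R-related environment variables.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    problems = []

    # R_HOME and R_USER must both be defined.
    for name in ("R_HOME", "R_USER"):
        if not env.get(name):
            problems.append(name + " is not set")

    # At least one PATH entry should be a directory that contains R.dll.
    path_dirs = [d for d in env.get("PATH", "").split(os.pathsep) if d]
    if not any(os.path.isfile(os.path.join(d, "R.dll")) for d in path_dirs):
        problems.append("no PATH entry contains R.dll")

    return problems

if __name__ == "__main__":
    for problem in check_r_environment():
        print("Problem:", problem)
```

An empty list means all three settings were found. Anything printed points at the variable to fix before moving on.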

Python Modules

You can use pip or easy_install for many of these packages. On Windows, however, I suggest you visit the Unofficial Windows Binaries for Python Extension Packages created by Christoph Gohlke at UC Irvine: http://www.lfd.uci.edu/~gohlke/pythonlibs/

Import Modules


In [ ]:
import numpy as NUM
import pylab as PYLAB
import arcpy as ARCPY  # ArcGIS site package
import SSDataObject as SSDO
import scipy as SCIPY
import pandas as PANDAS

Initialize Data Object, Select Fields and Obtain Data

Use Case - Using Logistic Regression to Analyze 2008 Presidential Vote in California Counties


In [ ]:
inputFC = r'../data/CA_Polygons.shp'
ssdo = SSDO.SSDataObject(inputFC)
ssdo.obtainData(ssdo.oidName, ['PCR2008', 'POPDEN08', 'PERCNOHS', 'MAJORO'])

Make Use of PANDAS Data Frame


In [ ]:
ids = [ssdo.order2Master[i] for i in range(ssdo.numObs)]
convertDictDF = {}
for fieldName in ssdo.fields.keys():
    convertDictDF[fieldName] = ssdo.fields[fieldName].data
df = PANDAS.DataFrame(convertDictDF, index = ids)
print(df[0:5])

Push PANDAS Data Frame to R Data Frame - Use the -i flag


In [ ]:
%load_ext rpy2.ipython
# If the extension is already loaded, use this instead:
#%reload_ext rpy2.ipython

%R -i df

Analyze in R


In [ ]:
%R library(rms)
%R logit = lrm(MAJORO ~ PCR2008 + POPDEN08 + PERCNOHS, data = df, x = TRUE, y = TRUE)
%R z_scores = logit$coefficients / sqrt(diag(logit$var))
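
The z-score line divides each coefficient by its standard error, which is the square root of the matching diagonal entry of the covariance matrix. Here is the same arithmetic as a small Python sketch. The function name wald_z_scores is hypothetical, and the numbers in the example are made up for illustration; they are not results from the California data.

```python
import math

def wald_z_scores(coefficients, covariance):
    """Divide each coefficient by the square root of its variance.

    Mirrors the R expression coef / sqrt(diag(var)); `covariance` is a
    square list of lists whose diagonal holds the coefficient variances.
    """
    if len(coefficients) != len(covariance):
        raise ValueError("coefficients and covariance must have the same size")
    return [
        coef / math.sqrt(covariance[i][i])
        for i, coef in enumerate(coefficients)
    ]

# Illustrative values only (not output from the model above).
example_coef = [2.0, -3.0]
example_cov = [[4.0, 0.5],
               [0.5, 9.0]]
example_z = wald_z_scores(example_coef, example_cov)
print(example_z)  # [1.0, -1.0]
```

Only the diagonal of the covariance matrix is used, so the off-diagonal terms do not affect the z-scores.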

Pull Results Back to Python - Use the -o flag


In [ ]:
%R -o logit_coef logit_coef = logit$coefficients
%R -o p_values p_values = pnorm(abs(z_scores), lower.tail = FALSE) * 2.0

print("Coefficients")
py_coef = NUM.array(logit_coef)
print(py_coef)

print("p_values")
py_pvalues = NUM.array(p_values)
print(py_pvalues)
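
The R expression pnorm(abs(z_scores), lower.tail = FALSE) * 2.0 is the two-sided tail probability of a standard normal distribution. As a cross-check on the Python side, the same quantity can be written with the complementary error function from the standard library. This is only a sketch: two_sided_p_value is a hypothetical helper name, not something rpy2 provides.

```python
import math

def two_sided_p_value(z):
    """Two-sided standard-normal p-value: 2 * P(Z > |z|)."""
    # 1 - Phi(x) = 0.5 * erfc(x / sqrt(2)), so doubling gives erfc directly.
    return math.erfc(abs(z) / math.sqrt(2.0))

for z in (0.0, 1.96, -1.96):
    print(z, two_sided_p_value(z))
```

A z-score of 0 gives a p-value of 1, and a z-score of 1.96 in either direction gives a p-value close to 0.05.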