Week 1 - Basics

We begin with some basics, establishing that the environment is working.


In [1]:
print "hello from python"


hello from python

In [2]:
import sys
sys.version


Out[2]:
'2.7.6 |Anaconda 1.9.1 (x86_64)| (default, Jan 10 2014, 11:23:15) \n[GCC 4.0.1 (Apple Inc. build 5493)]'

Python is working just fine. Next, let's load the rmagic extension, which gives the notebook access to R:


In [3]:
%load_ext rmagic

In [4]:
%R print("hello from R")


[1] "hello from R"

In [5]:
%R R.version


Out[5]:
array([['x86_64-apple-darwin10.8.0'],
       ['x86_64'],
       ['darwin10.8.0'],
       ['x86_64, darwin10.8.0'],
       [''],
       ['3'],
       ['0.2'],
       ['2013'],
       ['09'],
       ['25'],
       ['63987'],
       ['R'],
       ['R version 3.0.2 (2013-09-25)'],
       ['Frisbee Sailing']], 
      dtype='|S28')

Simple plots

Okay, with that established, let's run through a few simple statistical functions. First, in Python:


In [6]:
import random
x = [random.normalvariate(0, 1) for i in range(10000)]
sum(x) / len(x)


Out[6]:
-0.005532462765793515
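
For a standard normal sample of this size, the mean should land close to 0 and the standard deviation close to 1. Here is a minimal sketch that checks both using only the standard library. The helper name mean_and_std and the seed value are my own choices for illustration, not part of the original session:

```python
import math
import random


def mean_and_std(values):
    """Return the sample mean and the sample standard deviation (n - 1)."""
    n = len(values)
    mean = sum(values) / float(n)
    variance = sum((v - mean) ** 2 for v in values) / float(n - 1)
    return mean, math.sqrt(variance)


random.seed(1)  # arbitrary seed (an assumption) so the run is repeatable
x = [random.normalvariate(0, 1) for _ in range(10000)]
mean, std = mean_and_std(x)
print("mean = %.4f, std = %.4f" % (mean, std))
```

Dividing by float(n) keeps the arithmetic in floating point on Python 2 as well as Python 3.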

In [7]:
%matplotlib inline
import matplotlib.pyplot as plt
plt.hist(x, bins=20)
plt.show()


Note: at this point I spent two hours trying to get the IPython environment to work. I couldn't fix a problem that stopped the figures from drawing, so I gave up and installed the Anaconda distribution instead: https://store.continuum.io/cshop/anaconda/. That worked fine once I had separately installed rpy2 to provide the R integration.
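
When an environment misbehaves like this, a quick first check is whether the packages the notebook depends on can be imported at all. The sketch below is one way to do that. The helper check_modules is hypothetical (my own name), and the module list is an assumption based on what this notebook uses:

```python
def check_modules(names):
    """Try to import each module and report whether it succeeded."""
    status = {}
    for name in names:
        try:
            __import__(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status


# Assumed list of the third-party modules this notebook relies on.
for name, ok in sorted(check_modules(["matplotlib", "rpy2"]).items()):
    print("%-12s %s" % (name, "ok" if ok else "MISSING"))
```

This only confirms that the imports succeed. It would not have caught the figure-drawing problem described above, which can occur even when matplotlib imports cleanly.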


In [8]:
%R x <- rnorm(10000, 0, 1)
%R hist(x)


Out[8]:
array([<rpy2.rinterface.SexpVector - Python:0x10a4a2120 / R:0x10a95ce40>,
       <rpy2.rinterface.SexpVector - Python:0x10a4a2138 / R:0x102a6da08>,
       <rpy2.rinterface.SexpVector - Python:0x10a4a21c8 / R:0x103b569f8>,
       <rpy2.rinterface.SexpVector - Python:0x10a4a21e0 / R:0x103b56950>,
       <rpy2.rinterface.SexpVector - Python:0x10a4a21f8 / R:0x102a24d48>,
       <rpy2.rinterface.SexpVector - Python:0x10a4a2210 / R:0x102a24d18>], dtype=object)

Using R data in python

Just for fun, let's see if we can generate data in R and use it in Python.


In [9]:
%R xl <- rlnorm(10000)


Out[9]:
array([ 3.76095608,  0.45328098,  2.71188165, ...,  2.31172431,
        1.08341973,  0.32140222])

In [10]:
plt.hist(xl, bins=20)
plt.show()


---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-10-b58144454c0b> in <module>()
----> 1 plt.hist(xl, bins=20)
      2 plt.show()

NameError: name 'xl' is not defined

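That doesn't work: xl was created inside the R session, and rmagic does not copy R variables into the Python namespace unless asked to (for example with the -o flag of %R, or with %Rpull). Another way is to call R through rpy2 directly and keep the result in a Python variable. Note that this draws a fresh sample rather than reusing the one from In [9]: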


In [11]:
from rpy2.robjects import r
xl = r('rlnorm(10000)')
plt.hist(xl, bins=20)
plt.show()


That worked!

Let's try again with a log y-axis to better show the range of data:


In [12]:
plt.hist(xl, bins=20, log=True)
plt.show()
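
The log axis helps because log-normal data are heavily right-skewed: with equal-width bins, almost every sample falls into the first bin and the tail bins hold only a handful each. Here is a rough sketch of that effect using only the standard library. It uses random.lognormvariate(0, 1) as a stand-in for R's rlnorm(10000), and the helper bin_counts and the seed are my own assumptions, not part of the original session:

```python
import math
import random


def bin_counts(values, bins):
    """Count values in equal-width bins spanning min(values)..max(values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / float(bins)
    counts = [0] * bins
    for v in values:
        # The maximum value belongs in the last bin, not one past it.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    return counts


random.seed(1)  # arbitrary seed (an assumption) so the run is repeatable
xl = [random.lognormvariate(0, 1) for _ in range(10000)]
counts = bin_counts(xl, 20)

for i, count in enumerate(counts):
    # log10 of the count is what a log y-axis effectively displays.
    log_height = "%.2f" % math.log10(count) if count else "-"
    print("bin %2d: count = %5d, log10 = %s" % (i, count, log_height))
```

On a linear axis the sparse tail bins are invisible next to the first one. On the log scale the counts span only a few units, so every non-empty bin shows up.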