Week 1 - Basics

We begin with some basics, establishing that the environment is working.

In [1]:
print("hello from python")

hello from python

In [2]:
import sys
sys.version

'2.7.6 |Anaconda 1.9.1 (x86_64)| (default, Jan 10 2014, 11:23:15) \n[GCC 4.0.1 (Apple Inc. build 5493)]'
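Reading the version string by eye is fine for a quick check. For a check a script can act on, sys.version_info compares like an ordinary tuple. A minimal sketch, where the helper name require_python is a hypothetical choice:

```python
import sys


def require_python(minimum):
    """Raise RuntimeError if the running interpreter is older than `minimum`.

    `minimum` is a tuple such as (2, 7); sys.version_info compares
    against it element by element, like any other tuple.
    Returns (major, minor, micro) of the running interpreter.
    """
    if sys.version_info < tuple(minimum):
        found = ".".join(str(part) for part in sys.version_info[:3])
        wanted = ".".join(str(part) for part in minimum)
        raise RuntimeError("need Python %s or newer, found %s" % (wanted, found))
    return tuple(sys.version_info[:3])


# Usage: this passes on the 2.7.6 interpreter shown above (and on Python 3).
require_python((2, 7))
```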

Python is working just fine. Let's load up the R interface:

In [3]:
%load_ext rmagic

In [4]:
%R print("hello from R")

[1] "hello from R"

In [5]:
%R R.version

array([...
       ['x86_64, darwin10.8.0'],
       ...
       ['R version 3.0.2 (2013-09-25)'],
       ['Frisbee Sailing']], ...)

Simple plots

Okay, with that established, let's run through a few simple statistical functions, starting in Python:

In [6]:
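import random
# 10,000 draws from a standard normal distribution (mean 0, sd 1)
x = [random.normalvariate(0, 1) for _ in range(10000)]
sum(x) / len(x)  # sample mean; should be close to 0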
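The sample mean from that cell should land close to 0. A minimal sketch that also checks the standard deviation (close to 1) and uses a seeded generator, so reruns give the same numbers; the helper names sample_normal and mean_and_sd and the seed 42 are illustrative choices, not part of the original notebook:

```python
import math
import random


def sample_normal(n, mu=0.0, sigma=1.0, seed=None):
    """Draw n values from N(mu, sigma^2) with a private, optionally seeded RNG."""
    rng = random.Random(seed)
    return [rng.normalvariate(mu, sigma) for _ in range(n)]


def mean_and_sd(values):
    """Return the sample mean and the sample standard deviation (n - 1 denominator)."""
    n = len(values)
    mean = sum(values) / float(n)
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(variance)


x = sample_normal(10000, seed=42)
m, s = mean_and_sd(x)  # m should be near 0 and s near 1
```

Using a private random.Random instance keeps the seed from affecting any other code that calls the module-level random functions.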


In [7]:
%matplotlib inline
import matplotlib.pyplot as plt
plt.hist(x, bins=20)
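With bins=20, plt.hist splits the range from the smallest to the largest value into 20 equal-width bins and counts the values in each. A pure-Python sketch of that binning, approximately as numpy.histogram (which matplotlib uses underneath) does it: every bin is half-open except the last, which also includes the maximum. The helper equal_width_histogram is hypothetical:

```python
import random


def equal_width_histogram(values, bins):
    """Count values in `bins` equal-width bins spanning min(values)..max(values).

    Returns (counts, edges) with len(edges) == bins + 1. Every bin is
    half-open [left, right) except the last, which also includes max(values).
    """
    lo, hi = min(values), max(values)
    if lo == hi:
        # All values identical: widen the range by 0.5 on each side,
        # as numpy.histogram does, so the bin width is not zero.
        lo, hi = lo - 0.5, hi + 0.5
    width = (hi - lo) / float(bins)
    edges = [lo + i * width for i in range(bins)] + [hi]
    counts = [0] * bins
    for v in values:
        index = int((v - lo) / width)
        # max(values) maps to index == bins; fold it into the last bin.
        counts[min(index, bins - 1)] += 1
    return counts, edges


# Usage: the same shape of data as x above, with a fixed seed.
rng = random.Random(1)
data = [rng.normalvariate(0, 1) for _ in range(10000)]
counts, edges = equal_width_histogram(data, 20)
```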

Note: at this point I spent two hours trying to get the IPython environment working. I couldn't fix a problem that kept figures from drawing, so I gave up and installed the Anaconda distribution (https://store.continuum.io/cshop/anaconda/). That worked fine once I had separately installed rpy2 for the R integration.

In [8]:
%R x <- rnorm(10000, 0, 1)
%R hist(x)

array([<rpy2.rinterface.SexpVector - Python:0x10a4a2120 / R:0x10a95ce40>,
       <rpy2.rinterface.SexpVector - Python:0x10a4a2138 / R:0x102a6da08>,
       <rpy2.rinterface.SexpVector - Python:0x10a4a21c8 / R:0x103b569f8>,
       <rpy2.rinterface.SexpVector - Python:0x10a4a21e0 / R:0x103b56950>,
       <rpy2.rinterface.SexpVector - Python:0x10a4a21f8 / R:0x102a24d48>,
       <rpy2.rinterface.SexpVector - Python:0x10a4a2210 / R:0x102a24d18>], dtype=object)
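The six SexpVector entries are, most likely, the six components of the list that R's hist() returns alongside the plot: breaks, counts, density, mids, xname and equidist. By default R chooses the number of classes with Sturges' rule, ceiling(log2(n) + 1), which gives 15 for n = 10000, and then rounds the breaks to "pretty" values; its bins are right-closed, (a, b], with the lowest bin also including its left edge. A pure-Python sketch of the numeric components for given breaks; the helper names are hypothetical, it skips the pretty() rounding, and it raises ValueError for values outside the breaks:

```python
import bisect
import math


def sturges_classes(n):
    """Sturges' rule as in R's nclass.Sturges(): ceiling(log2(n) + 1).

    math.log2 needs Python 3.3 or newer.
    """
    return int(math.ceil(math.log2(n) + 1))


def r_style_histogram(values, breaks):
    """Mimic the numeric components of R's hist() result for sorted breaks.

    Bins are right-closed, (a, b], and the first bin also includes its left
    edge, like R's defaults right = TRUE and include.lowest = TRUE.
    R's pretty() rounding of the breaks is not reproduced here.
    """
    nbins = len(breaks) - 1
    counts = [0] * nbins
    for v in values:
        if v < breaks[0] or v > breaks[-1]:
            raise ValueError("value %r lies outside the breaks" % (v,))
        # bisect_left returns the index of the first break >= v, so v belongs
        # to the bin (breaks[i - 1], breaks[i]]; v == breaks[0] goes to bin 0.
        i = bisect.bisect_left(breaks, v)
        counts[max(i - 1, 0)] += 1
    n = float(len(values))
    pairs = list(zip(breaks, breaks[1:]))
    density = [c / (n * (b - a)) for c, (a, b) in zip(counts, pairs)]
    mids = [(a + b) / 2.0 for a, b in pairs]
    return {"breaks": list(breaks), "counts": counts,
            "density": density, "mids": mids}


# Usage with hand-picked breaks (R would choose its own via pretty()).
# example["counts"] == [2, 2]: 1.0 sits in the first bin because bins are right-closed.
example = r_style_histogram([0.2, 1.0, 1.5, 2.0], [0.0, 1.0, 2.0])
```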

Using R data in Python

Just for fun, let's see if we can generate data in R and use it in Python.

In [9]:
%R xl <- rlnorm(10000)

array([ 3.76095608,  0.45328098,  2.71188165, ...,  2.31172431,
        1.08341973,  0.32140222])
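rlnorm(10000) draws from the log-normal distribution with R's defaults meanlog = 0 and sdlog = 1, i.e. the exponential of a standard normal variable. That is why every value is positive and the right tail is long: the median is exp(0) = 1, while the mean is exp(1/2), about 1.65. Python's standard library offers the same distribution as random.lognormvariate(mu, sigma). A minimal sketch comparing it with the explicit exp-of-a-normal construction; the helper names and seeds are illustrative:

```python
import math
import random


def lognormal_via_exp(n, seed=None):
    """Log-normal(0, 1) samples built as exp(Z) with Z ~ N(0, 1)."""
    rng = random.Random(seed)
    return [math.exp(rng.normalvariate(0.0, 1.0)) for _ in range(n)]


def lognormal_builtin(n, seed=None):
    """The same distribution via the standard library's lognormvariate."""
    rng = random.Random(seed)
    return [rng.lognormvariate(0.0, 1.0) for _ in range(n)]


def median(values):
    """Median of a non-empty list."""
    s = sorted(values)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2.0


a = lognormal_via_exp(10000, seed=1)
b = lognormal_builtin(10000, seed=2)
# Both medians should be close to 1, and every value is positive.
```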

In [10]:
plt.hist(xl, bins=20)
plt.show()

NameError                                 Traceback (most recent call last)
<ipython-input-10-b58144454c0b> in <module>()
----> 1 plt.hist(xl, bins=20)
      2 plt.show()

NameError: name 'xl' is not defined

That doesn't work: the %R magic creates xl in R's workspace, not in Python's namespace. But there's another way:

In [11]:
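from rpy2.robjects import r
# Evaluate R code from Python; the result comes back as an rpy2 FloatVector.
# Note: this draws a fresh sample rather than reusing the R variable xl above.
xl = r('rlnorm(10000)')
plt.hist(xl, bins=20)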

That worked!

Let's try again with a log y-axis to better show the range of data:

In [12]:
plt.hist(xl, bins=20, log=True)
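A log y-axis helps because the bin counts of a long-tailed sample span several orders of magnitude: with 20 equal-width bins, the first bin holds thousands of values and the far-right ones a handful, which on a linear axis are indistinguishable from zero. A stdlib sketch that measures that spread for a log-normal sample comparable to xl (the seed and the helper bin_counts are illustrative):

```python
import math
import random


def bin_counts(values, bins):
    """Counts for `bins` equal-width bins between min(values) and max(values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / float(bins)
    counts = [0] * bins
    for v in values:
        # The maximum maps to index == bins; keep it in the last bin.
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return counts


rng = random.Random(0)
sample = [rng.lognormvariate(0.0, 1.0) for _ in range(10000)]
counts = bin_counts(sample, 20)
nonzero = [c for c in counts if c > 0]
# Orders of magnitude between the fullest and the emptiest non-empty bin.
decades = math.log10(max(nonzero) / float(min(nonzero)))
```

For a sample of this size, decades should come out above 2: the fullest bin holds hundreds of times more values than the sparsest non-empty one, which is the spread log=True makes visible.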