Week 1 - Basics

We begin with some basics, establishing that the environment is working.



In [1]:

    
print("hello from python")


hello from python



In [2]:

    
import sys
sys.version

Out[2]:

'2.7.6 |Anaconda 1.9.1 (x86_64)| (default, Jan 10 2014, 11:23:15) \n[GCC 4.0.1 (Apple Inc. build 5493)]'
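The version string above can also be checked in code rather than by eye. In this minimal sketch, sys.version_info holds the same information as a tuple that can be compared directly, which is useful at the top of a notebook that depends on one major version. The describe_python helper is a hypothetical name added for illustration, and the block runs under both Python 2.7 and Python 3.

```python
import sys

def describe_python():
    """Return a short 'major.minor.micro' string for the running interpreter."""
    info = sys.version_info
    return "%d.%d.%d" % (info[0], info[1], info[2])

# sys.version_info compares like a tuple, so a version check is a single comparison.
if sys.version_info < (3, 0):
    print("Python 2 interpreter: " + describe_python())
else:
    print("Python 3 interpreter: " + describe_python())
```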

Python is working. Next, let's load the extension that gives us access to R:



In [3]:

    
%load_ext rmagic



In [4]:

    
%R print("hello from R")

[1] "hello from R"



In [5]:

    
%R R.version

Out[5]:

array([['x86_64-apple-darwin10.8.0'],
       ['x86_64'],
       ['darwin10.8.0'],
       ['x86_64, darwin10.8.0'],
       [''],
       ['3'],
       ['0.2'],
       ['2013'],
       ['09'],
       ['25'],
       ['63987'],
       ['R'],
       ['R version 3.0.2 (2013-09-25)'],
       ['Frisbee Sailing']], 
      dtype='|S28')

Simple plots

With that established, let's run through a few simple statistical functions, starting with Python:



In [6]:

    
import random
x = [random.normalvariate(0, 1) for i in range(10000)]
sum(x) / len(x)

Out[6]:

-0.005532462765793515
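Is -0.0055 close enough to zero? The sample mean of n independent N(0, 1) draws has a standard deviation of 1/sqrt(n), which is 0.01 for n = 10000. The value above is therefore only about half a standard error from the true mean. This sketch makes the check explicit. It is written for Python 3 and uses the standard-library statistics module (the session above ran Python 2.7.6). The fixed seed is an addition so that reruns give the same numbers.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed (an addition) so reruns give the same sample
n = 10000
x = [random.normalvariate(0, 1) for _ in range(n)]

mean = statistics.mean(x)
sd = statistics.stdev(x)   # sample standard deviation, close to 1
se = sd / math.sqrt(n)     # standard error of the mean, close to 0.01
z = mean / se              # distance of the mean from 0, in standard errors

print("mean=%.4f  sd=%.4f  se=%.4f  z=%.2f" % (mean, sd, se, z))
```

A |z| well below 2 is unremarkable. The original run's -0.0055 corresponds to roughly z = -0.55.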



In [7]:

    
%matplotlib inline
import matplotlib.pyplot as plt
plt.hist(x, bins=20)
plt.show()

Note: at this point I spent two hours configuring the IPython environment. I couldn't fix a problem that stopped figures from drawing, so I gave up and installed the Anaconda distribution (https://store.continuum.io/cshop/anaconda/). That worked fine once I separately installed rpy2 to provide R integration.
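plt.hist(x, bins=20) divides the range of the data into 20 equal-width bins and counts the values that fall in each. The sketch below approximates that counting step with the standard library only, so the counts can be inspected without a plotting backend. equal_width_counts is a hypothetical helper. It follows the numpy.histogram convention that every bin is half-open except the last, which also includes the maximum. Values lying exactly on a boundary may occasionally land in a different bin because of floating-point rounding.

```python
import random

def equal_width_counts(data, bins):
    """Count values of `data` in `bins` equal-width bins from min to max.

    Assumes the data are not all identical (otherwise the width would be zero).
    """
    lo, hi = min(data), max(data)
    width = (hi - lo) / float(bins)
    counts = [0] * bins
    for v in data:
        # Clamp so the maximum (and any rounding overshoot) lands in the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return counts, edges

random.seed(1)
x = [random.normalvariate(0, 1) for _ in range(10000)]
counts, edges = equal_width_counts(x, 20)

# Crude text histogram: one '#' per 50 observations, labelled by left bin edge.
for left, c in zip(edges, counts):
    print("%7.2f %5d %s" % (left, c, "#" * (c // 50)))
```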



In [8]:

    
%R x <- rnorm(10000, 0, 1)
%R hist(x)

Out[8]:

array([<rpy2.rinterface.SexpVector - Python:0x10a4a2120 / R:0x10a95ce40>,
       <rpy2.rinterface.SexpVector - Python:0x10a4a2138 / R:0x102a6da08>,
       <rpy2.rinterface.SexpVector - Python:0x10a4a21c8 / R:0x103b569f8>,
       <rpy2.rinterface.SexpVector - Python:0x10a4a21e0 / R:0x103b56950>,
       <rpy2.rinterface.SexpVector - Python:0x10a4a21f8 / R:0x102a24d48>,
       <rpy2.rinterface.SexpVector - Python:0x10a4a2210 / R:0x102a24d18>], dtype=object)
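The six SexpVector objects in Out[8] appear to be the components of the list that R's hist() returns: breaks, counts, density, mids, xname and equidist. Unlike plt.hist(x, bins=20), the R call was not given a bin count. By default, hist() applies Sturges' rule, ceiling(log2(n) + 1) classes, and passes that number to pretty(), which rounds the breakpoints to tidy values. The final number of bars can therefore differ from the rule's suggestion. This Python 3 sketch reproduces only the Sturges step. sturges_classes is a hypothetical name that mirrors R's nclass.Sturges.

```python
import math

def sturges_classes(n):
    """Suggested number of histogram classes for n observations,
    following Sturges' rule: ceiling(log2(n) + 1)."""
    return math.ceil(math.log2(n) + 1)

print(sturges_classes(10000))  # -> 15
```

For the 10000 draws above, log2(10000) is about 13.29, so the rule suggests 15 classes before pretty() adjusts the breaks.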

Using R data in Python

Just for fun, let's see if we can generate data in R and use it in Python.



In [9]:

    
%R xl <- rlnorm(10000)

Out[9]:

array([ 3.76095608,  0.45328098,  2.71188165, ...,  2.31172431,
        1.08341973,  0.32140222])



In [10]:

    
plt.hist(xl, bins=20)
plt.show()

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-10-b58144454c0b> in <module>()
----> 1 plt.hist(xl, bins=20)
      2 plt.show()

NameError: name 'xl' is not defined

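That doesn't work. %R evaluates its argument in R's own workspace, so xl exists only on the R side. The array displayed as Out[9] is a converted copy of the result, not a Python variable. (The %R magic's -o option can push a named R variable into the Python namespace.) Another approach is to call R directly through rpy2: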



In [11]:

    
from rpy2.robjects import r
xl = r('rlnorm(10000)')
plt.hist(xl, bins=20)
plt.show()

That worked!
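For comparison, the same distribution is also available without R. With its default arguments, rlnorm(10000) draws from a log-normal with meanlog = 0 and sdlog = 1, i.e. exp(Z) for a standard normal Z. Python's standard library provides this as random.lognormvariate(mu, sigma). This Python 3 sketch checks two known properties of that distribution: the median is exp(0) = 1 and the mean is exp(1/2), about 1.65. The seed is an addition for reproducibility.

```python
import math
import random
import statistics

random.seed(2)  # fixed seed (an addition) for reproducible output
xl = [random.lognormvariate(0, 1) for _ in range(10000)]

print("median %.3f (theory %.3f)" % (statistics.median(xl), 1.0))
print("mean   %.3f (theory %.3f)" % (statistics.mean(xl), math.exp(0.5)))

# Equivalent construction: exponentiate standard-normal draws.
xl_alt = [math.exp(random.normalvariate(0, 1)) for _ in range(10000)]
print("median of exp(Z): %.3f" % statistics.median(xl_alt))
```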

Let's try again with a logarithmic y-axis, so that the sparsely populated bins in the long right tail show up alongside the tall bins near zero:



In [12]:

    
plt.hist(xl, bins=20, log=True)
plt.show()
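Here is why the log axis helps. With 20 equal-width bins over a right-skewed sample, the first few bins hold most of the observations, while bins far out in the tail hold only a handful. On a linear axis those tail bars are too short to see. This hypothetical Python 3 sketch prints the bin counts of a log-normal sample next to their base-10 logarithms, which is the scale a log-scaled axis plots. Empty bins have no logarithm and are left blank on such an axis.

```python
import math
import random

random.seed(3)  # fixed seed (an addition) for reproducible output
xl = [random.lognormvariate(0, 1) for _ in range(10000)]

bins = 20
lo, hi = min(xl), max(xl)
width = (hi - lo) / bins
counts = [0] * bins
for v in xl:
    # Clamp so the maximum value falls into the last bin.
    counts[min(int((v - lo) / width), bins - 1)] += 1

for i, c in enumerate(counts):
    # log10 is undefined for empty bins, so show "-" instead.
    log_c = "%.2f" % math.log10(c) if c else "-"
    print("bin %2d  count %5d  log10 %s" % (i, c, log_c))

nonzero = [c for c in counts if c]
print("tallest/shortest non-empty bin: %.0fx" % (max(nonzero) / float(min(nonzero))))
```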