# Week 1 - Basics

We begin with some basics, establishing that the environment is working.

``````

In [1]:

print("hello from python")

``````
``````

hello from python

``````
``````

In [2]:

import sys
sys.version

``````
``````

Out[2]:

'2.7.6 |Anaconda 1.9.1 (x86_64)| (default, Jan 10 2014, 11:23:15) \n[GCC 4.0.1 (Apple Inc. build 5493)]'

``````

``````

In [3]:

``````
``````

In [4]:

%R print("hello from R")

``````
``````

[1] "hello from R"

``````
``````

In [5]:

%R R.version

``````
``````

Out[5]:

array([['x86_64-apple-darwin10.8.0'],
['x86_64'],
['darwin10.8.0'],
['x86_64, darwin10.8.0'],
[''],
['3'],
['0.2'],
['2013'],
['09'],
['25'],
['63987'],
['R'],
['R version 3.0.2 (2013-09-25)'],
['Frisbee Sailing']],
dtype='|S28')

``````

## Simple plots

With that established, let's run through a few simple statistical functions. First, in Python:

``````

In [6]:

import random
x = [random.normalvariate(0, 1) for i in range(10000)]
sum(x) / len(x)

``````
``````

Out[6]:

-0.005532462765793515

``````
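
The sample mean above is close to zero but not exactly zero. For 10000 draws with unit variance the standard error of the mean is about 1/sqrt(10000) = 0.01, so a value like -0.0055 is well within the expected noise. The sketch below illustrates that check; it is not part of the original notebook, and the helper name `sample_mean_and_se` and the fixed seed are assumptions made for repeatability.

```python
import math
import random


def sample_mean_and_se(n, seed=0):
    """Draw n standard-normal values; return (mean, standard error of the mean)."""
    rng = random.Random(seed)  # hypothetical fixed seed so the run is repeatable
    xs = [rng.normalvariate(0, 1) for _ in range(n)]
    mean = sum(xs) / float(len(xs))
    # Unbiased sample variance (divide by n - 1).
    var = sum((v - mean) ** 2 for v in xs) / (len(xs) - 1)
    return mean, math.sqrt(var / len(xs))


mean, se = sample_mean_and_se(10000)
print("mean = %.4f, standard error = %.4f" % (mean, se))
```

If the mean were many standard errors away from zero, that would point to a problem with the generator or the code rather than ordinary sampling noise.
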
``````

In [7]:

%matplotlib inline
import matplotlib.pyplot as plt
plt.hist(x, bins=20)
plt.show()

``````

Note: at this point I spent two hours trying to get the IPython environment to work. I couldn't fix a problem that stopped the figures from drawing, so I gave up and installed the Anaconda distribution (https://store.continuum.io/cshop/anaconda/). That worked fine once I had separately installed rpy2 to provide R integration.
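
When an environment misbehaves like this, a quick first step is to see which of the required packages can be imported at all. The following is a minimal sketch, not something from the original notebook: the helper name `check_modules` is hypothetical, and the package list is an assumption based on what this page uses. It only detects missing packages; it would not have diagnosed the figure-drawing problem itself.

```python
import importlib


def check_modules(names):
    """Return a dict mapping each module name to True if it imports cleanly."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            # The package is missing, or one of its own imports is missing.
            status[name] = False
    return status


# Packages this page relies on (assumed list).
for name, ok in sorted(check_modules(["matplotlib", "rpy2"]).items()):
    print("%-12s %s" % (name, "ok" if ok else "MISSING"))
```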

``````

In [8]:

%R x <- rnorm(10000, 0, 1)
%R hist(x)

``````
``````

Out[8]:

array([<rpy2.rinterface.SexpVector - Python:0x10a4a2120 / R:0x10a95ce40>,
<rpy2.rinterface.SexpVector - Python:0x10a4a2138 / R:0x102a6da08>,
<rpy2.rinterface.SexpVector - Python:0x10a4a21c8 / R:0x103b569f8>,
<rpy2.rinterface.SexpVector - Python:0x10a4a21e0 / R:0x103b56950>,
<rpy2.rinterface.SexpVector - Python:0x10a4a21f8 / R:0x102a24d48>,
<rpy2.rinterface.SexpVector - Python:0x10a4a2210 / R:0x102a24d18>], dtype=object)

``````

## Using R data in Python

Just for fun, let's see if we can generate data in R and use it in Python.

``````

In [9]:

%R xl <- rlnorm(10000)

``````
``````

Out[9]:

array([ 3.76095608,  0.45328098,  2.71188165, ...,  2.31172431,
1.08341973,  0.32140222])

``````
``````

In [10]:

plt.hist(xl, bins=20)
plt.show()

``````
``````

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-10-b58144454c0b> in <module>()
----> 1 plt.hist(xl, bins=20)
2 plt.show()

NameError: name 'xl' is not defined

``````

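That doesn't work: the `%R` magic assigns `xl` inside the R session and returns its value, but it does not create a Python variable of that name. There's another way: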

``````

In [11]:

from rpy2.robjects import r
xl = r('rlnorm(10000)')
plt.hist(xl, bins=20)
plt.show()

``````

That worked!

Let's try again with a logarithmic y-axis, so that the sparsely populated bins in the long right tail stay visible:

``````

In [12]:

plt.hist(xl, bins=20, log=True)
plt.show()

``````
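
To see why the log axis helps, it is enough to look at the bin counts themselves. The sketch below is an illustration under stated assumptions, not the notebook's own code: it uses Python's `random.lognormvariate(0, 1)` as a stand-in for R's `rlnorm(10000)` (both default to a log-mean of 0 and a log-standard-deviation of 1), and the helper `histogram_counts` is a hypothetical pure-Python version of equal-width binning.

```python
import random


def histogram_counts(values, bins=20):
    """Count values in `bins` equal-width bins spanning min(values)..max(values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / float(bins)
    counts = [0] * bins
    for v in values:
        # The maximum value goes into the last bin rather than a bin past the end.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


rng = random.Random(0)  # hypothetical fixed seed so the run is repeatable
# Stand-in for R's rlnorm(10000): log-normal with meanlog=0, sdlog=1.
xl_demo = [rng.lognormvariate(0, 1) for _ in range(10000)]
counts = histogram_counts(xl_demo)
nonempty = [c for c in counts if c > 0]
print("largest bin: %d, smallest non-empty bin: %d" % (max(nonempty), min(nonempty)))
```

The first bin holds most of the samples while the tail bins hold only a handful, so on a linear y-axis the tail is effectively invisible.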