Note

To view this notebook properly, you need to disable the Mixed Content Blocking warning in your browser:

Firefox

Chrome

The scientist’s needs

  • Get data (simulation, experiment control)
  • Manipulate and process data.
  • Visualize results... to understand what we are doing!
  • Communicate results: produce figures for reports or publications, write presentations.

IPython

IPython provides a rich architecture for interactive computing with:

  • Powerful interactive shells (terminal and Qt-based).
  • A browser-based notebook with support for code, text, mathematical expressions, inline plots and other rich media.
  • Support for interactive data visualization and use of GUI toolkits.
  • Flexible, embeddable interpreters to load into your own projects.
  • Easy to use, high performance tools for parallel computing.

Backends

Bash (*NIX-based)


In [1]:
%%bash
ls -lh ~/ | head -n 3


total 34M
-rw-rw-r--  1 rmyeid rmyeid  69K Aug 31 22:11 aapl_ohlc.csv
-rw-rw-r--  1 rmyeid rmyeid  13K Apr 23 17:14 Download.pdf

In [2]:
!uname -a


Linux einstein 3.13.0-24-generic #47-Ubuntu SMP Fri May 2 23:30:00 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

Python

No need to introduce any special directive.


In [3]:
print("This is Python!")


This is Python!

In [4]:
def fact(n):
  if n <= 0:
    return 1
  return n*fact(n-1)

fact(20)


Out[4]:
2432902008176640000

Ruby


In [5]:
%%ruby
puts 'This is Ruby playing with Python!!!'


This is Ruby playing with Python!!!

LaTeX

You need to change the cell type to Markdown.

\begin{align} \nabla \times \vec{\mathbf{B}} -\, \frac1c\, \frac{\partial\vec{\mathbf{E}}}{\partial t} & = \frac{4\pi}{c}\vec{\mathbf{j}} \\ \nabla \cdot \vec{\mathbf{E}} & = 4 \pi \rho \\ \nabla \times \vec{\mathbf{E}}\, +\, \frac1c\, \frac{\partial\vec{\mathbf{B}}}{\partial t} & = \vec{\mathbf{0}} \\ \nabla \cdot \vec{\mathbf{B}} & = 0 \end{align}

or

\begin{equation*} \left( \sum_{k=1}^n a_k b_k \right)^2 \leq \left( \sum_{k=1}^n a_k^2 \right) \left( \sum_{k=1}^n b_k^2 \right) \end{equation*}

Like it? Here is more:


In [6]:
from IPython.display import IFrame
IFrame('http://nbviewer.ipython.org/', width='100%', height=350)


Out[6]:

Python Resources

This is not a Python tutorial. We trust that you can pick up the language quickly if you follow any of these resources:

Installation

Ubuntu Linux

You should use either your distribution's Python packages or the packages available on PyPI through pip install. We recommend using the most recent version of your Linux distribution.

Example

$ sudo apt-get install python-numpy python-scipy
$ sudo apt-get install python-scikits-learn python-pandas
$ sudo apt-get install python-nltk python-sympy python-pip
$ sudo pip install ipython
$ sudo pip install bokeh

Mac OS X

This is harder in general, but you can use Homebrew, MacPorts, or the Enthought or Anaconda Python distributions (see the Windows instructions). Here is a Mac-specific tutorial.

(Homebrew) Example

$ brew install python
$ pip install virtualenv virtualenvwrapper
$ pip install numpy
$ brew install gfortran
$ pip install scipy
$ brew install freetype
$ pip install matplotlib
$ pip install ipython bokeh

Windows

Windows lacks a good packaging system, so the easiest way to set up a Python environment is to install a pre-packaged distribution. Some good alternatives are:

Note

EPD and Anaconda CE are also available for Linux and Mac OS X.

Scientific Python Ecosystem


In [7]:
%install_ext http://raw.github.com/jrjohansson/version_information/master/version_information.py
%load_ext version_information

%version_information numpy, scipy, matplotlib, sympy, scikit_learn, nltk, pandas


Installed version_information.py. To use it, type:
  %load_ext version_information
Out[7]:
Software      Version
Python        2.7.6 (default, Mar 22 2014, 22:59:56) [GCC 4.8.2]
IPython       2.2.0
OS            posix [linux2]
numpy         1.8.1
scipy         0.13.3
matplotlib    1.3.1
sympy         sympy
scikit_learn  0.15.1
nltk          2.0.4
pandas        0.13.1
Tue Sep 02 14:53:24 2014 EDT
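
The %install_ext magic used above may not be available in newer IPython releases. A rough, portable alternative is to import each package and read its __version__ attribute. This is only a sketch: the helper name report_versions is hypothetical, and it assumes each package exposes __version__, which most scientific packages do.

```python
import importlib
import sys


def report_versions(names):
    """Return {name: version string, "unknown", or None if not installed}."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None  # package is not installed
            continue
        # Fall back to "unknown" when the module has no __version__ attribute.
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


print("Python", sys.version.split()[0])
for name, version in sorted(report_versions(["numpy", "scipy", "pandas"]).items()):
    print(name, version if version is not None else "not installed")
```

A missing package shows up as "not installed" instead of stopping the cell with an ImportError.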

NumPy

NumPy is the fundamental package for scientific computing with Python. It contains among other things:

  • A powerful N-dimensional array object
  • Sophisticated (broadcasting) functions
  • Tools for integrating C/C++ and Fortran code
  • Useful linear algebra, Fourier transform, and random number capabilities

In [1]:
from __future__ import print_function  # must come before other imports if this is moved into a script
import numpy as np

ndArrays

We can initialize arrays from Python lists or lists of lists.


In [2]:
# a vector: the argument to the array function is a Python list
v = np.array([11, 12, 13, 14])
print('v =\n{}'.format(v))
# a matrix: the argument to the array function is a nested Python list
M = np.array([[2, 1], [3, 4]])
print('M =\n{}'.format(M))
print(type(v), type(M))


v =
[11 12 13 14]
M =
[[2 1]
 [3 4]]
<type 'numpy.ndarray'> <type 'numpy.ndarray'>

Basic Operations and Attributes

The array object has many useful attributes, like:

  • shape: size of each dimension.
  • dtype: data type.
  • ndim: number of dimensions.

Also many operations are available:

  • reshaping, flattening
  • aggregation, such as max, min and mean.
  • transpose.

In [10]:
print("v shape is {}".format(v.shape))
print("M shape is {}".format(M.shape))
print("Data type of v is {}".format(v.dtype))
print()
print("M transpose =\n{}".format(M.T))
print()
M.sort(axis=1)
print("M sorted by row =\n{}".format(np.asarray(M)))
print()
print("v stats are mean = {}, standard deviation = {:.4}, max = {}, min ={}".format(v.mean(), v.std(), v.max(), v.min()))
print()
print("Converting matrix M to a vector {}".format(M.flatten()))
print("Converting vector v to a matrix=\n{}".format(v.reshape(2,2)))
print()
print("M matrix size is {} and number of dimensions is {}".format(M.size, M.ndim))


v shape is (4,)
M shape is (2, 2)
Data type of v is int64

M transpose =
[[2 3]
 [1 4]]

M sorted by row =
[[1 2]
 [3 4]]

v stats are mean = 12.5, standard deviation = 1.118, max = 14, min =11

Converting matrix M to a vector [1 2 3 4]
Converting vector v to a matrix=
[[11 12]
 [13 14]]

M matrix size is 4 and number of dimensions is 2
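
One detail worth knowing about the reshaping operations above: flatten() always returns a copy, while reshape() returns a view onto the same memory when the layout allows it. A minimal sketch, reusing the sorted M from the cell above:

```python
import numpy as np

M = np.array([[1, 2], [3, 4]])

flat_copy = M.flatten()    # flatten() always returns a copy
flat_view = M.reshape(-1)  # reshape() returns a view when the memory layout allows it

flat_copy[0] = 100         # does not touch M
flat_view[1] = 200         # writes through to M

print(M)
print(np.shares_memory(M, flat_copy), np.shares_memory(M, flat_view))
```

So writing into a reshaped array can silently change the original; call copy() first if that is not what you want.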

Generating Arrays

Sequences of numbers, as well as random numbers, can be used to initialize arrays.


In [11]:
x = np.arange(0, 10, 1) # arguments: start, stop, step
print("Create a range\n{}".format(x))
print()
# using linspace, both end points ARE included
x = np.linspace(0, 10, 41)
print("Create a spaced range\n{}".format(x))
print()
# uniform random numbers in [0, 1)
x = np.random.rand(4,4)
print("Create a uniform random matrix (4,4)\n{}".format(x))
print()
# a diagonal matrix
x = np.diag([1,2,3])
print("Create a diagonal matrix\n{}".format(x))
print()
x = np.zeros((3,3))
print("Create a zero matrix (3,3) \n{}".format(x))


Create a range
[0 1 2 3 4 5 6 7 8 9]

Create a spaced range
[  0.     0.25   0.5    0.75   1.     1.25   1.5    1.75   2.     2.25
   2.5    2.75   3.     3.25   3.5    3.75   4.     4.25   4.5    4.75   5.
   5.25   5.5    5.75   6.     6.25   6.5    6.75   7.     7.25   7.5
   7.75   8.     8.25   8.5    8.75   9.     9.25   9.5    9.75  10.  ]

Create a uniform random matrix (4,4)
[[ 0.31281701  0.09433053  0.04063735  0.61049305]
 [ 0.9849427   0.03703267  0.79554371  0.48745227]
 [ 0.46420451  0.17958533  0.32115069  0.33124119]
 [ 0.96986526  0.74728439  0.93486614  0.24787973]]

Create a diagonal matrix
[[1 0 0]
 [0 2 0]
 [0 0 3]]

Create a zero matrix (3,3) 
[[ 0.  0.  0.]
 [ 0.  0.  0.]
 [ 0.  0.  0.]]
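
The endpoint behaviour of arange and linspace is easy to mix up, and random arrays change on every run unless a seed is set. A small sketch of both points (the seed value 42 is arbitrary):

```python
import numpy as np

a = np.arange(0, 1, 0.25)                 # stop is excluded
b = np.linspace(0, 1, 5)                  # stop is included by default
c = np.linspace(0, 1, 4, endpoint=False)  # same points as the arange call

# Seeding makes "random" arrays repeatable between runs.
np.random.seed(42)
first = np.random.rand(2, 2)
np.random.seed(42)
second = np.random.rand(2, 2)

print(a, b, c)
print(np.array_equal(first, second))
```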

Indexing

ndarrays can be indexed using the standard Python $\mathbf{x}$[obj] syntax, where $\mathbf{x}$ is the array and obj the selection. There are three kinds of indexing available: record access, basic slicing, advanced indexing. Which one occurs depends on obj.


In [12]:
print("v[0] = {}\n".format(v[0]))
print("M =\n{}\n".format(M))
print("M[1, 1] = {}\n".format(M[1,1]))
print("M[1] = {}\n".format(M[1]))
print("M[1, :] = {}\n".format(M[1, :]))
print("M[:, 1] = {}\n".format(M[:, 1]))
print("M[1, :] = 0")
M[1, :] = 0
print("M =\n{}\n".format(M))


v[0] = 11

M =
[[1 2]
 [3 4]]

M[1, 1] = 4

M[1] = [3 4]

M[1, :] = [3 4]

M[:, 1] = [2 4]

M[1, :] = 0
M =
[[1 2]
 [0 0]]


In [13]:
A = np.array([[n+m*10 for n in range(5)] for m in range(5)])
print("A =\n{}\n".format(A))
print("A[1:4, 1:4]=\n{}\n".format(A[1:4, 1:4]))
print("A[::2, ::2]=\n{}\n".format(A[::2, ::2]))
print("A[ [1,4] ]=\n{}\n".format(A[[1,4]]))
print("A[ [1,4], [2,-1] ]=\n{}\n".format(A[[1,4],[2,-1]]))


A =
[[ 0  1  2  3  4]
 [10 11 12 13 14]
 [20 21 22 23 24]
 [30 31 32 33 34]
 [40 41 42 43 44]]

A[1:4, 1:4]=
[[11 12 13]
 [21 22 23]
 [31 32 33]]

A[::2, ::2]=
[[ 0  2  4]
 [20 22 24]
 [40 42 44]]

A[ [1,4] ]=
[[10 11 12 13 14]
 [40 41 42 43 44]]

A[ [1,4], [2,-1] ]=
[12 44]
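
The kind of indexing also decides whether you get a view or a copy. As a sketch on the same A: basic slicing returns a view, so writes go through to A, while indexing with a list (advanced indexing) returns a new array.

```python
import numpy as np

A = np.array([[n + m * 10 for n in range(5)] for m in range(5)])

block = A[1:4, 1:4]   # basic slicing: a view onto A
rows = A[[1, 4]]      # advanced (list) indexing: a new array

block[0, 0] = -1      # changes A[1, 1] as well
rows[0, 0] = -2       # leaves A untouched

print(A[1, 1], A[1, 0])
```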

Conditions and Arithmetic


In [14]:
A = np.array([[n+m*10 for n in range(5)] for m in range(5)])
print("A =\n{}\n".format(A))
print("A > 20 =\n{}\n".format(A > 20))
print("np.where(A > 20) =\n{}\n".format(np.where(A > 20)))
print("np.argwhere(A > 20) =\n{}\n".format(np.argwhere(A > 20)))
print("A - 10 =\n{}\n".format(A - 10))
print("A * 10 =\n{}\n".format(A * 10))
print("A * A =\n{}\n".format(A * A))


A =
[[ 0  1  2  3  4]
 [10 11 12 13 14]
 [20 21 22 23 24]
 [30 31 32 33 34]
 [40 41 42 43 44]]

A > 20 =
[[False False False False False]
 [False False False False False]
 [False  True  True  True  True]
 [ True  True  True  True  True]
 [ True  True  True  True  True]]

np.where(A > 20) =
(array([2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4]), array([1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4]))

np.argwhere(A > 20) =
[[2 1]
 [2 2]
 [2 3]
 [2 4]
 [3 0]
 [3 1]
 [3 2]
 [3 3]
 [3 4]
 [4 0]
 [4 1]
 [4 2]
 [4 3]
 [4 4]]

A - 10 =
[[-10  -9  -8  -7  -6]
 [  0   1   2   3   4]
 [ 10  11  12  13  14]
 [ 20  21  22  23  24]
 [ 30  31  32  33  34]]

A * 10 =
[[  0  10  20  30  40]
 [100 110 120 130 140]
 [200 210 220 230 240]
 [300 310 320 330 340]
 [400 410 420 430 440]]

A * A =
[[   0    1    4    9   16]
 [ 100  121  144  169  196]
 [ 400  441  484  529  576]
 [ 900  961 1024 1089 1156]
 [1600 1681 1764 1849 1936]]
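
Two follow-ups to the cell above, sketched on a smaller 3x3 array: a boolean array can be used directly as an index, and A * A is elementwise, which is not the matrix product.

```python
import numpy as np

A = np.array([[n + m * 10 for n in range(3)] for m in range(3)])

mask = A > 10
selected = A[mask]               # 1-D array holding only the matching elements
clipped = np.where(mask, 10, A)  # three-argument where: 10 where mask is True, else A

elementwise = A * A              # squares each entry
matrix_product = np.dot(A, A)    # rows-times-columns product

print(selected)
print(clipped)
print(elementwise[1, 1], matrix_product[0, 0])
```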

Linear Algebra

Inverse, Determinant


In [15]:
print("np.linalg.det(A) = {}\n".format(np.linalg.det(A)))


np.linalg.det(A) = 0.0


In [16]:
try:
  print("np.linalg.inv(A) = {}\n".format(np.linalg.inv(A)))
except np.linalg.LinAlgError as e:
  print("Matrix is singular")


Matrix is singular
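
A is singular because every row is a combination of just two vectors, and the rank makes that explicit. When a matrix is invertible and you only need to solve a linear system, np.linalg.solve is usually a better choice than forming the inverse. A sketch, where the 2x2 matrix B and right-hand side b are made-up values:

```python
import numpy as np

A = np.array([[n + m * 10 for n in range(5)] for m in range(5)])
rank = np.linalg.matrix_rank(A)  # 2 here, far below the full rank of 5

# A small invertible system: 2x + y = 3 and 3x + 4y = 7
B = np.array([[2.0, 1.0], [3.0, 4.0]])
b = np.array([3.0, 7.0])
x = np.linalg.solve(B, b)        # solves B x = b without computing inv(B)

print(rank, x)
```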

Inner Product

To calculate $\|\mathbf{v}\|_2 = \sqrt{\mathbf{v}^T \mathbf{v}}$ for $\mathbf{v} \in \mathbb{R}^d$:


In [17]:
v = np.arange(5)
print("v = {}\n".format(v))
print("||v|| = np.linalg.norm(v) = {}\n".format(np.linalg.norm(v)))
print("np.dot(v.T, v) = {}\n".format(np.dot(v.T, v)))
print("np.dot(v.T, v) ** 0.5 = {}".format(np.dot(v.T, v) ** 0.5))


v = [0 1 2 3 4]

||v|| = np.linalg.norm(v) = 5.47722557505

np.dot(v.T, v) = 30

np.dot(v.T, v) ** 0.5 = 5.47722557505

Outer Product

To calculate $\mathbf{v} \mathbf{v}^T \in \mathbb{R}^{d\times d}$:


In [18]:
print("v.shape = {}".format(v.shape))
u = v[:, np.newaxis]
print("u = v[:, np.newaxis] =\n{}\n".format(u))
print("u.shape = {}".format(u.shape))
print("np.dot(u, u.T) =\n{}\n".format(np.dot(u, u.T)))


v.shape = (5,)
u = v[:, np.newaxis] =
[[0]
 [1]
 [2]
 [3]
 [4]]

u.shape = (5, 1)
np.dot(u, u.T) =
[[ 0  0  0  0  0]
 [ 0  1  2  3  4]
 [ 0  2  4  6  8]
 [ 0  3  6  9 12]
 [ 0  4  8 12 16]]
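
The same outer product can be written without building the column vector by hand. A short sketch of two equivalent spellings, with np.inner added for comparison with the previous section:

```python
import numpy as np

v = np.arange(5)

outer_a = np.outer(v, v)                       # shape (5, 5)
outer_b = v[:, np.newaxis] * v[np.newaxis, :]  # same result through broadcasting
inner = np.inner(v, v)                         # scalar, equal to np.dot(v, v)

print(outer_a)
print(inner)
```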

Multidimensional Data Processing

You can apply arithmetic along specific dimensions, such as dividing each column by its own value. You can also aggregate quantities, such as sums, over specific dimensions.


In [19]:
A = np.random.randint(0, 100, (4, 5))
v = np.arange(5) + 1.
u = np.arange(4) + 2.
print("A =\n{}\n".format(A))
print("A.max() = {}".format(A.max()))
print("A.max(axis=0) = {}".format(A.max(axis=0)))
print("A.min(axis=1) = {}".format(A.min(axis=1)))
print()
print("v = {}".format(v))
print("A / v =\n{}\n".format(A/v))
print()
print("u = {}".format(u))
print("(A.T - u).T =\n{}\n".format((A.T-u).T))
print("np.diff(A, axis=0) =\n{}\n".format(np.diff(A, axis=0)))
print("np.cumsum(A, axis=1) =\n{}\n".format(np.cumsum(A, axis=1)))


A =
[[51 98 56 95 17]
 [32 24  8 87  8]
 [ 5 95  1 17 45]
 [22  8 63 97 20]]

A.max() = 98
A.max(axis=0) = [51 98 63 97 45]
A.min(axis=1) = [17  8  1  8]

v = [ 1.  2.  3.  4.  5.]
A / v =
[[ 51.          49.          18.66666667  23.75         3.4       ]
 [ 32.          12.           2.66666667  21.75         1.6       ]
 [  5.          47.5          0.33333333   4.25         9.        ]
 [ 22.           4.          21.          24.25         4.        ]]


u = [ 2.  3.  4.  5.]
(A.T - u).T =
[[ 49.  96.  54.  93.  15.]
 [ 29.  21.   5.  84.   5.]
 [  1.  91.  -3.  13.  41.]
 [ 17.   3.  58.  92.  15.]]

np.diff(A, axis=0) =
[[-19 -74 -48  -8  -9]
 [-27  71  -7 -70  37]
 [ 17 -87  62  80 -25]]

np.cumsum(A, axis=1) =
[[ 51 149 205 300 317]
 [ 32  56  64 151 159]
 [  5 100 101 118 163]
 [ 22  30  93 190 210]]
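
The double transpose in (A.T - u).T works, but adding an axis to u says the same thing more directly. A sketch using a fixed A instead of the random one above, so the values are repeatable:

```python
import numpy as np

A = np.arange(20).reshape(4, 5)
v = np.arange(5) + 1.0   # one value per column, shape (5,)
u = np.arange(4) + 2.0   # one value per row, shape (4,)

per_column = A / v              # v lines up with the last axis automatically
per_row = A - u[:, np.newaxis]  # shape (4, 1) broadcasts across the columns

# keepdims leaves a length-1 axis behind, so the result broadcasts back against A.
centered = A - A.mean(axis=1, keepdims=True)

print(per_row)
print(centered)
```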

Matplotlib

matplotlib is a Python 2D plotting library that produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms. matplotlib can be used in Python scripts, the Python and IPython shells, web application servers, and six graphical user interface toolkits.

Best practice to import matplotlib


In [20]:
%matplotlib inline
import matplotlib.pyplot as plt

Basic example


In [21]:
x = np.linspace(0, 5, 10)
y = x ** 2

In [22]:
fig, ax = plt.subplots()
ax.plot(x, x**2, label="$y = x^2$")
ax.plot(x, x**3, label="$y = x^3$")
ax.legend(loc=2); # upper left corner
ax.set_xlabel('x')
ax.set_ylabel('y', fontsize=38)
ax.set_title('Advertise Here');


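
To get a figure out of the notebook and into a report, save it to a file. A minimal sketch, assuming a non-interactive run (hence the Agg backend); the file name squares.png and the dpi value are arbitrary choices:

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 5, 10)
fig, ax = plt.subplots()
ax.plot(x, x ** 2, label="$y = x^2$")
ax.legend(loc="upper left")

out_path = os.path.join(tempfile.mkdtemp(), "squares.png")  # hypothetical file name
fig.savefig(out_path, dpi=150, bbox_inches="tight")
plt.close(fig)

print(out_path)
```

The output format follows the file extension, so the same call with a .pdf name should give a vector figure.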

Subplots


In [23]:
xx = np.linspace(-0.75, 1., 100)
n = np.array([0,1,2,3,4,5])

In [24]:
fig, axes = plt.subplots(1, 4, figsize=(12,3))

axes[0].scatter(xx, xx + 0.25*np.random.randn(len(xx)))
axes[0].set_title("scatter")

axes[1].step(n, n**2, lw=2)
axes[1].set_title("step")

axes[2].bar(n, n**2, align="center", width=0.5, alpha=0.5)
axes[2].set_title("bar")

axes[3].fill_between(x, x**2, x**3, color="green", alpha=0.5);
axes[3].set_title("fill_between");


Histograms


In [25]:
# A histogram
n = np.random.randn(100000)
fig, axes = plt.subplots(1, 2, figsize=(12,4))

axes[0].hist(n)
axes[0].set_title("Default histogram")
axes[0].set_xlim((n.min(), n.max()))

axes[1].hist(n, cumulative=True, bins=50)
axes[1].set_title("Cumulative detailed histogram")
axes[1].set_xlim((n.min(), n.max()));
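
If you need the numbers behind a histogram and not the picture, np.histogram returns the bin counts and edges, and a cumulative sum gives the cumulative version. A sketch on a smaller seeded sample (the seed and the sample size are arbitrary):

```python
import numpy as np

np.random.seed(0)
samples = np.random.randn(1000)

counts, edges = np.histogram(samples, bins=50)  # 50 bins have 51 edges
cumulative = np.cumsum(counts)                  # running total per bin

print(counts[:5], edges[:3])
print(cumulative[-1])
```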


3D figures


In [26]:
from mpl_toolkits.mplot3d.axes3d import Axes3D

In [27]:
alpha = 0.7
phi_ext = 2 * np.pi * 0.5

def flux_qubit_potential(phi_m, phi_p):
    return 2 + alpha - 2 * np.cos(phi_p)*np.cos(phi_m) - alpha * np.cos(phi_ext - 2*phi_p)

phi_m = np.linspace(0, 2*np.pi, 100)
phi_p = np.linspace(0, 2*np.pi, 100)
X,Y = np.meshgrid(phi_p, phi_m)
Z = flux_qubit_potential(X, Y).T

In [28]:
fig = plt.figure(figsize=(8,6))

ax = fig.add_subplot(1,1,1, projection='3d')

ax.plot_surface(X, Y, Z, rstride=4, cstride=4, alpha=0.25)
cset = ax.contour(X, Y, Z, zdir='z', offset=-np.pi, cmap=plt.cm.coolwarm)
cset = ax.contour(X, Y, Z, zdir='x', offset=-np.pi, cmap=plt.cm.coolwarm)
cset = ax.contour(X, Y, Z, zdir='y', offset=3*np.pi, cmap=plt.cm.coolwarm)

ax.set_xlim3d(-np.pi, 2*np.pi);
ax.set_ylim3d(0, 3*np.pi);
ax.set_zlim3d(-np.pi, 2*np.pi);


Styling

To change the styling of your matplotlib figures, you have several options:

  • Change matplotlib.rcParams values.
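
As a sketch of the rcParams route: rc_context applies overrides only inside a with-block, which avoids leaking style changes into later cells. The two settings below are arbitrary examples.

```python
import matplotlib as mpl
mpl.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

default_width = mpl.rcParams["lines.linewidth"]

# The overrides are active only inside the with-block.
with mpl.rc_context({"lines.linewidth": 3.0, "font.size": 14}):
    fig, ax = plt.subplots()
    (line,) = ax.plot([0, 1, 2], [0, 1, 4])
    width_inside = line.get_linewidth()
    plt.close(fig)

width_after = mpl.rcParams["lines.linewidth"]
print(width_inside, width_after)
```

Assigning to mpl.rcParams directly works too, but the change then stays in effect for the rest of the session.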

prettyplotlib Example


In [29]:
import prettyplotlib as ppl
import matplotlib as mpl

In [30]:
np.random.seed(12)

In [31]:
fig, ax = plt.subplots(1)

# Show the whole color range
for i in range(8):
    x = np.random.normal(loc=i, size=1000)
    y = np.random.normal(loc=i, size=1000)
    ppl.scatter(ax, x, y, label=str(i))

ppl.legend(ax)
_ = ax.set_title('prettyplotlib `scatter` example\nshowing default color cycle and scatter params')



In [32]:
from IPython.display import IFrame
IFrame('http://matplotlib.org/gallery.html#lines_bars_and_markers', width='100%', height=550)


Out[32]:

Further Reading

MPLD3

The mpld3 project brings together Matplotlib and D3.js, the popular JavaScript library for creating interactive data visualizations for the web. The result is a simple API for exporting your matplotlib graphics to HTML code which can be used within the browser, within standard web pages, blogs, or tools such as the IPython notebook.


In [33]:
import mpld3
mpld3.enable_notebook()

In [34]:
np.random.seed(0)

P = np.random.random(size=10)
A = np.random.random(size=10)

x = np.linspace(0, 10, 100)
data = np.array([[x, Ai * np.sin(x / Pi)]
                 for (Ai, Pi) in zip(A, P)])

fig, ax = plt.subplots(2)

points = ax[1].scatter(P, A, c=P + A,
                       s=200, alpha=0.5)
ax[1].set_xlabel('Period')
ax[1].set_ylabel('Amplitude')

colors = plt.cm.ScalarMappable().to_rgba(P + A)

for (x, l), c in zip(data, colors):
    ax[0].plot(x, l, c=c, alpha=0.5, lw=3)



In [35]:
mpld3.disable_notebook()

Bokeh

Bokeh is a Python interactive visualization library that targets modern web browsers for presentation. Its goal is to provide elegant, concise construction of novel graphics in the style of D3.js, but also deliver this capability with high-performance interactivity over very large or streaming datasets. Bokeh can help anyone who would like to quickly and easily create interactive plots, dashboards, and data applications.


In [36]:
import bokeh.sampledata
try:
  from bokeh.sampledata import us_counties, unemployment
except Exception:
  # the sample data is probably missing: download it once, then retry
  bokeh.sampledata.download()
  from bokeh.sampledata import us_counties, unemployment

In [37]:
from bokeh.plotting import *
colors = ["#F1EEF6", "#D4B9DA", "#C994C7", "#DF65B0", "#DD1C77", "#980043"]

In [38]:
county_xs=[
    us_counties.data[code]['lons'] for code in us_counties.data
    if us_counties.data[code]['state'] == 'tx'
]
county_ys=[
    us_counties.data[code]['lats'] for code in us_counties.data
    if us_counties.data[code]['state'] == 'tx'
]

In [39]:
county_colors = []
for county_id in us_counties.data:
  if us_counties.data[county_id]['state'] != 'tx':
    continue
  try:
    rate = unemployment.data[county_id]
    idx = min(int(rate/2), 5)
    county_colors.append(colors[idx])
  except KeyError:
    county_colors.append("black")

In [40]:
output_notebook()
patches(county_xs, county_ys, fill_color=county_colors, fill_alpha=0.7,
        line_color="white", line_width=0.5, title="Texas Unemployment 2009")
show()


BokehJS successfully loaded.

In [41]:
from IPython.display import IFrame
IFrame('http://bokeh.pydata.org/docs/gallery.html', width='100%', height=550)


Out[41]:

IPython Interact and Widgets


In [42]:
from IPython.html.widgets import interact, RadioButtonsWidget, IntSliderWidget, TextWidget

Example 1


In [43]:
def plot_sine(freq):
  x = np.linspace(-np.pi, np.pi, num=1000)
  plt.plot(x, np.sin(2*np.pi*freq*x))

In [44]:
interact(plot_sine, freq=(1, 10, 0.5))


Out[44]:
<function __main__.plot_sine>

Example 2


In [45]:
def plot_sine2(amplitude, color, title):
    fig, ax = plt.subplots(figsize=(4, 3),
                           subplot_kw={'axisbg':'#EEEEEE',
                                       'axisbelow':True})
    ax.grid(color='w', linewidth=2, linestyle='solid')
    x = np.linspace(0, 10, 1000)
    ax.plot(x, amplitude * np.sin(x), color=color,
            lw=5, alpha=0.4)
    ax.set_xlim(0, 10)
    ax.set_ylim(-10.1, 10.1)
    ax.set_title(title)
    return fig

In [46]:
interact(plot_sine2,
         amplitude=IntSliderWidget(min=0, max=10, step=1,value=1),
         color=RadioButtonsWidget(values=['blue', 'green', 'red']),
         title=TextWidget(value="Advertise here"))


Out[46]:
<function __main__.plot_sine2>

Inspiring Plots from Plotly


In [47]:
from IPython.display import IFrame
IFrame('https://plot.ly/feed', width='100%', height=550)


Out[47]:

Pandas

pandas is a library for data manipulation and analysis:

  • Data structures: TimeSeries and DataFrame
  • An integrated group by engine for aggregating and transforming data sets
  • Input/Output tools: loading tabular data from flat files (CSV, delimited, Excel 2003), and saving and loading pandas objects from the fast and efficient PyTables/HDF5 format.
  • Memory-efficient “sparse” versions of the standard data structures for storing data that is mostly missing or mostly constant (some fixed value)
  • Moving window statistics (rolling mean, rolling standard deviation, etc.)

In [48]:
import pandas as pd
from pandas import Series, DataFrame

Series


In [49]:
labels = ['a', 'b', 'c', 'd', 'e']
s = Series([1, 2, 3, 4, 5], index=labels)
s


Out[49]:
a    1
b    2
c    3
d    4
e    5
dtype: int64

In [50]:
print("'b' in s = {}".format('b' in s))
print(" s['b'] = {}".format(s['b']))


'b' in s = True
 s['b'] = 2

In [51]:
mapping = s.to_dict()
mapping


Out[51]:
{'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}

In [52]:
Series(mapping)


Out[52]:
a    1
b    2
c    3
d    4
e    5
dtype: int64
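
What sets a Series apart from a plain array is that arithmetic aligns on the labels, not on the positions. A sketch with two made-up Series whose indexes only partly overlap:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])

total = s1 + s2                    # aligned on the union of the labels
filled = s1.add(s2, fill_value=0)  # treat a missing label as 0 instead of NaN

print(total)
print(filled)
```

Labels present on only one side give NaN in total, so the result has four entries, not three.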

DataFrame


In [53]:
import pandas.io.data
import datetime
aapl = pd.io.data.get_data_yahoo('AAPL', 
                                 start=datetime.datetime(2006, 10, 1), 
                                 end=datetime.datetime(2012, 1, 1))
aapl.head()


Out[53]:
Open High Low Close Volume Adj Close
Date
2006-10-02 75.10 75.87 74.30 74.86 178159800 10.17
2006-10-03 74.45 74.95 73.19 74.08 197677200 10.07
2006-10-04 74.10 75.46 73.16 75.38 207270700 10.24
2006-10-05 74.53 76.16 74.13 74.83 170970800 10.17
2006-10-06 74.42 75.04 73.81 74.22 116739700 10.08

5 rows × 6 columns

CSV

Writing to a CSV file.

Notice how I mixed Python with bash.


In [54]:
aapl.to_csv('aapl_ohlc.csv')
!head aapl_ohlc.csv


Date,Open,High,Low,Close,Volume,Adj Close
2006-10-02,75.1,75.87,74.3,74.86,178159800,10.17
2006-10-03,74.45,74.95,73.19,74.08,197677200,10.07
2006-10-04,74.1,75.46,73.16,75.38,207270700,10.24
2006-10-05,74.53,76.16,74.13,74.83,170970800,10.17
2006-10-06,74.42,75.04,73.81,74.22,116739700,10.08
2006-10-09,73.8,75.08,73.53,74.63,109555600,10.14
2006-10-10,74.54,74.58,73.08,73.81,132897100,10.03
2006-10-11,73.42,73.98,72.6,73.23,142963800,9.95
2006-10-12,73.61,75.39,73.6,75.26,148213800,10.23

Reading a CSV file.


In [55]:
df = pd.read_csv('aapl_ohlc.csv', index_col='Date', parse_dates=True)
df.head()


Out[55]:
Open High Low Close Volume Adj Close
Date
2006-10-02 75.10 75.87 74.30 74.86 178159800 10.17
2006-10-03 74.45 74.95 73.19 74.08 197677200 10.07
2006-10-04 74.10 75.46 73.16 75.38 207270700 10.24
2006-10-05 74.53 76.16 74.13 74.83 170970800 10.17
2006-10-06 74.42 75.04 73.81 74.22 116739700 10.08

5 rows × 6 columns
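
The same round trip can be tried without touching the disk or the network by reading from an in-memory buffer. A sketch, assuming Python 3 (io.StringIO expects unicode text on Python 2); the two rows are copied from the output above and only the Open and Close columns are kept:

```python
import io

import pandas as pd

csv_text = (
    "Date,Open,Close\n"
    "2006-10-02,75.10,74.86\n"
    "2006-10-03,74.45,74.08\n"
)

# index_col plus parse_dates turns the Date column into a DatetimeIndex.
df = pd.read_csv(io.StringIO(csv_text), index_col="Date", parse_dates=True)

buffer = io.StringIO()
df.to_csv(buffer)
round_trip = pd.read_csv(io.StringIO(buffer.getvalue()),
                         index_col="Date", parse_dates=True)

print(df)
print(df.equals(round_trip))
```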


In [56]:
df.index


Out[56]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2006-10-02, ..., 2011-12-30]
Length: 1323, Freq: None, Timezone: None

In [57]:
df[['Open', 'Close']].head()


Out[57]:
Open Close
Date
2006-10-02 75.10 74.86
2006-10-03 74.45 74.08
2006-10-04 74.10 75.38
2006-10-05 74.53 74.83
2006-10-06 74.42 74.22

5 rows × 2 columns


In [58]:
print(type(df['Open']))
print(type(df[['Open', 'Close']]))


<class 'pandas.core.series.Series'>
<class 'pandas.core.frame.DataFrame'>

In [59]:
df['diff'] = df.Open - df.Close
df.head()


Out[59]:
Open High Low Close Volume Adj Close diff
Date
2006-10-02 75.10 75.87 74.30 74.86 178159800 10.17 0.24
2006-10-03 74.45 74.95 73.19 74.08 197677200 10.07 0.37
2006-10-04 74.10 75.46 73.16 75.38 207270700 10.24 -1.28
2006-10-05 74.53 76.16 74.13 74.83 170970800 10.17 -0.30
2006-10-06 74.42 75.04 73.81 74.22 116739700 10.08 0.20

5 rows × 7 columns


In [60]:
close_px = df['Adj Close']
mavg = pd.rolling_mean(close_px, 40)
close_px.plot(label='AAPL')
mavg.plot(label='mavg')
plt.legend(loc='best')


Out[60]:
<matplotlib.legend.Legend at 0x7f1d5d265c50>
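
pd.rolling_mean belongs to the older pandas version listed in the version table. Newer pandas releases spell the same operation as a method chain. A sketch on synthetic prices (the values 0..9 and the 3-day window are made up so the result is easy to check by hand):

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2006-10-02", periods=10, freq="D")    # synthetic dates
prices = pd.Series(np.arange(10, dtype=float), index=dates)  # synthetic prices 0..9

# The first window - 1 entries are NaN because the window is not yet full.
mavg = prices.rolling(window=3).mean()

print(mavg)
```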

Correlation Analysis


In [61]:
df = pd.io.data.get_data_yahoo(['AAPL', 'Googl', 'GE', 'IBM', 'KO', 'MSFT', 'PEP'], 
                               start=datetime.datetime(2010, 1, 1), 
                               end=datetime.datetime(2013, 1, 1))['Adj Close']
rets = df.pct_change()
df.head()


Out[61]:
AAPL GE Googl IBM KO MSFT PEP
Date
2010-01-04 29.08 13.33 313.69 121.19 25.02 27.31 53.54
2010-01-05 29.13 13.40 312.31 119.73 24.72 27.32 54.19
2010-01-06 28.66 13.33 304.43 118.95 24.71 27.15 53.64
2010-01-07 28.61 14.02 297.35 118.54 24.65 26.87 53.30
2010-01-08 28.80 14.32 301.31 119.73 24.19 27.05 53.13

5 rows × 7 columns


In [62]:
_ = pd.scatter_matrix(rets, diagonal='kde', figsize=(10, 10))



In [63]:
corr = rets.corr()
plt.imshow(corr, cmap='hot', interpolation='none')
plt.colorbar()
plt.xticks(range(len(corr)), corr.columns)
plt.yticks(range(len(corr)), corr.columns);
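
Correlation is covariance rescaled by the standard deviations: corr_ij = cov_ij / (std_i * std_j). A sketch that checks this against DataFrame.corr, using synthetic random-walk prices for three hypothetical tickers instead of downloaded market data:

```python
import numpy as np
import pandas as pd

np.random.seed(1)
# Synthetic prices for three hypothetical tickers (random walks), not market data.
prices = pd.DataFrame(
    100 + np.cumsum(np.random.randn(250, 3), axis=0),
    columns=["AAA", "BBB", "CCC"],
)
rets = prices.pct_change().dropna()

cov = rets.cov()
std = rets.std()
corr_from_cov = cov / np.outer(std, std)  # corr_ij = cov_ij / (std_i * std_j)

print(corr_from_cov)
```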


NLTK

A library for working with human language data, mainly English text. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning. It also has an active discussion forum.


In [64]:
import nltk

Tokenization

Tokenization identifies sentence and word boundaries.


In [65]:
nltk.download("punkt")


[nltk_data] Downloading package 'punkt' to /home/rmyeid/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
/usr/local/lib/python2.7/dist-packages/nltk/__init__.py:682: DeprecationWarning: object() takes no parameters
/usr/local/lib/python2.7/dist-packages/nltk/__init__.py:682: DeprecationWarning: object() takes no parameters
/usr/local/lib/python2.7/dist-packages/nltk/__init__.py:682: DeprecationWarning: object() takes no parameters
/usr/local/lib/python2.7/dist-packages/nltk/__init__.py:682: DeprecationWarning: object() takes no parameters
/usr/local/lib/python2.7/dist-packages/nltk/__init__.py:682: DeprecationWarning: object() takes no parameters
/usr/local/lib/python2.7/dist-packages/nltk/__init__.py:682: DeprecationWarning: object() takes no parameters
/usr/local/lib/python2.7/dist-packages/nltk/__init__.py:682: DeprecationWarning: object() takes no parameters
/usr/local/lib/python2.7/dist-packages/nltk/__init__.py:682: DeprecationWarning: object() takes no parameters
/usr/local/lib/python2.7/dist-packages/nltk/__init__.py:682: DeprecationWarning: object() takes no parameters
/usr/local/lib/python2.7/dist-packages/nltk/__init__.py:682: DeprecationWarning: object() takes no parameters
/usr/local/lib/python2.7/dist-packages/nltk/__init__.py:682: DeprecationWarning: object() takes no parameters
/usr/local/lib/python2.7/dist-packages/nltk/__init__.py:682: DeprecationWarning: object() takes no parameters
/usr/local/lib/python2.7/dist-packages/nltk/__init__.py:682: DeprecationWarning: object() takes no parameters
/usr/local/lib/python2.7/dist-packages/nltk/__init__.py:682: DeprecationWarning: object() takes no parameters
/usr/local/lib/python2.7/dist-packages/nltk/__init__.py:682: DeprecationWarning: object() takes no parameters
/usr/local/lib/python2.7/dist-packages/nltk/__init__.py:682: DeprecationWarning: object() takes no parameters
/usr/local/lib/python2.7/dist-packages/nltk/__init__.py:682: DeprecationWarning: object() takes no parameters
/usr/local/lib/python2.7/dist-packages/nltk/__init__.py:682: DeprecationWarning: object() takes no parameters
/usr/local/lib/python2.7/dist-packages/nltk/__init__.py:682: DeprecationWarning: object() takes no parameters
/usr/local/lib/python2.7/dist-packages/nltk/__init__.py:682: DeprecationWarning: object() takes no parameters
/usr/local/lib/python2.7/dist-packages/nltk/__init__.py:682: DeprecationWarning: object() takes no parameters
/usr/local/lib/python2.7/dist-packages/nltk/__init__.py:682: DeprecationWarning: object() takes no parameters
/usr/local/lib/python2.7/dist-packages/nltk/__init__.py:682: DeprecationWarning: object() takes no parameters
/usr/local/lib/python2.7/dist-packages/nltk/__init__.py:682: DeprecationWarning: object() takes no parameters
/usr/local/lib/python2.7/dist-packages/nltk/__init__.py:682: DeprecationWarning: object() takes no parameters
/usr/local/lib/python2.7/dist-packages/nltk/__init__.py:682: DeprecationWarning: object() takes no parameters
/usr/local/lib/python2.7/dist-packages/nltk/__init__.py:682: DeprecationWarning: object() takes no parameters
/usr/local/lib/python2.7/dist-packages/nltk/__init__.py:682: DeprecationWarning: object() takes no parameters
/usr/local/lib/python2.7/dist-packages/nltk/__init__.py:682: DeprecationWarning: object() takes no parameters
/usr/local/lib/python2.7/dist-packages/nltk/__init__.py:682: DeprecationWarning: object() takes no parameters
/usr/local/lib/python2.7/dist-packages/nltk/__init__.py:682: DeprecationWarning: object() takes no parameters
/usr/local/lib/python2.7/dist-packages/nltk/__init__.py:682: DeprecationWarning: object() takes no parameters
/usr/local/lib/python2.7/dist-packages/nltk/__init__.py:682: DeprecationWarning: object() takes no parameters
/usr/local/lib/python2.7/dist-packages/nltk/__init__.py:682: DeprecationWarning: object() takes no parameters
/usr/local/lib/python2.7/dist-packages/nltk/__init__.py:682: DeprecationWarning: object() takes no parameters
/usr/local/lib/python2.7/dist-packages/nltk/__init__.py:682: DeprecationWarning: object() takes no parameters
/usr/local/lib/python2.7/dist-packages/nltk/__init__.py:682: DeprecationWarning: object() takes no parameters
/usr/local/lib/python2.7/dist-packages/nltk/__init__.py:682: DeprecationWarning: object() takes no parameters
/usr/local/lib/python2.7/dist-packages/nltk/__init__.py:682: DeprecationWarning: object() takes no parameters
/usr/local/lib/python2.7/dist-packages/nltk/__init__.py:682: DeprecationWarning: object() takes no parameters
/usr/local/lib/python2.7/dist-packages/nltk/__init__.py:682: DeprecationWarning: object() takes no parameters
/usr/local/lib/python2.7/dist-packages/nltk/__init__.py:682: DeprecationWarning: object() takes no parameters
/usr/local/lib/python2.7/dist-packages/nltk/__init__.py:682: DeprecationWarning: object() takes no parameters
/usr/local/lib/python2.7/dist-packages/nltk/__init__.py:682: DeprecationWarning: object() takes no parameters
Out[65]:
True

In [66]:
sentences = """This is Rami. At eight o'clock on Thursday morning James Arthur didn't feel very good."""
sents = nltk.sent_tokenize(sentences)
sents


Out[66]:
['This is Rami.',
 "At eight o'clock on Thursday morning James Arthur didn't feel very good."]

In [67]:
words = nltk.word_tokenize(sents[1])
words


Out[67]:
['At',
 'eight',
 "o'clock",
 'on',
 'Thursday',
 'morning',
 'James',
 'Arthur',
 'did',
 "n't",
 'feel',
 'very',
 'good',
 '.']

POS Tagging

Classifies words into categories such as nouns (NN), verbs (VB), and adjectives (JJ).


In [68]:
nltk.download("maxent_treebank_pos_tagger")


[nltk_data] Downloading package 'maxent_treebank_pos_tagger' to
[nltk_data]     /home/rmyeid/nltk_data...
[nltk_data]   Package maxent_treebank_pos_tagger is already up-to-
[nltk_data]       date!
Out[68]:
True

In [69]:
tagged = nltk.pos_tag(words)
tagged


Out[69]:
[('At', 'IN'),
 ('eight', 'CD'),
 ("o'clock", 'JJ'),
 ('on', 'IN'),
 ('Thursday', 'NNP'),
 ('morning', 'NN'),
 ('James', 'NNP'),
 ('Arthur', 'NNP'),
 ('did', 'VBD'),
 ("n't", 'RB'),
 ('feel', 'VB'),
 ('very', 'RB'),
 ('good', 'JJ'),
 ('.', '.')]
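The tagged output is a plain list of `(word, tag)` pairs, so it can be filtered with ordinary Python. The sketch below copies the pairs from the output above and groups words by tag prefix; the helper name `words_with_tag_prefix` is made up for this illustration and is not part of NLTK.

```python
# (word, tag) pairs copied from the Out[69] cell above.
tagged = [('At', 'IN'), ('eight', 'CD'), ("o'clock", 'JJ'), ('on', 'IN'),
          ('Thursday', 'NNP'), ('morning', 'NN'), ('James', 'NNP'),
          ('Arthur', 'NNP'), ('did', 'VBD'), ("n't", 'RB'), ('feel', 'VB'),
          ('very', 'RB'), ('good', 'JJ'), ('.', '.')]


def words_with_tag_prefix(tagged_words, prefix):
    """Return the words whose tag starts with the given prefix.

    Hypothetical helper: matching on a prefix groups related tags,
    e.g. 'NN' covers both NN (common noun) and NNP (proper noun).
    """
    return [word for word, tag in tagged_words if tag.startswith(prefix)]


nouns = words_with_tag_prefix(tagged, 'NN')
verbs = words_with_tag_prefix(tagged, 'VB')
adjectives = words_with_tag_prefix(tagged, 'JJ')
```

Matching on a prefix rather than an exact tag is a design choice: it treats `VBD` (past tense) and `VB` (base form) as the same broad category, which is usually what you want for a quick summary.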

Named Entity Chunking

Identifies phrases in text that refer to persons, locations, and organizations.


In [70]:
nltk.download("maxent_ne_chunker")
nltk.download("words")


[nltk_data] Downloading package 'maxent_ne_chunker' to
[nltk_data]     /home/rmyeid/nltk_data...
[nltk_data]   Package maxent_ne_chunker is already up-to-date!
[nltk_data] Downloading package 'words' to /home/rmyeid/nltk_data...
[nltk_data]   Package words is already up-to-date!
Out[70]:
True

In [71]:
entities = nltk.chunk.ne_chunk(tagged)
# Note: NLTK 3 and later replaced the `node` attribute with the `label()` method.
list(entities.subtrees(filter=lambda x: x.node == 'PERSON'))


Out[71]:
[Tree('PERSON', [('James', 'NNP'), ('Arthur', 'NNP')])]

Stemming

Removes suffixes and prefixes to reduce related word forms to a common stem, which lowers the sparsity of the vocabulary.


In [72]:
stemmer = nltk.stem.LancasterStemmer()
words = u"Stemming is funnier than a bummer says the sushi loving computer scientist".split()
[stemmer.stem(w) for w in words]


Out[72]:
[u'stem',
 u'is',
 u'funny',
 u'than',
 u'a',
 u'bum',
 u'say',
 u'the',
 u'sush',
 u'lov',
 u'comput',
 u'sci']
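To see why stemming reduces sparsity, here is a deliberately naive suffix stripper. It is not the Lancaster algorithm used above; the suffix list and the `naive_stem` function are assumptions made only for illustration.

```python
# Hypothetical, tiny rule set; real stemmers use far richer rules.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word, min_stem=3):
    """Strip the first matching suffix, keeping at least `min_stem` characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[:-len(suffix)]
    return word


# Three surface forms collapse to a single vocabulary entry.
forms = ["loves", "loved", "loving"]
stems = [naive_stem(w) for w in forms]
vocabulary = set(stems)
```

The `min_stem` guard keeps short words such as "is" intact. Like the Lancaster output above, the resulting stems need not be dictionary words; they only have to be consistent across related forms.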

Data Mining

Before You Scrape

Check if the data is available through an API or just downloadable! Here are some pointers:

Scraping Websites

requests + LXML

Extracting prices and buyers


In [73]:
from lxml import html
import requests

In [74]:
from IPython.display import IFrame
IFrame('http://econpy.pythonanywhere.com/ex/001.html', width='100%', height=250)


Out[74]:

In [75]:
page = requests.get('http://econpy.pythonanywhere.com/ex/001.html')
tree = html.fromstring(page.text)

In [76]:
# Build a list of buyer names
buyers = tree.xpath('//div[@title="buyer-name"]/text()')
# Build a list of prices
prices = tree.xpath('//span[@class="item-price"]/text()')

In [77]:
print('Buyers: ', buyers)
print()
print('Prices: ', prices)


Buyers:  ['Carson Busses', 'Earl E. Byrd', 'Patty Cakes', 'Derri Anne Connecticut', 'Moe Dess', 'Leda Doggslife', 'Dan Druff', 'Al Fresco', 'Ido Hoe', 'Howie Kisses', 'Len Lease', 'Phil Meup', 'Ira Pent', 'Ben D. Rules', 'Ave Sectomy', 'Gary Shattire', 'Bobbi Soks', 'Sheila Takya', 'Rose Tattoo', 'Moe Tell']

Prices:  ['$29.95', '$8.37', '$15.26', '$19.25', '$19.25', '$13.99', '$31.57', '$8.49', '$14.47', '$15.86', '$11.11', '$15.98', '$16.27', '$7.50', '$50.85', '$14.26', '$5.68', '$15.00', '$114.07', '$10.09']
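The scraped prices are strings, so they need to be converted before any arithmetic. The sketch below uses the first five buyers and prices from the output above; `parse_price` is a hypothetical helper that assumes every price looks like `$12.34`.

```python
# First five entries copied from the printed output above.
buyers = ['Carson Busses', 'Earl E. Byrd', 'Patty Cakes',
          'Derri Anne Connecticut', 'Moe Dess']
prices = ['$29.95', '$8.37', '$15.26', '$19.25', '$19.25']


def parse_price(text):
    """Convert a string such as '$29.95' to a float (assumes a leading '$')."""
    return float(text.strip().lstrip('$').replace(',', ''))


# Pair each buyer with a numeric price.
purchases = list(zip(buyers, [parse_price(p) for p in prices]))

# Largest purchase and overall total for this sample.
top_buyer, top_price = max(purchases, key=lambda pair: pair[1])
total = round(sum(price for _, price in purchases), 2)
```

For real monetary calculations `decimal.Decimal` would be a safer choice than `float`; rounding the sum to two places is enough for a quick look at the data.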

BeautifulSoup

Extracting hyperlinks from the Google homepage.


In [78]:
from bs4 import BeautifulSoup

In [79]:
r = requests.get("http://www.google.com")
data = r.text
# Name the parser explicitly; "html.parser" ships with Python
soup = BeautifulSoup(data, "html.parser")

In [80]:
for link in soup.find_all('a'):
  print(link.get('href'))


http://www.google.com/imghp?hl=en&tab=wi
http://maps.google.com/maps?hl=en&tab=wl
https://play.google.com/?hl=en&tab=w8
http://www.youtube.com/?tab=w1
http://news.google.com/nwshp?hl=en&tab=wn
https://mail.google.com/mail/?tab=wm
https://drive.google.com/?tab=wo
http://www.google.com/intl/en/options/
http://www.google.com/history/optout?hl=en
/preferences?hl=en
https://accounts.google.com/ServiceLogin?hl=en&continue=http://www.google.com/
/chrome/index.html?hl=en&brand=CHNG&utm_source=en-hpp&utm_medium=hpp&utm_campaign=en
/advanced_search?hl=en&authuser=0
/language_tools?hl=en&authuser=0
/intl/en/ads/
/services/
https://plus.google.com/116899029375914044550
/intl/en/about.html
/intl/en/policies/
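Several of the extracted links are relative (for example `/preferences?hl=en`), and `link.get('href')` returns `None` for anchors without an `href` attribute. A minimal sketch of normalizing such a list with the standard library's `urljoin` follows; the `hrefs` sample is a small hand-picked subset of the output above plus one `None`, and the variable names are assumptions.

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2

base_url = "http://www.google.com"

# A few values as returned by link.get('href'); None marks a missing attribute.
hrefs = [
    "http://maps.google.com/maps?hl=en&tab=wl",
    "/preferences?hl=en",
    "/intl/en/about.html",
    None,
]

# Skip missing hrefs and resolve relative paths against the page URL.
absolute_links = [urljoin(base_url, href) for href in hrefs if href]
```

Links that are already absolute pass through `urljoin` unchanged, so the same expression can be applied to every `href` without special cases.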

NetworkX (Graph Library)

A library to construct, manipulate, and visualize graphs. It provides:

  • Data structures for graphs, digraphs, and multigraphs
  • Nodes and edges that can hold arbitrary data
  • Generators for classic graphs, random graphs, and synthetic networks
  • Standard graph algorithms and network analysis measures

In [81]:
import networkx as nx

In [82]:
G = nx.karate_club_graph()
nx.draw_spring(G)
plt.show()


Colaboratory

It is an interactive, collaborative analytics tool that integrates:

  • Google Docs
  • Chrome
  • IPython

You can open a notebook from Google Drive and share it as you would share a Google Doc. You can comment and edit collaboratively, in real time. There is zero setup, because all the computation happens in Chrome. You can even quickly package your analytics pipeline into a GUI for folks who don't want to program. In effect, you can go from zero to analytics with little impedance.

Further Reading