IPython provides a rich architecture for interactive computing with:
In [1]:
%%bash
ls -lh ~/ | head -n 3
In [2]:
!uname -a
In [3]:
print("This is Python!")
In [4]:
def fact(n):
    if n <= 0:
        return 1
    return n*fact(n-1)
fact(20)
Out[4]:
In [5]:
%%ruby
puts 'This is Ruby playing with Python!!!'
or
In [6]:
from IPython.display import IFrame
IFrame('http://nbviewer.ipython.org/', width='100%', height=350)
Out[6]:
This is not a Python tutorial; we trust that you can pick up the language quickly if you follow any of the following resources:
You should either use your distribution's Python packages or install the packages available on PyPI using pip install. It is recommended that you use the most up-to-date version of your Linux distribution.
$ sudo apt-get install python-numpy python-scipy
$ sudo apt-get install python-scikits-learn python-pandas
$ sudo apt-get install python-nltk python-sympy python-pip
$ sudo pip install ipython
$ sudo pip install bokeh
This is harder in general, but you can use Homebrew, MacPorts, or just use the Enthought or Anaconda Python distributions (see the Windows instructions). Here is a Mac-specific tutorial.
$ brew install python
$ pip install virtualenv virtualenvwrapper
$ pip install numpy
$ brew install gfortran
$ pip install scipy
$ brew install freetype
$ pip install matplotlib
$ pip install ipython bokeh
Windows lacks a good packaging system, so the easiest way to set up a Python environment is to install a pre-packaged distribution. Some good alternatives are:
EPD and Anaconda CE are also available for Linux and Mac OS X.
In [7]:
%install_ext http://raw.github.com/jrjohansson/version_information/master/version_information.py
%load_ext version_information
%version_information numpy, scipy, matplotlib, sympy, scikit_learn, nltk, pandas
Out[7]:
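If the `%install_ext` magic or the `version_information` extension is not available in your IPython version, a similar report can be sketched with the standard library alone. This is only an illustration, assuming Python 3.8 or later (for `importlib.metadata`); the helper name `installed_versions` is hypothetical, and the names passed in are distribution names as used by pip (for example `scikit-learn` rather than `scikit_learn`).

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its version string, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


packages = ["numpy", "scipy", "matplotlib", "sympy", "scikit-learn", "nltk", "pandas"]
for name, version in installed_versions(packages).items():
    print("{:<14} {}".format(name, version or "not installed"))
```

Packages reported as "not installed" can then be added with pip or your distribution's package manager, as described above.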
NumPy is the fundamental package for scientific computing with Python. It contains among other things:
In [1]:
from __future__ import print_function  # __future__ imports must come first
import numpy as np
In [2]:
# a vector: the argument to the array function is a Python list
v = np.array([11, 12, 13, 14])
print('v =\n{}'.format(v))
# a matrix: the argument to the array function is a nested Python list
M = np.array([[2, 1], [3, 4]])
print('M =\n{}'.format(M))
print(type(v), type(M))
In [10]:
print("v shape is {}".format(v.shape))
print("M shape is {}".format(M.shape))
print("Data type of v is {}".format(v.dtype))
print()
print("M transpose =\n{}".format(M.T))
print()
M.sort(axis=1)
print("M sorted by row =\n{}".format(np.asarray(M)))
print()
print("v stats are mean = {}, standard deviation = {:.4}, max = {}, min = {}".format(v.mean(), v.std(), v.max(), v.min()))
print()
print("Converting matrix M to a vector {}".format(M.flatten()))
print("Converting vector v to a matrix=\n{}".format(v.reshape(2,2)))
print()
print("M matrix size is {} and number of dimensions is {}".format(M.size, M.ndim))
In [11]:
x = np.arange(0, 10, 1) # arguments: start, stop, step
print("Create a range\n{}".format(x))
print()
# using linspace, both end points ARE included
x = np.linspace(0, 10, 41)
print("Create a spaced range\n{}".format(x))
print()
# uniform random numbers in [0, 1)
x = np.random.rand(4,4)
print("Create a uniform random matrix (4,4)\n{}".format(x))
print()
# a diagonal matrix
x = np.diag([1,2,3])
print("Create a diagonal matrix\n{}".format(x))
print()
x = np.zeros((3,3))
print("Create a zero matrix (3,3) \n{}".format(x))
ndarrays can be indexed using the standard Python $\mathbf{x}$[obj] syntax, where $\mathbf{x}$ is the array and obj the selection. There are three kinds of indexing available: record access, basic slicing, advanced indexing. Which one occurs depends on obj.
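One practical consequence of the three kinds of indexing is whether the result shares memory with the original array. The following is a small sketch rather than an exhaustive reference; the names `arr`, `view`, `picked`, `large`, and `records` are made up for this illustration.

```python
import numpy as np

arr = np.arange(10)

# Basic slicing (start:stop:step) returns a view that shares memory with arr.
view = arr[2:5]
view[0] = 100              # writes through: arr[2] is now 100

# Advanced indexing (a list of integers or a boolean mask) returns a copy.
picked = arr[[2, 3, 4]]
picked[0] = -1             # arr is unchanged by this assignment
large = arr[arr > 50]      # boolean mask selects the elements greater than 50

# Record (field) access applies to structured arrays: index by field name.
records = np.array([(1, 2.5), (2, 3.5)], dtype=[("id", "i4"), ("score", "f8")])
ids = records["id"]

print(arr)
print(np.shares_memory(arr, view), np.shares_memory(arr, picked))
print(large, ids)
```

Because slices are views, assigning to a slice (as in `M[1, :] = 0` below) modifies the original array in place.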
In [12]:
print("v[0] = {}\n".format(v[0]))
print("M =\n{}\n".format(M))
print("M[1, 1] = {}\n".format(M[1,1]))
print("M[1] = {}\n".format(M[1]))
print("M[1, :] = {}\n".format(M[1, :]))
print("M[:, 1] = {}\n".format(M[:, 1]))
print("M[1, :] = 0")
M[1, :] = 0
print("M =\n{}\n".format(M))
In [13]:
A = np.array([[n+m*10 for n in range(5)] for m in range(5)])
print("A =\n{}\n".format(A))
print("A[1:4, 1:4]=\n{}\n".format(A[1:4, 1:4]))
print("A[::2, ::2]=\n{}\n".format(A[::2, ::2]))
print("A[ [1,4] ]=\n{}\n".format(A[[1,4]]))
print("A[ [1,4], [2,-1] ]=\n{}\n".format(A[[1,4],[2,-1]]))
In [14]:
A = np.array([[n+m*10 for n in range(5)] for m in range(5)])
print("A =\n{}\n".format(A))
print("A > 20 =\n{}\n".format(A > 20))
print("np.where(A > 20) =\n{}\n".format(np.where(A > 20)))
print("np.argwhere(A > 20) =\n{}\n".format(np.argwhere(A > 20)))
print("A - 10 =\n{}\n".format(A - 10))
print("A * 10 =\n{}\n".format(A * 10))
print("A * A =\n{}\n".format(A * A))
In [15]:
print("np.linalg.det(A) = {}\n".format(np.linalg.det(A)))
In [16]:
try:
    print("np.linalg.inv(A) = {}\n".format(np.linalg.inv(A)))
except np.linalg.LinAlgError:
    print("Matrix is singular")
In [17]:
v = np.arange(5)
print("v = {}\n".format(v))
print("||v|| = np.linalg.norm(v) = {}\n".format(np.linalg.norm(v)))
print("np.dot(v.T, v) = {}\n".format(np.dot(v.T, v)))
print("np.dot(v.T, v) ** 0.5 = {}".format(np.dot(v.T, v) ** 0.5))
In [18]:
print("v.shape = {}".format(v.shape))
u = v[:, np.newaxis]
print("u = v[:, np.newaxis] =\n{}\n".format(u))
print("u.shape = {}".format(u.shape))
print("np.dot(u, u.T) =\n{}\n".format(np.dot(u, u.T)))
#print("np.linalg.inv(")
In [19]:
A = np.random.randint(0, 100, (4, 5))
v = np.arange(5) + 1.
u = np.arange(4) + 2.
print("A =\n{}\n".format(A))
print("A.max() = {}".format(A.max()))
print("A.max(axis=0) = {}".format(A.max(axis=0)))
print("A.min(axis=1) = {}".format(A.min(axis=1)))
print()
print("v = {}".format(v))
print("A / v =\n{}\n".format(A/v))
print()
print("u = {}".format(u))
print("(A.T - u).T =\n{}\n".format((A.T-u).T))
print("np.diff(A, axis=0) =\n{}\n".format(np.diff(A, axis=0)))
print("np.cumsum(A, axis=1) =\n{}\n".format(np.cumsum(A, axis=1)))
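The expressions `A / v` and `(A.T - u).T` in the cell above both rely on NumPy broadcasting. As a rough sketch (using a small deterministic matrix instead of random integers, which is an assumption made for reproducibility), the double transpose can also be written by giving `u` an explicit trailing axis:

```python
import numpy as np

A = np.arange(20).reshape(4, 5)   # shape (4, 5)
v = np.arange(5) + 1.             # shape (5,): one value per column
u = np.arange(4) + 2.             # shape (4,): one value per row

# Shapes are aligned from the right, so (4, 5) / (5,) divides each row by v.
by_column = A / v

# A - u would fail because (4, 5) and (4,) do not align. Giving u a trailing
# axis of length 1 makes it (4, 1), which is stretched across the 5 columns.
by_row = A - u[:, np.newaxis]

# Same result as the double-transpose form used above.
print(np.array_equal(by_row, (A.T - u).T))
```

Which form to use is mostly a matter of taste; the `np.newaxis` version avoids the two transposes and states the intended axis explicitly.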
matplotlib is a Python 2D plotting library that produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms. matplotlib can be used in Python scripts, the Python and IPython shells, web application servers, and six graphical user interface toolkits.
Best practice to import matplotlib
In [20]:
%matplotlib inline
import matplotlib.pyplot as plt
In [21]:
x = np.linspace(0, 5, 10)
y = x ** 2
In [22]:
fig, ax = plt.subplots()
ax.plot(x, x**2, label="$y = x^2$")
ax.plot(x, x**3, label="$y = x^3$")
ax.legend(loc=2); # upper left corner
ax.set_xlabel('x')
ax.set_ylabel('y', fontsize=38)
ax.set_title('Advertise Here');
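The hardcopy formats mentioned in the description of matplotlib are selected when a figure is saved. The sketch below is one possible way to do this without any GUI or notebook backend, using the object-oriented API and in-memory buffers; the buffer names are illustrative, and in a script you would more likely pass a file name such as `figure.pdf` to `savefig`.

```python
import io

import numpy as np
from matplotlib.backends.backend_agg import FigureCanvasAgg
from matplotlib.figure import Figure

x = np.linspace(0, 5, 10)

fig = Figure(figsize=(4, 3))
FigureCanvasAgg(fig)                 # attach a non-GUI canvas to the figure
ax = fig.add_subplot(1, 1, 1)
ax.plot(x, x**2, label="$y = x^2$")
ax.legend(loc=2)

# The `format` argument selects the output type; a file name works here too.
png_buffer = io.BytesIO()
fig.savefig(png_buffer, format="png", dpi=150)
pdf_buffer = io.BytesIO()
fig.savefig(pdf_buffer, format="pdf")

print(len(png_buffer.getvalue()), len(pdf_buffer.getvalue()))
```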
In [23]:
xx = np.linspace(-0.75, 1., 100)
n = np.array([0,1,2,3,4,5])
In [24]:
fig, axes = plt.subplots(1, 4, figsize=(12,3))
axes[0].scatter(xx, xx + 0.25*np.random.randn(len(xx)))
axes[0].set_title("scatter")
axes[1].step(n, n**2, lw=2)
axes[1].set_title("step")
axes[2].bar(n, n**2, align="center", width=0.5, alpha=0.5)
axes[2].set_title("bar")
axes[3].fill_between(x, x**2, x**3, color="green", alpha=0.5);
axes[3].set_title("fill_between");
In [25]:
# A histogram
n = np.random.randn(100000)
fig, axes = plt.subplots(1, 2, figsize=(12,4))
axes[0].hist(n)
axes[0].set_title("Default histogram")
axes[0].set_xlim((min(n), max(n)))
axes[1].hist(n, cumulative=True, bins=50)
axes[1].set_title("Cumulative detailed histogram")
axes[1].set_xlim((min(n), max(n)));
In [26]:
from mpl_toolkits.mplot3d.axes3d import Axes3D
In [27]:
alpha = 0.7
phi_ext = 2 * np.pi * 0.5
def flux_qubit_potential(phi_m, phi_p):
    return 2 + alpha - 2 * np.cos(phi_p)*np.cos(phi_m) - alpha * np.cos(phi_ext - 2*phi_p)
phi_m = np.linspace(0, 2*np.pi, 100)
phi_p = np.linspace(0, 2*np.pi, 100)
X,Y = np.meshgrid(phi_p, phi_m)
Z = flux_qubit_potential(X, Y).T
In [28]:
fig = plt.figure(figsize=(8,6))
ax = fig.add_subplot(1,1,1, projection='3d')
ax.plot_surface(X, Y, Z, rstride=4, cstride=4, alpha=0.25)
cset = ax.contour(X, Y, Z, zdir='z', offset=-np.pi, cmap=plt.cm.coolwarm)
cset = ax.contour(X, Y, Z, zdir='x', offset=-np.pi, cmap=plt.cm.coolwarm)
cset = ax.contour(X, Y, Z, zdir='y', offset=3*np.pi, cmap=plt.cm.coolwarm)
ax.set_xlim3d(-np.pi, 2*np.pi);
ax.set_ylim3d(0, 3*np.pi);
ax.set_zlim3d(-np.pi, 2*np.pi);
To change the styling of your matplotlib figures, you have several options:
In [29]:
import prettyplotlib as ppl
import matplotlib as mpl
In [30]:
np.random.seed(12)
In [31]:
fig, ax = plt.subplots(1)
# Show the whole color range
for i in range(8):
    x = np.random.normal(loc=i, size=1000)
    y = np.random.normal(loc=i, size=1000)
    ppl.scatter(ax, x, y, label=str(i))
ppl.legend(ax)
_ = ax.set_title('prettyplotlib `scatter` example\nshowing default color cycle and scatter params')
In [32]:
from IPython.display import IFrame
IFrame('http://matplotlib.org/gallery.html#lines_bars_and_markers', width='100%', height=550)
Out[32]:
The mpld3 project brings together Matplotlib and D3js, the popular JavaScript library for creating interactive data visualizations for the web. The result is a simple API for exporting your matplotlib graphics to HTML code, which can be used within the browser, within standard web pages, blogs, or tools such as the IPython notebook.
In [33]:
import mpld3
mpld3.enable_notebook()
In [34]:
np.random.seed(0)
P = np.random.random(size=10)
A = np.random.random(size=10)
x = np.linspace(0, 10, 100)
data = np.array([[x, Ai * np.sin(x / Pi)]
                 for (Ai, Pi) in zip(A, P)])
fig, ax = plt.subplots(2)
points = ax[1].scatter(P, A, c=P + A,
                       s=200, alpha=0.5)
ax[1].set_xlabel('Period')
ax[1].set_ylabel('Amplitude')
colors = plt.cm.ScalarMappable().to_rgba(P + A)
for (x, l), c in zip(data, colors):
    ax[0].plot(x, l, c=c, alpha=0.5, lw=3)
In [35]:
mpld3.disable_notebook()
Bokeh is a Python interactive visualization library that targets modern web browsers for presentation. Its goal is to provide elegant, concise construction of novel graphics in the style of D3.js, but also deliver this capability with high-performance interactivity over very large or streaming datasets. Bokeh can help anyone who would like to quickly and easily create interactive plots, dashboards, and data applications.
In [36]:
import bokeh
try:
    from bokeh.sampledata import us_counties, unemployment
except Exception:
    # the sample data is not available locally yet: download it, then retry
    bokeh.sampledata.download()
    from bokeh.sampledata import us_counties, unemployment
In [37]:
from bokeh.plotting import *
colors = ["#F1EEF6", "#D4B9DA", "#C994C7", "#DF65B0", "#DD1C77", "#980043"]
In [38]:
county_xs = [
    us_counties.data[code]['lons'] for code in us_counties.data
    if us_counties.data[code]['state'] == 'tx'
]
county_ys = [
    us_counties.data[code]['lats'] for code in us_counties.data
    if us_counties.data[code]['state'] == 'tx'
]
In [39]:
county_colors = []
for county_id in us_counties.data:
    if us_counties.data[county_id]['state'] != 'tx':
        continue
    try:
        rate = unemployment.data[county_id]
        idx = min(int(rate/2), 5)
        county_colors.append(colors[idx])
    except KeyError:
        county_colors.append("black")
In [40]:
output_notebook()
patches(county_xs, county_ys, fill_color=county_colors, fill_alpha=0.7,
        line_color="white", line_width=0.5, title="Texas Unemployment 2009")
show()
In [41]:
from IPython.display import IFrame
IFrame('http://bokeh.pydata.org/docs/gallery.html', width='100%', height=550)
Out[41]:
In [42]:
from IPython.html.widgets import interact, RadioButtonsWidget, IntSliderWidget, TextWidget
In [43]:
def plot_sine(freq):
    x = np.linspace(-np.pi, np.pi, num=1000)
    plt.plot(x, np.sin(2*np.pi*freq*x))
In [44]:
interact(plot_sine, freq=(1, 10, 0.5))
Out[44]:
In [45]:
def plot_sine2(amplitude, color, title):
    fig, ax = plt.subplots(figsize=(4, 3),
                           subplot_kw={'axisbg':'#EEEEEE',
                                       'axisbelow':True})
    ax.grid(color='w', linewidth=2, linestyle='solid')
    x = np.linspace(0, 10, 1000)
    ax.plot(x, amplitude * np.sin(x), color=color,
            lw=5, alpha=0.4)
    ax.set_xlim(0, 10)
    ax.set_ylim(-10.1, 10.1)
    ax.set_title(title)
    return fig
In [46]:
interact(plot_sine2,
         amplitude=IntSliderWidget(min=0, max=10, step=1, value=1),
         color=RadioButtonsWidget(values=['blue', 'green', 'red']),
         title=TextWidget(value="Advertise here"))
Out[46]:
In [47]:
from IPython.display import IFrame
IFrame('https://plot.ly/feed', width='100%', height=550)
Out[47]:
pandas is a library for data manipulation and analysis:
In [48]:
import pandas as pd
from pandas import Series, DataFrame
In [49]:
labels = ['a', 'b', 'c', 'd', 'e']
s = Series([1, 2, 3, 4, 5], index=labels)
s
Out[49]:
In [50]:
print("'b' in s = {}".format('b' in s))
print(" s['b'] = {}".format(s['b']))
In [51]:
mapping = s.to_dict()
mapping
Out[51]:
In [52]:
Series(mapping)
Out[52]:
In [53]:
import pandas.io.data
import datetime
aapl = pd.io.data.get_data_yahoo('AAPL',
                                 start=datetime.datetime(2006, 10, 1),
                                 end=datetime.datetime(2012, 1, 1))
aapl.head()
Out[53]:
In [54]:
aapl.to_csv('aapl_ohlc.csv')
!head aapl_ohlc.csv
Reading a CSV file:
In [55]:
df = pd.read_csv('aapl_ohlc.csv', index_col='Date', parse_dates=True)
df.head()
Out[55]:
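The `index_col` and `parse_dates` arguments used above can be tried without downloading anything. The following sketch reads a few made-up rows from an in-memory string; the values and the name `df_demo` are purely illustrative and are not real AAPL quotes.

```python
import io

import pandas as pd

# Hypothetical rows in the same layout as the exported file (values are made up).
csv_text = """Date,Open,Close
2006-10-02,10.0,10.5
2006-10-03,10.5,10.2
2006-10-04,10.2,10.8
"""

# index_col picks the column used as row labels; parse_dates=True converts
# that index from strings to timestamps, giving a DatetimeIndex.
df_demo = pd.read_csv(io.StringIO(csv_text), index_col="Date", parse_dates=True)

print(type(df_demo.index))
print(df_demo.loc[pd.Timestamp("2006-10-03"), "Close"])
```

With a date index in place, rows can be selected by timestamp, which is what makes the time-series operations in the following cells convenient.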
In [56]:
df.index
Out[56]:
In [57]:
df[['Open', 'Close']].head()
Out[57]:
In [58]:
print(type(df['Open']))
print(type(df[['Open', 'Close']]))
In [59]:
df['diff'] = df.Open - df.Close
df.head()
Out[59]:
In [60]:
close_px = df['Adj Close']
mavg = pd.rolling_mean(close_px, 40)
close_px.plot(label='AAPL')
mavg.plot(label='mavg')
plt.legend(loc='best')
Out[60]:
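The moving average above uses the function form `pd.rolling_mean`. Newer pandas releases expose the same computation as a method on the series, so if `pd.rolling_mean` is not available in your installation, something along these lines should be equivalent. The numbers are made up to keep the arithmetic easy to check.

```python
import pandas as pd

prices = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])  # made-up values for illustration

# Window of 3: the first two entries are NaN because the window is incomplete.
mavg_demo = prices.rolling(window=3).mean()
print(mavg_demo.tolist())  # → [nan, nan, 2.0, 3.0, 4.0]
```

For the stock data in this notebook the corresponding call would be `close_px.rolling(window=40).mean()`.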
In [61]:
df = pd.io.data.get_data_yahoo(['AAPL', 'Googl', 'GE', 'IBM', 'KO', 'MSFT', 'PEP'],
                               start=datetime.datetime(2010, 1, 1),
                               end=datetime.datetime(2013, 1, 1))['Adj Close']
rets = df.pct_change()
df.head()
Out[61]:
In [62]:
_ = pd.scatter_matrix(rets, diagonal='kde', figsize=(10, 10))
In [63]:
corr = rets.corr()
plt.imshow(corr, cmap='hot', interpolation='none')
plt.colorbar()
plt.xticks(range(len(corr)), corr.columns)
plt.yticks(range(len(corr)), corr.columns);
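The returns and correlation matrix plotted above come from `pct_change` and `corr`. As a small sketch of what `pct_change` computes, the example below uses made-up prices for two hypothetical tickers, X and Y, and compares the result with the explicit formula price(t) / price(t-1) - 1.

```python
import pandas as pd

# Made-up prices for two hypothetical tickers, X and Y.
prices_demo = pd.DataFrame({"X": [100.0, 110.0, 99.0],
                            "Y": [50.0, 60.0, 57.0]})

# Percentage change from the previous row; the first row has no predecessor,
# so it is NaN.
rets_demo = prices_demo.pct_change()

# The same quantity written out explicitly.
manual = prices_demo / prices_demo.shift(1) - 1

print(rets_demo)
print(rets_demo.corr())   # pairwise correlation between the return columns
```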
NLTK is a library for working with English-language text. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, and an active discussion forum.
In [64]:
import nltk
In [65]:
nltk.download("punkt")
Out[65]:
In [66]:
sentences = """This is Rami. At eight o'clock on Thursday morning James Arthur didn't feel very good."""
sents = nltk.sent_tokenize(sentences)
sents
Out[66]:
In [67]:
words = nltk.word_tokenize(sents[1])
words
Out[67]:
In [68]:
nltk.download("maxent_treebank_pos_tagger")
Out[68]:
In [69]:
tagged = nltk.pos_tag(words)
tagged
Out[69]:
In [70]:
nltk.download("maxent_ne_chunker")
nltk.download("words")
Out[70]:
In [71]:
entities = nltk.chunk.ne_chunk(tagged)
list(entities.subtrees(filter=lambda x: x.node == 'PERSON'))
Out[71]:
In [72]:
stemmer = nltk.stem.LancasterStemmer()
words = u"Stemming is funnier than a bummer says the sushi loving computer scientist".split()
[stemmer.stem(w) for w in words]
Out[72]:
In [73]:
from lxml import html
import requests
In [74]:
from IPython.display import IFrame
IFrame('http://econpy.pythonanywhere.com/ex/001.html', width='100%', height=250)
Out[74]:
In [75]:
page = requests.get('http://econpy.pythonanywhere.com/ex/001.html')
tree = html.fromstring(page.text)
In [76]:
# This will create a list of buyers:
buyers = tree.xpath('//div[@title="buyer-name"]/text()')
# This will create a list of prices:
prices = tree.xpath('//span[@class="item-price"]/text()')
In [77]:
print('Buyers: ', buyers)
print()
print('Prices: ', prices)
In [78]:
from bs4 import BeautifulSoup
In [79]:
r = requests.get("http://www.google.com")
data = r.text
soup = BeautifulSoup(data, "html.parser")  # name the parser explicitly
In [80]:
for link in soup.find_all('a'):
    print(link.get('href'))
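To see what the link extraction above does without a network connection, here is a rough equivalent built on the standard library's `html.parser` instead of BeautifulSoup. It is a swapped-in technique for illustration only; the class name `LinkCollector` and the sample HTML are made up.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag (None when it is missing)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.append(dict(attrs).get("href"))


sample_html = ('<p><a href="/about">About</a> <a>no href</a> '
               '<a href="http://example.com/">Example</a></p>')

collector = LinkCollector()
collector.feed(sample_html)
print(collector.links)  # → ['/about', None, 'http://example.com/']
```

As with `link.get('href')`, anchors without an `href` attribute yield `None` rather than raising an error.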
NetworkX is a library to construct, manipulate, and visualize graphs. It contains:
In [81]:
import networkx as nx
In [82]:
G = nx.karate_club_graph()
nx.draw_spring(G)
plt.show()
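Besides drawing, the same graph object can be queried directly. The sketch below counts nodes and edges and ranks nodes by degree; the variable names `degrees` and `top` are illustrative, and it assumes a NetworkX version in which `G.degree()` can be converted to a dictionary.

```python
import networkx as nx

G = nx.karate_club_graph()
print(G.number_of_nodes(), G.number_of_edges())

# Degree of a node = number of edges attached to it.
degrees = dict(G.degree())

# The three best-connected members of the club.
top = sorted(degrees, key=degrees.get, reverse=True)[:3]
print(top)
```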
It is an interactive, collaborative analytics tool that integrates:
You can open a notebook from Google Drive. You can share notebooks like you would share a Google Doc. You can comment and edit collaboratively, in real time. There is zero setup, because all the computation happens in Chrome. You can even quickly and easily package your analytics pipeline into a GUI for people who don't want to program. In effect, you can go from zero to analytics with little friction.