TASK 1
Write the code to enumerate items in the list:
Example:
Input
items = ['foo', 'bar', 'baz', 'foo', 'baz', 'bar']
Output
#something like:
[0, 1, 2, 0, 2, 1]
TASK 2
For each element in a list [0, 1, 2, ..., N]
build all possible pairs with other elements of that list.
Example:
Input:
[0, 1, 2, 3] or just 4
Output:
0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3
1, 2, 3, 0, 2, 3, 0, 1, 3, 0, 1, 2
The Hitchhiker’s Guide to Python http://docs.python-guide.org/en/latest/
Hard way is easier http://learnpythonthehardway.org
Google python class https://developers.google.com/edu/python/
Python tutorial https://docs.python.org/2/tutorial/
Python Tutor - code visualizing (developed by MIT) http://pythontutor.com/
If you feel lost: CodeAcademy https://www.codecademy.com/en/tracks/python
Learning by doing!
code should be readable first!
style guides
looping through dictionaries http://docs.quantifiedcode.com/python-anti-patterns/performance/index.html
using wildcard imports (from ... import *) http://docs.quantifiedcode.com/python-anti-patterns/maintainability/from_module_import_all_used.html
Using single letter to name your variables http://docs.quantifiedcode.com/python-anti-patterns/maintainability/using_single_letter_as_variable_name.html
Comparing things to None the wrong way http://docs.quantifiedcode.com/python-anti-patterns/readability/comparison_to_none.html
Comparing things to True the wrong way http://docs.quantifiedcode.com/python-anti-patterns/readability/comparison_to_true.html
Using type() to compare types http://docs.quantifiedcode.com/python-anti-patterns/readability/do_not_compare_types_use_isinstance.html
Using an unpythonic loop http://docs.quantifiedcode.com/python-anti-patterns/readability/using_an_unpythonic_loop.html
Using CamelCase in function names http://docs.quantifiedcode.com/python-anti-patterns/readability/using_camelcase_in_function_names.html
Verify your python version by running
python --version
This notebook is written in pyhton 2.
a = b = 3
c, d = 4, 5
c, d = d, c
In [458]:
greeting = 'Hello'
guest = "John"
my_string = 'Hello "John"'
named_greeting = 'Hello, {name}'.format(name=guest)
named_greeting2 = '{}, {}'.format(greeting, guest)
print named_greeting
print named_greeting2
for more details see docs: https://docs.python.org/2/tutorial/datastructures.html
In [459]:
fruit_list = ['apple', 'orange', 'peach', 'mango', 'bananas', 'pineapple']
name_length = [len(fruit) for fruit in fruit_list]
print name_length
In [460]:
name_with_p = [fruit for fruit in fruit_list if fruit[0]=='p'] #even better: fruit.startswith('p')
In [461]:
numbered_fruits = []
In [462]:
for i, fruit in enumerate(fruit_list):
numbered_fruits.append('{}.{}'.format(i, fruit))
numbered_fruits
Out[462]:
Indexing starts with zero.
General indexing rule (mind the brackets): [start:stop:step]
In [463]:
numbered_fruits[0] = None
In [464]:
numbered_fruits[1:4]
Out[464]:
In [465]:
numbered_fruits[1:-1:2]
Out[465]:
In [466]:
numbered_fruits[::-1]
Out[466]:
immutable type!
In [467]:
p_fruits = (name_with_p[1], name_with_p[0])
p_fruits[1] = 'mango'
In [468]:
single_number_tuple = 3,
single_number_tuple
Out[468]:
In [469]:
single_number_tuple + (2,) + (1, 0)
Out[469]:
Immutable type. Stores only unique elements.
In [470]:
set([0, 1, 2, 1, 1, 1, 3])
Out[470]:
In [471]:
fruit_list = ['apple', 'orange', 'mango', 'banana', 'pineapple']
quantities = [3, 5, 2, 3, 4]
order_fruits = {fruit: num \
for fruit, num in zip(fruit_list, quantities)}
order_fruits
Out[471]:
In [472]:
order_fruits['pineapple'] = 2
order_fruits
Out[472]:
In [473]:
print order_fruits.keys()
print order_fruits.values()
In [474]:
for fruit, amount in order_fruits.iteritems():
print 'Buy {num} {entity}s'.format(num=amount, entity=fruit)
In [475]:
def my_func(var1, var2, default_var1=0, default_var2 = False):
"""
This is a generic example of python a function.
You can see this string when do call: my_func?
"""
#do something with vars
if not default_var2:
result = var1
elif default_var1 == 0:
result = var1
else:
result = var1 + var2
return result
function is just another object (like almost everything in python)
In [476]:
print 'Function {} has the following docstring:\n{}'\
.format(my_func.func_name, my_func.func_doc)
Guidence on how to create meaningful docstring: https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt#docstring-standard
In [477]:
def function_over_function(func, *args, **kwargs):
function_result = func(*args, **kwargs)
return function_result
In [478]:
function_over_function(my_func, 3, 5, default_var1=1, default_var2=True)
Out[478]:
In [479]:
function_over_function(lambda x, y, factor=10: (x+y)*factor, 1, 2, 5)
Out[479]:
Don't assign lambda expressions to variables. If you need named instance - create standard function with def
In [480]:
my_simple_func = lambda x: x+1
vs
In [481]:
def my_simple_func(x):
return x + 1
In [482]:
import numpy as np
In [483]:
matrix_from_list = np.array([[1, 3, 4],
[2, 0, 5],
[4, 4, 1],
[0, 1, 0]])
vector_from_list = np.array([2, 1, 3])
print 'The matrix is\n{matrix}\n\nthe vector is\n{vector}'\
.format(vector=vector_from_list, matrix=matrix_from_list)
In [484]:
matrix_from_list.dot(vector_from_list)
Out[484]:
In [485]:
matrix_from_list + vector_from_list
Out[485]:
In [486]:
single_precision_vector = np.array([1, 3, 5, 2], dtype=np.float32)
single_precision_vector.dtype
Out[486]:
In [487]:
vector_from_list.dtype
Out[487]:
In [488]:
vector_from_list.astype(np.int16)
Out[488]:
mind dimensionality!
In [550]:
row_vector = np.array([[1,2,3]])
print 'New vector {} has dimensionality {}'\
.format(row_vector, row_vector.shape)
print 'The dot-product is: ', matrix_from_list.dot(row_vector)
In [551]:
singleton_vector = row_vector.squeeze()
print 'Squeezed vector {} has shape {}'.format(singleton_vector, singleton_vector.shape)
In [552]:
matrix_from_list.dot(singleton_vector)
Out[552]:
In [553]:
print singleton_vector[:, np.newaxis]
In [554]:
mat = np.arange(12)
mat.reshape(-1, 4)
mat
Out[554]:
In [555]:
print singleton_vector[:, None]
In [556]:
vector12 = np.arange(12)
vector12
Out[556]:
Guess what is the output:
vector12[:3]
vector12[-1]
vector12[:-2]
vector12[3:7]
vector12[::2]
vector12[::-1]
In [557]:
matrix43 = vector12.reshape(4, 3)
matrix43
Out[557]:
Guess what is the output:
matrix43[:, 0]
matrix43[-1, :]
matrix43[::2, :]
matrix43[:3, :-1]
matrix43[3:, 1]
Unlike Matlab, numpy arrays are column-major (or C-major) by default, not row-major (or F-major).
Working with views is more efficient and is a preferred way.
view is returned whenever basic slicing is used
more details at http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html
making copy is simple:
In [558]:
matrix43_copy = matrix43[:]
In [559]:
matrix_to_reshape = np.random.randint(10, 99, size=(6, 4))
matrix_to_reshape
Out[559]:
In [560]:
reshaped_matrix = matrix_to_reshape.reshape(8, 3)
reshaped_matrix
Out[560]:
reshape always returns view!
In [561]:
reshaped_matrix[-1, 0] = 1
In [562]:
np.set_printoptions(formatter={'all':lambda x: '_{}_'.format(x) if x < 10 else str(x)})
In [563]:
matrix_to_reshape[:]
Out[563]:
In [564]:
np.set_printoptions()
In [565]:
idx = matrix43 > 4
matrix43[idx]
Out[565]:
eye, ones, zeros, diag
Example: Build three-diagonal matrix with -2's on main diagonal and 1's and subdiagonals
Is this code valid?
In [566]:
def three_diagonal(N):
A = np.zeros((N, N), dtype=np.int)
for i in range(N):
A[i, i] = -2
if i > 0:
A[i, i-1] = 1
if i < N-1:
A[i, i+1] = 1
return A
print three_diagonal(5)
In [567]:
def numpy_three_diagonal(N):
main_diagonal = -2 * np.eye(N)
suddiag_value = np.ones(N-1,)
lower_subdiag = np.diag(suddiag_value, k=-1)
upper_subdiag = np.diag(suddiag_value, k=1)
result = main_diagonal + lower_subdiag + upper_subdiag
return result.astype(np.int)
numpy_three_diagonal(5)
Out[567]:
In [568]:
A = numpy_three_diagonal(5)
A[0, -1] = 5
A[-1, 0] = 3
print A
print A.sum()
print A.min()
print A.max(axis=0)
print A.sum(axis=0)
print A.mean(axis=1)
print (A > 4).any(axis=1)
In [569]:
print np.pi
In [570]:
args = np.arange(0, 2.5*np.pi, 0.5*np.pi)
In [571]:
print np.sin(args)
In [572]:
print np.round(np.sin(args), decimals=2)
In [573]:
'{}, {:.1%}, {:e}, {:.2f}, {:.0f}'.format(*np.sin(args))
Out[573]:
In [574]:
np.set_printoptions(formatter={'all':lambda x: '{:.2f}'.format(x)})
print np.sin(args)
np.set_printoptions()
linspace, meshgrid
Let's produce a function $$ f(x, y) = sin(x+y) $$ on some mesh.
In [575]:
linear_index = np.linspace(0, np.pi, 10, endpoint=True)
mesh_x, mesh_y = np.meshgrid(linear_index, linear_index)
values_3D = np.sin(mesh_x + mesh_y)
In [576]:
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
%matplotlib inline
fig = plt.figure(figsize=(10,6))
ax = fig.gca(projection='3d')
ax.plot_wireframe(mesh_x, mesh_y, values_3D)
ax.view_init(azim=-45, elev=30)
plt.title('The plot of $f(x, y) = sin(x+y)$')
Out[576]:
In [577]:
import scipy.sparse as sp
In [578]:
def scipy_three_diagonal(N):
main_diagonal = -2 * np.ones(N, )
suddiag_values = np.ones(N-1,)
diagonals = [main_diagonal, suddiag_values, suddiag_values]
# Another option: use sp.eye(N) and add subdiagonals
offsets = [0, 1, -1]
result = sp.diags(diagonals, offsets, shape=(N, N), format='coo')
return result
my_sparse_matrix = scipy_three_diagonal(5)
In [579]:
my_sparse_matrix
Out[579]:
Sparse matrix stores only non-zero elements (and their indices)
In [580]:
print my_sparse_matrix
In [581]:
my_sparse_matrix.toarray()
Out[581]:
In [582]:
my_sparse_matrix.A
Out[582]:
In [583]:
from scipy.linalg import toeplitz, hankel
In [584]:
hankel(xrange(4), [-1, -2, -3, -4])
Out[584]:
In [585]:
toeplitz(xrange(4))
Out[585]:
In [586]:
N = 1000
%timeit three_diagonal(N)
%timeit numpy_three_diagonal(N)
%timeit scipy_three_diagonal(N)
You can also use %%timeit
magic to measure run time of the whole cell
In [587]:
%%timeit
N = 1000
calc = three_diagonal(N)
calc = scipy_three_diagonal(N)
del calc
Avoid using time.time()
or time.clock()
directly as their behaviour's different depending on platform; default_timer
makes the best choice for you. It measures wall time though, e.g. not very precise.
In [588]:
from timeit import default_timer as timer
In [589]:
dims = [300, 1000, 3000, 10000]
bench_names = ['loop', 'numpy', 'scipy']
timings = {bench:[] for bench in bench_names}
for n in dims:
start_time = timer()
calc = three_diagonal(n)
time_delta = timer() - start_time
timings['loop'].append(time_delta)
start_time = timer()
calc = numpy_three_diagonal(n)
time_delta = timer() - start_time
timings['numpy'].append(time_delta)
start_time = timer()
calc = scipy_three_diagonal(n)
time_delta = timer() - start_time
timings['scipy'].append(time_delta)
Let's make the code less redundant
In [590]:
dims = [300, 1000, 3000, 10000]
bench_names = ['loop', 'numpy', 'scipy']
timings = {bench_name: [] for bench_name in bench_names}
def timing_machine(func, *args, **kwargs):
start_time = timer()
result = func(*args, **kwargs)
time_delta = timer() - start_time
return time_delta
for n in dims:
timings['loop'].append(timing_machine(three_diagonal, n))
timings['numpy'].append(timing_machine(numpy_three_diagonal, n))
timings['scipy'].append(timing_machine(scipy_three_diagonal, n))
more details on different parameters: https://ipython.org/ipython-doc/dev/interactive/magics.html#magic-timeit
In [612]:
timeit_result = %timeit -q -r 5 -o three_diagonal(10)
print 'Best of {} runs: {:.8f}s'.format(timeit_result.repeat,
timeit_result.best)
Our new benchmark procedure
In [592]:
dims = [300, 1000, 3000, 10000]
bench_names = ['loop', 'numpy', 'scipy']
bench_funcs = [three_diagonal, numpy_three_diagonal, scipy_three_diagonal]
timings_best = {bench_name: [] for bench_name in bench_names}
for bench_name, bench_func in zip(bench_names, bench_funcs):
print '\nMeasuring {}'.format(bench_func.func_name)
for n in dims:
print n,
time_result = %timeit -q -o bench_func(n)
timings_best[bench_name].append(time_result.best)
In [593]:
import matplotlib.pyplot as plt
%matplotlib inline
%matplotlib inline
ensures all graphs are plotted inside your notebook
In [594]:
# plt.rcParams.update({'axes.labelsize': 'large'})
plt.rcParams.update({'font.size': 14})
In [595]:
plt.figure(figsize=(10,8))
for bench_name, values in timings_best.iteritems():
plt.semilogy(dims, values, label=bench_name)
plt.legend(loc='best')
plt.title('Benchmarking results with best of timeit', y=1.03)
plt.xlabel('Matrix dimension size')
plt.ylabel('Time, s')
Out[595]:
In [596]:
plt.figure(figsize=(10,8))
for bench_name, values in timings.iteritems():
plt.semilogy(dims, values, label=bench_name)
plt.legend(loc='best')
plt.title('Benchmarking results with default_timer', y=1.03)
plt.xlabel('Matrix dimension size')
plt.ylabel('Time, s')
Out[596]:
Think, why:
You might want to read the docs:
Remark: starting from python 3.3 it's recommended to use time.perf_counter()
and time.process_time()
https://docs.python.org/3/library/time.html#time.perf_counter
Also note, that for advanced benchmarking it's better to use profiling tools.
Use plt.plot?
to get detailed info on function usage.
Task: given lists of x-values, y-falues and plot format strings, plot all three graphs in one line.
Hint: use list comprehensions
In [597]:
k = len(timings_best)
iter_xyf = [item for sublist in zip([dims]*k,
timings_best.values(),
list('rgb'))\
for item in sublist]
plt.figure(figsize=(10, 8))
plt.semilogy(*iter_xyf)
plt.legend(timings_best.keys(), loc=2, frameon=False)
plt.title('Benchmarking results - "one-liner"', y=1.03)
plt.xlabel('Matrix dimension size')
plt.ylabel('Time, s')
Out[597]:
Even simpler way - also gives you granular control on plot objects
In [598]:
plt.figure(figsize=(10, 8))
figs = [plt.semilogy(dims, values, label=bench_name)\
for bench_name, values in timings.iteritems()];
ax0, = figs[0]
ax0.set_dashes([5, 10, 20, 10, 5, 10])
ax1, = figs[1]
ax1.set_marker('s')
ax1.set_markerfacecolor('r')
ax2, = figs[2]
ax2.set_linewidth(6)
ax2.set_alpha(0.3)
ax2.set_color('m')
matplotlib has a number of different options for styling your plot
In [599]:
all_markers = [
'.', # point
',', # pixel
'o', # circle
'v', # triangle down
'^', # triangle up
'<', # triangle_left
'>', # triangle_right
'1', # tri_down
'2', # tri_up
'3', # tri_left
'4', # tri_right
'8', # octagon
's', # square
'p', # pentagon
'*', # star
'h', # hexagon1
'H', # hexagon2
'+', # plus
'x', # x
'D', # diamond
'd', # thin_diamond
'|', # vline
]
all_linestyles = [
'-', # solid line style
'--', # dashed line style
'-.', # dash-dot line style
':', # dotted line style
'None'# no line
]
all_colors = [
'b', # blue
'g', # green
'r', # red
'c', # cyan
'm', # magenta
'y', # yellow
'k', # black
'w', # white
]
for advanced usage of subplots start here
In [622]:
n = len(timings)
experiment_names = timings.keys()
fig, axes = plt.subplots(1, n, sharey=True, figsize=(16,4))
colors = np.random.choice(list('rgbcmyk'), n, replace=False)
markers = np.random.choice(all_markers, n, replace=False)
lines = np.random.choice(all_linestyles, n, replace=False)
for ax_num, ax in enumerate(axes):
key = experiment_names[ax_num]
ax.semilogy(dims, timings[key], label=key,
color=colors[ax_num],
marker=markers[ax_num],
markersize=8,
linestyle=lines[ax_num],
lw=3)
ax.set_xlabel('matrix dimension')
ax.set_title(key)
axes[0].set_ylabel('Time, s')
plt.suptitle('Benchmarking results', fontsize=16, y=1.03)
Out[622]:
In [601]:
plt.figure()
plt.subplot(211)
plt.plot([1,2,3])
plt.subplot(212)
plt.plot([2,5,4])
Out[601]:
Task: create subplot with 2 columns and 2 rows. Leave bottom left quarter empty. Scipy and numpy benchmarks should go into top row.
function wrappers and decorators
installing packages
importing modules
ipyton magic
qtconsole
environment
extensions
profiles (deprecated in jupyter)
profiling
debugging
cython, numba
openmp
OOP
python 2 vs python 3
plotting in python - palletes and colormaps, styles
pandas (presenting results)
numpy strides, contiguousness, vectorize function, broadcasting, saving output
magic functions (applied to line and to code cell)
jupyter configuration
Task 1
In [602]:
items = ['foo', 'bar', 'baz', 'foo', 'baz', 'bar']
method 1
In [603]:
from collections import defaultdict
item_ids = defaultdict(lambda: len(item_ids))
map(item_ids.__getitem__, items)
Out[603]:
method 2
In [604]:
import pandas as pd
pd.DataFrame({'items': items}).groupby('items', sort=False).grouper.group_info[0]
Out[604]:
method 3
In [605]:
import numpy as np
np.unique(items, return_inverse=True)[1]
Out[605]:
method 4
In [606]:
last = 0
counts = {}
result = []
for item in items:
try:
count = counts[item]
except KeyError:
counts[item] = count = last
last += 1
result.append(count)
result
Out[606]:
Task 2
In [607]:
N = 1000
In [608]:
from itertools import permutations
%timeit list(permutations(xrange(N), 2))
Hankel matrix: $a_{ij} = a_{i-1, j+1}$
In [609]:
import numpy as np
from scipy.linalg import hankel
def pairs_idx(n):
return np.vstack((np.repeat(xrange(n), n-1), hankel(xrange(1, n), xrange(-1, n-1)).ravel()))
In [610]:
%timeit pairs_idx(N)