Tasks for those who "feel like a pro":

TASK 1

Write the code to enumerate items in the list:

  • items are not ordered
  • items are not unique
  • don't use loops
  • try to be as short as possible (not considering import statements)

Example:

Input

items = ['foo', 'bar', 'baz', 'foo', 'baz', 'bar']

Output

#something like:
[0, 1, 2, 0, 2, 1]

TASK 2

For each element in a list [0, 1, 2, ..., N-1] build all possible pairs with other elements of that list.

  • exclude "self-pairing" (e.g. 0-0, 1-1, 2-2)
  • don't use loops
  • try to be as short as possible (not considering import statements)

Example:

Input:

[0, 1, 2, 3] or just 4

Output:

0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3

1, 2, 3, 0, 2, 3, 0, 1, 3, 0, 1, 2

Learning Resources

Online

Learning by doing!

Reading (in the future)

  • Al Sweigart, "Automate the Boring Stuff with Python", https://automatetheboringstuff.com
  • Mark Lutz, "Python Pocket Reference" (250 pages)
  • Mark Lutz, "Learning Python" (1600 pages!)

Programming in python

Writing code

Some anti-patterns

Python basics

Verify your python version by running

python --version

This notebook is written in Python 2.

Basic types

variables

a = b = 3

c, d = 4, 5

c, d = d, c

strings


In [458]:
greeting = 'Hello'
guest = "John"
my_string = 'Hello "John"'
named_greeting = 'Hello, {name}'.format(name=guest)

named_greeting2 = '{}, {}'.format(greeting, guest)

print named_greeting
print named_greeting2


Hello, John
Hello, John

data containers

  • list
  • tuple
  • set
  • dictionary

lists


In [459]:
fruit_list = ['apple', 'orange', 'peach', 'mango', 'bananas', 'pineapple']

name_length = [len(fruit) for fruit in fruit_list]
print name_length


[5, 6, 5, 5, 7, 9]

In [460]:
name_with_p = [fruit for fruit in fruit_list if fruit[0]=='p']  #even better: fruit.startswith('p')

In [461]:
numbered_fruits = []

In [462]:
for i, fruit in enumerate(fruit_list):
    numbered_fruits.append('{}.{}'.format(i, fruit))
    
numbered_fruits


Out[462]:
['0.apple', '1.orange', '2.peach', '3.mango', '4.bananas', '5.pineapple']

Indexing starts with zero.

General indexing rule (mind the brackets): [start:stop:step]


In [463]:
numbered_fruits[0] = None

In [464]:
numbered_fruits[1:4]


Out[464]:
['1.orange', '2.peach', '3.mango']

In [465]:
numbered_fruits[1:-1:2]


Out[465]:
['1.orange', '3.mango']

In [466]:
numbered_fruits[::-1]


Out[466]:
['5.pineapple', '4.bananas', '3.mango', '2.peach', '1.orange', None]

tuples

immutable type!


In [467]:
p_fruits = (name_with_p[1], name_with_p[0])
p_fruits[1] = 'mango'


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-467-a967184828ef> in <module>()
      1 p_fruits = (name_with_p[1], name_with_p[0])
----> 2 p_fruits[1] = 'mango'

TypeError: 'tuple' object does not support item assignment

In [468]:
single_number_tuple = 3,
single_number_tuple


Out[468]:
(3,)

In [469]:
single_number_tuple + (2,) + (1, 0)


Out[469]:
(3, 2, 1, 0)

sets

Mutable type (frozenset is the immutable variant). Stores only unique elements.


In [470]:
set([0, 1, 2, 1, 1, 1, 3])


Out[470]:
{0, 1, 2, 3}
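
A minimal sketch of how sets behave in practice: the built-in set can be changed in place, while frozenset is its read-only counterpart. The names basket and frozen_basket are made up for this illustration, and print(...) is used with a single argument so the snippet should also run on Python 3.

```python
# Illustrative names; not part of the original notebook.
basket = set(['apple', 'orange', 'apple'])   # the duplicate 'apple' is dropped
basket.add('mango')                          # in-place insertion
basket.discard('orange')                     # in-place removal, no error if missing

frozen_basket = frozenset(basket)
try:
    frozen_basket.add('peach')               # frozenset has no add method
    frozen_is_mutable = True
except AttributeError:
    frozen_is_mutable = False

print(sorted(basket))
print(frozen_is_mutable)
```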

dictionaries


In [471]:
fruit_list = ['apple', 'orange', 'mango', 'banana', 'pineapple']
quantities = [3, 5, 2, 3, 4]

order_fruits = {fruit: num \
                for fruit, num in zip(fruit_list, quantities)}
order_fruits


Out[471]:
{'apple': 3, 'banana': 3, 'mango': 2, 'orange': 5, 'pineapple': 4}

In [472]:
order_fruits['pineapple'] = 2
order_fruits


Out[472]:
{'apple': 3, 'banana': 3, 'mango': 2, 'orange': 5, 'pineapple': 2}

In [473]:
print order_fruits.keys()
print order_fruits.values()


['orange', 'mango', 'pineapple', 'apple', 'banana']
[5, 2, 2, 3, 3]

In [474]:
for fruit, amount in order_fruits.iteritems():
    print 'Buy {num} {entity}s'.format(num=amount, entity=fruit)


Buy 5 oranges
Buy 2 mangos
Buy 2 pineapples
Buy 3 apples
Buy 3 bananas

Functions

general patterns


In [475]:
def my_func(var1, var2, default_var1=0, default_var2 = False):
    """
    This is a generic example of a Python function.
    You can see this string by calling: my_func?
    """
    #do something with vars
    if not default_var2:
        result = var1
    elif default_var1 == 0:
        result = var1
    else:
        result = var1 + var2
    return result

A function is just another object (like almost everything in Python)


In [476]:
print 'Function {} has the following docstring:\n{}'\
        .format(my_func.func_name, my_func.func_doc)


Function my_func has the following docstring:

    This is a generic example of a Python function.
    You can see this string by calling: my_func?
    

functions as arguments


In [477]:
def function_over_function(func, *args, **kwargs):
    function_result = func(*args, **kwargs)
    return function_result

In [478]:
function_over_function(my_func, 3, 5, default_var1=1, default_var2=True)


Out[478]:
8

lambda evaluation


In [479]:
function_over_function(lambda x, y, factor=10: (x+y)*factor, 1, 2, 5)


Out[479]:
15

Don't assign lambda expressions to variables. If you need a named function, define it with def.


In [480]:
my_simple_func = lambda x: x+1

vs


In [481]:
def my_simple_func(x):
    return x + 1
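
One practical reason for this advice, sketched below: a lambda carries no useful name, which makes tracebacks and debugging output harder to read. The names anonymous and named are hypothetical and used only for this comparison.

```python
# Two functions with identical behaviour but different introspection data.
anonymous = lambda x: x + 1

def named(x):
    return x + 1

print(anonymous.__name__)   # every lambda reports the same generic name
print(named.__name__)       # a def keeps the name you gave it
```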

Numpy - scientific computing

Building matrices and vectors


In [482]:
import numpy as np

In [483]:
matrix_from_list = np.array([[1, 3, 4],
                             [2, 0, 5],
                             [4, 4, 1],
                             [0, 1, 0]])

vector_from_list = np.array([2, 1, 3])

print 'The matrix is\n{matrix}\n\nthe vector is\n{vector}'\
        .format(vector=vector_from_list, matrix=matrix_from_list)


The matrix is
[[1 3 4]
 [2 0 5]
 [4 4 1]
 [0 1 0]]

the vector is
[2 1 3]

Basic manipulations

matvec


In [484]:
matrix_from_list.dot(vector_from_list)


Out[484]:
array([17, 19, 15,  1])

broadcasting


In [485]:
matrix_from_list + vector_from_list


Out[485]:
array([[3, 4, 7],
       [4, 1, 8],
       [6, 5, 4],
       [2, 2, 3]])
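
A small sketch of the broadcasting rule at work: shapes are compared from the trailing dimension backwards, and a dimension is stretched when it is 1 or missing. The arrays below are made up for illustration.

```python
import numpy as np

matrix = np.arange(12).reshape(4, 3)       # shape (4, 3)
row = np.array([10, 20, 30])               # shape (3,), treated as (1, 3)
column = np.array([[1], [2], [3], [4]])    # shape (4, 1)

added_to_rows = matrix + row               # row is reused for each of the 4 rows
added_to_columns = matrix + column         # column is reused for each of the 3 columns

try:
    matrix + np.array([1, 2, 3, 4])        # trailing dimensions 3 and 4 do not match
    mismatch_raises = False
except ValueError:
    mismatch_raises = True

print(added_to_rows.shape)
print(added_to_columns.shape)
print(mismatch_raises)
```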

forcing dtype


In [486]:
single_precision_vector = np.array([1, 3, 5, 2], dtype=np.float32)
single_precision_vector.dtype


Out[486]:
dtype('float32')

converting dtypes


In [487]:
vector_from_list.dtype


Out[487]:
dtype('int32')

In [488]:
vector_from_list.astype(np.int16)


Out[488]:
array([2, 1, 3], dtype=int16)

shapes (singletons)

mind dimensionality!


In [550]:
row_vector = np.array([[1,2,3]])

print 'New vector {} has dimensionality {}'\
        .format(row_vector, row_vector.shape)

print 'The dot-product is: ', matrix_from_list.dot(row_vector)


New vector [[1 2 3]] has dimensionality (1L, 3L)
The dot-product is: 
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-550-286bbd2b9667> in <module>()
      3 print 'New vector {} has dimensionality {}'        .format(row_vector, row_vector.shape)
      4 
----> 5 print 'The dot-product is: ', matrix_from_list.dot(row_vector)

ValueError: shapes (4,3) and (1,3) not aligned: 3 (dim 1) != 1 (dim 0)

In [551]:
singleton_vector = row_vector.squeeze()
print 'Squeezed vector {} has shape {}'.format(singleton_vector, singleton_vector.shape)


 Squeezed vector [1 2 3] has shape (3L,)

In [552]:
matrix_from_list.dot(singleton_vector)


Out[552]:
array([19, 17, 15,  2])

adding new dimension


In [553]:
print singleton_vector[:, np.newaxis]


[[1]
 [2]
 [3]]

In [554]:
mat = np.arange(12)
mat.reshape(-1, 4)
mat


Out[554]:
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

In [555]:
print singleton_vector[:, None]


[[1]
 [2]
 [3]]
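
To tie the three cells above together, here is a short sketch (with a made-up vector) showing that np.newaxis is just an alias for None, and that reshape returns a new array object instead of changing the original in place.

```python
import numpy as np

vector = np.array([1, 2, 3])          # shape (3,)
column = vector[:, np.newaxis]        # shape (3, 1)
row = vector[np.newaxis, :]           # shape (1, 3)
reshaped = vector.reshape(-1, 1)      # -1 lets numpy infer that dimension

print(np.newaxis is None)
print(column.shape)
print(row.shape)
print(vector.shape)                   # unchanged: reshape did not modify vector
```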

Indexing, slicing


In [556]:
vector12 = np.arange(12)
vector12


Out[556]:
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

Guess what the output is:

vector12[:3]
vector12[-1]
vector12[:-2]
vector12[3:7]
vector12[::2]
vector12[::-1]

In [557]:
matrix43 = vector12.reshape(4, 3)
matrix43


Out[557]:
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])

Guess what the output is:

matrix43[:, 0]
matrix43[-1, :]
matrix43[::2, :]
matrix43[:3, :-1]
matrix43[3:, 1]

Unlike Matlab, numpy arrays are row-major (C order) by default, not column-major (Fortran order).
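
The memory layout of an array can be inspected directly. The sketch below uses made-up arrays to check the layout flags of a freshly created array and to compare the two flattening orders.

```python
import numpy as np

c_ordered = np.arange(6).reshape(2, 3)        # layout numpy picks by default
f_ordered = np.asfortranarray(c_ordered)      # same values, stored column by column

print(c_ordered.flags['C_CONTIGUOUS'])        # layout of the default array
print(f_ordered.flags['F_CONTIGUOUS'])
print(c_ordered.ravel(order='C'))             # read row by row
print(c_ordered.ravel(order='F'))             # read column by column
```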

View vs Copy

Working with views is more efficient and is a preferred way.

view is returned whenever basic slicing is used

more details at http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html

making an explicit copy is simple (note that matrix43[:] would only return another view):


In [558]:
matrix43_copy = matrix43.copy()
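
A quick way to tell a view from a copy is np.shares_memory. The sketch below, using hypothetical array names, shows that basic slicing with [:] shares data with the original array, while .copy() does not.

```python
import numpy as np

original = np.arange(12).reshape(4, 3)
sliced = original[:]          # basic slicing: a view on the same data
copied = original.copy()      # explicit copy: independent data

print(np.shares_memory(original, sliced))
print(np.shares_memory(original, copied))

sliced[0, 0] = -1             # visible through original
copied[0, 1] = -5             # not visible through original
print(original[0])
```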

Reshaping


In [559]:
matrix_to_reshape = np.random.randint(10, 99, size=(6, 4))
matrix_to_reshape


Out[559]:
array([[34, 93, 79, 92],
       [39, 80, 92, 78],
       [91, 67, 78, 73],
       [90, 78, 51, 66],
       [86, 29, 60, 30],
       [88, 58, 10, 35]])

In [560]:
reshaped_matrix = matrix_to_reshape.reshape(8, 3)
reshaped_matrix


Out[560]:
array([[34, 93, 79],
       [92, 39, 80],
       [92, 78, 91],
       [67, 78, 73],
       [90, 78, 51],
       [66, 86, 29],
       [60, 30, 88],
       [58, 10, 35]])

reshape returns a view whenever possible (as it is here); otherwise it returns a copy!


In [561]:
reshaped_matrix[-1, 0] = 1

In [562]:
np.set_printoptions(formatter={'all':lambda x: '_{}_'.format(x) if x < 10 else str(x)})

In [563]:
matrix_to_reshape[:]


Out[563]:
array([[34, 93, 79, 92],
       [39, 80, 92, 78],
       [91, 67, 78, 73],
       [90, 78, 51, 66],
       [86, 29, 60, 30],
       [88, _1_, 10, 35]])

In [564]:
np.set_printoptions()
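
Whether reshape hands back a view depends on the memory layout of its input. A minimal sketch with made-up arrays: reshaping contiguous data needs no copy, while flattening a transposed array does.

```python
import numpy as np

base = np.arange(6).reshape(2, 3)
view_reshape = base.reshape(3, 2)     # contiguous data: no copy needed
copy_reshape = base.T.reshape(6)      # transposed data must be copied to be flattened

print(np.shares_memory(base, view_reshape))
print(np.shares_memory(base, copy_reshape))
print(copy_reshape)
```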

Boolean indexing


In [565]:
idx = matrix43 > 4
matrix43[idx]


Out[565]:
array([ 5,  6,  7,  8,  9, 10, 11])
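
Boolean masks can also be combined and used for assignment. The sketch below uses a made-up array; note that conditions are joined with & and |, since the Python keywords and/or do not work element-wise.

```python
import numpy as np

values = np.arange(12).reshape(4, 3)
mask = (values > 4) & (values % 2 == 0)   # element-wise "and" of two conditions

selected = values[mask]        # 1-D array of matching elements (a copy)
clipped = values.copy()
clipped[clipped > 4] = 4       # assignment through a mask modifies clipped in place

print(selected)
print(clipped)
```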

Useful numpy functions

eye, ones, zeros, diag

Example: build a three-diagonal (tridiagonal) matrix with -2's on the main diagonal and 1's on the sub- and superdiagonals

Is this code valid?


In [566]:
def three_diagonal(N):
    A = np.zeros((N, N), dtype=int)  # plain int: the np.int alias is gone from recent numpy
    for i in range(N):
        A[i, i] = -2
        if i > 0:
            A[i, i-1] = 1
        if i < N-1:
            A[i, i+1] = 1
    return A

print three_diagonal(5)


[[-2  1  0  0  0]
 [ 1 -2  1  0  0]
 [ 0  1 -2  1  0]
 [ 0  0  1 -2  1]
 [ 0  0  0  1 -2]]

In [567]:
def numpy_three_diagonal(N):
    main_diagonal = -2 * np.eye(N)
    
    subdiag_value = np.ones(N-1,)
    lower_subdiag = np.diag(subdiag_value, k=-1)
    upper_subdiag = np.diag(subdiag_value, k=1)
    
    result = main_diagonal + lower_subdiag + upper_subdiag
    return result.astype(int)  # plain int: the np.int alias is gone from recent numpy

numpy_three_diagonal(5)


Out[567]:
array([[-2,  1,  0,  0,  0],
       [ 1, -2,  1,  0,  0],
       [ 0,  1, -2,  1,  0],
       [ 0,  0,  1, -2,  1],
       [ 0,  0,  0,  1, -2]])

reducers: sum, mean, max, min, all, any


In [568]:
A = numpy_three_diagonal(5)
A[0, -1] = 5
A[-1, 0] = 3

print A
print A.sum()
print A.min()
print A.max(axis=0)
print A.sum(axis=0)
print A.mean(axis=1)
print (A > 4).any(axis=1)


[[-2  1  0  0  5]
 [ 1 -2  1  0  0]
 [ 0  1 -2  1  0]
 [ 0  0  1 -2  1]
 [ 3  0  0  1 -2]]
6
-2
[3 1 1 1 5]
[2 0 0 0 4]
[ 0.8  0.   0.   0.   0.4]
[ True False False False False]

numpy math functions


In [569]:
print np.pi


3.14159265359

In [570]:
args = np.arange(0, 2.5*np.pi, 0.5*np.pi)

In [571]:
print np.sin(args)


[  0.00000000e+00   1.00000000e+00   1.22464680e-16  -1.00000000e+00
  -2.44929360e-16]

In [572]:
print np.round(np.sin(args), decimals=2)


[ 0.  1.  0. -1.  0.]

managing output


In [573]:
'{}, {:.1%}, {:e}, {:.2f}, {:.0f}'.format(*np.sin(args))


Out[573]:
'0.0, 100.0%, 1.224647e-16, -1.00, -0'

In [574]:
np.set_printoptions(formatter={'all':lambda x: '{:.2f}'.format(x)})
print np.sin(args)
np.set_printoptions()


[0.00 1.00 0.00 -1.00 -0.00]

Meshes

linspace, meshgrid

Let's evaluate the function $$ f(x, y) = \sin(x+y) $$ on some mesh.


In [575]:
linear_index = np.linspace(0, np.pi, 10, endpoint=True)
mesh_x, mesh_y = np.meshgrid(linear_index, linear_index)

values_3D = np.sin(mesh_x + mesh_y)
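
To see what meshgrid actually returns, here is a sketch on two tiny, made-up axes. It also checks that the same grid sum can be obtained by broadcasting a row against a column.

```python
import numpy as np

x = np.array([0, 1, 2])
y = np.array([10, 20])
grid_x, grid_y = np.meshgrid(x, y)    # default indexing='xy': shape is (len(y), len(x))

print(grid_x)
print(grid_y)

# Broadcasting a (1, 3) row against a (2, 1) column gives the same sums.
via_broadcasting = x[np.newaxis, :] + y[:, np.newaxis]
print((grid_x + grid_y == via_broadcasting).all())
```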

In [576]:
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
%matplotlib inline

fig = plt.figure(figsize=(10,6))
ax = fig.add_subplot(111, projection='3d')  # also works where fig.gca(projection=...) is no longer accepted

ax.plot_wireframe(mesh_x, mesh_y, values_3D)
ax.view_init(azim=-45, elev=30)

plt.title('The plot of $f(x, y) = sin(x+y)$')


Out[576]:
<matplotlib.text.Text at 0x4fc264e0>

Scipy - scientific computing 2

Building sparse matrix


In [577]:
import scipy.sparse as sp

In [578]:
def scipy_three_diagonal(N):
    main_diagonal = -2 * np.ones(N, )
    subdiag_values = np.ones(N-1,)
    
    diagonals = [main_diagonal, subdiag_values, subdiag_values]
    # Another option: use sp.eye(N) and add subdiagonals
    offsets = [0, 1, -1]
    
    result = sp.diags(diagonals, offsets, shape=(N, N), format='coo')
    return result

my_sparse_matrix = scipy_three_diagonal(5)

How does scipy represent a sparse matrix?


In [579]:
my_sparse_matrix


Out[579]:
<5x5 sparse matrix of type '<type 'numpy.float64'>'
	with 13 stored elements in COOrdinate format>

Sparse matrix stores only non-zero elements (and their indices)


In [580]:
print my_sparse_matrix


  (0, 0)	-2.0
  (1, 1)	-2.0
  (2, 2)	-2.0
  (3, 3)	-2.0
  (4, 4)	-2.0
  (0, 1)	1.0
  (1, 2)	1.0
  (2, 3)	1.0
  (3, 4)	1.0
  (1, 0)	1.0
  (2, 1)	1.0
  (3, 2)	1.0
  (4, 3)	1.0
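
The printout above corresponds to three parallel arrays kept by the COO format. A minimal sketch, building a small made-up matrix directly from such arrays and reading them back:

```python
import numpy as np
import scipy.sparse as sp

# Illustrative data: three non-zero entries of a 3x3 matrix.
rows = np.array([0, 1, 2])
cols = np.array([0, 2, 1])
data = np.array([4.0, 5.0, 6.0])

small_coo = sp.coo_matrix((data, (rows, cols)), shape=(3, 3))

print(small_coo.row)     # row index of each stored element
print(small_coo.col)     # column index of each stored element
print(small_coo.data)    # the stored values themselves
print(small_coo.nnz)     # number of stored elements
```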

Restoring full matrix


In [581]:
my_sparse_matrix.toarray()


Out[581]:
array([[-2.,  1.,  0.,  0.,  0.],
       [ 1., -2.,  1.,  0.,  0.],
       [ 0.,  1., -2.,  1.,  0.],
       [ 0.,  0.,  1., -2.,  1.],
       [ 0.,  0.,  0.,  1., -2.]])

In [582]:
my_sparse_matrix.A


Out[582]:
array([[-2.,  1.,  0.,  0.,  0.],
       [ 1., -2.,  1.,  0.,  0.],
       [ 0.,  1., -2.,  1.,  0.],
       [ 0.,  0.,  1., -2.,  1.],
       [ 0.,  0.,  0.,  1., -2.]])

In [583]:
from scipy.linalg import toeplitz, hankel

In [584]:
hankel(xrange(4), [-1, -2, -3, -4])


Out[584]:
array([[ 0,  1,  2,  3],
       [ 1,  2,  3, -2],
       [ 2,  3, -2, -3],
       [ 3, -2, -3, -4]])

In [585]:
toeplitz(xrange(4))


Out[585]:
array([[0, 1, 2, 3],
       [1, 0, 1, 2],
       [2, 1, 0, 1],
       [3, 2, 1, 0]])

Timing - measuring performance

Simplest way to measure time


In [586]:
N = 1000
%timeit three_diagonal(N)
%timeit numpy_three_diagonal(N)
%timeit scipy_three_diagonal(N)


1000 loops, best of 3: 1.53 ms per loop
10 loops, best of 3: 20.6 ms per loop
1000 loops, best of 3: 272 µs per loop

You can also use %%timeit magic to measure run time of the whole cell


In [587]:
%%timeit
N = 1000
calc = three_diagonal(N)
calc = scipy_three_diagonal(N)
del calc


100 loops, best of 3: 2.17 ms per loop

Storing timings in a separate variable

Avoid using time.time() or time.clock() directly, as their behaviour differs between platforms; default_timer picks the most suitable clock for you. Note that it measures wall-clock time, i.e. the result also includes whatever else the machine was doing, so it is not very precise.


In [588]:
from timeit import default_timer as timer

In [589]:
dims = [300, 1000, 3000, 10000]
bench_names = ['loop', 'numpy', 'scipy']
timings = {bench:[] for bench in bench_names}

for n in dims:
    start_time = timer()
    calc = three_diagonal(n)
    time_delta = timer() - start_time
    timings['loop'].append(time_delta)
    
    start_time = timer()
    calc = numpy_three_diagonal(n)
    time_delta = timer() - start_time
    timings['numpy'].append(time_delta)
    
    start_time = timer()
    calc = scipy_three_diagonal(n)
    time_delta = timer() - start_time
    timings['scipy'].append(time_delta)

Let's make the code less redundant


In [590]:
dims = [300, 1000, 3000, 10000]
bench_names = ['loop', 'numpy', 'scipy']
timings = {bench_name: [] for bench_name in bench_names}

def timing_machine(func, *args, **kwargs):
    start_time = timer()
    result = func(*args, **kwargs)
    time_delta = timer() - start_time
    return time_delta

for n in dims:
    timings['loop'].append(timing_machine(three_diagonal, n))
    timings['numpy'].append(timing_machine(numpy_three_diagonal, n))
    timings['scipy'].append(timing_machine(scipy_three_diagonal, n))

timeit with -o parameter


In [612]:
timeit_result = %timeit -q -r 5 -o three_diagonal(10)
print 'Best of {} runs: {:.8f}s'.format(timeit_result.repeat,
                                        timeit_result.best)


Best of 5 runs: 0.00000565s
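
Outside IPython the %timeit magic is not available, but the standard timeit module gives a comparable "best of several runs" figure. A rough sketch, where build_list is a hypothetical workload standing in for the function under test:

```python
import timeit

def build_list(n):
    # Hypothetical workload used only for this illustration.
    return [i * i for i in range(n)]

# 5 repetitions of 1000 calls each; every entry is the total time of 1000 calls.
totals = timeit.repeat(lambda: build_list(100), repeat=5, number=1000)
best_per_call = min(totals) / 1000

print('Best of {} runs: {:.8f}s per call'.format(len(totals), best_per_call))
```

Taking the minimum rather than the mean follows the same reasoning as "best of": slower runs mostly reflect interference from other processes.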

Our new benchmark procedure


In [592]:
dims = [300, 1000, 3000, 10000]
bench_names = ['loop', 'numpy', 'scipy']
bench_funcs = [three_diagonal, numpy_three_diagonal, scipy_three_diagonal]
timings_best = {bench_name: [] for bench_name in bench_names}

for bench_name, bench_func in zip(bench_names, bench_funcs):
    print '\nMeasuring {}'.format(bench_func.func_name)
    for n in dims:
        print n,
        time_result = %timeit -q -o bench_func(n)
        timings_best[bench_name].append(time_result.best)


Measuring three_diagonal
300 1000 3000 10000 
Measuring numpy_three_diagonal
300 1000 3000 10000 
Measuring scipy_three_diagonal
300 1000 3000 10000

Matplotlib - plotting in python

Configuring matplotlib


In [593]:
import matplotlib.pyplot as plt
%matplotlib inline

%matplotlib inline ensures all graphs are plotted inside your notebook

Global controls


In [594]:
# plt.rcParams.update({'axes.labelsize': 'large'})
plt.rcParams.update({'font.size': 14})

Combined plot


In [595]:
plt.figure(figsize=(10,8))

for bench_name, values in timings_best.iteritems():
    plt.semilogy(dims, values, label=bench_name)
    
plt.legend(loc='best')
plt.title('Benchmarking results with best of timeit', y=1.03)
plt.xlabel('Matrix dimension size')
plt.ylabel('Time, s')


Out[595]:
<matplotlib.text.Text at 0x4fc49cc0>

In [596]:
plt.figure(figsize=(10,8))

for bench_name, values in timings.iteritems():
    plt.semilogy(dims, values, label=bench_name)
    
plt.legend(loc='best')
plt.title('Benchmarking results with default_timer', y=1.03)
plt.xlabel('Matrix dimension size')
plt.ylabel('Time, s')


Out[596]:
<matplotlib.text.Text at 0x375a2630>

Think, why:

  • "loop" was faster than "numpy"
  • "scipy" is almost constant
  • results for default_timer and "best of timeit" are different

You might want to read the docs:

Remark: starting from python 3.3 it's recommended to use time.perf_counter() and time.process_time() https://docs.python.org/3/library/time.html#time.perf_counter
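
A small sketch of the difference between the two clocks (Python 3.3+ only, so it will not run in a Python 2 kernel): sleeping consumes wall-clock time but almost no CPU time.

```python
import time

wall_start = time.perf_counter()
cpu_start = time.process_time()

time.sleep(0.05)   # waiting: the wall clock advances, the CPU clock barely moves

wall_elapsed = time.perf_counter() - wall_start
cpu_elapsed = time.process_time() - cpu_start

print('wall: {:.3f}s, cpu: {:.3f}s'.format(wall_elapsed, cpu_elapsed))
```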

Also note that for advanced benchmarking it's better to use profiling tools.
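
As one possible starting point, the standard library ships cProfile. The sketch below assumes Python 3 (io.StringIO is used to capture the report) and profiles slow_sum, a made-up workload; in IPython the %prun magic gives similar output.

```python
import cProfile
import io
import pstats

def slow_sum(n):
    # Hypothetical workload used only for this illustration.
    total = 0
    for i in range(n):
        total += i * i
    return total

profiler = cProfile.Profile()
profiler.enable()
slow_sum(100000)
profiler.disable()

# Collect the report in a string, sorted by cumulative time, top 5 entries.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats('cumulative').print_stats(5)
print(report.getvalue())
```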

Combined plot "one-liner"

Use plt.plot? to get detailed info on function usage.

Task: given lists of x-values, y-values and plot format strings, plot all three graphs in one line.

Hint: use list comprehensions


In [597]:
k = len(timings_best)
iter_xyf = [item for sublist in zip([dims]*k,
                                    timings_best.values(),
                                    list('rgb'))\
                                for item in sublist]

plt.figure(figsize=(10, 8))
plt.semilogy(*iter_xyf)

plt.legend(timings_best.keys(), loc=2, frameon=False)
plt.title('Benchmarking results - "one-liner"', y=1.03)
plt.xlabel('Matrix dimension size')
plt.ylabel('Time, s')


Out[597]:
<matplotlib.text.Text at 0x2859bfd0>
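
The flattening step in the one-liner above can be looked at in isolation. A sketch with made-up data and no plotting: zip groups (x, y, format) per curve, and itertools.chain is an alternative to the nested list comprehension for flattening the groups into one argument list.

```python
from itertools import chain

x_values = [[1, 2, 3]] * 2                 # the same x-axis reused for every curve
y_values = [[10, 20, 30], [5, 15, 25]]
formats = ['r', 'g']

# zip groups (x, y, fmt) per curve; chain flattens the groups into one list.
plot_args = list(chain.from_iterable(zip(x_values, y_values, formats)))

# The nested comprehension used in the cell above gives the same result.
same_args = [item for group in zip(x_values, y_values, formats) for item in group]

print(plot_args)
```

The resulting list has the x1, y1, fmt1, x2, y2, fmt2, ... layout that plt.plot and plt.semilogy accept as positional arguments.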

An even simpler way, which also gives you granular control over plot objects


In [598]:
plt.figure(figsize=(10, 8))

figs = [plt.semilogy(dims, values, label=bench_name)\
        for bench_name, values in timings.iteritems()];

ax0, = figs[0]
ax0.set_dashes([5, 10, 20, 10, 5, 10])

ax1, = figs[1]
ax1.set_marker('s')
ax1.set_markerfacecolor('r')

ax2, = figs[2]
ax2.set_linewidth(6)
ax2.set_alpha(0.3)
ax2.set_color('m')


Plot formatting

matplotlib has a number of different options for styling your plot


In [599]:
all_markers = [
'.', # point
',', # pixel
'o', # circle
'v', # triangle down
'^', # triangle up
'<', # triangle_left
'>', # triangle_right
'1', # tri_down
'2', # tri_up
'3', # tri_left
'4', # tri_right
'8', # octagon
's', # square
'p', # pentagon
'*', # star
'h', # hexagon1
'H', # hexagon2
'+', # plus
'x', # x
'D', # diamond
'd', # thin_diamond
'|', # vline
]

all_linestyles = [
'-',  # solid line style
'--', # dashed line style
'-.', # dash-dot line style
':',  # dotted line style
'None'# no line
]

all_colors = [
'b', # blue
'g', # green
'r', # red
'c', # cyan
'm', # magenta
'y', # yellow
'k', # black
'w', # white
]

Subplots

Iterating over subplots


In [622]:
n = len(timings)
experiment_names = timings.keys()

fig, axes = plt.subplots(1, n, sharey=True, figsize=(16,4))

colors = np.random.choice(list('rgbcmyk'), n, replace=False)
markers = np.random.choice(all_markers, n, replace=False)
lines = np.random.choice(all_linestyles, n, replace=False)

for ax_num, ax in enumerate(axes):
    key = experiment_names[ax_num]
    ax.semilogy(dims, timings[key], label=key,
            color=colors[ax_num],
            marker=markers[ax_num],
            markersize=8,
            linestyle=lines[ax_num],
            lw=3)
    ax.set_xlabel('matrix dimension')
    ax.set_title(key)

axes[0].set_ylabel('Time, s')
plt.suptitle('Benchmarking results', fontsize=16,  y=1.03)


Out[622]:
<matplotlib.text.Text at 0x581efb38>

Manual control of subplots


In [601]:
plt.figure()
plt.subplot(211)
plt.plot([1,2,3])

plt.subplot(212)
plt.plot([2,5,4])


Out[601]:
[<matplotlib.lines.Line2D at 0x2ef6ac88>]

Task: create subplot with 2 columns and 2 rows. Leave bottom left quarter empty. Scipy and numpy benchmarks should go into top row.

Other topics

function wrappers and decorators

installing packages

importing modules

ipython magic

qtconsole

environment

extensions

profiles (deprecated in jupyter)

profiling

debugging

cython, numba

openmp

OOP

python 2 vs python 3

plotting in python - palettes and colormaps, styles

pandas (presenting results)

numpy strides, contiguousness, vectorize function, broadcasting, saving output

magic functions (applied to line and to code cell)

jupyter configuration

Solutions

Task 1


In [602]:
items = ['foo', 'bar', 'baz', 'foo', 'baz', 'bar']

method 1


In [603]:
from collections import defaultdict

item_ids = defaultdict(lambda: len(item_ids))
map(item_ids.__getitem__, items)


Out[603]:
[0, 1, 2, 0, 2, 1]
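
Why method 1 works, as a commented sketch: the default factory runs only for keys not seen before, and at that moment len(item_ids) is exactly the next free id. The list(...) wrapper is an addition so that the snippet also runs on Python 3, where map is lazy.

```python
from collections import defaultdict

items = ['foo', 'bar', 'baz', 'foo', 'baz', 'bar']

# The factory is called only for unseen keys; len(item_ids) then equals
# the number of keys already stored, i.e. the next free id.
item_ids = defaultdict(lambda: len(item_ids))

# list(...) forces evaluation on Python 3, where map returns an iterator.
labels = list(map(item_ids.__getitem__, items))

print(labels)
print(dict(item_ids))
```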

method 2


In [604]:
import pandas as pd

pd.DataFrame({'items': items}).groupby('items', sort=False).grouper.group_info[0]


Out[604]:
array([0, 1, 2, 0, 2, 1], dtype=int64)

method 3


In [605]:
import numpy as np

np.unique(items, return_inverse=True)[1]


Out[605]:
array([2, 0, 1, 2, 1, 0])
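
Note that np.unique numbers the items in sorted order, not in order of first appearance. If the numbering from the task example is wanted, one possible loop-free remapping (a sketch, not the only way) uses return_index:

```python
import numpy as np

items = ['foo', 'bar', 'baz', 'foo', 'baz', 'bar']

uniques, first_seen, inverse = np.unique(items, return_index=True, return_inverse=True)

# rank[k] = position of the k-th sorted unique value when ordered by first appearance
rank = np.empty(len(uniques), dtype=int)
rank[np.argsort(first_seen)] = np.arange(len(uniques))

labels = rank[inverse]
print(labels)
```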

method 4


In [606]:
last = 0
counts = {}
result = []
for item in items:
    try:
        count = counts[item]
    except KeyError:
        counts[item] = count = last
        last += 1
    result.append(count)

result


Out[606]:
[0, 1, 2, 0, 2, 1]

Task 2


In [607]:
N = 1000

In [608]:
from itertools import permutations

%timeit list(permutations(xrange(N), 2))


10 loops, best of 3: 78.6 ms per loop

Hankel matrix: $a_{ij} = a_{i-1, j+1}$


In [609]:
import numpy as np
from scipy.linalg import hankel

def pairs_idx(n):
    return np.vstack((np.repeat(xrange(n), n-1), hankel(xrange(1, n), xrange(-1, n-1)).ravel()))

In [610]:
%timeit pairs_idx(N)


100 loops, best of 3: 17.6 ms per loop
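
For comparison, a numpy-only alternative (a sketch, not the solution above): mask out the diagonal of an n x n grid and read off the remaining index pairs. pairs_by_mask is a hypothetical name; the result is checked against itertools.permutations.

```python
from itertools import permutations

import numpy as np

def pairs_by_mask(n):
    # Hypothetical alternative: keep every (i, j) position except the diagonal.
    off_diagonal = ~np.eye(n, dtype=bool)
    return np.vstack(np.nonzero(off_diagonal))

pairs = pairs_by_mask(4)
print(pairs)

# Same pairs, in the same order, as itertools.permutations produces.
expected = np.array(list(permutations(range(4), 2))).T
print((pairs == expected).all())
```

This builds an n x n boolean mask, so it trades memory for brevity.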