Vectorization Part 2 : Intermediate examples

We continue our exploration of vectorization with slightly longer and more "real life" examples.


In [1]:
%pylab inline


Populating the interactive namespace from numpy and matplotlib

In [2]:
import numpy as np

# We start with the idea of shifting indices to facilitate certain vectorized operations.

xs = np.arange(10)
print "Original with 10 elements   ", xs

# we can take different slices of xs using the numpy index notation
# this is very cheap since numpy does not make a copy of the data

center = xs[1:-1]
left = xs[:-2]
right = xs[2:]

print "Left slice with 8 elements  ", left
print "Center slice with 8 elements", center
print "Right slice with 8 elements ", right


Original with 10 elements    [0 1 2 3 4 5 6 7 8 9]
Left slice with 8 elements   [0 1 2 3 4 5 6 7]
Center slice with 8 elements [1 2 3 4 5 6 7 8]
Right slice with 8 elements  [2 3 4 5 6 7 8 9]
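
To confirm the claim that these slices are views sharing memory with xs rather than copies, we can ask numpy directly. This is just a quick sketch; `np.shares_memory` is available in reasonably recent numpy versions:

```python
import numpy as np

xs = np.arange(10)
center = xs[1:-1]

# slicing returns a view onto the same buffer, not a copy
print(np.shares_memory(xs, center))  # True

# mutating the original is visible through the view
xs[1] = 99
print(center[0])  # 99
```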

In [3]:
# Suppose we want to find the average value in a 3-window
# For a 10 element vector, there will only be 8 average values, since 
# the first entry has no left neighbor and the last entry has no right neighbor
# We can do this in a nested for loop
avg1 = []
for i in range(1, len(xs)-1): # loop over 8 elements of xs excluding first and last entry
    s = 0
    for j in range(i-1, i+2): # loop over 3 window consisting of current position and left and right neighbors
        s += xs[j]
    avg1.append(s/3.0)
print "3 window average using double loop                    ", np.array(avg1)


3 window average using double loop                     [ 1.  2.  3.  4.  5.  6.  7.  8.]

In [4]:
# this is faster and more elegant with vectorized operations using sliced views
# in essence we overlay the left, center and right views, then add columns using a vectorized operation
# to find the sum of each overlapping 3-window, finally dividing by 3 (another vectorized operation)
# to get the average
# Recall from above that:
#
# Original with 10 elements    [0 1 2 3 4 5 6 7 8 9]
# 
# Left slice with 8 elements   [0 1 2 3 4 5 6 7]
# Center slice with 8 elements [1 2 3 4 5 6 7 8]
# Right slice with 8 elements  [2 3 4 5 6 7 8 9]
# So the first vertical sum is 0+1+2, that is, the sum over the first 3-window,
# the second vertical sum is 1+2+3 (the second shifted 3-window), and so on.

avg2 = (left + center + right)/3.0
print "3 window average using vector addition of sliced views ", np.array(avg2)


3 window average using vector addition of sliced views  [ 1.  2.  3.  4.  5.  6.  7.  8.]
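
The sliced-view average above is equivalent to a moving average computed with np.convolve in 'valid' mode, which keeps only the windows that fit entirely inside the input; a quick sketch:

```python
import numpy as np

xs = np.arange(10)

# 3-window average via shifted views
avg_views = (xs[:-2] + xs[1:-1] + xs[2:]) / 3.0

# the same thing via convolution with a length-3 averaging kernel
avg_conv = np.convolve(xs, np.ones(3) / 3.0, mode='valid')

print(np.allclose(avg_views, avg_conv))  # True
```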

Example: Finding AT rich sections in a DNA string

Some applications of vectorization are less obvious than the examples above. In the next example, we want to scan a single-stranded DNA string of length n consisting of the nucleotides 'A', 'C', 'T' and 'G' to look for segments of fixed length k where A and T are most over-represented. We can do this by simply counting the number of 'A' and 'T' bases in shifting windows of size k. The naive version counts A and T for a window, then shifts the window by 1 position and repeats in a loop. The fast version uses k shifted views of a numpy array (including the unshifted one) and sums over them. Since k is usually much smaller than n, this is much more efficient than the naive version. We compare the times taken to the built-in numpy function convolve - the naive version is about 50x slower than convolve for our example, while the vectorized version is just under 2x slower.

To be more concrete, suppose we have a DNA string

CTAGAATCGCGACCCTAGTGGCAATCTTAAGGT

and we want to count the number of 'A' or 'T' bases in a window of length 5. The first few windows are

Window 1: CTAGA (AT count = 3)

Window 2: TAGAA (AT count = 4)

Window 3: AGAAT (AT count = 4)

Window 4: GAATC (AT count = 3)

Window 5: AATCG (AT count = 3)

Window 6: ATCGC (AT count = 2)

So the most AT-rich windows so far are windows 2 and 3, each with an AT count of 4.
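
These window counts are easy to verify with a few lines of plain Python; this check is just a sketch using the example string above:

```python
dna = "CTAGAATCGCGACCCTAGTGGCAATCTTAAGGT"
k = 5

# AT count for every window of length k
counts = [sum(base in 'AT' for base in dna[i:i+k])
          for i in range(len(dna) - k + 1)]

print(counts[:6])  # counts for windows 1 through 6: [3, 4, 4, 3, 3, 2]
```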


In [5]:
# Suppose we want to find AT rich windows of some fixed size in a DNA sequence.
nucleotides = ['A', 'C', 'T', 'G']
ss = np.random.choice(nucleotides, replace=True, size=100)

# A reasonable first step is to convert 'A' and 'T' to 1 and 'C' and 'G' to 0
ss_vals = np.zeros(len(ss))
ss_vals[(ss == 'A') | (ss == 'T')] = 1

# We can now find sums over shifting windows of the desired size
size = 10
ma = np.convolve(ss_vals, np.ones(size)) # convolve can be used to find moving sums fast
at_rich_pos = np.nonzero(ma==ma.max())[0]
for i in at_rich_pos:
    start, stop = 1+i-size, 1+i
    print '%s has %d A or T nucleotides in a %d window' % (''.join(ss[start:stop]), ma[i], size)


GAATTGATTT has 8 A or T nucleotides in a 10 window
AATTGATTTC has 8 A or T nucleotides in a 10 window
ATTGATTTCT has 8 A or T nucleotides in a 10 window
TTGATTTCTA has 8 A or T nucleotides in a 10 window

In [6]:
# If we didn't know about the convolve function, we could roll our own moving sums
def slow_ms(xs, size):
    # moving sums over windows of length `size`, matching np.convolve(xs, np.ones(size))
    nx = len(xs)
    ms = np.zeros(nx+size-1)
    for i in range(1, size): # partial windows at the start
        ms[i-1] = np.sum(xs[:i])
    for i in range(size, nx+size): # full windows, then partial windows at the end
        ms[i-1] = np.sum(xs[(i-size):i])
    return ms

In [7]:
# Using vectorization speeds it up significantly
# The vectorization is a little subtle here:
# We use indexing tricks to "stack" 10 (shifted) vectors and sum them all 

# A simpler example helps clarify what is going on here
# Suppose we want to find the moving sum for [1,2,3,4,5] for a 3-window
# We first create a zero vector of length 5+3-1 = 7
# ms = [0,0,0,0,0,0,0]
# Then we use indexing as in the first example to basically carry out the following sums
# [0,0,0,0,0,0,0] + [1,2,3,4,5,0,0] + [0,1,2,3,4,5,0] + [0,0,1,2,3,4,5]
# Because numpy indexing does not create copies, this is very efficient 
# For example, if we have a string of length 10000 and a window of length 10,
# the slow version does 10000 sum operations (each over a vector of length 10)
# the vectorized version does 10 sum operations (each over a vector of length 10000)

def vectorized_ms(xs, size):
    nx = len(xs)
    ms = np.zeros(nx+size-1)
    for i in range(size):
        ms[i:i+nx] += xs
    return ms
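
Before timing them, it is worth checking that all three approaches agree; the sketch below redefines the two functions from the cells above so that it runs on its own:

```python
import numpy as np

def slow_ms(xs, size):
    nx = len(xs)
    ms = np.zeros(nx + size - 1)
    for i in range(1, size):          # partial windows at the start
        ms[i-1] = np.sum(xs[:i])
    for i in range(size, nx + size):  # full, then trailing partial windows
        ms[i-1] = np.sum(xs[(i-size):i])
    return ms

def vectorized_ms(xs, size):
    nx = len(xs)
    ms = np.zeros(nx + size - 1)
    for i in range(size):             # add one shifted copy per window position
        ms[i:i+nx] += xs
    return ms

xs = np.random.rand(100)
size = 10
ms1 = np.convolve(xs, np.ones(size))
ms2 = slow_ms(xs, size)
ms3 = vectorized_ms(xs, size)
print(np.allclose(ms1, ms2) and np.allclose(ms1, ms3))  # True
```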

In [8]:
# Finally, we compare the time taken by all 3 versions
    
print "\nTime for convolve"
%timeit ms1 = np.convolve(ss_vals, np.ones(size))
print "\nTime for loop version"
%timeit ms2 = slow_ms(ss_vals, size)
print "\nTime for vectorized version"
%timeit ms3 = vectorized_ms(ss_vals, size)


Time for convolve
10000 loops, best of 3: 18.5 µs per loop

Time for loop version
1000 loops, best of 3: 1.01 ms per loop

Time for vectorized version
10000 loops, best of 3: 33.8 µs per loop

Example - Using vectorization to speed up 1D cellular automaton simulations.

In this example, we will simulate a 1D cellular automaton. In particular, we will simulate the 18 rules shown in the first figure at http://mathworld.wolfram.com/ElementaryCellularAutomaton.html. For a 1D CA, we need to calculate the next state of a cell given its own state and the states of its left and right neighbors. The update rule is given by the 8-bit binary representation of an integer. For example, rule 30 is '00011110', where each binary digit represents the next state for the neighborhood matching that bit position, so position 0 = 0, position 1 = 1, position 2 = 1, and so on. The binary representation of 0 in 3 bits is 000, which means that if my current state is 0 and both my neighbors are also 0, my next state will be 0. Position 4 in 3 bits is 100 and position 4 has value 1, so if my left neighbor is 1, my current state is 0 and my right neighbor is 0, then my next state will be 1.
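
The bit-by-bit reading of rule 30 described above can be spelled out in code; this is just an illustrative sketch using np.binary_repr:

```python
import numpy as np

# 8 bits of rule 30, reversed so that the index equals the neighborhood code
bits = np.binary_repr(30, width=8)[::-1]
print(bits)  # '01111000'

# neighborhood code = left*4 + self*2 + right*1
for code in range(8):
    print('%s -> %s' % (np.binary_repr(code, width=3), bits[code]))
```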

One way to code this is to use a loop to check 3 consecutive cells, then move to the next cell and repeat, but this is very slow if the grid is large. Instead, we will use numpy indexing to shift the row one position to the left and one position to the right. If we now look across these 3 rows (one unshifted and 2 shifted), each column of 3 provides the current state of 3 neighboring cells. The next trick is to add these up as if the values came from the $2^2$ (left), $2^1$ (unshifted) and $2^0$ (right) positions. Finally, we take these values (which will range from 0 to 7) and use the given rule to map to a new state vector using logical indexing. Using this vectorization, we only ever add 3 (possibly very long) vectors at each step. Going over the code below carefully should make this explanation clearer.


In [9]:
def make_rule(n):
    # convert number from 0 to 255 into list of positive neighbor states
    bin_str = np.binary_repr(n)[::-1]
    return np.nonzero([int(i) for i in bin_str])[0]

# See description of rule 30 above. Cells in state 1 (001), 2 (010), 3 (011) and 4 (100) will be ON in next iteration.
print "Rule 30 viable states", make_rule(30)

# Compare with rule 54, where cells in state 1 (001), 2 (010), 4 (100) and 5 (101) will be ON in next iteration.
print "Rule 54 viable states", make_rule(54)


Rule 30 viable states [1 2 3 4]
Rule 54 viable states [1 2 4 5]

In [10]:
def ca_step(row, on_states):
    # update state of 1D CA using vectorized operations
    row[row != 0] = 1
    center = 2*row[1:-1] # state of self (second of 3 bits, so *2)
    right = 1*row[2:] # state of right neighbor (first of 3 bits, so *1)
    left = 4*row[:-2] # state of left neighbor (third of 3 bits so *4)
    # find current state, where state is a 3-bit vector expressed as an integer
    row[1:-1] = center + left + right 
    tmp = np.zeros_like(row)
    idx = np.zeros(len(row), 'bool')
    for state in on_states:
        idx |= row == state
    tmp[idx] = 1
    return tmp
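
As a sanity check, one rule 30 step on a single ON cell should turn on that cell and its two neighbors, giving the pattern 0 1 1 1 0; this sketch redefines make_rule and ca_step from above so that it runs standalone:

```python
import numpy as np

def make_rule(n):
    # convert a rule number into the list of neighborhood codes that turn ON
    bin_str = np.binary_repr(n)[::-1]
    return np.nonzero([int(i) for i in bin_str])[0]

def ca_step(row, on_states):
    # one update of the 1D CA using vectorized operations
    row[row != 0] = 1
    center = 2*row[1:-1]
    right = 1*row[2:]
    left = 4*row[:-2]
    row[1:-1] = center + left + right
    tmp = np.zeros_like(row)
    idx = np.zeros(len(row), 'bool')
    for state in on_states:
        idx |= row == state
    tmp[idx] = 1
    return tmp

row = np.array([0, 0, 1, 0, 0])
print(ca_step(row, make_rule(30)))  # [0 1 1 1 0]
```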

In [11]:
def ca(row, nsteps, rule):
    # return a matrix where each row is the state of the 1D CA at one time step
    on_states = make_rule(rule)
    grid = np.zeros((nsteps+1, len(row)))
    _row = row.copy()    
    grid[0,:] = _row
    for i in range(nsteps):
        _row = ca_step(_row, on_states)
        grid[i+1,:] = _row
    return grid

In [12]:
# initialize with single ON state in the center of the 1D CA
row = np.zeros(100, dtype='int')
mid = len(row)/2
row[mid] = 1

# replicate plots from http://mathworld.wolfram.com/ElementaryCellularAutomaton.html
rules = [30, 54, 60, 62, 90, 94, 102, 110, 122, 126, 150, 158, 182, 188, 190, 220, 222, 250]
nrows = len(rules)/3
ncols = 3
figure(figsize=(nrows*2, ncols*3))
bone() # use black and white colormap
for i, rule in enumerate(rules):
    subplot(nrows, ncols, i+1)
    grid = ca(row, nsteps=mid-1, rule=rule)
    imshow(grid, interpolation='none', aspect='equal')
    xticks([])
    yticks([])
    title('rule %d' % rule)
tight_layout()