CSE 6040, Fall 2015 [18]: Matrix storage

Today's lab continues Lab 16, which is a survey of techniques using Numpy, a Python module that provides fast primitives for multidimensional arrays.

By the way, a partial solution set for Lab 16 is also available here.

Downloads. For today's lab, you'll need two additional downloads:

The dataset is actually from your current homework (#2! Namely, it's the first million pairs of the user-user connectivity graph, in CSV format.

To repeat, the recommended importing convention for Numpy is (execute this now):


In [4]:
import numpy as np

Matrix storage: Column-major versus row-major layouts

For linear algebra, we will be especially interested in 2-D arrays, which we will use to store matrices. For this common case, there is a subtle performance issue related to how matrices are stored in memory.

By way of background, physical storage---whether it be memory or disk---is basically one big array. And because of how physical storage is implemented, it turns out that it is much faster to access consecutive elements in memory than, say, to jump around randomly.

A matrix is a two-dimensional object. Thus, when it is stored in memory, it must be mapped in some way to the one-dimensional physical array. There are many possible mappings, but the two most common conventions are known as the column-major and row-major layouts:

Exercise. Let $A$ be an $m \times n$ matrix stored in column-major format. Let $B$ be an $m \times n$ matrix stored in row-major format.

Based on the preceding discussion, recall that these objects will be mapped to 1-D arrays of length $mn$, behind the scenes. Let's call the 1-D array representations $\hat{A}$ and $\hat{B}$. Thus, the $(i, j)$ element of $a$, $a_{ij}$, will map to some element $\hat{a}_u$ of $\hat{A}$; similarly, $b_{ij}$ will map to some element $\hat{b}_v$ of $\hat{B}$.

Determine formulae to compute the 1-D index values, $u$ and $v$, in terms of $\{i, j, m, n\}$. Assume that all indices are 0-based, i.e., $0 \leq i \leq m-1$, $0 \leq j \leq n-1$, and $0 \leq u, v \leq mn-1$.


In [5]:
def calc_u (i, j, m, n): # column major
    u = j*m + i # @YOUSE: Replace with the correct formula
    return u
    
def calc_v (i, j, m, n): # row major
    v = i*n + j # @YOUSE: Replace with the correct formula
    return v

# Quick check (not exhaustive):
print "==> Testing your implementations (not exhaustive)..."
assert calc_u (7, 4, 10, 20) == 47
assert calc_v (7, 4, 10, 20) == 144
print "Passed!"


==> Testing your implementations (not exhaustive)...
Passed!

Requesting a layout in Numpy

In Numpy, you can ask for either layout. The default in Numpy is row-major.

Historically numerical linear algebra libraries were developed assuming column-major layout. This layout happens to be the default when you declare a 2-D array in the Fortran programming language. By contrast, in the C and C++ programming languages, the default convention for a 2-D array is row-major layout. So the Numpy default is the C/C++ convention.

In your programs, you can request either order of Numpy using the order parameter. For linear algebra operations (common), we recommend using the column-major convention.

In either case, here is how you would create column- and row-major matrices.


In [6]:
n = 5000
A_colmaj = np.ones ((n, n), order='F') # column-major (Fortran convention)
A_rowmaj = np.ones ((n, n), order='C') # row-major (C/C++ convention)

Exercise. Write a function that iterates over the columns of a matrix and scales them by a given value. Then use that routine to measure the difference in time when the input is row-major versus column-major.


In [8]:
def scale_colwise (A):
    """Given a Numpy matrix `A`, visits each column `A[:, j]`
    and scales it by `j`."""
    assert type (A) is np.ndarray
    
    n_cols = A.shape[1] # number of columns
    
    # @YOUSE: Fill in this code
    for j in range (n_cols):
        A[:, j] *= j

    return A

# Measure time to scale a row-major input column-wise
%timeit scale_colwise (A_rowmaj)

# Measure time to scale a column-major input column-wise
%timeit scale_colwise (A_colmaj)


1 loops, best of 3: 616 ms per loop
The slowest run took 12.25 times longer than the fastest. This could mean that an intermediate result is being cached 
1 loops, best of 3: 35.4 ms per loop

Python vs. Numpy example: Matrix-vector multiply

Recall the definition of matrix-vector multiplication from Da's LA notes. Let's benchmark a matrix-vector multiply in native Python, and compare that to doing the same operation in Numpy.

First, some setup. (What does this code do?)


In [9]:
# Dimensions; you might shrink this value for debugging
n = 2500

In [10]:
# Generate random values, for use in populating the matrix and vector

from random import gauss

# Native Python, using lists
A_py = [gauss (0, 1) for i in range (n*n)] # Assume: Column-major
x_py = [gauss (0, 1) for i in range (n)]

In [11]:
# Convert values into Numpy arrays in column-major order
A_np = np.reshape (A_py, (n, n), order='F')
x_np = np.reshape (x_py, (n, 1), order='F')

In [12]:
# Here is how you do a "matvec" in Numpy:

%timeit A_np.dot (x_np)


The slowest run took 24.11 times longer than the fastest. This could mean that an intermediate result is being cached 
100 loops, best of 3: 3.51 ms per loop

Exercise. Implement a matrix-vector product that operates on native Python lists. Assume the 1-D storage of the matrix, as shown above


In [13]:
def matvec_py (m, n, A, x):
    """
    Native Python-based matrix-vector multiply, using lists.
    The dimensions of the matrix A are m-by-n, and x is a
    vector of length n.
    """
    assert type (A) is list
    assert type (x) is list
    assert len (x) >= n
    assert len (A) >= (m*n)

    y = [0.] * m
    
    # @YOUSE: Fill in an implementation here
    for j in range (n):
        for i in range (m):
            y[i] += A[i + j*n] * x[j]
            
#[sum (i*j for i, j in zip (a_row, x) for a_row in A)]
    
    return y

In [14]:
# Estimate a bound on the difference between these two
EPS = np.finfo (float).eps # "machine epsilon"
CONST = 2.0 # Some constant for the error bound
dy_max = CONST * n * EPS

print ("""==> Error bound estimate:
         C*n*eps
         == %g*%g*%g
         == %g
""" % (CONST, n, EPS, dy_max))


==> Error bound estimate:
         C*n*eps
         == 2*2500*2.22045e-16
         == 1.11022e-12


In [15]:
# Run the Numpy version and your code
y_np = A_np.dot (x_np)
y_py = matvec_py (n, n, A_py, x_py)

In [16]:
# Compute the difference between these
dy = y_np - np.reshape (y_py, (n, 1), order='F')
dy_norm = np.linalg.norm (dy, ord=np.inf)

In [17]:
# Summarize the results
from IPython.display import display, Math

comparison = "\leq" if dy_norm <= dy_max else "\gt"
display (Math (
        r'||y_{\textrm{np}} - y_{\textrm{py}}||_{\infty}'
        r' = \textrm{%g} %s \textrm{%g}\ (\textrm{estimated bound})'
        % (dy_norm, comparison, dy_max)
    ))

if n <= 4: # Debug: Print all data for small inputs
    print "@A_np:\n" ; print A_np
    print "@x_np:\n" ; print x_np
    print "@y_np:\n" ; print y_np
    print "@A_py:\n" ; print A_py
    print "@x_py:\n" ; print x_py
    print "@y_py:\n" ; print y_py
    print "@dy:\n" ; print dy

# Trigger an error on likely failure
assert dy_norm <= dy_max


$$||y_{\textrm{np}} - y_{\textrm{py}}||_{\infty} = \textrm{1.13687e-13} \leq \textrm{1.11022e-12}\ (\textrm{estimated bound})$$

In [18]:
%timeit matvec_py (n, n, A_py, x_py)


1 loops, best of 3: 1.77 s per loop

Sparse matrix storage

Take a look at the slides from today's class, which cover the basics of sparse matrix storage formats: link

Let's implement and time some of these routines below. To help you get started with a realistic input, the following code loads the raw data from the Yelp! user-user connectivity subgraph (see top of this notebook to download).


In [ ]:
import cse6040utils as cse6040
import pandas as pd

In [ ]:
edges = pd.read_csv ('UserEdges-1M.csv')
display (edges.head ())

In [ ]:
# Build a set of unique vertex names
V_names = set (edges.Source)
V_names.update (set (edges.Target))

In [ ]:
m = len (edges)
n = len (V_names)
print ("==> |V| == %d, |E| == %d" % (n, m))

In [ ]:
# Create a mapping of names to IDs, and vice-versa
id2name = {}
name2id = {}
for (k, v) in enumerate (V_names):
    if k <= 5: print ("[%d] %s" % (k, v)) # for debugging
    id2name[k] = v
    name2id[v] = k

In [ ]:
# Build two different kinds of sparse matrices using nested dictionaries
A_named_keys = cse6040.sparse_matrix ()
A_numbered_keys = cse6040.sparse_matrix ()

for (k, row) in edges.iterrows ():
    ni = row['Source']
    nj = row['Target']
    A_named_keys[ni][nj] = 1.
    A_named_keys[nj][ni] = 1.
    
    i = name2id[ni]
    j = name2id[nj]
    A_numbered_keys[i][j] = 1.
    A_numbered_keys[j][i] = 1.

Exercise. Complete the following routine, which takes a sparse matrix keyed by names and multiplies it by a vector. (See also: spmv() from Lab 14.)


In [ ]:
# Build a dense vector
x = cse6040.dense_vector (n)

def spmv_named_keys (n, A, x, name2id):
    y = cse6040.dense_vector (n)
    
    # @YOUSE: Complete this routine
    
    return y

In [ ]:
%timeit spmv_named_keys (n, A_named_keys, x, name2id)

In [ ]:
%timeit cse6040.spmv (n, A_numbered_keys, x)

Exercise. Create a coordinate (COO) matrix using SciPy, and measure the time to perform a matrix-vector product.


In [ ]:
import scipy.sparse as sp

In [ ]:
rows = ... # @YOUSE: Complete this statement
cols = ... # @YOUSE: Complete this statement
assert len (rows) == len (cols)
vals = [1.] * len (rows)
A_coo = ... # @YOUSE: Complete this statement
x_np = np.array (x)

In [ ]:
%timeit A_coo.dot (x_np)