Context: a naive implementation of the DESI spectral extraction algorithm would involve eigen-decomposition of a 2M x 2M matrix, assembled from some 16M x 16M and 16M x 2M matrices. This is computationally intractable as-is, but we use the sparseness of the matrix to subdivide the problem into overlapping sub-extractions. Our current default is to extract 25 spectra x 50 wavelengths + edge effect boundaries at a time. This notebook explores potential speedups from even smaller sub-extractions.
Stephen Bailey
Lawrence Berkeley National Lab
Spring 2017
This code requires https://github.com/desihub/specter and https://github.com/sbailey/knlcode. A minimal setup is:
git clone https://github.com/sbailey/knltest
git clone https://github.com/desihub/specter
export PYTHONPATH=`pwd`/specter/py:$PYTHONPATH
cd knltest/code
export OMP_NUM_THREADS=1
python extract-size.py
In [1]:
    
%pylab inline
import numpy as np
from astropy.table import Table
    
    
In [2]:
    
knl = Table.read('../doc/data/extract-size/knl.txt', format='ascii')
hsw = Table.read('../doc/data/extract-size/hsw.txt', format='ascii')
    
In [3]:
    
hsw.colnames
    
    Out[3]:
In [4]:
    
knl.sort('ntot')
hsw.sort('ntot')
    
In [5]:
    
def table2rate2d(data):
    nwave_opts = sorted(set(data['nwave']))
    nspec_opts = sorted(set(data['nspec']))
    rate = np.zeros((len(nwave_opts), len(nspec_opts)))
    for row in data:
        i = nwave_opts.index(row['nwave'])
        j = nspec_opts.index(row['nspec'])
        rate[i,j] = row['rate']
    return rate
rate_knl = table2rate2d(knl)
rate_hsw = table2rate2d(hsw)
    
In [6]:
    
def plotimg(rate, xlabels, ylabels):
    imshow(rate)
    colorbar()
    xticks(range(len(xlabels)), xlabels)
    yticks(range(len(ylabels)), ylabels)
    xlabel('nspec'); ylabel('nwave')
    
In [7]:
    
xlabels = sorted(set(hsw['nspec']))
ylabels = sorted(set(knl['nwave']))
set_cmap('viridis')
figure(figsize=(12,4))
subplot(121); plotimg(rate_hsw, xlabels, ylabels); title('Haswell rate')
subplot(122); plotimg(rate_knl, xlabels, ylabels); title('KNL rate')
    
    Out[7]:
    
    
In [8]:
    
figure(figsize=(12,4))
subplot(121); plotimg(rate_hsw/rate_hsw[-1,-1], xlabels, ylabels); title('Haswell rate improvement')
subplot(122); plotimg(rate_knl/rate_knl[-1,-1], xlabels, ylabels); title('KNL rate improvement')
    
    Out[8]:
    
In [9]:
    
r = np.max(rate_hsw) / np.max(rate_knl)
print("Haswell/KNL = {}".format(r))
plotimg(rate_hsw/rate_knl, xlabels, ylabels)
title('Haswell / KNL performance')
    
    
    Out[9]:
    
There was a theory that L2 cache misses on KNL were hurting its performance relative to Haswell and that going to smaller extraction sizes would help. This doesn't appear to be the case -- although both Haswell and KNL prefer smaller extraction sizes, Haswell continues to significantly outperform KNL by much more than the ratio of clock speeds (2.3/1.4).
In [ ]: