Simple Array Operation with Dask

We build up dasks manually to show off simple array computations


In [1]:
import numpy as np

x = np.arange(100)  # a very small big array

In [2]:
from dask.array import ndslice

ndslice(x, (10,), 4)  # with blocks of size ten, get the fourth


Out[2]:
array([40, 41, 42, 43, 44, 45, 46, 47, 48, 49])

Lets build a dask to compute a sum in chunks

First we add functions to pull out each chunk at a time


In [3]:
dsk = {'x': x}
dsk.update({('x', i): (ndslice, 'x', (10,), i) 
                for i in range(10)})
dsk


Out[3]:
{'x': array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
        17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
        34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
        51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
        68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,
        85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]),
 ('x', 0): (<function dask.array.ndslice>, 'x', (10,), 0),
 ('x', 1): (<function dask.array.ndslice>, 'x', (10,), 1),
 ('x', 2): (<function dask.array.ndslice>, 'x', (10,), 2),
 ('x', 3): (<function dask.array.ndslice>, 'x', (10,), 3),
 ('x', 4): (<function dask.array.ndslice>, 'x', (10,), 4),
 ('x', 5): (<function dask.array.ndslice>, 'x', (10,), 5),
 ('x', 6): (<function dask.array.ndslice>, 'x', (10,), 6),
 ('x', 7): (<function dask.array.ndslice>, 'x', (10,), 7),
 ('x', 8): (<function dask.array.ndslice>, 'x', (10,), 8),
 ('x', 9): (<function dask.array.ndslice>, 'x', (10,), 9)}

See how dask.get computes the value of a key


In [4]:
import dask
dask.get(dsk, ('x', 4))


Out[4]:
array([40, 41, 42, 43, 44, 45, 46, 47, 48, 49])

Now we add functions to compute np.sum on each of those chunks


In [5]:
dsk.update({('y', i): (np.sum, ('x', i)) 
                for i in range(10)})
dsk


Out[5]:
{'x': array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
        17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
        34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
        51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
        68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,
        85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]),
 ('x', 0): (<function dask.array.ndslice>, 'x', (10,), 0),
 ('x', 1): (<function dask.array.ndslice>, 'x', (10,), 1),
 ('x', 2): (<function dask.array.ndslice>, 'x', (10,), 2),
 ('x', 3): (<function dask.array.ndslice>, 'x', (10,), 3),
 ('x', 4): (<function dask.array.ndslice>, 'x', (10,), 4),
 ('x', 5): (<function dask.array.ndslice>, 'x', (10,), 5),
 ('x', 6): (<function dask.array.ndslice>, 'x', (10,), 6),
 ('x', 7): (<function dask.array.ndslice>, 'x', (10,), 7),
 ('x', 8): (<function dask.array.ndslice>, 'x', (10,), 8),
 ('x', 9): (<function dask.array.ndslice>, 'x', (10,), 9),
 ('y', 0): (<function numpy.core.fromnumeric.sum>, ('x', 0)),
 ('y', 1): (<function numpy.core.fromnumeric.sum>, ('x', 1)),
 ('y', 2): (<function numpy.core.fromnumeric.sum>, ('x', 2)),
 ('y', 3): (<function numpy.core.fromnumeric.sum>, ('x', 3)),
 ('y', 4): (<function numpy.core.fromnumeric.sum>, ('x', 4)),
 ('y', 5): (<function numpy.core.fromnumeric.sum>, ('x', 5)),
 ('y', 6): (<function numpy.core.fromnumeric.sum>, ('x', 6)),
 ('y', 7): (<function numpy.core.fromnumeric.sum>, ('x', 7)),
 ('y', 8): (<function numpy.core.fromnumeric.sum>, ('x', 8)),
 ('y', 9): (<function numpy.core.fromnumeric.sum>, ('x', 9))}

In [6]:
dask.get(dsk, ('y', 4))


Out[6]:
445

Add a final function to aggregate and then sum again

This np.sum call depends on many keys, specified with a list.


In [7]:
dsk.update({'y-total': (np.sum, [('y', i) for i in range(10)])})
dsk


Out[7]:
{'x': array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
        17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
        34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
        51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
        68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,
        85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]),
 'y-total': (<function numpy.core.fromnumeric.sum>,
  [('y', 0),
   ('y', 1),
   ('y', 2),
   ('y', 3),
   ('y', 4),
   ('y', 5),
   ('y', 6),
   ('y', 7),
   ('y', 8),
   ('y', 9)]),
 ('x', 0): (<function dask.array.ndslice>, 'x', (10,), 0),
 ('x', 1): (<function dask.array.ndslice>, 'x', (10,), 1),
 ('x', 2): (<function dask.array.ndslice>, 'x', (10,), 2),
 ('x', 3): (<function dask.array.ndslice>, 'x', (10,), 3),
 ('x', 4): (<function dask.array.ndslice>, 'x', (10,), 4),
 ('x', 5): (<function dask.array.ndslice>, 'x', (10,), 5),
 ('x', 6): (<function dask.array.ndslice>, 'x', (10,), 6),
 ('x', 7): (<function dask.array.ndslice>, 'x', (10,), 7),
 ('x', 8): (<function dask.array.ndslice>, 'x', (10,), 8),
 ('x', 9): (<function dask.array.ndslice>, 'x', (10,), 9),
 ('y', 0): (<function numpy.core.fromnumeric.sum>, ('x', 0)),
 ('y', 1): (<function numpy.core.fromnumeric.sum>, ('x', 1)),
 ('y', 2): (<function numpy.core.fromnumeric.sum>, ('x', 2)),
 ('y', 3): (<function numpy.core.fromnumeric.sum>, ('x', 3)),
 ('y', 4): (<function numpy.core.fromnumeric.sum>, ('x', 4)),
 ('y', 5): (<function numpy.core.fromnumeric.sum>, ('x', 5)),
 ('y', 6): (<function numpy.core.fromnumeric.sum>, ('x', 6)),
 ('y', 7): (<function numpy.core.fromnumeric.sum>, ('x', 7)),
 ('y', 8): (<function numpy.core.fromnumeric.sum>, ('x', 8)),
 ('y', 9): (<function numpy.core.fromnumeric.sum>, ('x', 9))}

Our result!


In [8]:
dask.get(dsk, 'y-total')


Out[8]:
4950

In [15]:
x.sum()


Out[15]:
4950

Visualize computation


In [18]:
from dask.dot import dot_graph
dot_graph(dsk, filename='simple-sum')

from IPython.display import Image
Image(filename='simple-sum.png')


Writing graph to simple-sum.pdf
Out[18]:

Fortunately Blaze does all of this for us

The book-keeping we just did is a pain. Fortunately Blaze generates these things easily.


In [9]:
from blaze import Data, compute, into
from dask.obj import Array

d = Data(into(Array, x, blockshape=(10,), name='x'))

compute(d.sum()).dask


Out[9]:
{'x': array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
        17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
        34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
        51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
        68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,
        85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]),
 ('x', 0): (<function dask.array.ndslice>, 'x', (10,), 0),
 ('x', 1): (<function dask.array.ndslice>, 'x', (10,), 1),
 ('x', 2): (<function dask.array.ndslice>, 'x', (10,), 2),
 ('x', 3): (<function dask.array.ndslice>, 'x', (10,), 3),
 ('x', 4): (<function dask.array.ndslice>, 'x', (10,), 4),
 ('x', 5): (<function dask.array.ndslice>, 'x', (10,), 5),
 ('x', 6): (<function dask.array.ndslice>, 'x', (10,), 6),
 ('x', 7): (<function dask.array.ndslice>, 'x', (10,), 7),
 ('x', 8): (<function dask.array.ndslice>, 'x', (10,), 8),
 ('x', 9): (<function dask.array.ndslice>, 'x', (10,), 9),
 ('x_1', 0): (<function compute_it at 0x7fc69235cf50>, ('x', 0)),
 ('x_1', 1): (<function compute_it at 0x7fc69235cf50>, ('x', 1)),
 ('x_1', 2): (<function compute_it at 0x7fc69235cf50>, ('x', 2)),
 ('x_1', 3): (<function compute_it at 0x7fc69235cf50>, ('x', 3)),
 ('x_1', 4): (<function compute_it at 0x7fc69235cf50>, ('x', 4)),
 ('x_1', 5): (<function compute_it at 0x7fc69235cf50>, ('x', 5)),
 ('x_1', 6): (<function compute_it at 0x7fc69235cf50>, ('x', 6)),
 ('x_1', 7): (<function compute_it at 0x7fc69235cf50>, ('x', 7)),
 ('x_1', 8): (<function compute_it at 0x7fc69235cf50>, ('x', 8)),
 ('x_1', 9): (<function compute_it at 0x7fc69235cf50>, ('x', 9)),
 ('x_2',): (<toolz.functoolz.Compose at 0x7fc6923b5ec0>,
  [('x_1', 0),
   ('x_1', 1),
   ('x_1', 2),
   ('x_1', 3),
   ('x_1', 4),
   ('x_1', 5),
   ('x_1', 6),
   ('x_1', 7),
   ('x_1', 8),
   ('x_1', 9)])}

and more


In [11]:
expr = ((d+1) - d.mean()).dot(d ** 2)
compute(expr).dask


Out[11]:
{'x': array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
        17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
        34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
        51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
        68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,
        85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]),
 ('x', 0): (<function dask.array.ndslice>, 'x', (10,), 0),
 ('x', 1): (<function dask.array.ndslice>, 'x', (10,), 1),
 ('x', 2): (<function dask.array.ndslice>, 'x', (10,), 2),
 ('x', 3): (<function dask.array.ndslice>, 'x', (10,), 3),
 ('x', 4): (<function dask.array.ndslice>, 'x', (10,), 4),
 ('x', 5): (<function dask.array.ndslice>, 'x', (10,), 5),
 ('x', 6): (<function dask.array.ndslice>, 'x', (10,), 6),
 ('x', 7): (<function dask.array.ndslice>, 'x', (10,), 7),
 ('x', 8): (<function dask.array.ndslice>, 'x', (10,), 8),
 ('x', 9): (<function dask.array.ndslice>, 'x', (10,), 9),
 ('x_10', 0): (<function compute_it at 0x7fc69235cf50>, ('x', 0)),
 ('x_10', 1): (<function compute_it at 0x7fc69235cf50>, ('x', 1)),
 ('x_10', 2): (<function compute_it at 0x7fc69235cf50>, ('x', 2)),
 ('x_10', 3): (<function compute_it at 0x7fc69235cf50>, ('x', 3)),
 ('x_10', 4): (<function compute_it at 0x7fc69235cf50>, ('x', 4)),
 ('x_10', 5): (<function compute_it at 0x7fc69235cf50>, ('x', 5)),
 ('x_10', 6): (<function compute_it at 0x7fc69235cf50>, ('x', 6)),
 ('x_10', 7): (<function compute_it at 0x7fc69235cf50>, ('x', 7)),
 ('x_10', 8): (<function compute_it at 0x7fc69235cf50>, ('x', 8)),
 ('x_10', 9): (<function compute_it at 0x7fc69235cf50>, ('x', 9)),
 ('x_11',): (<toolz.functoolz.Compose at 0x7fc6923151d8>,
  [('x_10', 0),
   ('x_10', 1),
   ('x_10', 2),
   ('x_10', 3),
   ('x_10', 4),
   ('x_10', 5),
   ('x_10', 6),
   ('x_10', 7),
   ('x_10', 8),
   ('x_10', 9)]),
 ('x_12', 0): (<function compute_it at 0x7fc69235cf50>, ('x', 0)),
 ('x_12', 1): (<function compute_it at 0x7fc69235cf50>, ('x', 1)),
 ('x_12', 2): (<function compute_it at 0x7fc69235cf50>, ('x', 2)),
 ('x_12', 3): (<function compute_it at 0x7fc69235cf50>, ('x', 3)),
 ('x_12', 4): (<function compute_it at 0x7fc69235cf50>, ('x', 4)),
 ('x_12', 5): (<function compute_it at 0x7fc69235cf50>, ('x', 5)),
 ('x_12', 6): (<function compute_it at 0x7fc69235cf50>, ('x', 6)),
 ('x_12', 7): (<function compute_it at 0x7fc69235cf50>, ('x', 7)),
 ('x_12', 8): (<function compute_it at 0x7fc69235cf50>, ('x', 8)),
 ('x_12', 9): (<function compute_it at 0x7fc69235cf50>, ('x', 9)),
 ('x_13', 0): (<function compute_it at 0x7fc69235cf50>, ('x_9', 0), ('x_11',)),
 ('x_13', 1): (<function compute_it at 0x7fc69235cf50>, ('x_9', 1), ('x_11',)),
 ('x_13', 2): (<function compute_it at 0x7fc69235cf50>, ('x_9', 2), ('x_11',)),
 ('x_13', 3): (<function compute_it at 0x7fc69235cf50>, ('x_9', 3), ('x_11',)),
 ('x_13', 4): (<function compute_it at 0x7fc69235cf50>, ('x_9', 4), ('x_11',)),
 ('x_13', 5): (<function compute_it at 0x7fc69235cf50>, ('x_9', 5), ('x_11',)),
 ('x_13', 6): (<function compute_it at 0x7fc69235cf50>, ('x_9', 6), ('x_11',)),
 ('x_13', 7): (<function compute_it at 0x7fc69235cf50>, ('x_9', 7), ('x_11',)),
 ('x_13', 8): (<function compute_it at 0x7fc69235cf50>, ('x_9', 8), ('x_11',)),
 ('x_13', 9): (<function compute_it at 0x7fc69235cf50>, ('x_9', 9), ('x_11',)),
 ('x_14',): (<function many at 0x7fc69236c410>,
  [('x_13', 0),
   ('x_13', 1),
   ('x_13', 2),
   ('x_13', 3),
   ('x_13', 4),
   ('x_13', 5),
   ('x_13', 6),
   ('x_13', 7),
   ('x_13', 8),
   ('x_13', 9)],
  [('x_12', 0),
   ('x_12', 1),
   ('x_12', 2),
   ('x_12', 3),
   ('x_12', 4),
   ('x_12', 5),
   ('x_12', 6),
   ('x_12', 7),
   ('x_12', 8),
   ('x_12', 9)]),
 ('x_9', 0): (<function compute_it at 0x7fc69235cf50>, ('x', 0)),
 ('x_9', 1): (<function compute_it at 0x7fc69235cf50>, ('x', 1)),
 ('x_9', 2): (<function compute_it at 0x7fc69235cf50>, ('x', 2)),
 ('x_9', 3): (<function compute_it at 0x7fc69235cf50>, ('x', 3)),
 ('x_9', 4): (<function compute_it at 0x7fc69235cf50>, ('x', 4)),
 ('x_9', 5): (<function compute_it at 0x7fc69235cf50>, ('x', 5)),
 ('x_9', 6): (<function compute_it at 0x7fc69235cf50>, ('x', 6)),
 ('x_9', 7): (<function compute_it at 0x7fc69235cf50>, ('x', 7)),
 ('x_9', 8): (<function compute_it at 0x7fc69235cf50>, ('x', 8)),
 ('x_9', 9): (<function compute_it at 0x7fc69235cf50>, ('x', 9))}

In [14]:
dot_graph(compute(expr).dask, filename='less-simple-dask')
Image(filename='less-simple-dask.png')


Writing graph to less-simple-dask.pdf
Out[14]:

In [ ]: