DyND Feature Tour

DyND is a multi-dimensional array library for Python, available as a preview release, which provides functionality augmenting the NumPy/Matplotlib/SciPy computing stack. As it is a preview release, we are prepared to change anything in the library, and are currently developing it in an incubation form. Over the course of the releases leading up to 1.0, we expect it to become gradually more stable, and encourage people to experiment with the library and peruse the roadmap to get an idea of where things are going.

DyND is being developed as a component of Blaze, an ambitious project to evolve the NumPy/SciPy technology stack towards out-of-core and distributed computation, and to become a meeting point for big data and scientific computing algorithms.

To get started, after installing the DyND Python bindings using CMake or a conda package, import DyND with the following command. The nd and ndt modules are designed to always be used as namespaces; things will not work properly if they are imported with import *.


In [1]:
from dynd import nd, ndt

The namespace 'nd' is for array object constructions and algorithms, and 'ndt' is for the type system.

Making DyND Arrays

The primitive object is nd.array, which can be used to create multi-dimensional arrays of various types.


In [2]:
nd.array(3.14)


Out[2]:
nd.array(3.14, type="float64")

In [3]:
nd.array([1,2,3])


Out[3]:
nd.array([1, 2, 3], type="strided * int32")

In [4]:
nd.array([1,2.5,1+3j])


Out[4]:
nd.array([(1 + 0j), (2.5 + 0j), (1 + 3j)], type="strided * complex[float64]")

In [5]:
nd.array(True)


Out[5]:
nd.array(true, type="bool")

In the case of strings, the scalar string constructor can use the internal Python string data by using a type with the right encoding. To respect the immutability of Python strings, the resulting dynd array is itself immutable as well.


In [6]:
nd.array("a string")


Out[6]:
nd.array("a string", type="string")

In [7]:
nd.array(u"a string")


Out[7]:
nd.array("a string", type="string")
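The reason a zero-copy view of immutable data must itself be immutable can be seen with the standard library's memoryview. This is only an analogy: pure Python cannot reach a str object's internal buffer, so an immutable bytes object stands in for the string's data, and nothing here is DyND API.

```python
# Analogy only: the stdlib can't expose a str's internal buffer, so an
# immutable bytes object stands in for the Python string's data here.
data = "a string".encode("utf-8")      # immutable bytes object
view = memoryview(data)                # zero-copy view onto data's memory
print(view.readonly)                   # True: the view inherits immutability
print(view.tobytes().decode("utf-8"))  # a string

try:
    view[0] = ord("A")                 # writing through the view is refused
except TypeError as exc:
    print("write refused:", exc)
```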

Converting to Python and NumPy Types

In many cases you will want to convert a dynd array back into Python objects or into NumPy. The functions nd.as_py and nd.as_numpy do this.


In [8]:
a = nd.array([1,2.5,1+3j])
nd.as_py(a)


Out[8]:
[(1+0j), (2.5+0j), (1+3j)]

In [9]:
a = nd.array("a string")
nd.as_py(a)


Out[9]:
'a string'

In [10]:
a = nd.array([1, 2, 3], dtype=ndt.int16)
nd.as_numpy(a)


Out[10]:
array([1, 2, 3], dtype=int16)

DyND Types

Similar to NumPy, there are data types available in the ndt namespace. These objects are all of type ndt.type; unlike NumPy, there is no separate set of scalar types.


In [11]:
ndt.bool


Out[11]:
ndt.bool

In [12]:
ndt.int32


Out[12]:
ndt.int32

In [13]:
ndt.complex_float64


Out[13]:
ndt.complex_float64

In [14]:
ndt.complex128


Out[14]:
ndt.complex_float64

DyND understands a subset of the Blaze datashape language (http://blaze.pydata.org/docs/datashape.html), which is used when constructing dynd types from strings.


In [15]:
ndt.type('float32')


Out[15]:
ndt.float32

In [17]:
ndt.type('strided * strided * float64')


Out[17]:
ndt.type('strided * strided * float64')

In [19]:
ndt.type('{x: int32, y: string, z: date}')


Out[19]:
ndt.type('{x : int32, y : string, z : date}')
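A datashape string such as 'strided * strided * float64' is a list of dimensions followed by an element type, separated by top-level '*' characters. The helper below is a hypothetical sketch (not DyND's parser) that splits a datashape that way, tracking bracket depth so a '*' nested inside a record or type argument is not treated as a dimension separator. The trailing element type it returns is what DyND calls the dtype.

```python
def split_datashape(ds):
    """Hypothetical helper: split 'dim * ... * dtype' into (dims, dtype).

    Only '*' characters outside any brackets separate dimensions, so
    nested types like '{a: 2 * int32}' stay intact as the dtype.
    """
    parts, current, depth = [], [], 0
    for ch in ds:
        if ch in "[{(":
            depth += 1
        elif ch in "]})":
            depth -= 1
        if ch == "*" and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts[:-1], parts[-1]


print(split_datashape("strided * strided * float64"))
# (['strided', 'strided'], 'float64')
print(split_datashape("{x: int32, y: string, z: date}"))
# ([], '{x: int32, y: string, z: date}')
```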

For constructing more complicated types, the current preview release provides ndt.make_* functions.


In [20]:
ndt.make_byteswap(ndt.int32)


Out[20]:
ndt.type('byteswap[int32]')

Casting Array Types

One can convert arrays from one type to another using the ucast method. It casts the dtype, which is the array's type after all of the array dimensions have been stripped from the front. Note that this cast converts floating point numbers whose values are integers without raising an error.


In [21]:
a = nd.array([1.0, 2.0, 3.0])
a.ucast(ndt.int16)


Out[21]:
nd.array([1, 2, 3], type="strided * convert[to=int16, from=float64]")

The cast checks that the values themselves can be converted exactly. This check runs when the result is evaluated with eval(), so if we add a fractional part to one of the elements, evaluation fails.


In [22]:
a = nd.array([1.0, 2.1, 3.0])
print(a.ucast(ndt.int16).eval())


---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-22-618cc2a84645> in <module>()
      1 a = nd.array([1.0, 2.1, 3.0])
----> 2 print(a.ucast(ndt.int16).eval())

C:\Users\mwiebe\Anaconda\lib\site-packages\dynd\_pydynd.pyd in _pydynd.w_array.eval (_pydynd.cxx:7241)()

RuntimeError: fractional part lost while assigning float64 value 2.1 to int16

Similarly, if the conversion would overflow, an error is also raised.


In [24]:
a = nd.array([1.0, 33000, 3.0])
print(a.ucast(ndt.int16).eval())


---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
<ipython-input-24-307786d88d86> in <module>()
      1 a = nd.array([1.0, 33000, 3.0])
----> 2 print(a.ucast(ndt.int16).eval())

C:\Users\mwiebe\Anaconda\lib\site-packages\dynd\_pydynd.pyd in _pydynd.w_array.eval (_pydynd.cxx:7241)()

OverflowError: overflow while assigning float64 value 33000 to int16

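The error checking can be customized with the errmode parameter of the ucast method. With errmode='overflow', only overflow is checked, so the fractional part of 3.1 is silently discarded.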


In [25]:
a = nd.array([1.0, 3.1, 3.0])
a.ucast(ndt.int16, errmode='overflow')


Out[25]:
nd.array([1, 3, 3], type="strided * convert[to=int16, from=float64, errmode=overflow]")
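The value checks that ucast performs can be mimicked in plain Python. The sketch below is hypothetical (cast_to_int16 and its "default" mode name are not DyND API) and models only the two behaviors shown above: the default mode rejects both overflow and lost fractional parts, while "overflow" rejects only out-of-range values. It assumes truncation toward zero, which matches 3.1 becoming 3 above but is not confirmed for negative inputs, and it raises ValueError where DyND raised RuntimeError.

```python
import math

INT16_MIN, INT16_MAX = -2**15, 2**15 - 1


def cast_to_int16(values, errmode="default"):
    """Hypothetical checked float -> int16 conversion.

    errmode="default":  reject overflow and any lost fractional part.
    errmode="overflow": reject overflow only; truncate toward zero.
    """
    if errmode not in ("default", "overflow"):
        raise ValueError(f"unknown errmode {errmode!r}")
    result = []
    for v in values:
        truncated = math.trunc(v)  # drop the fractional part, toward zero
        if not INT16_MIN <= truncated <= INT16_MAX:
            raise OverflowError(f"overflow while assigning float64 value {v} to int16")
        if errmode == "default" and truncated != v:
            raise ValueError(f"fractional part lost while assigning float64 value {v} to int16")
        result.append(truncated)
    return result


print(cast_to_int16([1.0, 2.0, 3.0]))                     # [1, 2, 3]
print(cast_to_int16([1.0, 3.1, 3.0], errmode="overflow"))  # [1, 3, 3]
```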

Interoperability with NumPy

DyND can seamlessly move data to and from NumPy when the data can be represented in both systems. This lets you experiment with new features in an existing NumPy/SciPy codebase without making redundant copies of your data in memory.


In [26]:
import numpy as np

NumPy arrays can be passed as parameters to DyND functions.


In [28]:
a = np.arange(10)
print(np.sum(a))
# print(nd.sum(a))


45

DyND arrays can be passed as parameters to NumPy functions, because DyND arrays expose the PEP 3118 buffer protocol when their data is compatible.


In [29]:
a = nd.array(np.arange(6).reshape(2,3))
print(np.sum(a, axis=1))
# print(nd.sum(a, axis=1))


[ 3 12]
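The PEP 3118 buffer protocol is what makes this sharing possible: any consumer can view the same memory with its own shape and element format. The sketch below uses only the standard library's array and memoryview (no DyND or NumPy) to reinterpret six 64-bit integers as a 2x3 grid and to show that a write through one view is visible through the others.

```python
from array import array

flat = array("q", range(6))                     # six 64-bit ints, contiguous
view = memoryview(flat)                         # no copy: exposes flat's buffer
grid = view.cast("B").cast("q", shape=[2, 3])   # same bytes, seen as 2x3
print(grid.shape)                               # (2, 3)
print([sum(row) for row in grid.tolist()])      # [3, 12], like np.sum(a, axis=1)

view[5] = 50                                    # write through the 1-D view
print(flat[5], grid.tolist()[1][2])             # 50 50: one shared buffer
```

The intermediate cast to bytes is needed because memoryview.cast only converts between a byte format and another format, not directly between two non-byte formats.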

In addition to the string type, DyND has a fixedstring type, which stores strings in the same fixed-width layout that NumPy uses.


In [30]:
a = np.array(["testing", "one", "two", "three"], dtype='U16')
nd.array(a)


Out[30]:
nd.array(["testing", "one", "two", "three"], type="strided * string[16,'utf32']")
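In the type string[16,'utf32'], each element occupies 16 UTF-32 code units, i.e. 64 bytes, with shorter strings padded by NUL code points; NumPy's 'U16' dtype likewise has a 64-byte itemsize. The hypothetical helpers below sketch that layout in plain Python. They assume little-endian byte order for concreteness, whereas NumPy actually uses the machine's native order.

```python
def to_fixed_utf32(s, width=16):
    """Hypothetical: encode s as a NUL-padded, fixed-width UTF-32-LE record."""
    encoded = s.encode("utf-32-le")            # 4 bytes per code point, no BOM
    if len(encoded) > 4 * width:
        raise ValueError(f"{s!r} does not fit in {width} code points")
    return encoded.ljust(4 * width, b"\x00")   # pad to the fixed size


def from_fixed_utf32(record):
    """Decode a record; trailing NUL code points are padding."""
    return record.decode("utf-32-le").rstrip("\x00")


records = [to_fixed_utf32(s) for s in ["testing", "one", "two", "three"]]
print([len(r) for r in records])               # [64, 64, 64, 64]
print([from_fixed_utf32(r) for r in records])  # ['testing', 'one', 'two', 'three']
```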

Elementwise GFuncs

(Note: these examples do not currently work.) DyND has a very preliminary version of gfuncs, starting with elementwise operations and reductions. Elementwise gfuncs are very similar to NumPy ufuncs, but in their current form they do not perform implicit type conversions, so to call a gfunc the argument types must match a kernel signature exactly. A small collection of gfuncs is included in this preview release.


In [ ]:
nd.maximum(3,10)

Broadcasting of the operands works as it does in NumPy.


In [ ]:
nd.maximum([1,3,5,9,4], 5)

In [ ]:
nd.maximum([[1,5],[10,2],[4,4]], [3,3])
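Since these cells do not currently run, here is a pure-Python sketch of the NumPy broadcasting rule the examples rely on: shapes are aligned from the right, and a dimension of size 1 (or a missing dimension) is stretched to match. elwise_maximum is a hypothetical name, and the sketch assumes rectangular nested lists.

```python
def shape_of(x):
    """Shape of a rectangular nested list; a scalar has shape ()."""
    shape = []
    while isinstance(x, list):
        shape.append(len(x))
        x = x[0]
    return tuple(shape)


def broadcast_shape(s1, s2):
    """Combine two shapes using NumPy's right-aligned broadcasting rule."""
    n = max(len(s1), len(s2))
    s1 = (1,) * (n - len(s1)) + s1
    s2 = (1,) * (n - len(s2)) + s2
    out = []
    for a, b in zip(s1, s2):
        if a == b or b == 1:
            out.append(a)
        elif a == 1:
            out.append(b)
        else:
            raise ValueError(f"shapes {s1} and {s2} cannot be broadcast")
    return tuple(out)


def element(x, shape, index):
    """Read x at a broadcast index: trailing dims align, size-1 dims use 0."""
    for dim, i in zip(shape, index[len(index) - len(shape):]):
        x = x[0 if dim == 1 else i]
    return x


def elwise_maximum(a, b):
    """Hypothetical elementwise maximum with broadcasting."""
    sa, sb = shape_of(a), shape_of(b)
    out_shape = broadcast_shape(sa, sb)

    def build(prefix):
        if len(prefix) == len(out_shape):
            return max(element(a, sa, prefix), element(b, sb, prefix))
        return [build(prefix + (i,)) for i in range(out_shape[len(prefix)])]

    return build(())


print(elwise_maximum(3, 10))                             # 10
print(elwise_maximum([1, 3, 5, 9, 4], 5))                # [5, 5, 5, 9, 5]
print(elwise_maximum([[1, 5], [10, 2], [4, 4]], [3, 3])) # [[3, 5], [10, 3], [4, 4]]
```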

Elementwise Reduction GFuncs

Elementwise reduction gfuncs behave similarly to the reductions in NumPy.


In [ ]:
nd.max([[1,5],[10,1],[3,3]])

In [ ]:
nd.max([[1,5],[10,1],[3,3]], axis=1)

In [ ]:
nd.max([[1,5],[10,1],[3,3]], axis=1, keepdims=True)
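The axis and keepdims parameters follow NumPy's meaning: axis selects the dimension that is collapsed, and keepdims leaves it in place with length 1. The hypothetical reduce_max below sketches those semantics for a 2-D nested list; it is not DyND code.

```python
def reduce_max(rows, axis=None, keepdims=False):
    """Hypothetical max reduction over a 2-D nested list."""
    if axis is None:                       # reduce over everything
        result = max(max(r) for r in rows)
        return [[result]] if keepdims else result
    if axis == 0:                          # collapse the rows
        result = [max(col) for col in zip(*rows)]
        return [result] if keepdims else result
    if axis == 1:                          # collapse each row
        result = [max(r) for r in rows]
        return [[v] for v in result] if keepdims else result
    raise ValueError(f"axis {axis} is out of bounds for a 2-D input")


data = [[1, 5], [10, 1], [3, 3]]
print(reduce_max(data))                        # 10
print(reduce_max(data, axis=1))                # [5, 10, 3]
print(reduce_max(data, axis=1, keepdims=True)) # [[5], [10], [3]]
```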

Groupby Reductions

DyND has a simple nd.groupby function which, when combined with elementwise reductions, can be used for groupby reductions.


In [27]:
data = np.array([0, 1, 2, 3, 4, 5, 6, 7])
by = np.array(['a', 'a', 'c', 'a', 'b', 'c', 'a', 'd'])
groups = ndt.factor_categorical(by)
gb = nd.groupby(data, by, groups)
print(groups)
print(nd.as_py(gb))
#print("max:     ", nd.max(gb, axis=1))
#print("min:     ", nd.min(gb, axis=1))
#print("sum:     ", nd.sum(gb, axis=1))
#print("product: ", nd.product(gb, axis=1))


categorical<string<1,'ascii'>, ["a", "b", "c", "d"]>
[[0, 1, 3, 6], [4], [2, 5], [7]]
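The grouping above can be reproduced in plain Python, which also shows the per-group reductions that the commented-out lines intend to compute. The groupby function here is a hypothetical sketch, not nd.groupby; it orders categories by sorting the unique keys, which matches the a, b, c, d order printed above.

```python
from math import prod


def groupby(data, by):
    """Hypothetical: gather data values into one list per unique key."""
    categories = sorted(set(by))                   # sorted unique keys
    position = {key: i for i, key in enumerate(categories)}
    groups = [[] for _ in categories]
    for value, key in zip(data, by):
        groups[position[key]].append(value)       # keeps original order
    return categories, groups


data = [0, 1, 2, 3, 4, 5, 6, 7]
by = ["a", "a", "c", "a", "b", "c", "a", "d"]
categories, groups = groupby(data, by)
print(categories)   # ['a', 'b', 'c', 'd']
print(groups)       # [[0, 1, 3, 6], [4], [2, 5], [7]]
for name, reduce_fn in [("max", max), ("min", min), ("sum", sum), ("product", prod)]:
    print(name, [reduce_fn(g) for g in groups])
```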

Deferred Evaluation

(Note: these particular examples do not currently work.) Operations in DyND that produce array results are evaluated lazily. As a result, an array may contain an arbitrary expression tree representing the computations that produced it.


In [ ]:
a = nd.array(3)
b = nd.array([3., 4.])
c = a * b

In [ ]:
c.debug_dump()

The effect of this is that c is a view of the expression a * b, so changing the value of elements in b changes the value of c.


In [ ]:
a = nd.array(3)
b = nd.array([3., 4.])
c = a * b
print(c)
b[1].val_assign(50)
print(c)
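The view behavior follows from keeping an expression tree instead of a result. The minimal sketch below (hypothetical Node, Leaf and BinOp classes, not DyND's implementation) builds a tree from * and +, holds leaf data by reference, and only computes when eval() is called, so mutating b's data changes what c evaluates to. It supports only scalars and 1-D lists.

```python
import operator


class Node:
    """Base class: arithmetic builds a tree instead of computing."""
    def __add__(self, other):
        return BinOp(operator.add, self, as_node(other))

    def __mul__(self, other):
        return BinOp(operator.mul, self, as_node(other))


class Leaf(Node):
    def __init__(self, data):
        self.data = data              # held by reference, not copied

    def eval(self):
        return self.data


class BinOp(Node):
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right

    def eval(self):
        x, y = self.left.eval(), self.right.eval()
        # Minimal 1-D broadcasting of a scalar against a list.
        if isinstance(x, list) and isinstance(y, list):
            return [self.op(p, q) for p, q in zip(x, y)]
        if isinstance(x, list):
            return [self.op(p, y) for p in x]
        if isinstance(y, list):
            return [self.op(x, q) for q in y]
        return self.op(x, y)


def as_node(value):
    return value if isinstance(value, Node) else Leaf(value)


a = Leaf(3)
b_data = [3.0, 4.0]
b = Leaf(b_data)
c = a * b                 # builds a tree; nothing is computed yet
print(c.eval())           # [9.0, 12.0]
b_data[1] = 50.0          # mutate b's data in place
print(c.eval())           # [9.0, 150.0]
```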

Only a small subset of possible expression trees is currently evaluated by the system. This will change as the evaluation subsystem is fully implemented.


In [ ]:
a = nd.array(3)
b = nd.array([0, 3.1, 5])

In [ ]:
print(a + b)

In [ ]:
print((a + b) + 2)

Evaluating Lazy Expression Arrays

To get a strided ndarray from an expression node, use one of the methods vals, eval_immutable, or eval_copy. The vals method gives you the values in whatever way it chooses, usually doing as little work as possible. The eval_immutable method guarantees that the result is an immutable array, so it makes a copy wherever the data is not already immutable. The eval_copy method guarantees that the result is always a fresh copy.


In [ ]:
a = nd.array(3)
b = nd.array([0, 3.1, 5])
print((a + b).vals())
print((a + b).eval_immutable())
print((a + b).eval_copy())
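The three guarantees differ in when a copy is made. The functions below are hypothetical stand-ins operating on plain Python sequences, with a tuple playing the role of an immutable array and a list a mutable one; they are not DyND methods.

```python
def vals(result):
    """Cheapest option: hand back the existing object, no copy."""
    return result


def eval_immutable(result):
    """Guarantee immutability: reuse a tuple as-is, otherwise copy into one."""
    return result if isinstance(result, tuple) else tuple(result)


def eval_copy(result):
    """Always produce a fresh, independent copy."""
    return list(result)


values = [3.0, 6.0, 8.0]        # stands in for an evaluated expression
frozen = (3.0, 6.0, 8.0)        # stands in for already-immutable data
print(vals(values) is values)            # True: no copy
print(eval_immutable(frozen) is frozen)  # True: already immutable, reused
print(eval_immutable(values))            # (3.0, 6.0, 8.0): copied
print(eval_copy(values) is values)       # False: always a fresh copy
```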

Creating New GFuncs

In NumPy, ufuncs are created using the C API, specifying a set of builtin type signatures and corresponding kernel functions. In DyND, gfuncs are created in Python: you create an empty gfunc of the desired kind, such as elementwise, and then add kernels defined through ctypes.

To demonstrate this, we use a few kernels from the basic_kernels shared library included in DyND. These kernels are already imported and available in the nd.elwise_kernels namespace.


In [ ]:
myfunc = nd.gfunc.elwise('myfunc')
myfunc.debug_dump()

To demonstrate the dispatch mechanism, we use different operations for int and float types.


In [ ]:
myfunc.add_kernel(nd.elwise_kernels.add_int32)
myfunc.add_kernel(nd.elwise_kernels.multiply_float64)
myfunc.debug_dump()

In [ ]:
print(myfunc([1, 2, 3, 4, 5], 2))
print(myfunc([1., 2., 3., 4., 5.], 2.))
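The dispatch mechanism can be sketched as a table of kernels keyed by argument types. In this hypothetical ElwiseGFunc class (not DyND API), Python lambdas stand in for the ctypes kernels, and lookups require an exact type match, mirroring the statement earlier that gfuncs do not yet perform implicit type conversions.

```python
class ElwiseGFunc:
    """Hypothetical elementwise gfunc: dispatch on exact argument types."""

    def __init__(self, name):
        self.name = name
        self.kernels = {}                 # (type, type) -> kernel function

    def add_kernel(self, kernel, signature):
        self.kernels[signature] = kernel

    def __call__(self, xs, y):
        results = []
        for x in xs:
            sig = (type(x), type(y))
            try:
                kernel = self.kernels[sig]   # exact match, no conversion
            except KeyError:
                raise TypeError(f"{self.name}: no kernel for {sig}") from None
            results.append(kernel(x, y))
        return results


myfunc = ElwiseGFunc("myfunc")
myfunc.add_kernel(lambda x, y: x + y, (int, int))        # like add_int32
myfunc.add_kernel(lambda x, y: x * y, (float, float))    # like multiply_float64
print(myfunc([1, 2, 3, 4, 5], 2))             # [3, 4, 5, 6, 7]
print(myfunc([1., 2., 3., 4., 5.], 2.))       # [2.0, 4.0, 6.0, 8.0, 10.0]
```

Calling myfunc([1, 2], 2.0) raises TypeError, because (int, float) matches no registered signature.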

The same can be done to make an elementwise reduction.


In [ ]:
myred = nd.gfunc.elwise_reduce('myred')
myred.add_kernel(nd.elwise_kernels.add_int32, commutative=True, associative=True)
myred.add_kernel(nd.elwise_kernels.multiply_float64, commutative=True, associative=True)

In [ ]:
print(myred([1, 2, 3, 4, 5]))
print(myred([1., 2., 3., 4., 5.]))
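The associative=True flag matters because it licenses regrouping a reduction, for example into the pairwise tree below, which is how reductions can be split into independent chunks. This hypothetical pairwise_reduce keeps the elements in their original order, so it relies only on associativity; commutativity would additionally permit reordering.

```python
def pairwise_reduce(kernel, values):
    """Hypothetical tree reduction; correct only for associative kernels.

    Regroups v0 op v1 op v2 op ... as ((v0 op v1) op (v2 op v3)) op ...,
    without changing the left-to-right order of the elements.
    """
    values = list(values)
    if not values:
        raise ValueError("cannot reduce an empty sequence")
    while len(values) > 1:
        paired = [kernel(values[i], values[i + 1])
                  for i in range(0, len(values) - 1, 2)]
        if len(values) % 2:               # odd element carries over
            paired.append(values[-1])
        values = paired
    return values[0]


print(pairwise_reduce(lambda x, y: x + y, [1, 2, 3, 4, 5]))            # 15
print(pairwise_reduce(lambda x, y: x * y, [1., 2., 3., 4., 5.]))       # 120.0
```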