In [1]:
import sys, dynd
print('Python %s' % sys.version)
print('DyND Python Bindings %s' % dynd.__version__)
print('LibDyND %s' % dynd.__libdynd_version__)


Python 3.3.3 |Continuum Analytics, Inc.| (default, Dec  3 2013, 11:56:40) [MSC v.1600 64 bit (AMD64)]
DyND Python Bindings 0.6.5.post024.g5f07540
LibDyND 0.6.5.post150.g8e9d756

Standard dynd import: The nd namespace is for array operations, and the ndt namespace is for types. We'll import numpy as well to show some comparisons.


In [2]:
from dynd import nd, ndt
import numpy as np

Basic usage is quite similar to numpy: the data type is deduced from the Python types when converting.


In [3]:
nd.array([1,2,3,4,5]) # dynd


Out[3]:
nd.array([1, 2, 3, 4, 5],
         type="5 * int32")

In [4]:
np.array([1,2,3,4,5]) # numpy


Out[4]:
array([1, 2, 3, 4, 5])

In [5]:
nd.array([[1, 1.5], [-1.5, 1]]) # dynd


Out[5]:
nd.array([  [  1, 1.5], [-1.5,    1]],
         type="2 * 2 * float64")

In [6]:
np.array([[1, 1.5], [-1.5, 1]]) # numpy


Out[6]:
array([[ 1. ,  1.5],
       [-1.5,  1. ]])

Some differences from numpy are already visible at this level, such as the array dimensions being included in the type.
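Because the dimensions are part of the type, a type string like "2 * 2 * float64" can be read straight off the nested list structure. The sketch below is a hypothetical pure-Python helper, infer_datashape, that is not part of dynd. It assumes Python int maps to int32 (as in Out[3]) and that a mix of ints and floats promotes to float64 (as in Out[5]), and it marks an axis as var when the inner lengths disagree. It only illustrates the notation, not dynd's actual deduction rules.

```python
def _scan(value):
    """Return (dims, scalar_type) for a Python scalar or nested list.

    Each entry of dims is an int for a fixed-size axis or "var" for a ragged one.
    """
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return [], "bool"
    if isinstance(value, int):
        return [], "int32"  # assumption: mirrors the int32 shown in Out[3]
    if isinstance(value, float):
        return [], "float64"
    if isinstance(value, str):
        return [], "string"
    if not isinstance(value, list) or not value:
        raise ValueError("only non-empty lists and simple scalars are handled")

    scans = [_scan(item) for item in value]
    dim_lists = [dims for dims, _ in scans]
    kinds = {kind for _, kind in scans}

    if len({len(dims) for dims in dim_lists}) != 1:
        raise ValueError("inconsistent nesting depth")

    # Mixed int/float promotes to float64, as in Out[5]; other mixes are rejected.
    if kinds <= {"int32", "float64"}:
        kind = "float64" if "float64" in kinds else "int32"
    elif len(kinds) == 1:
        kind = kinds.pop()
    else:
        raise ValueError("mixed scalar types are not handled in this sketch")

    # An inner axis keeps its size only if every element agrees on it.
    inner = [axis[0] if len(set(axis)) == 1 else "var" for axis in zip(*dim_lists)]
    return [len(value)] + inner, kind


def infer_datashape(value):
    """Hypothetical helper: build a datashape-style type string like '5 * int32'."""
    dims, kind = _scan(value)
    return " * ".join([str(d) for d in dims] + [kind])


print(infer_datashape([1, 2, 3, 4, 5]))               # 5 * int32
print(infer_datashape([[1, 1.5], [-1.5, 1]]))         # 2 * 2 * float64
print(infer_datashape([[1, 5, 2], [1], [7, 9, 10]]))  # 3 * var * int32
```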


The default string type is variable-length in dynd. In numpy, you either choose a maximum size or use object arrays, which have lower performance.


In [7]:
nd.array(["this", "is", "a", "test"]) # dynd


Out[7]:
nd.array(["this",   "is",    "a", "test"],
         type="4 * string")

In [8]:
np.array(["this", "is", "a", "test"]) # numpy


Out[8]:
array(['this', 'is', 'a', 'test'], 
      dtype='<U4')
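The '<U4' dtype above is sized from the longest input string, so the maximum length is fixed when the array is created. This short NumPy sketch shows the consequence: a longer value is truncated on assignment, while an object array accepts it.

```python
import numpy as np

words = ["this", "is", "a", "test"]

# NumPy sizes the fixed-width unicode dtype from the longest input (4 characters).
fixed = np.array(words)
print(fixed.dtype)  # <U4

# A longer string assigned later is silently truncated to the 4-character maximum.
fixed[1] = "longer"
print(fixed[1])  # long

# An object array stores ordinary Python str objects of any length instead,
# at the cost of per-element Python object overhead.
objs = np.array(words, dtype=object)
objs[1] = "longer"
print(objs[1])  # longer
```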

DyND has a variable-length dimension type, which supports ragged arrays. If you give this kind of data to numpy, it falls back to object arrays, which are slower and lose the array programming functionality in the ragged dimension.


In [9]:
nd.array([[1,5,2], [1], [7,9,10,20,13]]) # dynd


Out[9]:
nd.array([           [1, 5, 2],                  [1], [ 7,  9, 10, 20, 13]],
         type="3 * var * int32")

In [10]:
np.array([[1,5,2], [1], [7,9,10,20,13]]) # numpy


Out[10]:
array([[1, 5, 2], [1], [7, 9, 10, 20, 13]], dtype=object)
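To see what losing array programming in the ragged dimension means in practice, the NumPy sketch below applies "* 2" to an object array of lists. Each element is a plain Python list, so the operator repeats the list instead of doubling its numbers. Note that recent NumPy releases no longer infer object dtype for ragged input as the older version above did, so dtype=object is passed explicitly.

```python
import numpy as np

rows = [[1, 5, 2], [1], [7, 9, 10, 20, 13]]

# Recent NumPy releases raise an error for ragged input unless dtype=object
# is given explicitly; the result is a 1-D array whose elements are lists.
ragged = np.array(rows, dtype=object)
print(ragged.shape)  # (3,)

# Arithmetic is delegated to each Python list, so "* 2" repeats the lists
# rather than doubling the numbers inside the ragged dimension.
doubled = ragged * 2
print(doubled[0])  # [1, 5, 2, 1, 5, 2]

# Element-wise math along the ragged dimension needs an explicit Python loop.
doubled_values = [np.asarray(row) * 2 for row in ragged]
```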

As part of the Blaze project, dynd uses Blaze's datashape type system notation. This provides a convenient way to create arrays of structs.


In [11]:
x = nd.array([('XDress - Type, But Verify', 'Anthony Scopatz', '2013-06-26T11:55', 204),
          ('matplotlib: past, present and future', 'Michael Droettboom', '2013-06-27T10:15', 106),
          ('The DyND Library', 'Mark Wiebe', '2013-06-27T14:35', 203),
          ('The advantages of a scientific IDE', 'Carlos Cordoba', '2013-06-26T11:30', 203)],
        dtype="""{
            title: string,
            presenter: string,
            time: datetime,
            room: int32
        }""")
x


Out[11]:
nd.array([["XDress - Type, But Verify", "Anthony Scopatz", 2013-06-26T11:55, 204],
          ["matplotlib: past, present and future", "Michael Droettboom", 2013-06-27T10:15, 106],
          ["The DyND Library", "Mark Wiebe", 2013-06-27T14:35, 203],
          ["The advantages of a scientific IDE", "Carlos Cordoba", 2013-06-26T11:30, 203]],
         type="4 * {title : string, presenter : string, time : datetime, room : int32}")
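For comparison, a NumPy structured array can hold the same records, but each string field needs a fixed maximum length and the datetime field needs an explicit unit. The sketch below picks 40 and 20 characters and minute resolution by hand for this data; those sizes are assumptions chosen to fit these four rows, whereas the dynd type above uses variable-length string fields.

```python
import numpy as np

# Every string field needs a fixed maximum length, and datetime64 needs a unit.
talk_dtype = np.dtype([
    ("title", "U40"),
    ("presenter", "U20"),
    ("time", "datetime64[m]"),
    ("room", "i4"),
])

talks = np.array([
    ("XDress - Type, But Verify", "Anthony Scopatz", "2013-06-26T11:55", 204),
    ("matplotlib: past, present and future", "Michael Droettboom", "2013-06-27T10:15", 106),
    ("The DyND Library", "Mark Wiebe", "2013-06-27T14:35", 203),
    ("The advantages of a scientific IDE", "Carlos Cordoba", "2013-06-26T11:30", 203),
], dtype=talk_dtype)

print(talks["room"])                  # field access by name
print(talks[["title", "presenter"]])  # multi-field selection
```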

Field access can be done directly via Python attributes.


In [12]:
x.room


Out[12]:
nd.array([204, 106, 203, 203],
         type="4 * int32")

Or, equivalently, by indexing into the dimension of the struct.


In [13]:
x[:,3]


Out[13]:
nd.array([204, 106, 203, 203],
         type="4 * int32")

In [14]:
x[:, :2]


Out[14]:
nd.array([["XDress - Type, But Verify", "Anthony Scopatz"],
          ["matplotlib: past, present and future", "Michael Droettboom"],
          ["The DyND Library", "Mark Wiebe"],
          ["The advantages of a scientific IDE", "Carlos Cordoba"]],
         type="4 * {title : string, presenter : string}")
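Struct fields are ordered, which is why position 3 and the name room select the same data, and why a slice over the field dimension keeps the first fields. As a plain-Python analogy only (a hypothetical Talk namedtuple, not a dynd type), the three selections above look like this:

```python
from collections import namedtuple

# Hypothetical record type mirroring the struct's field order.
Talk = namedtuple("Talk", ["title", "presenter", "time", "room"])

talks = [
    Talk("XDress - Type, But Verify", "Anthony Scopatz", "2013-06-26T11:55", 204),
    Talk("matplotlib: past, present and future", "Michael Droettboom", "2013-06-27T10:15", 106),
    Talk("The DyND Library", "Mark Wiebe", "2013-06-27T14:35", 203),
    Talk("The advantages of a scientific IDE", "Carlos Cordoba", "2013-06-26T11:30", 203),
]

by_name = [t.room for t in talks]    # analogous to x.room
by_position = [t[3] for t in talks]  # analogous to x[:, 3]
first_two = [t[:2] for t in talks]   # analogous to x[:, :2]; yields plain tuples
print(by_name)
```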

A preliminary groupby operation is implemented. It is very basic compared to advanced statistics packages like pandas, but it demonstrates how the results of operations like this can be represented well within a dynd array.


In [15]:
g = nd.groupby(x, x.room)
g.groups


Out[15]:
nd.array([106, 203, 204],
         type="3 * int32")

In [16]:
g.eval()[1]


Out[16]:
nd.array([["The DyND Library", "Mark Wiebe", 2013-06-27T14:35, 203],
          ["The advantages of a scientific IDE", "Carlos Cordoba", 2013-06-26T11:30, 203]],
         type="var * {title : string, presenter : string, time : datetime, room : int32}")
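The result above, sorted unique keys with each group's rows in their original order, can be mimicked in plain Python. This is a sketch with a hypothetical group_rows helper built on itertools.groupby after a stable sort; it is not how nd.groupby is implemented.

```python
from itertools import groupby
from operator import itemgetter

rows = [
    ("XDress - Type, But Verify", "Anthony Scopatz", "2013-06-26T11:55", 204),
    ("matplotlib: past, present and future", "Michael Droettboom", "2013-06-27T10:15", 106),
    ("The DyND Library", "Mark Wiebe", "2013-06-27T14:35", 203),
    ("The advantages of a scientific IDE", "Carlos Cordoba", "2013-06-26T11:30", 203),
]


def group_rows(records, key):
    """Hypothetical helper: return (sorted unique keys, list of groups).

    sorted() is stable, so rows inside each group keep their original order.
    """
    ordered = sorted(records, key=key)
    keys, groups = [], []
    for k, members in groupby(ordered, key=key):
        keys.append(k)
        groups.append(list(members))
    return keys, groups


keys, groups = group_rows(rows, key=itemgetter(3))  # group by room
print(keys)       # [106, 203, 204]
print(groups[1])  # the two talks in room 203
```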

Another simple groupby, this time by date.


In [17]:
g = nd.groupby(x, x.time.date)
g.groups


Out[17]:
nd.array([2013-06-26, 2013-06-27],
         type="2 * date")

In [18]:
g.eval()[1]


Out[18]:
nd.array([["matplotlib: past, present and future", "Michael Droettboom", 2013-06-27T10:15, 106],
          ["The DyND Library", "Mark Wiebe", 2013-06-27T14:35, 203]],
         type="var * {title : string, presenter : string, time : datetime, room : int32}")
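The same grouping keyed on the calendar date can be sketched with a plain dict of lists, an alternative to sorting the rows first. The rows are copied from the array above, and datetime.fromisoformat requires Python 3.7 or later; this is an illustration, not dynd's algorithm.

```python
from datetime import datetime

rows = [
    ("XDress - Type, But Verify", "Anthony Scopatz", datetime.fromisoformat("2013-06-26T11:55"), 204),
    ("matplotlib: past, present and future", "Michael Droettboom", datetime.fromisoformat("2013-06-27T10:15"), 106),
    ("The DyND Library", "Mark Wiebe", datetime.fromisoformat("2013-06-27T14:35"), 203),
    ("The advantages of a scientific IDE", "Carlos Cordoba", datetime.fromisoformat("2013-06-26T11:30"), 203),
]

# Append each row to the list for its date; rows keep their original order.
groups = {}
for row in rows:
    groups.setdefault(row[2].date(), []).append(row)

keys = sorted(groups)  # unique dates in ascending order
print(keys)
print(groups[keys[1]])  # the two talks on 2013-06-27
```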

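Evaluation of expressions is deferred, which in some cases can be used to make interesting views of data. In the example below, the array b presents the fixed-size strings in a as dates, and assigning to an element of b writes the converted value back into a.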


In [19]:
a = nd.array(["2011-07-11", "2012-07-16", "2013-06-24"], dtype="string[10,'A']")

In [20]:
a[0]


Out[20]:
nd.array("2011-07-11",
         type="string[10,'ascii']")

In [21]:
b = a.ucast(ndt.date)

In [22]:
b[0]


Out[22]:
nd.array(2011-07-11,
         type="convert[to=date, from=string[10,'ascii']]")

In [23]:
b[0] = b[0].replace(month=3)

In [24]:
a[0]


Out[24]:
nd.array("2011-03-11",
         type="string[10,'ascii']")
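The change made through b shows up in a because b is a conversion view rather than a converted copy. As a minimal pure-Python analogy, assuming a hypothetical DateView class that is not dynd's implementation, reads parse on access and writes format the value back into the shared storage:

```python
from datetime import date


class DateView:
    """Hypothetical view exposing ISO date strings as datetime.date objects.

    Nothing is converted up front: reads parse on access, and writes format
    the date back into the underlying list of strings.
    """

    def __init__(self, storage):
        self._storage = storage  # shared with the caller, not copied

    def __len__(self):
        return len(self._storage)

    def __getitem__(self, i):
        return date.fromisoformat(self._storage[i])

    def __setitem__(self, i, value):
        self._storage[i] = value.isoformat()


a = ["2011-07-11", "2012-07-16", "2013-06-24"]
b = DateView(a)
b[0] = b[0].replace(month=3)
print(a[0])  # 2011-03-11
```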

The size-10 ASCII strings we created are compatible with numpy. Whenever dynd data is compatible with numpy, we can create views, so changes made through one array are visible through the other.


In [25]:
c = nd.as_numpy(a)
c


Out[25]:
array([b'2011-03-11', b'2012-07-16', b'2013-06-24'], 
      dtype='|S10')

In [26]:
c[1] = "2010-07-16"

In [27]:
b[1]


Out[27]:
nd.array(2010-07-16,
         type="convert[to=date, from=string[10,'ascii']]")
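Because every element occupies exactly 10 bytes, the strings form one contiguous buffer that both libraries can describe without copying. The standard-library sketch below shows the same sharing idea with memoryview; it is an analogy, not how nd.as_numpy is implemented.

```python
# Three fixed-width, 10-byte ASCII records stored back to back in one buffer,
# the layout NumPy describes with dtype '|S10'.
buf = bytearray(b"2011-03-112012-07-162013-06-24")
WIDTH = 10

view = memoryview(buf)  # zero-copy view of the same bytes


def record(i):
    # Slicing a memoryview does not copy; the slice still refers to buf.
    return view[i * WIDTH:(i + 1) * WIDTH]


# Writing through the view is visible through the original buffer.
record(1)[:] = b"2010-07-16"
print(buf[WIDTH:2 * WIDTH].decode("ascii"))  # 2010-07-16
```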
