A Low Level View of DyND Types

The dynd data structure describes memory layout using two components, a type and a block of arrmeta (array metadata). This notebook takes a tour through the types in dynd, how the arrmeta for each type is laid out, and how the corresponding data looks. For most uses of dynd, this low level perspective is unnecessary, but in cases such as JIT code generation to operate on a dynd array, it is essential.


In [1]:
from __future__ import print_function
import sys, ctypes
from pprint import pprint
import dynd
from dynd import nd, ndt, _lowlevel
print('Python:', sys.version)
print('DyND:', dynd.__version__)
print('LibDyND:', dynd.__libdynd_version__)


Python: 3.3.3 |Continuum Analytics, Inc.| (default, Dec  3 2013, 11:56:40) [MSC v.1600 64 bit (AMD64)]
DyND: 0.6.5.post024.g5f07540
LibDyND: 0.6.5.post150.g8e9d756

Since we're going to be printing information about many different dynd types, let's create a function to do the printing.


In [2]:
def print_type(t):
    print('type: %r' % t)
    print('data_size: %s' % t.data_size)
    print('data_alignment: %d' % t.data_alignment)
    print('arrmeta size: %d' % t.arrmeta_size)

Types With No Arrmeta

Many types have no arrmeta (array metadata). These are types whose memory layout and interpretation require no extra information. This includes builtin types such as the integers and floating point numbers, as well as some others like the cfixed_dim and cstruct.

Any time the arrmeta has size zero, any function which operates on a dynd type/arrmeta pair will accept NULL as the arrmeta, because it does not use it.

Primitive Types

ndt.bool


In [3]:
print_type(ndt.bool)


type: ndt.bool
data_size: 1
data_alignment: 1
arrmeta size: 0

The bool type is stored as one byte, which contains either the value 0 for false or 1 for true. It uses a full byte because offsets and strides in dynd arrays are defined in terms of bytes, so packing booleans into bits would not be straightforward. A bitarray type that acts like a one-dimensional array of ndt.bool does not exist yet, but would be a nice addition.

ndt.int


In [4]:
print_type(ndt.int16)


type: ndt.int16
data_size: 2
data_alignment: 2
arrmeta size: 0

There are signed two's-complement integers with power-of-two sizes from int8 through int128. The int128 type is only partially implemented.

ndt.uint


In [5]:
print_type(ndt.uint64)


type: ndt.uint64
data_size: 8
data_alignment: 8
arrmeta size: 0

There are unsigned integers from uint8 through uint128, with the same status as for signed integers.

ndt.float


In [6]:
print_type(ndt.float64)


type: ndt.float64
data_size: 8
data_alignment: 8
arrmeta size: 0

The float# types are floating point with IEEE binary# layout. Note that the C++ long double type is not presently supported by dynd, but will be added.

ndt.complex_float


In [7]:
print_type(ndt.complex_float32)


type: ndt.complex_float32
data_size: 8
data_alignment: 4
arrmeta size: 0

The complex_float# types are complex numbers containing a pair of float#.
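The size and alignment reported above follow from that pair layout. As a minimal sketch using ctypes rather than dynd itself, the ComplexFloat32 name and the real-then-imaginary field order below are assumptions made for illustration, not dynd definitions:

```python
import ctypes

class ComplexFloat32(ctypes.Structure):
    # Hypothetical stand-in for complex_float32: a pair of 32-bit floats.
    _fields_ = [('real', ctypes.c_float), ('imag', ctypes.c_float)]

# The pair is twice the size of one float, but only needs the
# alignment of a single float.
complex_size = ctypes.sizeof(ComplexFloat32)
complex_alignment = ctypes.alignment(ComplexFloat32)
print(complex_size, complex_alignment)
```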

ndt.void


In [8]:
print_type(ndt.void)


type: ndt.void
data_size: None
data_alignment: 1
arrmeta size: 0

The void type means no data. It is used as a way for a dynd callable to indicate no return value.

void pointer


In [9]:
print_type(ndt.make_pointer(ndt.void))


type: ndt.type("pointer[void]")
data_size: 8
data_alignment: 8
arrmeta size: 0

The void pointer is a special pointer type which has no arrmeta, and is the value type for other pointer types.

ndt.date


In [10]:
print_type(ndt.date)


type: ndt.date
data_size: 4
data_alignment: 4
arrmeta size: 0

The date type represents a date as the number of days after January 1, 1970, in a 32-bit signed integer. It may be desirable to add a time zone either to the type or to the arrmeta when time zone handling is added to dynd.
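The day-count representation can be sketched with the standard library alone. The helper names here are hypothetical and this is not how dynd implements the conversion, only an illustration of the stored value:

```python
import datetime
import struct

EPOCH = datetime.date(1970, 1, 1)

def date_to_days(d):
    """Number of days from the epoch to d (negative before 1970)."""
    return (d - EPOCH).days

def days_to_date(days):
    """Inverse of date_to_days."""
    return EPOCH + datetime.timedelta(days=days)

days = date_to_days(datetime.date(2000, 1, 1))
raw = struct.pack('=i', days)  # 32-bit signed integer, native byte order
print(days, len(raw))
```

Dates before the epoch come out as negative day counts, which is why a signed integer is used.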

String/Bytes

fixedstring


In [11]:
dt = ndt.make_fixedstring(16, 'utf-16')
print_type(dt)
print('encoding: %s' % dt.encoding)


type: ndt.type("string[16,'utf16']")
data_size: 32
data_alignment: 2
arrmeta size: 0
encoding: utf16

The fixedstring type represents a string in a fixed-size buffer, whose size may be shortened through NULL-termination. It is not quite a C string or "stringz", because the string is allowed to use up the whole buffer and not be NULL-terminated. This is equivalent to how the NumPy string and unicode dtypes work.

The name fixedstring isn't quite satisfactory, but neither is cstring or stringz because NULL-termination is not guaranteed.
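To make the optional termination concrete, here is a rough sketch of decoding such a buffer in plain Python. The decode_fixedstring helper is hypothetical, and it assumes a little-endian UTF-16 buffer of 16 code units to match the example type above:

```python
def decode_fixedstring(buf, encoding='utf-16-le', unit_size=2):
    """Decode a fixed-size buffer, stopping at the first all-zero code unit."""
    terminator = b'\x00' * unit_size
    for i in range(0, len(buf), unit_size):
        if buf[i:i + unit_size] == terminator:
            return buf[:i].decode(encoding)
    # No terminator found: the string fills the whole buffer.
    return buf.decode(encoding)

BUFFER_BYTES = 16 * 2  # 16 UTF-16 code units
short = 'hello'.encode('utf-16-le').ljust(BUFFER_BYTES, b'\x00')
full = ('x' * 16).encode('utf-16-le')

print(repr(decode_fixedstring(short)))
print(repr(decode_fixedstring(full)))
```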

fixedbytes


In [12]:
print_type(ndt.make_fixedbytes(16, 4))


type: ndt.type("bytes[16, align=4]")
data_size: 16
data_alignment: 4
arrmeta size: 0

The fixedbytes type represents a fixed-size buffer of bytes, with a specified alignment.

string


In [13]:
dt = ndt.string
print_type(dt)
print('encoding: %s' % dt.encoding)


type: ndt.string
data_size: 16
data_alignment: 8
arrmeta size: 8
encoding: utf8

In [14]:
dt = ndt.make_string('utf-32')
print_type(dt)
print('encoding: %s' % dt.encoding)


type: ndt.type("string['utf32']")
data_size: 16
data_alignment: 8
arrmeta size: 8
encoding: utf32

The string type represents variable-sized strings using a blockref mechanism. The data of a string consists of two pointers, begin and end, which delimit a half-open range of bytes. The arrmeta is a single memory block reference, which owns the data of all the strings. For writing strings, this memory block also has an interface for allocating memory for an output string.
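The begin/end pair can be mimicked with ctypes to see how a half-open range addresses bytes inside a shared buffer. The StringData class and read_string helper are hypothetical, and the ctypes buffer merely stands in for the memory block; none of this is dynd API:

```python
import ctypes

class StringData(ctypes.Structure):
    # Hypothetical mirror of the string data: a [begin, end) byte range.
    _fields_ = [('begin', ctypes.c_void_p), ('end', ctypes.c_void_p)]

# One shared buffer stands in for the memory block that owns the bytes.
block = ctypes.create_string_buffer(b'helloworld')
base = ctypes.addressof(block)

first = StringData(base, base + 5)
second = StringData(base + 5, base + 10)

def read_string(s, encoding='utf-8'):
    # end is exclusive, so the byte length is simply end - begin.
    return ctypes.string_at(s.begin, s.end - s.begin).decode(encoding)

print(read_string(first), read_string(second))
print(ctypes.sizeof(StringData))
```

On a 64-bit platform the two pointers account for the 16 bytes of data_size shown above.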

bytes


In [15]:
dt = ndt.bytes
print_type(dt)
print('target_alignment: %d' % dt.target_alignment)


type: ndt.bytes
data_size: 16
data_alignment: 8
arrmeta size: 8
target_alignment: 1

In [16]:
dt = ndt.make_bytes(4)
print_type(dt)
print('target_alignment: %d' % dt.target_alignment)


type: ndt.type("bytes[align=4]")
data_size: 16
data_alignment: 8
arrmeta size: 8
target_alignment: 4

The bytes type has the same data and arrmeta layout as the string type, but represents variable-sized raw byte buffers instead of strings.

json


In [17]:
dt = ndt.json
print_type(dt)
print('encoding: %s' % dt.encoding)


type: ndt.json
data_size: 16
data_alignment: 8
arrmeta size: 8
encoding: utf8

The json type is a special string type whose data holds a single JSON value. Its data and arrmeta are identical to those of ndt.string.

Array Types

fixed_sym_dim


In [18]:
print_type(ndt.make_fixed_sym_dim(ndt.int32))


type: ndt.type("Fixed * int32")
data_size: None
data_alignment: 4
arrmeta size: 16

The fixed_sym_dim type represents a fixed dimension, but as a symbolic placeholder. It cannot generally be instantiated. Instead, when it is used to create an array, a dimension size gets substituted to create a concrete fixed_dim.

fixed_dim


In [19]:
print_type(ndt.make_fixed_dim(3, ndt.int32))


type: ndt.type("3 * int32")
data_size: None
data_alignment: 4
arrmeta size: 16

The fixed_dim type represents a strided array whose dimension size is specified by the type. The stride is specified in the arrmeta, which is why the arrmeta size reported above is nonzero even though int32 has no arrmeta of its own. If the element type has arrmeta, it adds to this total, so you cannot assume the arrmeta belongs only to the dimension.

cfixed_dim

The cfixed_dim is like fixed_dim, but requires a specific stride as well, locking down the memory layout precisely. It can be used to specify multi-dimensional arrays in C order:


In [23]:
print_type(ndt.make_cfixed_dim((2,2), ndt.int32))


type: ndt.type("cfixed[2] * cfixed[2] * int32")
data_size: 16
data_alignment: 4
arrmeta size: 0

or F order:


In [22]:
print_type(ndt.make_cfixed_dim((2,2), ndt.int32, axis_perm=(0,1)))


type: ndt.type("cfixed[2, stride=4] * cfixed[2, stride=8] * int32")
data_size: 16
data_alignment: 4
arrmeta size: 0
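The strides in the two type strings follow from the usual rule for contiguous layouts. As a small sketch in plain Python (the contiguous_strides helper is hypothetical, not part of dynd):

```python
def contiguous_strides(shape, itemsize, order='C'):
    """Byte strides of a contiguous array in C (row-major) or F order."""
    strides = [0] * len(shape)
    step = itemsize
    # C order: the last axis varies fastest. F order: the first axis does.
    if order == 'C':
        axes = range(len(shape) - 1, -1, -1)
    else:
        axes = range(len(shape))
    for axis in axes:
        strides[axis] = step
        step *= shape[axis]
    return tuple(strides)

c_strides = contiguous_strides((2, 2), 4, 'C')
f_strides = contiguous_strides((2, 2), 4, 'F')
print(c_strides, f_strides)
```

For a 2 by 2 array of int32 this gives strides of 8 and 4 bytes in C order, and 4 and 8 bytes in F order, matching the strides printed in the F order type above.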

var_dim


In [24]:
print_type(ndt.make_var_dim(ndt.int32))


type: ndt.type("var * int32")
data_size: 16
data_alignment: 8
arrmeta size: 24

The var_dim type represents a variable-sized array, using a blockref to the actual data. The data consists of a pointer and a size, while the arrmeta consists of a reference to the memory block owning the array data, an intptr_t stride, and an intptr_t offset which must be added to the data pointer to get the location of the actual data.
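The address arithmetic described here can be sketched with ctypes. The structure and field names below are hypothetical mirrors of the description, not the actual libdynd definitions, and the blockref is left as a null pointer because no real memory block is involved:

```python
import ctypes

class VarDimData(ctypes.Structure):
    # Hypothetical mirror of the data: a pointer and an element count.
    _fields_ = [('begin', ctypes.c_void_p), ('size', ctypes.c_size_t)]

class VarDimArrmeta(ctypes.Structure):
    # Hypothetical mirror of the arrmeta: owner reference, stride, offset.
    _fields_ = [('blockref', ctypes.c_void_p),
                ('stride', ctypes.c_ssize_t),
                ('offset', ctypes.c_ssize_t)]

def element_address(data, arrmeta, i):
    # The offset is added to the data pointer, then elements are strided.
    return data.begin + arrmeta.offset + i * arrmeta.stride

values = (ctypes.c_int32 * 4)(10, 20, 30, 40)
data = VarDimData(ctypes.addressof(values), 4)
arrmeta = VarDimArrmeta(None, ctypes.sizeof(ctypes.c_int32), 0)

items = [ctypes.c_int32.from_address(element_address(data, arrmeta, i)).value
         for i in range(data.size)]
print(items)
print(ctypes.sizeof(VarDimData), ctypes.sizeof(VarDimArrmeta))
```

On a 64-bit platform these mirrors occupy 16 and 24 bytes, in line with the data_size and arrmeta size printed above.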

To get a typical ragged array, one needs a two-dimensional array with a var_dim as the second dimension.


In [25]:
print_type(ndt.make_fixed_sym_dim(ndt.make_var_dim(ndt.int32)))


type: ndt.type("Fixed * var * int32")
data_size: None
data_alignment: 8
arrmeta size: 40

Struct Types

cstruct


In [26]:
print_type(ndt.make_cstruct([ndt.int32, ndt.make_fixedstring(7)], ['id', 'name']))


type: ndt.type("c{id : int32, name : string[7]}")
data_size: 12
data_alignment: 4
arrmeta size: 0

The cstruct defines a struct whose data layout matches that produced by the platform C++ compiler for equivalent types. Note that while the cstruct type itself defines no arrmeta, any of its field types may, so you cannot assume there is no arrmeta because it is a cstruct type. The arrmeta of all the fields are placed contiguously in order.
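The same size and alignment can be reproduced with a ctypes structure, which also follows the platform C layout rules. This is only an illustration: the IdName class is hypothetical, and it assumes the 7-character fixedstring occupies 7 bytes:

```python
import ctypes

class IdName(ctypes.Structure):
    # Hypothetical C-layout equivalent of c{id : int32, name : string[7]}.
    _fields_ = [('id', ctypes.c_int32), ('name', ctypes.c_char * 7)]

struct_size = ctypes.sizeof(IdName)
struct_alignment = ctypes.alignment(IdName)
print(struct_size, struct_alignment)
print(IdName.id.offset, IdName.name.offset)
```

The fields use 4 + 7 = 11 bytes, and one byte of trailing padding rounds the size up to a multiple of the 4-byte alignment.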

struct


In [27]:
print_type(ndt.make_struct([ndt.int32, ndt.make_fixedstring(7)], ['id', 'name']))


type: ndt.type("{id : int32, name : string[7]}")
data_size: None
data_alignment: 4
arrmeta size: 16

The struct generalizes the cstruct by allowing the fields to be laid out arbitrarily, at any offsets that conform to each field's alignment. Notice that the data_size is None, because the struct has no defined layout without its corresponding arrmeta. The alignment is the same as that of the cstruct, because the struct itself must be aligned enough to guarantee alignment of its most aligned field.

The arrmeta of the struct is an intptr_t array of all the data offsets. The arrmeta of all the fields are placed contiguously in order, after the offsets array.
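A sketch of what offset-driven field access looks like, using the struct module in place of dynd. The offsets chosen here (name first, id at byte 8) are made up for illustration, as are the helper and variable names:

```python
import struct

field_names = ['id', 'name']
field_formats = ['=i', '7s']   # int32 and a 7-byte fixed string
offsets = [8, 0]               # hypothetical offsets array from the arrmeta

buf = bytearray(12)
struct.pack_into(field_formats[1], buf, offsets[1], b'Alice')
struct.pack_into(field_formats[0], buf, offsets[0], 42)

def read_field(buf, name):
    """Look up a field's offset, then decode the bytes found there."""
    i = field_names.index(name)
    return struct.unpack_from(field_formats[i], buf, offsets[i])[0]

print(read_field(buf, 'id'), read_field(buf, 'name'))
```

The same type can describe many different byte layouts, since only the offsets array changes between them.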

Expression Types

convert


In [28]:
print_type(ndt.make_convert(ndt.int32, ndt.int64))


type: ndt.type("convert[to=int32, from=int64]")
data_size: 8
data_alignment: 8
arrmeta size: 0

The convert type represents a type conversion as an expression type. Its underlying storage is that of its "from" type, but its value is that of its "to" type.

byteswap


In [29]:
print_type(ndt.make_byteswap(ndt.int32))


type: ndt.type("byteswap[int32]")
data_size: 4
data_alignment: 4
arrmeta size: 0

The byteswap type represents a value which is byte-swapped from native endianness. All dynd types which are used for calculations have native endianness, but data with non-native endianness can be used via the byteswap type.
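What a byte swap does to a 32-bit value can be shown with the struct module. The byteswap_int32 helper is hypothetical and independent of dynd:

```python
import struct

def byteswap_int32(value):
    """Reverse the byte order of a 32-bit signed integer."""
    # Write the value big-endian, then read the same bytes little-endian.
    return struct.unpack('<i', struct.pack('>i', value))[0]

swapped = byteswap_int32(1)         # 0x00000001 becomes 0x01000000
restored = byteswap_int32(swapped)  # swapping twice is the identity
print(swapped, restored)
```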

view


In [30]:
print_type(ndt.make_view(ndt.int64, ndt.float64))


type: ndt.type("view[as=int64, original=float64]")
data_size: 8
data_alignment: 8
arrmeta size: 0

The view type represents a value whose bytes are being reinterpreted as another type. For example, a float64 being viewed as an int64. Usually, the value of the bytes reinterpreted as a different type will be different.
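The difference between viewing and converting can be seen with the struct module. This sketch does not use dynd, and the helper name is hypothetical:

```python
import struct

def view_float64_as_int64(x):
    """Reinterpret the 8 bytes of a float64 as a signed 64-bit integer."""
    return struct.unpack('=q', struct.pack('=d', x))[0]

bits = view_float64_as_int64(1.0)
converted = int(1.0)
print(hex(bits), converted)
```

Converting 1.0 gives the integer 1, while viewing its bytes gives the IEEE bit pattern 0x3ff0000000000000.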

unaligned


In [31]:
print_type(ndt.make_unaligned(ndt.int32))


type: ndt.type("unaligned[int32]")
data_size: 4
data_alignment: 1
arrmeta size: 0

The unaligned type is a special case of the view type, where the original type is a fixedbytes with the same size as the value type. This is the mechanism by which unaligned data is handled in dynd.
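The idea of treating unaligned data as raw bytes first, and as a value second, can be sketched as follows. This is plain Python with made-up values, not a description of how dynd performs the read:

```python
import struct

buf = bytearray(8)
# Store an int32 starting at byte 1, which is not a multiple of 4.
struct.pack_into('=i', buf, 1, 123456)

# Step 1: take the 4 bytes as an opaque, byte-aligned chunk.
raw = bytes(buf[1:5])
# Step 2: reinterpret that chunk as an int32.
value = struct.unpack('=i', raw)[0]
print(len(raw), value)
```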

Types With Lifetimes (i.e. destructors)

type


In [32]:
print_type(ndt.type('type'))


type: ndt.type("type")
data_size: 8
data_alignment: 8
arrmeta size: 0

The type type holds dynd types themselves. These types are reference-counted, so data of this type must be zero-initialized, and must be destructed with a reference decrement when it is no longer needed.

One place this is used is to get the list of types from a struct or cstruct.


In [33]:
dt = ndt.make_struct([ndt.int32, ndt.string], ['x', 'y'])
dt.field_types


Out[33]:
nd.array([ int32, string],
         type="2 * type")
