The dynd data structure describes memory layout using two components: a type and a block of arrmeta (array metadata). This notebook takes a tour through the types in dynd, how the arrmeta for each type is laid out, and how the corresponding data looks. For most uses of dynd, this low-level perspective is unnecessary, but in cases such as JIT code generation to operate on a dynd array, it is essential.
In [1]:
from __future__ import print_function
import sys, ctypes
from pprint import pprint
import dynd
from dynd import nd, ndt, _lowlevel
print('Python:', sys.version)
print('DyND:', dynd.__version__)
print('LibDyND:', dynd.__libdynd_version__)
Since we're going to be printing information about many different dynd types, let's create a function to do the printing.
In [2]:
def print_type(t):
    """Print the layout-related properties of a dynd type."""
    print('type: %r' % t)
    print('data_size: %s' % t.data_size)
    print('data_alignment: %d' % t.data_alignment)
    print('arrmeta size: %d' % t.arrmeta_size)
Many types have no arrmeta (array metadata). These are types whose memory layout and interpretation are fully determined by the type itself, with no extra information required. They include builtin types such as the integers and floating point numbers, as well as some others like fixed_dim and cstruct.
Any time the arrmeta has size zero, any function which operates on a dynd type/arrmeta pair will accept NULL as the arrmeta, because it does not use it.
In [3]:
print_type(ndt.bool)
The bool type is stored as one byte, which contains either the value 0 for false or 1 for true. It uses a full byte because offsets and strides in dynd arrays are defined in terms of bytes, so storing booleans as individual bits would not be straightforward. While it doesn't exist yet, an additional bitarray type that acts like a one-dimensional array of ndt.bool would be nice as well.
In [4]:
print_type(ndt.int16)
There are signed two's-complement integers with power-of-two sizes from int8 through int128. The int128 type is only partially implemented.
In [5]:
print_type(ndt.uint64)
There are unsigned integers from uint8 through uint128, with the same status as for the signed integers.
In [6]:
print_type(ndt.float64)
The float# types are floating point with IEEE binary# layout. Note that the C++ long double type is not presently supported by dynd, but will be added.
In [7]:
print_type(ndt.complex_float32)
The complex_float# types are complex numbers containing a pair of float# values.
In [8]:
print_type(ndt.void)
The void type means no data. It is used as a way for a dynd callable to indicate no return value.
In [9]:
print_type(ndt.make_pointer(ndt.void))
The void pointer is a special pointer type which has no arrmeta, and is the value type for other pointer types.
In [10]:
print_type(ndt.date)
The date type represents a date as the number of days after January 1, 1970, in a 32-bit signed integer. It may be desirable to add a time zone either to the type or to the arrmeta when time zone handling is added to dynd.
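The day-count representation described above can be sketched with the standard library alone. This is an illustration of the storage idea, not dynd's implementation: the helper names are hypothetical, and native byte order is assumed for the 32-bit integer.

```python
import datetime
import struct

# The epoch named in the text: day 0 is January 1, 1970.
EPOCH = datetime.date(1970, 1, 1)

def date_to_days(d):
    """Return the signed number of days from the epoch to d."""
    return (d - EPOCH).days

def days_to_date(days):
    """Inverse of date_to_days."""
    return EPOCH + datetime.timedelta(days=days)

def pack_date(d):
    """Store a date as a native-endian 32-bit signed integer (4 bytes)."""
    return struct.pack('=i', date_to_days(d))

def unpack_date(raw):
    """Read back a date stored by pack_date."""
    (days,) = struct.unpack('=i', raw)
    return days_to_date(days)

raw = pack_date(datetime.date(2000, 1, 1))
print(len(raw), unpack_date(raw))
```

Dates before the epoch simply come out as negative day counts, which is why a signed integer is the natural storage type.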
In [11]:
dt = ndt.make_fixedstring(16, 'utf-16')
print_type(dt)
print('encoding: %s' % dt.encoding)
The fixedstring type represents a string in a fixed-size buffer, whose size may be shortened through NULL-termination. It is not quite a C string or "stringz", because the string is allowed to use up the whole buffer and not be NULL-terminated. This is equivalent to how the NumPy string and unicode types work.
The name fixedstring isn't quite satisfactory, but neither is cstring or stringz because NULL-termination is not guaranteed.
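To make the "optionally NULL-terminated" rule concrete, here is a minimal pure-Python sketch of encoding into and decoding from a fixed-size buffer. The function names are hypothetical, the buffer size is taken in bytes, and the terminator is assumed to be one all-zero code unit; none of this is dynd's own API.

```python
def encode_fixedstring(s, size, encoding='utf-8'):
    """Encode s into exactly `size` bytes, padding with zero bytes."""
    raw = s.encode(encoding)
    if len(raw) > size:
        raise ValueError('string does not fit in the fixed-size buffer')
    return raw.ljust(size, b'\x00')

def decode_fixedstring(buf, encoding='utf-8', unit_size=1):
    """Decode up to the first all-zero code unit, or the whole buffer."""
    terminator = b'\x00' * unit_size
    for i in range(0, len(buf), unit_size):
        if buf[i:i + unit_size] == terminator:
            return buf[:i].decode(encoding)
    # No terminator found: the string fills the entire buffer.
    return buf.decode(encoding)

short = encode_fixedstring('abc', 7)      # padded, so NULL-terminated
full = encode_fixedstring('abcdefg', 7)   # fills the buffer, no terminator
print(decode_fixedstring(short), decode_fixedstring(full))
```

For a two-byte encoding such as UTF-16, pass `unit_size=2` together with an explicit byte order (for example `'utf-16-le'`) so that a zero byte inside a code unit is not mistaken for the terminator.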
In [12]:
print_type(ndt.make_fixedbytes(16, 4))
The fixedbytes type represents a fixed-size buffer of bytes, with a specified alignment.
In [13]:
dt = ndt.string
print_type(dt)
print('encoding: %s' % dt.encoding)
In [14]:
dt = ndt.make_string('utf-32')
print_type(dt)
print('encoding: %s' % dt.encoding)
The string type represents variable-sized strings using a blockref mechanism. The data of a string consists of two pointers, begin and end, which delimit a half-open range of bytes. The arrmeta is a single memory block reference, which owns the data of all the strings. For writing strings, this memory block also has an interface for allocating memory for an output string.
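The begin/end pair can be modeled with ctypes to show how a consumer would read such a string. The structure and helper below are a sketch: the field names follow the text, but the exact C declaration in dynd is not shown here, and an ordinary ctypes buffer stands in for the memory block that the arrmeta would reference.

```python
import ctypes

class StringData(ctypes.Structure):
    """Hypothetical mirror of the string data: a half-open byte range."""
    _fields_ = [('begin', ctypes.c_void_p),
                ('end', ctypes.c_void_p)]

def read_string(sd, encoding='utf-8'):
    """Decode the bytes in the half-open range [begin, end)."""
    nbytes = sd.end - sd.begin
    return ctypes.string_at(sd.begin, nbytes).decode(encoding)

# One storage block holding the bytes of several strings back to back;
# it must stay alive for as long as the StringData values are used.
storage = ctypes.create_string_buffer(b'helloworld')
base = ctypes.addressof(storage)

first = StringData(base, base + 5)
second = StringData(base + 5, base + 10)
print(read_string(first), read_string(second))
```

Because every string only points into the shared block, the data for each string is two pointers wide regardless of the string's length.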
In [15]:
dt = ndt.bytes
print_type(dt)
print('target_alignment: %d' % dt.target_alignment)
In [16]:
dt = ndt.make_bytes(4)
print_type(dt)
print('target_alignment: %d' % dt.target_alignment)
The bytes type has the same data and arrmeta layout as the string type, but represents variable-sized raw byte buffers instead of strings.
In [17]:
dt = ndt.json
print_type(dt)
print('encoding: %s' % dt.encoding)
The json type is a special string type whose data holds a single JSON value. Its data and arrmeta are identical to those of ndt.string.
In [18]:
print_type(ndt.make_strided_dim(ndt.int32))
The strided_dim type works most like the strided dimensions in NumPy arrays. It indicates one strided dimension of an array, and has arrmeta which consists of a dimension size and a stride, as two intptr_t values. Notice that the data_size is zero, because the size of the data is unknown without a corresponding arrmeta.
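A size/stride pair is enough to walk one dimension of raw memory. The sketch below models the arrmeta as a ctypes structure; the field order (size first, then stride) and the use of `c_ssize_t` as a stand-in for `intptr_t` are assumptions, and `read_strided_int32` is a hypothetical helper.

```python
import ctypes

class StridedDimArrmeta(ctypes.Structure):
    """Hypothetical mirror of the strided_dim arrmeta (field order assumed)."""
    _fields_ = [('size', ctypes.c_ssize_t),
                ('stride', ctypes.c_ssize_t)]

def read_strided_int32(data_addr, arrmeta):
    """Read arrmeta.size int32 values, stepping arrmeta.stride bytes each time."""
    return [ctypes.c_int32.from_address(data_addr + i * arrmeta.stride).value
            for i in range(arrmeta.size)]

buf = (ctypes.c_int32 * 6)(10, 11, 12, 13, 14, 15)
base = ctypes.addressof(buf)

# Same memory, different arrmeta: every other element, then the tail reversed.
every_other = StridedDimArrmeta(size=3, stride=8)
backwards = StridedDimArrmeta(size=3, stride=-4)
print(read_strided_int32(base, every_other))
print(read_strided_int32(base + 20, backwards))
```

The type alone (strided int32) says nothing about how many elements there are or how far apart they sit, which is why the data size cannot be known without the arrmeta.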
In [19]:
print_type(ndt.make_fixed_dim(3, ndt.int32))
The fixed_dim type represents a strided array whose layout is fully specified by the type, not the arrmeta. Note that while the fixed_dim type itself defines no arrmeta, its element type may, so you cannot assume there is no arrmeta because it is a fixed_dim type.
It supports multi-dimensional arrays in multiple layouts, such as C order:
In [20]:
print_type(ndt.make_fixed_dim((2,2), ndt.int32))
or F order:
In [21]:
print_type(ndt.make_fixed_dim((2,2), ndt.int32, axis_perm=(0,1)))
In [22]:
print_type(ndt.make_var_dim(ndt.int32))
The var_dim type represents a variable-sized array, using a blockref to the actual data. The data consists of a pointer and a size, while the arrmeta consists of a reference to the memory block owning the array data, an intptr_t stride, and an intptr_t offset which must be added to the data pointer to get the location of the actual data.
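The split between per-element data (pointer and size) and shared arrmeta (block reference, stride, offset) can be sketched with ctypes. The structures below are hypothetical mirrors whose field names and order are assumed from the description, and the block reference is left as a null pointer because a plain ctypes array owns the storage in this illustration.

```python
import ctypes

class VarDimData(ctypes.Structure):
    """Hypothetical mirror of one var_dim data element."""
    _fields_ = [('begin', ctypes.c_void_p),
                ('size', ctypes.c_size_t)]

class VarDimArrmeta(ctypes.Structure):
    """Hypothetical mirror of the var_dim arrmeta (field order assumed)."""
    _fields_ = [('blockref', ctypes.c_void_p),
                ('stride', ctypes.c_ssize_t),
                ('offset', ctypes.c_ssize_t)]

def read_var_int32(data, arrmeta):
    """Read data.size int32 values starting at begin + offset."""
    start = data.begin + arrmeta.offset
    return [ctypes.c_int32.from_address(start + i * arrmeta.stride).value
            for i in range(data.size)]

# One storage block shared by three rows of different lengths.
storage = (ctypes.c_int32 * 6)(1, 2, 3, 4, 5, 6)
base = ctypes.addressof(storage)
arrmeta = VarDimArrmeta(None, 4, 0)

rows = [VarDimData(base, 2), VarDimData(base + 8, 1), VarDimData(base + 12, 3)]
print([read_var_int32(row, arrmeta) for row in rows])
```

Each row carries its own pointer and length, while the stride and offset are stored once in the arrmeta and apply to every row.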
To get a typical ragged array, one needs a two-dimensional array with a var_dim as the second dimension.
In [23]:
print_type(ndt.make_strided_dim(ndt.make_var_dim(ndt.int32)))
In [24]:
print_type(ndt.make_cstruct([ndt.int32, ndt.make_fixedstring(7)], ['id', 'name']))
The cstruct defines a struct whose data layout matches that produced by the platform C++ compiler for equivalent types. Note that while the cstruct type itself defines no arrmeta, any of its field types may, so you cannot assume there is no arrmeta because it is a cstruct type. The arrmeta of all the fields are placed contiguously in order.
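Since the layout follows the platform C/C++ rules, ctypes can be used to predict it. The sketch below declares a structure comparable to the `id`/`name` example, under the assumption that a 7-character fixedstring in a single-byte encoding occupies 7 bytes with alignment 1; `Record` is a hypothetical name.

```python
import ctypes

class Record(ctypes.Structure):
    """C-layout struct: a 32-bit id followed by a 7-byte name buffer."""
    _fields_ = [('id', ctypes.c_int32),
                ('name', ctypes.c_char * 7)]

# Field offsets, total size (including trailing padding), and alignment
# as computed by the platform C layout rules.
print('offsets:', Record.id.offset, Record.name.offset)
print('size:', ctypes.sizeof(Record))
print('alignment:', ctypes.alignment(Record))
```

On common platforms the 11 bytes of field data are padded to a size of 12 so that the 4-byte alignment of the integer field is preserved in arrays of such structs.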
In [25]:
print_type(ndt.make_struct([ndt.int32, ndt.make_fixedstring(7)], ['id', 'name']))
The struct generalizes the cstruct by allowing the fields to be arbitrarily laid out with any offsets conforming to each field's alignment. Notice that the data_size is zero, because the struct requires corresponding arrmeta to have a layout defined. The alignment is the same as the alignment of the cstruct, because the struct itself must be aligned enough to guarantee alignment of its most aligned field.
The arrmeta of the struct is an intptr_t array of all the data offsets. The arrmeta of all the fields are placed contiguously in order, after the offsets array.
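Keeping the offsets outside the type means the same struct type can describe differently arranged memory. The following sketch uses the standard `struct` module to read an `id`/`name` record through an explicit offsets tuple; `read_struct` and the format strings are illustrative stand-ins, not dynd functions.

```python
import struct

def read_struct(buf, offsets, formats):
    """Read one value per field, each at its own byte offset in buf."""
    return tuple(struct.unpack_from(fmt, buf, off)[0]
                 for fmt, off in zip(formats, offsets))

# Field formats: a native-endian int32 and a 7-byte buffer.
formats = ('=i', '7s')

# Layout A: C-like order, id at offset 0 and name at offset 4.
layout_a = bytearray(12)
struct.pack_into('=i', layout_a, 0, 42)
struct.pack_into('7s', layout_a, 4, b'alice')

# Layout B: name first at offset 0, id at the aligned offset 8.
layout_b = bytearray(12)
struct.pack_into('7s', layout_b, 0, b'alice')
struct.pack_into('=i', layout_b, 8, 42)

print(read_struct(layout_a, (0, 4), formats))
print(read_struct(layout_b, (8, 0), formats))
```

Both calls yield the same field values; only the offsets array differs between the two layouts.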
In [26]:
print_type(ndt.make_convert(ndt.int32, ndt.int64))
The convert type represents a type conversion as an expression type. Its underlying storage is that of its "from" type, but its value is that of its "to" type.
In [27]:
print_type(ndt.make_byteswap(ndt.int32))
The byteswap type represents a value which is byte-swapped from native endianness. All dynd types which are used for calculations have native endianness, but data with non-native endianness can be used via the byteswap type.
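What byte-swapping does to a stored int32 can be shown with the standard `struct` module. This is only a sketch of the idea: `byteswap_int32` is a hypothetical helper that reverses the stored bytes before interpreting them in native order.

```python
import struct
import sys

def byteswap_int32(raw):
    """Interpret 4 bytes stored in non-native byte order as a native int32."""
    return struct.unpack('=i', raw[::-1])[0]

# Produce the bytes of the value 1 in whichever byte order is NOT native here.
non_native = '>' if sys.byteorder == 'little' else '<'
raw = struct.pack(non_native + 'i', 1)

print('read directly:', struct.unpack('=i', raw)[0])
print('read through byteswap:', byteswap_int32(raw))
```

Reading the stored bytes directly gives a wrong value, while reversing them first recovers the intended one; the storage itself is never modified.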
In [28]:
print_type(ndt.make_view(ndt.int64, ndt.float64))
The view type represents a value whose bytes are being reinterpreted as another type. For example, a float64 can be viewed as an int64. Usually, the value obtained by reinterpreting the bytes as a different type differs from the original value.
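Reinterpreting bytes, as opposed to converting a value, can be demonstrated with the standard `struct` module. The helper name below is hypothetical; it packs a float64 and unpacks the same 8 bytes as a signed 64-bit integer.

```python
import struct

def view_float64_as_int64(x):
    """Reinterpret the 8 bytes of a float64 as a signed 64-bit integer."""
    return struct.unpack('=q', struct.pack('=d', x))[0]

# 1.0 has sign 0, biased exponent 0x3FF, and a zero mantissa in IEEE binary64.
print(hex(view_float64_as_int64(1.0)))
print(view_float64_as_int64(0.0))
```

A conversion of 1.0 to an integer would give 1, whereas the view exposes the IEEE bit pattern unchanged.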
In [29]:
print_type(ndt.make_unaligned(ndt.int32))
The unaligned type is a special case of the view type, where the original type is a fixedbytes with the same size as the value type. This is the mechanism by which unaligned data is handled in dynd.
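One common way to consume an unaligned value is to treat it as a run of raw bytes and copy those bytes into a properly aligned temporary before interpreting them. The ctypes sketch below illustrates that idea; it is not a description of dynd's internal code path, and `read_unaligned_int32` is a hypothetical helper.

```python
import ctypes

def read_unaligned_int32(buf, offset):
    """Copy 4 bytes from buf at any byte offset into an aligned int32."""
    tmp = ctypes.c_int32()
    ctypes.memmove(ctypes.addressof(tmp),
                   ctypes.addressof(buf) + offset,
                   ctypes.sizeof(tmp))
    return tmp.value

# Place an int32 at byte offset 1 of a byte buffer, which is generally not
# a multiple of the 4-byte alignment an int32 normally requires.
raw = ctypes.create_string_buffer(9)
src = ctypes.c_int32(123456)
ctypes.memmove(ctypes.addressof(raw) + 1, ctypes.addressof(src), 4)

print(read_unaligned_int32(raw, 1))
```

The byte-wise copy never performs a misaligned 4-byte load, which is what makes the fixed-size bytes view safe on architectures that fault on unaligned access.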
In [30]:
print_type(ndt.type('type'))
The type type is for holding dynd types themselves. These types are reference-counted, and data for them must be zero-initialized and released via a reference decrement when it is no longer needed.
One place this is used is to get the list of types from a struct or cstruct.
In [31]:
dt = ndt.make_struct([ndt.int32, ndt.string], ['x', 'y'])
dt.field_types
Out[31]: