JIT Compiling To DyND Kernels

This notebook explores how a JIT compiler targeting dynd kernels might be structured. DyND kernels are designed to play a role as a simple C ABI specification that multiple computation engines can consume and target. DyND itself does not contain a JIT compiler, but composes kernels in a hierarchical manner to build up array computations. With a JIT compiler, the resulting kernel hierarchy is flat in most cases, unless the compiled expression uses another kernel that is not in a symbolic form the compiler can inline.

In [1]:
from IPython.display import SVG

DyND Kernel Review

For full details on how dynd kernels work, take a look at this document describing them. For here, we'll just review the structure layout and a few properties. The memory structure looks like this:

struct ckernel_prefix {
    typedef void (*destructor_fn_t)(ckernel_prefix *);

    void *function;
    destructor_fn_t destructor;
};

The first field, function, is a function pointer with a prototype defined by the context of the kernel data. What this means is that one may request a "unary single" kernel from a kernel factory, and then the prototype will be as follows:

/** Typedef for a unary operation on a single element */
typedef void (*unary_single_operation_t)(char *dst, const char *src,
                ckernel_prefix *extra);

The way dynd generally builds up kernels is hierarchically, matching the structure of the kernel to the structure of the arrays it is working on. There are more details in this document.

There is nothing inherently hierarchical about a dynd kernel, however; it is merely designed so that this form of hierarchy is supported naturally. In the case of producing a kernel with a JIT compiler, the resulting kernel may have only one extra bit of data, a handle to the JIT-compiled object which owns the code behind the function pointer.

Information Flow for a JIT Kernel

Once a kernel has been built, the only information it gets about any elements it operates on are raw pointers. Knowledge of its data type and dimensional structure must be baked into the function pointer or into the extra parameter which is at the end of every kernel function prototype. This fits well with the dynd data model, where the dtype and arrmeta together fully define what is behind a raw pointer.

In dynd, the information flow from input dynd objects through to kernel execution is typically as follows.

In [2]:

[Figure: information flow from operand arrays (dtype, arrmeta/metadata, data) through a kernel factory/linker. Step 1 - Link Kernel: the factory consumes the dtype and arrmeta to produce the kernel data and kernel function. Step 2 - Execute Kernel: the kernel function runs over the raw data pointers.]

After the kernel is created by the factory, the only information it receives is raw data pointers. The reason for this is to keep the C ABI that kernel-consuming code must conform to simple. The hierarchical case requires manipulating only the data pointers and the ckernel_prefix pointer on the way down the hierarchy, instead of having to also worry about the multi-dimensional or heterogeneous structure of the data itself.

One way to think of what the kernel factory in the figure is doing is as the linking step of a compiler. It puts together a series of pre-fabricated kernel functions, using the dtype and the arrmeta to fill in the kernel data as needed. This leads us to a diagram for how kernel construction can be done when using a JIT compiler. We add a step at the beginning, which compiles the operation to a kernel function pointer using some or all of the dtype and corresponding arrmeta.

In [4]:

[Figure: information flow for a JIT kernel. Step 1 - JIT Compile Function: the JIT compiler uses the operand arrays' dtype and arrmeta/metadata to produce an unbound kernel function. Step 2 - Bind Kernel: the kernel factory/linker pairs the unbound function with kernel data. Step 3 - Execute Kernel: the bound kernel function runs over the raw data pointers.]

In step 1, the dtype and arrmeta, which fully describe the data, are used to generate a JIT-compiled kernel function. This function cannot be called by itself, however; it needs to be paired with kernel data memory first. This compilation step is where some choices are made about how many times the JIT should be called versus how general the compiled function is.

At one extreme is a JIT compiler which would always hard code all the dtype/arrmeta information into the compiled function. This would only require a trivial kernel data object which includes a destructor function to free the JIT-compiled function when the kernel is no longer needed. At the other extreme is a JIT compiler which would use very little of the information in the dtype/arrmeta. This requires a kernel linker which will copy all the information about the dtype/arrmeta into the kernel data, perhaps as a reference to the dtype and a pointer to the arrmeta.

A happy medium between these two extremes is a goal of a JIT compiler and kernel factory pair. For example, some information, like shape and strides, might go into the kernel data, while other information, like whether the element type is float or double, gets baked into the compiled function. The cached kernel function likely needs to be accompanied by an operation which knows how to build the kernel data out of the input dtype/arrmeta.
