The Harder Way: C Code generation, Custom Printers, and CSE [1 hour]

One of the most common low level programming languages in use is C. Compiled C code can be optimized for execution speed on many different computers. The reference Python interpreter is written in C, as are many of the vectorized operations in NumPy and the numerical algorithms in SciPy. It is often necessary to translate a complex mathematical expression into C for optimal execution speed and memory management. In this notebook you will learn how to automatically translate a complex SymPy expression into C, compile the code, and run the program.

We will continue examining the complex chemical kinetic reaction ordinary differential equation introduced in the previous lesson.

Learning Objectives

After this lesson you will be able to:

  • use a code printer class to convert a SymPy expression to compilable C code
  • use an array compatible assignment to print valid C array code
  • subclass the printer class and modify it to provide custom behavior
  • use common subexpression elimination to simplify and speed up the code execution

Import SymPy and enable mathematical printing in the Jupyter notebook.


In [ ]:
import sympy as sym

In [ ]:
sym.init_printing()

Ordinary Differential Equations

The previously generated ordinary differential equations that describe chemical kinetic reactions are loaded below. These expressions describe the right hand side of this mathematical equation:

$$\frac{d\mathbf{y}}{dt} = \mathbf{f}(\mathbf{y}(t))$$

where the state vector $\mathbf{y}(t)$ is made up of 14 states, i.e. $\mathbf{y}(t) \in \mathbb{R}^{14}$.

Below the variable rhs_of_odes represents $\mathbf{f}(\mathbf{y}(t))$ and states represents $\mathbf{y}(t)$.

From now on we will simply use $\mathbf{y}$ instead of $\mathbf{y}(t)$ and assume an implicit function of $t$.


In [ ]:
from scipy2017codegen.chem import load_large_ode

In [ ]:
rhs_of_odes, states = load_large_ode()

Exercise [2 min]

Display the expressions (rhs_of_odes and states), inspect them, and find out their types and dimensions. What are some of the characteristics of the equations (type of mathematical expressions, linear or non-linear, etc)?

Double Click For Solution


In [ ]:
# write your solution here

Compute the Jacobian

As shown in the previous lesson, the Jacobian of the right hand side of the differential equations is often very useful for computations such as integration and optimization. With:

$$\frac{d\mathbf{y}}{dt} = \mathbf{f}(\mathbf{y})$$

the Jacobian is defined as:

$$\mathbf{J}(\mathbf{y}) = \frac{\partial\mathbf{f}(\mathbf{y})}{\partial\mathbf{y}}$$

SymPy can compute the Jacobian of matrix objects with the Matrix.jacobian() method.
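As a quick illustration of the method call, here is a minimal sketch on a hypothetical two-state system. The names toy_states, toy_rhs, and toy_jac are made up for this example and are unrelated to the chemical kinetics model loaded above.

```python
import sympy as sym

# Hypothetical two-state system, not the model used in this lesson.
y0, y1 = sym.symbols('y0 y1')
toy_states = sym.Matrix([y0, y1])
toy_rhs = sym.Matrix([-2*y0*y1, y0**2 - y1])

# Row i holds the partial derivatives of toy_rhs[i] with respect to each state.
toy_jac = toy_rhs.jacobian(toy_states)

sym.pprint(toy_jac)
print(toy_jac.shape)  # (2, 2)
```

A system with n states gives an n by n Jacobian, so the 14 state model in this lesson should produce a 14 by 14 matrix.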

Exercise [3 min]

Look up the Jacobian in the SymPy documentation then compute the Jacobian and store the result in the variable jac_of_odes. Inspect the resulting Jacobian for dimensionality, type, and the symbolic form.

Double Click For Solution


In [ ]:
# write your answer here

C Code Printing

The two expressions are large and will likely have to be executed many thousands of times to compute the desired numerical values, so we want them to execute as fast as possible. We can use SymPy to print these expressions as C code.

We will design a double precision C function that evaluates both $\mathbf{f}(\mathbf{y})$ and $\mathbf{J}(\mathbf{y})$ simultaneously given the values of the states $\mathbf{y}$. Below is a basic template for a C program that includes such a function, evaluate_odes(). Our job is to populate the function with the C version of the SymPy expressions.

#include <math.h>
#include <stdio.h>

void evaluate_odes(const double state_vals[14], double rhs_result[14], double jac_result[196])
{
      // We need to fill in the code here using SymPy.
}

int main() {

    // initialize the state vector with some values
    double state_vals[14] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14};
    // create "empty" 1D arrays to hold the results of the computation
    double rhs_result[14];
    double jac_result[196];

    // call the function
    evaluate_odes(state_vals, rhs_result, jac_result);

    // print the computed values to the terminal
    int i;

    printf("The right hand side of the equations evaluates to:\n");
    for (i=0; i < 14; i++) {
        printf("%lf\n", rhs_result[i]);
    }

    printf("\nThe Jacobian evaluates to:\n");
    for (i=0; i < 196; i++) {
        printf("%lf\n", jac_result[i]);
    }

    return 0;
}

Instead of using the ccode convenience function you learned earlier, let's use the underlying code printer class to do the printing. This will allow us to modify the class for custom printing further down.


In [ ]:
from sympy.printing.ccode import C99CodePrinter

All printing classes have to be instantiated and then the .doprint() method can be used to print SymPy expressions. Let's try to print the right hand side of the differential equations.


In [ ]:
printer = C99CodePrinter()

In [ ]:
print(printer.doprint(rhs_of_odes))

In this case, the C code printer does not do what we desire. It does not support printing a SymPy Matrix (see the first line of the output). In C, one possible representation of a matrix is an array type. The array type in C stores contiguous values, e.g. doubles, in a chunk of memory. You can declare an array of doubles in C like:

double my_array[10];

The word double is the data type of the individual values in the array which must all be the same. The word my_array is the variable name we choose to name the array and the [10] is the syntax to declare that this array will have 10 values.

The array is "empty" when first declared and can be filled with values like so:

my_array[0] = 5;
my_array[1] = 6.78;
my_array[2] = my_array[0] * 12;

or all at once when the array is declared, using an initializer list:

double my_array[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};

It is possible to declare multidimensional arrays in C that could map more directly to the indices of our two dimensional matrix, but in this case we will map our two dimensional matrix to a one dimensional array using C-contiguous (row-major) ordering.
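The row-major mapping can be written down as a one line formula. The sketch below uses a hypothetical helper, flat_index, that is not part of SymPy or of this lesson's code; it only illustrates where element (i, j) of a matrix lands in the flat array.

```python
def flat_index(i, j, n_cols):
    """Position of matrix element (i, j) in a row-major one dimensional array."""
    return i * n_cols + j


n_cols = 14  # the Jacobian in this lesson has 14 columns

print(flat_index(0, 0, n_cols))    # 0
print(flat_index(1, 0, n_cols))    # 14
print(flat_index(13, 13, n_cols))  # 195
```

The last value matches the size of jac_result in the C template: a 14 by 14 matrix needs 196 entries, indexed 0 through 195.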

The code printers are capable of dealing with this need through the assign_to keyword argument in the .doprint() method but we must define a SymPy object that is appropriate to be assigned to. In our case, since we want to assign a Matrix we need to use an appropriately sized Matrix symbol.


In [ ]:
rhs_result = sym.MatrixSymbol('rhs_result', 14, 1)

In [ ]:
print(rhs_result)

In [ ]:
print(rhs_result[0])

In [ ]:
print(printer.doprint(rhs_of_odes, assign_to=rhs_result))

Notice that we have proper array value assignment and valid lines of C code that can be used in our function.
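To see how a two dimensional matrix is flattened, here is a small sketch with a hypothetical 2 by 2 matrix. The names toy_matrix and toy_result are invented for illustration, and the import fallback is an assumption that the printer lives in sympy.printing.c in newer SymPy releases and in sympy.printing.ccode in the release used for this lesson.

```python
import sympy as sym

try:
    from sympy.printing.c import C99CodePrinter      # newer SymPy releases
except ImportError:
    from sympy.printing.ccode import C99CodePrinter  # module name used in this lesson

a, b = sym.symbols('a b')
toy_matrix = sym.Matrix([[a + b, a*b],
                         [a - b, a/b]])

# The MatrixSymbol must have the same shape as the matrix being assigned.
toy_result = sym.MatrixSymbol('toy_result', 2, 2)

toy_code = C99CodePrinter().doprint(toy_matrix, assign_to=toy_result)
print(toy_code)
```

The four entries should be assigned to toy_result[0] through toy_result[3], one row after the other.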

Exercise [5 min]

Print out valid C code for the Jacobian matrix.

Double Click For Solution


In [ ]:
# write your answer here

Changing the Behavior of the Printer

The SymPy code printers are relatively easy to extend. They are designed such that if you want to change how a particular SymPy object prints, for example a Symbol, then you only need to modify the _print_Symbol method of the printer. In general, the code printers have a method for every SymPy object and also many builtin types. Use tab completion with C99CodePrinter._print_ to see all of the options.

Once you find the method you want to modify, it is often useful to look at the existing implementation of the print method to see how the code is written.


In [ ]:
C99CodePrinter._print_Symbol??

Below is a simple example of overriding the Symbol printer method. Note that you should use the self._print() method instead of simply returning the string so that the proper printer, self._print_str(), is dispatched. This is most important if you are printing non-singletons, i.e. expressions that are made up of multiple singletons.


In [ ]:
C99CodePrinter._print_str??

In [ ]:
class MyCodePrinter(C99CodePrinter):
    def _print_Symbol(self, expr):
        return self._print("No matter what symbol you pass in I will always print:\n\nNi!")

In [ ]:
my_printer = MyCodePrinter()

In [ ]:
theta = sym.symbols('theta')
theta

In [ ]:
print(my_printer.doprint(theta))

Exercise [10 min]

One issue with our current code printer is that the expressions use the symbols y0, y1, ..., y13 instead of accessing the values directly from the arrays with state_vals[0], state_vals[1], ..., state_vals[13]. We could go back and rename our SymPy symbols to use brackets, but another way would be to override the _print_Symbol() method to print these symbols as we desire. Modify the code printer so that it prints with the proper array access in the expression.

Double Click For Solution: Subclassing

Double Click For Solution: Exact replacement


In [ ]:
# write your answer here

Bonus Exercise

Do this exercise if you finish the previous one quickly.

It turns out that calling pow() for small integer exponents executes slower than simply expanding the multiplication. For example, pow(x, 2) could be printed as x*x. Modify the C99CodePrinter._print_Pow method to expand the multiplication if the exponent is less than or equal to 4. You may want to have a look at the source code with printer._print_Pow??

Note that a Pow expression has an .exp for exponent and .base for the item being raised. For example $x^2$ would have:

expr = x**2
expr.base == x
expr.exp == 2
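The attributes can be checked interactively. This is only a small sketch of inspecting a Pow object, not a solution to the exercise; the is_Integer test is one possible way to tell integer exponents apart from fractional ones.

```python
import sympy as sym

x = sym.symbols('x')
expr = x**2

print(expr.base)            # x
print(expr.exp)             # 2
print(expr.exp.is_Integer)  # True

# A square root is also a Pow, but its exponent is not an integer.
root = x**sym.Rational(1, 2)
print(root.exp.is_Integer)  # False
```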

Double Click for Solution


In [ ]:
# write your answer here

Common Subexpression Elimination

If you look carefully at the expressions in the two matrices you'll see repeated expressions. These are not ideal in the sense that the computer has to repeat the exact same calculation multiple times. For large expressions this can be a major issue. Compilers, such as gcc, can often eliminate common subexpressions on their own when different optimization flags are invoked, but for complex expressions the algorithms in some compilers do not do a thorough job or compilation can take an extremely long time. SymPy has tools to perform common subexpression elimination that are both thorough and reasonably efficient. In particular, if gcc is run with the lowest optimization setting, -O0, cse can give large speedups.

For example if you have two expressions:

a = x*y + 5
b = x*y + 6

you can convert this to these three expressions:

z = x*y
a = z + 5
b = z + 6

and x*y only has to be computed once.

The cse() function in SymPy returns the subexpression, z = x*y, and the simplified expressions: a = z + 5, b = z + 6.
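The two-expression example above can be checked directly. In this sketch the names toy_sub_exprs and toy_simplified are chosen for illustration; note that SymPy generates its own names for the subexpressions (x0, x1, ...) rather than z.

```python
import sympy as sym

x, y = sym.symbols('x y')
a = x*y + 5
b = x*y + 6

# cse() returns a list of (symbol, subexpression) pairs and a list of
# rewritten expressions that use those symbols.
toy_sub_exprs, toy_simplified = sym.cse([a, b])

print(toy_sub_exprs)
print(toy_simplified)
```

Substituting each subexpression back into the simplified expressions recovers the original a and b.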

Here is how it works:


In [ ]:
sym.cse?

In [ ]:
sub_exprs, simplified_rhs = sym.cse(rhs_of_odes)

In [ ]:
for var, expr in sub_exprs:
    sym.pprint(sym.Eq(var, expr))

cse() can simplify several expressions at once, so it always returns the simplified expressions in a list. In our case there is one simplified expression, which can be accessed as the first item of the list.


In [ ]:
type(simplified_rhs)

In [ ]:
len(simplified_rhs)

In [ ]:
simplified_rhs[0]

You can find common subexpressions among multiple objects also:


In [ ]:
jac_of_odes = rhs_of_odes.jacobian(states)

sub_exprs, simplified_exprs = sym.cse((rhs_of_odes, jac_of_odes))

In [ ]:
for var, expr in sub_exprs:
    sym.pprint(sym.Eq(var, expr))

In [ ]:
simplified_exprs[0]

In [ ]:
simplified_exprs[1]

Exercise [15 min]

Use common subexpression elimination to print out C code for your two arrays such that:

double x0 = first_sub_expression;
...
double xN = last_sub_expression;

rhs_result[0] = expressions_containing_the_subexpressions;
...
rhs_result[13] = ...;

jac_result[0] = ...;
...
jac_result[195] = ...;

The code you create can be copied and pasted into the provided template above to make a C program. Refer back to the introduction to C code printing above.

To give you a bit of help we will first introduce the Assignment class. The printers know how to print variable assignments that are defined by an Assignment instance.


In [ ]:
from sympy.printing.codeprinter import Assignment

print(printer.doprint(Assignment(theta, 5)))
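Assignment combines naturally with the output of cse(). The following is a minimal sketch on two hypothetical scalar expressions, assuming Assignment can be imported from sympy.codegen.ast and that the output variables a and b are declared elsewhere in the C code. The import fallback for the printer is also an assumption about where the class lives in different SymPy releases.

```python
import sympy as sym
from sympy.codegen.ast import Assignment

try:
    from sympy.printing.c import C99CodePrinter      # newer SymPy releases
except ImportError:
    from sympy.printing.ccode import C99CodePrinter  # module name used in this lesson

x, y = sym.symbols('x y')
toy_printer = C99CodePrinter()
toy_sub_exprs, toy_simplified = sym.cse([x*y + 5, x*y + 6])

lines = []
# Declare and assign each common subexpression first.
for var, sub_expr in toy_sub_exprs:
    lines.append('double ' + toy_printer.doprint(Assignment(var, sub_expr)))
# Then assign the simplified expressions to the (hypothetical) outputs a and b.
for name, simplified in zip(sym.symbols('a b'), toy_simplified):
    lines.append(toy_printer.doprint(Assignment(name, simplified)))

print('\n'.join(lines))
```

The subexpression lines must come before the lines that use them, since C evaluates the statements in order.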

The following code demonstrates a way to use cse() to simplify single matrix objects. Note that we use ImmutableDenseMatrix because all dense matrices are internally converted to this type in the printers. Check the type of your matrices to see.


In [ ]:
class CMatrixPrinter(C99CodePrinter):
    def _print_ImmutableDenseMatrix(self, expr):
        sub_exprs, simplified = sym.cse(expr)
        lines = []
        for var, sub_expr in sub_exprs:
            lines.append('double ' + self._print(Assignment(var, sub_expr)))
        M = sym.MatrixSymbol('M', *expr.shape)
        # assign the simplified matrix so the output actually uses the subexpressions
        return '\n'.join(lines) + '\n' + self._print(Assignment(M, simplified[0]))

In [ ]:
p = CMatrixPrinter()
print(p.doprint(jac_of_odes))

Now create a custom printer that uses cse() on the two matrices simultaneously so that subexpressions are not repeated. Hint: think about how the list printer method, _print_list(self, list_of_exprs), might help here.

Double Click For Solution


In [ ]:
# write your answer here

Bonus Exercise: Compile and Run the C Program

Below we provide you with a template for the C program described above. You can use it by passing in a string like:

c_template.format(code='the holy grail')
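Because the template is filled in with str.format(), every literal brace of the C code has to be doubled and only {code} is treated as a placeholder. A minimal sketch with a hypothetical, much shorter template (toy_template is not part of the lesson) shows the effect:

```python
# Doubled braces become single literal braces; {code} is replaced.
toy_template = """\
void f(double out[2])
{{
{code}
}}"""

toy_source = toy_template.format(code='    out[0] = 1.0;\n    out[1] = 2.0;')
print(toy_source)
```

The printed result contains ordinary single braces around the function body, which is what the C compiler expects.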

Use this template and your code printer to create a file called run.c in the working directory.

To compile the code there are several options. The first is gcc (the GNU C Compiler). If you have Linux, Mac, or Windows (with MinGW installed) you can use the Jupyter notebook ! command to send your command to the terminal. For example:

!gcc run.c -lm -o run

This will compile run.c, link against the C math library with -lm, and write the output (-o) to a file named run (Mac/Linux) or run.exe (Windows).

On Mac and Linux the program can be executed with:

!./run

and on Windows:

!run.exe

Other options are the clang compiler or the Microsoft cl compiler on Windows (cl links the math functions automatically, so -lm is not needed there):

!clang run.c -lm -o run
!cl run.c

Double Click For Solution


In [ ]:
c_template = """\
#include <math.h>
#include <stdio.h>

void evaluate_odes(const double state_vals[14], double rhs_result[14], double jac_result[196])
{{
    // We need to fill in the code here using SymPy.
{code}
}}

int main() {{

    // initialize the state vector with some values
    double state_vals[14] = {{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14}};
    // create "empty" 1D arrays to hold the results of the computation
    double rhs_result[14];
    double jac_result[196];

    // call the function
    evaluate_odes(state_vals, rhs_result, jac_result);

    // print the computed values to the terminal
    int i;
    printf("The right hand side of the equations evaluates to:\\n");
    for (i=0; i < 14; i++) {{
        printf("%lf\\n", rhs_result[i]);
    }}
    printf("\\nThe Jacobian evaluates to:\\n");
    for (i=0; i < 196; i++) {{
        printf("%lf\\n", jac_result[i]);
    }}

    return 0;
}}\
"""

In [ ]:
# write your answer here