Basic example

Multiplying two integers is about the simplest thing a C function can do, which makes it a good first example. Additional information can be found in the documentation.

Write C code

First of all, open your favourite text editor and write the C code there.

Linux

We want to multiply two integers and get an integer as the result.

int mul(int a, int b) {
    return a * b;
}

Note that a main entry point function is not required.

Windows

To compile a DLL in Visual Studio and use it with ctypes, you should specify that the function is written in C and needs to be exported.

#ifdef __cplusplus
extern "C" {
#endif
__declspec(dllexport) int mul(int a, int b) {
    return a * b;
}
#ifdef __cplusplus
}
#endif

Compile shared library

On Linux you can use gcc:

gcc -shared basics.c -o lib_basics.so

Load created library in Python


In [1]:
from ctypes import cdll

In Linux


In [2]:
basics = cdll.LoadLibrary('./lib_basics.so')

In Windows

basics = cdll.LoadLibrary('lib_basics.dll')
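If the same script has to run on both systems, the file name can be picked at run time. The following is only a sketch: the helper name shared_library_path is hypothetical, and it assumes the lib_<name>.so / lib_<name>.dll naming used above. The actual loading call is left as a comment so that the snippet runs even before the library is built.

```python
import sys


def shared_library_path(name):
    """Return the platform-specific file name for a library called lib_<name>."""
    if sys.platform.startswith('win'):
        return 'lib_{}.dll'.format(name)
    return './lib_{}.so'.format(name)


path = shared_library_path('basics')
print(path)

# Once the library has been compiled, it could be loaded like this:
# from ctypes import cdll
# basics = cdll.LoadLibrary(path)
```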

Usage

Now it's pretty simple to call the mul function as if it were an attribute of the basics module.


In [3]:
basics.mul(2, 5)


Out[3]:
10

See? It's easy
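The call above works because ctypes converts Python integers to C int by default. It is usually safer to declare the signature explicitly, so that wrong argument types are rejected before they reach C. Here is a minimal sketch of that idea. To keep it runnable without lib_basics.so, a Python lambda wrapped in a CFUNCTYPE prototype stands in for the compiled mul; that stand-in is an assumption of this example, not part of the library.

```python
from ctypes import CFUNCTYPE, ArgumentError, c_int

# Prototype matching `int mul(int a, int b)`:
# the return type comes first, then the argument types.
MulPrototype = CFUNCTYPE(c_int, c_int, c_int)

# Stand-in for the compiled function (assumption: used only so that
# the sketch runs without lib_basics.so).
mul = MulPrototype(lambda a, b: a * b)

product = mul(2, 5)
print(product)

# A float is not accepted where an int was declared.
try:
    mul(2.5, 5)
    rejected = False
except ArgumentError:
    rejected = True
print(rejected)

# With the real library, the equivalent declaration would be:
# basics.mul.argtypes = [c_int, c_int]
# basics.mul.restype = c_int
```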

Dot product of two arrays

An array is a contiguous block of memory split into elements of a single type, and C arrays are easy to use with ctypes.

C function

Just multiply two arrays element-wise and sum the result

#include <stdlib.h>

int dot(int* a, int* b, size_t length) {
    int result = 0;
    while (length-- > 0) {
        result += a[length] * b[length];
    }
    return result;
}

Create arrays

We need to import the C int data type from ctypes.


In [4]:
from ctypes import c_int

Say, we need to multiply 3-dimensional vectors


In [5]:
first = (c_int * 3)(1, 2, 3)

We can create an alias for this data type and use it


In [6]:
vector3D = c_int * 3
second = vector3D(4, 5, 6)

Call the function


In [7]:
c_result = basics.dot(first, second, 3)
python_result = sum(a * b for a, b in zip([1, 2, 3], [4, 5, 6]))
print('C code returned', c_result, 'and Python code returned', python_result)


C code returned 32 and Python code returned 32

In [8]:
basics.dot((c_int*1)(2), (c_int*1)(*[3]), 1)


Out[8]:
6
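The dot function takes two pointers and a size_t, while ctypes passes plain Python integers as C int unless told otherwise. Declaring the argument types removes that guesswork. The sketch below shows what such a declaration looks like. As an assumption made for the sake of a runnable example, a Python function wrapped in a CFUNCTYPE prototype plays the role of the compiled dot.

```python
from ctypes import CFUNCTYPE, POINTER, c_int, c_size_t

# Prototype matching `int dot(int* a, int* b, size_t length)`.
DotPrototype = CFUNCTYPE(c_int, POINTER(c_int), POINTER(c_int), c_size_t)


def py_dot(a, b, length):
    # a and b arrive as POINTER(c_int) objects, which support indexing.
    return sum(a[i] * b[i] for i in range(length))


# Stand-in for the compiled function (assumption: lib_basics.so is not needed).
dot = DotPrototype(py_dot)

first = (c_int * 3)(1, 2, 3)
second = (c_int * 3)(4, 5, 6)

# Arrays of c_int are accepted where POINTER(c_int) is declared.
dot_result = dot(first, second, len(first))
print(dot_result)

# With the real library, the equivalent declaration would be:
# basics.dot.argtypes = [POINTER(c_int), POINTER(c_int), c_size_t]
# basics.dot.restype = c_int
```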

The following examples raise errors.


In [9]:
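try:
    vector3D([1, 2, 3])
except TypeError:
    # A list is a single argument, and it is not an integer
    print('You cannot pass lists')
try:
    vector3D(0, 1, 2, 3)
except IndexError:
    # Four initializers do not fit into an array of three elements
    print('Forbidden to provide more elements than it should accept')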


You cannot pass lists
Forbidden to provide more elements than it should accept
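A list can still be used as a source of values if it is unpacked into separate arguments. The helper below is a small sketch (the name to_c_array is hypothetical) that builds an array type of the right length and fills it from any Python sequence of integers.

```python
from ctypes import c_int


def to_c_array(values):
    """Build a C int array holding a copy of the given Python integers."""
    array_type = c_int * len(values)
    # The star unpacks the sequence into separate positional arguments.
    return array_type(*values)


numbers = to_c_array([1, 2, 3])
print(len(numbers), list(numbers))

# ctypes arrays support indexing and item assignment like lists.
numbers[0] = 10
print(list(numbers))
```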

Available types

The following types can be used to pass arguments to C functions.

ctypes type     C type                                    Python type
c_bool          _Bool                                     bool
c_char          char                                      1-character bytes object
c_wchar         wchar_t                                   1-character string
c_byte          char                                      int
c_ubyte         unsigned char                             int
c_short         short                                     int
c_ushort        unsigned short                            int
c_int           int                                       int
c_uint          unsigned int                              int
c_long          long                                      int
c_ulong         unsigned long                             int
c_longlong      __int64 or long long                      int
c_ulonglong     unsigned __int64 or unsigned long long    int
c_size_t        size_t                                    int
c_ssize_t       ssize_t or Py_ssize_t                     int
c_float         float                                     float
c_double        double                                    float
c_longdouble    long double                               float
c_char_p        char * (NUL terminated)                   bytes object or None
c_wchar_p       wchar_t * (NUL terminated)                string or None
c_void_p        void *                                    int or None
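A few of these types can be tried out directly, without any C library. The sketch below shows how values are converted on the way in and out. Note that ctypes does no overflow checking on integer types, so out-of-range values wrap around silently.

```python
from ctypes import (c_bool, c_byte, c_char_p, c_double, c_int, c_ubyte,
                    sizeof)

# Integer types do no overflow checking: values wrap around.
wrapped_unsigned = c_ubyte(257).value   # 257 modulo 256
wrapped_signed = c_byte(200).value      # 200 does not fit into a signed byte

# Python ints are accepted by floating point types.
as_double = c_double(1).value

# Any truthy value becomes True.
as_bool = c_bool(5).value

# c_char_p wraps a bytes object.
text = c_char_p(b'hello').value

print(wrapped_unsigned, wrapped_signed, as_double, as_bool, text)

# The size of int depends on the platform.
print(sizeof(c_int))
```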

Change the long passed to your function

In CPython, large numbers (bignums) are stored as arrays of fixed-size digits:

{d0, d1, d2, ...}

We can change each one of them, so why not?

Read the documentation about Python API for C.
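To get a feeling for this representation, the digits can be reproduced in pure Python. The sketch below assumes CPython, where sys.int_info.bits_per_digit reports the digit width. The helper names to_digits and from_digits are hypothetical and only mimic the layout; they do not read the interpreter's memory.

```python
import sys

# Width of one internal digit: 30 on most 64-bit builds, 15 on others.
BITS_PER_DIGIT = sys.int_info.bits_per_digit


def to_digits(n):
    """Split abs(n) into digits, least significant first."""
    mask = (1 << BITS_PER_DIGIT) - 1
    digits = []
    n = abs(n)
    while True:
        digits.append(n & mask)
        n >>= BITS_PER_DIGIT
        if n == 0:
            return digits


def from_digits(digits):
    """Rebuild the non-negative integer from its digits."""
    return sum(d << (i * BITS_PER_DIGIT) for i, d in enumerate(digits))


print(to_digits(2 ** 10))
print(to_digits(2 ** 100))
```

A value such as 2**10 fits into a single digit, which is why changing digit 0 later in this section replaces the whole number.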

Prepare shared library

C function:

#include <Python.h>

int set_long(PyLongObject* o, long new_value, size_t digit) {
    o->ob_digit[digit] = new_value;
    return 0;
}

The file should be compiled into a shared library (a DLL on Windows).

Makefile for Linux:

FLAGS=-shared
LIBRARIES=-I/usr/include/python3.4
BUILD_LIBRARY=gcc $(FLAGS) $(LIBRARIES)
all:
    $(BUILD_LIBRARY) setters.c -o lib_setters.so

Use shared library

It's handy to create a Python wrapper for this C function.


In [10]:
from ctypes import cdll, c_long, c_size_t, c_voidp

setters = cdll.LoadLibrary('./lib_setters.so')

def change_long(a, b=0, digit=0):
    setters.set_long(c_voidp(id(a)), c_long(b), c_size_t(digit))

Don't forget that the Python interpreter does not create new objects for small integers like 0; it reuses cached ones. We should avoid assigning new values to such numbers, because the change would be visible everywhere they are used.


In [11]:
from ctypes import c_long, c_size_t, c_voidp

def change_long(a, b=0, digit=0):
    args = (a, b, digit)
    if not all(type(arg) is int for arg in args):
        # list(...) is needed in Python 3, where map returns a lazy iterator
        raise TypeError('All parameters should be of type "int", '
                        'but {} provided'.format(list(map(type, args))))
    # Cached small integers are shared: a + 0 gives back the very same object
    if a + 0 is a:
        raise ValueError('No way. You don\'t want to break '
                         'your interpreter, right?')
    setters.set_long(c_voidp(id(a)), c_long(b), c_size_t(digit))
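The a + 0 is a test in the wrapper relies on a CPython implementation detail: small integers are cached, so arithmetic that produces one of them returns the shared object instead of a new one. The sketch below isolates that check. The helper name is_shared_object is hypothetical, and the behaviour is an assumption about CPython rather than a language guarantee.

```python
def is_shared_object(n):
    """Return True if n + 0 yields the very same object as n."""
    return n + 0 is n


small = 0
large = 10 ** 20

shared_small = is_shared_object(small)   # cached small integer
shared_large = is_shared_object(large)   # a fresh object is created
print(shared_small, shared_large)
```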

Recall that rebinding an integer parameter inside a Python function does not change the caller's variable.


In [12]:
def variable_info(text, variable):
    print('{:^30}: {:#05x} ({:#x})'.format(text, variable, id(variable)))

def foo(a, new_value):
    a = new_value

a = 2**10
variable_info('Before function call', a)
foo(a, 5)
variable_info('After function call', a)


     Before function call     : 0x400 (0x7f53d6566e50)
     After function call      : 0x400 (0x7f53d6566e50)
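The reason is that assignment inside a function only rebinds a local name. For contrast, the sketch below puts rebinding next to in-place mutation of a list, which the caller does see. The function names are made up for this illustration.

```python
def rebind(value, new_value):
    # Only the local name `value` is changed; the caller's name is untouched.
    value = new_value


def mutate(items, new_value):
    # The list object itself is modified, so the caller sees the change.
    items[0] = new_value


number = 2 ** 10
rebind(number, 5)
print(number)

numbers = [2 ** 10]
mutate(numbers, 5)
print(numbers)
```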

Now forget it and take a look at what we've done


In [13]:
a = 2**10
b = a
variable_info('Before function call', a)
change_long(a, 2, 0)
variable_info('After function call', a)
variable_info('What\'s about b? Here it is', b)


     Before function call     : 0x400 (0x7f53d6566e30)
     After function call      : 0x002 (0x7f53d6566e30)
  What's about b? Here it is  : 0x002 (0x7f53d6566e30)

Cross product

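$$\mathbf{u} \times \mathbf{v} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ u_x & u_y & u_z \\ v_x & v_y & v_z \end{vmatrix}$$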
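Expanding this determinant along its first row gives the three components that the implementations in this section compute, with index 0, 1, 2 standing for x, y, z:

```latex
\mathbf{u} \times \mathbf{v} =
    (u_y v_z - u_z v_y)\,\mathbf{i}
  + (u_z v_x - u_x v_z)\,\mathbf{j}
  + (u_x v_y - u_y v_x)\,\mathbf{k}
```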


In [14]:
from numpy import array, cross

basis = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1]
]

Plain Python


In [15]:
def py_cross(u, v):
    return [
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0]
    ]

py_cross(basis[0], basis[1])


Out[15]:
[0, 0, 1]

NumPy


In [16]:
cross(array(basis[0]), array(basis[1]))


Out[16]:
array([0, 0, 1])

C

It's better not to allocate a new array in C, but to pass in an output array and store the result there.

int cross(float* u, float* v, float* w) {
    w[0] = u[1] * v[2] - u[2] * v[1];
    w[1] = u[2] * v[0] - u[0] * v[2];
    w[2] = u[0] * v[1] - u[1] * v[0];
    return 0;
}

In [17]:
from ctypes import cdll
from numpy import empty_like

c_cross = cdll.LoadLibrary('./lib_cross.so')
u = array(basis[0]).astype('f')
v = array(basis[1]).astype('f')
w = empty_like(u)

def cross_wrapper(u, v, w):
    return c_cross.cross(u.ctypes.get_as_parameter(),
                         v.ctypes.get_as_parameter(),
                         w.ctypes.get_as_parameter())

cross_wrapper(u, v, w)
print(w)


[ 0.  0.  1.]

Let's run performance tests


In [18]:
from numpy.random import rand

BIG_ENOUGH_INTEGER = int(1E5)

vectors_u = rand(BIG_ENOUGH_INTEGER, 3).astype('f')
vectors_v = rand(BIG_ENOUGH_INTEGER, 3).astype('f')

print('Vectors u:', vectors_u)


Vectors u: [[ 0.40594503  0.36848962  0.61843681]
 [ 0.785739    0.86852419  0.51811469]
 [ 0.81737614  0.6985932   0.9233498 ]
 ..., 
 [ 0.22998935  0.07304657  0.86156619]
 [ 0.60858935  0.91047096  0.22362016]
 [ 0.23498417  0.88949347  0.89486855]]

In [19]:
%%timeit
for i in range(BIG_ENOUGH_INTEGER):
    py_cross(vectors_u[i], vectors_v[i])


1 loop, best of 3: 387 ms per loop

In [20]:
%%timeit
cross(vectors_u, vectors_v)


100 loops, best of 3: 2.26 ms per loop

In [21]:
%%timeit
vectors_w = empty_like(vectors_u)

for i in range(BIG_ENOUGH_INTEGER):
    cross_wrapper(vectors_u[i], vectors_v[i], vectors_w[i])


1 loop, best of 3: 2.95 s per loop

Are calculations right?


In [22]:
from numpy import allclose

np_result = cross(vectors_u, vectors_v)

py_result = [py_cross(vectors_u[i], vectors_v[i])
             for i in range(BIG_ENOUGH_INTEGER)]
print(allclose(np_result, py_result))

vectors_w = empty_like(vectors_u)
assert sum([cross_wrapper(vectors_u[i], vectors_v[i], vectors_w[i])
            for i in range(BIG_ENOUGH_INTEGER)]) == 0
print(allclose(np_result, vectors_w))


True
True

NumPy versus human: final battle

What have we done wrong? C code should be faster! Maybe the Python loop is the issue?

int cross_vectors(float *u, float *v, float *w,
                  size_t amount) {
    /* Vector number n occupies elements 3n, 3n + 1 and 3n + 2 */
    while (amount-- > 0) {
        cross(&u[amount * 3], &v[amount * 3],
              &w[amount * 3]);
    }
    return 0;
}
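The indexing in cross_vectors may be easier to follow in Python. The sketch below is a hypothetical pure-Python counterpart (py_cross_vectors is not part of the original code) that works on flat sequences in which every three consecutive numbers form one vector, just as the C function sees its arguments.

```python
def py_cross_vectors(u, v, amount):
    """Cross products of `amount` 3D vectors stored in flat sequences."""
    w = [0] * (amount * 3)
    for n in range(amount):
        base = n * 3  # index of the first component of vector number n
        w[base] = u[base + 1] * v[base + 2] - u[base + 2] * v[base + 1]
        w[base + 1] = u[base + 2] * v[base] - u[base] * v[base + 2]
        w[base + 2] = u[base] * v[base + 1] - u[base + 1] * v[base]
    return w


# Two pairs of basis vectors: i x j and j x k.
flat_u = [1, 0, 0,  0, 1, 0]
flat_v = [0, 1, 0,  0, 0, 1]
flat_w = py_cross_vectors(flat_u, flat_v, 2)
print(flat_w)
```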

It's better to compile with optimization, and to use the -fPIC flag to avoid the following compilation error:

relocation against symbol `cross' can not be used when making a shared object

What we get:

gcc -shared -fPIC cross.c -O3 -o lib_cross.so

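NumPy arrays are exposed to C as a flat, row-major block of memory. Also note that len returns the number of rows of a matrix; if you want the total number of elements, use the size attribute. Here cross_vectors expects the number of vectors, so len(vectors_u) is the right value to pass.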
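The same row-major layout can be observed with plain ctypes arrays, which keeps the following sketch runnable without NumPy. A two-dimensional ctypes array plays the role of a matrix with 2 rows and 3 columns. It is only an analogy for what the C function receives, not a replacement for the NumPy code in this section.

```python
from ctypes import POINTER, c_float, cast

Row = c_float * 3
Matrix = Row * 2
matrix = Matrix(Row(1, 2, 3), Row(4, 5, 6))

# len gives the number of rows, not the number of elements.
rows = len(matrix)
total = len(matrix) * len(matrix[0])

# Viewed through a float pointer, the rows simply follow one another.
flat = cast(matrix, POINTER(c_float))
values = [flat[i] for i in range(total)]
print(rows, total, values)

# Row number 1 starts at flat index 1 * 3, as in &u[amount * 3].
second_row_start = flat[1 * 3]
```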


In [23]:
vectors_w = empty_like(vectors_u)

c_vectors_u = vectors_u.ctypes.get_as_parameter()
c_vectors_v = vectors_v.ctypes.get_as_parameter()
c_vectors_w = vectors_w.ctypes.get_as_parameter()

In [24]:
%%timeit
vectors_w = empty_like(vectors_u)

c_vectors_w = vectors_w.ctypes.get_as_parameter()
c_cross.cross_vectors(c_vectors_u, c_vectors_v, c_vectors_w, len(vectors_u))


1000 loops, best of 3: 648 µs per loop

In [25]:
c_cross.cross_vectors(c_vectors_u, c_vectors_v, c_vectors_w, len(vectors_u))
print(allclose(np_result, vectors_w))


True

Are you surprised?