Basic example

Multiplying two integers is a simple first example. Additional information can be found in the ctypes documentation.

Write C code

First, open your favourite text editor and write the C code.

Linux

We want to multiply two integers and get an integer as the result:

int mul(int a, int b) {
    return a * b;
}

Note that a main entry-point function is not required.

Windows

To compile a DLL in Visual Studio and use it with ctypes, you must declare that the function uses C linkage and export it:

#ifdef __cplusplus
extern "C" {
#endif
__declspec(dllexport) int mul(int a, int b) {
    return a * b;
}
#ifdef __cplusplus
}
#endif

Compile shared library

On Linux you can use gcc:

gcc -shared basics.c -o lib_basics.so

Load created library in Python



In [1]:

    
from ctypes import cdll

In Linux



In [2]:

    
basics = cdll.LoadLibrary('./lib_basics.so')

In Windows

basics = cdll.LoadLibrary('lib_basics.dll')
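Choosing the file name by platform can be automated. The sketch below is one possible way to do it. The helper name library_path is hypothetical, and it assumes the library sits in the current directory and follows the lib_<name> naming used in this tutorial.

```python
import sys


def library_path(name, platform=sys.platform):
    """Return the file name of lib_<name> for the given platform.

    Hypothetical helper: assumes the naming scheme used in this
    tutorial (lib_basics.so on Linux, lib_basics.dll on Windows).
    """
    if platform.startswith('win'):
        return 'lib_{}.dll'.format(name)
    # The explicit './' makes the loader look in the current directory.
    return './lib_{}.so'.format(name)


print(library_path('basics'))
```

The result can then be passed to cdll.LoadLibrary in place of the hard-coded string.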

Usage

Now mul can be called as if it were an attribute of the basics object:



In [3]:

    
basics.mul(2, 5)

    Out[3]:

10

See? It's easy
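By default ctypes assumes that a foreign function returns a C int and guesses how to pass each argument. That happens to match mul, but declaring the signature lets ctypes check and convert the arguments for you. A minimal sketch, assuming the lib_basics.so built above; load_basics is a hypothetical helper name, not part of ctypes:

```python
from ctypes import CDLL, c_int


def load_basics(path='./lib_basics.so'):
    """Load the library and declare the C signature of mul."""
    basics = CDLL(path)  # raises OSError if the file cannot be loaded
    basics.mul.argtypes = [c_int, c_int]  # int mul(int a, int b)
    basics.mul.restype = c_int
    return basics
```

With the signature declared, ctypes converts each argument to the declared type and raises ctypes.ArgumentError for values it cannot convert.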

Dot product of two arrays

An array is a contiguous block of memory divided into elements of a single type, and C arrays are easy to use with ctypes.

C function

Multiply two arrays element-wise and sum the products:

#include <stdlib.h>

int dot(int* a, int* b, size_t length) {
    int result = 0;
    while (length-- > 0) {
        result += a[length] * b[length];
    }
    return result;
}

Create arrays

We need to import the C int data type from ctypes:



In [4]:

    
from ctypes import c_int

Say we need to multiply three-dimensional vectors:



In [5]:

    
first = (c_int * 3)(1, 2, 3)

We can also create an alias for this array type and reuse it:



In [6]:

    
vector3D = c_int * 3
second = vector3D(4, 5, 6)
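A ctypes array behaves much like a fixed-length Python sequence. The sketch below builds an array from an existing list and reads it back; the names values, IntArray and vector are only illustrative.

```python
from ctypes import c_int, sizeof

values = [4, 5, 6]

# The array type depends on the length, so build it from the data.
IntArray = c_int * len(values)
# Unpack the list: the constructor takes elements, not a sequence.
vector = IntArray(*values)

print(len(vector))    # number of elements
print(list(vector))   # copy back into a Python list
print(sizeof(vector) == len(values) * sizeof(c_int))

vector[0] = 10        # elements can be assigned in place
print(vector[0], vector[1:3])
```

Slicing returns an ordinary Python list, so the C memory is only touched through indexing and assignment.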

Call the function



In [7]:

    
c_result = basics.dot(first, second, 3)
python_result = sum(a * b for a, b in zip([1, 2, 3], [4, 5, 6]))
print('C code returned', c_result, 'and Python code returned', python_result)

C code returned 32 and Python code returned 32



In [8]:

    
basics.dot((c_int*1)(2), (c_int*1)(*[3]), 1)

    Out[8]:

6

The following examples raise errors:



In [9]:

    
try:
    vector3D([1, 2, 3])
except TypeError:
    print('You cannot pass lists')
try:
    vector3D(0, 1, 2, 3)
except IndexError:
    print('Forbidden to provide more elements than it should accept')

You cannot pass lists
Forbidden to provide more elements than it should accept

Available types

The following types can be used to pass arguments to C functions:

ctypes type	C type	Python type
c_bool	_Bool	bool
c_char	char	1-character bytes object
c_wchar	wchar_t	1-character string
c_byte	char	int
c_ubyte	unsigned char	int
c_short	short	int
c_ushort	unsigned short	int
c_int	int	int
c_uint	unsigned int	int
c_long	long	int
c_ulong	unsigned long	int
c_longlong	__int64 or long long	int
c_ulonglong	unsigned __int64 or unsigned long long	int
c_size_t	size_t	int
c_ssize_t	ssize_t or Py_ssize_t	int
c_float	float	float
c_double	double	float
c_longdouble	long double	float
c_char_p	char * (NUL terminated)	bytes object or None
c_wchar_p	wchar_t * (NUL terminated)	string or None
c_void_p	void *	int or None
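Values stored in these types follow C semantics rather than Python semantics. The short sketch below illustrates that with a few of the types from the table; it uses only documented ctypes behaviour.

```python
from ctypes import c_char_p, c_double, c_float, c_ubyte

# Integer types do no overflow checking: values wrap around
# to the width of the C type.
print(c_ubyte(256).value)   # wraps to 0
print(c_ubyte(-1).value)    # wraps to 255

# c_float is single precision, so 0.1 is rounded differently than
# in a Python float, which is a C double.
single = c_float(0.1).value
double = c_double(0.1).value
print(single == 0.1, double == 0.1)

# c_char_p wraps a NUL-terminated bytes object.
print(c_char_p(b'hello').value)
```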

Change the long passed to your function

In CPython, integers of arbitrary size (bignums) are stored as arrays of fixed-size digits:

{d0, d1, d2, ...}

We can change each one of them, so why not?

Read the documentation about Python API for C.
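To get a feel for the digit array, the decomposition can be modelled in pure Python. This is only a model of the representation, not a view of the actual memory; the helpers digits and from_digits are hypothetical names, and the digit width is taken from sys.int_info.

```python
import sys

# Number of value bits per digit in this interpreter (15 or 30).
BITS = sys.int_info.bits_per_digit


def digits(n):
    """Split abs(n) into base 2**BITS digits, least significant first."""
    n = abs(n)
    mask = (1 << BITS) - 1
    result = []
    while n:
        result.append(n & mask)
        n >>= BITS
    return result


def from_digits(ds):
    """Rebuild the non-negative integer from its digits."""
    return sum(d << (BITS * i) for i, d in enumerate(ds))


print(digits(2**10))    # small enough to fit in a single digit
print(digits(2**100))   # needs several digits
```

A value such as 2**10 occupies only digit 0, so overwriting that one digit replaces the whole number.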

Prepare shared library

C function:

#include <Python.h>

int set_long(PyLongObject* o, long new_value,
             size_t digit) {
    o->ob_digit[digit] = new_value;
    return 0;
}

The file should be compiled into a shared library (a DLL on Windows).

Makefile for Linux:

FLAGS=-shared
LIBRARIES=-I/usr/include/python3.4
BUILD_LIBRARY=gcc $(FLAGS) $(LIBRARIES)
all:
	$(BUILD_LIBRARY) setters.c -o lib_setters.so

Use shared library

It's handy to create a Python wrapper for this C function:



In [10]:

    
from ctypes import cdll, c_long, c_size_t, c_void_p

setters = cdll.LoadLibrary('./lib_setters.so')

def change_long(a, b=0, digit=0):
    setters.set_long(c_void_p(id(a)), c_long(b), c_size_t(digit))
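The wrapper relies on a CPython implementation detail: id(a) is the memory address of the object, so it can be handed to C as a void pointer. The sketch below checks that relationship from the Python side, without the shared library, by turning the address back into an object.

```python
from ctypes import cast, py_object

a = 2**100

# In CPython, id(a) is the address of the object in memory.
address = id(a)

# Going the other way: reinterpret the address as a Python object.
same = cast(address, py_object).value
print(same is a)
```

This only holds for CPython; other interpreters are free to return something else from id.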

Keep in mind that the Python interpreter caches small integers such as 0 instead of creating new objects for them. We must not overwrite such numbers, because the change would show up everywhere they are used:



In [11]:

    
from ctypes import c_long, c_size_t, c_void_p

def change_long(a, b=0, digit=0):
    args = (a, b, digit)
    if not all(type(arg) is int for arg in args):
        raise TypeError('All parameters should be of type "int", '
                        'but {} provided'.format(list(map(type, args))))
    # Cached small integers are shared: a + 0 gives back the same object
    if a + 0 is a:
        raise ValueError('No way. You don\'t want to break '
                         'your interpreter, right?')
    setters.set_long(c_void_p(id(a)), c_long(b), c_size_t(digit))

Recall that rebinding a parameter inside a Python function does not change the caller's integer:



In [12]:

    
def variable_info(text, variable):
    print('{:^30}: {:#05x} ({:#x})'.format(text, variable, id(variable)))

def foo(a, new_value):
    a = new_value

a = 2**10
variable_info('Before function call', a)
foo(a, 5)
variable_info('After function call', a)

     Before function call     : 0x400 (0x7f53d6566e50)
     After function call      : 0x400 (0x7f53d6566e50)

Now set that aside and look at what our wrapper does:



In [13]:

    
a = 2**10
b = a
variable_info('Before function call', a)
change_long(a, 2, 0)
variable_info('After function call', a)
variable_info('What\'s about b? Here it is', b)

     Before function call     : 0x400 (0x7f53d6566e30)
     After function call      : 0x002 (0x7f53d6566e30)
  What's about b? Here it is  : 0x002 (0x7f53d6566e30)

Cross product

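$$\mathbf{u} \times \mathbf{v} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ u_x & u_y & u_z \\ v_x & v_y & v_z \end{vmatrix}$$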
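Expanding this determinant along its first row gives the component formulas that the implementations below compute:

```latex
\mathbf{u} \times \mathbf{v}
  = (u_y v_z - u_z v_y)\,\mathbf{i}
  + (u_z v_x - u_x v_z)\,\mathbf{j}
  + (u_x v_y - u_y v_x)\,\mathbf{k}
```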



In [14]:

    
from numpy import array, cross

basis = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1]
]

Plain Python



In [15]:

    
def py_cross(u, v):
    return [
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0]
    ]

py_cross(basis[0], basis[1])

    Out[15]:

[0, 0, 1]

NumPy



In [16]:

    
cross(array(basis[0]), array(basis[1]))

    Out[16]:

array([0, 0, 1])

C

In C it is better not to allocate a new array, but to have the caller pass in an array that receives the result:

int cross(float* u, float* v, float* w) {
    w[0] = u[1] * v[2] - u[2] * v[1];
    w[1] = u[2] * v[0] - u[0] * v[2];
    w[2] = u[0] * v[1] - u[1] * v[0];
    return 0;
}
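The output-buffer convention can be tried out without compiling anything. In the sketch below a Python callback wrapped with CFUNCTYPE stands in for the compiled C function; this is an assumption made purely for illustration, and the names cross_stub, FloatPtr and Vector3 are hypothetical. It is not how the library itself is called.

```python
from ctypes import CFUNCTYPE, POINTER, c_float, c_int

FloatPtr = POINTER(c_float)
# Same signature as the C function: int cross(float*, float*, float*)
CROSS_PROTOTYPE = CFUNCTYPE(c_int, FloatPtr, FloatPtr, FloatPtr)


def _cross(u, v, w):
    # Stand-in for the compiled code: writes into the caller's buffer.
    w[0] = u[1] * v[2] - u[2] * v[1]
    w[1] = u[2] * v[0] - u[0] * v[2]
    w[2] = u[0] * v[1] - u[1] * v[0]
    return 0


cross_stub = CROSS_PROTOTYPE(_cross)

Vector3 = c_float * 3
u = Vector3(1, 0, 0)
v = Vector3(0, 1, 0)
w = Vector3()  # zero-initialised output buffer owned by the caller

status = cross_stub(u, v, w)
print(status, list(w))
```

The caller owns all three buffers, so nothing has to be allocated or freed on the C side.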



In [17]:

    
from ctypes import cdll
from numpy import empty_like

c_cross = cdll.LoadLibrary('./lib_cross.so')
u = array(basis[0]).astype('f')
v = array(basis[1]).astype('f')
w = empty_like(u)

def cross_wrapper(u, v, w):
    return c_cross.cross(u.ctypes.get_as_parameter(),
                         v.ctypes.get_as_parameter(),
                         w.ctypes.get_as_parameter())

cross_wrapper(u, v, w)
print(w)

[ 0.  0.  1.]

Let's run performance tests



In [18]:

    
from numpy.random import rand

BIG_ENOUGH_INTEGER = int(1E5)

vectors_u = rand(BIG_ENOUGH_INTEGER, 3).astype('f')
vectors_v = rand(BIG_ENOUGH_INTEGER, 3).astype('f')

print('Vectors u:', vectors_u)

Vectors u: [[ 0.40594503  0.36848962  0.61843681]
 [ 0.785739    0.86852419  0.51811469]
 [ 0.81737614  0.6985932   0.9233498 ]
 ..., 
 [ 0.22998935  0.07304657  0.86156619]
 [ 0.60858935  0.91047096  0.22362016]
 [ 0.23498417  0.88949347  0.89486855]]



In [19]:

    
%%timeit
for i in range(BIG_ENOUGH_INTEGER):
    py_cross(vectors_u[i], vectors_v[i])

1 loop, best of 3: 387 ms per loop



In [20]:

    
%%timeit
cross(vectors_u, vectors_v)

100 loops, best of 3: 2.26 ms per loop



In [21]:

    
%%timeit
vectors_w = empty_like(vectors_u)

for i in range(BIG_ENOUGH_INTEGER):
    cross_wrapper(vectors_u[i], vectors_v[i], vectors_w[i])

1 loop, best of 3: 2.95 s per loop

Are the calculations right?



In [22]:

    
from numpy import allclose

np_result = cross(vectors_u, vectors_v)

py_result = [py_cross(vectors_u[i], vectors_v[i])
             for i in range(BIG_ENOUGH_INTEGER)]
print(allclose(np_result, py_result))

vectors_w = empty_like(vectors_u)
assert sum([cross_wrapper(vectors_u[i], vectors_v[i], vectors_w[i])
            for i in range(BIG_ENOUGH_INTEGER)]) == 0
print(allclose(np_result, vectors_w))

True
True

NumPy versus human: final battle

What have we done wrong? C code should be faster! Maybe the Python loop is the issue?

int cross_vectors(float *u, float *v, float *w,
                  size_t amount) {
    while (amount-- > 0) {
        cross(&u[amount * 3], &v[amount * 3],
              &w[amount * 3]);
    }
    return 0;
}

It's better to compile with optimization enabled, and to pass the -fPIC flag to avoid the following error:

relocation against symbol `cross' can not be used when making a shared object

What we get:

gcc -shared -fPIC cross.c -O3 -o lib_cross.so

NumPy arrays are seen as flat, one-dimensional C arrays when passed to C. Also note that len returns the number of rows of a matrix; to get the total number of elements, use the size attribute.
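The flat layout can be illustrated with plain ctypes arrays, which are used here as a stand-in for a C-contiguous NumPy array (an assumption for the sake of a dependency-free sketch). Rows are stored one after another, so element (i, j) of a matrix with three columns sits at flat index i * 3 + j, which is exactly what cross_vectors relies on.

```python
from ctypes import POINTER, c_float, cast

Row = c_float * 3
Matrix = Row * 2  # 2 rows of 3 floats, stored contiguously

matrix = Matrix(Row(1, 2, 3), Row(4, 5, 6))

# View the same memory as a flat float*, as a C function would.
flat = cast(matrix, POINTER(c_float))

rows, columns = len(matrix), len(matrix[0])
print(rows, rows * columns)  # number of rows versus number of elements

# Element (i, j) lives at flat index i * columns + j.
print([flat[i] for i in range(rows * columns)])
print(flat[1 * columns + 2] == matrix[1][2])
```

As with a NumPy matrix, len gives the number of rows, while the C side has to be told how many rows or elements to process.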



In [23]:

    
vectors_w = empty_like(vectors_u)

c_vectors_u = vectors_u.ctypes.get_as_parameter()
c_vectors_v = vectors_v.ctypes.get_as_parameter()
c_vectors_w = vectors_w.ctypes.get_as_parameter()



In [24]:

    
%%timeit
vectors_w = empty_like(vectors_u)

c_vectors_w = vectors_w.ctypes.get_as_parameter()
c_cross.cross_vectors(c_vectors_u, c_vectors_v, c_vectors_w, len(vectors_u))

1000 loops, best of 3: 648 µs per loop



In [25]:

    
c_cross.cross_vectors(c_vectors_u, c_vectors_v, c_vectors_w, len(vectors_u))
print(allclose(np_result, vectors_w))

True

Are you surprised?