Prerequisites

In order to run these examples, it is recommended to use gcc as the default compiler, with OpenMP installed.

Numba and Cython can be installed easily with conda:

conda install numba

conda install cython

The include path of the NumPy C header files might have to be added to the .bashrc (Linux) or .bash_profile (Mac) files to make Numba run:


In [1]:
import numpy as np
print "export CFLAGS=\"-I",np.__path__[0]+'/core/include/ $CFLAGS\"'


export CFLAGS="-I /home/ghajdu/bin/miniconda2/lib/python2.7/site-packages/numpy/core/include/ $CFLAGS"

Append the output of the print command above to the .bashrc (Linux) or .bash_profile (Mac) file in the default user library, if Numba does not work out of the box.

Faster computations in Python

Python's dynamically typed nature makes it easy to quickly write code that works, however, this comes at the cost of execution speed, as each time an operation is executed on a variable, its type has to be checked by the interpreter, in order to execute the appropriate subroutine for the given combination of variable type and operation.

The speed of computations can be greatly increased by utilizing NumPy, where the data are stored in homogeneous C arrays inside array objects. NumPy also provides specialized commands to do calculations quickly on these arrays.

In this example, we will compare different implementations of a truncated Fourier sum, which is calculated for a number of different positions. In astronomical situations, these positions can be, for example, times of measurement of the magnitude of a star. The sum has the form:

$m_i (t_i) = \sum_{j=1}^{n} A_j \cdot \sin( 2 \cdot \pi \cdot f_j \cdot t_i +\phi_j )$,

where $m_i$ is the $i$th magnitude of the star at time $t_i$, $n$ is the number of Fourier terms, $A_j$ is the amplitude, $f_j$ is the frequency, and $\phi_j$ is the phase of the $j$th Fourier term.

Preparation

First, we import the packages that we will be using, and prepare the data. We store the $t_i$, $A_j$, $f_j$ and $\phi_j$ parameters in NumPy arrays.

We also define two functions, one which does the above sum using two for cycles, and another one exploiting array operations of NumPy.

Furthermore, we prepare for the usage of Cython within the Notebook by loading the Cython magic commands into it.


In [2]:
import numpy as np
from numba import jit, autojit
%load_ext Cython


times=np.arange(0,70,0.01)
print "The size of the time array:", times.size

freq = np.arange(0.1,6.0,0.1)*1.7763123
freq[-20:] = freq[:20]*0.744
amp  = 0.05/(freq**2)+0.01
phi  = freq

def fourier_sum_naive(times, freq, amp, phi):
    mags = np.zeros_like(times)
    for i in xrange(times.size):
        for j in xrange(freq.size):
            mags[i] += amp[j] * np.sin( 2 * np.pi * freq[j] * times[i] + phi[j] )
            
    return mags
    
def fourier_sum_numpy(times, freq, amp, phi):
    return np.sum(amp.T.reshape(-1,1) * np.sin( 2 * np.pi * freq.T.reshape(-1,1) * times.reshape(1,-1) + phi.T.reshape(-1,1)), axis=0)


The size of the time array: 7000

Numba

We use the autojit function from Numba to prepare the translation of the Python function to machine code. By default usage, the functions get translated during runtime (in a Just-In-Time JIT manner), when the first call is made to the function. Numba produces optimized machine code, taking into account the type of input the function receives when called for the first time.

Alternatively, the function can be defined like normal, but preceded by the @jit decorator, in order to notify Numba about the functions to optimize, as show in the commented area below.

Note that Numba can be called eagerly, telling it the type of the expected variable, as well as the return type of the function. This can be used to fine-tune the behavior of the function. See more in the Numba documentation: http://numba.pydata.org/numba-doc/dev/user/jit.html

Note that functions can also be compiled ahead of time. For more information, see: http://numba.pydata.org/numba-doc/0.32.0/reference/aot-compilation.html


In [3]:
fourier_sum_naive_numba = autojit(fourier_sum_naive)
fourier_sum_numpy_numba = autojit(fourier_sum_numpy)

#@jit
#def fourier_sum_naive_numba(times, freq, amp, phi):
#    mags = np.zeros_like(times)
#    for i in xrange(times.size):
#        for j in xrange(freq.size):
#            mags[i] += amp[j] * np.sin( 2 * np.pi * freq[j] * times[i] + phi[j] )
#            
#    return mags

#@jit()
#def fourier_sum_numpy_numba(times, freq, amp, phi):
#    return np.sum(amp.T.reshape(-1,1) * np.sin( 2 * np.pi * freq.T.reshape(-1,1) * times.reshape(1,-1) + phi.T.reshape(-1,1)), axis=0)

Cython

Cython works different than Numba: It produces C code that gets compiled before calling the function. NumPy arrays store the data internally in simple C arrays, which can be accessed with the Typed Memoryview feature of Cython. This allows operations on these arrays that completely bypass Python. We can also import C functions, as we could writing pure C by importing the corresponding header files.

In the example implementation of the Fourier sum below, we use define two funtions. The first one handles interactions with Python, while the second one handles the actual calculations. Note that we also pass a temp array, in order to provide a reserved space for the function to work on, eliminating the need to create a NumPy array within the function.

Important note

Normal usage of Cython involves creating a separate .pyx file, with the corresponding Cython code inside, which then gets translated into a source object (.so on Unix-like systems, .dll on Windows), which can be imported into Python like a normal .py file. See the Cython documentation for more information: http://docs.cython.org/en/latest/src/quickstart/build.html


In [4]:
%%cython -a

cimport cython
import numpy as np
from libc.math cimport sin, M_PI

def fourier_sum_cython(times, freq, amp, phi, temp):
    return np.asarray(fourier_sum_purec(times, freq, amp, phi, temp))

@cython.boundscheck(False)
@cython.wraparound(False)
cdef fourier_sum_purec(double[:] times, double[:] freq, double[:] amp, double[:] phi, double[:] temp):
    cdef int i, j, irange, jrange
    irange=len(times)
    jrange=len(freq)
    for i in xrange(irange):
        temp[i]=0
        for j in xrange(jrange):
            temp[i] += amp[j] * sin( 2 * M_PI * freq[j] * times[i] + phi[j] )
    return temp


Out[4]:
Cython: _cython_magic_ec5a80ebe17c2abd87108e86b8d3c94e.pyx

Generated by Cython 0.25.2

Yellow lines hint at Python interaction.
Click on a line that starts with a "+" to see the C code that Cython generated for it.

 01: 
+02: cimport cython
  __pyx_t_1 = PyDict_New(); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 2, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  if (PyDict_SetItem(__pyx_d, __pyx_n_s_test, __pyx_t_1) < 0) __PYX_ERR(0, 2, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
+03: import numpy as np
  __pyx_t_1 = __Pyx_Import(__pyx_n_s_numpy, 0, -1); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 3, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  if (PyDict_SetItem(__pyx_d, __pyx_n_s_np, __pyx_t_1) < 0) __PYX_ERR(0, 3, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
 04: from libc.math cimport sin, M_PI
 05: 
+06: def fourier_sum_cython(times, freq, amp, phi, temp):
/* Python wrapper */
static PyObject *__pyx_pw_46_cython_magic_ec5a80ebe17c2abd87108e86b8d3c94e_1fourier_sum_cython(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds); /*proto*/
static PyMethodDef __pyx_mdef_46_cython_magic_ec5a80ebe17c2abd87108e86b8d3c94e_1fourier_sum_cython = {"fourier_sum_cython", (PyCFunction)__pyx_pw_46_cython_magic_ec5a80ebe17c2abd87108e86b8d3c94e_1fourier_sum_cython, METH_VARARGS|METH_KEYWORDS, 0};
static PyObject *__pyx_pw_46_cython_magic_ec5a80ebe17c2abd87108e86b8d3c94e_1fourier_sum_cython(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds) {
  PyObject *__pyx_v_times = 0;
  PyObject *__pyx_v_freq = 0;
  PyObject *__pyx_v_amp = 0;
  PyObject *__pyx_v_phi = 0;
  PyObject *__pyx_v_temp = 0;
  PyObject *__pyx_r = 0;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("fourier_sum_cython (wrapper)", 0);
  {
    static PyObject **__pyx_pyargnames[] = {&__pyx_n_s_times,&__pyx_n_s_freq,&__pyx_n_s_amp,&__pyx_n_s_phi,&__pyx_n_s_temp,0};
    PyObject* values[5] = {0,0,0,0,0};
    if (unlikely(__pyx_kwds)) {
      Py_ssize_t kw_args;
      const Py_ssize_t pos_args = PyTuple_GET_SIZE(__pyx_args);
      switch (pos_args) {
        case  5: values[4] = PyTuple_GET_ITEM(__pyx_args, 4);
        case  4: values[3] = PyTuple_GET_ITEM(__pyx_args, 3);
        case  3: values[2] = PyTuple_GET_ITEM(__pyx_args, 2);
        case  2: values[1] = PyTuple_GET_ITEM(__pyx_args, 1);
        case  1: values[0] = PyTuple_GET_ITEM(__pyx_args, 0);
        case  0: break;
        default: goto __pyx_L5_argtuple_error;
      }
      kw_args = PyDict_Size(__pyx_kwds);
      switch (pos_args) {
        case  0:
        if (likely((values[0] = PyDict_GetItem(__pyx_kwds, __pyx_n_s_times)) != 0)) kw_args--;
        else goto __pyx_L5_argtuple_error;
        case  1:
        if (likely((values[1] = PyDict_GetItem(__pyx_kwds, __pyx_n_s_freq)) != 0)) kw_args--;
        else {
          __Pyx_RaiseArgtupleInvalid("fourier_sum_cython", 1, 5, 5, 1); __PYX_ERR(0, 6, __pyx_L3_error)
        }
        case  2:
        if (likely((values[2] = PyDict_GetItem(__pyx_kwds, __pyx_n_s_amp)) != 0)) kw_args--;
        else {
          __Pyx_RaiseArgtupleInvalid("fourier_sum_cython", 1, 5, 5, 2); __PYX_ERR(0, 6, __pyx_L3_error)
        }
        case  3:
        if (likely((values[3] = PyDict_GetItem(__pyx_kwds, __pyx_n_s_phi)) != 0)) kw_args--;
        else {
          __Pyx_RaiseArgtupleInvalid("fourier_sum_cython", 1, 5, 5, 3); __PYX_ERR(0, 6, __pyx_L3_error)
        }
        case  4:
        if (likely((values[4] = PyDict_GetItem(__pyx_kwds, __pyx_n_s_temp)) != 0)) kw_args--;
        else {
          __Pyx_RaiseArgtupleInvalid("fourier_sum_cython", 1, 5, 5, 4); __PYX_ERR(0, 6, __pyx_L3_error)
        }
      }
      if (unlikely(kw_args > 0)) {
        if (unlikely(__Pyx_ParseOptionalKeywords(__pyx_kwds, __pyx_pyargnames, 0, values, pos_args, "fourier_sum_cython") < 0)) __PYX_ERR(0, 6, __pyx_L3_error)
      }
    } else if (PyTuple_GET_SIZE(__pyx_args) != 5) {
      goto __pyx_L5_argtuple_error;
    } else {
      values[0] = PyTuple_GET_ITEM(__pyx_args, 0);
      values[1] = PyTuple_GET_ITEM(__pyx_args, 1);
      values[2] = PyTuple_GET_ITEM(__pyx_args, 2);
      values[3] = PyTuple_GET_ITEM(__pyx_args, 3);
      values[4] = PyTuple_GET_ITEM(__pyx_args, 4);
    }
    __pyx_v_times = values[0];
    __pyx_v_freq = values[1];
    __pyx_v_amp = values[2];
    __pyx_v_phi = values[3];
    __pyx_v_temp = values[4];
  }
  goto __pyx_L4_argument_unpacking_done;
  __pyx_L5_argtuple_error:;
  __Pyx_RaiseArgtupleInvalid("fourier_sum_cython", 1, 5, 5, PyTuple_GET_SIZE(__pyx_args)); __PYX_ERR(0, 6, __pyx_L3_error)
  __pyx_L3_error:;
  __Pyx_AddTraceback("_cython_magic_ec5a80ebe17c2abd87108e86b8d3c94e.fourier_sum_cython", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __Pyx_RefNannyFinishContext();
  return NULL;
  __pyx_L4_argument_unpacking_done:;
  __pyx_r = __pyx_pf_46_cython_magic_ec5a80ebe17c2abd87108e86b8d3c94e_fourier_sum_cython(__pyx_self, __pyx_v_times, __pyx_v_freq, __pyx_v_amp, __pyx_v_phi, __pyx_v_temp);

  /* function exit code */
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static PyObject *__pyx_pf_46_cython_magic_ec5a80ebe17c2abd87108e86b8d3c94e_fourier_sum_cython(CYTHON_UNUSED PyObject *__pyx_self, PyObject *__pyx_v_times, PyObject *__pyx_v_freq, PyObject *__pyx_v_amp, PyObject *__pyx_v_phi, PyObject *__pyx_v_temp) {
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("fourier_sum_cython", 0);
/* … */
  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_1);
  __Pyx_XDECREF(__pyx_t_2);
  __Pyx_XDECREF(__pyx_t_3);
  __PYX_XDEC_MEMVIEW(&__pyx_t_4, 1);
  __PYX_XDEC_MEMVIEW(&__pyx_t_5, 1);
  __PYX_XDEC_MEMVIEW(&__pyx_t_6, 1);
  __PYX_XDEC_MEMVIEW(&__pyx_t_7, 1);
  __PYX_XDEC_MEMVIEW(&__pyx_t_8, 1);
  __Pyx_XDECREF(__pyx_t_9);
  __Pyx_XDECREF(__pyx_t_10);
  __Pyx_AddTraceback("_cython_magic_ec5a80ebe17c2abd87108e86b8d3c94e.fourier_sum_cython", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = NULL;
  __pyx_L0:;
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}
/* … */
  __pyx_tuple__14 = PyTuple_Pack(5, __pyx_n_s_times, __pyx_n_s_freq, __pyx_n_s_amp, __pyx_n_s_phi, __pyx_n_s_temp); if (unlikely(!__pyx_tuple__14)) __PYX_ERR(0, 6, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__14);
  __Pyx_GIVEREF(__pyx_tuple__14);
/* … */
  __pyx_t_1 = PyCFunction_NewEx(&__pyx_mdef_46_cython_magic_ec5a80ebe17c2abd87108e86b8d3c94e_1fourier_sum_cython, NULL, __pyx_n_s_cython_magic_ec5a80ebe17c2abd87); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 6, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  if (PyDict_SetItem(__pyx_d, __pyx_n_s_fourier_sum_cython, __pyx_t_1) < 0) __PYX_ERR(0, 6, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  __pyx_codeobj__15 = (PyObject*)__Pyx_PyCode_New(5, 0, 5, 0, 0, __pyx_empty_bytes, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_tuple__14, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_kp_s_home_ghajdu_cache_ipython_cytho, __pyx_n_s_fourier_sum_cython, 6, __pyx_empty_bytes); if (unlikely(!__pyx_codeobj__15)) __PYX_ERR(0, 6, __pyx_L1_error)
+07:     return np.asarray(fourier_sum_purec(times, freq, amp, phi, temp))
  __Pyx_XDECREF(__pyx_r);
  __pyx_t_2 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __pyx_t_3 = __Pyx_PyObject_GetAttrStr(__pyx_t_2, __pyx_n_s_asarray); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_3);
  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
  __pyx_t_4 = __Pyx_PyObject_to_MemoryviewSlice_ds_double(__pyx_v_times);
  if (unlikely(!__pyx_t_4.memview)) __PYX_ERR(0, 7, __pyx_L1_error)
  __pyx_t_5 = __Pyx_PyObject_to_MemoryviewSlice_ds_double(__pyx_v_freq);
  if (unlikely(!__pyx_t_5.memview)) __PYX_ERR(0, 7, __pyx_L1_error)
  __pyx_t_6 = __Pyx_PyObject_to_MemoryviewSlice_ds_double(__pyx_v_amp);
  if (unlikely(!__pyx_t_6.memview)) __PYX_ERR(0, 7, __pyx_L1_error)
  __pyx_t_7 = __Pyx_PyObject_to_MemoryviewSlice_ds_double(__pyx_v_phi);
  if (unlikely(!__pyx_t_7.memview)) __PYX_ERR(0, 7, __pyx_L1_error)
  __pyx_t_8 = __Pyx_PyObject_to_MemoryviewSlice_ds_double(__pyx_v_temp);
  if (unlikely(!__pyx_t_8.memview)) __PYX_ERR(0, 7, __pyx_L1_error)
  __pyx_t_2 = __pyx_f_46_cython_magic_ec5a80ebe17c2abd87108e86b8d3c94e_fourier_sum_purec(__pyx_t_4, __pyx_t_5, __pyx_t_6, __pyx_t_7, __pyx_t_8); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __PYX_XDEC_MEMVIEW(&__pyx_t_4, 1);
  __pyx_t_4.memview = NULL;
  __pyx_t_4.data = NULL;
  __PYX_XDEC_MEMVIEW(&__pyx_t_5, 1);
  __pyx_t_5.memview = NULL;
  __pyx_t_5.data = NULL;
  __PYX_XDEC_MEMVIEW(&__pyx_t_6, 1);
  __pyx_t_6.memview = NULL;
  __pyx_t_6.data = NULL;
  __PYX_XDEC_MEMVIEW(&__pyx_t_7, 1);
  __pyx_t_7.memview = NULL;
  __pyx_t_7.data = NULL;
  __PYX_XDEC_MEMVIEW(&__pyx_t_8, 1);
  __pyx_t_8.memview = NULL;
  __pyx_t_8.data = NULL;
  __pyx_t_9 = NULL;
  if (CYTHON_UNPACK_METHODS && unlikely(PyMethod_Check(__pyx_t_3))) {
    __pyx_t_9 = PyMethod_GET_SELF(__pyx_t_3);
    if (likely(__pyx_t_9)) {
      PyObject* function = PyMethod_GET_FUNCTION(__pyx_t_3);
      __Pyx_INCREF(__pyx_t_9);
      __Pyx_INCREF(function);
      __Pyx_DECREF_SET(__pyx_t_3, function);
    }
  }
  if (!__pyx_t_9) {
    __pyx_t_1 = __Pyx_PyObject_CallOneArg(__pyx_t_3, __pyx_t_2); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 7, __pyx_L1_error)
    __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
    __Pyx_GOTREF(__pyx_t_1);
  } else {
    #if CYTHON_FAST_PYCALL
    if (PyFunction_Check(__pyx_t_3)) {
      PyObject *__pyx_temp[2] = {__pyx_t_9, __pyx_t_2};
      __pyx_t_1 = __Pyx_PyFunction_FastCall(__pyx_t_3, __pyx_temp+1-1, 1+1); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 7, __pyx_L1_error)
      __Pyx_XDECREF(__pyx_t_9); __pyx_t_9 = 0;
      __Pyx_GOTREF(__pyx_t_1);
      __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
    } else
    #endif
    #if CYTHON_FAST_PYCCALL
    if (__Pyx_PyFastCFunction_Check(__pyx_t_3)) {
      PyObject *__pyx_temp[2] = {__pyx_t_9, __pyx_t_2};
      __pyx_t_1 = __Pyx_PyCFunction_FastCall(__pyx_t_3, __pyx_temp+1-1, 1+1); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 7, __pyx_L1_error)
      __Pyx_XDECREF(__pyx_t_9); __pyx_t_9 = 0;
      __Pyx_GOTREF(__pyx_t_1);
      __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
    } else
    #endif
    {
      __pyx_t_10 = PyTuple_New(1+1); if (unlikely(!__pyx_t_10)) __PYX_ERR(0, 7, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_10);
      __Pyx_GIVEREF(__pyx_t_9); PyTuple_SET_ITEM(__pyx_t_10, 0, __pyx_t_9); __pyx_t_9 = NULL;
      __Pyx_GIVEREF(__pyx_t_2);
      PyTuple_SET_ITEM(__pyx_t_10, 0+1, __pyx_t_2);
      __pyx_t_2 = 0;
      __pyx_t_1 = __Pyx_PyObject_Call(__pyx_t_3, __pyx_t_10, NULL); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 7, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_1);
      __Pyx_DECREF(__pyx_t_10); __pyx_t_10 = 0;
    }
  }
  __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
  __pyx_r = __pyx_t_1;
  __pyx_t_1 = 0;
  goto __pyx_L0;
 08: 
 09: @cython.boundscheck(False)
 10: @cython.wraparound(False)
+11: cdef fourier_sum_purec(double[:] times, double[:] freq, double[:] amp, double[:] phi, double[:] temp):
static PyObject *__pyx_f_46_cython_magic_ec5a80ebe17c2abd87108e86b8d3c94e_fourier_sum_purec(__Pyx_memviewslice __pyx_v_times, __Pyx_memviewslice __pyx_v_freq, __Pyx_memviewslice __pyx_v_amp, __Pyx_memviewslice __pyx_v_phi, __Pyx_memviewslice __pyx_v_temp) {
  int __pyx_v_i;
  int __pyx_v_j;
  int __pyx_v_irange;
  int __pyx_v_jrange;
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("fourier_sum_purec", 0);
/* … */
  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_1);
  __Pyx_AddTraceback("_cython_magic_ec5a80ebe17c2abd87108e86b8d3c94e.fourier_sum_purec", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = 0;
  __pyx_L0:;
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}
 12:     cdef int i, j, irange, jrange
+13:     irange=len(times)
  __pyx_t_1 = __pyx_memoryview_fromslice(__pyx_v_times, 1, (PyObject *(*)(char *)) __pyx_memview_get_double, (int (*)(char *, PyObject *)) __pyx_memview_set_double, 0);; if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 13, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_t_2 = PyObject_Length(__pyx_t_1); if (unlikely(__pyx_t_2 == -1)) __PYX_ERR(0, 13, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  __pyx_v_irange = __pyx_t_2;
+14:     jrange=len(freq)
  __pyx_t_1 = __pyx_memoryview_fromslice(__pyx_v_freq, 1, (PyObject *(*)(char *)) __pyx_memview_get_double, (int (*)(char *, PyObject *)) __pyx_memview_set_double, 0);; if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 14, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_t_2 = PyObject_Length(__pyx_t_1); if (unlikely(__pyx_t_2 == -1)) __PYX_ERR(0, 14, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  __pyx_v_jrange = __pyx_t_2;
+15:     for i in xrange(irange):
  __pyx_t_3 = __pyx_v_irange;
  for (__pyx_t_4 = 0; __pyx_t_4 < __pyx_t_3; __pyx_t_4+=1) {
    __pyx_v_i = __pyx_t_4;
+16:         temp[i]=0
    __pyx_t_5 = __pyx_v_i;
    *((double *) ( /* dim=0 */ (__pyx_v_temp.data + __pyx_t_5 * __pyx_v_temp.strides[0]) )) = 0.0;
+17:         for j in xrange(jrange):
    __pyx_t_6 = __pyx_v_jrange;
    for (__pyx_t_7 = 0; __pyx_t_7 < __pyx_t_6; __pyx_t_7+=1) {
      __pyx_v_j = __pyx_t_7;
+18:             temp[i] += amp[j] * sin( 2 * M_PI * freq[j] * times[i] + phi[j] )
      __pyx_t_8 = __pyx_v_j;
      __pyx_t_9 = __pyx_v_j;
      __pyx_t_10 = __pyx_v_i;
      __pyx_t_11 = __pyx_v_j;
      __pyx_t_12 = __pyx_v_i;
      *((double *) ( /* dim=0 */ (__pyx_v_temp.data + __pyx_t_12 * __pyx_v_temp.strides[0]) )) += ((*((double *) ( /* dim=0 */ (__pyx_v_amp.data + __pyx_t_8 * __pyx_v_amp.strides[0]) ))) * sin(((((2.0 * M_PI) * (*((double *) ( /* dim=0 */ (__pyx_v_freq.data + __pyx_t_9 * __pyx_v_freq.strides[0]) )))) * (*((double *) ( /* dim=0 */ (__pyx_v_times.data + __pyx_t_10 * __pyx_v_times.strides[0]) )))) + (*((double *) ( /* dim=0 */ (__pyx_v_phi.data + __pyx_t_11 * __pyx_v_phi.strides[0]) ))))));
    }
  }
+19:     return temp
  __Pyx_XDECREF(__pyx_r);
  __pyx_t_1 = __pyx_memoryview_fromslice(__pyx_v_temp, 1, (PyObject *(*)(char *)) __pyx_memview_get_double, (int (*)(char *, PyObject *)) __pyx_memview_set_double, 0);; if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 19, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_r = __pyx_t_1;
  __pyx_t_1 = 0;
  goto __pyx_L0;

We called the cython command with the -a argument, that makes it produce an html summary of the translated code. White parts show code that don't interact with Python at all. Optimizing Cython involves minimizing the code that is "yellow", making the code "whiter", that is, executing more code in C.

Cython + OpenMP

We can parallelize the execution of the code using OpenMP in the parts where the code is executed completely outside of Python, but we need to release the Global Interpreter Lock (GIL) first. The prange command replaces the range or xrange command in the for cycle we would like to execute in parallel. We can also call OpenMP functions, for example, to get the number of processor cores of the system.

Note that the number of threads, the scheduler and the chunksize can have a large effect on the performance of the code. While optimizing, you should try different chunksizes (the default is 1).


In [5]:
%%cython --compile-args=-fopenmp --link-args=-fopenmp --force -a

cimport cython
cimport openmp
import numpy as np
from libc.math cimport sin, M_PI
from cython.parallel import parallel, prange

def fourier_sum_cython_omp(times, freq, amp, phi, temp):
    return np.asarray(fourier_sum_purec_omp(times, freq, amp, phi, temp))

@cython.boundscheck(False)
@cython.wraparound(False)
cdef fourier_sum_purec_omp(double[:] times, double[:] freq, double[:] amp, double[:] phi, double[:] temp):
    cdef int i, j, irange, jrange
    irange=len(times)
    jrange=len(freq)
    #print openmp.omp_get_num_procs()
    with nogil, parallel(num_threads=4):
        for i in prange(irange, schedule='dynamic', chunksize=10):
            temp[i]=0
            for j in xrange(jrange):
                temp[i] += amp[j] * sin( 2 * M_PI * freq[j] * times[i] + phi[j] )
    return temp


Out[5]:
Cython: _cython_magic_1cd56569f3d2a9008e919ac54e3d0f01.pyx

Generated by Cython 0.25.2

Yellow lines hint at Python interaction.
Click on a line that starts with a "+" to see the C code that Cython generated for it.

 01: 
+02: cimport cython
  __pyx_t_1 = PyDict_New(); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 2, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  if (PyDict_SetItem(__pyx_d, __pyx_n_s_test, __pyx_t_1) < 0) __PYX_ERR(0, 2, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
 03: cimport openmp
+04: import numpy as np
  __pyx_t_1 = __Pyx_Import(__pyx_n_s_numpy, 0, -1); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 4, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  if (PyDict_SetItem(__pyx_d, __pyx_n_s_np, __pyx_t_1) < 0) __PYX_ERR(0, 4, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
 05: from libc.math cimport sin, M_PI
 06: from cython.parallel import parallel, prange
 07: 
+08: def fourier_sum_cython_omp(times, freq, amp, phi, temp):
/* Python wrapper */
static PyObject *__pyx_pw_46_cython_magic_1cd56569f3d2a9008e919ac54e3d0f01_1fourier_sum_cython_omp(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds); /*proto*/
static PyMethodDef __pyx_mdef_46_cython_magic_1cd56569f3d2a9008e919ac54e3d0f01_1fourier_sum_cython_omp = {"fourier_sum_cython_omp", (PyCFunction)__pyx_pw_46_cython_magic_1cd56569f3d2a9008e919ac54e3d0f01_1fourier_sum_cython_omp, METH_VARARGS|METH_KEYWORDS, 0};
static PyObject *__pyx_pw_46_cython_magic_1cd56569f3d2a9008e919ac54e3d0f01_1fourier_sum_cython_omp(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds) {
  PyObject *__pyx_v_times = 0;
  PyObject *__pyx_v_freq = 0;
  PyObject *__pyx_v_amp = 0;
  PyObject *__pyx_v_phi = 0;
  PyObject *__pyx_v_temp = 0;
  PyObject *__pyx_r = 0;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("fourier_sum_cython_omp (wrapper)", 0);
  {
    static PyObject **__pyx_pyargnames[] = {&__pyx_n_s_times,&__pyx_n_s_freq,&__pyx_n_s_amp,&__pyx_n_s_phi,&__pyx_n_s_temp,0};
    PyObject* values[5] = {0,0,0,0,0};
    if (unlikely(__pyx_kwds)) {
      Py_ssize_t kw_args;
      const Py_ssize_t pos_args = PyTuple_GET_SIZE(__pyx_args);
      switch (pos_args) {
        case  5: values[4] = PyTuple_GET_ITEM(__pyx_args, 4);
        case  4: values[3] = PyTuple_GET_ITEM(__pyx_args, 3);
        case  3: values[2] = PyTuple_GET_ITEM(__pyx_args, 2);
        case  2: values[1] = PyTuple_GET_ITEM(__pyx_args, 1);
        case  1: values[0] = PyTuple_GET_ITEM(__pyx_args, 0);
        case  0: break;
        default: goto __pyx_L5_argtuple_error;
      }
      kw_args = PyDict_Size(__pyx_kwds);
      switch (pos_args) {
        case  0:
        if (likely((values[0] = PyDict_GetItem(__pyx_kwds, __pyx_n_s_times)) != 0)) kw_args--;
        else goto __pyx_L5_argtuple_error;
        case  1:
        if (likely((values[1] = PyDict_GetItem(__pyx_kwds, __pyx_n_s_freq)) != 0)) kw_args--;
        else {
          __Pyx_RaiseArgtupleInvalid("fourier_sum_cython_omp", 1, 5, 5, 1); __PYX_ERR(0, 8, __pyx_L3_error)
        }
        case  2:
        if (likely((values[2] = PyDict_GetItem(__pyx_kwds, __pyx_n_s_amp)) != 0)) kw_args--;
        else {
          __Pyx_RaiseArgtupleInvalid("fourier_sum_cython_omp", 1, 5, 5, 2); __PYX_ERR(0, 8, __pyx_L3_error)
        }
        case  3:
        if (likely((values[3] = PyDict_GetItem(__pyx_kwds, __pyx_n_s_phi)) != 0)) kw_args--;
        else {
          __Pyx_RaiseArgtupleInvalid("fourier_sum_cython_omp", 1, 5, 5, 3); __PYX_ERR(0, 8, __pyx_L3_error)
        }
        case  4:
        if (likely((values[4] = PyDict_GetItem(__pyx_kwds, __pyx_n_s_temp)) != 0)) kw_args--;
        else {
          __Pyx_RaiseArgtupleInvalid("fourier_sum_cython_omp", 1, 5, 5, 4); __PYX_ERR(0, 8, __pyx_L3_error)
        }
      }
      if (unlikely(kw_args > 0)) {
        if (unlikely(__Pyx_ParseOptionalKeywords(__pyx_kwds, __pyx_pyargnames, 0, values, pos_args, "fourier_sum_cython_omp") < 0)) __PYX_ERR(0, 8, __pyx_L3_error)
      }
    } else if (PyTuple_GET_SIZE(__pyx_args) != 5) {
      goto __pyx_L5_argtuple_error;
    } else {
      values[0] = PyTuple_GET_ITEM(__pyx_args, 0);
      values[1] = PyTuple_GET_ITEM(__pyx_args, 1);
      values[2] = PyTuple_GET_ITEM(__pyx_args, 2);
      values[3] = PyTuple_GET_ITEM(__pyx_args, 3);
      values[4] = PyTuple_GET_ITEM(__pyx_args, 4);
    }
    __pyx_v_times = values[0];
    __pyx_v_freq = values[1];
    __pyx_v_amp = values[2];
    __pyx_v_phi = values[3];
    __pyx_v_temp = values[4];
  }
  goto __pyx_L4_argument_unpacking_done;
  __pyx_L5_argtuple_error:;
  __Pyx_RaiseArgtupleInvalid("fourier_sum_cython_omp", 1, 5, 5, PyTuple_GET_SIZE(__pyx_args)); __PYX_ERR(0, 8, __pyx_L3_error)
  __pyx_L3_error:;
  __Pyx_AddTraceback("_cython_magic_1cd56569f3d2a9008e919ac54e3d0f01.fourier_sum_cython_omp", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __Pyx_RefNannyFinishContext();
  return NULL;
  __pyx_L4_argument_unpacking_done:;
  __pyx_r = __pyx_pf_46_cython_magic_1cd56569f3d2a9008e919ac54e3d0f01_fourier_sum_cython_omp(__pyx_self, __pyx_v_times, __pyx_v_freq, __pyx_v_amp, __pyx_v_phi, __pyx_v_temp);

  /* function exit code */
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static PyObject *__pyx_pf_46_cython_magic_1cd56569f3d2a9008e919ac54e3d0f01_fourier_sum_cython_omp(CYTHON_UNUSED PyObject *__pyx_self, PyObject *__pyx_v_times, PyObject *__pyx_v_freq, PyObject *__pyx_v_amp, PyObject *__pyx_v_phi, PyObject *__pyx_v_temp) {
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("fourier_sum_cython_omp", 0);
/* … */
  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_1);
  __Pyx_XDECREF(__pyx_t_2);
  __Pyx_XDECREF(__pyx_t_3);
  __PYX_XDEC_MEMVIEW(&__pyx_t_4, 1);
  __PYX_XDEC_MEMVIEW(&__pyx_t_5, 1);
  __PYX_XDEC_MEMVIEW(&__pyx_t_6, 1);
  __PYX_XDEC_MEMVIEW(&__pyx_t_7, 1);
  __PYX_XDEC_MEMVIEW(&__pyx_t_8, 1);
  __Pyx_XDECREF(__pyx_t_9);
  __Pyx_XDECREF(__pyx_t_10);
  __Pyx_AddTraceback("_cython_magic_1cd56569f3d2a9008e919ac54e3d0f01.fourier_sum_cython_omp", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = NULL;
  __pyx_L0:;
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}
/* … */
  __pyx_tuple__14 = PyTuple_Pack(5, __pyx_n_s_times, __pyx_n_s_freq, __pyx_n_s_amp, __pyx_n_s_phi, __pyx_n_s_temp); if (unlikely(!__pyx_tuple__14)) __PYX_ERR(0, 8, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__14);
  __Pyx_GIVEREF(__pyx_tuple__14);
/* … */
  __pyx_t_1 = PyCFunction_NewEx(&__pyx_mdef_46_cython_magic_1cd56569f3d2a9008e919ac54e3d0f01_1fourier_sum_cython_omp, NULL, __pyx_n_s_cython_magic_1cd56569f3d2a9008e); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 8, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  if (PyDict_SetItem(__pyx_d, __pyx_n_s_fourier_sum_cython_omp, __pyx_t_1) < 0) __PYX_ERR(0, 8, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  __pyx_codeobj__15 = (PyObject*)__Pyx_PyCode_New(5, 0, 5, 0, 0, __pyx_empty_bytes, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_tuple__14, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_kp_s_home_ghajdu_cache_ipython_cytho, __pyx_n_s_fourier_sum_cython_omp, 8, __pyx_empty_bytes); if (unlikely(!__pyx_codeobj__15)) __PYX_ERR(0, 8, __pyx_L1_error)
+09:     return np.asarray(fourier_sum_purec_omp(times, freq, amp, phi, temp))
  __Pyx_XDECREF(__pyx_r);
  __pyx_t_2 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 9, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __pyx_t_3 = __Pyx_PyObject_GetAttrStr(__pyx_t_2, __pyx_n_s_asarray); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 9, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_3);
  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
  __pyx_t_4 = __Pyx_PyObject_to_MemoryviewSlice_ds_double(__pyx_v_times);
  if (unlikely(!__pyx_t_4.memview)) __PYX_ERR(0, 9, __pyx_L1_error)
  __pyx_t_5 = __Pyx_PyObject_to_MemoryviewSlice_ds_double(__pyx_v_freq);
  if (unlikely(!__pyx_t_5.memview)) __PYX_ERR(0, 9, __pyx_L1_error)
  __pyx_t_6 = __Pyx_PyObject_to_MemoryviewSlice_ds_double(__pyx_v_amp);
  if (unlikely(!__pyx_t_6.memview)) __PYX_ERR(0, 9, __pyx_L1_error)
  __pyx_t_7 = __Pyx_PyObject_to_MemoryviewSlice_ds_double(__pyx_v_phi);
  if (unlikely(!__pyx_t_7.memview)) __PYX_ERR(0, 9, __pyx_L1_error)
  __pyx_t_8 = __Pyx_PyObject_to_MemoryviewSlice_ds_double(__pyx_v_temp);
  if (unlikely(!__pyx_t_8.memview)) __PYX_ERR(0, 9, __pyx_L1_error)
  __pyx_t_2 = __pyx_f_46_cython_magic_1cd56569f3d2a9008e919ac54e3d0f01_fourier_sum_purec_omp(__pyx_t_4, __pyx_t_5, __pyx_t_6, __pyx_t_7, __pyx_t_8); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 9, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __PYX_XDEC_MEMVIEW(&__pyx_t_4, 1);
  __pyx_t_4.memview = NULL;
  __pyx_t_4.data = NULL;
  __PYX_XDEC_MEMVIEW(&__pyx_t_5, 1);
  __pyx_t_5.memview = NULL;
  __pyx_t_5.data = NULL;
  __PYX_XDEC_MEMVIEW(&__pyx_t_6, 1);
  __pyx_t_6.memview = NULL;
  __pyx_t_6.data = NULL;
  __PYX_XDEC_MEMVIEW(&__pyx_t_7, 1);
  __pyx_t_7.memview = NULL;
  __pyx_t_7.data = NULL;
  __PYX_XDEC_MEMVIEW(&__pyx_t_8, 1);
  __pyx_t_8.memview = NULL;
  __pyx_t_8.data = NULL;
  __pyx_t_9 = NULL;
  if (CYTHON_UNPACK_METHODS && unlikely(PyMethod_Check(__pyx_t_3))) {
    __pyx_t_9 = PyMethod_GET_SELF(__pyx_t_3);
    if (likely(__pyx_t_9)) {
      PyObject* function = PyMethod_GET_FUNCTION(__pyx_t_3);
      __Pyx_INCREF(__pyx_t_9);
      __Pyx_INCREF(function);
      __Pyx_DECREF_SET(__pyx_t_3, function);
    }
  }
  if (!__pyx_t_9) {
    __pyx_t_1 = __Pyx_PyObject_CallOneArg(__pyx_t_3, __pyx_t_2); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 9, __pyx_L1_error)
    __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
    __Pyx_GOTREF(__pyx_t_1);
  } else {
    #if CYTHON_FAST_PYCALL
    if (PyFunction_Check(__pyx_t_3)) {
      PyObject *__pyx_temp[2] = {__pyx_t_9, __pyx_t_2};
      __pyx_t_1 = __Pyx_PyFunction_FastCall(__pyx_t_3, __pyx_temp+1-1, 1+1); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 9, __pyx_L1_error)
      __Pyx_XDECREF(__pyx_t_9); __pyx_t_9 = 0;
      __Pyx_GOTREF(__pyx_t_1);
      __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
    } else
    #endif
    #if CYTHON_FAST_PYCCALL
    if (__Pyx_PyFastCFunction_Check(__pyx_t_3)) {
      PyObject *__pyx_temp[2] = {__pyx_t_9, __pyx_t_2};
      __pyx_t_1 = __Pyx_PyCFunction_FastCall(__pyx_t_3, __pyx_temp+1-1, 1+1); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 9, __pyx_L1_error)
      __Pyx_XDECREF(__pyx_t_9); __pyx_t_9 = 0;
      __Pyx_GOTREF(__pyx_t_1);
      __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
    } else
    #endif
    {
      __pyx_t_10 = PyTuple_New(1+1); if (unlikely(!__pyx_t_10)) __PYX_ERR(0, 9, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_10);
      __Pyx_GIVEREF(__pyx_t_9); PyTuple_SET_ITEM(__pyx_t_10, 0, __pyx_t_9); __pyx_t_9 = NULL;
      __Pyx_GIVEREF(__pyx_t_2);
      PyTuple_SET_ITEM(__pyx_t_10, 0+1, __pyx_t_2);
      __pyx_t_2 = 0;
      __pyx_t_1 = __Pyx_PyObject_Call(__pyx_t_3, __pyx_t_10, NULL); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 9, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_1);
      __Pyx_DECREF(__pyx_t_10); __pyx_t_10 = 0;
    }
  }
  __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
  __pyx_r = __pyx_t_1;
  __pyx_t_1 = 0;
  goto __pyx_L0;
 10: 
 11: @cython.boundscheck(False)
 12: @cython.wraparound(False)
+13: cdef fourier_sum_purec_omp(double[:] times, double[:] freq, double[:] amp, double[:] phi, double[:] temp):
static PyObject *__pyx_f_46_cython_magic_1cd56569f3d2a9008e919ac54e3d0f01_fourier_sum_purec_omp(__Pyx_memviewslice __pyx_v_times, __Pyx_memviewslice __pyx_v_freq, __Pyx_memviewslice __pyx_v_amp, __Pyx_memviewslice __pyx_v_phi, __Pyx_memviewslice __pyx_v_temp) {
  int __pyx_v_i;
  int __pyx_v_j;
  CYTHON_UNUSED int __pyx_v_irange;
  int __pyx_v_jrange;
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("fourier_sum_purec_omp", 0);
/* … */
  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_1);
  __Pyx_AddTraceback("_cython_magic_1cd56569f3d2a9008e919ac54e3d0f01.fourier_sum_purec_omp", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = 0;
  __pyx_L0:;
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}
 14:     cdef int i, j, irange, jrange
+15:     irange=len(times)
  __pyx_t_1 = __pyx_memoryview_fromslice(__pyx_v_times, 1, (PyObject *(*)(char *)) __pyx_memview_get_double, (int (*)(char *, PyObject *)) __pyx_memview_set_double, 0);; if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 15, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_t_2 = PyObject_Length(__pyx_t_1); if (unlikely(__pyx_t_2 == -1)) __PYX_ERR(0, 15, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  __pyx_v_irange = __pyx_t_2;
+16:     jrange=len(freq)
  __pyx_t_1 = __pyx_memoryview_fromslice(__pyx_v_freq, 1, (PyObject *(*)(char *)) __pyx_memview_get_double, (int (*)(char *, PyObject *)) __pyx_memview_set_double, 0);; if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 16, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_t_2 = PyObject_Length(__pyx_t_1); if (unlikely(__pyx_t_2 == -1)) __PYX_ERR(0, 16, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  __pyx_v_jrange = __pyx_t_2;
 17:     #print openmp.omp_get_num_procs()
+18:     with nogil, parallel(num_threads=4):
  {
      #ifdef WITH_THREAD
      PyThreadState *_save;
      Py_UNBLOCK_THREADS
      #endif
      /*try:*/ {
        {
            #if ((defined(__APPLE__) || defined(__OSX__)) && (defined(__GNUC__) && (__GNUC__ > 2 || (__GNUC__ == 2 && (__GNUC_MINOR__ > 95)))))
                #undef likely
                #undef unlikely
                #define likely(x)   (x)
                #define unlikely(x) (x)
            #endif
            #ifdef _OPENMP
            #pragma omp parallel  num_threads(4)
            #endif /* _OPENMP */
            {
/* … */
      /*finally:*/ {
        /*normal exit:*/{
          #ifdef WITH_THREAD
          Py_BLOCK_THREADS
          #endif
          goto __pyx_L5;
        }
        __pyx_L5:;
      }
  }
+19:         for i in prange(irange, schedule='dynamic', chunksize=10):
                __pyx_t_3 = __pyx_v_irange;
                if (1 == 0) abort();
                {
                    __pyx_t_5 = (__pyx_t_3 - 0 + 1 - 1/abs(1)) / 1;
                    if (__pyx_t_5 > 0)
                    {
                        #ifdef _OPENMP
                        #pragma omp for firstprivate(__pyx_v_i) lastprivate(__pyx_v_i) lastprivate(__pyx_v_j)
/* … */
                __pyx_t_3 = __pyx_v_irange;
                if (1 == 0) abort();
                {
                    __pyx_t_5 = (__pyx_t_3 - 0 + 1 - 1/abs(1)) / 1;
                    if (__pyx_t_5 > 0)
                    {
                        #ifdef _OPENMP
                        #pragma omp for firstprivate(__pyx_v_i) lastprivate(__pyx_v_i) lastprivate(__pyx_v_j) schedule(dynamic, __pyx_t_6)
                        #endif /* _OPENMP */
                        for (__pyx_t_4 = 0; __pyx_t_4 < __pyx_t_5; __pyx_t_4++){
                            {
                                __pyx_v_i = (int)(0 + 1 * __pyx_t_4);
                                /* Initialize private variables to invalid values */
                                __pyx_v_j = ((int)0xbad0bad0);
+20:             temp[i]=0
                                __pyx_t_7 = __pyx_v_i;
                                *((double *) ( /* dim=0 */ (__pyx_v_temp.data + __pyx_t_7 * __pyx_v_temp.strides[0]) )) = 0.0;
+21:             for j in xrange(jrange):
                                __pyx_t_8 = __pyx_v_jrange;
                                for (__pyx_t_9 = 0; __pyx_t_9 < __pyx_t_8; __pyx_t_9+=1) {
                                  __pyx_v_j = __pyx_t_9;
+22:                 temp[i] += amp[j] * sin( 2 * M_PI * freq[j] * times[i] + phi[j] )
                                  __pyx_t_10 = __pyx_v_j;
                                  __pyx_t_11 = __pyx_v_j;
                                  __pyx_t_12 = __pyx_v_i;
                                  __pyx_t_13 = __pyx_v_j;
                                  __pyx_t_14 = __pyx_v_i;
                                  *((double *) ( /* dim=0 */ (__pyx_v_temp.data + __pyx_t_14 * __pyx_v_temp.strides[0]) )) += ((*((double *) ( /* dim=0 */ (__pyx_v_amp.data + __pyx_t_10 * __pyx_v_amp.strides[0]) ))) * sin(((((2.0 * M_PI) * (*((double *) ( /* dim=0 */ (__pyx_v_freq.data + __pyx_t_11 * __pyx_v_freq.strides[0]) )))) * (*((double *) ( /* dim=0 */ (__pyx_v_times.data + __pyx_t_12 * __pyx_v_times.strides[0]) )))) + (*((double *) ( /* dim=0 */ (__pyx_v_phi.data + __pyx_t_13 * __pyx_v_phi.strides[0]) ))))));
                                }
                            }
                        }
                    }
                }
            }
        }
        #if ((defined(__APPLE__) || defined(__OSX__)) && (defined(__GNUC__) && (__GNUC__ > 2 || (__GNUC__ == 2 && (__GNUC_MINOR__ > 95)))))
            #undef likely
            #undef unlikely
            #define likely(x)   __builtin_expect(!!(x), 1)
            #define unlikely(x) __builtin_expect(!!(x), 0)
        #endif
      }
+23:     return temp
  __Pyx_XDECREF(__pyx_r);
  __pyx_t_1 = __pyx_memoryview_fromslice(__pyx_v_temp, 1, (PyObject *(*)(char *)) __pyx_memview_get_double, (int (*)(char *, PyObject *)) __pyx_memview_set_double, 0);; if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 23, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_r = __pyx_t_1;
  __pyx_t_1 = 0;
  goto __pyx_L0;

Comparison

Finally, we compare the execution times of the implementations of the funtions.


In [6]:
temp=np.zeros_like(times)

amps_naive      = fourier_sum_naive(times, freq, amp, phi)
amps_numpy      = fourier_sum_numpy(times, freq, amp, phi)
amps_numba1     = fourier_sum_naive_numba(times, freq, amp, phi)
amps_numba2     = fourier_sum_numpy_numba(times, freq, amp, phi)
amps_cython     = fourier_sum_cython(times, freq, amp, phi, temp)
amps_cython_omp = fourier_sum_cython_omp(times, freq, amp, phi, temp)

%timeit -n 5  -r 5  fourier_sum_naive(times, freq, amp, phi)
%timeit -n 10 -r 10 fourier_sum_numpy(times, freq, amp, phi)
%timeit -n 10 -r 10 fourier_sum_naive_numba(times, freq, amp, phi)
%timeit -n 10 -r 10 fourier_sum_numpy_numba(times, freq, amp, phi)
%timeit -n 10 -r 10 fourier_sum_cython(times, freq, amp, phi, temp)
%timeit -n 10 -r 10 fourier_sum_cython_omp(times, freq, amp, phi, temp)


5 loops, best of 5: 887 ms per loop
10 loops, best of 10: 20.1 ms per loop
10 loops, best of 10: 19 ms per loop
10 loops, best of 10: 20 ms per loop
10 loops, best of 10: 19.3 ms per loop
10 loops, best of 10: 9.35 ms per loop

In [7]:
import matplotlib.pylab as plt

print amps_numpy-amps_cython

fig=plt.figure()
fig.set_size_inches(16,12)

plt.plot(times,amps_naive         ,'-', lw=2.0)
#plt.plot(times,amps_numpy      - 2,'-', lw=2.0)
plt.plot(times,amps_numba1     - 4,'-', lw=2.0)
#plt.plot(times,amps_numba2     - 6,'-', lw=2.0)
plt.plot(times,amps_cython     - 8,'-', lw=2.0)
#plt.plot(times,amps_cython_omp -10,'-', lw=2.0)

plt.show()


[ 0.  0.  0. ...,  0.  0.  0.]

In [ ]: