A Review of EuroSciPy2015

0. IPython Notebooks

1. Tutorial 1: An Introduction to Python (Joris Vankerschaver)

2. Tutorial 2: Never get in a data battle without Numpy arrays (Valerio Maggio)

3. numexpr

4. Interesting talks

0. Jupyter aka IPython Notebook

  • Interactive programming interface

  • Simultaneous development & documentation

  • All tutorials and most lectures at EuroSciPy2015 were given using IPython Notebooks.

  • We have adopted the Notebook for this presentation as a practical exercise.


In [ ]:
pip install "ipython[notebook]"

Start a local notebook server in the directory of interest by running the command:


In [ ]:
ipython notebook

Images, data files, etc. should be stored in the directory from which the notebook server was started, or in one of its subdirectories.

1. Tutorial 1: An Introduction to Python (Joris Vankerschaver)

IPython Notebook: https://github.com/jvkersch/python-tutorial-files

An overview of basic Python syntax and data structures, including:

  • lists, tuples, dictionaries
  • mutable vs immutable objects
  • set, enumerate
  • read from / write to files
  • namespaces
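
Several of the topics listed above (set, enumerate, reading and writing files) are not shown in the cells that follow. The sketch below is a minimal illustration, not taken from the tutorial notebook; the file name notes.txt and the sample values are made up for the example.

```python
import os
import tempfile

# set: an unordered collection of unique items
letters = set("abracadabra")

# enumerate: yields (index, item) pairs while looping
indexed = list(enumerate(["alpha", "beta", "gamma"]))

# Write two lines to a temporary file (hypothetical name), then read them back
path = os.path.join(tempfile.mkdtemp(), "notes.txt")
with open(path, "w") as f:
    f.write("first line\nsecond line\n")
with open(path) as f:
    lines = [line.strip() for line in f]

print(sorted(letters))
print(indexed)
print(lines)
```

The with statement closes the file automatically, even if an error occurs inside the block.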

In [ ]:
# Mutable objects can be changed in place (e.g. lists),
# Immutable objects can NOT (e.g. ints, strings, tuples)
tup = ('a','0','@')
tup

In [ ]:
# Tuples are immutable, so item assignment raises a TypeError
tup[0] = 8

In [ ]:
record = {}
record['first'] = 'Alan'
record['last'] = 'Turing'
record

In [ ]:
record.update({'workplace':'Bletchley Park'})
record

In [ ]:
# (Python 2) A trailing comma after print suppresses the implicit \n, so output stays on the same line
for x in range(0,4):
    print x,

In [ ]:
# In Python 2, xrange(n) is more memory-efficient than range(n), because:
# range() builds the whole list of numbers up front,
# while xrange() produces them as needed!
# (In Python 3, range() itself is lazy and xrange() no longer exists.)

%timeit range(1000000)

In [ ]:
%timeit xrange(1000000)
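
The difference between building a whole sequence and producing numbers on demand can also be seen in memory use. The sketch below assumes Python 3, where range() is lazy in the same way as Python 2's xrange(), and compares it with a fully built list.

```python
import sys

n = 1000000
lazy = range(n)          # Python 3: produces numbers on demand
eager = list(range(n))   # builds all n entries up front

# getsizeof reports the container only (for the list, the integer
# objects it points to are not included)
lazy_size = sys.getsizeof(lazy)
eager_size = sys.getsizeof(eager)

print("lazy range object: %d bytes" % lazy_size)
print("full list:         %d bytes" % eager_size)
```

The lazy object stays the same size however large n becomes, while the list grows with n.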

In [ ]:
# Name clashes are evil: the import below silently replaces our own pi
pi = 3.14
from numpy import pi
pi
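
One way to avoid the clash shown above is to keep imported names behind their module prefix. This is a minimal sketch of that habit, using the same rough value of pi as in the cell above.

```python
import math
import numpy as np

pi = 3.14  # our own, rough value

# Qualified access leaves the local name untouched
print(pi)
print(np.pi)
print(math.pi)
```

Here pi, np.pi and math.pi live in three separate namespaces, so none of them overwrites another.
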

Accessing Python's Source Code

cf. http://stackoverflow.com/questions/8608587/finding-the-source-code-for-built-in-python-functions

We can get help for a built-in Python function, such as range, with a single question mark:


In [ ]:
range?

...and we can read the source code of built-in functions by downloading the source from the Python.org Mercurial repositories: https://hg.python.org/


In [ ]:
# import inspect
# inspect.getsourcefile(range) # doesn't work for built-in functions
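
The inspect module can still show the source of anything written in Python; it only fails for objects implemented in C. The helper below is a hypothetical wrapper (source_or_none is not part of the standard library) that makes the difference visible.

```python
import inspect
import textwrap

def source_or_none(obj):
    """Return the Python source of obj, or None if it is unavailable."""
    try:
        return inspect.getsource(obj)
    except (TypeError, OSError):
        # TypeError: built-in object implemented in C
        # OSError: the source file could not be located
        return None

# Built-ins such as len and range have no Python source
print(source_or_none(len) is None)
print(source_or_none(range) is None)

# textwrap.dedent is written in Python, so its source is normally available
src = source_or_none(textwrap.dedent)
if src is not None:
    print(src.splitlines()[0])
```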

In [ ]:
# A user-defined "Counter" class illustrating the iterator protocol
# that drives loops such as "for i in range(0,10) ..."

class Counter(object):
    def __init__(self, low, high):
        self.current = low
        self.high = high

    def __iter__(self):
        'Returns itself as an iterator object'
        return self

    def __next__(self):
        'Returns the next value until current exceeds high'
        if self.current > self.high:
            raise StopIteration
        else:
            self.current += 1
            return self.current - 1

    next = __next__  # Python 2 looks up next() instead of __next__()
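
The same counting behaviour can be written more compactly as a generator function, and the steps a for loop performs can be spelled out by hand. The sketch below is an illustration only; the function name counter is chosen for the example.

```python
def counter(low, high):
    """Yield low, low + 1, ..., high (inclusive)."""
    current = low
    while current <= high:
        yield current
        current += 1

# What a for loop does behind the scenes:
# ask for an iterator, call next() until StopIteration is raised
it = iter(counter(0, 2))
manual = []
while True:
    try:
        manual.append(next(it))
    except StopIteration:
        break

squares = [i * i for i in counter(0, 3)]
print(manual)
print(squares)
```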

We can view the source code for a particular function, such as range, in the Python.org Mercurial Repository: https://hg.python.org/cpython/file/c6880edaf6f3/Objects/rangeobject.c


In [ ]:
for i in range(0,10):
    print i*i

In [ ]:
# List comprehension (with filter)
[a*2 for a in range(0,10) if a>3]
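
For comparison, the comprehension above is equivalent to an explicit loop with an if test. The sketch below shows that loop, plus the dictionary and set forms of the same syntax; the variable names are chosen for the example.

```python
# Explicit loop equivalent to the list comprehension
doubled = []
for a in range(0, 10):
    if a > 3:
        doubled.append(a * 2)

same = [a * 2 for a in range(0, 10) if a > 3]

# The same syntax builds dictionaries and sets
squares_by_n = {a: a * a for a in range(4)}
remainders = {a % 3 for a in range(10)}

print(doubled == same)
print(squares_by_n)
print(sorted(remainders))
```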

Sneak-peek @ Advanced Topics

  • NumPy & SciPy
  • Pandas
  • Cython
    • Translate Python scripts into C code, and compile to machine code.

In [ ]:
# The Zen of Python
import this

2. Tutorial 2: Never get in a data battle without Numpy arrays (Valerio Maggio)

IPython Notebook: https://github.com/leriomaggio/numpy_euroscipy2015

Arrays and Data Types


In [ ]:
# A NumPy array exposes its element type via .dtype (plain Python ints, lists, etc. have no such attribute)
import numpy as np

a = np.array([1, 2, 3], dtype=np.int16)
a.dtype

In [ ]:
arev = a[::-1]
arev
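
Slicing with a[::-1] does not copy the data: the reversed array is a view onto the same memory. The sketch below, which assumes a NumPy version that provides np.shares_memory, shows the consequence and how to get an independent copy.

```python
import numpy as np

a = np.array([1, 2, 3], dtype=np.int16)
arev = a[::-1]            # a view with a negative stride, not a copy

arev[0] = 99              # writes through to the last element of a
acopy = a[::-1].copy()    # an independent copy
acopy[0] = -1             # does not affect a

print(a)
print(np.shares_memory(a, arev))
print(np.shares_memory(a, acopy))
```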

In [ ]:
# Typecast variables into float and complex numbers
b = np.float64(64)
c = np.complex128(b)
print "R(c) = ", c.real
print "I(c) = ", c.imag

In [ ]:
# Specify type of array elements
x = np.ones(4, 'int8')
x

In [ ]:
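# Wrap-around: int8 holds -128..127, so 256 overflows to 0
# (recent NumPy releases raise an OverflowError here instead)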
x[0] = 256
x
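
The wrap-around can be reproduced on any NumPy version with an explicit cast, because astype() truncates out-of-range integers rather than rejecting them. This is a minimal sketch; the sample values are chosen to sit just inside and outside the int8 range.

```python
import numpy as np

info = np.iinfo(np.int8)          # representable range of int8
values = np.array([127, 128, 256, 257, -129])
wrapped = values.astype(np.int8)  # keeps only the lowest 8 bits

print(info.min, info.max)
print(wrapped)
```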

In [ ]:
# Define a new record and create an array of corresponding data types
rt = np.dtype([('artist', np.str_, 40),('title', np.str_, 40), ('year', np.int16)])
music = np.array([('John Cage','4\'33\'\'',1952)], dtype=rt)
music
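
Once a record dtype is defined, fields can be read by name and used for filtering. The sketch below rebuilds the dtype with the equivalent 'U40' string code and adds a second record purely for illustration.

```python
import numpy as np

rt = np.dtype([('artist', 'U40'), ('title', 'U40'), ('year', np.int16)])
music = np.array([('John Cage', "4'33''", 1952),
                  ('Steve Reich', 'Clapping Music', 1972)],  # added for illustration
                 dtype=rt)

titles = music['title']              # one field across all records
early = music[music['year'] < 1960]  # boolean mask on a field
first_artist = music[0]['artist']    # one field of one record

print(titles)
print(early)
print(first_artist)
```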

Matrix Tricks


In [ ]:
# Flatten a matrix into a 1-D array
r = np.array([[1, 2, 3], [4, 5, 6]])
r.ravel()
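
A related method, flatten(), gives the same 1-D result but always copies, whereas ravel() returns a view when the memory layout allows it. The sketch below illustrates the difference for a contiguous array.

```python
import numpy as np

r = np.array([[1, 2, 3], [4, 5, 6]])
flat_view = r.ravel()     # view onto r for a contiguous array
flat_copy = r.flatten()   # always an independent copy

flat_view[0] = 100        # also changes r[0, 0]

print(r)
print(flat_copy)
```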

In [ ]:
# Save a .csv file with a chosen precision (here 5 decimal places);
# the "data" directory must already exist
M = np.random.rand(3,3)
np.savetxt("data/random-matrix2.csv", M, fmt='%.5f', delimiter=',')
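
To check what the format string does to the data, the file can be read back with np.loadtxt. The sketch below writes to a temporary directory instead of data/ and uses a comma delimiter; both choices are assumptions made so that the example runs anywhere.

```python
import os
import tempfile
import numpy as np

M = np.random.rand(3, 3)

# Hypothetical location: a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "random-matrix.csv")
np.savetxt(path, M, fmt='%.5f', delimiter=',')

M_back = np.loadtxt(path, delimiter=',')
max_error = np.abs(M - M_back).max()

print(M_back)
print("largest rounding error: %.1e" % max_error)
```

The values that come back differ from the originals only by the rounding to five decimal places.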

In [ ]:
# Create a matrix using list comprehension
coolmx = np.array([[10*j+i for i in range(6)] for j in range(6)])
coolmx
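
The same matrix can be built without Python loops by letting NumPy broadcast a column of row indices against a row of column indices. This is one possible alternative, shown as a sketch.

```python
import numpy as np

rows = np.arange(6)[:, np.newaxis]   # shape (6, 1)
cols = np.arange(6)                  # shape (6,)
coolmx = 10 * rows + cols            # broadcast to shape (6, 6)

print(coolmx)
```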

In [ ]:
# Machine Learning in Python is a one-liner!
centroids, variance = vq.kmeans(data, 3)
#... after preparing the data and importing vq from scipy.cluster (SciPy, not scikit-learn)
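
For completeness, here is what the preparation might look like. The sketch assumes SciPy is installed and uses synthetic data: three made-up blobs of points, with one starting guess taken from each blob so that the result is repeatable.

```python
import numpy as np
from scipy.cluster import vq

# Synthetic data: 50 points around each of three made-up centres
rng = np.random.RandomState(0)
centres = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
data = np.vstack([c + 0.3 * rng.randn(50, 2) for c in centres])

# Passing an array of initial centroids instead of k makes the run deterministic
guess = data[[0, 50, 100]]
centroids, distortion = vq.kmeans(data, guess)

# Assign every point to its nearest centroid
labels, _ = vq.vq(data, centroids)

print(centroids)
print(distortion)
```

The second return value of vq.kmeans is the mean distance between the points and their nearest centroid, which SciPy calls the distortion.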

A look into the future

numpy-100

"...a quick reference for new and old users and to provide also a set of exercices for those who teach."

https://github.com/rougier/numpy-100

3. numexpr

@ https://github.com/pydata/numexpr (previously @ https://code.google.com/p/numexpr/)

  • JIT (Just-in-time) compilation for significant speed-up of numerical calculations

    • numexpr evaluates multiple-operator array expressions many times faster than NumPy can. It accepts the expression as a string, analyzes it, rewrites it more efficiently, and compiles it on the fly into code for its internal virtual machine (VM). Due to its integrated just-in-time (JIT) compiler, it does not require a compiler at runtime.
  • Multithreading to make use of multiple CPU cores

    • numexpr implements multi-threaded computation directly in its internal virtual machine, written in C. This makes it possible to bypass the GIL in Python and allows near-optimal parallel performance in vector expressions, especially for CPU-bound operations (memory-bound ones were already the strong point of numexpr).
  • Can be used to evaluate expressions in NumPy and Pandas

cf. https://github.com/leriomaggio/numpy_euroscipy2015/blob/master/06_Numexpr.ipynb

The Speed of NumExpr

  • The speed advantage of NumExpr comes from avoiding full-size temporary arrays for intermediate results. The operands are processed in small chunks that fit in the processor cache, and each chunk's result is written straight into the output array.
  • For this reason, NumExpr outperforms plain NumPy when the arrays are larger than the processor cache. For small computations, it is actually slower...
    • ...so use only when necessary!
  • More info about how it is done: https://github.com/pydata/numexpr#how-numexpr-can-achieve-such-a-high-performance
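
The chunking idea can be imitated in plain NumPy. The sketch below is an illustration of the principle only, not NumExpr's implementation: the function chunked_eval and its chunk size are made up, and the Python-level loop means it will not match NumExpr's speed.

```python
import numpy as np

def chunked_eval(a, b, chunk=4096):
    """Compute a**2 + b**2 + 2*a*b block by block into one output array."""
    out = np.empty_like(a)
    for start in range(0, a.size, chunk):
        stop = start + chunk
        x = a[start:stop]
        y = b[start:stop]
        # Temporaries created here are only chunk-sized, not full-sized
        out[start:stop] = x**2 + y**2 + 2*x*y
    return out

a = np.arange(1e5)
b = np.arange(1e5)
result = chunked_eval(a, b)

print(np.allclose(result, (a + b)**2))
```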

In [ ]:
import numexpr as ne
import numpy as np

a = np.arange(1e8)
b = np.arange(1e8)

print "NumPy   >> " 
%timeit a**2 + b**2 + 2*a*b

In [ ]:
print "NumExpr >> "
%timeit ne.evaluate('a**2 + b**2 + 2*a*b')

4. Interesting talks

ReScience (Nicolas Rougier)

https://www.euroscipy.org/2015/schedule/presentation/17/

@ https://github.com/ReScience/ReScience/wiki

  • "...an article about computational result is advertising, not scholarship. The actual scholarship is the full software environment, code and data that produced the result." (Buckheit & Donoho, 1995)

  • Papers are submitted through GitHub as Pull Requests to the ReScience repository, so the code behind each result is open for others to rerun and review.

  • Reproducibility has long been a problem in science... https://github.com/ReScience/rescience.github.io/blob/master/00-about.md ...but now we have a potential solution.

Massively parallel implementation in Python of a pseudo-spectral DNS code for turbulent flows (Mikael Mortensen)

https://www.euroscipy.org/2015/schedule/presentation/6/

https://github.com/hplgit/SpectralDNS-paper

  • A Navier-Stokes equations solver using only Python's numpy & mpi4py that can perform nearly as fast as a C++ implementation on multiple cores.
  • Adding some Cython further optimises performance, getting even closer to C++ (see red & blue @ graph below)
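
To give a flavour of what "pseudo-spectral" means, the sketch below differentiates a periodic function in Fourier space with numpy.fft and forms a nonlinear product in physical space. It is a 1-D toy example under assumed parameters (64 grid points, a made-up test function), not code from the talk or from the SpectralDNS repository.

```python
import numpy as np

N = 64
x = 2 * np.pi * np.arange(N) / N     # periodic grid on [0, 2*pi)
k = np.fft.fftfreq(N, d=1.0 / N)     # integer wavenumbers

u = np.sin(3 * x)

# Derivative: multiply each Fourier coefficient by i*k, then transform back
du_dx = np.real(np.fft.ifft(1j * k * np.fft.fft(u)))

# Nonlinear term: evaluated pointwise in physical space
nonlinear = u * du_dx

print(np.allclose(du_dx, 3 * np.cos(3 * x)))
```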

Dashboarding with the IPython notebook for online introspection into long-running experiments (Thomas Greg Corcoran)

https://www.euroscipy.org/2015/schedule/presentation/19/

Query data while an experiment is running.

HoloViews: Building complex visualizations easily for reproducible science (Jean-Luc Stevens, Philipp Rudiger)

https://www.euroscipy.org/2015/schedule/presentation/18/

http://ioam.github.io/holoviews/