Python for Scientific Computing

Quentin CAUDRON

Ecology and Evolutionary Biology

qcaudron@princeton.edu

@QuentinCAUDRON

Princeton University Python Community

princetonpy.com

@PrincetonPy

  • Discussion Forums
  • Events Calendar
  • All past events
  • All code and slides

Today's Workshop

Slides

  • Slides are available on princetonpy.com/course
  • All code is provided as static or dynamic Notebooks
  • Notes and slides are designed to be read later, so may be wordy at times
  • ( by the way ) they're entirely written in Python

Session

  • Interrupt, shout, scream, hurl something if you need

Aims :

  • Provide a very quick refresher / introduction to Python
  • Demonstrate Python's core scientific stack

Helping me out...

  • Paul Gauthier, postdoc, Geosciences
  • Matthew Cahn, Linux sysadmin, MolBio

Thanks !

A note on Python 2 and Python 3

Python 2 is still predominantly used in most academic circles.

Python 3 came out in 2008, but it isn't completely backwards-compatible, so adoption has been slow.

There are very few differences for our purposes. All code presented here is compatible with either version.
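As a rough illustration of what "compatible with either version" means in practice, here is a minimal sketch. It is not taken from the workshop code, and the variable names are made up for the example. The two differences most likely to bite in scientific code are print and integer division, so the sketch is explicit about both.

```python
import sys

# print with parentheses and a single argument behaves the same
# under Python 2 and Python 3
print("Running Python %d.%d" % sys.version_info[:2])

# In Python 2, 7 / 2 is 3; in Python 3, it is 3.5.
# Being explicit about which division we want removes the ambiguity.
true_ratio = 7 / 2.0    # true division : one operand is a float
floor_ratio = 7 // 2    # floor division : same result in both versions

print(true_ratio)
print(floor_ratio)
```

Another common route is to put from __future__ import division, print_function at the very top of a file, which gives Python 2 the Python 3 behaviour for both.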

Not everything said today will interest everyone. Not all aspects of "scientific computing" interest everybody.

If you're interested in something specific, please ask.

What is Python ?

  • general-purpose
  • object-oriented
  • dynamically typed
  • interpreted
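To make "dynamically typed" a little more concrete, here is a small sketch ( the names x and double are invented for this example ). A name can be rebound to a value of a different type, and the same function works on anything that supports the operation it uses.

```python
# A name is just a label : it can point to an int, then to a str
x = 42
print(type(x).__name__)

x = "forty-two"
print(type(x).__name__)


def double(value):
    """Return value * 2, for any type that defines multiplication by an int."""
    return value * 2


# The same function handles numbers, strings and lists
results = [double(3), double("ab"), double([1, 2])]
print(results)
```

Because Python is interpreted, you can type these lines one at a time into a Python or IPython prompt and see each result immediately.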

It has a thriving community of developers, especially in science.


The Zen of Python

  • Beautiful is better than ugly.
  • Explicit is better than implicit.
  • Simple is better than complex.
  • Complex is better than complicated.
  • Readability counts.
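The lines above are a selection; the full text ships with Python and is printed by import this. The sketch below also pulls the text into a list. It assumes the attribute this.s, which holds the ROT13-encoded text in CPython. That attribute is an implementation detail rather than a documented API, so treat it as a curiosity.

```python
import codecs
import this  # importing the module prints the full Zen of Python once

# Assumption : this.s holds the ROT13-encoded text ( a CPython detail )
zen_lines = codecs.decode(this.s, "rot_13").splitlines()

# Pick out two of the aphorisms quoted on this slide
selected = [line for line in zen_lines
            if line.startswith(("Beautiful", "Readability"))]
print(selected)
```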

Context

C, C++, and Fortran

These languages are very fast, and great for heavy computations. However, they're slow and painful to write - there's no interactivity, the syntax gets complicated, and memory has to be managed by hand.

R

A tool for advanced statistics, but the language is exactly that : a tool aimed at stats. It's not very good for general-purpose coding. I have a strange aversion to it. Hey, at least it's free and open-source.

Matlab and Octave

Matlab has a great development environment, and a huge number of optimised, implemented toolsets. It's very expensive though. Octave is a great free clone, but it's not as pleasant to use.

So, Python ?

Huge range of scientific tools - nonlinear function fitting, MCMC, spectral analysis, ODEs and PDEs, signal and image processing, great data science tools. Vast community, active development, and very high quality due to the way the language is developed and the way we code it. It's batteries-included. The downside is that the IDE isn't as shiny as Matlab's, but I'm well over it - IPython Notebooks ( we'll see later ) are awesome.

The Scientific Python Stack

Python's standard library is huge. Still, as scientists, we require some fairly specific things that pure programmers might not immediately need : reasonable vector notation and manipulation, matrix and linear algebra, optimisation, interpolation, random numbers and statistical functions, plotting, etc. The "standard Python stack" puts together a few modules and extensions to Python to give us these.

  • Numpy : arrays, matrices and their operations, random numbers, ...
  • Scipy : linear algebra, interpolation, signal tools, optimisation, ...
  • Matplotlib : plotting
  • Pandas : data analysis and manipulation
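To give a flavour of the "vector notation" and linear algebra mentioned above, here is a minimal Numpy sketch. The arrays and numbers are made up for illustration; the workshop's own examples come later.

```python
import numpy as np

# Vector notation : operate on whole arrays, no explicit loops
x = np.linspace(0.0, 1.0, 5)    # five evenly spaced points from 0 to 1
y = 2.0 * x + 1.0               # applied element by element

# Linear algebra : solve the system A . v = b
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])
solution = np.linalg.solve(A, b)

print(y)
print(solution)
```

Scipy, Matplotlib and Pandas all build on these Numpy arrays, which is why Numpy comes first in the list.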

Also very awesome, not covered today :

  • IPython : interactivity, Notebooks ( like this one ), ...
  • Sympy : symbolic mathematics

Then, you may need more specific tools, like a good MCMC sampler ( PyMC ), or constrained, nonlinear function fitting ( lmfit ), or machine learning algorithms ( scikit-learn ), image processing ( scikit-image ), etc. The list goes on !

In this workshop, we'll get set up with the basic Python stack and explore how its pieces work. Then we'll demo some other packages for fun.

I find that Anaconda is a great distribution. It comes with most of the packages you'll need, and a great command-line package manager to help keep them up to date and install others. If you want to grab the faster distro ( free for academics ), head to store.continuum.io/cshop/academicanaconda and register with an @blah.edu address to grab an academic license. If you can't be bothered with that, it's at continuum.com/downloads.

  1. Linux : just run the .sh
  2. OSX : just run the .dmg. Might need to select "install for me only" if you don't have admin rights.
  3. Windows : just run the .exe

If you don't want Anaconda, you can install Python on its own, and then add packages and modules as you need them. I won't be covering this in the interest of time. If you're under Linux, use your package manager; if you're under OSX, then HomeBrew has what you need. If you're running Windows and are interested in installing everything yourself, cry a little and then head to python.org/getit.
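Whichever route you take, it's worth checking that the core stack is actually importable before going further. The helper below is a hypothetical convenience written for these notes ( check_stack is not part of any package ). It tries to import each module and reports its version, or None if the import fails.

```python
import importlib


def check_stack(names=("numpy", "scipy", "matplotlib", "pandas")):
    """Map each module name to its version string, or None if it can't be imported."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
            # Most packages expose __version__, but not all of them do
            report[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            report[name] = None
    return report


for name, version in sorted(check_stack().items()):
    print("%-12s %s" % (name, version if version else "NOT INSTALLED"))
```

Anything reported as NOT INSTALLED can be added through Anaconda's package manager or your system's package manager, as described above.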

A note on what we're installing : Anaconda comes with Python 2.7. This is by far the most widespread version of Python. It goes by Python 2 for short. Python 3 has been around for years, but it isn't backwards compatible, and whilst many of the main scientific packages are working to support it, the vast majority of scientists still use Python 2 ( actually, just about everyone : the Python dudes themselves recommend starting with 2.7, due to compatibility; this is changing however, as more packages move towards Py3 compatibility ).

Front-End / Interface

Here, we have several options. With Python now installed, you just need an environment to write it in. You could use a standard text editor. I recommend Atom, for all platforms.

Then, you could go with something more advanced, like an IDE ( integrated development environment ). An IDE is a one-stop shop to write and run your code. For Python, I like Spyder. It's available for all three of the above OSs. If you've installed Anaconda, you already have Spyder. Linux and OSX users, call spyder from the command line. Windows users can call the Anaconda Launcher, and you'll have it there.

Finally, there's my favourite way to write Python for development : the IPython Notebook. If you installed Anaconda, you already have IPython. Otherwise, go get it, you won't regret it. The IPython Notebook concept will be familiar to you if you've used Mathematica, and some aspects of Matlab ( though in Matlab, it's not done so well ). You have cells in which you write code, and you can execute cells independently. With a quick command-line hack, you can even get plots inline, such that all plots show up under the relevant cells.

To call IPython Notebooks, OSX and Linux users can just call ipython notebook from the command line, and Windows users with Anaconda have an IPython Notebook shortcut in their Start Menu ( in theory ). For inline plots and sexiness all around, I prefer calling ipython notebook --script. Here, --script tells IPython to also save a .py file alongside the .ipynb file, so you can just run your code from the command line or on a remote computer if you want to. Linux and OSX users can write this as an alias if they want : drop the line

alias pynb='ipython notebook --script'

in ~/.bashrc, and Windows users can edit the shortcut to their IPython Notebook to get the same result.

Take the time to consider your workflow and select an option.

  • IPython Notebook for code / algorithm development
  • Spyder if you prefer an integrated development environment a la R-Studio or Matlab
  • IPython / Python terminal for old-school play

Summary

  1. Lightning refresher of Python syntax
  2. Numpy
  3. Pandas
  4. Matplotlib
  5. Scipy
  6. Demos

We'll be doing a few exercises throughout.

Grab these slides and the data from princetonpy.com/course to follow along.