Encouraging open, exploratory, collaborative and reproducible scientific computing

Brian Granger (ellisonbg)

Code and Data Interoperability Workshop

Sustainable Software for Chemistry and Materials

Virginia Tech, July 18-19, 2013

Let's style this IPython Notebook:


In [1]:
a = 10

In [2]:
print(a)


10

In [3]:
%load_ext load_style
%load_style talk.css
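The `load_style` extension used in the cell above is not part of IPython itself, and its source is not shown here. The following is a hypothetical sketch of how such an extension might work. It assumes the extension simply reads a CSS file and injects it into the notebook as a `<style>` element. The helper `css_to_html` is an illustrative name, not taken from the original extension.

```python
"""Hypothetical sketch of a `load_style` IPython extension (not the original)."""
import os


def css_to_html(css_text):
    """Wrap raw CSS text in a <style> element for injection into the notebook."""
    return "<style>\n{}\n</style>".format(css_text)


def load_style(line):
    """Line magic: %load_style path/to/file.css"""
    # Deferred import so the helper above can be used without IPython installed.
    from IPython.display import HTML, display

    path = line.strip()
    if not os.path.isfile(path):
        print("load_style: no such file: {}".format(path))
        return
    with open(path) as f:
        # Render the CSS as HTML so the browser applies it to the notebook page.
        display(HTML(css_to_html(f.read())))


def load_ipython_extension(ipython):
    """Called by IPython when the user runs %load_ext on this module."""
    # The magic name defaults to the function name, i.e. %load_style.
    ipython.register_magic_function(load_style, magic_kind="line")
```

If this were saved as `load_style.py` on the Python path, `%load_ext load_style` would call `load_ipython_extension`, which registers `%load_style` as a line magic.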


My background

Physics Professor

  • I teach Physics and do research with undergraduates
  • Cal Poly, San Luis Obispo
  • Background in AMO Physics: B-splines, Variational R-Matrix, low T quantum gases
  • Research in symbolic quantum mechanics and quantum computing
  • A user of scientific and technical computing tools and libraries

Open source hacker

  • I am a developer of tools and libraries for scientific and technical computing
  • Core developer of IPython (interactive computing environment)
  • Creator of PyZMQ (Python bindings to ZeroMQ, a high-performance messaging library)
  • Core developer of SymPy (symbolic mathematics library for Python)

Software based science

Almost all science requires computation

Computation requires software

Idea 1: Software is one of the foundations of science

The nature of software and science

Idea 2: The attributes of science follow from the attributes of software

Expensive software = expensive science

Buggy software = buggy science

Fast software = fast science

Friendly software = friendly science

Fragmented software = fragmented science

What qualities/attributes do we want science to have?

Fast, cheap, open, transparent, accessible

Efficient, lightweight, flexible

Repeatable, testable, verifiable

Collaborative, social, societal

Error-free, bias-free

Accountable, recordable, teachable

Interoperable, friendly

Fun!

We need software tools that have these attributes!

The role of software extends far beyond simulation and data analysis

We need to consider the entire lifecycle of scientific research:

  • Individual exploration (MATLAB, Mathematica, Python,...)
  • Collaborative development (git, svn,...)
  • Production execution (C++, C, Fortran, MPI)
  • Building and deploying code (autotools, cmake,...)
  • Debugging (gdb, Python, valgrind)
  • Analysis and visualization (Perl, bash, Python, MATLAB,...)
  • Publication (LaTeX, Word)
  • Presentation (LaTeX, PowerPoint, Keynote)
  • Education (PowerPoint, chalk/whiteboards,...)

Traditional software tools provide little help in making the transitions between these phases. Each phase involves a new set of software tools that don't integrate with the rest. This is horribly painful for users. The result is an unnecessary and heavy cognitive load that reduces users' ability to do science.

Software tools that are transforming scientific computing

Development tools

  • Git/GitHub
  • Travis CI

Performance tools

  • Cython
  • Numba
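Numba, for example, compiles an ordinary Python function to machine code when the function is decorated. The sketch below is illustrative only. The function `sum_of_squares` is a made-up example, not from the talk. It assumes Numba's `njit` decorator and falls back to plain Python when Numba is not installed, so the result is the same either way and only the speed differs.

```python
# Hedged sketch: compile a plain Python loop with Numba if it is available.
try:
    from numba import njit  # njit is shorthand for jit(nopython=True)
except ImportError:
    # Numba not installed: use an identity decorator so the same
    # function still runs, just as ordinary (slower) Python.
    def njit(func):
        return func


@njit
def sum_of_squares(n):
    """Return 0**2 + 1**2 + ... + (n-1)**2 using an explicit loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


print(sum_of_squares(4))  # -> 14
```

Explicit numeric loops like this one are slow in pure Python but are the kind of code Numba and Cython are designed to speed up.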

Tools I am working on

  • IPython
  • SymPy