IPython is a tool for interactive and exploratory computing. We have seen that IPython's kernel provides a mechanism for interactive remote computation, and we have extended this same mechanism for interactive remote parallel computation, simply by having multiple kernels.
At a high level, there are three basic components to parallel IPython:

- Engine: the Python interpreter that executes your code, essentially a remote IPython kernel.
- Controller: the Hub and Schedulers that connect clients to engines and route work between them.
- Client: the interface for connecting to a cluster and submitting work.

These components live in the IPython.parallel package and are installed with IPython.
This leads to a usage model where you can create as many engines as you want per compute node, and then control them all from your clients via a central 'controller' object that encapsulates the hub and schedulers:
The Controller is a collection of processes: one Hub and a set of Schedulers.
Together, these processes provide a single connection point for your clients and engines. Each Scheduler is a small GIL-less function in C provided by pyzmq (the Python load-balancing scheduler being an exception).
The Hub can be viewed as an über-logger, which monitors all communication between clients and engines, and can log to a database (e.g. SQLite or MongoDB) for later retrieval or resubmission. The Hub is not involved in execution in any way, and a slow Hub cannot slow down submission of tasks.
There is one primary object, the Client, for connecting to a cluster. For each execution model there is a corresponding View, and you determine how your work should be executed on the cluster by creating different views or manipulating attributes of views.
The two basic views:

- The DirectView class, for explicitly running code on particular engine(s).
- The LoadBalancedView class, for destination-agnostic scheduling.

You can use as many views of each kind as you like, all at the same time.
To follow along with this tutorial, you will need to start the IPython controller and some IPython engines. The quickest way is to visit the 'clusters' tab in the notebook dashboard and start some engines with the 'default' profile, or to use the ipcluster command:
$ ipcluster start -n 4
There isn't time to go into it here, but ipcluster can be used to start engines and the controller with various batch systems, including SGE, PBS, LSF, MPI, SSH, and Windows HPC Server.
More information on starting and configuring the IPython cluster can be found in the IPython.parallel docs.
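For example, assuming you have already created and configured a parallel profile for your batch system (the profile name 'sge' below is illustrative), the same command dispatches to that launcher:

```shell
# Create parallel config files for a new profile (one-time setup),
# then edit ipcluster_config.py to select your batch launcher.
ipython profile create sge --parallel

# Start 16 engines using that profile; 'sge' is an illustrative name.
ipcluster start -n 16 --profile=sge
```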
Once you have started the IPython controller and one or more engines, you are ready to use the engines to do something useful.
To make sure everything is working correctly, let's do a very simple demo:
In [1]:
from IPython import parallel
rc = parallel.Client()
rc.block = True
In [2]:
rc.ids
Out[2]:
[0, 1, 2, 3]
Let's define a simple function
In [3]:
def mul(a, b):
    return a * b
In [4]:
mul(5,6)
Out[4]:
30
In [5]:
dview = rc[:]
dview
Out[5]:
<DirectView [0, 1, 2, 3]>
In [6]:
e0 = rc[0]
e0
Out[6]:
<DirectView 0>
What does it look like to call this function remotely? Just turn f(*args, **kwargs) into view.apply(f, *args, **kwargs)!
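As a purely local sketch of that pattern (the LocalView class below is made up for illustration, not part of IPython.parallel), apply just forwards its arguments to the function; the real views have the same signature but run the function on one or more engines:

```python
# Hypothetical stand-in for a view: apply(f, *args, **kwargs)
# simply calls f(*args, **kwargs). IPython.parallel's real views
# accept the same arguments, but execute f remotely.
class LocalView:
    def apply(self, f, *args, **kwargs):
        return f(*args, **kwargs)

def mul(a, b):
    return a * b

view = LocalView()
print(view.apply(mul, 5, 6))  # same answer as mul(5, 6): 30
```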
In [7]:
e0.apply(mul, 5, 6)
Out[7]:
30
And the same thing in parallel?
In [8]:
dview.apply(mul, 5, 6)
Out[8]:
[30, 30, 30, 30]
Python has a builtin map for calling a function on a sequence of arguments:
In [9]:
map(mul, range(1,10), range(2,11))
Out[9]:
[2, 6, 12, 20, 30, 42, 56, 72, 90]
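One caveat if you are on Python 3: the builtin map returns a lazy iterator rather than a list, so you need to wrap it in list() to materialize the values:

```python
# In Python 3, map() is lazy; wrap it in list() to see the results.
def mul(a, b):
    return a * b

results = list(map(mul, range(1, 10), range(2, 11)))
print(results)  # [2, 6, 12, 20, 30, 42, 56, 72, 90]
```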
So how do we do this in parallel?
In [10]:
dview.map(mul, range(1,10), range(2,11))
Out[10]:
[2, 6, 12, 20, 30, 42, 56, 72, 90]
We can also run code in strings with execute:
In [11]:
dview.execute("import os")
dview.execute("a = os.getpid()")
Out[11]:
Treating the view as a dict lets you access the remote namespace:
In [12]:
dview['a']
Out[12]:
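A local sketch of what that dict access means, with plain dicts standing in for each engine's namespace (the pid values are invented for illustration):

```python
# Plain dicts standing in for four engines' namespaces after
# dview.execute("a = os.getpid()"); the pid values are made up.
engine_namespaces = [{'a': 1001}, {'a': 1002}, {'a': 1003}, {'a': 1004}]

# dview['a'] gathers 'a' from every engine's namespace:
a_values = [ns['a'] for ns in engine_namespaces]
print(a_values)  # [1001, 1002, 1003, 1004]
```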
When you do async execution, the calls return an AsyncResult object immediately.
In [13]:
def wait_here(t):
    import time, os
    time.sleep(t)
    return os.getpid()
In [14]:
ar = dview.apply_async(wait_here, 2)
In [15]:
ar.get()
Out[15]:
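The AsyncResult API here closely mirrors the one in the standard library's multiprocessing.pool, so you can get a feel for it without a cluster; this local sketch substitutes a ThreadPool for the remote engines (and returns t * 10 rather than a pid, purely for illustration):

```python
# Local analogy: multiprocessing.pool's AsyncResult has the same
# apply_async(...) -> ar; ar.get() shape as IPython.parallel's.
from multiprocessing.pool import ThreadPool
import time

def wait_here(t):
    time.sleep(t)
    return t * 10

pool = ThreadPool(2)
ar = pool.apply_async(wait_here, (0.1,))  # returns immediately
result = ar.get()                         # blocks until the call finishes
print(result)  # 1.0
pool.close()
pool.join()
```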