This notebook just contains rough notes on using parallel processing in an IPython notebook and with scikit-learn.

What processor do we have available on this server?


In [1]:
!lscpu


Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                24
On-line CPU(s) list:   0-23
Thread(s) per core:    2
Core(s) per socket:    6
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 47
Stepping:              2
CPU MHz:               1862.106
BogoMIPS:              3724.08
Virtualisation:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              18432K
NUMA node0 CPU(s):     0,2,4,6,8,10,12,14,16,18,20,22
NUMA node1 CPU(s):     1,3,5,7,9,11,13,15,17,19,21,23
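
Two sockets with six cores each and hyper-threading, so 24 logical CPUs. For reference, the same count can be read from within Python using the standard library (a quick sketch, not run here):

import multiprocessing
print multiprocessing.cpu_count()  # counts logical CPUs; should report 24 on this machine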

For parallel processing with IPython there are additional dependencies. Looks like all we need is pyzmq:


In [5]:
!pip search pyzmq


pyzmq                     - Python bindings for 0MQ
  INSTALLED: 14.3.1 (latest)
gevent_zeromq             - gevent compatibility layer for pyzmq
pyzmq-static              - Obsolete fork of pyzmq
pyzmqrpc                  - A simple ZMQ RPC extension with JSON for message serialization
pyzmq-ctypes              - Python bindings for 0MQ (ctypes version).
pyzmq-wrapper             - Wrapper classes for pyzmq
pseud                     - Bi-directional RPC API on top of pyzmq
pyzmq-mdp                 - ZeroMQ MDP protocol in Python using pyzmq
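
Already installed. If it weren't, it should just be a matter of (untested here):

!pip install pyzmq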

This presentation may come in handy.

IPython.parallel

Looking at the documentation from here on how to use IPython's parallel tools. It's not clear from the getting-started guide where you're supposed to run the ipcluster command when working in a notebook. It appears that in an IPython notebook it's as easy as going to the "Clusters" tab on the home page. This notebook explains how to start the cluster over SSH, which might be a better idea than what we're currently doing, but would probably take too long to set up at this point.
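
For reference, the "Clusters" tab is just a front end for ipcluster; the terminal equivalent should be something along these lines (a sketch, assuming the default profile and 20 engines to match what's running below):

ipcluster start -n 20 --profile=default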


In [2]:
profile = "default"

In [3]:
from IPython.parallel import Client

In [4]:
client = Client(profile=profile)

In [5]:
client.ids


Out[5]:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]

In [7]:
print client[:].apply_sync(lambda : "Hello, World")


['Hello, World', 'Hello, World', 'Hello, World', 'Hello, World', 'Hello, World', 'Hello, World', 'Hello, World', 'Hello, World', 'Hello, World', 'Hello, World', 'Hello, World', 'Hello, World', 'Hello, World', 'Hello, World', 'Hello, World', 'Hello, World', 'Hello, World', 'Hello, World', 'Hello, World', 'Hello, World']

Using these cores

Looks like there are two ways to use the engines: Direct and LoadBalanced.

Direct

Using these directly is described on this page.

The basic idea behind the multiengine interface is that the capabilities of each engine are directly and explicitly exposed to the user. Thus, in the multiengine interface, each engine is given an id that is used to identify the engine and give it work to do. This interface is very intuitive and is designed with interactive usage in mind, and is the best place for new users of IPython to begin.

So it's simpler and easier to work with; sounds good.
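
The load-balanced interface isn't used below, but for reference it should look something like this (a sketch; the scheduler decides which engine gets each task rather than us addressing engines by id):

lview = client.load_balanced_view()
result = lview.map_async(lambda x: x**10, range(1000))
print result.get()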


In [8]:
dview = client[:]

In [17]:
%%timeit
serial_result = map(lambda x:x**10, range(1000000))


1 loops, best of 3: 2.41 s per loop

In [18]:
%%timeit
parallel_result = dview.map_sync(lambda x: x**10, range(1000000))


1 loops, best of 3: 652 ms per loop

Would have thought that would speed up a bit more. Presumably the overhead of serialising a million integers out to the engines and shipping the results back eats into the gain, since the per-item computation here is tiny.

Trying it out asynchronously:


In [19]:
parallel_result = dview.map_async(lambda x: x**10, range(1000000))

Now the parallel_result will be an AsyncResult object:


In [20]:
print parallel_result


<AsyncMapResult: <lambda>>

We can get the actual result using the .get method (which will block until the result is ready):


In [22]:
print type(parallel_result.get())


<type 'list'>

Could probably hack together most of what I'll want to do with just that dview.map_async method.
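
Something like this pattern, presumably (a sketch, untested): kick off the work, carry on with other things, and poll for the result.

result = dview.map_async(lambda x: x**10, range(1000000))
# ... do other work in the meantime ...
print result.ready()    # non-blocking check on whether it's finished
result.wait()           # block until the engines are done
powers = result.get()   # safe now; would also have blocked by itself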

IPython parallel magics

Doing the above with the IPython parallel magics.

Don't really understand the point of executing the same thing on many cores. What I want to do is execute different things, surely?


In [25]:
%%px --noblock
x = "something"


Out[25]:
<AsyncResult: finished>

In [26]:
%pxresult

In [28]:
%%px
print x


[stdout:0] something
[stdout:1] something
[stdout:2] something
[stdout:3] something
[stdout:4] something
[stdout:5] something
[stdout:6] something
[stdout:7] something
[stdout:8] something
[stdout:9] something
[stdout:10] something
[stdout:11] something
[stdout:12] something
[stdout:13] something
[stdout:14] something
[stdout:15] something
[stdout:16] something
[stdout:17] something
[stdout:18] something
[stdout:19] something

It would be useful to execute a non-blocking job on a single core and just leave it running, then come back to it later and get the results.


In [29]:
%pxconfig --noblock --targets 0

In [30]:
%px print "something"


Out[30]:
<AsyncResult: execute>

In [31]:
%pxresult


something
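
For completeness, the same fire-and-forget, single-engine pattern without the magics should just be apply_async on a one-engine view (a sketch):

ar = client[0].apply_async(lambda : "something")
# ... go and do other things ...
print ar.get()  # collect the result whenever it's convenient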

In [45]:
%autopx


%autopx enabled

In [46]:
a = "Can execute arbitrary code on this core."

In [47]:
%autopx


%autopx disabled

In [53]:
a = dview.pull('a', targets=0)

In [54]:
a.get()


Out[54]:
'Can execute arbitrary code on this core.'

Actually, executing the same thing on different cores might not be such a bad idea. I could send different folds of the data to different cores and then execute the same fitting and test commands on all of them. Then, I could just pull in the results from all the cores.
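
A rough sketch of that idea (untested; uses a toy dataset as a stand-in for the real one, and scikit-learn's KFold): push the data to every engine, then map the folds across the engines and fit/score each one remotely.

from sklearn.datasets import make_classification
from sklearn.cross_validation import KFold

# Toy data standing in for the real problem.
X, y = make_classification(n_samples=10000)

# Make the data and the estimator available in every engine's namespace.
dview.push(dict(X=X, y=y), block=True)
dview.execute("from sklearn.linear_model import LogisticRegression", block=True)

def fit_and_score(fold):
    # X, y and LogisticRegression resolve in the engine's namespace,
    # populated by the push/execute calls above.
    train, test = fold
    model = LogisticRegression()
    model.fit(X[train], y[train])
    return model.score(X[test], y[test])

folds = list(KFold(len(y), n_folds=10))
scores = dview.map_async(fit_and_score, folds)
print scores.get()  # one accuracy score per fold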