Using Dask with fMKS

fMKS is currently being developed with Dask support. At present the generate_cahn_hilliard_data function generates data using Dask. This is an embarrassingly parallel workflow, as MKS typically requires many Cahn-Hilliard simulations to calibrate the model. The following is tested with both the threaded and multiprocessing schedulers; the author has not yet managed to get the distributed scheduler working.


In [15]:
import numpy as np
import dask.array as da
from fmks.data.cahn_hilliard import generate_cahn_hilliard_data
import dask.threaded
import dask.multiprocessing
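
As a quick sanity check, generate_cahn_hilliard_data can be called directly to inspect what it returns. The following is a minimal sketch assuming the microstructure and response come back as lazy dask arrays whose chunk layout mirrors the chunks argument; nothing is evaluated until compute is called.

x_data, y_data = generate_cahn_hilliard_data((48, 200, 200),
                                             chunks=(1, 200, 200),
                                             n_steps=100)
print(x_data.shape)   # (48, 200, 200): 48 independent simulations
print(x_data.chunks)  # assumed: one 200 x 200 simulation per chunk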

The time_ch function calls generate_cahn_hilliard_data, which returns the microstructure and response as a tuple. compute is then called on the response field with a given number of workers and scheduler.


In [10]:
def time_ch(num_workers,
            get,
            shape=(48, 200, 200),
            chunks=(1, 200, 200),
            n_steps=100):
    # Generate 48 Cahn-Hilliard simulations (one 200 x 200 simulation
    # per chunk) and force evaluation of the response field with the
    # given scheduler and worker count.
    generate_cahn_hilliard_data(shape,
                                chunks=chunks,
                                n_steps=n_steps)[1].compute(num_workers=num_workers,
                                                            get=get)

Threaded Timings


In [13]:
for n_proc in (8, 4, 2, 1):
    print(n_proc, "thread(s)")
    %timeit time_ch(n_proc, dask.threaded.get)


8 thread(s)
1 loop, best of 3: 7.15 s per loop
4 thread(s)
1 loop, best of 3: 9.87 s per loop
2 thread(s)
1 loop, best of 3: 17.9 s per loop
1 thread(s)
1 loop, best of 3: 33.6 s per loop

Multiprocessing Timings


In [16]:
for n_proc in (8, 4, 2, 1):
    print(n_proc, "process(es)")
    %timeit time_ch(n_proc, dask.multiprocessing.get)


8 process(es)
1 loop, best of 3: 6.41 s per loop
4 process(es)
1 loop, best of 3: 9.24 s per loop
2 process(es)
1 loop, best of 3: 17.6 s per loop
1 process(es)
1 loop, best of 3: 34.2 s per loop
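
For this workload the two schedulers perform almost identically, and the speedup is close to linear up to 4 workers (roughly halving the runtime each time the worker count doubles) before tapering off at 8.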