Intendation is significant in Python, preventing the worst mess. The remaining stylistic questions, like naming conventions, used in the standard library are summarized in the Python Enhancement Proposal 8. Most Python programmers adhere to this style guide (Google has their own which is similar and also popular), making it easy to read someone elses code.
⚠ Some editors and tools like autopep8 help check for violations. Other tools, like yapf, go further and reformat your code, e.g.
def forces(t,
X, N):
"""Calculate forces from neighbouring cells"""
for i, x in enumerate(X):
r = X[N[i]] - x
norm_r = np.minimum (np.linalg.norm(r,
axis=1), r_max)
norm_F = 2*(r_min - norm_r)*(norm_r - r_max) - (norm_r - r_max)**2
F[i] = np.sum(r*(
norm_F/
norm_r)[:, None], axis=0)
return F.ravel()
becomes:
def forces(t, X, N):
"""Calculate forces from neighbouring cells"""
for i, x in enumerate(X):
r = X[N[i]] - x
norm_r = np.minimum(np.linalg.norm(r, axis=1), r_max)
norm_F = 2 * (r_min - norm_r) * (norm_r - r_max) - (norm_r - r_max)**2
F[i] = np.sum(r * (norm_F / norm_r)[:, None], axis=0)
return F.ravel()
In [1]:
import this
# File module/example.py
x = 'Example data'
def example_function():
pass
if __name__ == '__main__':
whatever()
# File in .
from module import example
example.x
if ((j != k) and (d_type[d_lattice->d_cell_id[j]] == mesenchyme)
and (d_type[d_lattice->d_cell_id[k]] == mesenchyme)
and (dist < r_link)
and (fabs(r.w/(d_X[d_lattice->d_cell_id[j]].w +
d_X[d_lattice->d_cell_id[k]].w)) > 0.2)) {
d_link[i].a = d_lattice->d_cell_id[j];
d_link[i].b = d_lattice->d_cell_id[k];
}
to
auto both_mesenchyme = (d_type[d_lattice->d_cell_id[j]] == mesenchyme)
and (d_type[d_lattice->d_cell_id[k]] == mesenchyme);
auto along_w = fabs(r.w/(d_X[d_lattice->d_cell_id[j]].w +
d_X[d_lattice->d_cell_id[k]].w)) > 0.2;
if (both_mesenchyme and (dist < r_link) and along_w) {
d_link[i].a = d_lattice->d_cell_id[j];
d_link[i].b = d_lattice->d_cell_id[k];
}
docstrings: def documented_function():
"""This is a docstring"""
pass
docopt:"""A Docopt Example.
Usage:
docoptest.py [flag] [--parameter <x>]
docoptest.py (-h | --help)
Options:
-h --help Show this screen.
--parameter <x> Pass parameter.
"""
if __name__ == '__main__':
from docopt import docopt
args = docopt(__doc__)
print(args)
Results in
$ python docoptest.py wrong usage
Usage:
docoptest.py [flag] [--parameter <x>]
docoptest.py (-h | --help)
$ python docoptest.py -h
A Docopt Example.
Usage:
docoptest.py [flag] [--parameter <x>]
docoptest.py (-h | --help)
Options:
-h --help Show this screen.
--parameter <x> Pass parameter.
$ python docoptest.py flag
{'--help': False,
'--parameter': None,
'flag': True}
In [5]:
mass = -1
assert mass > 0, 'Mass cannot be negative!'
Make sure your code does what it is supposed to do by testing it with simple examples where you know what to expect
Save those tests (e.g. in if __name__ == __main__ of each module) to keep your code correct while changing parts
import unittest
def square(x):
return x*x
class TestSquare(unittest.TestCase):
def test_square(self):
self.assertEqual(square(2), 4)
if __name__ == '__main__':
unittest.main()
hypothesis:from hypothesis import given
from hypothesis.strategies import text, floats
@given(text(min_size=10), text(min_size=10), text(min_size=10))
def test_triangle_inequality(self, a, b, c):
self.assertTrue(zipstance(a, c) <= zipstance(a, b) + zipstance(b, c))
doctest:def square(x):
"""Return the square of x.
>>> square(2)
0
"""
return x*x
if __name__ == '__main__':
import doctest
doctest.testmod()
Will give
$ python doctestest.py
**********************************************************************
File "doctestest.py", line 4, in __main__.square
Failed example:
square(2)
Expected:
0
Got:
4
**********************************************************************
1 items had failures:
1 of 1 in __main__.square
***Test Failed*** 1 failures.
< with > to find out whether your tests cover everything.
In [46]:
%timeit [i**2 for i in range(100000)]
In [47]:
%timeit np.arange(100000)**2
We tried to speed-up the simulation of an N-body problem describing cells interacting in pairs via the spherical potential $U(r) = -(r - r_{min})(r - r_{max})^2$, where $r = |\vec x_j - \vec x_i|$. The resulting forces can be calculated as
def forces(t, X, N):
"""Calculate forces from neighbouring cells"""
for i, x in enumerate(X):
r = X[N[i]] - x
norm_r = np.minimum(np.linalg.norm(r, axis=1), r_max)
norm_F = 2*(r_min - norm_r)*(norm_r - r_max) - (norm_r - r_max)**2
F[i] = np.sum(r*(norm_F/norm_r)[:, None], axis=0)
return F.ravel()
where N is a matrix giving the $k$ nearest neighbours.
In [55]:
import pstats
p = pstats.Stats('nbodystats')
p.strip_dirs().sort_stats('cumulative').print_stats(10);
While the initial function was already using numpy and vectors it still involves a for-loop that can be vectorized ... well, actually tensorized:
def forces(t, X, N):
r = X[N] - np.tile(X, (k, 1, 1)).transpose(1, 0, 2)
norm_r = np.minimum(np.linalg.norm(r, axis=2), r_max)
norm_F = 2*(r_min - norm_r)*(norm_r - r_max) - (norm_r - r_max)**2
F = np.sum(r*(norm_F/norm_r)[:, None].transpose(0, 2, 1), axis=1)
return F.ravel()
Broadcasting gives just a "view" instead of a copy returned by np.tile:
def forces(t, X, N):
r = X[N] - X[:, None, :]
norm_r = np.minimum(np.linalg.norm(r, axis=2), r_max)
norm_F = 2*(r_min - norm_r)*(norm_r - r_max) - (norm_r - r_max)**2
F = np.sum(r*(norm_F/norm_r)[:, None].transpose(0, 2, 1), axis=1)
return F.ravel()
The standard inplace operators +=, -=, *=, and /= can speed up large numpy calculations a little, Pandas operations often have an inplace flag, e.g. df.reset_index(inplace=True). Sometimes allocating memory with np.empty or pd.DataFrame pays off. Make sure to use the fastest data type for each column of a pd.DataFrame.
Finally code can be compiled in various ways. We choose Theano because it requires little rewritting (and can run code on the GPU as well):
from theano import Tensor as T
from theano import function
X = T.matrix('X', dtype='floatX')
N = T.imatrix('N')
r = X[N] - X[:, None, :]
norm_r = T.minimum(r.norm(2, axis=2), 1)
norm_F = 2*(0.5 - norm_r)*(norm_r - 1) - (norm_r - 1)**2
F = T.sum(r*(norm_F/norm_r).dimshuffle(0, 1, 'x'), axis=1)
f = function([X, N], F.ravel(), allow_input_downcast=True)
def forces(t, X, N):
return f(X.reshape(-1, 3), self.N)
In [58]:
p = pstats.Stats('theanostats')
p.strip_dirs().sort_stats('cumulative').print_stats(10);
The file function_module.py belongs to Theano, there is likely not much to be gained. Still most time is spent calculating the derivatives, but allocating the DataFrame for the results could shave another 17% (35% when using a GPU) off in the best case. This would result in a ~50x (100x) faster code than the initial version.
This prototype allowed us to simulate a few hundred interacting cells in seconds. However, we were aiming for at least half a million cells and thus built a CUDA/C++ implementation after three months.