All text below adapted from: https://rigtorp.se/2011/01/01/rolling-statistics-numpy.html
To compute rolling or moving statistics such as mean and standard deviation, the simplest way compute that is to use a for loop:
In [5]:
import numpy as np
def rolling_apply(fun, a, w):
r = np.empty(a.shape)
r.fill(np.nan)
for i in range(w - 1, a.shape[0]):
r[i] = fun(a[(i-w+1):i+1])
return r
A loop in Python are however very slow compared to a loop in C code. Fortunately there is a trick to make NumPy perform this looping internally in C code. This is achieved by adding an extra dimension with the same size as the window and an appropriate stride:
In [6]:
def rolling_window(a, window):
shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
strides = a.strides + (a.strides[-1],)
return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
Using this function it is easy to calculate for example a rolling mean without looping in Python:
In [7]:
x=np.arange(10).reshape((2,5))
rolling_window(x, 3)
Out[7]:
In [8]:
np.mean(rolling_window(x, 3), -1)
Out[8]:
More info on strides here: https://scipy-cookbook.readthedocs.io/items/SegmentAxis.html
and here: https://scipy-cookbook.readthedocs.io/items/GameOfLifeStrides.html
Code and text below taken from: https://stackoverflow.com/questions/48215914/vectorized-2-d-moving-window-in-numpy-including-edges
Question:
Is it possible to do a vectorized 2D moving window (rolling window) which includes so-called edge effects? What would be the most efficient way to do this?
That is, I would like to slide the center of a moving window across my grid, such that the center can move over each cell in the grid. When moving along the margins of the grid, this operation would return only the portion of the window that overlaps the grid. Where the window is entirely within the grid, the full window is returned.
Answer:
You could define a function that yields a generator and use that. The window would be the floor of the shape you want divided by 2 and the trick would be just indexing the array along that window as you move along the rows and columns.
In [9]:
def window(arr, shape=(3, 3)):
# Find row and column window sizes
r_win = np.floor(shape[0] / 2).astype(int)
c_win = np.floor(shape[1] / 2).astype(int)
x, y = arr.shape
for i in range(x):
xmin = max(0, i - r_win)
xmax = min(x, i + r_win + 1)
for j in range(y):
ymin = max(0, j - c_win)
ymax = min(y, j + c_win + 1)
yield arr[xmin:xmax, ymin:ymax]
In [10]:
arr = np.array([[1,2,3,4],
[2,3,4,5],
[3,4,5,6],
[4,5,6,7]])
gen = window(arr)
print(arr.shape)
# Print each step of the generator (runs for all elementrs of array)
i=0
while i < (arr.shape[0]*arr.shape[1]):
print(next(gen))
print("")
i+=1
It's not vectorized, but I'm not sure there is an existing vectorized function that returns different sized arrays. As @PaulPanzer points out you could pad your array to the size you need and use a np.lib.stride_tricks.as_strided to generate a view of the slices. Something like so:
In [11]:
def rolling_window(a, shape):
s = (a.shape[0] - shape[0] + 1,) + (a.shape[1] - shape[1] + 1,) + shape
strides = a.strides + a.strides
return np.lib.stride_tricks.as_strided(a, shape=s, strides=strides)
def window2(arr, shape=(3, 3)):
r_extra = np.floor(shape[0] / 2).astype(int)
c_extra = np.floor(shape[1] / 2).astype(int)
out = np.empty((arr.shape[0] + 2 * r_extra, arr.shape[1] + 2 * c_extra))
out[:] = np.nan
out[r_extra:-r_extra, c_extra:-c_extra] = arr
view = rolling_window(out, shape)
return view
window2(arr, (3,3))
Out[11]:
This version pads the edges with np.nan to avoid confusion with any other values in your array. It is about 3x faster with the given array than the window function, but I am not sure how having padded output will impact anything you want to do downstream.
Plot the moving window example above
Possible to map the generator function to run in parallel?
In [ ]: