A Series object is a collection of 1D arrays, all of which share a common index. Under the hood, it wraps an n-dimensional array and supports either distributed operations via Spark or local operations via numpy, with an identical API. The final dimension of the array indexes the series, and the initial dimensions are arbitrary.
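To make this layout concrete, here is a minimal numpy sketch (not Thunder's implementation) of how an n-dimensional array maps onto records. The shape (76, 87, 2, 20) is hypothetical and chosen only for illustration. Any leading dimensions behave the same way.

```python
import numpy as np

# Hypothetical data: a 76 x 87 x 2 grid of pixels, each with 20 time points.
data = np.random.default_rng(0).normal(size=(76, 87, 2, 20))

# The final axis indexes the series; the other axes just locate a record.
index = np.arange(data.shape[-1])            # default index: 0, 1, ..., 19
records = data.reshape(-1, data.shape[-1])   # one row per record

print(records.shape)   # (13224, 20)
print(index[:5])       # [0 1 2 3 4]

# Each record is a 1D array that shares the common index.
first_record = data[0, 0, 0, :]
assert np.array_equal(first_record, records[0])
```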
The most common series data is time series data, in which case the index is time and each record is a different signal, like a channel or pixel.
Here, we show examples of loading and manipulating series data.
In [1]:
%matplotlib inline
In [22]:
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('darkgrid')
sns.set_context('notebook')
from showit import image
In [3]:
import thunder as td
In [4]:
series = td.series.fromexample('fish')
In [5]:
examples = series.filter(lambda x: x.std() > 4).sample(n=50).toarray()
plt.plot(examples.T);
Note the variation in raw intensity levels.
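Conceptually, filter keeps the records for which a predicate returns True, and sample draws a random subset of the records that remain. The numpy sketch below shows the same idea on a 2D array with one record per row. The helper names and the sampling-without-replacement behavior are assumptions for illustration, not Thunder's implementation.

```python
import numpy as np

def filter_records(records, predicate):
    """Keep rows (records) for which predicate(row) is True (hypothetical helper)."""
    mask = np.array([bool(predicate(r)) for r in records])
    return records[mask]

def sample_records(records, n, seed=None):
    """Draw up to n records at random, without replacement (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n = min(n, len(records))
    rows = rng.choice(len(records), size=n, replace=False)
    return records[rows]

rng = np.random.default_rng(1)
# Hypothetical data: 1000 records of 20 time points, each with its own spread.
records = rng.normal(loc=100, scale=rng.uniform(0, 10, size=(1000, 1)), size=(1000, 20))

active = filter_records(records, lambda x: x.std() > 4)
examples = sample_records(active, n=50, seed=0)
print(examples.shape)  # (50, 20), provided at least 50 records pass the filter
```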
In [6]:
examples = series.filter(lambda x: x.std() > 4).center().sample(n=50).toarray()
plt.plot(examples.T);
In [7]:
examples = series.squelch(150).zscore().filter(lambda x: x.std() > 0.1).toarray()
plt.plot(examples.T);
Related methods include standardize, detrend, and normalize.
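The per-record operations used above can be approximated in numpy. This is a sketch under stated assumptions: center subtracts each record's mean, zscore also divides by each record's standard deviation, and squelch is assumed to zero out any record whose maximum falls below the threshold. Thunder's exact conventions (for example, how it handles constant records) may differ.

```python
import numpy as np

def center(records):
    # Subtract each record's mean so it fluctuates around zero.
    return records - records.mean(axis=1, keepdims=True)

def zscore(records):
    # Center, then scale each record to unit standard deviation.
    # Assumes no record is constant (its std would be zero).
    return center(records) / records.std(axis=1, keepdims=True)

def squelch(records, threshold):
    # Assumed behavior: records whose max is below threshold become all zeros.
    keep = records.max(axis=1, keepdims=True) >= threshold
    return np.where(keep, records, 0.0)

records = np.array([[1.0, 2.0, 3.0],
                    [200.0, 210.0, 190.0]])
print(center(records))        # rows become [-1, 0, 1] and [0, 10, -10]
print(squelch(records, 150))  # first row zeroed, second row kept
```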
In [8]:
series.index
The index can also be used to select a subset of the values in each record. For example, to select a range:
In [9]:
series.between(0,8).shape
Note that the index changes to reflect the subselected range:
In [10]:
series.between(0,8).index
In this case, we could also have just used bracket notation on the final dimension:
In [11]:
series[:, :, :, 0:8].shape
But explicitly referencing the index can be useful when the index encodes information (see below).
We can also select based on an arbitrary criterion function:
In [12]:
series.select(lambda x: x < 5).index
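Index-based selection amounts to building a boolean mask over the index and applying it to the final axis. The numpy sketch below mimics between and select on a plain array. The helper names are hypothetical, and the left-inclusive, right-exclusive convention for between is an assumption, chosen because it makes between(0, 8) match the bracket slice 0:8 used above.

```python
import numpy as np

def select_by_index(data, index, criterion):
    """Keep positions along the last axis whose index value satisfies criterion."""
    mask = np.array([bool(criterion(v)) for v in index])
    return data[..., mask], index[mask]

def between(data, index, left, right):
    # Assumed convention: left <= value < right, which matches data[..., left:right]
    # when the index is the default 0, 1, 2, ...
    return select_by_index(data, index, lambda v: left <= v < right)

data = np.random.default_rng(0).normal(size=(4, 5, 2, 20))  # hypothetical shape
index = np.arange(20)

sub, sub_index = between(data, index, 0, 8)
print(sub.shape, sub_index)   # (4, 5, 2, 8) [0 1 2 3 4 5 6 7]
assert np.array_equal(sub, data[..., 0:8])

small, small_index = select_by_index(data, index, lambda x: x < 5)
print(small_index)            # [0 1 2 3 4]
```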
By default, the index of a Series object is the range of integers from zero to one less than the length of each record, as in these examples. However, other types can also serve as the index, such as a sequence of strings that provides a text label for each element of a record, or tuples for an index with multiple levels.
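A labeled index works the same way: the selection criterion is applied to labels instead of integer positions. The sketch below uses a hypothetical two-label string index, similar in spirit to the coherence and phase labels produced by fourier later in this tutorial. The data and the helper are illustrative only.

```python
import numpy as np

# Hypothetical records, each holding two labeled statistics.
index = np.array(['coherence', 'phase'])
records = np.array([[0.8, 1.2],
                    [0.1, 3.0],
                    [0.5, 0.4]])

def select_label(records, index, label):
    # Keep the column whose label matches, then drop the length-1 axis.
    mask = index == label
    return records[:, mask].squeeze(axis=-1)

coherence = select_label(records, index, 'coherence')
print(coherence)  # [0.8 0.1 0.5]
```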
In [13]:
plt.plot(series.normalize().max());
plt.plot(series.normalize().mean());
plt.plot(series.normalize().min());
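The cell above normalizes each record and then summarizes across records at each index value, giving one max, mean, and min trace over time. As a rough numpy analogue, the sketch assumes a percentile-baseline normalization of the form (y - b) / (b + offset), where b is a low percentile of each record. The method, the percentile of 20, and the offset of 0.1 are assumptions here, not confirmed Thunder defaults.

```python
import numpy as np

def normalize(records, perc=20, offset=0.1):
    # Assumed method: baseline b is a low percentile of each record.
    b = np.percentile(records, perc, axis=1, keepdims=True)
    return (records - b) / (b + offset)

rng = np.random.default_rng(0)
records = rng.uniform(50, 150, size=(500, 20))  # hypothetical raw intensities

norm = normalize(records)
# Statistics across records (axis 0) give one value per index position.
max_trace = norm.max(axis=0)
mean_trace = norm.mean(axis=0)
min_trace = norm.min(axis=0)
print(max_trace.shape)  # (20,)
```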
To compute statistics within records, we can use the map method, which applies an arbitrary function to each record:
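For local data, map is conceptually similar to applying a function along the last axis of the underlying array. The sketch below uses numpy.apply_along_axis as a stand-in (not Thunder's implementation) to compute a mean and a standard deviation per record, then flattens the leading dimensions as the cell below does. The data shape is hypothetical.

```python
import numpy as np

# Hypothetical data: 76 x 87 x 2 records with 20 time points each.
data = np.random.default_rng(0).normal(loc=10.0, size=(76, 87, 2, 20))

# Apply a scalar-valued function to every record (the last axis).
means = np.apply_along_axis(lambda x: x.mean(), -1, data)
stds = np.apply_along_axis(lambda x: x.std(), -1, data)

# Flatten the leading dimensions into one vector with one value per record.
means, stds = means.ravel(), stds.ravel()
print(means.shape, stds.shape)  # (13224,) (13224,)

# For simple reductions like these, vectorized numpy gives the same result.
assert np.allclose(means, data.mean(axis=-1).ravel())
```

The per-record function form generalizes to arbitrary computations, while the vectorized form is faster when a numpy reduction already exists.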
In [14]:
means = series.map(lambda x: x.mean()).flatten().toarray()
stds = series.map(lambda x: x.std()).flatten().toarray()
In [15]:
plt.plot(means, stds, '.');
We can also correlate each record with a signal of interest. As expected, for a random signal, the correlation should be near 0.
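Correlating each record with a signal amounts to a Pearson correlation computed row by row. The vectorized numpy sketch below assumes records are the rows of a 2D array and that no record is constant, which is why the cell below first filters on std > 0. The helper name is hypothetical.

```python
import numpy as np

def correlate_records(records, signal):
    # Pearson correlation of each row with signal.
    r = records - records.mean(axis=1, keepdims=True)
    s = signal - signal.mean()
    num = r @ s
    den = np.sqrt((r ** 2).sum(axis=1) * (s ** 2).sum())
    return num / den

rng = np.random.default_rng(0)
records = rng.normal(size=(1000, 20))   # hypothetical records
signal = rng.normal(size=20)

corr = correlate_records(records, signal)
print(corr.mean())  # close to 0 for an unrelated random signal

# Sanity check against numpy's correlation coefficient for one record.
assert np.isclose(corr[0], np.corrcoef(records[0], signal)[0, 1])
```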
In [16]:
from numpy import random
signal = random.randn(20)
correlations = series.filter(lambda x: x.std() > 0).correlate(signal).flatten().toarray()
In [17]:
plt.hist(correlations);
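We can use the fourier method to compute statistics of the Fourier transform of each record at a chosen frequency (here, freq=1). The result is a new series whose index labels the statistics, coherence and phase: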
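One plausible definition of these statistics is sketched below. It is an assumption modeled on common practice, not a copy of Thunder's code. Take the FFT of the mean-subtracted record and keep the non-negative frequency bins. Report coherence as the amplitude at the chosen bin divided by the root-sum-square of all amplitudes, and report the phase at that bin. Thunder's phase convention may differ from numpy.angle.

```python
import numpy as np

def fourier_stats(y, freq):
    """Return (coherence, phase) of a 1D record at integer frequency bin freq."""
    y = y - y.mean()
    n = len(y)
    ft = np.fft.fft(y)[: n // 2]          # non-negative frequency bins
    amp = 2 * np.abs(ft) / n              # single-sided amplitudes
    coherence = amp[freq] / np.sqrt(np.sum(amp ** 2))
    phase = np.angle(ft[freq])            # radians, numpy's convention
    return coherence, phase

t = np.arange(20)
pure = np.cos(2 * np.pi * t / 20)         # exactly one cycle over the record
co, ph = fourier_stats(pure, freq=1)
print(co, ph)  # coherence close to 1, phase close to 0

noisy = pure + np.random.default_rng(0).normal(scale=1.0, size=20)
print(fourier_stats(noisy, freq=1)[0])    # lower coherence once noise is added
```

Coherence is 1 when all of a record's power sits at the chosen frequency, and it approaches 0 when that frequency contributes little.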
In [18]:
fourier = series.filter(lambda x: x.std() > 1).flatten().fourier(freq=1)
fourier.index
And plot phase as a function of coherence:
In [19]:
plt.plot(fourier.select('coherence').toarray(), fourier.select('phase').toarray(), '.')
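We can also detrend over time. Here, nonlinear detrending with order=5 subtracts a fitted polynomial trend from each record, which removes the slow, periodic-looking structure in the mean: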
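Nonlinear detrending, as assumed here, fits a low-order polynomial over the index to each record and subtracts it. The sketch below does this with numpy least squares. It is not Thunder's implementation, and Thunder may additionally preserve each record's baseline (intercept), which this sketch does not do. The helper name and test signals are hypothetical.

```python
import numpy as np

def detrend_poly(records, order=5):
    """Subtract a least-squares polynomial fit (over the index) from each row."""
    n = records.shape[1]
    t = np.linspace(-1, 1, n)                  # rescaled index for numerical stability
    V = np.vander(t, order + 1)                # (n, order + 1) design matrix
    coefs, *_ = np.linalg.lstsq(V, records.T, rcond=None)
    trend = (V @ coefs).T                      # fitted trend, one row per record
    return records - trend

t = np.arange(20)
slow = 0.05 * (t - 10) ** 2                    # hypothetical slow drift
wiggle = np.sin(2 * np.pi * t / 4)             # faster oscillation, 5 cycles
records = np.vstack([slow + wiggle, slow])

residual = detrend_poly(records, order=5)
print(np.abs(residual[1]).max())   # the pure drift is removed almost exactly
```

A low order removes slow drift while leaving faster fluctuations largely intact. Raising the order lets the fit absorb progressively faster structure.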
In [20]:
plt.plot(series.mean())
plt.plot(series.detrend('nonlinear', order=5).mean());
And we can compute a covariance matrix.
In [21]:
image(series.flatten().cov());
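The sketch below assumes the covariance is taken between index positions (time points), with each flattened record treated as one observation. Under that assumption, the result is a square matrix with one row and one column per time point, small enough to display as an image. The data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
records = rng.normal(size=(1000, 20))     # hypothetical flattened series

# rowvar=False: columns (time points) are variables, rows (records) are observations.
cov = np.cov(records, rowvar=False)
print(cov.shape)  # (20, 20)

# Equivalent explicit form: center each time point across records, then X^T X / (n - 1).
X = records - records.mean(axis=0)
assert np.allclose(cov, X.T @ X / (len(records) - 1))
```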
There are more methods on series data, and algorithm packages that take series data as input. See the other tutorials and documentation for more!