TCTC stands for Temporal Communities by Trajectory Clustering. It is an algorithm designed to find temporal communities on time series data.
The kind of data needed for TCTC is:
Most community detection methods first require an "edge inference" step, in which the edges between the different nodes are calculated.
TCTC first finds clusters of trajectories in the time series without inferring edges. A trajectory is a time series moving through some space. Trajectory clustering tries to group together nodes that have similar paths through a space.
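To make the idea of "similar paths through a space" concrete, here is a minimal sketch that checks, at each time-point, which pairs of nodes lie close to each other without ever building edges. It is only an illustration: the toy values, the threshold of 0.5, and the use of the absolute amplitude difference as the distance are assumptions, and this is not how teneto computes its clusters internally.

```python
import numpy as np

# Toy data: rows are time-points, columns are nodes (the same layout as the
# transposed data array used in this example). Values are illustrative only.
toy = np.array([[0.0, 0.1, 1.0],
                [0.1, 0.0, 1.1],
                [1.0, 0.2, 1.2]])

epsilon = 0.5  # assumed distance threshold for this sketch

# Absolute difference between every pair of nodes at each time-point.
# Broadcasting gives an array of shape (time, node, node).
dist = np.abs(toy[:, :, None] - toy[:, None, :])

# True where two nodes are within epsilon of each other at that time-point.
close = dist <= epsilon

print(close[:, 0, 1])  # nodes 0 and 1: close at the first two time-points
print(close[:, 0, 2])  # nodes 0 and 2: close only at the last time-point
```

Nodes whose flags stay True over a stretch of consecutive time-points are following a similar path, which is the kind of pattern trajectory clustering looks for.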
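The hyperparameters of TCTC dictate what type of trajectory is found in the data. There are four hyperparameters: a maximum distance between the nodes in a trajectory ($\epsilon$), a minimum number of time-points a trajectory must last ($\tau$), a minimum number of nodes in a trajectory ($\sigma$), and a tolerance for "noisy" time-points ($\kappa$).
This example shows only how TCTC is run and how the different hyperparameters affect the community detection. These hyperparameters can also be trained, but that is saved for another example.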
TCTC is outlined in more detail in this article
In [1]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from teneto.communitydetection import tctc
import pandas as pd
In [2]:
data = np.array([[0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 1, 2, 1],
                 [0, 0, 0, 0, 1, 1, 1, 0, 2, 2, 2, 2, 1],
                 [1, 0, 1, 1, 1, 1, 1, 1, 2, 2, 1, 0, 0],
                 [-1, 0, 1, 1, 0, -1, 0, -1, 0, 2, 1, 0, -1]], dtype=float)
data = data.transpose()
np.random.seed(2019)
data += np.random.uniform(-0.2, 0.2, data.shape)
In [3]:
# Let's have a look at the data
fig, ax = plt.subplots(1)
p = ax.plot(data)
ax.legend(p, [0,1,2,3])
ax.set_xlabel('time')
ax.set_ylabel('amplitude')
print(data.shape)
TCTC allows for multilabel communities (i.e. the same node can belong to multiple communities). It can produce two different outputs: an array, or a dataframe in which each row is a community.
The default output is the array.
So let us run TCTC on the data we have above.
In [4]:
parameters = {
    'epsilon': 0.5,
    'tau': 3,
    'sigma': 2,
    'kappa': 0
}
tctc_array = tctc(data, **parameters)
print(tctc_array.shape)
For now, ignore the values in the "parameters" dictionary; we will go through them later.
In order to get the dataframe output, just add output='df'.
In [5]:
parameters = {
    'epsilon': 0.5,
    'tau': 3,
    'sigma': 2,
    'kappa': 0
}
tctc_df = tctc(data, **parameters, output='df')
print(tctc_df.head())
Here we can see when each community starts and ends, along with its size and its length.
Below we define a function which plots each community on the original data.
In [6]:
def community_plot(df, data):
    # One panel for the original data plus one panel per community, in two columns
    nrows = int(np.ceil((len(df) + 1) / 2))
    fig, ax = plt.subplots(nrows, 2, sharex=True, sharey=True, figsize=(8, 2 + nrows))
    ax = ax.flatten()
    ax[0].plot(data)
    ax[0].set_xlabel('time')
    ax[0].set_ylabel('amplitude')
    ax[0].set_title('Original data')
    colors = plt.cm.Set2.colors
    for i, (_, row) in enumerate(df.iterrows()):
        # Draw all time series in gray as a background
        ax[i + 1].plot(data, alpha=0.15, color='gray')
        # Highlight the community's nodes between its start and end
        ax[i + 1].plot(np.arange(row['start'], row['end']),
                       data[row['start']:row['end'], row['community']],
                       color=colors[i % len(colors)])
        ax[i + 1].set_title('Community: ' + str(i))
    plt.tight_layout()
    return fig, ax

fig, ax = community_plot(tctc_df, data)
The multiple community labels can be seen in communities 0 and 2 above: community 2 contains three nodes and community 0 contains two nodes.
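Overlapping labels like these can also be read directly from the dataframe. The sketch below uses a small hand-written dataframe, so the rows are hypothetical and are not the actual output of tctc; only the column names (community, start, end) follow the dataframe used above. The helper communities_at is likewise a name invented for this illustration, and it assumes that end is exclusive, mirroring how community_plot slices the data.

```python
import pandas as pd

# Hypothetical rows, NOT the actual output of tctc(); only the column
# names follow the dataframe used in this example.
toy_df = pd.DataFrame({
    'community': [[0, 1], [2, 3], [0, 1, 2]],
    'start': [0, 1, 4],
    'end': [7, 4, 7],
})


def communities_at(df, node, t):
    """Return the row labels of the communities that contain `node` at time `t`.

    Assumes `end` is exclusive, as in the slicing data[start:end].
    """
    active = (df['start'] <= t) & (t < df['end'])
    has_node = df['community'].apply(lambda nodes: node in nodes)
    return df[active & has_node].index.tolist()


# In this toy dataframe, node 0 belongs to two communities at time-point 5.
print(communities_at(toy_df, node=0, t=5))
```

A node that returns more than one label at the same time-point carries multiple community labels at that moment.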
In [7]:
parameters = {
    'epsilon': 1.5,
    'tau': 3,
    'sigma': 2,
    'kappa': 0
}
tctc_df_largeep = tctc(data, **parameters, output='df')
fig, ax = community_plot(tctc_df_largeep, data)
In [8]:
parameters = {
    'epsilon': 0.5,
    'tau': 2,
    'sigma': 2,
    'kappa': 0
}
tctc_df_shorttau = tctc(data, **parameters, output='df')
fig, ax = community_plot(tctc_df_shorttau, data)
In [9]:
parameters = {
    'epsilon': 0.5,
    'tau': 5,
    'sigma': 2,
    'kappa': 0
}
tctc_df_longtau = tctc(data, **parameters, output='df')
fig, ax = community_plot(tctc_df_longtau, data)
In [10]:
parameters = {
    'epsilon': 0.5,
    'tau': 3,
    'sigma': 3,
    'kappa': 0
}
tctc_df_longsigma = tctc(data, **parameters, output='df')
fig, ax = community_plot(tctc_df_longsigma, data)
If we make $\kappa$ larger, TCTC tolerates that many consecutive "noisy" time-points while checking whether the trajectory continues.
In the data we have been looking at, nodes 0 and 1 are close to each other except at time-points 7 and 10. If we let $\kappa$ be 1, TCTC will ignore these time-points and allow the trajectory to continue.
In [11]:
parameters = {
    'epsilon': 0.5,
    'tau': 3,
    'sigma': 2,
    'kappa': 1
}
tctc_df_withkappa = tctc(data, **parameters, output='df')
fig, ax = community_plot(tctc_df_withkappa, data)
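The effect of $\kappa$ can be illustrated with a small stand-alone sketch. It takes a sequence of flags saying whether two nodes are close at each time-point and returns the spans that last long enough, tolerating a limited number of consecutive exceptions. The function name runs_with_tolerance, the flag values, and the choice to count tolerated exceptions toward the span length are all assumptions made for this illustration; this is not teneto's implementation.

```python
import numpy as np


def runs_with_tolerance(close, tau, kappa=0):
    """Return [start, end) spans where `close` holds for at least `tau`
    time-points, allowing up to `kappa` consecutive False values inside
    a span. Illustrative only; not teneto's implementation."""
    spans = []
    start = None      # first time-point of the current span
    last_true = None  # most recent True time-point in the current span
    for t, flag in enumerate(close):
        if flag:
            if start is None:
                start = t
            last_true = t
        elif start is not None and t - last_true > kappa:
            # Too many consecutive exceptions: close the span at the last True.
            if last_true + 1 - start >= tau:
                spans.append((start, last_true + 1))
            start = None
    # Close a span that is still open when the data ends.
    if start is not None and last_true + 1 - start >= tau:
        spans.append((start, last_true + 1))
    return spans


# Hypothetical flags: two nodes are close except at time-points 7 and 10.
close = np.ones(13, dtype=bool)
close[[7, 10]] = False

print(runs_with_tolerance(close, tau=3, kappa=0))  # span stops at the first exception
print(runs_with_tolerance(close, tau=3, kappa=1))  # exceptions tolerated, one long span
```

With kappa set to 0 the span ends at the first exception and the shorter stretches that follow do not reach tau, whereas with kappa set to 1 both single exceptions are bridged and one span covers the whole series.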