This notebook presents the code for the UnADevs method: Unbounded Unsupervised Activity Discovery using the Temporal Behaviour Assumption.
The method is based on online clustering and also incorporates the temporal information of the occurring activities. It can therefore discover clusters of repeating/periodic activities as they occur, while keeping track of the time interval of each discovered cluster. This way, the system can prompt the user about a discovered activity together with its time interval and ask for feedback.
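To build intuition for the temporal behaviour assumption, the cell below is a minimal, hypothetical sketch of online temporal clustering (names and thresholds are illustrative, not the actual OTC implementation): a point is merged into the nearest active cluster only if it is close both in feature space and in time; otherwise a new cluster is opened, and the oldest active cluster is retired when the pool is full.
In [ ]:
import numpy as np

def online_temporal_cluster(stream, dist_thr=2.0, time_thr=5.0, pool_size=3):
    # Illustrative sketch only; not the actual UnADevs/OTC implementation.
    # Each cluster: centroid, point count, first and last timestamp.
    active, closed = [], []
    for t, x in stream:
        best = None
        if active:
            dists = [np.linalg.norm(c['centroid'] - x) for c in active]
            i = int(np.argmin(dists))
            # Temporal behaviour assumption: points of the same activity
            # arrive close together in time, so merge only if the point is
            # near the centroid AND follows the cluster closely in time.
            if dists[i] < dist_thr and t - active[i]['last'] < time_thr:
                best = active[i]
        if best is not None:
            best['centroid'] += (x - best['centroid']) / (best['n'] + 1.0)
            best['n'] += 1
            best['last'] = t
        else:
            if len(active) >= pool_size:
                closed.append(active.pop(0))  # retire the oldest cluster
            active.append({'centroid': x.astype(float), 'n': 1,
                           'start': t, 'last': t})
    return closed + active

# Toy usage: two activities well separated in time and feature space
ts = np.arange(20.0)
xs = np.vstack([np.random.randn(10, 2), np.random.randn(10, 2) + 8])
print("discovered %d clusters" % len(online_temporal_cluster(zip(ts, xs))))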
The detailed description of the algorithm is in the paper "Unsupervised Online Activity Discovery Using Temporal Behaviour Assumption" [1].
More information about the project: http://www.sussex.ac.uk/strc/research/wearable/research-ll
The following code is an example clustering and visualization of Subject 1 from the JSI-ADL dataset [2][3]. It corresponds to the visualization shown in Figure 2 of the ISWC paper [1].
In [5]:
import time
import numpy as np
import Online_temporal_clustering_JSI_release as OTC
import Utilities_JSI_release as Util
from sklearn.preprocessing import scale
###########################################
# parameters
np.random.seed(2)
tolerance = 22
activePool = 3
minDur = 16
OTC.deltaT = tolerance  # temporal tolerance; a larger value yields bigger clusters and tends to merge small clusters into big ones
OTC.memoryDelta = tolerance + 1  # kept constant at tolerance + 1
OTC.num_clusterss = activePool  # size of the active cluster pool; a larger value yields more scattered clusters with empty space; if you increase this, also increase the memory parameters
OTC.threshold_cluster_size = minDur  # minimum cluster size to keep
In [6]:
# Load the data (features already extracted)
# data Format: [timestamp, f1, f2, f3, ... fn, label]
data_features = np.loadtxt('data_JSI/data_features_1.csv', delimiter=';')
features_list = [1, 2, 3, 4, 5, 6, 9, 10, 11, 12, 13, 14, 17, 18, 19, 20, 21, 22, 23, 24]
data_features = sorted(data_features, key=lambda a_entry: a_entry[0])
data_array = np.array(data_features)
# Standardize the selected feature columns (zero mean, unit variance)
data_array[:, features_list] = scale(data_array[:, features_list])
# Keep two feature columns plus the timestamp and label for visualization/validation
dataAll = np.column_stack((data_array[:, [3, 5]], data_array[:, 0], data_array[:, -1]))
# Select the features and timestamps fed to the clustering
points = data_array[:, features_list]
timestamps = dataAll[:, [2]]
n = len(points)
start = time.time()
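As an optional sanity check (not part of the original notebook), one can confirm the [timestamp, f1, ..., fn, label] layout described above before clustering:
In [ ]:
# Optional sanity check: the data are sorted by timestamp, the first column
# is the timestamp and the last column is the activity label.
print("rows: %d, columns: %d" % data_array.shape)
print("time span: %.1f to %.1f" % (data_array[0, 0], data_array[-1, 0]))
labels, counts = np.unique(data_array[:, -1], return_counts=True)
print("label counts: %s" % dict(zip(labels.tolist(), counts.tolist())))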
In [7]:
# Perform the clustering: feed the points to the online clusterer one by one
c = OTC.OnlineCluster(OTC.num_clusterss)
for ind1, point in enumerate(points):
    c.cluster(point, timestamps[ind1])
clusters = c.trimclusters()
n_clusters_ = len(clusters)
print "Clustered %d points in %.2f seconds and found %d clusters." % (n, time.time() - start, n_clusters_)
In [8]:
# Validation and Visualization of the clusters
clusters = Util.removeContained(clusters)  # drop clusters fully contained within other clusters
data_array2 = Util.remove_small_activities(data_array, dataAll[:, [3]], minDur)  # drop activities shorter than minDur
dataAll2 = np.column_stack((data_array2[:, [3, 5]], data_array2[:, 0], data_array2[:, -1]))
activity_means = Util.get_activity_means(np.column_stack((data_array2[:, features_list], data_array2[:, [0, -1]])))
activities_set = list(set(dataAll2[:, [3]].T[0]))
dict_activity_index_colour = dict(zip(activities_set, np.arange(len(activities_set)))) # {1:0, 2:1, 6:2, 32:3}
# Find the closest ground-truth activity to each cluster and assign its colour
cluster_segments, cluster_segments_complex, cluster_colors_set, cluster_array, ratios = \
Util.findClosestActivity(clusters, activity_means, dict_activity_index_colour)
# Validate the clustering against the ground truth and visualize the results
confusion_matrix_detailed, hungarian_matrix, result = \
Util.validation(cluster_colors_set, dataAll, dict_activity_index_colour, activities_set,
cluster_segments_complex, True, [], cluster_array, [], n_clusters_,
cluster_segments, minDur, True)
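For intuition on the matching step, the sketch below shows one standard way to map discovered clusters to ground-truth activities: Hungarian assignment on a toy confusion matrix. This is illustrative only and not necessarily what Util.validation does internally.
In [ ]:
# Illustrative only: optimal cluster-to-activity assignment via the
# Hungarian algorithm on a toy confusion matrix (rows = clusters,
# columns = activities).
from scipy.optimize import linear_sum_assignment
toy_confusion = np.array([[50, 2, 3],
                          [4, 40, 1],
                          [0, 5, 30]])
rows, cols = linear_sum_assignment(-toy_confusion)  # negate to maximize overlap
accuracy = toy_confusion[rows, cols].sum() / float(toy_confusion.sum())
print("cluster -> activity mapping: %s" % list(zip(rows.tolist(), cols.tolist())))
print("assignment accuracy: %.2f" % accuracy)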