This notebook presents the code for the UnADevs method: Unbounded Unsupervised Activity Discovery using the Temporal Behaviour Assumption.
The method is based on online clustering and also incorporates the temporal information of the occurring activities. It can therefore discover clusters of repeating/periodic activities as they occur, while keeping track of the time interval of each discovered cluster. This way, the system can prompt the user about a discovered activity together with its time interval and ask for feedback.
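To build intuition for the temporal behaviour assumption, the cell below is a minimal, hypothetical sketch of online temporal clustering (names and thresholds are illustrative, not the actual OTC implementation): a point is merged into the nearest active cluster only if it is close both in feature space and in time; otherwise a new cluster is opened, and the oldest active cluster is retired when the pool is full.
In [ ]:
import numpy as np

def online_temporal_cluster(stream, dist_thr=2.0, time_thr=5.0, pool_size=3):
    # Illustrative sketch only; not the actual UnADevs/OTC implementation.
    # Each cluster: centroid, point count, first and last timestamp.
    active, closed = [], []
    for t, x in stream:
        best = None
        if active:
            dists = [np.linalg.norm(c['centroid'] - x) for c in active]
            i = int(np.argmin(dists))
            # Temporal behaviour assumption: points of the same activity
            # arrive close together in time, so merge only if the point is
            # near the centroid AND follows the cluster closely in time.
            if dists[i] < dist_thr and t - active[i]['last'] < time_thr:
                best = active[i]
        if best is not None:
            best['centroid'] += (x - best['centroid']) / (best['n'] + 1.0)
            best['n'] += 1
            best['last'] = t
        else:
            if len(active) >= pool_size:
                closed.append(active.pop(0))  # retire the oldest cluster
            active.append({'centroid': x.astype(float), 'n': 1,
                           'start': t, 'last': t})
    return closed + active

# Toy usage: two activities well separated in time and feature space
ts = np.arange(20.0)
xs = np.vstack([np.random.randn(10, 2), np.random.randn(10, 2) + 8])
print("discovered %d clusters" % len(online_temporal_cluster(zip(ts, xs))))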
The detailed description of the algorithm is in the paper "Unsupervised Online Activity Discovery Using Temporal Behaviour Assumption" [1].
More information about the project: http://www.sussex.ac.uk/strc/research/wearable/research-ll
The following code is an example clustering and visualization of Subject 1 from the JSI-ADL dataset [2][3]. It corresponds to the visualization shown in Figure 2 of the ISWC paper [1].
In [5]:
import time
import numpy as np
import Online_temporal_clustering_JSI_release as OTC
import Utilities_JSI_release as Util
from sklearn.preprocessing import scale
###########################################
# parameters
np.random.seed(2)
tolerance = 22
activePool = 3
minDur = 16
OTC.deltaT = tolerance  # temporal tolerance; a larger value yields bigger clusters and tends to merge small clusters into big ones
OTC.memoryDelta = tolerance + 1  # kept constant at tolerance + 1
OTC.num_clusterss = activePool  # size of the active cluster pool; a larger value yields more scattered clusters with empty space; if you increase this, also increase the memory parameters
OTC.threshold_cluster_size = minDur  # minimum cluster size to keep
In [6]:
# Load the data (features already extracted)
# data Format: [timestamp, f1, f2, f3, ... fn, label]
data_features = np.loadtxt('data_JSI/data_features_1.csv', delimiter=';')
features_list = [1, 2, 3, 4, 5, 6, 9, 10, 11, 12, 13, 14, 17, 18, 19, 20, 21, 22, 23, 24]
data_features = sorted(data_features, key=lambda a_entry: a_entry[0])
data_array = np.array(data_features)
# Standardize the selected feature columns (zero mean, unit variance)
data_array[:, features_list] = scale(data_array[:, features_list])
# Keep two feature columns plus the timestamp and label for visualization/validation
dataAll = np.column_stack((data_array[:, [3, 5]], data_array[:, 0], data_array[:, -1]))
# Select the features and timestamps fed to the clustering
points = data_array[:, features_list]
timestamps = dataAll[:, [2]]
n = len(points)
start = time.time()
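As an optional sanity check (not part of the original notebook), one can confirm the [timestamp, f1, ..., fn, label] layout described above before clustering:
In [ ]:
# Optional sanity check: the data are sorted by timestamp, the first column
# is the timestamp and the last column is the activity label.
print("rows: %d, columns: %d" % data_array.shape)
print("time span: %.1f to %.1f" % (data_array[0, 0], data_array[-1, 0]))
labels, counts = np.unique(data_array[:, -1], return_counts=True)
print("label counts: %s" % dict(zip(labels.tolist(), counts.tolist())))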
In [7]:
# Perform the clustering: feed the points to the online clusterer one by one
c = OTC.OnlineCluster(OTC.num_clusterss)
for ind1, point in enumerate(points):
    c.cluster(point, timestamps[ind1])
clusters = c.trimclusters()
n_clusters_ = len(clusters)
print "Clustered %d points in %.2f seconds and found %d clusters." % (n, time.time() - start, n_clusters_)
In [8]:
# Validation and Visualization of the clusters
clusters = Util.removeContained(clusters)  # drop clusters fully contained within other clusters
data_array2 = Util.remove_small_activities(data_array, dataAll[:, [3]], minDur)  # drop activities shorter than minDur
dataAll2 = np.column_stack((data_array2[:, [3, 5]], data_array2[:, 0], data_array2[:, -1]))
activity_means = Util.get_activity_means(np.column_stack((data_array2[:, features_list], data_array2[:, [0, -1]])))
activities_set = list(set(dataAll2[:, [3]].T[0]))
dict_activity_index_colour = dict(zip(activities_set, np.arange(len(activities_set)))) # {1:0, 2:1, 6:2, 32:3}
# Find the closest ground-truth activity to each cluster and assign its colour
cluster_segments, cluster_segments_complex, cluster_colors_set, cluster_array, ratios = \
Util.findClosestActivity(clusters, activity_means, dict_activity_index_colour)
# Validate the clustering against the ground truth and visualize the results
confusion_matrix_detailed, hungarian_matrix, result = \
Util.validation(cluster_colors_set, dataAll, dict_activity_index_colour, activities_set,
cluster_segments_complex, True, [], cluster_array, [], n_clusters_,
cluster_segments, minDur, True)
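For intuition on the matching step, the sketch below shows one standard way to map discovered clusters to ground-truth activities: Hungarian assignment on a toy confusion matrix. This is illustrative only and not necessarily what Util.validation does internally.
In [ ]:
# Illustrative only: optimal cluster-to-activity assignment via the
# Hungarian algorithm on a toy confusion matrix (rows = clusters,
# columns = activities).
from scipy.optimize import linear_sum_assignment
toy_confusion = np.array([[50, 2, 3],
                          [4, 40, 1],
                          [0, 5, 30]])
rows, cols = linear_sum_assignment(-toy_confusion)  # negate to maximize overlap
accuracy = toy_confusion[rows, cols].sum() / float(toy_confusion.sum())
print("cluster -> activity mapping: %s" % list(zip(rows.tolist(), cols.tolist())))
print("assignment accuracy: %.2f" % accuracy)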