Scenario: Classify the type of motion from a smartphone's accelerometer and gyroscope sensors.
This is based on a Kaggle example notebook: https://www.kaggle.com/morrisb/what-does-your-smartphone-know-about-you
Reference: Davide Anguita, Alessandro Ghio, Luca Oneto, Xavier Parra and Jorge L. Reyes-Ortiz. A Public Domain Dataset for Human Activity Recognition Using Smartphones. 21st European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, ESANN 2013. Bruges, Belgium, 24-26 April 2013.
In [ ]:
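%reload_ext autoreload
# Project-specific helper imports (Kernel comes from here)
from util.dependencies import *
# Create the CC kernel object from the configuration directory
CC = Kernel("/home/md2k/cc_conf/")
from settings import USER_ID
import pandas as pd
# Show at most 20 rows when displaying a DataFrame
pd.options.display.max_rows = 20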
In [ ]:
# Load the feature stream and the activity-label stream
both_datastream = CC.get_stream('Kaggle-Features')
label_datastream = CC.get_stream('Kaggle-ActivityLabels')
# Convert each stream to a pandas DataFrame
both_dataframe = both_datastream.to_pandas().data
label_dataframe = label_datastream.to_pandas().data
# Drop the stream metadata columns
both_dataframe = both_dataframe.drop(['timestamp','localtime','version','user'], axis=1)
label_dataframe = label_dataframe.drop(['timestamp','localtime','version','user'], axis=1)
In [ ]:
# Number of samples per activity
label_dataframe.groupby('Activity').size().reset_index(name='Counts')
In [ ]:
both_dataframe
The required packages are imported into the notebook before transforming the data.
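The features are standardized with StandardScaler, and the text activity labels are converted to integers with LabelEncoder, because most ML algorithms do not work well with text labels. PCA then reduces the number of dimensions to speed up training.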
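To make the two preprocessing steps concrete, here is a minimal sketch on a tiny hypothetical dataset (the feature values and the three activity names are illustrative only, not taken from the Kaggle streams). It shows that StandardScaler leaves every column with mean 0 and standard deviation 1, and that LabelEncoder assigns integers to the distinct labels in sorted order.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, LabelEncoder

# Hypothetical feature matrix: two features on very different scales
X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0]])

# StandardScaler subtracts each column's mean and divides by its standard deviation
X_scaled = StandardScaler().fit_transform(X)
print(X_scaled.mean(axis=0))  # approximately [0, 0]
print(X_scaled.std(axis=0))   # approximately [1, 1]

# LabelEncoder maps each distinct string to an integer, in sorted (alphabetical) order
encoder = LabelEncoder()
codes = encoder.fit_transform(["WALKING", "SITTING", "LAYING", "WALKING"])
print(codes)             # [2 1 0 2]
print(encoder.classes_)  # ['LAYING' 'SITTING' 'WALKING']
```

Keeping a reference to the fitted encoder (rather than calling `LabelEncoder().fit_transform(...)` inline) makes it possible to turn integer predictions back into activity names later with `encoder.inverse_transform`.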
In [ ]:
from lightgbm import LGBMClassifier
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.decomposition import PCA
# Create datasets: copy the features and separate the 'Data' and 'subject' columns
tsne_data = both_dataframe.copy()
data_data = tsne_data.pop('Data')
subject_data = tsne_data.pop('subject')
# Scale data: zero mean and unit variance for every feature
tsne_data = StandardScaler().fit_transform(tsne_data)
# Reduce dimensions (speed up): keep enough principal components to explain 95% of the variance
tsne_data = PCA(n_components=0.95, random_state=3).fit_transform(tsne_data)
# Encode the activity names as integers
# (assumes label_dataframe rows are in the same order as both_dataframe rows)
label_encoded = LabelEncoder().fit_transform(label_dataframe.Activity)
# Split the data (default split: 75% train, 25% test)
X_train, X_test, y_train, y_test = train_test_split(tsne_data, label_encoded, random_state=3)
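Passing a float such as `n_components=0.95` to PCA tells scikit-learn to keep the smallest number of components whose cumulative explained variance ratio exceeds that fraction. The sketch below checks this on synthetic data from `make_classification` (an assumed stand-in for the sensor features, with many redundant columns) by recomputing the choice from a full PCA. `svd_solver="full"` is set only so both fits use the same solver.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 40 features, 20 of them linear combinations of the 5 informative ones
X, _ = make_classification(n_samples=500, n_features=40, n_informative=5,
                           n_redundant=20, random_state=3)
X = StandardScaler().fit_transform(X)

# Fractional n_components: keep enough components to explain 95% of the variance
pca = PCA(n_components=0.95, svd_solver="full", random_state=3).fit(X)

# Reproduce the same choice by hand from a PCA that keeps every component
full = PCA(svd_solver="full", random_state=3).fit(X)
cumulative = np.cumsum(full.explained_variance_ratio_)
k = int(np.searchsorted(cumulative, 0.95, side="right") + 1)

print(pca.n_components_, k)                 # the two counts agree
print(pca.explained_variance_ratio_.sum())  # at least 0.95
```

Note that in the cell above the scaler and PCA are fitted on all rows before `train_test_split`, so the test rows influence the transformation; fitting them on the training split only gives a stricter estimate of accuracy.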
A Gradient Boosting Machine (GBM), here LightGBM's gradient boosting decision trees, is trained on the principal components produced by PCA. This example uses 50 boosted trees (n_estimators=50). The classification accuracy peaks at 0.959 with 500 or more boosted trees; with only 50 it is still 0.937.
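How accuracy changes with the number of boosted trees can be measured without refitting once per tree count. The sketch below swaps in scikit-learn's `GradientBoostingClassifier`, whose `staged_predict` yields the predictions after each boosting stage, instead of LightGBM, and uses synthetic 6-class data as an assumed stand-in for the PCA features, so its accuracies will not match the 0.937 and 0.959 reported above. For multi-class problems each stage fits one tree per class.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic 6-class stand-in for the PCA-reduced sensor features
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_classes=6, random_state=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=3)

# Fit once with the largest number of boosting stages to inspect
gbm = GradientBoostingClassifier(n_estimators=100, random_state=3).fit(X_train, y_train)

# staged_predict yields predictions after 1, 2, ..., n_estimators stages
staged_accuracy = [accuracy_score(y_test, y_pred) for y_pred in gbm.staged_predict(X_test)]

for n_trees in (10, 50, 100):
    print(n_trees, round(staged_accuracy[n_trees - 1], 3))
```

The same curve could be traced for the notebook's model by passing `num_iteration` to `LGBMClassifier.predict`, which limits prediction to the first boosting iterations.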
In [ ]:
# Number of boosted trees (boosting iterations)
number_of_estimators = 50
# Create the model and train it on the PCA features
lgbm = LGBMClassifier(n_estimators=number_of_estimators)
lgbm = lgbm.fit(X_train, y_train)
# Test the model on the held-out split
score = accuracy_score(y_true=y_test, y_pred=lgbm.predict(X_test))
print('Classification Accuracy:', score)