Training the IoT Back Brace Machine Learning Model

Creating a Random Forest classification model using Spark and MLlib


In [1]:
import pyspark
from pyspark import SparkContext
import urllib
from pyspark.mllib.regression import LabeledPoint
from numpy import array
from pyspark.mllib.tree import RandomForest, RandomForestModel
from pyspark.sql import SQLContext
from time import time
#  Custom imports
import MySQLConnection

Getting the training data and creating the RDD

The data is stored in a local MySQL database. A connection is made and data is read from the "SensorTrainingReadings" table.


In [2]:
sqlContext = SQLContext(sc)
#  Get username and password from file in this format: {"user":"yourusername","password":"yourpassword"}
connectionProperties = MySQLConnection.getDBConnectionProps('/home/erik/mysql_credentials.txt')
# Get training data from the SensorTrainingReadings table in the biosensor database
data = sqlContext.read.jdbc("jdbc:mysql://localhost/biosensor", "SensorTrainingReadings", properties=connectionProperties).selectExpr("deviceID","metricTypeID","uomID","positionID","actualPitch","actualYaw")
print("Train data size is {}".format(data.count()))


Train data size is 34
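
The MySQLConnection module is a custom import whose source is not included in this notebook. Below is a minimal sketch of what getDBConnectionProps could look like, assuming it does nothing more than parse the JSON credentials file described in the comment above and return a dict suitable for the properties argument of read.jdbc. The function body is an assumption, not the author's implementation.

```python
import json


def getDBConnectionProps(path):
    """Hypothetical stand-in for MySQLConnection.getDBConnectionProps.

    Assumes the file holds a single JSON object such as
    {"user": "yourusername", "password": "yourpassword"}.
    """
    with open(path) as f:
        props = json.load(f)
    # JDBC connection properties are passed along as plain strings
    return {"user": str(props["user"]), "password": str(props["password"])}
```

Keeping the credentials in a separate file means the notebook can be shared without exposing the database password.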

Split training data into training set and test set

In order to evaluate the model, we need to hold back some data. The split below holds back roughly 10% as a test set (weights 0.9 and 0.1).


In [12]:
# Split data into training and test datasets
(trainingDataTable, testDataTable) = data.randomSplit([0.9, 0.1])

trainingDataTable.show()
testDataTable.show()


+--------------+------------+-----+----------+-----------+---------+
|      deviceID|metricTypeID|uomID|positionID|actualPitch|actualYaw|
+--------------+------------+-----+----------+-----------+---------+
|ac423eb65d4a32|           6|    4|         0|       42.1|    -13.7|
|ac423eb65d4a32|           6|    4|         0|       43.5|    -41.1|
|ac423eb65d4a32|           6|    4|         0|       44.5|    -40.2|
|ac423eb65d4a32|           6|    4|         0|       46.2|    -40.9|
|ac423eb65d4a32|           6|    4|         0|       46.9|    -41.3|
|ac423eb65d4a32|           6|    4|         0|       47.6|    -42.3|
|ac423eb65d4a32|           6|    4|         0|       47.6|    -41.6|
|ac423eb65d4a32|           6|    4|         0|       48.8|    -40.6|
|ac423eb65d4a32|           6|    4|         1|       18.8|    -35.1|
|ac423eb65d4a32|           6|    4|         1|       18.9|    -32.8|
|ac423eb65d4a32|           6|    4|         1|       19.0|    -33.6|
|ac423eb65d4a32|           6|    4|         1|       20.5|    -34.4|
|ac423eb65d4a32|           6|    4|         1|       21.9|    -35.1|
|ac423eb65d4a32|           6|    4|         1|       24.4|    -35.6|
|ac423eb65d4a32|           6|    4|         1|       30.9|    -18.9|
|ac423eb65d4a32|           6|    4|         1|       31.7|    -38.1|
|ac423eb65d4a32|           6|    4|         1|       32.3|    -42.1|
|ac423eb65d4a32|           6|    4|         1|       37.8|      8.3|
|ac423eb65d4a32|           6|    4|         2|       -2.8|      1.7|
|ac423eb65d4a32|           6|    4|         2|       -2.1|     10.7|
+--------------+------------+-----+----------+-----------+---------+
only showing top 20 rows

+--------------+------------+-----+----------+-----------+---------+
|      deviceID|metricTypeID|uomID|positionID|actualPitch|actualYaw|
+--------------+------------+-----+----------+-----------+---------+
|ac423eb65d4a32|           6|    4|         2|      -10.5|      6.4|
|ac423eb65d4a32|           6|    4|         2|        7.3|     11.7|
|ac423eb65d4a32|           6|    4|         2|       13.7|     13.3|
|ac423eb65d4a32|           6|    4|         2|       13.9|     12.7|
|ac423eb65d4a32|           6|    4|         2|       18.8|     12.8|
+--------------+------------+-----+----------+-----------+---------+
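
randomSplit does not cut the table at an exact row count. As far as I understand it, each row is assigned independently by a random draw compared against the normalized weights, so the split sizes are only approximately proportional. A pure-Python sketch of that idea follows; the function name random_split and the seed value are illustrative, and this is not Spark's actual implementation.

```python
import random


def random_split(rows, weights, seed=None):
    """Assign each row to one split by an independent random draw."""
    rng = random.Random(seed)
    total = float(sum(weights))
    # Cumulative upper bounds, e.g. [0.9, 1.0] for weights [0.9, 0.1]
    bounds = []
    running = 0.0
    for w in weights:
        running += w / total
        bounds.append(running)
    bounds[-1] = 1.0  # guard against floating point round-off
    splits = [[] for _ in weights]
    for row in rows:
        u = rng.random()  # uniform draw in [0, 1)
        for i, upper in enumerate(bounds):
            if u < upper:
                splits[i].append(row)
                break
    return splits


train, test = random_split(range(34), [0.9, 0.1], seed=42)
print(len(train), len(test))  # sizes vary with the seed but always sum to 34
```

With only 34 rows the test set may contain just a handful of readings. DataFrame.randomSplit also accepts a seed as its second argument, which makes the split repeatable between runs.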

Create an RDD of LabeledPoints

The featurize function returns a LabeledPoint containing the label and a vector of features.

An example for a reading from the stooped position would be:

  • 0, [-40,15]

In [13]:
# The model requires LabeledPoints: each is a row with a label and a vector of features.
def featurize(t):
    return LabeledPoint(t.positionID, [t.actualPitch])

# .rdd makes the DataFrame-to-RDD conversion explicit (required on Spark 2.x and later)
trainingData = trainingDataTable.rdd.map(featurize)
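
To see what featurize produces without a Spark session, the sketch below uses namedtuple stand-ins for the DataFrame row and for LabeledPoint. Both stand-ins (Reading and Point) are assumptions for illustration; the real LabeledPoint stores the label as a float and the features as an MLlib vector.

```python
from collections import namedtuple

# Hypothetical stand-ins for a DataFrame Row and pyspark's LabeledPoint
Reading = namedtuple("Reading", ["positionID", "actualPitch"])
Point = namedtuple("Point", ["label", "features"])


def featurize(t):
    # Label is the position class; the only feature is the pitch angle
    return Point(float(t.positionID), [float(t.actualPitch)])


# First training row shown in the table above
print(featurize(Reading(positionID=0, actualPitch=42.1)))
# Point(label=0.0, features=[42.1])
```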

Training the model

For this example we are choosing a Random Forest model, which combines multiple decision trees; for classification, each tree votes and the majority class wins. Since we know there are only 3 distinct label values, numClasses = 3.


In [14]:
# Train the classifier/Build the model
startTime = time()

#Random Forest Model
model = RandomForest.trainClassifier(
                                    trainingData, 
                                    numClasses=3, 
                                    categoricalFeaturesInfo={},
                                    numTrees=6, 
                                    featureSubsetStrategy="auto",
                                    impurity='gini', 
                                    maxDepth=4, 
                                    maxBins=32
                                    )

elapsedTime = time() - startTime

print("Classifier trained in {} seconds".format(round(elapsedTime,3)))

# Save the model for use in evaluating readings
model.save(sc,"models/IoTBackBraceRandomForest.model")


Classifier trained in 0.959 seconds

Evaluating the accuracy of the model

Since we use about 90% of the data for actually training the model, the remaining 10% serves as a test dataset. Because the labels for these rows are known, we can check how well the model classifies them.


In [15]:
# Evaluate model on test instances and compute test error
testData = testDataTable.rdd.map(featurize)
predictions = model.predict(testData.map(lambda x: x.features))
labelsAndPredictions = testData.map(lambda lp: lp.label).zip(predictions)
# Index into the (label, prediction) pair; tuple-unpacking lambdas only work in Python 2
testErr = labelsAndPredictions.filter(lambda vp: vp[0] != vp[1]).count() / float(testData.count())
print('Test Error = ' + str(testErr))


Test Error = 0.2
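
The reported error is consistent with the five test rows shown earlier. All five carry label 2, and the trees printed later in this notebook send pitch values above 16.9 to class 1, so the 18.8 reading would be the single miss: 1 of 5, or 0.2. A plain-Python sketch of the same calculation follows, where classify is a simplified stand-in using the two thresholds most of the trees share rather than the trained Spark model.

```python
def classify(pitch):
    """Simplified stand-in: thresholds shared by five of the six trees."""
    if pitch <= 16.9:
        return 2
    if pitch <= 37.8:
        return 1
    return 0


# (positionID, actualPitch) pairs from the test table
test_rows = [(2, -10.5), (2, 7.3), (2, 13.7), (2, 13.9), (2, 18.8)]

# Fraction of rows where the predicted class differs from the true label
misses = sum(1 for label, pitch in test_rows if classify(pitch) != label)
test_err = misses / float(len(test_rows))
print("Test Error = " + str(test_err))  # Test Error = 0.2
```

With a test set this small, a single misclassified row moves the error by 20 percentage points, so the figure should be read as a rough indication only.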

Another handy feature is that you can view the model's decision logic by calling the "toDebugString()" method.


In [16]:
print('Random Forest Classification Model:')
print(model.toDebugString())


Random Forest Classification Model:
TreeEnsembleModel classifier with 6 trees

  Tree 0:
    If (feature 0 <= 16.9)
     Predict: 2.0
    Else (feature 0 > 16.9)
     If (feature 0 <= 37.8)
      Predict: 1.0
     Else (feature 0 > 37.8)
      Predict: 0.0
  Tree 1:
    If (feature 0 <= 31.7)
     If (feature 0 <= 16.9)
      Predict: 2.0
     Else (feature 0 > 16.9)
      Predict: 1.0
    Else (feature 0 > 31.7)
     Predict: 0.0
  Tree 2:
    If (feature 0 <= 16.9)
     Predict: 2.0
    Else (feature 0 > 16.9)
     If (feature 0 <= 37.8)
      Predict: 1.0
     Else (feature 0 > 37.8)
      Predict: 0.0
  Tree 3:
    If (feature 0 <= 16.9)
     Predict: 2.0
    Else (feature 0 > 16.9)
     If (feature 0 <= 37.8)
      Predict: 1.0
     Else (feature 0 > 37.8)
      Predict: 0.0
  Tree 4:
    If (feature 0 <= 16.9)
     Predict: 2.0
    Else (feature 0 > 16.9)
     If (feature 0 <= 37.8)
      Predict: 1.0
     Else (feature 0 > 37.8)
      Predict: 0.0
  Tree 5:
    If (feature 0 <= 37.8)
     If (feature 0 <= 16.9)
      Predict: 2.0
     Else (feature 0 > 16.9)
      Predict: 1.0
    Else (feature 0 > 37.8)
     Predict: 0.0
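
For classification, the forest's prediction is the majority vote of its trees. The six trees above can be transcribed into plain Python to see this in action. The helper names make_tree and forest_predict are illustrative, and the sketch ignores how Spark breaks ties.

```python
from collections import Counter


def make_tree(low, high):
    """Build a tree of the shape printed above: <= low -> 2, <= high -> 1, else 0."""
    def tree(pitch):
        if pitch <= low:
            return 2
        if pitch <= high:
            return 1
        return 0
    return tree


# Trees 0, 2, 3, 4 and 5 split at 16.9 and 37.8; tree 1 splits at 16.9 and 31.7
trees = [make_tree(16.9, 37.8) for _ in range(5)]
trees.insert(1, make_tree(16.9, 31.7))


def forest_predict(pitch):
    # Each tree casts one vote; the most common class wins
    votes = Counter(tree(pitch) for tree in trees)
    return votes.most_common(1)[0][0]


print(forest_predict(35))  # 1: only tree 1 votes for class 0
```

A pitch of 35 falls between the two upper thresholds, so tree 1 disagrees with the other five and is outvoted. This is why the ensemble behaves as if it had a single boundary at 37.8.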

Using the model for analysis of raw data

Once the model is saved, it can be loaded again in any script by referring to the path where it was saved.


In [17]:
loadedModel = RandomForestModel.load(sc, "models/IoTBackBraceRandomForest.model")

The example below passes integer pitch values from -50 degrees (stooped) to 49 degrees (upright) to the model.


In [18]:
# Map each predicted class label to a readable position name
positions = {
              0 : "upright",
              1 : "back bent",
              2 : "stooped"
            }
for i in range(-50,50):
    prediction = loadedModel.predict([i])
    print(str(i) + " => " + str(positions[prediction]))


-50 => stooped
-49 => stooped
-48 => stooped
-47 => stooped
-46 => stooped
-45 => stooped
-44 => stooped
-43 => stooped
-42 => stooped
-41 => stooped
-40 => stooped
-39 => stooped
-38 => stooped
-37 => stooped
-36 => stooped
-35 => stooped
-34 => stooped
-33 => stooped
-32 => stooped
-31 => stooped
-30 => stooped
-29 => stooped
-28 => stooped
-27 => stooped
-26 => stooped
-25 => stooped
-24 => stooped
-23 => stooped
-22 => stooped
-21 => stooped
-20 => stooped
-19 => stooped
-18 => stooped
-17 => stooped
-16 => stooped
-15 => stooped
-14 => stooped
-13 => stooped
-12 => stooped
-11 => stooped
-10 => stooped
-9 => stooped
-8 => stooped
-7 => stooped
-6 => stooped
-5 => stooped
-4 => stooped
-3 => stooped
-2 => stooped
-1 => stooped
0 => stooped
1 => stooped
2 => stooped
3 => stooped
4 => stooped
5 => stooped
6 => stooped
7 => stooped
8 => stooped
9 => stooped
10 => stooped
11 => stooped
12 => stooped
13 => stooped
14 => stooped
15 => stooped
16 => stooped
17 => back bent
18 => back bent
19 => back bent
20 => back bent
21 => back bent
22 => back bent
23 => back bent
24 => back bent
25 => back bent
26 => back bent
27 => back bent
28 => back bent
29 => back bent
30 => back bent
31 => back bent
32 => back bent
33 => back bent
34 => back bent
35 => back bent
36 => back bent
37 => back bent
38 => upright
39 => upright
40 => upright
41 => upright
42 => upright
43 => upright
44 => upright
45 => upright
46 => upright
47 => upright
48 => upright
49 => upright
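
The listing above changes label at only two points, so it can be condensed into ranges. Below is a sketch using itertools.groupby; classify is a hypothetical stand-in built from the thresholds in the model dump, so that the example runs without the saved Spark model.

```python
from itertools import groupby

positions = {0: "upright", 1: "back bent", 2: "stooped"}


def classify(pitch):
    """Stand-in for loadedModel.predict using the majority thresholds."""
    if pitch <= 16.9:
        return 2
    if pitch <= 37.8:
        return 1
    return 0


# Group consecutive pitch values that share a predicted label
summary = []
for label, group in groupby(range(-50, 50), key=classify):
    values = list(group)
    summary.append((values[0], values[-1], positions[label]))

for start, end, name in summary:
    print("{} to {} => {}".format(start, end, name))
# -50 to 16 => stooped
# 17 to 37 => back bent
# 38 to 49 => upright
```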
