Stay Alert! The Ford Challenge

by Scott Josephson

Driving while distracted, fatigued, or drowsy may lead to accidents. Driver distraction can be caused by activities that divert attention from the road ahead, such as conversing with passengers, making or receiving phone calls, sending or receiving text messages, eating while driving, or reacting to events outside the car. Fatigue and drowsiness can result from driving long hours or from lack of sleep.

The data for this Kaggle challenge consist of a number of "trials", each representing about 2 minutes of sequential data recorded every 100 ms during a driving session on the road or in a driving simulator. The trials are sampled from some 100 drivers of both genders and of different ages and ethnic backgrounds. The files are structured as follows:

The first column is the Trial ID - each period of around 2 minutes of sequential data has a unique trial ID. For instance, the first 1210 observations represent sequential observations every 100 ms and therefore all have the same trial ID.

The second column is the observation number - a sequentially increasing number within one trial ID.

The third column, IsAlert, has a value X for each row, where

           X = 1     if the driver is alert

           X = 0     if the driver is not alert

The next 8 columns, with headers P1, P2, ..., P8, represent physiological data;

The next 11 columns, with headers E1, E2, ..., E11, represent environmental data;

The next 11 columns, with headers V1, V2, ..., V11, represent vehicular data.
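
For reference, the three feature groups described above can be written down programmatically; a minimal sketch that only reproduces the header names (no assumptions beyond the column naming in the file):

In [ ]:
# Column groups as described above
physio_cols = ['P{}'.format(i) for i in range(1, 9)]    # P1..P8: physiological data
env_cols    = ['E{}'.format(i) for i in range(1, 12)]   # E1..E11: environmental data
veh_cols    = ['V{}'.format(i) for i in range(1, 12)]   # V1..V11: vehicular data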

Import Libraries


In [1]:
import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

Get the Data

Read in the fordtrain.csv file and set it to a data frame called ford_train.

Split the data into a training set and a testing set using train_test_split (this is done below, after a quick look at the data).


In [2]:
ford_train = pd.read_csv('fordtrain.csv')

Check the head of ford_train


In [3]:
ford_train.head()


Out[3]:
TrialID ObsNum IsAlert P1 P2 P3 P4 P5 P6 P7 ... V2 V3 V4 V5 V6 V7 V8 V9 V10 V11
0 0 0 0 34.7406 9.84593 1400 42.8571 0.290601 572 104.895 ... 0.175 752 5.99375 0 2005 0 13.4 0 4 14.8004
1 0 1 0 34.4215 13.41120 1400 42.8571 0.290601 572 104.895 ... 0.455 752 5.99375 0 2007 0 13.4 0 4 14.7729
2 0 2 0 34.3447 15.18520 1400 42.8571 0.290601 576 104.167 ... 0.280 752 5.99375 0 2011 0 13.4 0 4 14.7736
3 0 3 0 34.3421 8.84696 1400 42.8571 0.290601 576 104.167 ... 0.070 752 5.99375 0 2015 0 13.4 0 4 14.7667
4 0 4 0 34.3322 14.69940 1400 42.8571 0.290601 576 104.167 ... 0.175 752 5.99375 0 2017 0 13.4 0 4 14.7757

5 rows × 33 columns


In [4]:
ford_train.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 604329 entries, 0 to 604328
Data columns (total 33 columns):
TrialID    604329 non-null int64
ObsNum     604329 non-null int64
IsAlert    604329 non-null int64
P1         604329 non-null float64
P2         604329 non-null float64
P3         604329 non-null int64
P4         604329 non-null float64
P5         604329 non-null float64
P6         604329 non-null int64
P7         604329 non-null float64
P8         604329 non-null int64
E1         604329 non-null float64
E2         604329 non-null float64
E3         604329 non-null int64
E4         604329 non-null int64
E5         604329 non-null float64
E6         604329 non-null int64
E7         604329 non-null int64
E8         604329 non-null int64
E9         604329 non-null int64
E10        604329 non-null int64
E11        604329 non-null float64
V1         604329 non-null float64
V2         604329 non-null float64
V3         604329 non-null int64
V4         604329 non-null float64
V5         604329 non-null int64
V6         604329 non-null int64
V7         604329 non-null int64
V8         604329 non-null float64
V9         604329 non-null int64
V10        604329 non-null int64
V11        604329 non-null float64
dtypes: float64(14), int64(19)
memory usage: 152.2 MB
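
Before modeling, it can help to check how the two IsAlert classes are balanced, since that affects how accuracy and the per-class metrics below should be read; a minimal sketch, assuming ford_train has been loaded as above:

In [ ]:
# Count observations per alertness class, and the same counts as proportions
print(ford_train['IsAlert'].value_counts())
print(ford_train['IsAlert'].value_counts(normalize=True))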

Logistic Regression

Now it's time to do a train/test split and train our model!

Choose columns that you want to train on!


In [19]:
X_train, X_test, y_train, y_test = train_test_split(ford_train.drop('IsAlert',axis=1),ford_train['IsAlert'],
                                                    test_size=0.30,random_state=101)
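
Note that the split above keeps TrialID and ObsNum among the features. These are identifiers rather than physiological, environmental, or vehicular signals, so one may prefer to drop them as well. A hedged sketch of that alternative (not the split used for the results reported below):

In [ ]:
# Alternative split that also drops the identifier columns (sketch only)
features = ford_train.drop(['IsAlert', 'TrialID', 'ObsNum'], axis=1)
X_train_alt, X_test_alt, y_train_alt, y_test_alt = train_test_split(
    features, ford_train['IsAlert'], test_size=0.30, random_state=101)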

Train and fit a logistic regression model on the training set.


In [21]:
logmodel = LogisticRegression()

In [22]:
logmodel.fit(X_train, y_train)


Out[22]:
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
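
The features sit on very different scales (compare P3 in the thousands with V2 below 1), and liblinear-based logistic regression can converge slowly on unscaled data. A common remedy is to standardize the features first; a minimal sketch using scikit-learn's StandardScaler (not what was done for the results shown here):

In [ ]:
from sklearn.preprocessing import StandardScaler

# Standardize features to zero mean and unit variance before fitting (sketch only)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

logmodel_scaled = LogisticRegression()
logmodel_scaled.fit(X_train_scaled, y_train)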

Predictions and Evaluations

Now predict values for the testing data.


In [23]:
predictions = logmodel.predict(X_test)

Create a classification report for the model.


In [25]:
print(classification_report(y_test,predictions))


             precision    recall  f1-score   support

          0       0.82      0.73      0.77     76334
          1       0.82      0.88      0.85    104965

avg / total       0.82      0.82      0.82    181299
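
The accuracy_score helper imported at the top can condense this report into a single number; a minimal sketch:

In [ ]:
# Overall fraction of correctly classified test observations
print(accuracy_score(y_test, predictions))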