In this dataset I have information about 10 cars driving on October 8, 2016. I'm going to extract as many insights as possible from this data and generate ideas about how it could be sold at a profit.
An important point first. October 8, 2016 was a Saturday, which is a weekend day in the United Arab Emirates (the cars' coordinates are in this country), so this influences people's activity: some could work, some could entertain themselves, some could buy products for the week and so on.
Some additional information: as far as I know, the speed limit in the United Arab Emirates is up to 120 km/h on main roads and 140 km/h on E11. Many drivers maintain a higher speed between the speed cameras. So drivers whose speeds exceed the limits should be considered risky.
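As a rough illustration of the rule above, a driver could be flagged by the share of moving samples that exceed a limit. This is only a minimal sketch and not part of the analysis below: the function name `share_above_limit`, the sample speeds and the 120 km/h default are assumptions made for the example.

```python
def share_above_limit(speeds_kmh, limit_kmh=120):
    """Return the fraction of moving samples whose speed exceeds the limit."""
    # Ignore samples where the car is standing still.
    moving = [s for s in speeds_kmh if s > 0]
    if not moving:
        return 0.0
    return sum(s > limit_kmh for s in moving) / len(moving)

# Hypothetical speed samples in km/h (not taken from the dataset).
sample = [0, 0, 60, 95, 118, 125, 140, 0]
print(share_above_limit(sample))
```

A threshold on this share (rather than on a single peak value) would be less sensitive to isolated noisy readings.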
My analysis goes through the following steps:
In [1]:
#Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import folium
from folium import plugins
from datetime import datetime
In [2]:
#Reading the data.
data = pd.read_excel('/devices_data.xlsx')
In [3]:
data.info()
No missing values, which is convenient. Two columns are in datetime format; the other columns (except DeviceNumber) are numerical. LockRightRear has type 'object', which is strange, so I'll look into it.
In [4]:
data.head()
Out[4]:
This is what the data looks like. Let's start the analysis.
In [5]:
len(data.FuelLevel.unique()), len(data.AlarmOn.unique())
Out[5]:
FuelLevel and AlarmOn have only one distinct value each. This means that fuel level isn't measured, and the alarm is either not measured or was never activated. So these columns can be dropped.
In [6]:
data.LockRightRear.value_counts()
Out[6]:
LockRightRear column has two distinct values and they are strange. I suppose that there was a problem with formatting, so '00:00:00' is 0 and the other value is 1. I'll fix it.
In [7]:
data['LockRightRear'] = data['LockRightRear'].apply(lambda x: 0 if str(x) == '00:00:00' else 1)
In [8]:
data.loc[data.Version == 87].DeviceNumber.unique(), data.loc[data.Version == 94].DeviceNumber.unique()
Out[8]:
Two cars have one version, the rest have another. I suppose that this is software version. Not much useful information there, so I'll drop it.
In [9]:
data.loc[data.TimeUtc != data.LastSync]
Out[9]:
There is a column with LastSync time, but it differs from TimeUtc in only 5 rows and the difference is small. So this column can also be dropped.
In [10]:
for i in range(1, 11):
    dev = 'device_' + str(i)
    print(dev, data.loc[data.DeviceNumber == str(dev)].LastNumber.unique())
Nine devices have one value of LastNumber and the 8th device has 6 distinct values. I have no idea about what 'LastNumber' is, so I'll drop it.
In [11]:
#Drop columns
data.drop(['FuelLevel', 'AlarmOn', 'LastSync', 'Version', 'LastNumber'], axis=1, inplace=True)
In [12]:
len(data.drop_duplicates(subset=['DeviceNumber', 'TimeUtc'], keep=False)) / len(data)
Out[12]:
Almost 23% of all rows are duplicates, meaning there are many lines with the same time for a device. I decided to drop these rows completely: data granularity is high, so this won't have a serious impact. And if some data looks strange after dropping the rows, I can get them back.
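A small sketch of what `keep=False` does, on a made-up frame (the values below are assumptions, not rows from the dataset). Note that the ratio computed in the cell above is the share of rows that survive the drop, so the duplicated share is one minus that value.

```python
import pandas as pd

# Toy frame (made-up values): device_1 has two rows with the same timestamp.
toy = pd.DataFrame({
    'DeviceNumber': ['device_1', 'device_1', 'device_1', 'device_2'],
    'TimeUtc': ['10:00:00', '10:00:00', '10:00:01', '10:00:00'],
    'Speed': [10, 12, 15, 0],
})

key = ['DeviceNumber', 'TimeUtc']
# duplicated(keep=False) marks every member of a repeated group.
dup_share = toy.duplicated(subset=key, keep=False).mean()
# drop_duplicates(keep=False) removes all of those rows, keeping none of them.
kept = toy.drop_duplicates(subset=key, keep=False)

print(dup_share, len(kept))
```

With `keep='first'` one row per repeated group would be retained instead; `keep=False` is the stricter choice, since it also discards the first copy.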
In [13]:
data.drop_duplicates(subset=['DeviceNumber', 'TimeUtc'], keep=False, inplace=True)
In [14]:
data.loc[data.DeviceNumber == 'device_1']['Mileage'].max() - data.loc[data.DeviceNumber == 'device_1']['Mileage'].min()
Out[14]:
In [15]:
data.Speed.max()
Out[15]:
So max speed is 208. I think this is km/h (if it were miles per hour, the speed would be ~335 km/h, which is too high for a normal car). And the day's distance for the first car is 314000. I think these are meters, so I'll convert them to kilometers.
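The unit reasoning above can be checked with a few lines of arithmetic. The two input numbers are the ones quoted in the text; the variable names are only for illustration.

```python
KM_PER_MILE = 1.609344   # definition of the international mile in kilometers

max_speed = 208          # maximum of the Speed column, unit not documented
day_distance = 314000    # one-day Mileage range for device_1, unit not documented

# If Speed were in miles per hour, this would be the same speed in km/h.
speed_if_mph = max_speed * KM_PER_MILE
# If Mileage is in meters, this is the one-day distance in kilometers.
distance_km = day_distance / 1000

print(round(speed_if_mph), distance_km)
```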
In [16]:
data['Mileage'] = data['Mileage'] / 1000
In [17]:
#All the data is in one day, so I'll leave just the time.
data['TimeUtc'] = data['TimeUtc'].dt.time
In [18]:
data.head()
Out[18]:
This is what the data looks like after the changes.
At first I wasn't sure how to interpret the columns LockTrunk, LockHood, LockDriver, LockTPassenger, LockLeftRear and LockRightRear. They are mostly zeros, with only a few percent of values equal to '1'. Then I found out that '1' appeared in LockTrunk, LockHood and LockRightRear only when speed was zero, and in the other columns at zero or low speeds. I conclude that '1' means that the trunk, the hood or a door was opened.
If the hood is opened, this could mean that there are some problems with the car or that it had a planned check-up.
If the trunk is opened, I assume that some baggage was put into it or was taken from it.
An opened driver door means that the driver enters or leaves the car.
An opened passenger or rear door means that there was a passenger on the relevant seat. It could also mean that something was put onto or taken from that seat, but I have no way to distinguish. Note also that if only one back door was opened, it doesn't mean that there is only one passenger on the back seat.
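The check behind this interpretation (at which speeds each flag equals 1) can be sketched as follows. The toy frame and its values are assumptions for illustration; on the real data the same expression would be applied to all six Lock columns.

```python
import pandas as pd

# Made-up rows: the trunk flag is set only at standstill,
# the driver-door flag also once at a low speed.
toy = pd.DataFrame({
    'Speed':      [0, 0, 5, 40, 0, 60],
    'LockTrunk':  [1, 0, 0, 0, 0, 0],
    'LockDriver': [1, 0, 1, 0, 1, 0],
})

lock_columns = ['LockTrunk', 'LockDriver']
# Highest speed observed while each flag equals 1.
max_speed_when_set = {col: toy.loc[toy[col] == 1, 'Speed'].max() for col in lock_columns}
print(max_speed_when_set)
```

A maximum of zero means the flag was only ever set while the car was standing still.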
Now let's get some basic information.
In [19]:
for i in range(1, 11):
    device = 'device_' + str(i)
    data_device = data.loc[data.DeviceNumber == str(device)]
    distance = data_device['Mileage'].max() - data_device['Mileage'].min()
    trunk = '' if data_device['LockTrunk'].max() == 0 else 'Opened trunk. '
    hood = '' if data_device['LockHood'].max() == 0 else 'Opened hood. '
    tpassenger = '' if data_device['LockTPassenger'].max() == 0 else 'Has tpassenger. '
    #Note: a back passenger is reported only if both rear doors were opened at some point.
    rear = '' if data_device['LockLeftRear'].max() == 0 or data_device['LockRightRear'].max() == 0 else 'Has back passenger.'
    print(device[7:], 'Mileage:', data_device['Mileage'].min(), '+', distance, 'or',
          '{:.2f}%.'.format((distance / data_device['Mileage'].min()) * 100),
          trunk + hood + tpassenger + rear)
Seven cars used the trunk, 3 cars opened the hood, all had a passenger in the front seat, and 8 had rear passengers.
One car (5) didn't drive at all. Cars 7 and 8 are quite new - mileage is really low. Car 9 drove a little, cars 3 and 4 drove a bit more. Cars 2 and 8 drove quite far. And car 2 has the highest starting mileage.
This is a basic analysis and the conclusions may be changed.
In [20]:
#Copy the slice so that adding a column does not trigger SettingWithCopyWarning.
data1 = data.loc[data.DeviceNumber == 'device_1'].copy()
data1['acceleration1'] = data1.Speed.shift(-1) - data1.Speed
data1.loc[data1.acceleration1 > 40].acceleration1
Out[20]:
I wanted to analyse the acceleration of the cars to identify drivers who are prone to abrupt changes in speed, which could cause incidents. But there are a lot of peaks in the data, and such high acceleration isn't physically possible. Cleaning or smoothing the data is complicated, so I'll just analyse the speeds.
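To see why a jump of more than 40 km/h between consecutive rows looks suspicious, the speed change can be converted into multiples of g. This is a sketch under one loud assumption: the sampling interval is not stated in the data description, so one second between rows is assumed here, and `accel_in_g` is a hypothetical helper.

```python
def accel_in_g(delta_kmh, delta_seconds):
    """Convert a speed change in km/h over delta_seconds into multiples of g."""
    delta_ms = delta_kmh / 3.6            # km/h -> m/s
    return delta_ms / delta_seconds / 9.81

# Assumption: consecutive rows are one second apart.
print(round(accel_in_g(40, 1), 2))
# The same jump spread over ten seconds would be unremarkable.
print(round(accel_in_g(40, 10), 2))
```

Ordinary passenger cars typically stay well below 1 g when accelerating, so values above that level under the one-second assumption point to noise or irregular sampling rather than to real driving behaviour.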
In [21]:
#The function to plot a column for each device on a separate graph.
def plot_column(column):
    fig, axes = plt.subplots(nrows=5, ncols=2, figsize=(15, 24))
    #Devices 1-10 fill the 5x2 grid row by row: device i goes to row (i-1)//2, column (i-1)%2.
    for i in range(1, 11):
        dev = 'device_' + str(i)
        ax = axes[(i - 1) // 2, (i - 1) % 2]
        data.loc[data.DeviceNumber == dev].plot(x='TimeUtc', y=column, ax=ax)
        ax.set_title(dev)
In [22]:
plot_column('Temperature')
The temperature is roughly the same for all cars. In some cases it is lower than 30 or higher than 50. I suppose that it could reach high levels during long stops at sunny places.
In [23]:
plot_column('SatellitesCount')
This gives me little information. Maybe the number of satellites decreases during long stops, as less precision is required, and more satellites are needed when the car is driving.
Now I analyse the data for each car. I take the following steps:
In [24]:
def car_data(device):
    #Use the data only for the current device.
    data_device = data.loc[data.DeviceNumber == str(device)]
    #Starting coordinates, coordinates of all stops will be added to this list.
    loc = [(data_device.iloc[0,3], data_device.iloc[0,4])]
    #Time of stops.
    stop_time = [data_device.iloc[0,1]]
    #Length of stops.
    stop_length = []
    #Time of the last stop. Mainly to check that the length of the stop is at least 3 minutes.
    last_stop = str(data_device.iloc[0,1])
    #Time of leaving the stops.
    move_time = []
    #I go through each row to determine whether the car stops or starts moving (speed becomes 0 or becomes > 0 respectively).
    #If the car moves for the first time, I write the first moving time and the length of the initial stop.
    #Then if a car stops and the length of the stop is >= 180 seconds, I add information about the coordinates and the time.
    for i in range(1, len(data_device)):
        if data_device.iloc[i,5] > 0 and data_device.iloc[i-1,5] == 0:
            if last_stop == str(data_device.iloc[0,1]):
                stop_length.append(data_device.iloc[i,1])
                move_time.append(data_device.iloc[i,1])
            else:
                if (datetime.strptime(str(data_device.iloc[i,1]), "%H:%M:%S") \
                    - datetime.strptime(str(last_stop), "%H:%M:%S")).seconds < 180:
                    pass
                else:
                    stop_length.append(datetime.strptime(str(data_device.iloc[i,1]), "%H:%M:%S") \
                                       - datetime.strptime(str(last_stop), "%H:%M:%S"))
                    loc.append(loc_temp)
                    stop_time.append(stop_time_temp)
                    move_time.append(data_device.iloc[i,1])
        if data_device.iloc[i,5] == 0 and data_device.iloc[i-1,5] > 0:
            last_stop = data_device.iloc[i,1]
            loc_temp = (data_device.iloc[i,3], data_device.iloc[i,4])
            stop_time_temp = data_device.iloc[i,1]
    #Adding information about the final stop.
    loc.append(loc_temp)
    stop_time.append(stop_time_temp)
    stop_length.append('the rest of the day')
    #Writing information about distinct latitudes and longitudes for plotting.
    lat_device = list(data_device['Latitude'])
    lon_device = list(data_device['Longitude'])
    lat = [lat_device[0]]
    lon = [lon_device[0]]
    for i in range(1, len(lat_device)):
        if lat_device[i] != lat_device[i-1] or lon_device[i] != lon_device[i-1]:
            lat.append(lat_device[i])
            lon.append(lon_device[i])
    #Plotting the car path on the map with markers for stops. First line centers map on the mean coordinates.
    cars_map = folium.Map(location=[np.mean(lat), np.mean(lon)], zoom_start=10)
    #MarkerCluster lives in folium.plugins in current folium versions.
    marker_cluster = plugins.MarkerCluster().add_to(cars_map)
    #Adding markers for each stop with information about the stop.
    for i in range(len(loc)):
        folium.Marker([loc[i][1], loc[i][0]], popup="Stopped at {0} for {1}".format(stop_time[i],
                      stop_length[i])).add_to(marker_cluster)
    #Plotting car path. zip is materialised into a list so that it can be iterated more than once.
    folium.PolyLine(list(zip(lat, lon)), color="blue", weight=2.5, opacity=1).add_to(cars_map)
    folium.LatLngPopup().add_to(cars_map)
    #Plotting graphs for various variables.
    fig, axes = plt.subplots(nrows=2, ncols=2)
    data_device.plot(x='TimeUtc', y='Speed', figsize=(18, 15), ax=axes[0,0]); axes[0,0].set_title('Speed');
    data_device.loc[data_device.Speed > 0].plot(
        x='TimeUtc', y='Speed', figsize=(18, 15), ax=axes[0,1]); axes[0,1].set_title('Non zero Speed');
    data_device.plot(x='TimeUtc', y='EngineRPM', figsize=(18, 15), ax=axes[1,0]); axes[1,0].set_title('EngineRPM');
    data_device.loc[data_device.Speed > 0].plot(
        x='TimeUtc', y='EngineRPM', figsize=(18, 15), ax=axes[1,1]); axes[1,1].set_title('EngineRPM at non zero Speed');
    #Showing car data.
    #Show mileage at the beginning of the day and driving distance for the day.
    mileage = int(data_device['Mileage'].min())
    distance = int(data_device['Mileage'].max()) - mileage
    print('Mileage at the beginning of the day:', str(mileage) + ' km.', 'Drove today', str(distance) + ' km',
          'or increased mileage by', '{:.2f}%.'.format((distance / mileage) * 100))
    #Information about the car at all parts of the travel. Two main states of the car are driving and staying.
    #What can be opened in a car during a stop.
    actions_list = ['LockTrunk', 'LockHood', 'LockDriver', 'LockTPassenger', 'LockLeftRear', 'LockRightRear']
    objects = ['Trunk', 'Hood', 'Driver door', 'Passenger door', 'Left rear door', 'Right rear door']
    #I need a list of timepoints that alternates stop times and move times. Didn't find a better way to do it.
    time_points = [None]*(len(stop_time) + len(move_time))
    time_points[::2] = stop_time
    time_points[1::2] = move_time
    #I go through timepoints. For each part of travel I get starting and ending time and the difference between them.
    #For stops I see whether something was opened in a car (column has at least one 1). And average temperature.
    #For travels I get travel distance and average values of speed, EngineRPM and temperature.
    #For the final stop I get data separately as it differs from usual stops.
    for i in range(0, len(time_points), 2):
        #Actions - temporary list for i.
        actions = []
        if i < len(time_points) - 1:
            for j in range(len(actions_list)):
                if (data_device.loc[(data_device.TimeUtc > time_points[i])
                                    & (data_device.TimeUtc < time_points[i+1])][actions_list[j]] > 0).any():
                    actions.append(objects[j])
            #If nothing was opened, show "nothing"
            actions = actions if len(actions) > 0 else ['nothing']
            print(str(time_points[i]) + ' - ' + str(time_points[i+1]),
                  'Stayed for {0}.'.format((datetime.strptime(str(time_points[i+1]), "%H:%M:%S") \
                                            - datetime.strptime(str(time_points[i]), "%H:%M:%S"))),
                  'Average temperature was {0}.'.format(round(data_device.loc[(data_device.TimeUtc > time_points[i])
                                            & (data_device.TimeUtc < time_points[i+1])]['Temperature'].mean())),
                  'Opened: {0}.'.format(', '.join(actions)))
            #Temporary data for the time interval.
            local_data = data_device.loc[(data_device.TimeUtc > time_points[i+1]) & (data_device.TimeUtc < time_points[i+2])]
            #There are cases when there is one strange line with nonzero speed among lines with zero speed. This is to check.
            if len(local_data) < 2:
                pass
            else:
                print(str(time_points[i+1]) + ' - ' + str(time_points[i+2]), 'Drove for {0}.'.format(
                      (datetime.strptime(str(time_points[i+2]), "%H:%M:%S") \
                       - datetime.strptime(str(time_points[i+1]), "%H:%M:%S"))),
                      'Traveled {0} km, with average speed {1}, EngineRPM {2} and temperature {3}.'.format(
                      local_data.Mileage.max()-local_data.Mileage.min(),
                      round(local_data.Speed.mean()),
                      round(local_data.EngineRPM.mean()), round(local_data.Temperature.mean())))
        elif i == len(time_points)-1:
            for j in range(len(actions_list)):
                if (data_device.loc[data_device.TimeUtc > time_points[i]][actions_list[j]] > 0).any():
                    actions.append(objects[j])
            actions = actions if len(actions) > 0 else ['nothing']
            print(str(time_points[i]) + ' - ' + 'midnight', 'Stayed till the end of the day.',
                  'Average temperature was {0}.'.format(
                  round(data_device.loc[data_device.TimeUtc > time_points[i]]['Temperature'].mean())),
                  'Opened: {0}.'.format(', '.join(actions)))
    return cars_map
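The core idea of the function above, detecting stops from the speed column and ignoring halts shorter than three minutes, can be shown in isolation. The following is a simplified sketch and not the function itself: `find_stops` is a hypothetical helper, the samples are made up, and a stop that lasts until the end of the data is not reported.

```python
from datetime import datetime, timedelta

def find_stops(samples, min_stop=timedelta(seconds=180)):
    """Return (start, end) pairs for zero-speed intervals of at least min_stop.

    samples is a list of (time_string, speed) tuples sorted by time.
    """
    stops = []
    stop_start = None
    for time_str, speed in samples:
        t = datetime.strptime(time_str, "%H:%M:%S")
        if speed == 0 and stop_start is None:
            stop_start = t                      # the car has just stopped
        elif speed > 0 and stop_start is not None:
            if t - stop_start >= min_stop:      # ignore short halts such as traffic lights
                stops.append((stop_start.time(), t.time()))
            stop_start = None
    return stops

# Made-up samples: a 10 minute stop, a 1 minute halt and a 5 minute stop.
samples = [
    ("08:00:00", 0), ("08:10:00", 30),
    ("08:20:00", 0), ("08:21:00", 45),
    ("08:40:00", 0), ("08:45:00", 20),
]
print(find_stops(samples))
```

Only the first and the third interval are returned; the one minute halt falls under the threshold and is treated as part of the drive.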
In [25]:
car_data('device_1')
Out[25]:
So the car was at home at first. Then the family got into the car (all doors were opened) and some baggage was put into the trunk. They drove 68 km to a place near a supermarket and a KFC. I think that the whole family left the car and went to eat or to the supermarket. After this the driver returned to the car and drove around the area for some time. He/she drove to a barber (or to a small cafe), then visited Starbucks. Finally the car went back to the supermarket to pick up the family and possibly some more things from the supermarket. My guess is that the products were bought during the first stop, and then the family went to purchase other items, which was too boring for the driver. It is also possible that the driver prefers Starbucks and the family prefers KFC.
Then they drove to a place in the city and the family left the car there. I suppose that this is their home in the city. After this the driver traveled alone. At first he/she stayed at National Galleria: there are shops, restaurants and other interesting places. Then he/she drove to some place where he/she didn't even leave the car - maybe someone brought him/her something? After this he/she stopped near the Ministry of Foreign Affairs (for some business maybe). Finally he/she visited a gas station and then went home at 17 hours.
Average speed was okay, but there were occasional peaks of high speed and sometimes engine RPM was too high. Also he/she drove a lot - almost half of the mileage. This could mean that the car is quite new or is rarely used.
So the driver is a family person. The family buys products on Saturday and eats breakfast in a cafe. Then they go to their home in the city and stay there. The driver likes Starbucks and deals with some matters in the Ministry of Foreign Affairs. The speed is often risky and engine RPM is sometimes pushed to high values.
In [26]:
car_data('device_2')
Out[26]:
The car was at home at first. Only the driver entered the car. He/she drove to a cafe and stayed there for an hour. Then he/she returned home for several hours and took passengers. I noticed that he/she reparked several times. The car visited a couple of places nearby and then drove to a hotel in Abu Dhabi. Maybe they are going to stay there for the weekend. It is worth noticing that there was no baggage, as the trunk wasn't opened.
It seems that the driver likes driving fast and pushing the engine to its limits. Also he/she drives a lot - the mileage of the car is quite high.
So the driver has a family, likes to have breakfast in a cafe and goes to a big city for the weekend with the family.
In [27]:
car_data('device_3')
Out[27]:
The car was at home until ~14 hours. Then the driver and the passenger drove to eat at a nearby place. Strangely, the passenger's door wasn't opened here. Maybe the passenger used the driver's door? After eating, the car returned home and the passenger left. After this the car drove to Emirates Islamic bank for several minutes - maybe to take cash from an ATM? The next stop took almost three hours and it was near the French international school - maybe learning French? Then one more stop for ~3 hours nearby, where there are a lot of places: an Indian high school, a Catholic church, a supermarket, a mosque... And then the driver returned home.
Car mileage isn't high, but driving distance also isn't big. Maybe the car was bought several weeks ago? Average speed and Engine RPM are okay.
The driver could be a young person who likes sleeping late on weekends; he/she eats in a restaurant and visits a high school for some courses.
In [28]:
car_data('device_4')
Out[28]:
Stayed at home until 17 hours! Took a passenger and drove. An interesting point is that the back doors were opened, but other data indicates that there was no one there when the car started driving. So only one passenger.
The car drove to the hospital and stayed there for an hour. Maybe the driver or the passenger had a medical check-up. Or they visited someone. Then they returned home for several hours.
At night the car drove to the pharmacy. After this the car drove to a gas station near a supermarket. It seems that the driver isn't going to sleep yet.
Graphs show high speed when the car drove to the hospital and back. It seems that the passenger could be a relative who urgently needed medical help.
So the driver has someone who needs medicine and hospital visits. Also he/she stays up late at night - maybe it is the stress of the day or it could be a usual situation. And he/she doesn't drive a lot.
In [29]:
#Plotting the path of the car.
lat_device = list(data.loc[data.DeviceNumber == 'device_5']['Latitude'])
lon_device = list(data.loc[data.DeviceNumber == 'device_5']['Longitude'])
lat = [lat_device[0]]
lon = [lon_device[0]]
for i in range(1, len(lat_device)):
    if lat_device[i] != lat_device[i-1] or lon_device[i] != lon_device[i-1]:
        lat.append(lat_device[i])
        lon.append(lon_device[i])
cars_map = folium.Map(location=[np.mean(lat), np.mean(lon)], zoom_start=10)
#MarkerCluster lives in folium.plugins in current folium versions.
marker_cluster = plugins.MarkerCluster().add_to(cars_map)
#zip is materialised into a list so that it can be iterated more than once.
folium.PolyLine(list(zip(lat, lon)), color="blue", weight=2.5, opacity=1).add_to(cars_map)
folium.LatLngPopup().add_to(cars_map)
cars_map
Out[29]:
This car differs from the other cars: its speed is zero for the whole day, but it still moves (a little), and its doors, hood and trunk are opened. The only reason I can think of is that the car is being repaired.
In [30]:
car_data('device_6')
Out[30]:
The driver left home alone in the morning. He/she drove to Dubai and on the way stopped at Epco Gas Station. In Dubai he/she stopped near a supermarket and a medical center. Then he/she drove to Town Centre Jumeirah, a shopping center. After this he/she drove back and parked near a big shopping center. Later he/she returned home. When the car left again, there was a passenger. They arrived at a Tim Hortons, a multinational fast food restaurant. Then they (and more passengers) drove back home - maybe they were guests. Some time later the car drove to a place on E11 and dropped off the back-seat passengers there. And this was the last destination, as the car returned home at ~17 hours.
The car drove at high speeds, but mostly within limits. Maybe sometimes the speed was a bit too high between the speed cameras. It seems that the car is quite new and the driver may use it a lot.
The driver shops a lot and drives many people. A social person.
In [31]:
car_data('device_7')
Out[31]:
A new car (very low mileage). It seems that there were some problems with the car - the hood was opened in the morning and then the car drove to something like a car shop. Probably the car was repaired there. Then there was a stop at a parking lot... well, not really. The car was taken to Dubai - it seems the initial repair wasn't successful. Then things got better - the car became able to move and drove to a gas station.
Then they ate at a restaurant and visited another gas station. An interesting point is that only the driver left the car at the long stop near the restaurant. Maybe the passenger was transported by a car company? Anyway, he/she dropped off a passenger on E11 and finished the trip at a place different from the initial one. Second home?
The car drove somewhere within the speed limits.
So this is a new car with some technical problems. Maybe changing it is a good idea? But maybe it is normal for a car to visit the car shop during the first days after the purchase. I can't be sure.
In [32]:
car_data('device_8')
Out[32]:
It seems that this is also a new car with some technical problems: in the morning the hood was opened and then a nearby car center was visited. Or the car simply was in the car center and it was driven to the owner's place later. Another possibility is that the car was bought in the morning, but I'm not sure - it is quite early.
Then the car drove to Abu Dhabi and visited a gas station on the way there. Stops were also made at two more gas stations.
After this the car drove to some place called Village Center, but nobody left the car. Maybe there was a short meeting with someone? And then the car returned to the first city.
Very high speed. A new car. And several visits to gas stations. Not sure what to conclude. Could this be an electric car? Or maybe it is simply a new car and that's the reason for its behavior.
In [33]:
car_data('device_9')
Out[33]:
Very little data. A family drove the car at midday. They stayed at some place for half an hour and returned home. The car is quite new and is little used today. It seems that the driver likes to drive fast.
In [34]:
car_data('device_10')
Out[34]:
This car drives actively. It drives around a town and makes stops at various places. Most of the time the speed is normal, except one time period with a high speed. The key is in the starting and ending point, which was also visited three times during the day: it is near Fujairah Police Station. So I conclude that this is a police car.
Insurance companies use telematic data a lot. Usage-based insurance provides lower costs for drivers and lower risks for insurance companies. Telematic data can uncover fraud (non-accidental crashes) or find the real reason behind a crash.
Car companies can also profit from telematic data. If the car has high mileage, they can offer check-ups to the driver, for example. They may find problems in the car remotely and offer a repair before something breaks. Or, based on the information about how the cars are used, the companies could make modifications in future cars.
One more possibility is combining telematic data with smart houses. For example, if the car arrives at home, the garage doors could open automatically.
Another idea is using telematic data for advertising. If the driver often visits supermarkets, products could be offered. If the driver drives to cafes, cafes may be advertised and so on.