Analysis of Motor Vehicle Thefts in Chicago

The data on motor vehicle thefts in Chicago are taken from the Chicago Data Portal.


In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline

In [51]:
mvt = pd.read_csv('../../data/mvt2015_2017.csv')
mvt.head(3)


Out[51]:
Date Block Primary Type Description Location Description Ward Community Area Year Location
0 05/03/2016 08:00:00 PM 100XX S SANGAMON ST MOTOR VEHICLE THEFT AUTOMOBILE STREET 34.0 73.0 2016 (41.711843569, -87.646607932)
1 05/03/2016 11:00:00 PM 084XX S MORGAN ST MOTOR VEHICLE THEFT AUTOMOBILE STREET 21.0 71.0 2016 (41.740895923, -87.648617881)
2 05/03/2016 04:45:00 PM 003XX W MONROE ST MOTOR VEHICLE THEFT AUTOMOBILE STREET 2.0 32.0 2016 (41.88063228, -87.635935494)

Exercise 0

Inspect the mvt data (columns, column types, first rows, ...).

  • How many entries are there?
  • How many different Location Descriptions exist?
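
As a rough orientation, the following sketch shows the kind of inspection calls that answer such questions. It runs on a small made-up DataFrame (the name toy and its values are assumptions for illustration), not on the real mvt data.

```python
import pandas as pd

# Toy stand-in for mvt; the values are invented for illustration only.
toy = pd.DataFrame({
    "Location Description": ["STREET", "STREET", "ALLEY", "GARAGE", "STREET", "ALLEY"],
    "Ward": [34.0, 21.0, 2.0, 2.0, 34.0, 21.0],
})

print(toy.columns.tolist())  # column names
print(toy.dtypes)            # column types
print(toy.head(3))           # first rows

n_entries = len(toy)                                  # number of rows
n_locations = toy["Location Description"].nunique()   # distinct values, NaN not counted
print(n_entries, n_locations)
```

The same calls should work unchanged on mvt; toy.shape would give rows and columns in one step.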

In [48]:
# TODO how many entries

In [49]:
# TODO different Location Descriptions

Exercise 1

Draw a horizontal histogram (bar chart) to investigate which locations (Location Description) are the top 10 theft hot spots.

Exercise 1.1

Use DataFrame.plot or Series.plot to draw the histogram (bar chart) horizontally.

Note:
DataFrame.plot.hist() only works with numerical x-values. If you don't want to convert the Location Description values to numbers, you may want to use Series.value_counts().plot instead.
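
A minimal sketch of the value_counts().plot idea from the note, on an invented Series (the name toy and its values are assumptions). Only the top 2 are kept because the toy data is tiny; the exercise asks for 10.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Invented location descriptions, not the real data.
toy = pd.Series(
    ["STREET"] * 5 + ["ALLEY"] * 3 + ["GARAGE"] * 2 + ["DRIVEWAY"],
    name="Location Description",
)

# value_counts() sorts descending, so head(n) keeps the n most frequent values.
top = toy.value_counts().head(2)

# barh draws the first row at the bottom; sorting ascending puts the largest bar on top.
ax = top.sort_values().plot.barh()
ax.set_xlabel("Number of thefts")
```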


In [5]:
# TODO


Out[5]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f30f2ed4fd0>

Exercise 1.2

Install seaborn:

conda install seaborn

Use countplot to show how often thefts occur at each location (Location Description).

Plot the top 10 hot spots horizontally.

See also Plotting with categorical data for further information about how to show categorical data with seaborn.

Hint

  • you still have to use value_counts() to find the top 10 hot spots
  • set the order parameter of sns.countplot to the top 10 hot spots
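
One way the two hints might fit together is sketched below on invented data (toy and its values are assumptions). Passing the column as y makes the bars horizontal, and order restricts and sorts the categories that are drawn.

```python
import pandas as pd
import seaborn as sns

# Invented data for illustration only.
toy = pd.DataFrame({
    "Location Description": ["STREET"] * 5 + ["ALLEY"] * 3 + ["GARAGE"] * 2 + ["DRIVEWAY"],
})

# Most frequent categories first; 2 instead of 10 because the toy data is tiny.
top = toy["Location Description"].value_counts().head(2).index.tolist()

# y=... gives horizontal bars; order=... limits the plot to the chosen categories.
ax = sns.countplot(data=toy, y="Location Description", order=top)
```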

In [47]:
import seaborn as sns
# TODO your code goes here


Exercise 2

Investigate on which weekdays and at which hours the most thefts happen.

Exercise 2.1

  1. Convert Date to a pandas datetime,
  2. set Date as index and
  3. add a further column Weekday (the weekday of Date) as an ordered category (['Monday', 'Tuesday', ...]).
    Check

Note:
Defining the weekday as an ordered categorical is required so that the plots in the later exercises are ordered by weekday.
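
The three steps could look roughly like this on a toy frame. The format string is an assumption derived from the sample rows shown at the top (month/day/year, 12-hour clock with AM/PM); the names toy and days are made up for the sketch.

```python
import pandas as pd

# Toy dates in the same textual layout as the sample rows.
toy = pd.DataFrame({
    "Date": ["05/03/2016 08:00:00 PM", "05/04/2016 11:00:00 PM", "05/02/2016 04:45:00 PM"],
})

# Step 1: parse the strings; an explicit format avoids day/month ambiguity.
toy["Date"] = pd.to_datetime(toy["Date"], format="%m/%d/%Y %I:%M:%S %p")

# Step 2: use the parsed dates as index.
toy = toy.set_index("Date")

# Step 3: weekday names as an ordered categorical, Monday first.
days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
toy["Weekday"] = pd.Categorical(toy.index.day_name(), categories=days, ordered=True)
print(toy)
```

With the ordered categorical in place, sorting or grouping by Weekday follows the calendar order instead of the alphabetical one.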


In [45]:
# TODO convert Date to a pandas datetime

In [8]:
# TODO Date as index

In [9]:
# TODO WeekDay as new ordered categorical column

Exercise 2.2

Count the thefts by weekday and
make a corresponding histogram (bar chart).

On which weekday do the most thefts happen?
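
A small sketch of counting by an ordered categorical, on invented data (toy, days and the values are assumptions). observed=False is passed explicitly so that weekdays without any theft still appear with a count of 0.

```python
import pandas as pd

days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Invented weekdays for illustration only.
toy = pd.DataFrame({
    "Weekday": pd.Categorical(
        ["Friday", "Monday", "Friday", "Sunday", "Friday", "Monday"],
        categories=days,
        ordered=True,
    ),
})

# One count per weekday, returned in category order (Monday ... Sunday).
per_day = toy.groupby("Weekday", observed=False).size()
busiest = per_day.idxmax()  # weekday label with the highest count
print(per_day)
print(busiest)
```

Calling per_day.plot.barh() on the result draws the horizontal bar chart.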


In [44]:
# TODO group by Weekday

In [18]:
# Plot group (barh)


Exercise 2.3

  • Add a further column Hour (the hour of Date) to the mvt DataFrame.
  • Make a line plot, with one line per weekday, of the number of thefts (y-axis) over the hours (x-axis).
    The plot should look similar to:

Note
To be able to plot without looping, you may want to

  1. groupby(['Weekday','Hour']) and aggregate with size()
  2. reset_index of the aggregate and pivot or unstack, so that the index holds the 24 hours, the columns are the weekdays and the values are the counts. Please check the

    The pivot-table should look similar to:

Weekday  Monday  Tuesday  Wednesday  Thursday  Friday  Saturday  Sunday
Hour
0            76       ..         ..        ..      ..        ..     234
1           126       ..         ..        ..      ..        ..     ...
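
The groupby-and-unstack route described in the note can be sketched as follows, on invented data (toy, days and the values are assumptions). The toy frame contains only three distinct hours, so the resulting table has three rows rather than 24.

```python
import pandas as pd

days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Invented weekday/hour pairs for illustration only.
toy = pd.DataFrame({
    "Weekday": pd.Categorical(
        ["Monday", "Monday", "Monday", "Sunday", "Sunday"],
        categories=days,
        ordered=True,
    ),
    "Hour": [0, 0, 1, 0, 23],
})

# Count rows per (Weekday, Hour) pair.
counts = toy.groupby(["Weekday", "Hour"], observed=False).size()

# Move Weekday from the row index to the columns; missing pairs become 0.
table = counts.unstack("Weekday", fill_value=0)
print(table)
```

Calling table.plot() on such a table draws one line per column, that is one line per weekday over the hours, without an explicit loop.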

In [43]:
# TODO add additional column Hour

In [42]:
# TODO group by Weekday,Hour and aggregate size
# TODO reset index (for pivot only)
# TODO pivot (or unstack)

In [41]:
# make a line-plot: count over Hour by Weekday(==Groups)

Exercise 2.4

Draw a heatmap with seaborn.heatmap:
x-axis=Hours, y-axis=Weekday, fill=count

On which weekday and at what time are the most thefts committed?

Additional
Play with different color palettes for the cmap parameter.
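
A hedged sketch of the heatmap call on a tiny invented hour-by-weekday table (table and its values are assumptions). The table is transposed so that weekdays end up on the y-axis and hours on the x-axis; "viridis" is just one of the matplotlib colormap names that cmap accepts.

```python
import pandas as pd
import seaborn as sns

# Invented counts: rows are hours, columns are weekdays.
table = pd.DataFrame(
    {"Monday": [2, 1, 0], "Sunday": [1, 0, 3]},
    index=pd.Index([0, 1, 23], name="Hour"),
)
table.columns.name = "Weekday"

# Transpose: rows (y-axis) become weekdays, columns (x-axis) become hours.
ax = sns.heatmap(table.T, cmap="viridis", annot=True)

# The cell with the highest count, as a (Weekday, Hour) pair.
peak = table.unstack().idxmax()
print(peak)
```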


In [40]:
# TODO