The data on motor vehicle thefts in Chicago is taken from the Chicago Data Portal.
In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
In [51]:
mvt = pd.read_csv('../../data/mvt2015_2017.csv')
mvt.head(3)
Out[51]:
In [48]:
# TODO how many entries
In [49]:
# TODO different Location Descriptions
Use DataFrame.plot or Series.plot to draw the histogram (bar chart) horizontally.
Note:
DataFrame.plot.hist() can only work with numerical x-values. If you don't want to convert the 'Location Descriptions' to a numeric value, you may want to use Series.value_counts().plot instead.
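The value_counts().plot route mentioned in the note can be sketched on a small made-up Series. The location names and the Series name below are invented for illustration and are not taken from the data set; treat this as a sketch of the technique, not the exercise solution.

```python
import pandas as pd

# Toy data (invented); in the exercise this would be the location column of mvt.
locations = pd.Series(
    ["STREET", "STREET", "STREET", "PARKING LOT", "PARKING LOT", "ALLEY"],
    name="Location Description",  # assumed column name
)

# value_counts() counts each category and sorts the result in descending order.
counts = locations.value_counts()

# barh draws horizontal bars from bottom to top, so sort ascending
# to get the most frequent category at the top of the chart.
ax = counts.sort_values().plot.barh()
ax.set_xlabel("Number of thefts")
```

Because the counts are already numeric, no conversion of the category labels is needed.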
In [5]:
# TODO
Out[5]:
Install seaborn:
conda install seaborn
Use seaborn's countplot to show how often thefts occur at each location (description).
Plot the top 10 hot spots horizontally.
See also Plotting with categorical data for further information about how to show categorical data with seaborn.
Hint
Use value_counts() to find the top 10 hot spots.
Set the order parameter of sns.countplot to the top 10 hot spots.
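A minimal sketch of how the two hints fit together, using an invented toy frame and the top 3 instead of the top 10. The column name Location and its values are assumptions made for this illustration only.

```python
import pandas as pd
import seaborn as sns

# Toy data (invented): four distinct locations with different frequencies.
toy = pd.DataFrame(
    {"Location": ["STREET"] * 4 + ["GARAGE"] * 3 + ["ALLEY"] * 2 + ["OTHER"]}
)

# value_counts() sorts descending, so head(3) yields the three most frequent labels.
top = toy["Location"].value_counts().head(3).index.tolist()

# Passing the column as y makes the bars horizontal;
# order restricts the plot to the chosen categories and fixes their sequence.
ax = sns.countplot(data=toy, y="Location", order=top)
```

Categories that are not listed in order (here OTHER) are simply left out of the plot.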
In [47]:
import seaborn as sns
# TODO your code goes here
Convert Date to a pandas datetime (pd.to_datetime),
set Date as the index, and
add Weekday (the weekday of Date) as an ordered category (['Monday', 'Tuesday', ...]).
Note:
Defining the weekday as categorical is required to get the plots in the further exercises ordered by weekday.
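The three steps can be sketched on a toy frame as follows. The timestamp strings and their format are assumptions for illustration; check the actual format in the CSV before passing a format string.

```python
import pandas as pd

# Toy data (invented); the date format is an assumption, not taken from the file.
toy = pd.DataFrame(
    {"Date": ["01/05/2015 10:30:00 PM", "01/06/2015 01:15:00 AM", "01/11/2015 08:00:00 AM"]}
)

# 1. Parse the strings into datetime64 values.
toy["Date"] = pd.to_datetime(toy["Date"], format="%m/%d/%Y %I:%M:%S %p")

# 2. Use the parsed dates as the index.
toy = toy.set_index("Date")

# 3. Derive the weekday name and store it as an ordered categorical,
#    so that sorting and grouping follow the calendar order, not the alphabet.
weekday_order = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
toy["Weekday"] = pd.Categorical(toy.index.day_name(), categories=weekday_order, ordered=True)

# Grouping by the categorical column returns the groups in category order.
per_weekday = toy.groupby("Weekday", observed=False).size()
```

Without the explicit categories list, the weekdays would be sorted alphabetically (Friday first) in later plots.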
In [45]:
# TODO Date to pandas.DateTime
In [8]:
# TODO Date as index
In [9]:
# TODO WeekDay as new ordered categorical column
In [44]:
# TODO group by Weekday
In [18]:
# Plot group (barh)
Add Hour (the hour of Date) as a new column to the mvt dataframe.
Note
To be able to plot without looping, you may want to
groupby(['Weekday','Hour']) and aggregate with size, then
either reset_index of the aggregate and pivot, or unstack directly, so that the index holds the 24 Hours, the columns are the Weekdays and the values are the counts.
Please check how
pivot works or how unstack works.
The pivot table should look similar to:
| Weekday | Monday | Tuesday | Wednesday | Thursday | Friday | Saturday | Sunday |
|---|---|---|---|---|---|---|---|
| Hour | | | | | | | |
| 0 | 76 | .. | .. | .. | .. | .. | 234 |
| 1 | 126 | .. | .. | .. | .. | .. | ... |
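A small sketch of both routes to such a table, on invented toy data with only two weekdays and two hours. The column name count is a hypothetical choice for this example.

```python
import pandas as pd

# Toy data (invented): one row per theft with its weekday and hour.
toy = pd.DataFrame(
    {
        "Weekday": ["Monday", "Monday", "Monday", "Tuesday", "Tuesday"],
        "Hour": [0, 0, 1, 0, 1],
    }
)

# Count the rows per (Weekday, Hour) pair; the result is a Series with a MultiIndex.
counts = toy.groupby(["Weekday", "Hour"]).size()

# Route 1: unstack moves the Weekday level of the index into the columns.
by_unstack = counts.unstack("Weekday")

# Route 2: flatten the MultiIndex first, then pivot explicitly.
by_pivot = counts.reset_index(name="count").pivot(
    index="Hour", columns="Weekday", values="count"
)

# With Hours as index and Weekdays as columns, plot() draws one line per weekday.
ax = by_unstack.plot()
```

Both routes give the same table here; unstack is shorter, while pivot makes the roles of index, columns and values explicit.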
In [43]:
# TODO add additional column Hour
In [42]:
# TODO group by Weekday,Hour and aggregate size
# TODO reset index (for pivot only)
# TODO pivot (or unstack)
In [41]:
# make a line-plot: count over Hour by Weekday(==Groups)
Draw a heatmap with seaborn.heatmap:
x-axis=Hours, y-axis=Weekday, fill=count
On which weekday and at what time are the most thefts committed?
Additional
Play with color palettes for parameter cmap
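As a rough sketch, a heatmap of an Hour by Weekday table could look like this. The table is filled with random toy counts for three weekdays, and the choice of the viridis palette is arbitrary; neither comes from the data set.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Toy table (random, invented): 24 hours as index, weekdays as columns.
rng = np.random.default_rng(0)
table = pd.DataFrame(
    rng.integers(0, 50, size=(24, 3)),
    index=pd.RangeIndex(24, name="Hour"),
    columns=pd.Index(["Monday", "Tuesday", "Wednesday"], name="Weekday"),
)

# heatmap puts the columns on the x-axis and the index on the y-axis,
# so transpose to get Hours on x and Weekdays on y.
ax = sns.heatmap(table.T, cmap="viridis")

# The cell with the highest count can also be looked up directly:
# unstack() yields a Series indexed by (Weekday, Hour).
peak_weekday, peak_hour = table.unstack().idxmax()
```

Any matplotlib colormap name can be passed as cmap, which makes it easy to compare palettes.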
In [40]:
# TODO