One nice thing about Portland is the wealth of data available through CivicApps. For this example, let's look at some crime data (a modified version of the published data set).
In [ ]:
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
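# set_matplotlib_formats now lives in matplotlib_inline; the IPython.display version is deprecated
from matplotlib_inline.backend_inline import set_matplotlib_formats
set_matplotlib_formats('svg')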
The data is provided in CSV format, which looks something like this:
In [ ]:
df = pd.read_csv('civicapps.crime_incident_data_2011.csv')
df.head(2)
First, let's do a simple count of the most common types of crime. Thankfully for those of us who live here, offenses that amount to "taking or messing with people's stuff" are the top contenders.
In [ ]:
offenses = df.groupby('Major Offense Type').count()['Record ID'].copy()
offenses = offenses.sort_values(ascending=True)  # Series.sort() was removed from pandas
offenses.plot(kind='barh')
plt.title("Top Portland Offenses in 2011")
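As a side note, the same kind of tally can be sketched with `value_counts()`. The rows below are made-up stand-ins rather than real incidents; only the column name is taken from the CSV. Note that this counts rows per offense type, whereas the cell above counts non-null `Record ID` values, so the two can differ if any IDs are missing.

```python
import pandas as pd

# Toy stand-in for the crime data: invented rows, real column name.
toy = pd.DataFrame({
    'Major Offense Type': ['Larceny', 'Burglary', 'Larceny',
                           'Assault', 'Larceny', 'Burglary'],
})

# value_counts() tallies each distinct value (descending by default).
toy_offenses = toy['Major Offense Type'].value_counts()

# Sort ascending so the largest bar lands at the top of a horizontal bar chart.
toy_offenses = toy_offenses.sort_values(ascending=True)
print(toy_offenses)
```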
Next, let's group the data by where the crimes occurred. (This would probably be more meaningful combined with per capita or neighborhood area data, but it's a reasonable place to start looking.) There are many neighborhoods in the data set, so we'll just look at the top 20 for now.
In [ ]:
hoods = df.groupby('Neighborhood').count()['Record ID'].copy()
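hoods = hoods.sort_values(ascending=False)  # most incidents first
hoods = hoods[0:20]
tophoods = set(hoods.index)
hoods = hoods.sort_values(ascending=True)  # ascending so the largest bar is drawn on top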
hoods.plot(kind='barh')
plt.title("Top 20 Portland Crime Hoods in 2011")
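The per capita idea mentioned above could be sketched roughly as follows. The incident counts and populations here are invented for illustration (as is the `EXAMPLEVILLE` neighborhood); real population figures would have to come from another source.

```python
import pandas as pd

# Hypothetical numbers: none of these come from the CivicApps data.
counts = pd.Series({'DOWNTOWN': 5000, 'EXAMPLEVILLE': 3000, 'BRIDLEMILE': 150})
population = pd.Series({'DOWNTOWN': 12000, 'EXAMPLEVILLE': 23000, 'BRIDLEMILE': 5500})

# Incidents per 1,000 residents; the two Series align on their index labels.
per_1000 = (counts / population * 1000).sort_values(ascending=False)
print(per_1000.round(1))
```

A ranking by rate can look quite different from a ranking by raw count, which is why the raw counts above are only a starting point.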
In [ ]:
df['rough_hood'] = df.Neighborhood.apply(lambda x: x in tophoods)
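For what it's worth, a membership flag like this can also be built with `Series.isin`, which avoids calling a Python lambda once per row. A small sketch on toy data (the `toy` frame and `toy_tophoods` set are illustrative names, not part of the notebook):

```python
import pandas as pd

toy = pd.DataFrame({'Neighborhood': ['DOWNTOWN', 'BRIDLEMILE', 'DOWNTOWN']})
toy_tophoods = {'DOWNTOWN'}

# Row-by-row membership test, as in the cell above.
via_apply = toy['Neighborhood'].apply(lambda x: x in toy_tophoods)

# Vectorized equivalent.
via_isin = toy['Neighborhood'].isin(toy_tophoods)

print(via_isin.tolist())
```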
My neighborhood isn't on this list; out of curiosity, which types of crimes are common in my backyard? I admit being relieved not to see homicide, kidnap, rape or arson on the list!
In [ ]:
bridlemile = df[df['Neighborhood'] == 'BRIDLEMILE']
offenses = bridlemile.groupby('Major Offense Type').count()['Record ID'].copy()
offenses = offenses.sort_values(ascending=True)  # Series.sort() was removed from pandas
offenses.plot(kind='barh')
plt.title("Top Bridlemile Offenses in 2011")
Finally, for now, we can look at how offense types break down across the neighborhoods they occurred in. A heatmap is a useful tool for this. (Again, we limit the neighborhoods to the 20 with the most incidents to keep the chart legible.) It's interesting to note the 'hot spots'.
In [ ]:
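# pivot_table takes index= (the old rows= keyword was removed from pandas)
by_neighborhood = pd.pivot_table(df[df['rough_hood']], index=['Major Offense Type'], columns=['Neighborhood'], aggfunc='count')
# Keep the per-cell record counts, treat missing combinations as zero, and order rows by the DOWNTOWN column
by_neighborhood = by_neighborhood['Record ID'].fillna(0).sort_values('DOWNTOWN', ascending=False)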
#by_neighborhood
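To make the shape of that table concrete, here is a small sketch using `pd.crosstab`, which produces the same kind of offense-by-neighborhood count table. This is an alternative to the pivot above, shown on invented rows rather than the real data.

```python
import pandas as pd

# Invented rows; only the column names mirror the real CSV.
toy = pd.DataFrame({
    'Major Offense Type': ['Larceny', 'Larceny', 'Burglary', 'Larceny', 'Assault'],
    'Neighborhood': ['DOWNTOWN', 'DOWNTOWN', 'DOWNTOWN', 'BRIDLEMILE', 'BRIDLEMILE'],
})

# Rows are offense types, columns are neighborhoods, cells are incident counts.
# Combinations that never occur are filled with 0 automatically.
toy_table = pd.crosstab(toy['Major Offense Type'], toy['Neighborhood'])

# Order rows by the DOWNTOWN column, largest first.
toy_table = toy_table.sort_values('DOWNTOWN', ascending=False)
print(toy_table)
```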
In [ ]:
def k_izer(num):
    # Format counts of 1000 or more as e.g. "1.2k"; smaller counts as plain integers
    if num >= 1000:
        return "{:.1f}k".format(num/1000.0)
    else:
        return "{:d}".format(int(num))
fig, ax = plt.subplots()
heatmap = ax.pcolor(by_neighborhood, cmap=plt.cm.Blues, alpha=0.8)
fig = plt.gcf()
fig.set_size_inches(10,10)
ax.set_frame_on(False)
ax.set_yticks(np.arange(by_neighborhood.shape[0])+0.5, minor=False)
ax.set_xticks(np.arange(by_neighborhood.shape[1])+0.5, minor=False)
ax.invert_yaxis()
ax.xaxis.tick_top()
plt.xticks(rotation=90)
ax.grid(False)
ax = plt.gca()
ax.set_xticklabels(by_neighborhood.columns, minor=False)
ax.set_yticklabels(by_neighborhood.index, minor=False)
# Hide the tick marks but keep the labels (the tick1On/tick2On attributes are gone from recent matplotlib)
ax.tick_params(axis='both', which='major', length=0)
# Write each cell's count at the center of its square
for y in range(by_neighborhood.shape[0]):
    for x in range(by_neighborhood.shape[1]):
        plt.text(x + 0.5, y + 0.5, k_izer(by_neighborhood.iloc[y, x]),
                 horizontalalignment='center',
                 verticalalignment='center',
                 )
plt.colorbar(heatmap)