Theme

Computers are good at consuming, producing and processing data
Humans are good at consuming, producing and processing stories
For data to be useful to humans, we need tools for telling stories that involve code and data

Illustration

Data and code


In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

In [2]:
df = pd.read_csv('data/police_locals.csv')
# The file uses '**' as a placeholder; treat it as a missing value
df = df.replace('**', np.nan)
# Cast the residency-fraction columns to floats in one step
rate_cols = ['all', 'white', 'non-white', 'black', 'hispanic']
df[rate_cols] = df[rate_cols].astype('float')
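
The cell above replaces the '**' placeholder after loading and then casts each column. As a minimal sketch of an alternative, the placeholder can be declared as a missing value at parse time with the `na_values` argument of `pd.read_csv`. The sample below is an assumption made for illustration: the first two rows are copied from the `df.head()` output, and the "Example City" row is hypothetical, added only to show a '**' entry.

```python
import io
import pandas as pd

# Small in-memory sample in the same column layout as the notebook's data.
# "Example City" is a hypothetical row, not part of the real dataset.
sample_csv = io.StringIO(
    "city,police_force_size,all,white,non-white,black,hispanic\n"
    "New York,32300,0.617957,0.446387,0.764419,0.770891,0.762861\n"
    "Chicago,12120,0.875000,0.871963,0.877400,0.897406,0.839827\n"
    "Example City,100,0.5,0.4,**,**,0.6\n"
)

rate_columns = ["all", "white", "non-white", "black", "hispanic"]

# na_values turns '**' into NaN while parsing, so the fraction columns
# are read as floats without a separate replace/astype step.
sample = pd.read_csv(sample_csv, na_values="**")

print(sample[rate_columns].dtypes)
print(sample["non-white"].isna().sum())
```

Whether this is preferable depends on the file: it assumes '**' never appears as a legitimate value in any column.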

In [3]:
df.head()


Out[3]:
   city         police_force_size  all       white     non-white  black     hispanic
0  New York     32300              0.617957  0.446387  0.764419   0.770891  0.762861
1  Chicago      12120              0.875000  0.871963  0.877400   0.897406  0.839827
2  Los Angeles  10100              0.228218  0.152778  0.263848   0.387387  0.217680
3  Washington   9340               0.115632  0.056774  0.157365   0.170189  0.089888
4  Houston      7700               0.292208  0.173735  0.399258   0.366379  0.457143

In [4]:
len(df)


Out[4]:
76

In [5]:
df[['white','non-white']].describe()


Out[5]:
       white      non-white
count  76.000000  76.000000
mean   0.348386   0.486143
std    0.240522   0.252310
min    0.026667   0.076923
25%    0.151914   0.302225
50%    0.293472   0.473922
75%    0.502404   0.669569
max    0.962963   0.956522

In [6]:
sns.set_context("talk")
sns.set_style("whitegrid")
sns.boxplot(data=df[['white','non-white']]);


Story

This data comes from an article published on 538 on August 20, titled “Most Police Don’t Live In The Cities They Serve.” From the article:

In Ferguson, Missouri, where protests continue following the shooting of a teenager by a police officer this month, more than two-thirds of the civilian population is black. Only 11 percent of the police force is. The racial disparity is troubling enough on its own, but it’s also suggestive of another type of misrepresentation. Given Ferguson’s racial gap, it’s likely that many of its police officers live outside city limits.

The dataset above, published by 538 on GitHub, contains the fraction of police officers who live in the city they serve, broken down by the officers' race, for 75 cities across the U.S. We can now develop a story around that data:

  • Averaged across cities, only about a third ($\approx 35\%$) of white police officers live in the cities they serve.
  • Averaged across cities, non-white police officers are more likely to live in the cities they serve ($\approx 49\%$).
  • The spread across individual cities is large ($\sigma\approx 25\%$ for both groups).
  • For issues of racial justice, it could be useful to have more police officers live in the cities they serve.
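
The comparison behind the first two points can be sketched as a per-city gap between the two fractions. The example below is illustrative only: it uses just the five rows shown in the `df.head()` output, so its averages differ from the full-dataset figures quoted above, and the `gap` column name is an assumption introduced here.

```python
import pandas as pd

# The five rows shown in the df.head() output; the full dataset has more
# cities, so these averages will not match the describe() summary.
head = pd.DataFrame({
    "city": ["New York", "Chicago", "Los Angeles", "Washington", "Houston"],
    "white": [0.446387, 0.871963, 0.152778, 0.056774, 0.173735],
    "non-white": [0.764419, 0.877400, 0.263848, 0.157365, 0.399258],
})

# Per-city gap: how much larger the resident fraction is for non-white
# officers than for white officers ("gap" is a hypothetical column name).
head["gap"] = head["non-white"] - head["white"]

# Unweighted averages across cities, as in the describe() summary.
means = head[["white", "non-white"]].mean()

print(means)
print(head[["city", "gap"]])
print((head["gap"] > 0).all())
```

In the notebook itself, the same subtraction could be applied to `df` to see in how many cities the gap is positive, which would show whether the difference in averages holds city by city.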
Without the story, the data doesn't mean much to us.