Funding

Instructions / Notes:

Read these carefully

  • Read and execute each cell in order, without skipping forward
  • You may create new Jupyter notebook cells for testing, debugging, exploring, and so on (this is encouraged, in fact!). Just make sure that your final answer dataframes and answers use the variables outlined below
  • Have fun!

This dataset shows how much funding the state gave to individuals and tracks individuals' age, gender, and ethnicity.


In [1]:
# Run the following to import necessary packages and import dataset
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
matplotlib.style.use('ggplot')
datafile = "dataset/funding.csv"
df = pd.read_csv(datafile)
df.drop('Dummy', axis=1, inplace=True)
df.head(n=5)   # Show the first n rows of the dataset


Out[1]:
      ID    Age  Gender  Expenditures           Ethnicity
0  10210  13-17  Female          2113  White not Hispanic
1  10409  22-50    Male         41924  White not Hispanic
2  10486  0 - 5    Male          1454            Hispanic
3  10538  18-21  Female          6400            Hispanic
4  10568  13-17    Male          4412  White not Hispanic

In [9]:
ls = df['Age'].tolist()

In [20]:
df.describe(include='all')


Out[20]:
                  ID    Age  Gender  Expenditures           Ethnicity
count    1000.000000   1000    1000   1000.000000                1000
unique           NaN      6       2           NaN                   8
top              NaN  22-50  Female           NaN  White not Hispanic
freq             NaN    226     503           NaN                 401
mean    54662.846000    NaN     NaN  18065.786000                 NaN
std     25643.673401    NaN     NaN  19542.830884                 NaN
min     10210.000000    NaN     NaN    222.000000                 NaN
25%     31808.750000    NaN     NaN   2898.750000                 NaN
50%     55384.500000    NaN     NaN   7026.000000                 NaN
75%     76134.750000    NaN     NaN  37712.750000                 NaN
max     99898.000000    NaN     NaN  75098.000000                 NaN

The dataset above shows how much funding (the 'Expenditures' column) the state gave to individuals for training purposes, along with each individual's age, gender, and ethnicity.


In [24]:
# Example dataframe query: mean expenditures are nearly equal for both genders,
# which suggests there is no discrimination by gender.
df.groupby(['Gender'], sort=True).agg({'Expenditures': ['mean']})


Out[24]:
        Expenditures
                mean
Gender
Female  18129.606362
Male    18001.195171
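
A mean on its own does not show how many rows sit behind it or how skewed they are. The following is a minimal sketch, on made-up toy data rather than funding.csv, of reporting the group size and median next to the mean. The `toy` frame and its values are assumptions for illustration only; the column names simply mirror the ones used in this notebook.

```python
import pandas as pd

# Toy data, made up for illustration only; not taken from funding.csv.
toy = pd.DataFrame({
    "Gender": ["Female", "Female", "Male", "Male", "Male"],
    "Expenditures": [100, 300, 150, 250, 200],
})

# Passing a list of aggregation names returns one column per statistic.
summary = toy.groupby("Gender")["Expenditures"].agg(["count", "mean", "median"])
print(summary)
```

The same call should work on `df` in place of `toy`, since only the column names are assumed.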

Discrimination by Ethnicity

Analyze the dataset and determine whether discrimination exists between the Hispanic and White not Hispanic groups by examining Expenditures. Feel free to use the dataframes defined in the cell below.


In [21]:
w = "White not Hispanic"
h = "Hispanic"
is_hispanic = df['Ethnicity'] == h
is_white = df['Ethnicity'] == w
df1 = df[is_hispanic | is_white] # filters by two ethnicity groups
dfh = df[is_hispanic]
dfw = df[is_white]
df1.head(5)


Out[21]:
      ID    Age  Gender  Expenditures           Ethnicity
0  10210  13-17  Female          2113  White not Hispanic
1  10409  22-50    Male         41924  White not Hispanic
2  10486  0 - 5    Male          1454            Hispanic
3  10538  18-21  Female          6400            Hispanic
4  10568  13-17    Male          4412  White not Hispanic

In [38]:
df1.groupby(['Ethnicity', 'Age']).agg({'Expenditures': ['mean']})


Out[38]:
                          Expenditures
                                  mean
Ethnicity          Age
Hispanic           0 - 5   1393.204545
                   51 +   55585.000000
                   13-17   3955.281553
                   18-21   9959.846154
                   22-50  40924.116279
                   6-12    2312.186813
White not Hispanic 0 - 5   1366.900000
                   51 +   52670.424242
                   13-17   3904.358209
                   18-21  10133.057971
                   22-50  40187.624060
                   6-12    2052.260870

In [ ]:
# Write your query below and set `df_answer` to the dataframe
df_answer = None
print(df_answer)

After analyzing this dataset, was there discrimination in the expenditures across different ethnicities?


In [ ]:
# Write answer below by setting discrimination to True or False
discrimination = None

Clue

Pandas supports grouping by multiple columns: https://stackoverflow.com/questions/17679089/pandas-dataframe-groupby-two-columns-and-get-counts
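
As a rough illustration of the clue, here is a minimal sketch of grouping by two columns at once. The `toy` frame, its column names (`team`, `level`, `pay`), and its values are hypothetical and unrelated to funding.csv.

```python
import pandas as pd

# Toy data, made up for illustration only; not taken from funding.csv.
toy = pd.DataFrame({
    "team": ["a", "a", "a", "b", "b", "b"],
    "level": ["junior", "senior", "senior", "junior", "junior", "senior"],
    "pay": [10, 30, 32, 11, 9, 31],
})

# A list of column names groups by every observed combination of their values.
counts = toy.groupby(["team", "level"]).size()
means = toy.groupby(["team", "level"])["pay"].mean()

# unstack() moves one index level into columns so groups line up side by side.
side_by_side = means.unstack("team")
print(counts)
print(side_by_side)
```

The result of a two-column groupby carries a two-level index, which is why `unstack` is handy for turning it into a table that is easier to compare row by row.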

If this clue changes your answer, try again below. Otherwise, if you are confident in your answer above, leave the following untouched.


In [ ]:
df_answer_clue = None
print(df_answer_clue)

In [ ]:
discrimination_clue = None