Funding

Instructions / Notes:

Read these carefully

Read and execute each cell in order, without skipping forward.
You may create new Jupyter notebook cells for testing, debugging, exploring, etc. (this is encouraged!). Just make sure that your final answer dataframes and answers are assigned to the variables defined below.
Have fun!

This dataset shows how much funding the state gave to individuals and tracks individuals' age, gender, and ethnicity.



In [1]:

    
# Run the following to import the necessary packages and load the dataset
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
matplotlib.style.use('ggplot')
datafile = "dataset/funding.csv"
df = pd.read_csv(datafile)
df.drop('Dummy', axis=1, inplace=True)   # drop the 'Dummy' column
df.head(n=5)   # show the first n rows of the dataset









    Out[1]:

   ID     Age    Gender  Expenditures  Ethnicity
0  10210  13-17  Female  2113          White not Hispanic
1  10409  22-50  Male    41924         White not Hispanic
2  10486  0 - 5  Male    1454          Hispanic
3  10538  18-21  Female  6400          Hispanic
4  10568  13-17  Male    4412          White not Hispanic



In [9]:

    
ls = df['Age'].tolist()   # exploration: the Age bracket label of every row, as a list



In [20]:

    
df.describe(include='all')   # summary statistics for every column, numeric and non-numeric









    Out[20]:

        ID            Age    Gender  Expenditures  Ethnicity
count   1000.000000   1000   1000    1000.000000   1000
unique  NaN           6      2       NaN           8
top     NaN           22-50  Female  NaN           White not Hispanic
freq    NaN           226    503     NaN           401
mean    54662.846000  NaN    NaN     18065.786000  NaN
std     25643.673401  NaN    NaN     19542.830884  NaN
min     10210.000000  NaN    NaN     222.000000    NaN
25%     31808.750000  NaN    NaN     2898.750000   NaN
50%     55384.500000  NaN    NaN     7026.000000   NaN
75%     76134.750000  NaN    NaN     37712.750000  NaN
max     99898.000000  NaN    NaN     75098.000000  NaN

The dataset above shows how much funding (the 'Expenditures' column) the state gave to individuals for training purposes, along with each individual's age, gender, and ethnicity.



In [24]:

    
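# Example query: mean Expenditures by gender. The two means are nearly identical,
# so this shows no evidence of discrimination by gender.
# ('mean' as a string avoids the FutureWarning newer pandas raises for np.mean in agg.)
df.groupby(['Gender'], sort=True).agg({'Expenditures': ['mean']})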









    Out[24]:

        Expenditures
        mean
Gender
Female  18129.606362
Male    18001.195171
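To put the gender gap in Out[24] in perspective, the short calculation below uses the two means shown there (copied by hand) and expresses their difference relative to the male mean. It is only a quick sanity check on the size of the gap, not a statistical test.

```python
# Group means copied from Out[24] above.
female_mean = 18129.606362
male_mean = 18001.195171

gap = female_mean - male_mean      # absolute gap in mean Expenditures
relative_gap = gap / male_mean     # gap as a fraction of the male mean
print(f"gap = {gap:.2f}, relative gap = {relative_gap:.2%}")
```

The gap is about 128 in absolute terms, roughly 0.7% of the male mean.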

Discrimination by Ethnicity

Analyze the dataset and determine whether discrimination exists between the Hispanic and White not Hispanic groups by examining Expenditures. Feel free to use the dataframes defined in the cell below.
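A minimal sketch of the basic comparison this task starts from. It uses a small hypothetical DataFrame named `toy` with made-up values (not the funding data). It keeps two groups with `Series.isin`, which is equivalent to combining two boolean masks with `|`, and then compares their mean Expenditures with a single-column `groupby`.

```python
import pandas as pd

# Hypothetical toy data (NOT the funding dataset): two groups to compare,
# plus an "Other" group that should be filtered out.
toy = pd.DataFrame({
    "Ethnicity": ["A", "A", "B", "B", "B", "Other"],
    "Expenditures": [100, 300, 200, 400, 600, 9999],
})

# Keep only the two groups being compared (same as mask_a | mask_b).
pair = toy[toy["Ethnicity"].isin(["A", "B"])]

# Mean Expenditures per group.
group_means = pair.groupby("Ethnicity")["Expenditures"].mean()
print(group_means)
```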



In [21]:

    
w = "White not Hispanic"
h = "Hispanic"
is_hispanic = df['Ethnicity'] == h
is_white = df['Ethnicity'] == w
df1 = df[is_hispanic | is_white]   # rows from either of the two ethnicity groups
dfh = df[is_hispanic]              # Hispanic rows only
dfw = df[is_white]                 # White not Hispanic rows only
df1.head(5)









    Out[21]:

   ID     Age    Gender  Expenditures  Ethnicity
0  10210  13-17  Female  2113          White not Hispanic
1  10409  22-50  Male    41924         White not Hispanic
2  10486  0 - 5  Male    1454          Hispanic
3  10538  18-21  Female  6400          Hispanic
4  10568  13-17  Male    4412          White not Hispanic



In [38]:

    
df1.groupby(['Ethnicity', 'Age']).agg({'Expenditures': ['mean']})









    Out[38]:

                           Expenditures
                           mean
Ethnicity           Age
Hispanic            0 - 5  1393.204545
                    51 +   55585.000000
                    13-17  3955.281553
                    18-21  9959.846154
                    22-50  40924.116279
                    6-12   2312.186813
White not Hispanic  0 - 5  1366.900000
                    51 +   52670.424242
                    13-17  3904.358209
                    18-21  10133.057971
                    22-50  40187.624060
                    6-12   2052.260870



In [ ]:

    
# Write your query below and set `df_answer` to the resulting dataframe
df_answer = None
print(df_answer)

After analyzing this dataset, was there discrimination in the expenditures across different ethnicities?



In [ ]:

    
# Write your answer below by setting `discrimination` to True or False
discrimination = None

Clue

Pandas supports grouping by multiple columns: https://stackoverflow.com/questions/17679089/pandas-dataframe-groupby-two-columns-and-get-counts
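A minimal sketch of grouping by two columns, in the spirit of the linked Stack Overflow question. It uses a small hypothetical DataFrame named `toy` with made-up values (not the funding data). `size()` counts the rows in each (Ethnicity, Age) combination, and taking the `mean()` of Expenditures over the same two keys, followed by `unstack`, places the groups side by side within each age bracket.

```python
import pandas as pd

# Hypothetical toy data (NOT the funding dataset) with two grouping columns.
toy = pd.DataFrame({
    "Ethnicity": ["A", "A", "A", "B", "B", "B"],
    "Age": ["0-5", "0-5", "22-50", "0-5", "22-50", "22-50"],
    "Expenditures": [1000, 1200, 40000, 1100, 41000, 39000],
})

# Row count per (Ethnicity, Age) pair; unstack() moves Age into the columns.
counts = toy.groupby(["Ethnicity", "Age"]).size().unstack(fill_value=0)

# Mean Expenditures per (Ethnicity, Age) pair, with one column per Ethnicity
# so the groups can be compared within each age bracket.
means = toy.groupby(["Ethnicity", "Age"])["Expenditures"].mean().unstack("Ethnicity")

# Overall mean per Ethnicity, ignoring Age, for comparison.
overall = toy.groupby("Ethnicity")["Expenditures"].mean()

print(counts)
print(means)
print(overall)
```

In this toy data, the per-bracket means are identical for both groups (1100 and 40000). The overall means still differ (about 14067 vs 27033) because group A has more rows in the low-spending bracket. This is why comparing groups within each bracket can tell a different story from comparing single overall means.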

If this clue changes your answer, try again below. Otherwise, if you are confident in your answer above, leave the following untouched.



In [ ]:

    
df_answer_clue = None
print(df_answer_clue)



In [ ]:

    
discrimination_clue = None
