GP04: Analyzing Thanksgiving Dinner

1: Introducing Thanksgiving Dinner Data


In [1]:
import pandas as pd

data = pd.read_csv("../data/GP04/thanksgiving.csv", encoding="Latin-1")
data.head()


Out[1]:
RespondentID Do you celebrate Thanksgiving? What is typically the main dish at your Thanksgiving dinner? What is typically the main dish at your Thanksgiving dinner? - Other (please specify) How is the main dish typically cooked? How is the main dish typically cooked? - Other (please specify) What kind of stuffing/dressing do you typically have? What kind of stuffing/dressing do you typically have? - Other (please specify) What type of cranberry saucedo you typically have? What type of cranberry saucedo you typically have? - Other (please specify) ... Have you ever tried to meet up with hometown friends on Thanksgiving night? Have you ever attended a "Friendsgiving?" Will you shop any Black Friday sales on Thanksgiving Day? Do you work in retail? Will you employer make you work on Black Friday? How would you describe where you live? Age What is your gender? How much total combined money did all members of your HOUSEHOLD earn last year? US Region
0 4337954960 Yes Turkey NaN Baked NaN Bread-based NaN None NaN ... Yes No No No NaN Suburban 18 - 29 Male $75,000 to $99,999 Middle Atlantic
1 4337951949 Yes Turkey NaN Baked NaN Bread-based NaN Other (please specify) Homemade cranberry gelatin ring ... No No Yes No NaN Rural 18 - 29 Female $50,000 to $74,999 East South Central
2 4337935621 Yes Turkey NaN Roasted NaN Rice-based NaN Homemade NaN ... Yes Yes Yes No NaN Suburban 18 - 29 Male $0 to $9,999 Mountain
3 4337933040 Yes Turkey NaN Baked NaN Bread-based NaN Homemade NaN ... Yes No No No NaN Urban 30 - 44 Male $200,000 and up Pacific
4 4337931983 Yes Tofurkey NaN Baked NaN Bread-based NaN Canned NaN ... Yes No No No NaN Urban 30 - 44 Male $100,000 to $124,999 Pacific

5 rows × 65 columns

2: Filtering Out Rows From A DataFrame


In [2]:
data["Do you celebrate Thanksgiving?"].value_counts()


Out[2]:
Yes    980
No      78
Name: Do you celebrate Thanksgiving?, dtype: int64
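
Before filtering, it can help to check for missing answers and to see the split as shares rather than counts (in the output above, 980 of 1,058 respondents, roughly 93%, answered "Yes"). The following is a minimal sketch on a small made-up frame; the `toy` data is hypothetical and not taken from the survey.

```python
import pandas as pd

# Hypothetical toy answers; the real survey column has 980 "Yes" and 78 "No".
toy = pd.DataFrame(
    {"Do you celebrate Thanksgiving?": ["Yes", "Yes", "No", None, "Yes"]}
)
answers = toy["Do you celebrate Thanksgiving?"]

# dropna=False keeps missing answers as their own row in the counts.
counts = answers.value_counts(dropna=False)

# normalize=True returns shares of the non-missing answers instead of counts.
shares = answers.value_counts(normalize=True)

print(counts)
print(shares)
```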

In [3]:
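# Keep only respondents who celebrate Thanksgiving. The .copy() makes the result
# an independent frame, so adding columns later does not raise SettingWithCopyWarning.
data = data[data["Do you celebrate Thanksgiving?"] == "Yes"].copy()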

3: Using value_counts To Explore Main Dishes


In [4]:
data["What is typically the main dish at your Thanksgiving dinner?"].value_counts()


Out[4]:
Turkey                    859
Other (please specify)     35
Ham/Pork                   29
Tofurkey                   20
Chicken                    12
Roast beef                 11
I don't know                5
Turducken                   3
Name: What is typically the main dish at your Thanksgiving dinner?, dtype: int64

In [5]:
data[data["What is typically the main dish at your Thanksgiving dinner?"] == "Tofurkey"]["Do you typically have gravy?"]


Out[5]:
4      Yes
33     Yes
69      No
72      No
77     Yes
145    Yes
175    Yes
218     No
243    Yes
275     No
393    Yes
399    Yes
571    Yes
594    Yes
628     No
774     No
820     No
837    Yes
860     No
953    Yes
Name: Do you typically have gravy?, dtype: object
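
The raw list above (12 "Yes" and 8 "No" among the 20 Tofurkey rows) is easier to read as a count table. One possible approach, sketched here on hypothetical toy rows with made-up short column names (`main_dish`, `gravy`), is `pd.crosstab`, which tabulates every main dish against the gravy answer at once.

```python
import pandas as pd

# Hypothetical toy rows standing in for the survey data.
toy = pd.DataFrame({
    "main_dish": ["Turkey", "Tofurkey", "Tofurkey", "Turkey", "Tofurkey"],
    "gravy": ["Yes", "Yes", "No", "Yes", "Yes"],
})

# Rows are main dishes, columns are gravy answers, cells are respondent counts.
table = pd.crosstab(toy["main_dish"], toy["gravy"])
print(table)
```

On the real data, the two arguments would be the full survey columns used in In [5].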

4: Figuring Out What Pies People Eat


In [6]:
data["Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Apple"].value_counts()


Out[6]:
Apple    514
Name: Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Apple, dtype: int64

In [7]:
# True where a respondent selected none of apple, pecan, or pumpkin pie,
# so this mask flags the people who did NOT have any of those three pies.
no_pies = (
    pd.isnull(data["Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Apple"])
    & pd.isnull(data["Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pecan"])
    & pd.isnull(data["Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pumpkin"])
)

no_pies.value_counts()


Out[7]:
False    876
True     104
dtype: int64
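
In the output above, True (104) counts respondents who selected none of the three pies, and False (876) counts those who selected at least one. A minimal sketch of the same logic on a hypothetical toy frame, with shortened column names assumed for illustration, builds the mask with `isnull().all(axis=1)` and inverts it to flag pie eaters directly.

```python
import pandas as pd

# Hypothetical toy frame: each pie column holds the pie name if selected, else missing.
toy = pd.DataFrame({
    "Apple": ["Apple", None, None, "Apple"],
    "Pecan": [None, None, "Pecan", "Pecan"],
    "Pumpkin": [None, None, None, "Pumpkin"],
})

# True where every pie column is missing, i.e. none of the three was selected.
no_pie = toy[["Apple", "Pecan", "Pumpkin"]].isnull().all(axis=1)

# Invert the mask to flag respondents who selected at least one pie.
any_pie = ~no_pie

print(no_pie.sum(), any_pie.sum())
```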

5: Converting Age To Numeric


In [8]:
data["Age"].value_counts()


Out[8]:
45 - 59    269
60+        258
30 - 44    235
18 - 29    185
Name: Age, dtype: int64

In [9]:
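def extract_age(age_str):
    """Return the lower bound of an age bracket such as "18 - 29" or "60+"."""
    if pd.isnull(age_str):
        return None
    # Keep the text before the first space, e.g. "18" or "60+".
    age_str = age_str.split(" ")[0]
    age_str = age_str.replace("+", "")
    return int(age_str)

# int_age holds each bracket's lower bound, so its statistics understate actual ages.
data["int_age"] = data["Age"].apply(extract_age)
data["int_age"].describe()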


Out[9]:
count    947.000000
mean      40.089757
std       15.352014
min       18.000000
25%       30.000000
50%       45.000000
75%       60.000000
max       60.000000
Name: int_age, dtype: float64
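
The same conversion can also be written without a Python-level function. The sketch below, run on hypothetical toy values in the survey's bracket format, uses `Series.str.extract` to capture the first run of digits and `pd.to_numeric` to convert it. It is an alternative formulation, not the method used in this notebook.

```python
import pandas as pd

# Hypothetical toy values in the same format as the survey's Age column.
ages = pd.Series(["18 - 29", "60+", None, "45 - 59"])

# Capture the first run of digits (the bracket's lower bound); missing stays NaN.
lower_bound = pd.to_numeric(ages.str.extract(r"(\d+)", expand=False))

print(lower_bound.tolist())
```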

6: Converting Income To Numeric


In [10]:
data["How much total combined money did all members of your HOUSEHOLD earn last year?"].value_counts()


Out[10]:
$25,000 to $49,999      166
$50,000 to $74,999      127
$75,000 to $99,999      127
Prefer not to answer    118
$100,000 to $124,999    109
$200,000 and up          76
$10,000 to $24,999       60
$0 to $9,999             52
$125,000 to $149,999     48
$150,000 to $174,999     38
$175,000 to $199,999     26
Name: How much total combined money did all members of your HOUSEHOLD earn last year?, dtype: int64

In [11]:
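def extract_income(income_str):
    """Return the lower bound of an income bracket such as "$25,000 to $49,999"."""
    if pd.isnull(income_str):
        return None
    # Keep the text before the first space, e.g. "$25,000" or "Prefer".
    income_str = income_str.split(" ")[0]
    # "Prefer not to answer" carries no amount, so treat it as missing.
    if income_str == "Prefer":
        return None
    income_str = income_str.replace(",", "")
    income_str = income_str.replace("$", "")
    return int(income_str)

# int_income holds each bracket's lower bound, not the actual household income.
data["int_income"] = data["How much total combined money did all members of your HOUSEHOLD earn last year?"].apply(extract_income)
data["int_income"].describe()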


Out[11]:
count       829.000000
mean      75965.018094
std       59068.636748
min           0.000000
25%       25000.000000
50%       75000.000000
75%      100000.000000
max      200000.000000
Name: int_income, dtype: float64

7: Correlating Travel Distance And Income


In [12]:
data[data["int_income"] < 50000]["How far will you travel for Thanksgiving?"].value_counts()


Out[12]:
Thanksgiving is happening at my home--I won't travel at all                         106
Thanksgiving is local--it will take place in the town I live in                      92
Thanksgiving is out of town but not too far--it's a drive of a few hours or less     64
Thanksgiving is out of town and far away--I have to drive several hours or fly       16
Name: How far will you travel for Thanksgiving?, dtype: int64

In [13]:
data[data["int_income"] > 150000]["How far will you travel for Thanksgiving?"].value_counts()


Out[13]:
Thanksgiving is happening at my home--I won't travel at all                         49
Thanksgiving is local--it will take place in the town I live in                     25
Thanksgiving is out of town but not too far--it's a drive of a few hours or less    16
Thanksgiving is out of town and far away--I have to drive several hours or fly      12
Name: How far will you travel for Thanksgiving?, dtype: int64
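
The two groups differ in size (278 respondents below $50,000 versus 102 above $150,000), so raw counts are hard to compare. As shares, about 38% of the lower-income group and 48% of the higher-income group stay home, and about 6% versus 12% travel far. Because `int_income` stores each bracket's lower bound, the `> 150000` filter also leaves out the $150,000 to $174,999 bracket. A minimal sketch of the share comparison on hypothetical toy data (the `travel` column name and its values are made up for illustration):

```python
import pandas as pd

# Hypothetical toy rows; int_income mimics the lower-bound values used above.
toy = pd.DataFrame({
    "int_income": [10000, 25000, 25000, 0, 175000, 200000, 200000, 175000],
    "travel": ["home", "home", "local", "far", "home", "home", "home", "far"],
})

# normalize=True turns counts into within-group shares.
low = toy.loc[toy["int_income"] < 50000, "travel"].value_counts(normalize=True)
high = toy.loc[toy["int_income"] > 150000, "travel"].value_counts(normalize=True)

# Align both groups on the same answers; an answer absent from a group gets 0.
comparison = pd.DataFrame({"under_50k": low, "over_150k": high}).fillna(0)
print(comparison)
```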

8: Linking Friendship And Age


In [14]:
data.pivot_table(
    index="Have you ever tried to meet up with hometown friends on Thanksgiving night?", 
    columns='Have you ever attended a "Friendsgiving?"',
    values="int_age"
)


Out[14]:
Have you ever attended a "Friendsgiving?"                                           No        Yes
Have you ever tried to meet up with hometown friends on Thanksgiving night?
No                                                                           42.283702  37.010526
Yes                                                                          41.475410  33.976744

In [15]:
data.pivot_table(
    index="Have you ever tried to meet up with hometown friends on Thanksgiving night?", 
    columns='Have you ever attended a "Friendsgiving?"',
    values="int_income"
)


Out[15]:
Have you ever attended a "Friendsgiving?"                                              No           Yes
Have you ever tried to meet up with hometown friends on Thanksgiving night?
No                                                                           78914.549654  72894.736842
Yes                                                                          78750.000000  66019.736842
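
`pivot_table` aggregates with the mean by default, so each cell above is a group average of the lower-bound values. Showing the group sizes next to the means makes it easier to judge how much weight each average deserves. The sketch below passes an explicit `aggfunc` list on hypothetical toy data; the short column names (`met_friends`, `friendsgiving`) are assumptions for illustration.

```python
import pandas as pd

# Hypothetical toy rows standing in for the two yes/no questions and int_age.
toy = pd.DataFrame({
    "met_friends": ["Yes", "Yes", "No", "No", "No"],
    "friendsgiving": ["Yes", "No", "Yes", "No", "No"],
    "int_age": [18, 45, 30, 60, 45],
})

# A list of aggregations yields one block of columns per function.
summary = toy.pivot_table(
    index="met_friends",
    columns="friendsgiving",
    values="int_age",
    aggfunc=["mean", "count"],
)
print(summary)
```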
