You are currently looking at version 1.1 of this notebook. To download notebooks and datafiles, as well as get help on Jupyter notebooks in the Coursera platform, visit the Jupyter Notebook FAQ course resource.



In [1]:
import pandas as pd
import numpy as np
from scipy.stats import ttest_ind

Assignment 4 - Hypothesis Testing

This assignment requires more individual learning than previous assignments - you are encouraged to check out the pandas documentation to find functions or methods you might not have used yet, or ask questions on Stack Overflow and tag them as pandas and python related. And of course, the discussion forums are open for interaction with your peers and the course staff.

Definitions:

  • A quarter is a specific three-month period: Q1 is January through March, Q2 is April through June, Q3 is July through September, and Q4 is October through December.
  • A recession is defined as starting with two consecutive quarters of GDP decline, and ending with two consecutive quarters of GDP growth.
  • A recession bottom is the quarter within a recession which had the lowest GDP.
  • A university town is a city which has a high percentage of university students compared to the total population of the city.

Hypothesis: University towns have their mean housing prices less affected by recessions. Run a t-test to compare the ratios of the mean price of houses in university towns versus non-university towns, where each ratio is the mean price the quarter before the recession starts divided by the mean price at the recession bottom. (price_ratio=quarter_before_recession/recession_bottom)
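
A rough sketch of the comparison this hypothesis calls for (the lists below are hypothetical stand-ins for the per-town price ratios, not the graded implementation):

from scipy.stats import ttest_ind

# price_ratio = mean price the quarter before the recession / mean price at the recession bottom
univ_ratios = [0.95, 1.02, 0.98]        # hypothetical ratios for university towns
non_univ_ratios = [1.08, 1.12, 1.04]    # hypothetical ratios for non-university towns

statistic, pvalue = ttest_ind(univ_ratios, non_univ_ratios)
# A small p-value suggests the mean ratios of the two groups differ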

The following data files are available for this assignment:

  • From the Zillow research data site there is housing data for the United States. In particular the datafile for all homes at a city level, City_Zhvi_AllHomes.csv, has median home sale prices at a fine-grained level.
  • From the Wikipedia page on college towns is a list of university towns in the United States which has been copy and pasted into the file university_towns.txt.
  • From the Bureau of Economic Analysis, US Department of Commerce, the GDP over time of the United States in current dollars (use the chained value in 2009 dollars), in quarterly intervals, in the file gdplev.xls. For this assignment, only look at GDP data from the first quarter of 2000 onward.

Each function in this assignment below is worth 10%, with the exception of run_ttest(), which is worth 50%.


In [2]:
# Use this dictionary to map two letter state acronyms to full state names
states = {'OH': 'Ohio', 'KY': 'Kentucky', 'AS': 'American Samoa', 'NV': 'Nevada', 'WY': 'Wyoming', 'NA': 'National', 'AL': 'Alabama', 'MD': 'Maryland', 'AK': 'Alaska', 'UT': 'Utah', 'OR': 'Oregon', 'MT': 'Montana', 'IL': 'Illinois', 'TN': 'Tennessee', 'DC': 'District of Columbia', 'VT': 'Vermont', 'ID': 'Idaho', 'AR': 'Arkansas', 'ME': 'Maine', 'WA': 'Washington', 'HI': 'Hawaii', 'WI': 'Wisconsin', 'MI': 'Michigan', 'IN': 'Indiana', 'NJ': 'New Jersey', 'AZ': 'Arizona', 'GU': 'Guam', 'MS': 'Mississippi', 'PR': 'Puerto Rico', 'NC': 'North Carolina', 'TX': 'Texas', 'SD': 'South Dakota', 'MP': 'Northern Mariana Islands', 'IA': 'Iowa', 'MO': 'Missouri', 'CT': 'Connecticut', 'WV': 'West Virginia', 'SC': 'South Carolina', 'LA': 'Louisiana', 'KS': 'Kansas', 'NY': 'New York', 'NE': 'Nebraska', 'OK': 'Oklahoma', 'FL': 'Florida', 'CA': 'California', 'CO': 'Colorado', 'PA': 'Pennsylvania', 'DE': 'Delaware', 'NM': 'New Mexico', 'RI': 'Rhode Island', 'MN': 'Minnesota', 'VI': 'Virgin Islands', 'NH': 'New Hampshire', 'MA': 'Massachusetts', 'GA': 'Georgia', 'ND': 'North Dakota', 'VA': 'Virginia'}

In [3]:
df_all_homes = pd.read_csv("City_Zhvi_AllHomes.csv")

In [4]:
df_all_homes.head()


Out[4]:
RegionID RegionName State Metro CountyName SizeRank 1996-04 1996-05 1996-06 1996-07 ... 2015-11 2015-12 2016-01 2016-02 2016-03 2016-04 2016-05 2016-06 2016-07 2016-08
0 6181 New York NY New York Queens 1 NaN NaN NaN NaN ... 573600 576200 578400 582200 588000 592200 592500 590200 588000 586400
1 12447 Los Angeles CA Los Angeles-Long Beach-Anaheim Los Angeles 2 155000.0 154600.0 154400.0 154200.0 ... 558200 560800 562800 565600 569700 574000 577800 580600 583000 585100
2 17426 Chicago IL Chicago Cook 3 109700.0 109400.0 109300.0 109300.0 ... 207800 206900 206200 205800 206200 207300 208200 209100 211000 213000
3 13271 Philadelphia PA Philadelphia Philadelphia 4 50000.0 49900.0 49600.0 49400.0 ... 122300 121600 121800 123300 125200 126400 127000 127400 128300 129100
4 40326 Phoenix AZ Phoenix Maricopa 5 87200.0 87700.0 88200.0 88400.0 ... 183800 185300 186600 188000 189100 190200 191300 192800 194500 195900

5 rows × 251 columns


In [5]:
def get_list_of_university_towns():
    '''Returns a DataFrame of towns and the states they are in from the 
    university_towns.txt list. The format of the DataFrame should be:
    DataFrame( [ ["Michigan", "Ann Arbor"], ["Michigan", "Yipsilanti"] ], 
    columns=["State", "RegionName"]  )
    
    The following cleaning needs to be done:

    1. For "State", removing characters from "[" to the end.
    2. For "RegionName", when applicable, removing every character from " (" to the end.
    3. Depending on how you read the data, you may need to remove newline character '\n'. '''
    
    with open("university_towns.txt", "r") as towns:
        lines = towns.read().split("\n")
    
    # Lines containing the string "[edit]" are state headers.
    state_indicator = "[edit]"
    state = ""
    states_regions = []
    
    for element in lines:
        if state_indicator in element:
            # State
            state = element.split("[")[0].rstrip()
        else:
            # Region
            states_regions.append({"State" : state, "RegionName" : element.split("(")[0].rstrip()})
            
    df = pd.DataFrame.from_dict(states_regions)
    # Drop the last row: the trailing newline produces an entry with a State but an empty RegionName
    df = df.drop(df.index[len(df)-1])
    
    # Move State to be the first column; the automatic grader expects
    # this column order
    cols = df.columns.tolist()
    cols = cols[-1:] + cols[:-1]
    df = df[cols]
    
    return df

get_list_of_university_towns()


Out[5]:
State RegionName
0 Alabama Auburn
1 Alabama Florence
2 Alabama Jacksonville
3 Alabama Livingston
4 Alabama Montevallo
5 Alabama Troy
6 Alabama Tuscaloosa
7 Alabama Tuskegee
8 Alaska Fairbanks
9 Arizona Flagstaff
10 Arizona Tempe
11 Arizona Tucson
12 Arkansas Arkadelphia
13 Arkansas Conway
14 Arkansas Fayetteville
15 Arkansas Jonesboro
16 Arkansas Magnolia
17 Arkansas Monticello
18 Arkansas Russellville
19 Arkansas Searcy
20 California Angwin
21 California Arcata
22 California Berkeley
23 California Chico
24 California Claremont
25 California Cotati
26 California Davis
27 California Irvine
28 California Isla Vista
29 California University Park, Los Angeles
... ... ...
487 Virginia Wise
488 Virginia Chesapeake
489 Washington Bellingham
490 Washington Cheney
491 Washington Ellensburg
492 Washington Pullman
493 Washington University District, Seattle
494 West Virginia Athens
495 West Virginia Buckhannon
496 West Virginia Fairmont
497 West Virginia Glenville
498 West Virginia Huntington
499 West Virginia Montgomery
500 West Virginia Morgantown
501 West Virginia Shepherdstown
502 West Virginia West Liberty
503 Wisconsin Appleton
504 Wisconsin Eau Claire
505 Wisconsin Green Bay
506 Wisconsin La Crosse
507 Wisconsin Madison
508 Wisconsin Menomonie
509 Wisconsin Milwaukee
510 Wisconsin Oshkosh
511 Wisconsin Platteville
512 Wisconsin River Falls
513 Wisconsin Stevens Point
514 Wisconsin Waukesha
515 Wisconsin Whitewater
516 Wyoming Laramie

517 rows × 2 columns
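
To illustrate the cleaning rules described in get_list_of_university_towns() on a couple of hypothetical raw lines (state headers end with "[edit]", region lines may carry a parenthetical and references):

state_line = "Alabama[edit]"                     # hypothetical state header line
region_line = "Auburn (Auburn University)[1]"    # hypothetical region line

state_line.split("[")[0].rstrip()     # 'Alabama'
region_line.split("(")[0].rstrip()    # 'Auburn'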


In [6]:
# Get GDP info: drop unnecessary header rows
df_gdp = pd.read_excel("gdplev.xls", header=6)
# Keep only the quarterly columns: quarter label, GDP in current dollars, GDP in chained 2009 dollars
df_gdp = df_gdp.iloc[:, 4:7]
df_gdp.columns = ["Quarter", "GDP current", "GDP chained"]
df_gdp.head()


Out[6]:
Quarter GDP current GDP chained
0 1947q1 243.1 1934.5
1 1947q2 246.3 1932.3
2 1947q3 250.1 1930.3
3 1947q4 260.3 1960.7
4 1948q1 266.2 1989.5

In [7]:
# Find GDP changes between quarters
df_gdp["Diff"] = df_gdp["GDP chained"].diff()
df_gdp.head()


Out[7]:
Quarter GDP current GDP chained Diff
0 1947q1 243.1 1934.5 NaN
1 1947q2 246.3 1932.3 -2.2
2 1947q3 250.1 1930.3 -2.0
3 1947q4 260.3 1960.7 30.4
4 1948q1 266.2 1989.5 28.8

In [8]:
# Mark increase (1) or decrease (0) in GDP
df_gdp["Change"] = np.where(df_gdp["Diff"] < 0, "0", "1")
# Now drop years before 2000 since "For this assignment, only look at GDP data from the first quarter of 2000 onward"
df_gdp = df_gdp[df_gdp.Quarter >= "2000q1"]
df_gdp.head()


Out[8]:
Quarter GDP current GDP chained Diff Change
212 2000q1 10031.0 12359.1 35.8 1
213 2000q2 10278.3 12592.5 233.4 1
214 2000q3 10357.4 12607.7 15.2 1
215 2000q4 10472.3 12679.3 71.6 1
216 2001q1 10508.1 12643.3 -36.0 0

In [9]:
# Find 2 consecutive declines in GDP. Encode the increase/decline flags as a single
# string so it's easy to search for consecutive declines/increases
change_str = "".join(df_gdp["Change"].values)
change_str
change_str


Out[9]:
'111101011111111111111111111111110100001111110111111111110111111111'

In [10]:
# Find the recession: two consecutive declines followed by two consecutive increases ('0011')
recession_index = change_str.index("0011")
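
To see why searching the change string for "0011" locates the recession, here is the same trick on a made-up toy string (not the assignment data):

toy = "1110001111"                 # growth, three quarters of decline, then growth
i = toy.index("0011")              # 4: last two declines followed by two quarters of growth
start = toy.rindex("1", 0, i) + 1  # 3: first quarter of the run of declines
end = i + len("0011") - 1          # 7: second consecutive quarter of growth
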

In [11]:
def get_recession_start():
    '''Returns the year and quarter of the recession start time as a 
    string value in a format such as 2005q3'''
        
    # Recession start: the quarter right after the last '1' preceding the '0011' pattern,
    # i.e. the first quarter of the run of consecutive GDP declines
    start_recession_index = change_str.rindex("1", 0, recession_index) + 1
    return df_gdp.iloc[start_recession_index]["Quarter"]

get_recession_start()


Out[11]:
'2008q3'

In [12]:
def get_recession_end():
    '''Returns the year and quarter of the recession end time as a 
    string value in a format such as 2005q3'''
    
    # Recession end: the last position of the '0011' pattern, i.e. the second
    # consecutive quarter of GDP growth
    end_recession_index = recession_index + len("0011") - 1
    return df_gdp.iloc[end_recession_index]["Quarter"]

get_recession_end()


Out[12]:
'2009q4'

In [13]:
def get_recession_bottom():
    '''Returns the year and quarter of the recession bottom time as a 
    string value in a format such as 2005q3'''
    
    recession_start = get_recession_start()
    recession_end = get_recession_end()
    
    df_recession = df_gdp[ (df_gdp.Quarter >= recession_start) & (df_gdp.Quarter <= recession_end) ]
    return df_recession.loc[df_recession["GDP chained"].idxmin()]["Quarter"]

get_recession_bottom()


Out[13]:
'2009q2'

In [14]:
def convert_housing_data_to_quarters():
    '''Converts the housing data to quarters and returns it as mean 
    values in a dataframe. This dataframe should be a dataframe with
    columns for 2000q1 through 2016q3, and should have a multi-index
    in the shape of ["State","RegionName"].
    
    Note: Quarters are defined in the assignment description, they are
    not arbitrary three month periods.
    
    The resulting dataframe should have 67 columns, and 10,730 rows.
    '''
    # A quarter is a specific three month period, 
    # - Q1 is January through March (1-3)
    # - Q2 is April through June (4-6)
    # - Q3 is July through September (7-9)
    # - Q4 is October through December. (10-12)
    
    # Drop columns for years not under inspection (year < 2000)
    cols = [c for c in df_all_homes.columns if c.lower()[:3] != '199']
    # Take an explicit copy so adding quarter columns below doesn't raise SettingWithCopyWarning
    df_homes = df_all_homes[cols].copy()
    
    # Calculate mean for each quarter between 2000-2016
    quarters = [ ["01", "02", "03"], ["04", "05", "06"], ["07", "08", "09"], ["10", "11", "12"] ]
    
    for year in range(2000,2016):
        for q in range(0, 4):
            quarter_columns = ["{}-{}".format(year,quarters[q][0]), "{}-{}".format(year,quarters[q][1]), "{}-{}".format(year,quarters[q][2])]
            df_homes["{}q{}".format(year, q+1)] = df_homes[quarter_columns].mean(axis=1)
            
    # 2016 is not a full year (data run through 2016-08), so compute its quarters separately
    df_homes["2016q1"] = df_homes[["2016-01", "2016-02", "2016-03"]].mean(axis=1)
    df_homes["2016q2"] = df_homes[["2016-04", "2016-05", "2016-06"]].mean(axis=1)
    df_homes["2016q3"] = df_homes[["2016-07", "2016-08"]].mean(axis=1)
    
    # Drop the monthly 'YYYY-MM' columns, keeping only the quarter columns and metadata
    cols = [c for c in df_homes.columns if c.lower()[4] != '-']
    df_homes = df_homes[cols]
    
    # Drop unnecessary columns
    df_homes = df_homes.drop( ["RegionID", "Metro", "CountyName", "SizeRank"], axis=1 )
    
    # Convert two letter state acronyms to full state names
    df_homes["State"] = df_homes["State"].map(states)
    
    # Create indexes
    df_homes_indexed = df_homes.set_index(["State", "RegionName"])
    
    return df_homes_indexed

df_houses_quarters = convert_housing_data_to_quarters()
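
An alternative sketch of the same quarter-averaging step, assuming the installed pandas still supports column-wise groupby (axis=1) and that price_cols below captures exactly the monthly 'YYYY-MM' columns from 2000 onward:

# Hypothetical alternative: average the monthly columns into calendar quarters in one step
price_cols = [c for c in df_all_homes.columns if c[:2] == '20']        # '2000-01' ... '2016-08'
quarter_index = pd.PeriodIndex(price_cols, freq='Q')                   # maps each month to its quarter
df_quarterly = df_all_homes[price_cols].groupby(quarter_index, axis=1).mean()
df_quarterly.columns = [str(q).lower() for q in df_quarterly.columns]  # e.g. '2000q1'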



In [15]:
# Calculate recession impact on house prices
recession_start = get_recession_start()
recession_end = get_recession_end()
    
print("Recession start {} and end {}".format(recession_start, recession_end))
    
# The line below caused the notebook to run out of memory, so use apply with a lambda instead.
#df_houses_quarters["Ratio"] =  df_houses_quarters[recession_start] / df_houses_quarters[recession_bottom]

df_houses_quarters["Ratio"] = df_houses_quarters.apply(lambda row: row[recession_start] / row[recession_end], axis=1)
df_houses_quarters.head()


Recession start 2008q3 and end 2009q4
Out[15]:
2000q1 2000q2 2000q3 2000q4 2001q1 2001q2 2001q3 2001q4 2002q1 2002q2 ... 2014q3 2014q4 2015q1 2015q2 2015q3 2015q4 2016q1 2016q2 2016q3 Ratio
State RegionName
New York New York NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 522800.000000 528066.666667 532266.666667 540800.000000 557200.000000 572833.333333 582866.666667 591633.333333 587200.0 1.090321
California Los Angeles 207066.666667 214466.666667 220966.666667 226166.666667 233000.000000 239100.000000 245066.666667 253033.333333 261966.666667 272700.000000 ... 509066.666667 518866.666667 528800.000000 538166.666667 547266.666667 557733.333333 566033.333333 577466.666667 584050.0 1.161171
Illinois Chicago 138400.000000 143633.333333 147866.666667 152133.333333 156933.333333 161800.000000 166400.000000 170433.333333 175500.000000 177566.666667 ... 195766.666667 201266.666667 201066.666667 206033.333333 208300.000000 207900.000000 206066.666667 208200.000000 212000.0 1.096063
Pennsylvania Philadelphia 53000.000000 53633.333333 54133.333333 54700.000000 55333.333333 55533.333333 56266.666667 57533.333333 59133.333333 60733.333333 ... 115300.000000 115666.666667 116200.000000 117966.666667 121233.333333 122200.000000 123433.333333 126933.333333 128700.0 0.986224
Arizona Phoenix 111833.333333 114366.666667 116000.000000 117400.000000 119600.000000 121566.666667 122700.000000 124300.000000 126533.333333 128366.666667 ... 165366.666667 168500.000000 171533.333333 174166.666667 179066.666667 183833.333333 187900.000000 191433.333333 195200.0 1.350604

5 rows × 68 columns


In [16]:
university_towns = get_list_of_university_towns()
university_towns = university_towns.set_index(["State", "RegionName"])
university_towns.head()


Out[16]:
State RegionName
Alabama Auburn
Florence
Jacksonville
Livingston
Montevallo

In [17]:
# Get all university towns
df_univ_towns = df_houses_quarters.loc[list(university_towns.index)]
df_univ_towns = df_univ_towns.dropna()
df_univ_towns.head()


Out[17]:
2000q1 2000q2 2000q3 2000q4 2001q1 2001q2 2001q3 2001q4 2002q1 2002q2 ... 2014q3 2014q4 2015q1 2015q2 2015q3 2015q4 2016q1 2016q2 2016q3 Ratio
Alabama Montevallo 97000.0 96800.000000 96533.333333 98333.333333 99466.666667 101333.333333 103200.000000 101866.666667 100900.000000 102700.000000 ... 120266.666667 120000.000000 119900.000000 119466.666667 119400.000000 119833.333333 121700.000000 122400.000000 121500.0 1.025242
Tuscaloosa 104500.0 106033.333333 108500.000000 108766.666667 109900.000000 111366.666667 110000.000000 110700.000000 111900.000000 111233.333333 ... 139566.666667 142733.333333 145000.000000 144700.000000 144266.666667 145000.000000 140300.000000 138033.333333 138650.0 1.022461
Alaska Fairbanks 163200.0 165033.333333 169300.000000 172800.000000 164433.333333 157800.000000 158200.000000 154666.666667 152766.666667 154533.333333 ... 207100.000000 207000.000000 207766.666667 206466.666667 208433.333333 209466.666667 209066.666667 212933.333333 215850.0 1.115757
Arizona Flagstaff 161300.0 164100.000000 165466.666667 168266.666667 172000.000000 175633.333333 178966.666667 180800.000000 182066.666667 184400.000000 ... 295066.666667 295966.666667 297633.333333 302800.000000 307333.333333 310033.333333 312266.666667 318433.333333 322650.0 1.136033
Tempe 128100.0 129933.333333 131100.000000 132800.000000 135633.333333 137900.000000 139700.000000 141833.333333 144166.666667 146566.666667 ... 213066.666667 216866.666667 220466.666667 225900.000000 231200.000000 237800.000000 241900.000000 245000.000000 246950.0 1.206204

5 rows × 68 columns


In [18]:
# Get all non-university towns
non_univ_index = set(df_houses_quarters.index) - set(df_univ_towns.index)
df_non_univ_towns = df_houses_quarters.loc[list(non_univ_index)]
df_non_univ_towns = df_non_univ_towns.dropna()
df_non_univ_towns.head()


Out[18]:
2000q1 2000q2 2000q3 2000q4 2001q1 2001q2 2001q3 2001q4 2002q1 2002q2 ... 2014q3 2014q4 2015q1 2015q2 2015q3 2015q4 2016q1 2016q2 2016q3 Ratio
State RegionName
Kentucky Briarwood 101666.666667 105766.666667 108300.000000 109733.333333 109166.666667 109266.666667 108800.000000 108700.000000 111533.333333 112066.666667 ... 129666.666667 129600.000000 129366.666667 132200.000000 131766.666667 131400.000000 133033.333333 134466.666667 134900.0 1.001606
Texas Baytown 74133.333333 73000.000000 73600.000000 74566.666667 74400.000000 73400.000000 73766.666667 74400.000000 75566.666667 77366.666667 ... 102433.333333 104766.666667 107566.666667 110500.000000 113533.333333 116666.666667 118933.333333 122266.666667 124200.0 0.990712
Pennsylvania Pulaski 65266.666667 66033.333333 65666.666667 65000.000000 61300.000000 60800.000000 62466.666667 64966.666667 65933.333333 67666.666667 ... 67533.333333 64266.666667 63300.000000 64700.000000 67466.666667 71000.000000 73900.000000 78866.666667 78200.0 0.962736
Illinois Schiller Park 153400.000000 157100.000000 158933.333333 162966.666667 165933.333333 172733.333333 178266.666667 186366.666667 190200.000000 189400.000000 ... 181700.000000 182966.666667 180700.000000 179766.666667 176433.333333 177433.333333 178133.333333 182066.666667 186950.0 1.177983
California Mentone 126933.333333 129900.000000 132866.666667 135766.666667 139133.333333 142066.666667 146800.000000 153700.000000 159100.000000 164866.666667 ... 234000.000000 238866.666667 244866.666667 252266.666667 261900.000000 272000.000000 276766.666667 278766.666667 280200.0 1.191615

5 rows × 68 columns
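
A mask-based split is an equivalent sketch (not the code used above) and also sidesteps the KeyError that newer pandas versions raise when .loc receives labels missing from the housing data:

# Hypothetical alternative: boolean mask over the MultiIndex
is_univ = df_houses_quarters.index.isin(university_towns.index)
df_univ_alt = df_houses_quarters[is_univ].dropna()
df_non_univ_alt = df_houses_quarters[~is_univ].dropna()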


In [19]:
def run_ttest():
    '''First creates new data showing the decline or growth of housing prices
    between the recession start and the recession bottom. Then runs a ttest
    comparing the university town values to the non-university town values,
    returning whether the null hypothesis (that the two groups are the same)
    can be rejected, together with the p-value of the test.
    
    Return the tuple (different, p, better) where different=True if the t-test is
    True at a p<0.01 (we reject the null hypothesis), or different=False if 
    otherwise (we cannot reject the null hypothesis). The variable p should
    be equal to the exact p value returned from scipy.stats.ttest_ind(). The
    value for better should be either "university town" or "non-university town"
    depending on which has a lower mean price ratio (which is equivalent to a
    reduced market loss).'''
    statistic, pvalue = ttest_ind(df_univ_towns["Ratio"], df_non_univ_towns["Ratio"])
    
    different = pvalue < 0.01
    
    if df_univ_towns["Ratio"].mean() < df_non_univ_towns["Ratio"].mean():
        better = "university town"
    else:
        better = "non-university town"
        
    return different, pvalue, better

run_ttest()


Out[19]:
(True, 0.0041517008594592966, 'university town')
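
A side note on the test itself: scipy.stats.ttest_ind assumes equal variances in the two groups by default; if that assumption is in doubt, Welch's unequal-variance t-test can be requested instead (shown only as an illustration, not as part of the graded answer):

ttest_ind(df_univ_towns["Ratio"], df_non_univ_towns["Ratio"], equal_var=False)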