Will Millennials Ever Get Married?

This notebook contains analysis related to a paper on marriage patterns in the U.S., based on data from the National Survey of Family Growth (NSFG).

It is based on Chapter 13 of Think Stats, 2nd Edition, by Allen Downey, available from thinkstats2.com.



In [1]:

    
%matplotlib inline

import pandas as pd
import numpy as np
import seaborn as sns

import math

import matplotlib.pyplot as plt
from matplotlib import pylab

from scipy.interpolate import interp1d
from scipy.misc import derivative

import thinkstats2
import thinkplot
from thinkstats2 import Cdf

import survival
import marriage



In [2]:

    
%time df = pd.read_hdf('FemMarriageData.hdf', 'FemMarriageData')
df.shape









    



CPU times: user 189 ms, sys: 8.19 ms, total: 197 ms
Wall time: 196 ms






    Out[2]:





(64042, 41)

Make a table showing the number of respondents in each cycle:



In [3]:

    
df.cycle.value_counts().sort_index()









    Out[3]:





3      7969
4      8450
5     10847
6      7643
7     12279
8      5601
9      5699
10     5554
Name: cycle, dtype: int64



In [4]:

    
def format_date_range(array):
    a, b = array.astype(int)
    return '%d--%d' % (a, b)

def SummarizeCycle(cycle, df):
    ages = df.age.min(), df.age.max()
    ages= np.array(ages)
    
    intvws = df.cmintvw.min(), df.cmintvw.max()
    intvws = np.array(intvws) / 12 + 1900
    
    births = df.cmbirth.min(), df.cmbirth.max()
    births = np.array(births) / 12 + 1900
    
    intvw_dates = format_date_range(intvws)
    birth_dates = format_date_range(births)
    
    print(cycle, ' & ', intvw_dates, '&', len(df), '&', birth_dates, r'\\')



In [5]:

    
for cycle, group in df.groupby('cycle'):
    SummarizeCycle(cycle, group)









    



3  &  1982--1983 & 7969 & 1937--1968 \\
4  &  1988--1988 & 8450 & 1943--1973 \\
5  &  1995--1995 & 10847 & 1950--1980 \\
6  &  2002--2003 & 7643 & 1957--1988 \\
7  &  2006--2010 & 12279 & 1961--1995 \\
8  &  2011--2013 & 5601 & 1966--1998 \\
9  &  2013--2015 & 5699 & 1968--2000 \\
10  &  2015--2017 & 5554 & 1966--2002 \\
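
The `/ 12 + 1900` arithmetic in `SummarizeCycle` turns NSFG century-month codes into approximate calendar years. The sketch below spells that conversion out, assuming that code 1 corresponds to January 1900; the function names are illustrative and are not part of the `marriage` module.

```python
def century_month_to_date(cm):
    """Convert a century-month code to (year, month).

    Assumption: code 1 is January 1900, so code 13 is January 1901.
    """
    year = 1900 + (cm - 1) // 12
    month = (cm - 1) % 12 + 1
    return year, month


def century_month_to_year(cm):
    """Fractional-year approximation, the same arithmetic as SummarizeCycle."""
    return cm / 12 + 1900


print(century_month_to_date(1225))        # (2002, 1)
print(int(century_month_to_year(1225)))   # 2002
```

Under this assumption the fractional form places a December code at the start of the following year, which is harmless for a summary table but matters if exact months are needed.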

Check for missing values in agemarry:

    def CheckAgeVars(df):
        print(sum(df[df.evrmarry].agemarry.isnull()))

    for cycle, group in df.groupby('cycle'):
        CheckAgeVars(group)

Generate a table with the number of respondents in each cohort:



In [6]:

    
marriage.DigitizeResp(df)
grouped = df.groupby('birth_index')
for name, group in iter(grouped):
    age_range = '%d--%d' % (int(group.age.min()), int(group.age_index.max()))
    print(name, '&', len(group), '&', age_range,
                '&', len(group[group.evrmarry]), '&', sum(group.missing), r'\\')









    



30 & 325 & 42--45 & 310 & 0 \\
40 & 3608 & 32--45 & 3287 & 9 \\
50 & 10631 & 22--45 & 8667 & 18 \\
60 & 14951 & 15--50 & 8813 & 33 \\
70 & 15016 & 14--47 & 8075 & 30 \\
80 & 12397 & 14--37 & 4208 & 13 \\
90 & 6695 & 15--27 & 517 & 0 \\
100 & 419 & 15--17 & 0 & 0 \\
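
The `birth_index` values in the table (30, 40, ..., 100) label decade-of-birth cohorts, with 100 standing for the 2000s. The internals of `marriage.DigitizeResp` are not shown here, so the following is only a sketch of how such an index could be built with `np.digitize`; the function name, bin edges, and the idea of starting from a birth-year array are assumptions.

```python
import numpy as np


def decade_index(birth_years, year_min=1930, year_max=2020, step=10):
    """Map birth years to decade labels such as 30, 40, ..., 100.

    Hypothetical helper: the bin edges are chosen to match the cohorts
    that appear in the table above.
    """
    bins = np.arange(year_min, year_max, step)      # 1930, 1940, ..., 2010
    idx = np.digitize(birth_years, bins) * step     # 10 for the first bin, 20 for the next, ...
    return idx + (year_min - step) - 1900           # shift so that 1930-1939 maps to 30


years = np.array([1935, 1948, 1980, 2000])
print(decade_index(years))
```

For these four birth years the labels are 30, 40, 80, and 100.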



In [7]:

    
def ComputeCutoffs(df):
    grouped = df.groupby('birth_index')
    cutoffs = {}
    for name, group in sorted(grouped):
        cutoffs[name] = int(group.age.max())
    return cutoffs



In [8]:

    
cutoffs = ComputeCutoffs(df)
cutoffs









    Out[8]:





{30: 45, 40: 45, 50: 45, 60: 50, 70: 47, 80: 37, 90: 27, 100: 17}

Estimate the hazard function for the 80s cohort, to see what is going on during the "marriage strike".



In [9]:

    
cohort = grouped.get_group(80)
missing = (cohort.evrmarry & cohort.agemarry.isnull())
cohort = cohort[~missing]
complete = cohort[cohort.evrmarry].agemarry_index
ongoing = cohort[~cohort.evrmarry].age_index
hf = survival.EstimateHazardFunction(complete, ongoing, verbose=True)









    



10	12384	1	0	8.1e-05
11	12383	1	0	8.1e-05
12	12382	1	0	8.1e-05
13	12381	10	0	0.00081
14	12371	12	14	0.00097
15	12345	40	260	0.0032
16	12045	104	242	0.0086
17	11699	152	338	0.013
18	11209	345	460	0.031
19	10404	436	569	0.042
20	9399	440	579	0.047
21	8380	426	572	0.051
22	7382	452	572	0.061
23	6358	383	454	0.06
24	5521	370	533	0.067
25	4618	317	576	0.069
26	3725	233	554	0.063
27	2938	167	506	0.057
28	2265	128	423	0.057
29	1714	76	391	0.044
30	1247	44	280	0.035
31	923	34	261	0.037
32	628	10	222	0.016
33	396	9	144	0.023
34	243	2	101	0.0082
35	140	1	72	0.0071
36	67	1	51	0.015
37	15	0	15	0
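
The columns of the verbose output appear to be age, number at risk, number married at that age, number censored at that age, and the estimated hazard. The rows are consistent with that reading: at age 14 there are 12371 at risk, 12 marriages, and 14 censored, leaving 12371 - 12 - 14 = 12345 at risk at age 15. A minimal sketch of this kind of estimate follows; it is an illustration under that interpretation, not the code of `survival.EstimateHazardFunction`.

```python
from collections import Counter


def estimate_hazard(complete, ongoing):
    """Return rows of (t, at_risk, ended, censored, hazard).

    complete: ages at marriage for respondents who have married
    ongoing:  current ages of respondents who have not married (censored)
    """
    ended_counts = Counter(complete)
    censored_counts = Counter(ongoing)
    ts = sorted(set(ended_counts) | set(censored_counts))

    at_risk = len(complete) + len(ongoing)
    rows = []
    for t in ts:
        ended = ended_counts[t]
        censored = censored_counts[t]
        hazard = ended / at_risk          # fraction of those at risk who marry at t
        rows.append((t, at_risk, ended, censored, hazard))
        at_risk -= ended + censored       # both groups leave the risk set
    return rows


for row in estimate_hazard([1, 2, 2, 3], [2, 3]):
    print(row)
```

Censored respondents count toward the denominator up to and including their current age, which is what lets the estimate use data from people who have not married yet.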

Run the same analysis for the 70s cohort (to extract $\lambda(33)$).



In [10]:

    
cohort = grouped.get_group(70)
missing = (cohort.evrmarry & cohort.agemarry.isnull())
cohort = cohort[~missing]
complete = cohort[cohort.evrmarry].agemarry_index
ongoing = cohort[~cohort.evrmarry].age_index
hf = survival.EstimateHazardFunction(complete, ongoing, verbose=True)









    



10	14986	2	0	0.00013
11	14984	1	0	6.7e-05
12	14983	4	0	0.00027
13	14979	11	0	0.00073
14	14968	38	3	0.0025
15	14927	81	388	0.0054
16	14458	181	518	0.013
17	13759	275	540	0.02
18	12944	562	340	0.043
19	12042	656	268	0.054
20	11118	635	242	0.057
21	10241	672	220	0.066
22	9349	702	264	0.075
23	8383	673	347	0.08
24	7363	611	345	0.083
25	6407	538	205	0.084
26	5664	481	149	0.085
27	5034	359	164	0.071
28	4511	373	198	0.083
29	3940	294	215	0.075
30	3431	235	315	0.068
31	2881	158	278	0.055
32	2445	138	212	0.056
33	2095	97	155	0.046
34	1843	78	173	0.042
35	1592	61	216	0.038
36	1315	34	203	0.026
37	1078	30	186	0.028
38	862	19	169	0.022
39	674	15	142	0.022
40	517	11	103	0.021
41	403	6	104	0.015
42	293	8	90	0.027
43	195	3	77	0.015
44	115	2	53	0.017
45	60	1	28	0.017
46	31	0	21	0
47	10	0	10	0

Use the 30s cohort to demonstrate the simple way to do survival analysis, by computing the survival function directly.



In [11]:

    
cohort = grouped.get_group(30)
sf = survival.MakeSurvivalFromSeq(cohort.agemarry_index.fillna(np.inf))
ts, ss = sf.Render()
print(ss)
thinkplot.Plot(ts, ss)
thinkplot.Config(xlim=[12, 42])









    



[ 9.93846154e-01  9.69230769e-01  9.47692308e-01  9.01538462e-01
  8.18461538e-01  6.86153846e-01  5.60000000e-01  4.64615385e-01
  3.63076923e-01  2.95384615e-01  2.61538462e-01  2.33846154e-01
  2.12307692e-01  1.75384615e-01  1.47692308e-01  1.32307692e-01
  1.10769231e-01  1.01538462e-01  8.92307692e-02  7.69230769e-02
  7.38461538e-02  6.46153846e-02  6.15384615e-02  5.84615385e-02
  5.53846154e-02  4.92307692e-02  4.61538462e-02 -2.22044605e-16]
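
Computing the survival function directly works for the 30s cohort because every respondent was interviewed at age 42 or later, so there is little censoring within the plotted range. Filling missing marriage ages with infinity treats never-married respondents as surviving at every finite age. A sketch of the same idea with plain NumPy, using a hypothetical helper name and made-up ages:

```python
import numpy as np


def survival_from_seq(ages):
    """Fraction of the sample with age at marriage greater than t,
    evaluated at each distinct finite age t.

    Never-married respondents are represented by np.inf.
    """
    ages = np.asarray(ages, dtype=float)
    ts = np.unique(ages[np.isfinite(ages)])
    ss = np.array([(ages > t).mean() for t in ts])
    return ts, ss


ts, ss = survival_from_seq([20, 22, 22, np.inf])
print(ts, ss)
```

Unlike the output above, this sketch does not evaluate the curve at infinity, so it has no final entry near zero.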

Then use the SurvivalFunction to compute the HazardFunction:



In [12]:

    
hf = sf.MakeHazardFunction()
ts, lams = hf.Render()
print(lams)
thinkplot.Plot(ts, lams)
thinkplot.Config(xlim=[12, 42])









    



[0.00615385 0.0247678  0.02222222 0.0487013  0.09215017 0.16165414
 0.1838565  0.17032967 0.21854305 0.18644068 0.11458333 0.10588235
 0.09210526 0.17391304 0.15789474 0.10416667 0.1627907  0.08333333
 0.12121212 0.13793103 0.04       0.125      0.04761905 0.05
 0.05263158 0.11111111 0.0625     1.        ]
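
The printed hazards are consistent with the discrete relation $\lambda(t) = (S(t_{prev}) - S(t)) / S(t_{prev})$, taking the survival before the first age to be 1. The sketch below applies that relation to the first four survival values printed earlier; the function name is illustrative and this is not necessarily how `MakeHazardFunction` is implemented.

```python
import numpy as np


def hazard_from_survival(ss):
    """Discrete hazard: the fraction of survivors lost at each step."""
    ss = np.asarray(ss, dtype=float)
    prev = np.concatenate([[1.0], ss[:-1]])   # survival just before each step
    return (prev - ss) / prev


ss = [0.993846154, 0.969230769, 0.947692308, 0.901538462]
print(hazard_from_survival(ss))
```

The same relation explains the final hazard of 1 in the output: the last step is the infinity placeholder, where survival drops to zero.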

Make the first figure, showing sf and hf for the 30s cohort:



In [13]:

    
options = dict(formats=['pdf', 'png'], clf=False)



In [14]:

    
thinkplot.PrePlot(rows=2)

thinkplot.Plot(sf, label='survival')
thinkplot.Config(xlim=[13, 41], ylim=[0, 1.05])
plt.ylabel('Survival Function')

thinkplot.SubPlot(2)
thinkplot.Plot(hf, label='hazard')
thinkplot.Config(xlabel='age(years)', ylabel='Hazard function', xlim=[13, 41])
plt.ylabel('Hazard Function')
plt.xlabel('Age (years)')

thinkplot.Save(root='figs/marriage1', **options)









    



Writing figs/marriage1.pdf
Writing figs/marriage1.png



In [15]:

    
thinkplot.Plot(sf, label='30s')
thinkplot.Config(xlim=[13, 41], ylim=[0, 1.05])
plt.xlabel('Age (years)', fontsize=14)
plt.ylabel('Survival function', fontsize=14)

thinkplot.Save(root='figs/marriage2', **options)









    



Writing figs/marriage2.pdf
Writing figs/marriage2.png



In [16]:

    
thinkplot.Plot(hf, label='30s')
thinkplot.Config(xlim=[13, 41])
plt.xlabel('Age (years)', fontsize=14)
plt.ylabel('Hazard function', fontsize=14)

thinkplot.Save(root='figs/marriage3', **options)









    



Writing figs/marriage3.pdf
Writing figs/marriage3.png

Make some pivot tables, just to see where the data are:



In [17]:

    
pt = df.pivot_table(index='birth_index', columns='age_index', values='age', aggfunc=len, fill_value=0)
pt









    Out[17]:







  
    
age_index     14.0  15.0  16.0  17.0  18.0  19.0  20.0  21.0  22.0  23.0   ...  41.0  42.0  43.0  44.0  45.0  46.0  47.0  48.0  49.0  50.0
birth_index
30               0     0     0     0     0     0     0     0     0     0   ...     0    19   146   148    12     0     0     0     0     0
40               0     0     0     0     0     0     0     0     0     0   ...   414   384   194   198    25     0     0     0     0     0
50               0     0     0     0     0     0     0     0    42   301   ...   371   482   633   533    38     0     0     0     0     0
60               0   269   300   370   665   722   609   513   520   242   ...   562   537   512   517    33    18    83   153   137     2
70               3   389   523   549   362   293   283   285   365   572   ...   430   363   332   241   129   116    38     0     0     0
80              14   260   242   340   483   589   658   683   705   602   ...     0     0     0     0     0     0     0     0     0     0
90               0   749   829   895   888   737   452   514   465   392   ...     0     0     0     0     0     0     0     0     0     0
100              0   181   171    67     0     0     0     0     0     0   ...     0     0     0     0     0     0     0     0     0     0
    
  

8 rows × 37 columns

The following pivot table is not as helpful as it could be, since it doesn't show the number at risk.



In [18]:

    
df.pivot_table(index='birth_index', columns='agemarry_index', values='age', aggfunc=len, fill_value=0)









    Out[18]:







  
    
agemarry_index    10.0  11.0  12.0  13.0  14.0  15.0  16.0  17.0  18.0  19.0   ...  38.0  39.0  40.0  41.0  42.0  43.0  44.0  45.0  46.0  47.0
birth_index
30                   0     0     0     2     8     7    15    27    43    41   ...     0     2     1     0     0     0     0     0     0     0
40                   1     0     0     5    31    87   163   218   425   467   ...     3     9     3     1     0     0     0     0     0     0
50                   0     0     1    10    50   125   384   642  1111  1057   ...    12    11     7     4     4     4     0     0     0     0
60                   0     1     2     8    40   128   300   445   860   831   ...    24    24    18    13    11     5     2     3     1     1
70                   2     1     4    11    38    81   181   276   562   656   ...    19    15    11     6     8     3     2     1     0     0
80                   1     1     1    10    12    40   104   152   345   436   ...     0     0     0     0     0     0     0     0     0     0
90                   0     0     0     0     2     6    15    49    86    92   ...     0     0     0     0     0     0     0     0     0     0
    
  

7 rows × 38 columns

Estimate the survival curve for each cohort:



In [19]:

    
df['complete'] = df.evrmarry
df['complete_var'] = df.agemarry_index
df['ongoing_var'] = df.age_index
df['complete_missing'] = df.complete & df.complete_var.isnull()
df['ongoing_missing'] = ~df.complete & df.ongoing_var.isnull()



In [20]:

    
# for some marriages, we don't have the date of marriage

for cycle, group in df.groupby('cycle'):
    print(cycle, sum(group.complete_missing), sum(group.ongoing_missing))



In [21]:

    
resps = [group for cycle, group in df.groupby('cycle')]
iters = 101



In [22]:

    
%time sf_map = marriage.EstimateSurvivalByCohort(resps, iters=iters, cutoffs=cutoffs)









    



CPU times: user 29 s, sys: 34.1 s, total: 1min 3s
Wall time: 16 s
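
With `iters=101`, the estimate is repeated on 101 resampled versions of the data, which gives a spread of survival curves per cohort. The internals of `marriage.EstimateSurvivalByCohort` are not shown in this notebook, so the following is only a sketch of one plausible resampling step: drawing rows with replacement in proportion to a sampling weight. The column name `finalwgt` and the helper name are assumptions.

```python
import numpy as np
import pandas as pd


def resample_rows_weighted(df, column='finalwgt', rng=None):
    """Draw len(df) rows with replacement, with probability proportional
    to the weights in `column` (a hypothetical weight column)."""
    rng = np.random.default_rng() if rng is None else rng
    weights = (df[column] / df[column].sum()).to_numpy()
    indices = rng.choice(df.index.to_numpy(), size=len(df), replace=True, p=weights)
    return df.loc[indices]


# Toy data: all of the weight sits on the last row, so every draw selects it.
toy = pd.DataFrame({'age': [20, 30, 40], 'finalwgt': [0.0, 0.0, 5.0]})
sample = resample_rows_weighted(toy, rng=np.random.default_rng(0))
print(sample)
```

Resampling each cycle separately and then concatenating the samples, for example with `pd.concat(samples, ignore_index=True)`, would keep the number of respondents per cycle fixed across iterations.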



In [23]:

    
del sf_map[30]

try: 
    del sf_map[100]
except KeyError:
    pass

Check a sample:



In [24]:

    
for sf in sf_map[90]:
    print(sf.ss)
    print(sf.Prob(34))
    break









    



14.0    0.999840
15.0    0.998395
16.0    0.996408
17.0    0.987276
18.0    0.966395
19.0    0.940836
20.0    0.906839
21.0    0.872929
22.0    0.829741
23.0    0.808923
24.0    0.765328
25.0    0.750778
26.0    0.750778
dtype: float64
0.7507784025321489



In [25]:

    
for sf in sf_map[80]:
    print(sf.ss)
    print(sf.Prob(34))
    break









    



11.0    0.999915
12.0    0.999831
13.0    0.999237
14.0    0.998390
15.0    0.994743
16.0    0.986660
17.0    0.974109
18.0    0.941480
19.0    0.897603
20.0    0.859113
21.0    0.811348
22.0    0.755303
23.0    0.703772
24.0    0.651021
25.0    0.592230
26.0    0.528698
27.0    0.480155
28.0    0.446503
29.0    0.421149
30.0    0.404408
31.0    0.381550
32.0    0.372563
33.0    0.348842
34.0    0.346469
35.0    0.346469
36.0    0.346469
dtype: float64
0.34646884897303215

Make the figure showing estimated survival curves:



In [26]:

    
def PlotSurvivalFunctions(root, sf_map, sf_map_pred=None, **options):

    if sf_map_pred:
        marriage.PlotSurvivalFunctions(sf_map_pred, predict_flag=True)
    marriage.PlotSurvivalFunctions(sf_map)

    thinkplot.config(xlabel='Age (years)',
                     ylabel='Fraction never married',
                     xlim=[13, 50], 
                     ylim=[0, 1.05], 
                     loc='upper right', 
                     frameon=False,
                     **options)
    plt.tight_layout()
    thinkplot.save(root=root, formats=['pdf', 'png'])



In [27]:

    
def set_palette(*args, **kwds):
    """Set the matplotlib color cycler.
    
    args, kwds: same as for sns.color_palette
    
    Also takes a boolean kwd, `reverse`, to indicate
    whether the order of the palette should be reversed.
    
    returns: list of colors
    """
    reverse = kwds.pop('reverse', False)
    palette = sns.color_palette(*args, **kwds)
    
    palette = list(palette)
    if reverse:
        palette.reverse()
        
    cycler = plt.cycler(color=palette)
    plt.gca().set_prop_cycle(cycler)
    return palette



In [56]:

    
def draw_age_lines(ages):
    for age in ages:
        plt.axvline(age, color='gray', linestyle='dotted', alpha=0.3)



In [57]:

    
palette = set_palette('hls', 6)
ages = [24, 34, 44]  # same ages used in the percentage tables below
draw_age_lines(ages)
options_w = dict(title='Women in the U.S. by decade of birth')
PlotSurvivalFunctions('figs/marriage4', sf_map, None, **options_w)









    



Writing figs/marriage4.pdf
Writing figs/marriage4.png

Make a table of the percentage never married for each cohort at each age:



In [29]:

    
def MakeTable(sf_map, ages):
    t = []
    for name, sf_seq in sorted(sf_map.items()):
        ts, ss = marriage.MakeSurvivalCI(sf_seq, [50])
        ss = ss[0]
        vals = [np.interp(age, ts, ss, right=np.nan) for age in ages]
        t.append((name, vals))
    return t
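
`MakeTable` asks `marriage.MakeSurvivalCI` for the 50th percentile, which reads as the pointwise median of the resampled survival curves for each cohort. A sketch of that idea with `np.percentile`, assuming the curves have already been evaluated on a common grid of ages; the library may compute percentiles with a different method, so values could differ slightly.

```python
import numpy as np


def percentile_rows(ss_seq, percents):
    """Pointwise percentiles across a sequence of curves.

    ss_seq:   sequence of curves, each evaluated at the same ages
    percents: sequence of percentiles, e.g. [5, 50, 95]
    returns:  array with one row per requested percentile
    """
    return np.percentile(np.asarray(ss_seq, dtype=float), percents, axis=0)


# Three made-up resampled curves evaluated at three ages
curves = [[1.0, 0.8, 0.5],
          [1.0, 0.9, 0.6],
          [1.0, 0.7, 0.4]]
median = percentile_rows(curves, [50])[0]
print(median)
```

Passing `[5, 95]` instead of `[50]` would give a 90% band around the median curve.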



In [30]:

    
def MakePercentageTable(sf_map, ages):
    """Prints percentage unmarried for each cohort at each age.
    """
    for name, vals in MakeTable(sf_map, ages):
        print(name, '&', ' & '.join('%0.0f' % (val*100) for val in vals), r'\\')
        
ages = [24, 34, 44]
MakePercentageTable(sf_map, ages=ages)









    



40 & 21 & 9 & 7 \\
50 & 32 & 14 & 10 \\
60 & 47 & 18 & 12 \\
70 & 54 & 23 & 18 \\
80 & 64 & 35 & nan \\
90 & 76 & nan & nan \\
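
The `nan` entries come from `np.interp(..., right=np.nan)`: when the requested age is beyond the last age observed for a cohort, the interpolation returns `nan` instead of repeating the last value. A small illustration with a made-up survival curve that stops at age 37, the cutoff of the 80s cohort:

```python
import numpy as np

ts = np.arange(14, 38)                   # ages 14 through 37
ss = np.linspace(1.0, 0.35, len(ts))     # made-up survival values, for illustration only

ages = [24, 34, 44]
vals = [np.interp(age, ts, ss, right=np.nan) for age in ages]

for age, val in zip(ages, vals):
    print(age, val)
```

Ages 24 and 34 fall inside the observed range and get interpolated values; age 44 is past the cutoff and comes back as `nan`.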

Generate projections:



In [31]:

    
%time sf_map_pred = marriage.EstimateSurvivalByCohort(resps, iters=iters, cutoffs=cutoffs, predict_flag=True)
del sf_map_pred[30]

try: 
    del sf_map[100]
except KeyError:
    pass









    



CPU times: user 28.7 s, sys: 33.7 s, total: 1min 2s
Wall time: 15.7 s
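
This notebook does not show what `predict_flag=True` does internally. In Chapter 13 of Think Stats, projections are made by extending each cohort's hazard function with hazards borrowed from the previous cohort at ages the younger cohort has not reached yet. The sketch below illustrates that approach with pandas Series indexed by age; the helper names and numbers are made up.

```python
import pandas as pd


def extend_hazard(hazard, prev_hazard):
    """Append hazards from the previous cohort for ages beyond
    the last age observed in `hazard`."""
    last_age = hazard.index.max()
    tail = prev_hazard[prev_hazard.index > last_age]
    return pd.concat([hazard, tail])


def hazard_to_survival(hazard):
    """Survival after each age is the running product of (1 - hazard)."""
    return (1 - hazard).cumprod()


young = pd.Series([0.1, 0.2], index=[20, 21])           # observed only to age 21
older = pd.Series([0.5, 0.5, 0.25], index=[20, 21, 22])  # previous cohort

extended = extend_hazard(young, older)
print(hazard_to_survival(extended))
```

The projection therefore assumes that, at each age not yet observed, the younger cohort will marry at the same rate as the cohort before it.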



In [32]:

    
for cohort, seq in sf_map_pred.items():
    if cohort > 90:
        break
    medians = [sf.MakeCdf().Value(0.5) for sf in seq]
    print(cohort, np.median(medians))
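
The loop above takes the median age at first marriage from each resampled curve and then reports the median of those medians. For a single curve, the median age is the first age at which the survival function falls to 0.5 or below. A sketch with a hypothetical helper that returns `nan` when the curve never reaches 0.5 (a choice made here, not necessarily the behavior of `Cdf.Value`):

```python
import numpy as np


def median_age(ts, ss):
    """First age at which the fraction never married is at most 0.5."""
    ts = np.asarray(ts)
    ss = np.asarray(ss, dtype=float)
    below = np.nonzero(ss <= 0.5)[0]
    return ts[below[0]] if len(below) else np.nan


print(median_age([20, 21, 22], [0.8, 0.5, 0.3]))
print(median_age([20, 21], [0.9, 0.8]))
```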

And make the figure showing projections:



In [58]:

    
palette = set_palette('hls', 6)
draw_age_lines(ages)
PlotSurvivalFunctions('figs/marriage5', sf_map, sf_map_pred, **options_w)









    



Writing figs/marriage5.pdf
Writing figs/marriage5.png

Make the table again with the projections filled in.



In [34]:

    
MakePercentageTable(sf_map_pred, ages)









    



40 & 21 & 9 & 7 \\
50 & 32 & 14 & 10 \\
60 & 47 & 18 & 12 \\
70 & 54 & 23 & 18 \\
80 & 64 & 36 & 30 \\
90 & 76 & 51 & 42 \\
100 & 76 & 51 & 43 \\



In [35]:

    
def PlotFractions(sf_map, ages, label_flag=False, **options):
    t = MakeTable(sf_map, ages)
    cohorts, cols = zip(*t)
    rows = zip(*cols)

    thinkplot.PrePlot(3)
    t = list(zip(ages, rows))
    for age, row in reversed(t):
        label = 'at age %d' % age if label_flag else ''
        thinkplot.Plot(cohorts, row, label=label, **options)



In [36]:

    
PlotFractions(sf_map_pred, ages, color='gray', linestyle='dashed', linewidth=2)
PlotFractions(sf_map, ages, label_flag=True, alpha=1)

fontsize=12
thinkplot.Text(36, 0.26, '24', fontsize=fontsize)
thinkplot.Text(37, 0.13, '9', fontsize=fontsize)
thinkplot.Text(37, 0.07, '7', fontsize=fontsize)

thinkplot.Text(90, 0.85, '80', fontsize=fontsize)
thinkplot.Text(90, 0.56, '51', fontsize=fontsize)
thinkplot.Text(89.5, 0.47, '42', fontsize=fontsize)

thinkplot.Text(80, 0.42, '35', fontsize=fontsize)
thinkplot.Text(70, 0.18, '18', fontsize=fontsize)

thinkplot.Config(xlim=[34, 97], ylim=[0, 1], legend=True, loc='upper left',
                 xlabel='cohort (decade)', ylabel='Fraction unmarried', 
                 title='Women in the U.S.')

thinkplot.Save(root='figs/marriage6', **options)









    



/home/downey/anaconda3/lib/python3.6/site-packages/matplotlib/figure.py:98: MatplotlibDeprecationWarning: 
Adding an axes using the same arguments as a previous axes currently reuses the earlier instance.  In a future version, a new instance will always be created and returned.  Meanwhile, this warning can be suppressed, and the future behavior ensured, by passing a unique label to each axes instance.
  "Adding an axes using the same arguments as a previous axes "






    



Writing figs/marriage6.pdf
Writing figs/marriage6.png

Marriage patterns for men



In [37]:

    
%time df2 = pd.read_hdf('MaleMarriageData.hdf', 'MaleMarriageData')
df2.shape









    



CPU times: user 11.2 ms, sys: 2.87 ms, total: 14.1 ms
Wall time: 12.6 ms






    Out[37]:





(29192, 37)



In [38]:

    
for cycle, group in df2.groupby('cycle'):
    SummarizeCycle(cycle, group)









    



6  &  2002--2003 & 4928 & 1957--1988 \\
7  &  2006--2010 & 10403 & 1961--1995 \\
8  &  2011--2013 & 4815 & 1966--1998 \\
9  &  2013--2015 & 4506 & 1968--2000 \\
10  &  2015--2017 & 4540 & 1965--2002 \\



In [39]:

    
sum(df2.missing)









    Out[39]:





0



In [40]:

    
marriage.DigitizeResp(df2)
grouped = df2.groupby('birth_index')
for name, group in iter(grouped):
    age_range = '%d--%d' % (int(group.age.min()), int(group.age_index.max()))
    print(name, '&', len(group), '&', age_range,
                '&', len(group[group.evrmarry]), '&', sum(group.missing), r'\\')









    



50 & 322 & 42--45 & 224 & 0 \\
60 & 3970 & 32--50 & 2755 & 0 \\
70 & 8282 & 22--47 & 4883 & 0 \\
80 & 9861 & 15--37 & 2445 & 0 \\
90 & 6364 & 15--27 & 243 & 0 \\
100 & 393 & 15--17 & 0 & 0 \\



In [41]:

    
cutoffs2 = ComputeCutoffs(df2)
cutoffs2









    Out[41]:





{50: 45, 60: 50, 70: 47, 80: 37, 90: 27, 100: 17}



In [42]:

    
resps2 = [group for cycle, group in df2.groupby('cycle')]



In [43]:

    
%time sf_map_male = marriage.EstimateSurvivalByCohort(resps2, iters=iters, cutoffs=cutoffs2)
del sf_map_male[100]









    



CPU times: user 7.46 s, sys: 29.3 ms, total: 7.49 s
Wall time: 7.46 s



In [59]:

    
palette = set_palette('hls', 6)
draw_age_lines(ages)
options_m = dict(title='Men in the U.S. by decade of birth')
PlotSurvivalFunctions('figs/marriage7', sf_map_male, None, **options_m)









    



Writing figs/marriage7.pdf
Writing figs/marriage7.png



In [45]:

    
%time sf_map_male_pred = marriage.EstimateSurvivalByCohort(resps2, iters=iters, cutoffs=cutoffs2, predict_flag=True)
del sf_map_male_pred[100]









    



CPU times: user 7.86 s, sys: 24.2 ms, total: 7.88 s
Wall time: 7.85 s



In [46]:

    
for cohort, seq in sf_map_male_pred.items():
    if cohort > 90:
        break
    medians = [sf.MakeCdf().Value(0.5) for sf in seq]
    print(cohort, np.median(medians))



In [60]:

    
palette = set_palette('hls', 6)
draw_age_lines(ages)
PlotSurvivalFunctions('figs/marriage8', sf_map_male, sf_map_male_pred, **options_m)









    



Writing figs/marriage8.pdf
Writing figs/marriage8.png



In [48]:

    
MakePercentageTable(sf_map_male, ages)









    



50 & 58 & 24 & 18 \\
60 & 63 & 27 & 17 \\
70 & 69 & 29 & 19 \\
80 & 75 & 41 & nan \\
90 & 83 & nan & nan \\



In [49]:

    
MakePercentageTable(sf_map_male_pred, ages)









    



50 & 58 & 24 & 18 \\
60 & 63 & 27 & 17 \\
70 & 70 & 29 & 19 \\
80 & 75 & 41 & 30 \\
90 & 83 & 50 & 37 \\



In [50]:

    
PlotFractions(sf_map_male_pred, ages, color='gray', linestyle='dashed', linewidth=2)
PlotFractions(sf_map_male, ages, label_flag=True, alpha=1)

fontsize=12
thinkplot.Text(46, 0.69, '68', fontsize=fontsize)
thinkplot.Text(46, 0.30, '26', fontsize=fontsize)
thinkplot.Text(46, 0.20, '18', fontsize=fontsize)

thinkplot.Text(70, 0.18, '19', fontsize=fontsize)

thinkplot.Text(80, 0.43, '43', fontsize=fontsize)

thinkplot.Text(90, 0.89, '86', fontsize=fontsize)
thinkplot.Text(90, 0.56, '52', fontsize=fontsize)
thinkplot.Text(90, 0.40, '38', fontsize=fontsize)

thinkplot.Config(xlim=[34, 97], ylim=[0, 1], legend=True, loc='upper left',
                 xlabel='cohort (decade)', ylabel='Fraction unmarried', 
                 title='Men in the U.S.')

thinkplot.Save(root='figs/marriage9', **options)









    



Writing figs/marriage9.pdf






    



/home/downey/anaconda3/lib/python3.6/site-packages/matplotlib/figure.py:98: MatplotlibDeprecationWarning: 
Adding an axes using the same arguments as a previous axes currently reuses the earlier instance.  In a future version, a new instance will always be created and returned.  Meanwhile, this warning can be suppressed, and the future behavior ensured, by passing a unique label to each axes instance.
  "Adding an axes using the same arguments as a previous axes "






    



Writing figs/marriage9.png



In [ ]:
