Does Trivers-Willard apply to people?

This notebook contains a "one-day paper", my attempt to pose a research question, answer it, and publish the results in one work day.

Copyright 2016 Allen B. Downey

MIT License: https://opensource.org/licenses/MIT


In [1]:
from __future__ import print_function, division

import thinkstats2
import thinkplot

import pandas as pd
import numpy as np

import statsmodels.formula.api as smf

%matplotlib inline

Trivers-Willard

According to Wikipedia, the Trivers-Willard hypothesis:

"...suggests that female mammals are able to adjust offspring sex ratio in response to their maternal condition. For example, it may predict greater parental investment in males by parents in 'good conditions' and greater investment in females by parents in 'poor conditions' (relative to parents in good condition)."

For humans, the hypothesis suggests that people with relatively high social status might be more likely to have boys. Some studies have shown evidence for this hypothesis, but based on my very casual survey, the evidence is not persuasive.

To test whether the T-W hypothesis holds up in humans, I downloaded birth data for the nearly 4 million babies born in the U.S. in 2013.

I selected variables that seemed likely to be related to social status and used logistic regression to identify variables associated with sex ratio.

Summary of results

  1. Running regression with one variable at a time, many of the variables have a statistically significant effect on sex ratio, with the sign of the effect generally in the direction predicted by T-W.

  2. However, many of the variables are also correlated with race. If we control for either the mother's race or the father's race, or both, most other variables have no additional predictive power.

  3. Contrary to other reports, the age of the parents seems to have no predictive power.

  4. Strangely, the variable that shows the strongest and most consistent relationship with sex ratio is the number of prenatal visits. Although it seems obvious that prenatal visits are a proxy for quality of health care and general socioeconomic status, the sign of the effect is the opposite of what T-W predicts; that is, a higher number of prenatal visits is a strong predictor of lower sex ratio (more girls).

Following convention, I report sex ratio in terms of boys per 100 girls. The overall sex ratio at birth is about 105; that is, 105 boys are born for every 100 girls.

Data cleaning

Here's how I loaded the data:


In [2]:
names = ['year', 'mager9', 'restatus', 'mbrace', 'mhisp_r',
        'mar_p', 'dmar', 'meduc', 'fagerrec11', 'fbrace', 'fhisp_r', 'feduc', 
        'lbo_rec', 'previs_rec', 'wic', 'height', 'bmi_r', 'pay_rec', 'sex']
colspecs = [(15, 18),
            (93, 93),
            (138, 138),
            (143, 143),
            (148, 148),
            (152, 152),
            (153, 153),
            (155, 155),
            (186, 187),
            (191, 191),
            (195, 195),
            (197, 197),
            (212, 212),
            (272, 273),
            (281, 281),
            (555, 556),
            (533, 533),
            (413, 413),
            (436, 436),
           ]

colspecs = [(start-1, end) for start, end in colspecs]
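
The list comprehension above converts the column positions, which appear to be 1-based with inclusive ends, into the 0-based, half-open intervals that `pd.read_fwf` expects. Here is a minimal sketch of the same convention using plain string slicing; the sample record and its layout are made up for illustration and are not taken from the birth file:

```python
# Hypothetical fixed-width record; the layout is invented for this example.
record = "2013M07"

# Positions as a record layout might list them: 1-based, inclusive ends.
layout = {"year": (1, 4), "sex": (5, 5), "visits": (6, 7)}

# Convert to 0-based, half-open (start, end) pairs, as in the cell above.
colspecs = {name: (start - 1, end) for name, (start, end) in layout.items()}

# Python slices are also 0-based and half-open, so the pairs can be used directly.
fields = {name: record[start:end] for name, (start, end) in colspecs.items()}
print(fields)
```

Only the start needs to be shifted: an inclusive 1-based end and an exclusive 0-based end are the same number.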

In [3]:
df = None

In [4]:
filename = 'Nat2013PublicUS.r20141016.gz'
#df = pd.read_fwf(filename, compression='gzip', header=None, names=names, colspecs=colspecs)
#df.head()

In [5]:
# store the dataframe for faster loading

#store = pd.HDFStore('store.h5')
#store['births2013'] = df
#store.close()

In [6]:
# load the dataframe

store = pd.HDFStore('store.h5')
df = store['births2013']
store.close()

In [7]:
def series_to_ratio(series):
    """Takes a boolean series and computes sex ratio.
    """
    boys = np.mean(series)
    return np.round(100 * boys / (1-boys)).astype(int)

I have to recode sex as 0 or 1 to make logit happy.


In [8]:
df['boy'] = (df.sex=='M').astype(int)
df.boy.value_counts().sort_index()


Out[8]:
0    1923390
1    2017374
Name: boy, dtype: int64
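
As a quick check on these counts, the sex ratio can be computed either directly from the counts or, as `series_to_ratio` does, from the fraction of boys. A small sketch in plain Python (the variable names are mine, not from the notebook):

```python
girls = 1923390   # count for boy == 0 in Out[8]
boys = 2017374    # count for boy == 1 in Out[8]

# Direct definition: boys per 100 girls.
ratio_from_counts = 100 * boys / girls

# Equivalent form used by series_to_ratio: p / (1 - p), with p the fraction of boys.
p = boys / (boys + girls)
ratio_from_fraction = 100 * p / (1 - p)

print(round(ratio_from_counts, 1), round(ratio_from_fraction, 1))
```

Both forms give the same value, which rounds to the overall ratio of about 105 quoted in the summary.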

All births are from 2013.


In [9]:
df.year.value_counts().sort_index()


Out[9]:
2013    3940764
Name: year, dtype: int64

Mother's age:


In [10]:
df.mager9.value_counts().sort_index()


Out[10]:
1       3099
2     273598
3     898163
4    1123370
5    1039480
6     485088
7     109738
8       7539
9        689
Name: mager9, dtype: int64

In [11]:
var = 'mager9'
df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)


Out[11]:
        boy
mager9
1       106
2       105
3       105
4       105
5       105
6       105
7       104
8       104
9        95

In [12]:
df.mager9.isnull().mean()


Out[12]:
0.0

In [13]:
df['youngm'] = df.mager9<=2
df['oldm'] = df.mager9>=7
df.youngm.mean(), df.oldm.mean()


Out[13]:
(0.070214049864442532, 0.029934804520138733)
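
Grouping the extreme age categories into `youngm` and `oldm` is one way to avoid reading too much into sparse categories. As a rough illustration (my own check, not part of the original analysis), the sketch below estimates how uncertain the ratio of 95 for `mager9 == 9` is, given that the category contains only 689 births. It assumes a normal approximation to the binomial and starts from the rounded ratio, so the result is approximate; `ratio_interval` is a hypothetical helper.

```python
import math

def ratio_interval(ratio, n, z=1.96):
    """Approximate 95% interval for a sex ratio (boys per 100 girls).

    Uses the normal approximation to the binomial; `ratio` is the observed
    ratio and `n` is the number of births in the group.
    """
    p = ratio / (100 + ratio)            # fraction of boys implied by the ratio
    se = math.sqrt(p * (1 - p) / n)      # standard error of that fraction
    low, high = p - z * se, p + z * se
    return 100 * low / (1 - low), 100 * high / (1 - high)

# mager9 == 9: ratio 95 (rounded, from Out[11]) with 689 births (from Out[10])
low, high = ratio_interval(95, 689)
print(round(low), round(high))
```

Under these assumptions the interval comfortably includes the overall ratio of about 105, so the low value for the oldest category could easily be noise.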

Residence status (1=resident)


In [14]:
df.restatus.value_counts().sort_index()


Out[14]:
1    2847673
2     999320
3      85188
4       8583
Name: restatus, dtype: int64

In [15]:
var = 'restatus'
df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)


Out[15]:
          boy
restatus
1         105
2         105
3         105
4         106

Mother's race (1=White, 2=Black, 3=American Indian or Alaskan Native, 4=Asian or Pacific Islander)


In [16]:
df.mbrace.value_counts().sort_index()


Out[16]:
1    2993686
2     635120
3      46011
4     265947
Name: mbrace, dtype: int64

In [17]:
var = 'mbrace'
df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)


Out[17]:
        boy
mbrace
1       105
2       103
3       106
4       106

Mother's Hispanic origin (0=Non-Hispanic)


In [18]:
df.mhisp_r.replace([9], np.nan, inplace=True)
df.mhisp_r.value_counts().sort_index()


Out[18]:
0    3005012
1     552005
2      68313
3      18855
4     131436
5     137518
Name: mhisp_r, dtype: int64

In [19]:
def copy_null(df, oldvar, newvar):
    """Set newvar to NaN wherever oldvar is missing."""
    df.loc[df[oldvar].isnull(), newvar] = np.nan

In [20]:
df['mhisp'] = df.mhisp_r > 0
copy_null(df, 'mhisp_r', 'mhisp')
df.mhisp.isnull().mean(), df.mhisp.mean()


Out[20]:
(0.007010062008280628, 0.23207123488329956)
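
The call to `copy_null` matters because a comparison with NaN evaluates to False, so without it, records with unknown Hispanic origin would be silently coded as non-Hispanic. Here is a minimal pure-Python sketch of the same idea; the helper name `flag_with_missing` is hypothetical and stands in for the pandas operations above:

```python
import math

def flag_with_missing(values, threshold=0):
    """Return 1.0/0.0 for value > threshold, keeping NaN as NaN."""
    return [math.nan if math.isnan(v) else float(v > threshold) for v in values]

codes = [0, 1, math.nan, 5]

naive = [v > 0 for v in codes]          # NaN > 0 is False, so missing looks like 0
careful = flag_with_missing(codes)      # missing stays missing

print(naive)
print(careful)
```

Keeping the missing values as NaN means that models using the recoded variable are fit on fewer observations rather than on miscoded ones.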

In [21]:
var = 'mhisp'
df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)


Out[21]:
       boy
mhisp
0      105
1      104

Marital status (1=Married)


In [22]:
df.dmar.value_counts().sort_index()


Out[22]:
1    2342660
2    1598104
Name: dmar, dtype: int64

In [23]:
var = 'dmar'
df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)


Out[23]:
      boy
dmar
1     105
2     104

Paternity acknowledged, if unmarried (Y=yes, N=no, X=not applicable, U=unknown).

I recode X (not applicable because married) as Y (paternity acknowledged).


In [24]:
df.mar_p.replace(['U'], np.nan, inplace=True)
df.mar_p.replace(['X'], 'Y', inplace=True)
df.mar_p.value_counts().sort_index()


Out[24]:
N     429652
Y    3127707
Name: mar_p, dtype: int64

In [25]:
var = 'mar_p'
df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)


Out[25]:
       boy
mar_p
N      103
Y      105

Mother's education level


In [26]:
df.meduc.replace([9], np.nan, inplace=True)
df.meduc.value_counts().sort_index()


Out[26]:
1    136701
2    421293
3    879956
4    753056
5    280660
6    669170
7    297054
8     84707
Name: meduc, dtype: int64

In [27]:
var = 'meduc'
df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)


Out[27]:
       boy
meduc
1      104
2      104
3      105
4      105
5      105
6      105
7      105
8      106

In [28]:
df['lowed'] = df.meduc <= 2
copy_null(df, 'meduc', 'lowed')
df.lowed.isnull().mean(), df.lowed.mean()


Out[28]:
(0.1061131800838619, 0.15840415466202917)

Father's age, in 10 ranges


In [29]:
df.fagerrec11.replace([11], np.nan, inplace=True)
df.fagerrec11.value_counts().sort_index()


Out[29]:
1        368
2      92982
3     510562
4     862475
5     993126
6     603109
7     257493
8      84476
9      27431
10     11627
Name: fagerrec11, dtype: int64

In [30]:
var = 'fagerrec11'
df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)


Out[30]:
            boy
fagerrec11
1            97
2           107
3           105
4           105
5           105
6           105
7           105
8           103
9           104
10          102

In [31]:
df['youngf'] = df.fagerrec11<=2
copy_null(df, 'fagerrec11', 'youngf')
df.youngf.isnull().mean(), df.youngf.mean()


Out[31]:
(0.12614685883244975, 0.027107873073010633)

In [32]:
df['oldf'] = df.fagerrec11>=8
copy_null(df, 'fagerrec11', 'oldf')
df.oldf.isnull().mean(), df.oldf.mean()


Out[32]:
(0.12614685883244975, 0.03587299402465234)

Father's race


In [33]:
df.fbrace.replace([9], np.nan, inplace=True)
df.fbrace.value_counts().sort_index()


Out[33]:
1    2466993
2     476535
3      35143
4     222529
Name: fbrace, dtype: int64

In [34]:
var = 'fbrace'
df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)


Out[34]:
        boy
fbrace
1       105
2       104
3       105
4       107

Father's Hispanic origin (0=non-hispanic, other values indicate country of origin)


In [35]:
df.fhisp_r.replace([9], np.nan, inplace=True)
df.fhisp_r.value_counts().sort_index()


Out[35]:
0    2604861
1     491433
2      58210
3      17820
4     104385
5     120142
Name: fhisp_r, dtype: int64

In [36]:
df['fhisp'] = df.fhisp_r > 0
copy_null(df, 'fhisp_r', 'fhisp')
df.fhisp.isnull().mean(), df.fhisp.mean()


Out[36]:
(0.13802222107185308, 0.2331541772070662)

In [37]:
var = 'fhisp'
df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)


Out[37]:
       boy
fhisp
0      105
1      104

Father's education level


In [38]:
df.feduc.replace([9], np.nan, inplace=True)
df.feduc.value_counts().sort_index()


Out[38]:
1    136789
2    326194
3    879276
4    591364
5    212259
6    564045
7    220241
8     99780
Name: feduc, dtype: int64

In [39]:
var = 'feduc'
df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)


Out[39]:
       boy
feduc
1      104
2      105
3      105
4      105
5      105
6      105
7      106
8      106

Live birth order.


In [40]:
df.lbo_rec.replace([9], np.nan, inplace=True)
df.lbo_rec.value_counts().sort_index()


Out[40]:
1    1550114
2    1246847
3     654946
4     276936
5     108168
6      44188
7      20301
8      20732
Name: lbo_rec, dtype: int64

In [41]:
var = 'lbo_rec'
df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)


Out[41]:
         boy
lbo_rec
1        105
2        105
3        104
4        104
5        104
6        103
7        107
8        102

In [42]:
df['highbo'] = df.lbo_rec >= 5
copy_null(df, 'lbo_rec', 'highbo')
df.highbo.isnull().mean(), df.highbo.mean()


Out[42]:
(0.0047026414167405106, 0.04930585442166603)

Number of prenatal visits, in 11 ranges


In [43]:
df.previs_rec.replace([12], np.nan, inplace=True)
df.previs_rec.value_counts().sort_index()


Out[43]:
1      55475
2      41960
3      92649
4     191376
5     356722
6     806697
7     992507
8     677219
9     385084
10     98410
11    124040
Name: previs_rec, dtype: int64

In [44]:
df.previs_rec.mean()
# center on category 7, the most common range, so previs == 0 is a typical number of visits
df['previs'] = df.previs_rec - 7

In [45]:
var = 'previs'
df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)


Out[45]:
        boy
previs
-6      109
-5      106
-4      108
-3      108
-2      108
-1      106
 0      105
 1      103
 2      103
 3      101
 4      101

In [46]:
df['no_previs'] = df.previs_rec <= 1
copy_null(df, 'previs_rec', 'no_previs')
df.no_previs.isnull().mean(), df.no_previs.mean()


Out[46]:
(0.030102030976734459, 0.014514124159273119)

Whether the mother received WIC (a supplemental nutrition program for women, infants, and children)


In [47]:
df.wic.replace(['U'], np.nan, inplace=True)
df.wic.value_counts().sort_index()


Out[47]:
N    1906790
Y    1568093
Name: wic, dtype: int64

In [48]:
var = 'wic'
df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)


Out[48]:
     boy
wic
N    105
Y    104

Mother's height in inches


In [49]:
df.height.replace([99], np.nan, inplace=True)
df.height.value_counts().sort_index()


Out[49]:
30         8
31         3
32         2
33         1
36         9
37         4
38         6
39        16
40         8
41        14
42         6
43         7
44         2
45         7
46         7
47        20
48       759
49       492
50       346
51       362
52       450
53      1360
54      1301
55      2532
56      6411
57     17089
58     19417
59     74060
60    193506
61    246137
62    436950
63    448774
64    512975
65    415945
66    397583
67    308439
68    177404
69    117775
70     57814
71     30853
72     14269
73      4872
74      2369
75       955
76       540
77       626
78      1000
Name: height, dtype: int64

In [50]:
df['mshort'] = df.height<60
copy_null(df, 'height', 'mshort')
df.mshort.isnull().mean(), df.mshort.mean()


Out[50]:
(0.11350058009056112, 0.03569472890251425)

In [51]:
df['mtall'] = df.height>=70
copy_null(df, 'height', 'mtall')
df.mtall.isnull().mean(), df.mtall.mean()


Out[51]:
(0.11350058009056112, 0.032431225552707395)

In [52]:
var = 'mshort'
df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)


Out[52]:
        boy
mshort
0       105
1       104

In [53]:
var = 'mtall'
df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)


Out[53]:
       boy
mtall
0      105
1      105

Mother's BMI in 6 ranges


In [54]:
df.bmi_r.replace([9], np.nan, inplace=True)
df.bmi_r.value_counts().sort_index()


Out[54]:
1     128922
2    1578486
3     869407
4     458241
5     217090
6     149572
Name: bmi_r, dtype: int64

In [55]:
var = 'bmi_r'
df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)


Out[55]:
       boy
bmi_r
1      105
2      105
3      105
4      105
5      105
6      103

In [56]:
df['obese'] = df.bmi_r >= 4
copy_null(df, 'bmi_r', 'obese')
df.obese.isnull().mean(), df.obese.mean()


Out[56]:
(0.13678718136889192, 0.2424959976106191)

Payment method (1=Medicaid, 2=Private insurance, 3=Self pay, 4=Other)


In [57]:
df.pay_rec.replace([9], np.nan, inplace=True)
df.pay_rec.value_counts().sort_index()


Out[57]:
1    1530635
2    1663943
3     153560
4     171457
Name: pay_rec, dtype: int64

In [58]:
var = 'pay_rec'
df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)


Out[58]:
         boy
pay_rec
1        104
2        105
3        106
4        106

Sex of baby


In [59]:
df.sex.value_counts().sort_index()


Out[59]:
F    1923390
M    2017374
Name: sex, dtype: int64

Regression models

Here are some functions I'll use to interpret the results of logistic regression:


In [60]:
def logodds_to_ratio(logodds):
    """Convert from log odds to sex ratio (boys per 100 girls)."""
    odds = np.exp(logodds)
    return 100 * odds

def summarize(results):
    """Summarize parameters in terms of birth ratio."""
    inter_or = results.params['Intercept']
    inter_rat = logodds_to_ratio(inter_or)
    
    for value, lor in results.params.items():
        if value=='Intercept':
            continue
        
        rat = logodds_to_ratio(inter_or + lor)
        code = '*' if results.pvalues[value] < 0.05 else ' '
        
        print('%-20s   %0.1f   %0.1f' % (value, inter_rat, rat), code)
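
To see what `summarize` prints, here is the arithmetic for a single coefficient, using the intercept and `mager9` coefficient reported in the first model below (0.0504 and -0.0006). This is a standalone sketch, with `math.exp` standing in for `np.exp`, and it uses the rounded coefficients from the summary table:

```python
import math

def logodds_to_ratio(logodds):
    """Convert log odds of a boy to a sex ratio (boys per 100 girls)."""
    return 100 * math.exp(logodds)

intercept = 0.0504   # log odds of a boy when mager9 == 0
slope = -0.0006      # change in log odds per age category

baseline = logodds_to_ratio(intercept)
shifted = logodds_to_ratio(intercept + slope)
print('%0.1f   %0.1f' % (baseline, shifted))
```

The two numbers are the ratio at the intercept and the ratio after adding one coefficient, which is how each printed row should be read. For a categorical variable, the second number is the ratio for that category; for a numeric variable, it is the ratio after a one-unit increase.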

Now I'll run models with each variable, one at a time.

Mother's age seems to have no predictive value:


In [61]:
model = smf.logit('boy ~ mager9', data=df)    
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692863
         Iterations 3
mager9                 105.2   105.1  
Out[61]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3940764
Model: Logit Df Residuals: 3940762
Method: MLE Df Model: 1
Date: Tue, 17 May 2016 Pseudo R-squ.: 1.110e-07
Time: 16:24:34 Log-Likelihood: -2.7304e+06
converged: True LL-Null: -2.7304e+06
LLR p-value: 0.4363
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0504 0.004 13.906 0.000 0.043 0.058
mager9 -0.0006 0.001 -0.778 0.436 -0.002 0.001
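
The `P>|z|` column, which `summarize` compares with 0.05 to decide whether to print a `*`, is a two-sided p-value for the z statistic. Here is a sketch that approximately reproduces it for `mager9`, assuming a standard normal reference distribution and using only the standard library; `two_sided_p` is a hypothetical helper, and the input z is the rounded value from the table:

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a z statistic under a standard normal."""
    return math.erfc(abs(z) / math.sqrt(2))

p = two_sided_p(-0.778)
print(round(p, 3))
```

The result agrees with the reported 0.436 up to rounding of z, and is far above the 0.05 threshold.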

The estimated ratio for young mothers is higher, and the ratio for older mothers is lower, but neither difference is statistically significant.


In [62]:
model = smf.logit('boy ~ youngm + oldm', data=df)    
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692862
         Iterations 3
youngm[T.True]         104.9   105.4  
oldm[T.True]           104.9   104.1  
Out[62]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3940764
Model: Logit Df Residuals: 3940761
Method: MLE Df Model: 2
Date: Tue, 17 May 2016 Pseudo R-squ.: 6.086e-07
Time: 16:24:39 Log-Likelihood: -2.7304e+06
converged: True LL-Null: -2.7304e+06
LLR p-value: 0.1898
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0476 0.001 44.817 0.000 0.046 0.050
youngm[T.True] 0.0047 0.004 1.189 0.234 -0.003 0.012
oldm[T.True] -0.0078 0.006 -1.323 0.186 -0.019 0.004

Residence status has little predictive value either, although one category (restatus 2) differs significantly from the baseline.


In [63]:
model = smf.logit('boy ~ C(restatus)', data=df)    
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692861
         Iterations 3
C(restatus)[T.2]       104.7   105.5 *
C(restatus)[T.3]       104.7   105.3  
C(restatus)[T.4]       104.7   106.2  
Out[63]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3940764
Model: Logit Df Residuals: 3940760
Method: MLE Df Model: 3
Date: Tue, 17 May 2016 Pseudo R-squ.: 2.140e-06
Time: 16:25:07 Log-Likelihood: -2.7304e+06
converged: True LL-Null: -2.7304e+06
LLR p-value: 0.008550
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0456 0.001 38.456 0.000 0.043 0.048
C(restatus)[T.2] 0.0077 0.002 3.329 0.001 0.003 0.012
C(restatus)[T.3] 0.0057 0.007 0.819 0.413 -0.008 0.019
C(restatus)[T.4] 0.0143 0.022 0.662 0.508 -0.028 0.057

Mother's race seems to have predictive value. Relative to white mothers, black mothers have more girls and Asian mothers have more boys; the difference for Native American mothers is not statistically significant.


In [64]:
model = smf.logit('boy ~ C(mbrace)', data=df)    
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692855
         Iterations 3
C(mbrace)[T.2]         105.1   103.2 *
C(mbrace)[T.3]         105.1   105.5  
C(mbrace)[T.4]         105.1   106.5 *
Out[64]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3940764
Model: Logit Df Residuals: 3940760
Method: MLE Df Model: 3
Date: Tue, 17 May 2016 Pseudo R-squ.: 1.126e-05
Time: 16:25:34 Log-Likelihood: -2.7304e+06
converged: True LL-Null: -2.7304e+06
LLR p-value: 2.843e-13
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0497 0.001 43.014 0.000 0.047 0.052
C(mbrace)[T.2] -0.0184 0.003 -6.659 0.000 -0.024 -0.013
C(mbrace)[T.3] 0.0039 0.009 0.412 0.680 -0.015 0.022
C(mbrace)[T.4] 0.0132 0.004 3.266 0.001 0.005 0.021

Hispanic mothers have more girls.


In [65]:
model = smf.logit('boy ~ mhisp', data=df)    
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692863
         Iterations 3
mhisp                  105.1   104.3 *
Out[65]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3913139
Model: Logit Df Residuals: 3913137
Method: MLE Df Model: 1
Date: Tue, 17 May 2016 Pseudo R-squ.: 1.830e-06
Time: 16:25:39 Log-Likelihood: -2.7113e+06
converged: True LL-Null: -2.7113e+06
LLR p-value: 0.001634
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0494 0.001 42.780 0.000 0.047 0.052
mhisp -0.0075 0.002 -3.150 0.002 -0.012 -0.003

If the mother is married, or unmarried with paternity acknowledged, the sex ratio is higher (more boys).


In [66]:
model = smf.logit('boy ~ C(mar_p)', data=df)    
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692857
         Iterations 3
C(mar_p)[T.Y]          102.9   105.2 *
Out[66]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3557359
Model: Logit Df Residuals: 3557357
Method: MLE Df Model: 1
Date: Tue, 17 May 2016 Pseudo R-squ.: 8.735e-06
Time: 16:26:06 Log-Likelihood: -2.4647e+06
converged: True LL-Null: -2.4648e+06
LLR p-value: 5.309e-11
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0289 0.003 9.480 0.000 0.023 0.035
C(mar_p)[T.Y] 0.0214 0.003 6.562 0.000 0.015 0.028

Being unmarried predicts more girls.


In [67]:
model = smf.logit('boy ~ C(dmar)', data=df)    
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692860
         Iterations 3
C(dmar)[T.2]           105.3   104.3 *
Out[67]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3940764
Model: Logit Df Residuals: 3940762
Method: MLE Df Model: 1
Date: Tue, 17 May 2016 Pseudo R-squ.: 4.091e-06
Time: 16:26:33 Log-Likelihood: -2.7304e+06
converged: True LL-Null: -2.7304e+06
LLR p-value: 2.286e-06
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0516 0.001 39.508 0.000 0.049 0.054
C(dmar)[T.2] -0.0097 0.002 -4.726 0.000 -0.014 -0.006

Each level of mother's education predicts a small increase in the probability of a boy.


In [68]:
model = smf.logit('boy ~ meduc', data=df)    
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692863
         Iterations 3
meduc                  103.9   104.1 *
Out[68]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3522597
Model: Logit Df Residuals: 3522595
Method: MLE Df Model: 1
Date: Tue, 17 May 2016 Pseudo R-squ.: 2.858e-06
Time: 16:26:38 Log-Likelihood: -2.4407e+06
converged: True LL-Null: -2.4407e+06
LLR p-value: 0.0001875
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0379 0.003 13.590 0.000 0.032 0.043
meduc 0.0023 0.001 3.735 0.000 0.001 0.003

In [69]:
model = smf.logit('boy ~ lowed', data=df)    
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692863
         Iterations 3
lowed                  105.0   104.1 *
Out[69]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3522597
Model: Logit Df Residuals: 3522595
Method: MLE Df Model: 1
Date: Tue, 17 May 2016 Pseudo R-squ.: 1.737e-06
Time: 16:26:43 Log-Likelihood: -2.4407e+06
converged: True LL-Null: -2.4407e+06
LLR p-value: 0.003591
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0489 0.001 42.080 0.000 0.047 0.051
lowed -0.0085 0.003 -2.912 0.004 -0.014 -0.003

Older fathers are slightly more likely to have girls (but this apparent effect could be due to chance).


In [70]:
model = smf.logit('boy ~ fagerrec11', data=df)    
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692832
         Iterations 3
fagerrec11             106.0   105.8 *
Out[70]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3443649
Model: Logit Df Residuals: 3443647
Method: MLE Df Model: 1
Date: Tue, 17 May 2016 Pseudo R-squ.: 1.017e-06
Time: 16:26:47 Log-Likelihood: -2.3859e+06
converged: True LL-Null: -2.3859e+06
LLR p-value: 0.02764
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0583 0.004 15.100 0.000 0.051 0.066
fagerrec11 -0.0017 0.001 -2.202 0.028 -0.003 -0.000

In [71]:
model = smf.logit('boy ~ youngf + oldf', data=df)    
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692831
         Iterations 3
youngf                 105.2   106.5  
oldf                   105.2   103.5 *
Out[71]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3443649
Model: Logit Df Residuals: 3443646
Method: MLE Df Model: 2
Date: Tue, 17 May 2016 Pseudo R-squ.: 2.423e-06
Time: 16:26:51 Log-Likelihood: -2.3859e+06
converged: True LL-Null: -2.3859e+06
LLR p-value: 0.003089
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0504 0.001 45.235 0.000 0.048 0.053
youngf 0.0127 0.007 1.908 0.056 -0.000 0.026
oldf -0.0160 0.006 -2.752 0.006 -0.027 -0.005

Predictions based on father's race are similar to those based on mother's race: more girls for black fathers and more boys for Asian fathers. The difference for Native American fathers is not statistically significant.


In [72]:
model = smf.logit('boy ~ C(fbrace)', data=df)    
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692818
         Iterations 3
C(fbrace)[T.2.0]       105.4   103.5 *
C(fbrace)[T.3.0]       105.4   105.2  
C(fbrace)[T.4.0]       105.4   107.0 *
Out[72]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3201200
Model: Logit Df Residuals: 3201196
Method: MLE Df Model: 3
Date: Tue, 17 May 2016 Pseudo R-squ.: 1.082e-05
Time: 16:27:18 Log-Likelihood: -2.2179e+06
converged: True LL-Null: -2.2179e+06
LLR p-value: 2.121e-10
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0523 0.001 41.046 0.000 0.050 0.055
C(fbrace)[T.2.0] -0.0176 0.003 -5.566 0.000 -0.024 -0.011
C(fbrace)[T.3.0] -0.0012 0.011 -0.114 0.909 -0.022 0.020
C(fbrace)[T.4.0] 0.0153 0.004 3.455 0.001 0.007 0.024

If the father is Hispanic, that predicts more girls.


In [73]:
model = smf.logit('boy ~ fhisp', data=df)    
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692829
         Iterations 3
fhisp                  105.4   104.3 *
Out[73]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3396851
Model: Logit Df Residuals: 3396849
Method: MLE Df Model: 1
Date: Tue, 17 May 2016 Pseudo R-squ.: 3.902e-06
Time: 16:27:24 Log-Likelihood: -2.3534e+06
converged: True LL-Null: -2.3534e+06
LLR p-value: 1.824e-05
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0528 0.001 42.594 0.000 0.050 0.055
fhisp -0.0110 0.003 -4.286 0.000 -0.016 -0.006

Father's education level might predict more boys, but the apparent effect could be due to chance.


In [74]:
model = smf.logit('boy ~ feduc', data=df)    
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692828
         Iterations 3
feduc                  104.5   104.6 *
Out[74]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3029948
Model: Logit Df Residuals: 3029946
Method: MLE Df Model: 1
Date: Tue, 17 May 2016 Pseudo R-squ.: 1.471e-06
Time: 16:27:28 Log-Likelihood: -2.0992e+06
converged: True LL-Null: -2.0992e+06
LLR p-value: 0.01294
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0437 0.003 14.857 0.000 0.038 0.049
feduc 0.0016 0.001 2.485 0.013 0.000 0.003

Babies with high birth order are slightly more likely to be girls.


In [75]:
model = smf.logit('boy ~ lbo_rec', data=df)    
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692860
         Iterations 3
lbo_rec                105.7   105.3 *
Out[75]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3922232
Model: Logit Df Residuals: 3922230
Method: MLE Df Model: 1
Date: Tue, 17 May 2016 Pseudo R-squ.: 4.349e-06
Time: 16:27:35 Log-Likelihood: -2.7176e+06
converged: True LL-Null: -2.7176e+06
LLR p-value: 1.163e-06
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0559 0.002 28.436 0.000 0.052 0.060
lbo_rec -0.0039 0.001 -4.862 0.000 -0.005 -0.002

In [76]:
model = smf.logit('boy ~ highbo', data=df)    
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692863
         Iterations 3
highbo                 104.9   104.0  
Out[76]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3922232
Model: Logit Df Residuals: 3922230
Method: MLE Df Model: 1
Date: Tue, 17 May 2016 Pseudo R-squ.: 6.688e-07
Time: 16:27:41 Log-Likelihood: -2.7176e+06
converged: True LL-Null: -2.7176e+06
LLR p-value: 0.05657
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0481 0.001 46.434 0.000 0.046 0.050
highbo -0.0089 0.005 -1.907 0.057 -0.018 0.000

Strangely, prenatal visits are associated with an increased probability of girls.


In [77]:
model = smf.logit('boy ~ previs', data=df)    
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692829
         Iterations 3
previs                 104.6   103.6 *
Out[77]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3822139
Model: Logit Df Residuals: 3822137
Method: MLE Df Model: 1
Date: Tue, 17 May 2016 Pseudo R-squ.: 5.759e-05
Time: 16:27:46 Log-Likelihood: -2.6481e+06
converged: True LL-Null: -2.6482e+06
LLR p-value: 2.660e-68
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0453 0.001 44.004 0.000 0.043 0.047
previs -0.0095 0.001 -17.463 0.000 -0.011 -0.008
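
To get a feel for the size of the fitted effect, the coefficients above can be turned into predicted sex ratios across the range of `previs` and compared with the raw ratios in Out[45]. This sketch uses the rounded coefficients (0.0453 and -0.0095), so the numbers are approximate; `predicted_ratio` is a hypothetical helper:

```python
import math

intercept = 0.0453   # log odds of a boy at previs == 0
slope = -0.0095      # change in log odds per prenatal-visit category

def predicted_ratio(previs):
    """Predicted boys per 100 girls at a given centered visit category."""
    return 100 * math.exp(intercept + slope * previs)

for previs in range(-6, 5):
    print('%3d   %0.1f' % (previs, predicted_ratio(previs)))
```

With these coefficients the predicted ratio falls from roughly 111 in the lowest category to roughly 101 in the highest, broadly in line with the raw ratios, which run from 109 down to 101.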

The effect seems to be non-linear at zero, so I'm adding a boolean for no prenatal visits.


In [78]:
model = smf.logit('boy ~ no_previs + previs', data=df)    
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692828
         Iterations 3
no_previs              104.7   102.4 *
previs                 104.7   103.6 *
Out[78]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3822139
Model: Logit Df Residuals: 3822136
Method: MLE Df Model: 2
Date: Tue, 17 May 2016 Pseudo R-squ.: 5.862e-05
Time: 16:27:52 Log-Likelihood: -2.6481e+06
converged: True LL-Null: -2.6482e+06
LLR p-value: 3.790e-68
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0455 0.001 44.041 0.000 0.043 0.048
no_previs -0.0216 0.009 -2.339 0.019 -0.040 -0.004
previs -0.0101 0.001 -17.064 0.000 -0.011 -0.009

If the mother received WIC, she is more likely to have a girl.


In [79]:
model = smf.logit('boy ~ wic', data=df)    
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692864
         Iterations 3
wic[T.Y]               105.2   104.4 *
Out[79]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3474883
Model: Logit Df Residuals: 3474881
Method: MLE Df Model: 1
Date: Tue, 17 May 2016 Pseudo R-squ.: 2.939e-06
Time: 16:28:18 Log-Likelihood: -2.4076e+06
converged: True LL-Null: -2.4076e+06
LLR p-value: 0.0001686
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0511 0.001 35.284 0.000 0.048 0.054
wic[T.Y] -0.0081 0.002 -3.762 0.000 -0.012 -0.004

Mother's height seems to have no predictive value.


In [80]:
model = smf.logit('boy ~ height', data=df)    
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692865
         Iterations 3
height                 102.6   102.7  
Out[80]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3493485
Model: Logit Df Residuals: 3493483
Method: MLE Df Model: 1
Date: Tue, 17 May 2016 Pseudo R-squ.: 1.650e-07
Time: 16:28:24 Log-Likelihood: -2.4205e+06
converged: True LL-Null: -2.4205e+06
LLR p-value: 0.3714
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0260 0.024 1.079 0.280 -0.021 0.073
height 0.0003 0.000 0.894 0.371 -0.000 0.001

In [81]:
model = smf.logit('boy ~ mtall + mshort', data=df)    
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692864
         Iterations 3
mtall                  104.9   104.8  
mshort                 104.9   103.6 *
Out[81]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3493485
Model: Logit Df Residuals: 3493482
Method: MLE Df Model: 2
Date: Tue, 17 May 2016 Pseudo R-squ.: 9.267e-07
Time: 16:28:30 Log-Likelihood: -2.4205e+06
converged: True LL-Null: -2.4205e+06
LLR p-value: 0.1061
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0480 0.001 43.296 0.000 0.046 0.050
mtall -0.0015 0.006 -0.242 0.809 -0.013 0.010
mshort -0.0122 0.006 -2.111 0.035 -0.024 -0.001

Mothers with higher BMI are more likely to have girls.


In [82]:
model = smf.logit('boy ~ bmi_r', data=df)    
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692864
         Iterations 3
bmi_r                  106.0   105.6 *
Out[82]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3401718
Model: Logit Df Residuals: 3401716
Method: MLE Df Model: 1
Date: Tue, 17 May 2016 Pseudo R-squ.: 3.511e-06
Time: 16:28:36 Log-Likelihood: -2.3569e+06
converged: True LL-Null: -2.3569e+06
LLR p-value: 4.736e-05
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0581 0.003 20.400 0.000 0.053 0.064
bmi_r -0.0038 0.001 -4.068 0.000 -0.006 -0.002

In [83]:
model = smf.logit('boy ~ obese', data=df)    
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692865
         Iterations 3
obese                  105.1   104.2 *
Out[83]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3401718
Model: Logit Df Residuals: 3401716
Method: MLE Df Model: 1
Date: Tue, 17 May 2016 Pseudo R-squ.: 1.997e-06
Time: 16:28:41 Log-Likelihood: -2.3569e+06
converged: True LL-Null: -2.3569e+06
LLR p-value: 0.002152
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0493 0.001 39.558 0.000 0.047 0.052
obese -0.0078 0.003 -3.068 0.002 -0.013 -0.003
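
The two numbers that summarize prints next to each variable appear to be sex ratios (boys per 100 girls) implied by the fitted coefficients. As a minimal sketch of that conversion, assuming a binary predictor and using the rounded coefficients from the obese model above (the helper name sex_ratio is hypothetical and not part of this notebook):

```python
import math

def sex_ratio(log_odds):
    """Convert the log-odds of a boy into boys per 100 girls."""
    return 100 * math.exp(log_odds)

# Rounded coefficients copied from the 'boy ~ obese' summary above.
intercept = 0.0493
coef_obese = -0.0078

baseline = sex_ratio(intercept)                 # obese == 0
with_obese = sex_ratio(intercept + coef_obese)  # obese == 1

print(round(baseline, 1), round(with_obese, 1))
```

Under these assumptions the two values should agree with the pair reported by summarize for this model, which suggests the first column is the baseline ratio and the second is the ratio after adding the coefficient.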

If payment was made by Medicaid, the baby is more likely to be a girl. Private insurance, self-payment, and other payment methods are associated with more boys.


In [84]:
model = smf.logit('boy ~ C(pay_rec)', data=df)    
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692861
         Iterations 3
C(pay_rec)[T.2.0]      104.4   105.1 *
C(pay_rec)[T.3.0]      104.4   106.4 *
C(pay_rec)[T.4.0]      104.4   105.7 *
Out[84]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3519595
Model: Logit Df Residuals: 3519591
Method: MLE Df Model: 3
Date: Tue, 17 May 2016 Pseudo R-squ.: 4.468e-06
Time: 16:29:09 Log-Likelihood: -2.4386e+06
converged: True LL-Null: -2.4386e+06
LLR p-value: 7.204e-05
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0428 0.002 26.465 0.000 0.040 0.046
C(pay_rec)[T.2.0] 0.0070 0.002 3.116 0.002 0.003 0.011
C(pay_rec)[T.3.0] 0.0195 0.005 3.633 0.000 0.009 0.030
C(pay_rec)[T.4.0] 0.0128 0.005 2.518 0.012 0.003 0.023

Adding controls

However, none of the previous results should be taken too seriously. We only tested one variable at a time, and many of these apparent effects disappear when we add control variables.

In particular, if we control for father's race and Hispanic origin, the mother's race has no additional predictive value.


In [85]:
formula = ('boy ~ C(fbrace) + fhisp + C(mbrace) + mhisp')
model = smf.logit(formula, data=df)
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692815
         Iterations 3
C(fbrace)[T.2.0]       105.7   103.1 *
C(fbrace)[T.3.0]       105.7   104.5  
C(fbrace)[T.4.0]       105.7   106.8  
C(mbrace)[T.2]         105.7   106.3  
C(mbrace)[T.3]         105.7   107.7  
C(mbrace)[T.4]         105.7   106.0  
fhisp                  105.7   104.7 *
mhisp                  105.7   105.2  
Out[85]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3178957
Model: Logit Df Residuals: 3178948
Method: MLE Df Model: 8
Date: Tue, 17 May 2016 Pseudo R-squ.: 1.589e-05
Time: 16:30:01 Log-Likelihood: -2.2024e+06
converged: True LL-Null: -2.2025e+06
LLR p-value: 4.910e-12
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0552 0.001 37.128 0.000 0.052 0.058
C(fbrace)[T.2.0] -0.0244 0.006 -4.319 0.000 -0.036 -0.013
C(fbrace)[T.3.0] -0.0112 0.012 -0.903 0.366 -0.036 0.013
C(fbrace)[T.4.0] 0.0101 0.007 1.382 0.167 -0.004 0.024
C(mbrace)[T.2] 0.0058 0.006 0.968 0.333 -0.006 0.018
C(mbrace)[T.3] 0.0187 0.013 1.434 0.152 -0.007 0.044
C(mbrace)[T.4] 0.0027 0.007 0.383 0.702 -0.011 0.016
fhisp -0.0090 0.004 -2.042 0.041 -0.018 -0.000
mhisp -0.0047 0.004 -1.076 0.282 -0.013 0.004
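
To see why the mother's variables count as having no additional predictive value here, it can help to recompute the p-values from the z statistics. This is a minimal sketch, assuming the P>|z| column is a two-sided test against a standard normal distribution; the helper name two_sided_p is hypothetical.

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

# z statistics copied from the summary table above.
z_values = {
    'C(mbrace)[T.2]': 0.968,
    'fhisp': -2.042,
    'mhisp': -1.076,
}

p_values = {name: two_sided_p(z) for name, z in z_values.items()}
for name, p in p_values.items():
    print(name, round(p, 3))
```

With the conventional 0.05 threshold, only fhisp among these three falls below the cutoff, and only barely.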

In fact, once we control for father's race and Hispanic origin, almost every other variable becomes statistically insignificant, including acknowledged paternity.


In [86]:
formula = ('boy ~ C(fbrace) + fhisp + mar_p')
model = smf.logit(formula, data=df)
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692814
         Iterations 3
C(fbrace)[T.2.0]       106.0   103.9 *
C(fbrace)[T.3.0]       106.0   105.7  
C(fbrace)[T.4.0]       106.0   107.0  
mar_p[T.Y]             106.0   105.7  
fhisp                  106.0   104.6 *
Out[86]:
Logit Regression Results
Dep. Variable: boy No. Observations: 2865016
Model: Logit Df Residuals: 2865010
Method: MLE Df Model: 5
Date: Tue, 17 May 2016 Pseudo R-squ.: 1.324e-05
Time: 16:30:50 Log-Likelihood: -1.9849e+06
converged: True LL-Null: -1.9850e+06
LLR p-value: 4.153e-10
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0579 0.015 3.834 0.000 0.028 0.088
C(fbrace)[T.2.0] -0.0195 0.003 -5.734 0.000 -0.026 -0.013
C(fbrace)[T.3.0] -0.0027 0.012 -0.232 0.816 -0.026 0.020
C(fbrace)[T.4.0] 0.0093 0.005 1.936 0.053 -0.000 0.019
mar_p[T.Y] -0.0023 0.015 -0.156 0.876 -0.032 0.027
fhisp -0.0128 0.003 -4.120 0.000 -0.019 -0.007

In the output below, however, marital status (dmar) is not statistically significant either (p = 0.317).


In [87]:
formula = ('boy ~ C(fbrace) + fhisp + dmar')
model = smf.logit(formula, data=df)
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692816
         Iterations 3
C(fbrace)[T.2.0]       105.3   103.2 *
C(fbrace)[T.3.0]       105.3   105.1  
C(fbrace)[T.4.0]       105.3   106.7 *
fhisp                  105.3   103.9 *
dmar                   105.3   105.6  
Out[87]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3182886
Model: Logit Df Residuals: 3182880
Method: MLE Df Model: 5
Date: Tue, 17 May 2016 Pseudo R-squ.: 1.516e-05
Time: 16:31:20 Log-Likelihood: -2.2052e+06
converged: True LL-Null: -2.2052e+06
LLR p-value: 4.631e-13
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0521 0.003 15.120 0.000 0.045 0.059
C(fbrace)[T.2.0] -0.0210 0.003 -6.234 0.000 -0.028 -0.014
C(fbrace)[T.3.0] -0.0021 0.011 -0.190 0.849 -0.023 0.019
C(fbrace)[T.4.0] 0.0125 0.004 2.794 0.005 0.004 0.021
fhisp -0.0133 0.003 -4.459 0.000 -0.019 -0.007
dmar 0.0026 0.003 1.001 0.317 -0.002 0.008

The effect of education disappears.


In [88]:
formula = ('boy ~ C(fbrace) + fhisp + lowed')
model = smf.logit(formula, data=df)
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692813
         Iterations 3
C(fbrace)[T.2.0]       105.7   103.6 *
C(fbrace)[T.3.0]       105.7   105.4  
C(fbrace)[T.4.0]       105.7   106.7  
fhisp                  105.7   104.2 *
lowed                  105.7   106.2  
Out[88]:
Logit Regression Results
Dep. Variable: boy No. Observations: 2844274
Model: Logit Df Residuals: 2844268
Method: MLE Df Model: 5
Date: Tue, 17 May 2016 Pseudo R-squ.: 1.362e-05
Time: 16:31:47 Log-Likelihood: -1.9706e+06
converged: True LL-Null: -1.9706e+06
LLR p-value: 2.422e-10
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0554 0.002 35.743 0.000 0.052 0.058
C(fbrace)[T.2.0] -0.0197 0.003 -5.751 0.000 -0.026 -0.013
C(fbrace)[T.3.0] -0.0025 0.012 -0.215 0.830 -0.026 0.021
C(fbrace)[T.4.0] 0.0091 0.005 1.882 0.060 -0.000 0.019
fhisp -0.0141 0.003 -4.359 0.000 -0.020 -0.008
lowed 0.0045 0.004 1.196 0.232 -0.003 0.012

The effect of birth order disappears.


In [89]:
formula = ('boy ~ C(fbrace) + fhisp + highbo')
model = smf.logit(formula, data=df)
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692816
         Iterations 3
C(fbrace)[T.2.0]       105.7   103.6 *
C(fbrace)[T.3.0]       105.7   105.7  
C(fbrace)[T.4.0]       105.7   107.0 *
fhisp                  105.7   104.3 *
highbo                 105.7   105.5  
Out[89]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3170216
Model: Logit Df Residuals: 3170210
Method: MLE Df Model: 5
Date: Tue, 17 May 2016 Pseudo R-squ.: 1.527e-05
Time: 16:32:15 Log-Likelihood: -2.1964e+06
converged: True LL-Null: -2.1964e+06
LLR p-value: 4.151e-13
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0553 0.001 37.705 0.000 0.052 0.058
C(fbrace)[T.2.0] -0.0202 0.003 -6.216 0.000 -0.027 -0.014
C(fbrace)[T.3.0] -0.0002 0.011 -0.016 0.987 -0.022 0.021
C(fbrace)[T.4.0] 0.0123 0.004 2.735 0.006 0.003 0.021
fhisp -0.0129 0.003 -4.410 0.000 -0.019 -0.007
highbo -0.0022 0.005 -0.404 0.686 -0.013 0.009

WIC is no longer associated with more girls.


In [90]:
formula = ('boy ~ C(fbrace) + fhisp + wic')
model = smf.logit(formula, data=df)
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692815
         Iterations 3
C(fbrace)[T.2.0]       105.7   103.6 *
C(fbrace)[T.3.0]       105.7   105.3  
C(fbrace)[T.4.0]       105.7   106.8 *
wic[T.Y]               105.7   105.7  
fhisp                  105.7   104.3 *
Out[90]:
Logit Regression Results
Dep. Variable: boy No. Observations: 2798726
Model: Logit Df Residuals: 2798720
Method: MLE Df Model: 5
Date: Tue, 17 May 2016 Pseudo R-squ.: 1.387e-05
Time: 16:33:05 Log-Likelihood: -1.9390e+06
converged: True LL-Null: -1.9390e+06
LLR p-value: 2.328e-10
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0554 0.002 32.654 0.000 0.052 0.059
C(fbrace)[T.2.0] -0.0200 0.004 -5.620 0.000 -0.027 -0.013
C(fbrace)[T.3.0] -0.0036 0.012 -0.299 0.765 -0.027 0.020
C(fbrace)[T.4.0] 0.0102 0.005 2.103 0.035 0.001 0.020
wic[T.Y] 0.0004 0.003 0.151 0.880 -0.005 0.006
fhisp -0.0129 0.003 -3.938 0.000 -0.019 -0.006

The effect of obesity disappears.


In [91]:
formula = ('boy ~ C(fbrace) + fhisp + obese')
model = smf.logit(formula, data=df)
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692816
         Iterations 3
C(fbrace)[T.2.0]       105.8   103.7 *
C(fbrace)[T.3.0]       105.8   105.8  
C(fbrace)[T.4.0]       105.8   106.7  
fhisp                  105.8   104.5 *
obese                  105.8   105.3  
Out[91]:
Logit Regression Results
Dep. Variable: boy No. Observations: 2744537
Model: Logit Df Residuals: 2744531
Method: MLE Df Model: 5
Date: Tue, 17 May 2016 Pseudo R-squ.: 1.454e-05
Time: 16:33:32 Log-Likelihood: -1.9015e+06
converged: True LL-Null: -1.9015e+06
LLR p-value: 1.139e-10
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0566 0.002 33.872 0.000 0.053 0.060
C(fbrace)[T.2.0] -0.0200 0.004 -5.681 0.000 -0.027 -0.013
C(fbrace)[T.3.0] -0.0001 0.012 -0.009 0.993 -0.024 0.023
C(fbrace)[T.4.0] 0.0078 0.005 1.589 0.112 -0.002 0.018
fhisp -0.0128 0.003 -4.036 0.000 -0.019 -0.007
obese -0.0046 0.003 -1.588 0.112 -0.010 0.001

The effect of payment method is diminished, but self-payment is still associated with more boys.


In [92]:
formula = ('boy ~ C(fbrace) + fhisp + C(pay_rec)')
model = smf.logit(formula, data=df)
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692811
         Iterations 3
C(fbrace)[T.2.0]       106.0   103.8 *
C(fbrace)[T.3.0]       106.0   105.3  
C(fbrace)[T.4.0]       106.0   106.9  
C(pay_rec)[T.2.0]      106.0   105.4  
C(pay_rec)[T.3.0]      106.0   108.2 *
C(pay_rec)[T.4.0]      106.0   106.9  
fhisp                  106.0   104.3 *
Out[92]:
Logit Regression Results
Dep. Variable: boy No. Observations: 2833065
Model: Logit Df Residuals: 2833057
Method: MLE Df Model: 7
Date: Tue, 17 May 2016 Pseudo R-squ.: 1.928e-05
Time: 16:34:22 Log-Likelihood: -1.9628e+06
converged: True LL-Null: -1.9628e+06
LLR p-value: 1.047e-13
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0579 0.002 23.562 0.000 0.053 0.063
C(fbrace)[T.2.0] -0.0207 0.004 -5.863 0.000 -0.028 -0.014
C(fbrace)[T.3.0] -0.0065 0.012 -0.548 0.584 -0.030 0.017
C(fbrace)[T.4.0] 0.0089 0.005 1.858 0.063 -0.000 0.018
C(pay_rec)[T.2.0] -0.0051 0.003 -1.901 0.057 -0.010 0.000
C(pay_rec)[T.3.0] 0.0213 0.006 3.366 0.001 0.009 0.034
C(pay_rec)[T.4.0] 0.0084 0.006 1.456 0.145 -0.003 0.020
fhisp -0.0161 0.003 -4.953 0.000 -0.022 -0.010

But the number of prenatal visits is still a strong predictor of more girls.


In [93]:
formula = ('boy ~ C(fbrace) + fhisp + previs')
model = smf.logit(formula, data=df)
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692772
         Iterations 3
C(fbrace)[T.2.0]       105.7   103.0 *
C(fbrace)[T.3.0]       105.7   104.8  
C(fbrace)[T.4.0]       105.7   106.9 *
fhisp                  105.7   104.0 *
previs                 105.7   104.5 *
Out[93]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3096464
Model: Logit Df Residuals: 3096458
Method: MLE Df Model: 5
Date: Tue, 17 May 2016 Pseudo R-squ.: 9.021e-05
Time: 16:34:51 Log-Likelihood: -2.1451e+06
converged: True LL-Null: -2.1453e+06
LLR p-value: 1.832e-81
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0552 0.001 37.584 0.000 0.052 0.058
C(fbrace)[T.2.0] -0.0255 0.003 -7.701 0.000 -0.032 -0.019
C(fbrace)[T.3.0] -0.0082 0.011 -0.747 0.455 -0.030 0.013
C(fbrace)[T.4.0] 0.0111 0.005 2.436 0.015 0.002 0.020
fhisp -0.0159 0.003 -5.363 0.000 -0.022 -0.010
previs -0.0115 0.001 -17.928 0.000 -0.013 -0.010
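
Although the z statistic for previs is large, the model as a whole explains very little of the variation in sex. As a rough check, assuming the "Pseudo R-squ." entry is McFadden's pseudo R-squared (one minus the ratio of the fitted log-likelihood to the null log-likelihood), we can recompute it from the rounded log-likelihoods in the table above:

```python
# Rounded log-likelihoods copied from the summary table above.
ll_model = -2.1451e6
ll_null = -2.1453e6

# McFadden's pseudo R-squared: 1 - LL(model) / LL(null).
pseudo_r2 = 1 - ll_model / ll_null
print(pseudo_r2)
```

Because the inputs are rounded to five significant digits, the result only roughly matches the reported 9.021e-05, but the order of magnitude is the same. With millions of observations, a highly significant coefficient and a tiny pseudo R-squared are not in conflict.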

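And the estimated slope is slightly larger in magnitude (-0.0118 versus -0.0115) if we add a boolean to capture the nonlinearity at 0 visits, although the boolean itself is not statistically significant (p = 0.138).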


In [94]:
formula = ('boy ~ C(fbrace) + fhisp + previs + no_previs')
model = smf.logit(formula, data=df)
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692772
         Iterations 3
C(fbrace)[T.2.0]       105.7   103.0 *
C(fbrace)[T.3.0]       105.7   104.8  
C(fbrace)[T.4.0]       105.7   106.9 *
fhisp                  105.7   104.0 *
previs                 105.7   104.5 *
no_previs              105.7   103.8  
Out[94]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3096464
Model: Logit Df Residuals: 3096457
Method: MLE Df Model: 6
Date: Tue, 17 May 2016 Pseudo R-squ.: 9.072e-05
Time: 16:35:20 Log-Likelihood: -2.1451e+06
converged: True LL-Null: -2.1453e+06
LLR p-value: 5.706e-81
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0554 0.001 37.601 0.000 0.052 0.058
C(fbrace)[T.2.0] -0.0254 0.003 -7.688 0.000 -0.032 -0.019
C(fbrace)[T.3.0] -0.0083 0.011 -0.754 0.451 -0.030 0.013
C(fbrace)[T.4.0] 0.0110 0.005 2.423 0.015 0.002 0.020
fhisp -0.0159 0.003 -5.344 0.000 -0.022 -0.010
previs -0.0118 0.001 -17.440 0.000 -0.013 -0.010
no_previs -0.0183 0.012 -1.485 0.138 -0.042 0.006

More controls

Now if we control for father's race and Hispanic origin as well as number of prenatal visits, the coefficient for marital status drops to essentially zero.


In [95]:
formula = ('boy ~ C(fbrace) + fhisp + previs + dmar')
model = smf.logit(formula, data=df)
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692772
         Iterations 3
C(fbrace)[T.2.0]       105.7   103.0 *
C(fbrace)[T.3.0]       105.7   104.8  
C(fbrace)[T.4.0]       105.7   106.9 *
fhisp                  105.7   104.0 *
previs                 105.7   104.5 *
dmar                   105.7   105.7  
Out[95]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3096464
Model: Logit Df Residuals: 3096457
Method: MLE Df Model: 6
Date: Tue, 17 May 2016 Pseudo R-squ.: 9.021e-05
Time: 16:35:49 Log-Likelihood: -2.1451e+06
converged: True LL-Null: -2.1453e+06
LLR p-value: 1.697e-80
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0553 0.003 15.816 0.000 0.048 0.062
C(fbrace)[T.2.0] -0.0254 0.003 -7.384 0.000 -0.032 -0.019
C(fbrace)[T.3.0] -0.0082 0.011 -0.743 0.458 -0.030 0.013
C(fbrace)[T.4.0] 0.0111 0.005 2.431 0.015 0.002 0.020
fhisp -0.0159 0.003 -5.242 0.000 -0.022 -0.010
previs -0.0115 0.001 -17.900 0.000 -0.013 -0.010
dmar -8.777e-05 0.003 -0.034 0.973 -0.005 0.005

The effect of payment method disappears.


In [96]:
formula = ('boy ~ C(fbrace) + fhisp + previs + C(pay_rec)')
model = smf.logit(formula, data=df)
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692775
         Iterations 3
C(fbrace)[T.2.0]       105.7   103.1 *
C(fbrace)[T.3.0]       105.7   104.3  
C(fbrace)[T.4.0]       105.7   106.6  
C(pay_rec)[T.2.0]      105.7   105.6  
C(pay_rec)[T.3.0]      105.7   106.8  
C(pay_rec)[T.4.0]      105.7   106.6  
fhisp                  105.7   103.9 *
previs                 105.7   104.6 *
Out[96]:
Logit Regression Results
Dep. Variable: boy No. Observations: 2752104
Model: Logit Df Residuals: 2752095
Method: MLE Df Model: 8
Date: Tue, 17 May 2016 Pseudo R-squ.: 8.519e-05
Time: 16:36:40 Log-Likelihood: -1.9066e+06
converged: True LL-Null: -1.9068e+06
LLR p-value: 2.081e-65
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0557 0.002 22.311 0.000 0.051 0.061
C(fbrace)[T.2.0] -0.0249 0.004 -6.923 0.000 -0.032 -0.018
C(fbrace)[T.3.0] -0.0132 0.012 -1.089 0.276 -0.037 0.011
C(fbrace)[T.4.0] 0.0081 0.005 1.651 0.099 -0.002 0.018
C(pay_rec)[T.2.0] -0.0011 0.003 -0.397 0.691 -0.006 0.004
C(pay_rec)[T.3.0] 0.0106 0.006 1.642 0.101 -0.002 0.023
C(pay_rec)[T.4.0] 0.0083 0.006 1.412 0.158 -0.003 0.020
fhisp -0.0170 0.003 -5.174 0.000 -0.023 -0.011
previs -0.0109 0.001 -15.813 0.000 -0.012 -0.010

Here's a version with the addition of a boolean for no prenatal visits.


In [97]:
formula = ('boy ~ C(fbrace) + fhisp + previs + no_previs')
model = smf.logit(formula, data=df)
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692772
         Iterations 3
C(fbrace)[T.2.0]       105.7   103.0 *
C(fbrace)[T.3.0]       105.7   104.8  
C(fbrace)[T.4.0]       105.7   106.9 *
fhisp                  105.7   104.0 *
previs                 105.7   104.5 *
no_previs              105.7   103.8  
Out[97]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3096464
Model: Logit Df Residuals: 3096457
Method: MLE Df Model: 6
Date: Tue, 17 May 2016 Pseudo R-squ.: 9.072e-05
Time: 16:37:07 Log-Likelihood: -2.1451e+06
converged: True LL-Null: -2.1453e+06
LLR p-value: 5.706e-81
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0554 0.001 37.601 0.000 0.052 0.058
C(fbrace)[T.2.0] -0.0254 0.003 -7.688 0.000 -0.032 -0.019
C(fbrace)[T.3.0] -0.0083 0.011 -0.754 0.451 -0.030 0.013
C(fbrace)[T.4.0] 0.0110 0.005 2.423 0.015 0.002 0.020
fhisp -0.0159 0.003 -5.344 0.000 -0.022 -0.010
previs -0.0118 0.001 -17.440 0.000 -0.013 -0.010
no_previs -0.0183 0.012 -1.485 0.138 -0.042 0.006

Now, surprisingly, the mother's age has a small negative coefficient, although it is not statistically significant here (p = 0.222).


In [98]:
formula = ('boy ~ C(fbrace) + fhisp + previs + no_previs + mager9')
model = smf.logit(formula, data=df)
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692772
         Iterations 3
C(fbrace)[T.2.0]       106.3   103.5 *
C(fbrace)[T.3.0]       106.3   105.3  
C(fbrace)[T.4.0]       106.3   107.5 *
fhisp                  106.3   104.5 *
previs                 106.3   105.0 *
no_previs              106.3   104.3  
mager9                 106.3   106.1  
Out[98]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3096464
Model: Logit Df Residuals: 3096456
Method: MLE Df Model: 7
Date: Tue, 17 May 2016 Pseudo R-squ.: 9.107e-05
Time: 16:37:37 Log-Likelihood: -2.1451e+06
converged: True LL-Null: -2.1453e+06
LLR p-value: 2.300e-80
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0606 0.005 13.327 0.000 0.052 0.070
C(fbrace)[T.2.0] -0.0258 0.003 -7.770 0.000 -0.032 -0.019
C(fbrace)[T.3.0] -0.0089 0.011 -0.805 0.421 -0.030 0.013
C(fbrace)[T.4.0] 0.0114 0.005 2.505 0.012 0.002 0.020
fhisp -0.0162 0.003 -5.437 0.000 -0.022 -0.010
previs -0.0117 0.001 -17.323 0.000 -0.013 -0.010
no_previs -0.0182 0.012 -1.479 0.139 -0.042 0.006
mager9 -0.0012 0.001 -1.221 0.222 -0.003 0.001

So does the father's age, and here the effect is borderline significant (p = 0.041). Both age effects are small.


In [99]:
formula = ('boy ~ C(fbrace) + fhisp + previs + no_previs + fagerrec11')
model = smf.logit(formula, data=df)
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692772
         Iterations 3
C(fbrace)[T.2.0]       106.6   103.9 *
C(fbrace)[T.3.0]       106.6   105.6  
C(fbrace)[T.4.0]       106.6   107.8 *
fhisp                  106.6   104.8 *
previs                 106.6   105.3 *
no_previs              106.6   104.5  
fagerrec11             106.6   106.4 *
Out[99]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3088921
Model: Logit Df Residuals: 3088913
Method: MLE Df Model: 7
Date: Tue, 17 May 2016 Pseudo R-squ.: 9.189e-05
Time: 16:38:07 Log-Likelihood: -2.1399e+06
converged: True LL-Null: -2.1401e+06
LLR p-value: 6.432e-81
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0637 0.004 14.796 0.000 0.055 0.072
C(fbrace)[T.2.0] -0.0258 0.003 -7.763 0.000 -0.032 -0.019
C(fbrace)[T.3.0] -0.0094 0.011 -0.856 0.392 -0.031 0.012
C(fbrace)[T.4.0] 0.0116 0.005 2.527 0.012 0.003 0.021
fhisp -0.0165 0.003 -5.530 0.000 -0.022 -0.011
previs -0.0118 0.001 -17.371 0.000 -0.013 -0.010
no_previs -0.0194 0.012 -1.569 0.117 -0.044 0.005
fagerrec11 -0.0017 0.001 -2.046 0.041 -0.003 -7.15e-05

What's up with prenatal visits?

The predictive power of prenatal visits is still surprising to me. To make sure we've controlled for race, I'll select cases where both parents are white:


In [100]:
white = df[(df.mbrace==1) & (df.fbrace==1)]
len(white)


Out[100]:
2373016

And compute sex ratios for each level of previs:


In [101]:
var = 'previs'
white[[var, 'boy']].groupby(var).aggregate(series_to_ratio)


Out[101]:
previs  boy
-6      111
-5      110
-4      110
-3      109
-2      109
-1      107
0       105
1       103
2       103
3       101
4       102

The effect holds up. People with fewer than average prenatal visits are substantially more likely to have boys.
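
The per-level ratios above come from series_to_ratio, a helper defined earlier in the notebook and not shown in this section. As a minimal stand-in, assuming it reports boys per 100 girls within each group, the same computation can be sketched in plain Python. The function name sex_ratio and the records below are made up for illustration.

```python
from collections import defaultdict

def sex_ratio(boys):
    """Boys per 100 girls for a sequence of 0/1 flags.

    Raises ZeroDivisionError if the group contains no girls.
    """
    n_boys = sum(1 for b in boys if b)
    n_girls = len(boys) - n_boys
    return 100 * n_boys / n_girls

# Made-up (previs level, boy flag) records, for illustration only.
records = [(-1, 1), (-1, 1), (-1, 0), (0, 1), (0, 0), (1, 1), (1, 0), (1, 0)]

# Group the boy flags by previs level, then compute one ratio per level.
groups = defaultdict(list)
for level, boy in records:
    groups[level].append(boy)

ratios = {level: sex_ratio(flags) for level, flags in sorted(groups.items())}
print(ratios)
```

With real data each group holds many thousands of births, so the ratios are far less extreme than in this toy example.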


In [102]:
formula = ('boy ~ previs + no_previs')
model = smf.logit(formula, data=white)
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692766
         Iterations 3
previs                 105.3   104.0 *
no_previs              105.3   103.4  
Out[102]:
Logit Regression Results
Dep. Variable: boy No. Observations: 2315682
Model: Logit Df Residuals: 2315679
Method: MLE Df Model: 2
Date: Tue, 17 May 2016 Pseudo R-squ.: 7.429e-05
Time: 16:38:13 Log-Likelihood: -1.6042e+06
converged: True LL-Null: -1.6043e+06
LLR p-value: 1.719e-52
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0516 0.001 39.039 0.000 0.049 0.054
previs -0.0119 0.001 -14.973 0.000 -0.013 -0.010
no_previs -0.0180 0.015 -1.188 0.235 -0.048 0.012

In [103]:
inter = results.params['Intercept']
slope = results.params['previs']
inter, slope


Out[103]:
(0.051571918028023904, -0.01191111958144815)

In [104]:
previs = np.arange(-5, 5)
logodds = inter + slope * previs
odds = np.exp(logodds)
odds * 100


Out[104]:
array([ 111.75374016,  110.43052413,  109.1229756 ,  107.83090904,
        106.55414115,  105.29249079,  104.04577894,  102.81382875,
        101.59646541,  100.39351622])
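
One way to judge the linear fit is to set these predictions beside the observed ratios from the groupby table. The sketch below assumes the observed values can be read directly from Out[101], where they are rounded to whole numbers. Like the calculation above, it ignores the no_previs term.

```python
import math

# Full-precision parameters copied from Out[103].
inter = 0.051571918028023904
slope = -0.01191111958144815

# Observed ratios (rounded) copied from the groupby table in Out[101].
observed = {-5: 110, -4: 110, -3: 109, -2: 109, -1: 107,
            0: 105, 1: 103, 2: 103, 3: 101, 4: 102}

# Predicted boys per 100 girls at each level of previs.
predicted = {x: 100 * math.exp(inter + slope * x) for x in observed}

# Gap between the fitted line and the observed group ratios.
diffs = {x: predicted[x] - observed[x] for x in observed}
max_abs_diff = max(abs(d) for d in diffs.values())
print(round(max_abs_diff, 2))
```

The largest gap stays under two points on the boys-per-100-girls scale, so the straight-line model tracks the observed ratios reasonably well over this range.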

In [105]:
formula = ('boy ~ dmar')
model = smf.logit(formula, data=white)
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692808
         Iterations 3
dmar                   105.3   105.3  
Out[105]:
Logit Regression Results
Dep. Variable: boy No. Observations: 2373016
Model: Logit Df Residuals: 2373014
Method: MLE Df Model: 1
Date: Tue, 17 May 2016 Pseudo R-squ.: 1.132e-08
Time: 16:38:17 Log-Likelihood: -1.6440e+06
converged: True LL-Null: -1.6440e+06
LLR p-value: 0.8470
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0514 0.004 13.066 0.000 0.044 0.059
dmar 0.0006 0.003 0.193 0.847 -0.005 0.006

In [106]:
formula = ('boy ~ lowed')
model = smf.logit(formula, data=white)
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692803
         Iterations 3
lowed                  105.4   105.5  
Out[106]:
Logit Regression Results
Dep. Variable: boy No. Observations: 2128894
Model: Logit Df Residuals: 2128892
Method: MLE Df Model: 1
Date: Tue, 17 May 2016 Pseudo R-squ.: 1.716e-08
Time: 16:38:19 Log-Likelihood: -1.4749e+06
converged: True LL-Null: -1.4749e+06
LLR p-value: 0.8220
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0523 0.001 35.788 0.000 0.049 0.055
lowed 0.0009 0.004 0.225 0.822 -0.007 0.009

In [107]:
formula = ('boy ~ highbo')
model = smf.logit(formula, data=white)
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692808
         Iterations 3
highbo                 105.4   105.2  
Out[107]:
Logit Regression Results
Dep. Variable: boy No. Observations: 2364914
Model: Logit Df Residuals: 2364912
Method: MLE Df Model: 1
Date: Tue, 17 May 2016 Pseudo R-squ.: 1.675e-08
Time: 16:38:22 Log-Likelihood: -1.6384e+06
converged: True LL-Null: -1.6384e+06
LLR p-value: 0.8148
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0521 0.001 39.220 0.000 0.050 0.055
highbo -0.0015 0.006 -0.234 0.815 -0.014 0.011

In [108]:
formula = ('boy ~ wic')
model = smf.logit(formula, data=white)
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692806
         Iterations 3
wic[T.Y]               105.4   105.3  
Out[108]:
Logit Regression Results
Dep. Variable: boy No. Observations: 2096546
Model: Logit Df Residuals: 2096544
Method: MLE Df Model: 1
Date: Tue, 17 May 2016 Pseudo R-squ.: 1.034e-07
Time: 16:38:38 Log-Likelihood: -1.4525e+06
converged: True LL-Null: -1.4525e+06
LLR p-value: 0.5836
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0528 0.002 30.597 0.000 0.049 0.056
wic[T.Y] -0.0016 0.003 -0.548 0.584 -0.007 0.004

In [109]:
formula = ('boy ~ obese')
model = smf.logit(formula, data=white)
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692806
         Iterations 3
obese                  105.4   105.1  
Out[109]:
Logit Regression Results
Dep. Variable: boy No. Observations: 2061631
Model: Logit Df Residuals: 2061629
Method: MLE Df Model: 1
Date: Tue, 17 May 2016 Pseudo R-squ.: 3.055e-07
Time: 16:38:41 Log-Likelihood: -1.4283e+06
converged: True LL-Null: -1.4283e+06
LLR p-value: 0.3502
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0530 0.002 33.495 0.000 0.050 0.056
obese -0.0031 0.003 -0.934 0.350 -0.010 0.003

In [110]:
formula = ('boy ~ C(pay_rec)')
model = smf.logit(formula, data=white)
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692804
         Iterations 3
C(pay_rec)[T.2.0]      105.3   105.2  
C(pay_rec)[T.3.0]      105.3   106.8  
C(pay_rec)[T.4.0]      105.3   106.5  
Out[110]:
Logit Regression Results
Dep. Variable: boy No. Observations: 2118925
Model: Logit Df Residuals: 2118921
Method: MLE Df Model: 3
Date: Tue, 17 May 2016 Pseudo R-squ.: 2.490e-06
Time: 16:38:58 Log-Likelihood: -1.4680e+06
converged: True LL-Null: -1.4680e+06
LLR p-value: 0.06264
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0519 0.002 21.966 0.000 0.047 0.057
C(pay_rec)[T.2.0] -0.0012 0.003 -0.416 0.677 -0.007 0.005
C(pay_rec)[T.3.0] 0.0134 0.007 1.853 0.064 -0.001 0.028
C(pay_rec)[T.4.0] 0.0112 0.007 1.641 0.101 -0.002 0.024

In [111]:
formula = ('boy ~ mager9')
model = smf.logit(formula, data=white)
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692806
         Iterations 3
mager9                 106.9   106.5 *
Out[111]:
Logit Regression Results
Dep. Variable: boy No. Observations: 2373016
Model: Logit Df Residuals: 2373014
Method: MLE Df Model: 1
Date: Tue, 17 May 2016 Pseudo R-squ.: 2.743e-06
Time: 16:39:02 Log-Likelihood: -1.6440e+06
converged: True LL-Null: -1.6440e+06
LLR p-value: 0.002671
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0666 0.005 13.343 0.000 0.057 0.076
mager9 -0.0033 0.001 -3.003 0.003 -0.005 -0.001

In [112]:
formula = ('boy ~ youngm + oldm')
model = smf.logit(formula, data=white)
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692806
         Iterations 3
youngm[T.True]         105.3   106.7 *
oldm[T.True]           105.3   103.9  
Out[112]:
Logit Regression Results
Dep. Variable: boy No. Observations: 2373016
Model: Logit Df Residuals: 2373013
Method: MLE Df Model: 2
Date: Tue, 17 May 2016 Pseudo R-squ.: 2.490e-06
Time: 16:39:07 Log-Likelihood: -1.6440e+06
converged: True LL-Null: -1.6440e+06
LLR p-value: 0.01668
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0519 0.001 38.350 0.000 0.049 0.055
youngm[T.True] 0.0129 0.006 2.146 0.032 0.001 0.025
oldm[T.True] -0.0137 0.008 -1.805 0.071 -0.029 0.001

In [113]:
formula = ('boy ~ youngf + oldf')
model = smf.logit(formula, data=white)
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692807
         Iterations 3
youngf                 105.3   107.3 *
oldf                   105.3   103.9  
Out[113]:
Logit Regression Results
Dep. Variable: boy No. Observations: 2368037
Model: Logit Df Residuals: 2368034
Method: MLE Df Model: 2
Date: Tue, 17 May 2016 Pseudo R-squ.: 2.483e-06
Time: 16:39:11 Log-Likelihood: -1.6406e+06
converged: True LL-Null: -1.6406e+06
LLR p-value: 0.01701
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0520 0.001 38.943 0.000 0.049 0.055
youngf 0.0182 0.009 2.137 0.033 0.002 0.035
oldf -0.0140 0.008 -1.834 0.067 -0.029 0.001
