Does Trivers-Willard apply to people?

This notebook contains a "one-day paper", my attempt to pose a research question, answer it, and publish the results in one work day.

Copyright 2016 Allen B. Downey

MIT License: https://opensource.org/licenses/MIT


In [1]:
from __future__ import print_function, division

import thinkstats2
import thinkplot

import pandas as pd
import numpy as np

import statsmodels.formula.api as smf

%matplotlib inline

Trivers-Willard

According to Wikipedia, the Trivers-Willard hypothesis:

"...suggests that female mammals are able to adjust offspring sex ratio in response to their maternal condition. For example, it may predict greater parental investment in males by parents in 'good conditions' and greater investment in females by parents in 'poor conditions' (relative to parents in good condition)."

For humans, the hypothesis suggests that people with relatively high social status might be more likely to have boys. Some studies report evidence for this hypothesis, but based on my very casual survey of them, I do not find that evidence persuasive.

To test whether the T-W hypothesis holds up in humans, I downloaded birth data for the nearly 4 million babies born in the U.S. in 2014.

I selected variables that seemed likely to be related to social status and used logistic regression to identify variables associated with sex ratio.

Summary of results

  1. Running regression with one variable at a time, many of the variables have a statistically significant effect on sex ratio, with the sign of the effect generally in the direction predicted by T-W.

  2. However, many of the variables are also correlated with race. If we control for either the mother's race or the father's race, or both, most other variables have no additional predictive power.

  3. Contrary to other reports, the age of the parents seems to have no predictive power.

  4. Strangely, the variable that shows the strongest and most consistent relationship with sex ratio is the number of prenatal visits. Although it seems obvious that prenatal visits are a proxy for quality of health care and general socioeconomic status, the sign of the effect is the opposite of what T-W predicts; that is, a higher number of prenatal visits is a strong predictor of lower sex ratio (more girls).

Following convention, I report sex ratio in terms of boys per 100 girls. The overall sex ratio at birth is about 105; that is, 105 boys are born for every 100 girls.

Data cleaning

Here's how I loaded the data:


In [2]:
names = ['year', 'mager9', 'mnativ', 'restatus', 'mbrace', 'mhisp_r',
        'mar_p', 'dmar', 'meduc', 'fagerrec11', 'fbrace', 'fhisp_r', 'feduc', 
        'lbo_rec', 'previs_rec', 'wic', 'height', 'bmi_r', 'pay_rec', 'sex']
colspecs = [(9, 12),
            (79, 79),
            (84, 84),
            (104, 104),
            (110, 110),
            (115, 115),
            (119, 119),
            (120, 120),
            (124, 124),
            (149, 150),
            (156, 156),
            (160, 160),
            (163, 163),
            (179, 179),
            (242, 243),
            (251, 251),
            (280, 281),
            (287, 287),
            (436, 436),
            (475, 475),
           ]

colspecs = [(start-1, end) for start, end in colspecs]
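
The last line of the cell above shifts each start position by one. That suggests the positions were written 1-based with an inclusive end, while `pd.read_fwf` expects 0-based, half-open intervals. Here is a minimal sketch of the same conversion on two made-up records; the record layout and the names `raw`, `documented`, and `sample` are hypothetical and are not part of the natality file.

```python
import io
import pandas as pd

# Two made-up fixed-width records: columns 1-4 hold the year, column 6 the sex.
raw = "2014 M\n2014 F\n"

# Positions as they might appear in a codebook: 1-based, inclusive end.
documented = [(1, 4), (6, 6)]

# read_fwf wants 0-based, half-open intervals [start, end),
# so only the start needs to be shifted.
colspecs = [(start - 1, end) for start, end in documented]

sample = pd.read_fwf(io.StringIO(raw), colspecs=colspecs,
                     header=None, names=['year', 'sex'])
print(colspecs)
print(sample)
```

A single-character field such as `(6, 6)` becomes `(5, 6)`, which is why the end position is left unchanged.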

In [3]:
df = None

In [4]:
filename = 'Nat2014PublicUS.c20150514.r20151022.txt.gz'
#df = pd.read_fwf(filename, compression='gzip', header=None, names=names, colspecs=colspecs)
#df.head()

In [5]:
# store the dataframe for faster loading

#store = pd.HDFStore('store.h5')
#store['births2014'] = df
#store.close()

In [6]:
# load the dataframe

store = pd.HDFStore('store.h5')
df = store['births2014']
store.close()

In [7]:
def series_to_ratio(series):
    """Takes a boolean series and computes sex ratio.
    """
    boys = np.mean(series)
    return np.round(100 * boys / (1-boys)).astype(int)
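
To make the conversion concrete: if p is the fraction of boys, the sex ratio is 100 * p / (1 - p) boys per 100 girls. The sketch below repeats the helper so that it runs on its own and applies it to made-up groups; the `toy` DataFrame and its counts are hypothetical.

```python
import numpy as np
import pandas as pd

def series_to_ratio(series):
    """Takes a boolean (or 0/1) series and returns boys per 100 girls."""
    boys = np.mean(series)
    return np.round(100 * boys / (1 - boys)).astype(int)

# Made-up data: group 'a' has 105 boys and 100 girls, group 'b' has 100 of each.
toy = pd.DataFrame({
    'group': ['a'] * 205 + ['b'] * 200,
    'boy': [1] * 105 + [0] * 100 + [1] * 100 + [0] * 100,
})

# Same groupby/aggregate pattern used for the real variables in this notebook.
by_group = toy[['group', 'boy']].groupby('group').aggregate(series_to_ratio)
print(by_group)
```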

I have to recode sex as 0 or 1 to make logit happy.


In [8]:
df['boy'] = (df.sex=='M').astype(int)
df.boy.value_counts().sort_index()


Out[8]:
0    1952273
1    2045902
Name: boy, dtype: int64

All births are from 2014.


In [9]:
df.year.value_counts().sort_index()


Out[9]:
2014    3998175
Name: year, dtype: int64

Mother's age:


In [10]:
df.mager9.value_counts().sort_index()


Out[10]:
1       2777
2     249581
3     884246
4    1148469
5    1084064
6     510214
7     110318
8       7750
9        756
Name: mager9, dtype: int64

In [11]:
var = 'mager9'
df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)


Out[11]:
boy
mager9
1 109
2 105
3 105
4 105
5 105
6 105
7 104
8 104
9 102

In [12]:
df.mager9.isnull().mean()


Out[12]:
0.0

In [13]:
df['youngm'] = df.mager9<=2
df['oldm'] = df.mager9>=7
df.youngm.mean(), df.oldm.mean()


Out[13]:
(0.06311829772333627, 0.029719559549044251)

Mother's nativity (1 = born in the U.S.)


In [14]:
df.mnativ.replace([3], np.nan, inplace=True)
df.mnativ.value_counts().sort_index()


Out[14]:
1    3106689
2     881662
Name: mnativ, dtype: int64
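
The recoding cells in this notebook call `replace(..., inplace=True)` on a column reached through attribute access. In recent pandas versions that pattern may emit a warning or leave the DataFrame unchanged, so here is a sketch of the same recoding written as a plain assignment. The `toy` frame is made up; only the column name and the code 3 for "unknown" come from the cell above.

```python
import numpy as np
import pandas as pd

# Made-up codes: 1 and 2 are valid, 3 stands for "unknown".
toy = pd.DataFrame({'mnativ': [1, 2, 3, 1, 3]})

# Assign the result back instead of modifying the column in place.
toy['mnativ'] = toy['mnativ'].replace(3, np.nan)

# value_counts skips NaN by default, so only the valid codes remain.
counts = toy['mnativ'].value_counts().sort_index()
print(counts)
```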

In [15]:
var = 'mnativ'
df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)


Out[15]:
boy
mnativ
1 105
2 105

Residence status (1=resident)


In [16]:
df.restatus.value_counts().sort_index()


Out[16]:
1    2873404
2    1025766
3      88906
4      10099
Name: restatus, dtype: int64

In [17]:
var = 'restatus'
df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)


Out[17]:
boy
restatus
1 105
2 105
3 106
4 106

Mother's race (1=White, 2=Black, 3=American Indian or Alaskan Native, 4=Asian or Pacific Islander)


In [18]:
df.mbrace.value_counts().sort_index()


Out[18]:
1    3029013
2     641089
3      44962
4     283111
Name: mbrace, dtype: int64

In [19]:
var = 'mbrace'
df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)


Out[19]:
boy
mbrace
1 105
2 103
3 103
4 106

Mother's Hispanic origin (0=Non-Hispanic)


In [20]:
df.mhisp_r.replace([9], np.nan, inplace=True)
df.mhisp_r.value_counts().sort_index()


Out[20]:
0    3045419
1     553738
2      69894
3      20165
4     136785
5     141497
Name: mhisp_r, dtype: int64

In [21]:
def copy_null(df, oldvar, newvar):
    df.loc[df[oldvar].isnull(), newvar] = np.nan
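
The helper above matters because a comparison such as `codes > 0` evaluates to False for missing values, which would silently count "unknown" as "no". The sketch below shows the difference on made-up codes. It uses a float-valued variant built with `Series.where`, which is my own assumption about one way to get the same result, not the approach used in this notebook.

```python
import numpy as np
import pandas as pd

# Made-up codes; NaN marks an unknown value.
codes = pd.Series([0, 1, 5, np.nan])

# Naive flag: NaN > 0 is False, so the unknown row is counted as a "no".
naive = codes > 0

# Variant that keeps the unknown row missing: 1.0 / 0.0, NaN where unknown.
flag = (codes > 0).astype(float).where(codes.notnull())

print(naive.mean())   # fraction computed over all four rows
print(flag.mean())    # fraction computed over the three known rows
```

Because `mean` skips NaN, the second fraction is computed only over rows with known values, which is the behavior the notebook relies on when it reports `df.mhisp.mean()` and similar quantities.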

In [22]:
df['mhisp'] = df.mhisp_r > 0
copy_null(df, 'mhisp_r', 'mhisp')
df.mhisp.isnull().mean(), df.mhisp.mean()


Out[22]:
(0.0076727506925034546, 0.23240818268843488)

In [23]:
var = 'mhisp'
df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)


Out[23]:
boy
mhisp
0 105
1 104

Marital status (1=Married)


In [24]:
df.dmar.value_counts().sort_index()


Out[24]:
1    2390630
2    1607545
Name: dmar, dtype: int64

In [25]:
var = 'dmar'
df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)


Out[25]:
boy
dmar
1 105
2 104

Paternity acknowledged, if unmarried (Y=yes, N=no, X=not applicable, U=unknown).

I recode X (not applicable because married) as Y (paternity acknowledged).


In [26]:
df.mar_p.replace(['U'], np.nan, inplace=True)
df.mar_p.replace(['X'], 'Y', inplace=True)
df.mar_p.value_counts().sort_index()


Out[26]:
N     462627
Y    3386542
Name: mar_p, dtype: int64

In [27]:
var = 'mar_p'
df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)


Out[27]:
boy
mar_p
N 103
Y 105

Mother's education level


In [28]:
df.meduc.replace([9], np.nan, inplace=True)
df.meduc.value_counts().sort_index()


Out[28]:
1    138589
2    437081
3    957265
4    815688
5    308384
6    732661
7    326800
8     94057
Name: meduc, dtype: int64

In [29]:
var = 'meduc'
df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)


Out[29]:
boy
meduc
1 104
2 104
3 105
4 105
5 105
6 105
7 105
8 104

In [30]:
df['lowed'] = df.meduc <= 2
copy_null(df, 'meduc', 'lowed')
df.lowed.isnull().mean(), df.lowed.mean()


Out[30]:
(0.046933913598079122, 0.15107367095085322)

Father's age, in 10 ranges


In [31]:
df.fagerrec11.replace([11], np.nan, inplace=True)
df.fagerrec11.value_counts().sort_index()


Out[31]:
1         277
2       84852
3      498779
4      869280
5     1025631
6      631685
7      262169
8       87432
9       28465
10      12490
Name: fagerrec11, dtype: int64

In [32]:
var = 'fagerrec11'
df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)


Out[32]:
boy
fagerrec11
1 102
2 106
3 106
4 105
5 105
6 105
7 105
8 105
9 104
10 109

In [33]:
df['youngf'] = df.fagerrec11<=2
copy_null(df, 'fagerrec11', 'youngf')
df.youngf.isnull().mean(), df.youngf.mean()


Out[33]:
(0.12433547806186572, 0.024315207394332003)

In [34]:
df['oldf'] = df.fagerrec11>=8
copy_null(df, 'fagerrec11', 'oldf')
df.oldf.isnull().mean(), df.oldf.mean()


Out[34]:
(0.12433547806186572, 0.036670893957829916)

Father's race


In [35]:
df.fbrace.replace([9], np.nan, inplace=True)
df.fbrace.value_counts().sort_index()


Out[35]:
1    2497901
2     482433
3      35408
4     238394
Name: fbrace, dtype: int64

In [36]:
var = 'fbrace'
df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)


Out[36]:
boy
fbrace
1 105
2 103
3 103
4 107

Father's Hispanic origin (0=non-hispanic, other values indicate country of origin)


In [37]:
df.fhisp_r.replace([9], np.nan, inplace=True)
df.fhisp_r.value_counts().sort_index()


Out[37]:
0    2649007
1     493497
2      59137
3      19128
4     108111
5     124172
Name: fhisp_r, dtype: int64

In [38]:
df['fhisp'] = df.fhisp_r > 0
copy_null(df, 'fhisp_r', 'fhisp')
df.fhisp.isnull().mean(), df.fhisp.mean()


Out[38]:
(0.13634295647389122, 0.23285053338322156)

In [39]:
var = 'fhisp'
df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)


Out[39]:
boy
fhisp
0 105
1 104

Father's education level


In [40]:
df.feduc.replace([9], np.nan, inplace=True)
df.feduc.value_counts().sort_index()


Out[40]:
1    141654
2    342061
3    951980
4    643118
5    232622
6    616187
7    242022
8    109482
Name: feduc, dtype: int64

In [41]:
var = 'feduc'
df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)


Out[41]:
boy
feduc
1 104
2 105
3 105
4 105
5 106
6 105
7 105
8 105

Live birth order.


In [42]:
df.lbo_rec.replace([9], np.nan, inplace=True)
df.lbo_rec.value_counts().sort_index()


Out[42]:
1    1555006
2    1270496
3     669016
4     284435
5     110708
6      46093
7      20786
8      21610
Name: lbo_rec, dtype: int64

In [43]:
var = 'lbo_rec'
df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)


Out[43]:
boy
lbo_rec
1 105
2 105
3 105
4 105
5 104
6 104
7 104
8 102

In [44]:
df['highbo'] = df.lbo_rec >= 5
copy_null(df, 'lbo_rec', 'highbo')
df.highbo.isnull().mean(), df.highbo.mean()


Out[44]:
(0.0050085351441595226, 0.050072772519889897)

Number of prenatal visits, in 11 ranges


In [45]:
df.previs_rec.replace([12], np.nan, inplace=True)
df.previs_rec.value_counts().sort_index()


Out[45]:
1      59670
2      44923
3      98141
4     201032
5     366887
6     826908
7     998330
8     684997
9     379305
10     99067
11    128805
Name: previs_rec, dtype: int64

In [46]:
df.previs_rec.mean()
df['previs'] = df.previs_rec - 7

In [47]:
var = 'previs'
df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)


Out[47]:
boy
previs
-6 105
-5 107
-4 107
-3 108
-2 107
-1 106
0 105
1 103
2 102
3 102
4 102

In [48]:
df['no_previs'] = df.previs_rec <= 1
copy_null(df, 'previs_rec', 'no_previs')
df.no_previs.isnull().mean(), df.no_previs.mean()


Out[48]:
(0.027540065154726845, 0.015346965650008423)

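Whether the mother received WIC benefits (Y=yes, N=no). WIC is a supplemental nutrition program for women, infants, and children; it is distinct from food stamps.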


In [49]:
df.wic.replace(['U'], np.nan, inplace=True)
df.wic.value_counts().sort_index()


Out[49]:
N    2124143
Y    1634978
Name: wic, dtype: int64

In [50]:
var = 'wic'
df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)


Out[50]:
boy
wic
N 105
Y 104

Mother's height in inches


In [51]:
df.height.replace([99], np.nan, inplace=True)
df.height.value_counts().sort_index()


Out[51]:
30        28
31         1
34         2
36        14
37         7
38         7
39         7
40         6
41        10
42        13
43         3
44         8
45        11
46        14
47        22
48       857
49       544
50       357
51       422
52       493
53      1503
54      1414
55      2762
56      6678
57     18359
58     21019
59     81588
60    209490
61    269142
62    474306
63    485840
64    559249
65    453503
66    429253
67    334485
68    189690
69    127789
70     62364
71     33428
72     15323
73      5200
74      2538
75      1019
76       590
77       593
78       941
Name: height, dtype: int64

In [52]:
df['mshort'] = df.height<60
copy_null(df, 'height', 'mshort')
df.mshort.isnull().mean(), df.mshort.mean()


Out[52]:
(0.051844404009329256, 0.0359147662344377)

In [53]:
df['mtall'] = df.height>=70
copy_null(df, 'height', 'mtall')
df.mtall.isnull().mean(), df.mtall.mean()


Out[53]:
(0.051844404009329256, 0.03218134412692316)

In [54]:
var = 'mshort'
df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)


Out[54]:
boy
mshort
0 105
1 104

In [55]:
var = 'mtall'
df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)


Out[55]:
boy
mtall
0 105
1 104

Mother's BMI in 6 ranges


In [56]:
df.bmi_r.replace([9], np.nan, inplace=True)
df.bmi_r.value_counts().sort_index()


Out[56]:
1     140142
2    1702519
3     949075
4     506017
5     242957
6     168515
Name: bmi_r, dtype: int64

In [57]:
var = 'bmi_r'
df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)


Out[57]:
boy
bmi_r
1 105
2 105
3 105
4 104
5 104
6 104

In [58]:
df['obese'] = df.bmi_r >= 4
copy_null(df, 'bmi_r', 'obese')
df.obese.isnull().mean(), df.obese.mean()


Out[58]:
(0.07227047340349034, 0.2473532880857861)

Payment method (1=Medicaid, 2=Private insurance, 3=Self pay, 4=Other)


In [59]:
df.pay_rec.replace([9], np.nan, inplace=True)
df.pay_rec.value_counts().sort_index()


Out[59]:
1    1665161
2    1824151
3     162650
4     167806
Name: pay_rec, dtype: int64

In [60]:
var = 'pay_rec'
df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)


Out[60]:
boy
pay_rec
1 104
2 105
3 107
4 105

Sex of baby


In [61]:
df.sex.value_counts().sort_index()


Out[61]:
F    1952273
M    2045902
Name: sex, dtype: int64

Regression models

Here are some functions I'll use to interpret the results of logistic regression


In [62]:
def logodds_to_ratio(logodds):
    """Convert from log odds to sex ratio (boys per 100 girls)."""
    odds = np.exp(logodds)
    return 100 * odds

def summarize(results):
    """Summarize parameters in terms of birth ratio."""
    inter_or = results.params['Intercept']
    inter_rat = logodds_to_ratio(inter_or)
    
    for value, lor in results.params.items():
        if value=='Intercept':
            continue
        
        rat = logodds_to_ratio(inter_or + lor)
        code = '*' if results.pvalues[value] < 0.05 else ' '
        
        print('%-20s   %0.1f   %0.1f' % (value, inter_rat, rat), code)
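
Why does exponentiating log odds give a sex ratio? With `boy` coded as 1, the odds of a boy are boys divided by girls, so 100 times the odds is boys per 100 girls. The sketch below checks this with the overall counts reported earlier in this notebook; the variable names are mine.

```python
import math

# Counts of boys and girls from the value_counts output earlier in the notebook.
boys = 2045902
girls = 1952273

# Log odds of a boy; a logit model with only an intercept would estimate this.
log_odds = math.log(boys / girls)

# Exponentiating recovers the odds; scaling by 100 gives boys per 100 girls.
ratio = 100 * math.exp(log_odds)
print(log_odds, ratio)
```

In the models that follow, `summarize` applies the same conversion to the intercept alone and to the intercept plus one coefficient, so the two printed columns are the ratio for the reference group and the ratio after adding that coefficient.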

Now I'll run models with each variable, one at a time.

Mother's age seems to have no predictive value:


In [63]:
model = smf.logit('boy ~ mager9', data=df)    
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692873
         Iterations 3
mager9                 105.1   105.0  
Out[63]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3998175
Model: Logit Df Residuals: 3998173
Method: MLE Df Model: 1
Date: Tue, 17 May 2016 Pseudo R-squ.: 1.129e-07
Time: 14:18:28 Log-Likelihood: -2.7702e+06
converged: True LL-Null: -2.7702e+06
LLR p-value: 0.4290
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0496 0.004 13.550 0.000 0.042 0.057
mager9 -0.0007 0.001 -0.791 0.429 -0.002 0.001
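
The `P>|z|` column is a two-sided p-value for the z statistic under a standard normal distribution, and `summarize` marks a variable with `*` when that value is below 0.05. As a rough check, the sketch below recomputes the p-value for `mager9` from the rounded z value in the table above. The helper name `two_sided_p` is hypothetical.

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a z statistic under the standard normal."""
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)).
    return math.erfc(abs(z) / math.sqrt(2))

z_mager9 = -0.791               # z value reported for mager9
p_mager9 = two_sided_p(z_mager9)
significant = p_mager9 < 0.05   # same threshold summarize uses for '*'
print(p_mager9, significant)
```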

The estimated ratio for young mothers is higher, and the ratio for older mothers is lower, but neither difference is statistically significant.


In [64]:
model = smf.logit('boy ~ youngm + oldm', data=df)    
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692873
         Iterations 3
youngm[T.True]         104.8   104.9  
oldm[T.True]           104.8   103.9  
Out[64]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3998175
Model: Logit Df Residuals: 3998172
Method: MLE Df Model: 2
Date: Tue, 17 May 2016 Pseudo R-squ.: 3.813e-07
Time: 14:18:33 Log-Likelihood: -2.7702e+06
converged: True LL-Null: -2.7702e+06
LLR p-value: 0.3478
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0470 0.001 44.772 0.000 0.045 0.049
youngm[T.True] 0.0010 0.004 0.240 0.810 -0.007 0.009
oldm[T.True] -0.0084 0.006 -1.421 0.155 -0.020 0.003

Whether the mother was born in the U.S. has no predictive value


In [65]:
model = smf.logit('boy ~ C(mnativ)', data=df)    
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692873
         Iterations 3
C(mnativ)[T.2.0]       104.8   104.9  
Out[65]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3988351
Model: Logit Df Residuals: 3988349
Method: MLE Df Model: 1
Date: Tue, 17 May 2016 Pseudo R-squ.: 4.566e-08
Time: 14:19:00 Log-Likelihood: -2.7634e+06
converged: True LL-Null: -2.7634e+06
LLR p-value: 0.6154
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0466 0.001 41.050 0.000 0.044 0.049
C(mnativ)[T.2.0] 0.0012 0.002 0.502 0.615 -0.004 0.006

Neither does residence status


In [66]:
model = smf.logit('boy ~ C(restatus)', data=df)    
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692872
         Iterations 3
C(restatus)[T.2]       104.8   104.7  
C(restatus)[T.3]       104.8   106.0  
C(restatus)[T.4]       104.8   106.2  
Out[66]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3998175
Model: Logit Df Residuals: 3998171
Method: MLE Df Model: 3
Date: Tue, 17 May 2016 Pseudo R-squ.: 6.716e-07
Time: 14:19:28 Log-Likelihood: -2.7702e+06
converged: True LL-Null: -2.7702e+06
LLR p-value: 0.2932
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0468 0.001 39.653 0.000 0.044 0.049
C(restatus)[T.2] -0.0010 0.002 -0.418 0.676 -0.005 0.004
C(restatus)[T.3] 0.0117 0.007 1.718 0.086 -0.002 0.025
C(restatus)[T.4] 0.0132 0.020 0.663 0.507 -0.026 0.052

Mother's race seems to have predictive value. Relative to whites, black and Native American mothers have more girls; Asians have more boys.


In [67]:
model = smf.logit('boy ~ C(mbrace)', data=df)    
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692863
         Iterations 3
C(mbrace)[T.2]         105.1   102.9 *
C(mbrace)[T.3]         105.1   103.1 *
C(mbrace)[T.4]         105.1   106.3 *
Out[67]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3998175
Model: Logit Df Residuals: 3998171
Method: MLE Df Model: 3
Date: Tue, 17 May 2016 Pseudo R-squ.: 1.401e-05
Time: 14:19:55 Log-Likelihood: -2.7702e+06
converged: True LL-Null: -2.7702e+06
LLR p-value: 1.007e-16
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0497 0.001 43.250 0.000 0.047 0.052
C(mbrace)[T.2] -0.0214 0.003 -7.770 0.000 -0.027 -0.016
C(mbrace)[T.3] -0.0195 0.010 -2.049 0.041 -0.038 -0.001
C(mbrace)[T.4] 0.0109 0.004 2.777 0.005 0.003 0.019
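
With `C(mbrace)`, the intercept describes the reference category (code 1) and each coefficient is a shift in log odds relative to it. The sketch below reconstructs the ratios printed by `summarize` from the rounded coefficients in the table above. Because the inputs are rounded, the last digit can differ slightly from the printed values.

```python
import math

intercept = 0.0497   # log odds of a boy for the reference group (mbrace == 1)
coefs = {2: -0.0214, 3: -0.0195, 4: 0.0109}   # shifts relative to the reference

# Ratio for the reference group, in boys per 100 girls.
baseline = 100 * math.exp(intercept)

# Ratio for each other group: add its shift to the intercept, then convert.
ratios = {code: 100 * math.exp(intercept + coef) for code, coef in coefs.items()}

print(round(baseline, 1))
for code in sorted(ratios):
    print(code, round(ratios[code], 1))
```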

Hispanic mothers have more girls.


In [68]:
model = smf.logit('boy ~ mhisp', data=df)    
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692874
         Iterations 3
mhisp                  105.0   104.1 *
Out[68]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3967498
Model: Logit Df Residuals: 3967496
Method: MLE Df Model: 1
Date: Tue, 17 May 2016 Pseudo R-squ.: 1.998e-06
Time: 14:19:59 Log-Likelihood: -2.7490e+06
converged: True LL-Null: -2.7490e+06
LLR p-value: 0.0009174
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0485 0.001 42.263 0.000 0.046 0.051
mhisp -0.0079 0.002 -3.315 0.001 -0.013 -0.003

If the mother is married or unmarried but paternity is acknowledged, the sex ratio is higher (more boys)


In [69]:
model = smf.logit('boy ~ C(mar_p)', data=df)    
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692864
         Iterations 3
C(mar_p)[T.Y]          102.8   105.1 *
Out[69]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3849169
Model: Logit Df Residuals: 3849167
Method: MLE Df Model: 1
Date: Tue, 17 May 2016 Pseudo R-squ.: 9.129e-06
Time: 14:20:27 Log-Likelihood: -2.6670e+06
converged: True LL-Null: -2.6670e+06
LLR p-value: 2.990e-12
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0278 0.003 9.446 0.000 0.022 0.034
C(mar_p)[T.Y] 0.0219 0.003 6.978 0.000 0.016 0.028

Being unmarried predicts more girls.


In [70]:
model = smf.logit('boy ~ C(dmar)', data=df)    
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692871
         Iterations 3
C(dmar)[T.2]           105.1   104.3 *
Out[70]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3998175
Model: Logit Df Residuals: 3998173
Method: MLE Df Model: 1
Date: Tue, 17 May 2016 Pseudo R-squ.: 3.001e-06
Time: 14:20:54 Log-Likelihood: -2.7702e+06
converged: True LL-Null: -2.7702e+06
LLR p-value: 4.555e-05
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0502 0.001 38.789 0.000 0.048 0.053
C(dmar)[T.2] -0.0083 0.002 -4.077 0.000 -0.012 -0.004

Each level of mother's education predicts a small increase in the probability of a boy.


In [71]:
model = smf.logit('boy ~ meduc', data=df)    
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692874
         Iterations 3
meduc                  104.1   104.2 *
Out[71]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3810525
Model: Logit Df Residuals: 3810523
Method: MLE Df Model: 1
Date: Tue, 17 May 2016 Pseudo R-squ.: 1.416e-06
Time: 14:20:59 Log-Likelihood: -2.6402e+06
converged: True LL-Null: -2.6402e+06
LLR p-value: 0.006248
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0398 0.003 14.711 0.000 0.034 0.045
meduc 0.0016 0.001 2.734 0.006 0.000 0.003

In [72]:
model = smf.logit('boy ~ lowed', data=df)    
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692874
         Iterations 3
lowed                  104.9   104.1 *
Out[72]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3810525
Model: Logit Df Residuals: 3810523
Method: MLE Df Model: 1
Date: Tue, 17 May 2016 Pseudo R-squ.: 1.431e-06
Time: 14:21:03 Log-Likelihood: -2.6402e+06
converged: True LL-Null: -2.6402e+06
LLR p-value: 0.005983
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0478 0.001 43.002 0.000 0.046 0.050
lowed -0.0079 0.003 -2.749 0.006 -0.013 -0.002

Older fathers are slightly more likely to have girls (but this apparent effect could be due to chance).


In [73]:
model = smf.logit('boy ~ fagerrec11', data=df)    
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692840
         Iterations 3
fagerrec11             105.9   105.7 *
Out[73]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3501060
Model: Logit Df Residuals: 3501058
Method: MLE Df Model: 1
Date: Tue, 17 May 2016 Pseudo R-squ.: 8.226e-07
Time: 14:21:08 Log-Likelihood: -2.4257e+06
converged: True LL-Null: -2.4257e+06
LLR p-value: 0.04575
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0570 0.004 14.707 0.000 0.049 0.065
fagerrec11 -0.0015 0.001 -1.998 0.046 -0.003 -2.9e-05

In [74]:
model = smf.logit('boy ~ youngf + oldf', data=df)    
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692840
         Iterations 3
youngf                 105.1   106.3  
oldf                   105.1   105.0  
Out[74]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3501060
Model: Logit Df Residuals: 3501057
Method: MLE Df Model: 2
Date: Tue, 17 May 2016 Pseudo R-squ.: 5.807e-07
Time: 14:21:12 Log-Likelihood: -2.4257e+06
converged: True LL-Null: -2.4257e+06
LLR p-value: 0.2445
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0493 0.001 44.656 0.000 0.047 0.051
youngf 0.0116 0.007 1.673 0.094 -0.002 0.025
oldf -0.0005 0.006 -0.086 0.932 -0.012 0.011

Predictions based on father's race are similar to those based on mother's race: more girls for black and Native American fathers; more boys for Asian fathers.


In [75]:
model = smf.logit('boy ~ C(fbrace)', data=df)    
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692818
         Iterations 3
C(fbrace)[T.2.0]       105.5   103.1 *
C(fbrace)[T.3.0]       105.5   102.9 *
C(fbrace)[T.4.0]       105.5   106.6 *
Out[75]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3254136
Model: Logit Df Residuals: 3254132
Method: MLE Df Model: 3
Date: Tue, 17 May 2016 Pseudo R-squ.: 1.504e-05
Time: 14:21:38 Log-Likelihood: -2.2545e+06
converged: True LL-Null: -2.2546e+06
LLR p-value: 1.256e-14
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0533 0.001 42.144 0.000 0.051 0.056
C(fbrace)[T.2.0] -0.0227 0.003 -7.221 0.000 -0.029 -0.017
C(fbrace)[T.3.0] -0.0250 0.011 -2.335 0.020 -0.046 -0.004
C(fbrace)[T.4.0] 0.0106 0.004 2.479 0.013 0.002 0.019

If the father is Hispanic, that predicts more girls.


In [76]:
model = smf.logit('boy ~ fhisp', data=df)    
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692839
         Iterations 3
fhisp                  105.4   104.0 *
Out[76]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3453052
Model: Logit Df Residuals: 3453050
Method: MLE Df Model: 1
Date: Tue, 17 May 2016 Pseudo R-squ.: 5.800e-06
Time: 14:21:42 Log-Likelihood: -2.3924e+06
converged: True LL-Null: -2.3924e+06
LLR p-value: 1.378e-07
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0525 0.001 42.696 0.000 0.050 0.055
fhisp -0.0134 0.003 -5.268 0.000 -0.018 -0.008

Father's education level might predict more boys, but the apparent effect could be due to chance.


In [77]:
model = smf.logit('boy ~ feduc', data=df)    
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692840
         Iterations 3
feduc                  104.6   104.7  
Out[77]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3279126
Model: Logit Df Residuals: 3279124
Method: MLE Df Model: 1
Date: Tue, 17 May 2016 Pseudo R-squ.: 8.046e-07
Time: 14:21:46 Log-Likelihood: -2.2719e+06
converged: True LL-Null: -2.2719e+06
LLR p-value: 0.05587
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0445 0.003 15.630 0.000 0.039 0.050
feduc 0.0012 0.001 1.912 0.056 -3.02e-05 0.002

Babies with high birth order are slightly more likely to be girls.


In [78]:
model = smf.logit('boy ~ lbo_rec', data=df)    
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692872
         Iterations 3
lbo_rec                105.3   105.1 *
Out[78]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3978150
Model: Logit Df Residuals: 3978148
Method: MLE Df Model: 1
Date: Tue, 17 May 2016 Pseudo R-squ.: 1.576e-06
Time: 14:21:51 Log-Likelihood: -2.7563e+06
converged: True LL-Null: -2.7564e+06
LLR p-value: 0.003206
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0518 0.002 26.529 0.000 0.048 0.056
lbo_rec -0.0023 0.001 -2.947 0.003 -0.004 -0.001

In [79]:
model = smf.logit('boy ~ highbo', data=df)    
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692872
         Iterations 3
highbo                 104.9   103.4 *
Out[79]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3978150
Model: Logit Df Residuals: 3978148
Method: MLE Df Model: 1
Date: Tue, 17 May 2016 Pseudo R-squ.: 1.647e-06
Time: 14:21:56 Log-Likelihood: -2.7563e+06
converged: True LL-Null: -2.7564e+06
LLR p-value: 0.002584
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0475 0.001 46.200 0.000 0.046 0.050
highbo -0.0139 0.005 -3.013 0.003 -0.023 -0.005

Strangely, prenatal visits are associated with an increased probability of girls.


In [80]:
model = smf.logit('boy ~ previs', data=df)    
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692847
         Iterations 3
previs                 104.6   103.8 *
Out[80]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3888065
Model: Logit Df Residuals: 3888063
Method: MLE Df Model: 1
Date: Tue, 17 May 2016 Pseudo R-squ.: 3.975e-05
Time: 14:22:01 Log-Likelihood: -2.6938e+06
converged: True LL-Null: -2.6939e+06
LLR p-value: 1.677e-48
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0449 0.001 43.933 0.000 0.043 0.047
previs -0.0079 0.001 -14.634 0.000 -0.009 -0.007
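
One way to check the linear fit is to compare its predictions with the grouped ratios tabulated earlier for each value of `previs`. The sketch below does this using the rounded coefficients from the table above, so the predictions are approximate.

```python
import math

intercept, slope = 0.0449, -0.0079   # rounded coefficients from the model above

# Observed boys per 100 girls for each value of previs, from the grouped table.
observed = {-6: 105, -5: 107, -4: 107, -3: 108, -2: 107, -1: 106,
            0: 105, 1: 103, 2: 102, 3: 102, 4: 102}

# Ratio predicted by the linear model at each value of previs.
predicted = {k: 100 * math.exp(intercept + slope * k) for k in observed}

for k in sorted(observed):
    print('%3d   observed %d   predicted %0.1f' % (k, observed[k], predicted[k]))
```

The largest gap is at the lowest category (`previs` = -6), where the linear model predicts a noticeably higher ratio than the one observed.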

The effect seems to be non-linear at zero, so I'm adding a boolean for no prenatal visits.


In [81]:
model = smf.logit('boy ~ no_previs + previs', data=df)    
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692842
         Iterations 3
no_previs              104.6   98.9 *
previs                 104.6   103.7 *
Out[81]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3888065
Model: Logit Df Residuals: 3888062
Method: MLE Df Model: 2
Date: Tue, 17 May 2016 Pseudo R-squ.: 4.717e-05
Time: 14:22:07 Log-Likelihood: -2.6938e+06
converged: True LL-Null: -2.6939e+06
LLR p-value: 6.538e-56
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0454 0.001 44.310 0.000 0.043 0.047
no_previs -0.0564 0.009 -6.322 0.000 -0.074 -0.039
previs -0.0093 0.001 -15.938 0.000 -0.010 -0.008

If the mother received WIC benefits, she is more likely to have a girl.


In [82]:
model = smf.logit('boy ~ wic', data=df)    
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692869
         Iterations 3
wic[T.Y]               105.2   104.3 *
Out[82]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3759121
Model: Logit Df Residuals: 3759119
Method: MLE Df Model: 1
Date: Tue, 17 May 2016 Pseudo R-squ.: 3.051e-06
Time: 14:22:35 Log-Likelihood: -2.6046e+06
converged: True LL-Null: -2.6046e+06
LLR p-value: 6.700e-05
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0506 0.001 36.886 0.000 0.048 0.053
wic[T.Y] -0.0083 0.002 -3.987 0.000 -0.012 -0.004

Mother's height seems to have no predictive value.


In [83]:
model = smf.logit('boy ~ height', data=df)    
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692873
         Iterations 3
height                 102.4   102.5  
Out[83]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3790892
Model: Logit Df Residuals: 3790890
Method: MLE Df Model: 1
Date: Tue, 17 May 2016 Pseudo R-squ.: 1.853e-07
Time: 14:22:39 Log-Likelihood: -2.6266e+06
converged: True LL-Null: -2.6266e+06
LLR p-value: 0.3238
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0240 0.023 1.038 0.299 -0.021 0.069
height 0.0004 0.000 0.987 0.324 -0.000 0.001

In [84]:
model = smf.logit('boy ~ mtall + mshort', data=df)    
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692872
         Iterations 3
mtall                  104.8   104.1  
mshort                 104.8   104.3  
Out[84]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3790892
Model: Logit Df Residuals: 3790889
Method: MLE Df Model: 2
Date: Tue, 17 May 2016 Pseudo R-squ.: 4.560e-07
Time: 14:22:43 Log-Likelihood: -2.6266e+06
converged: True LL-Null: -2.6266e+06
LLR p-value: 0.3019
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0473 0.001 44.433 0.000 0.045 0.049
mtall -0.0071 0.006 -1.212 0.226 -0.018 0.004
mshort -0.0056 0.006 -1.005 0.315 -0.016 0.005

Mothers with higher BMI are more likely to have girls.


In [85]:
model = smf.logit('boy ~ bmi_r', data=df)    
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692870
         Iterations 3
bmi_r                  105.7   105.4 *
Out[85]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3709225
Model: Logit Df Residuals: 3709223
Method: MLE Df Model: 1
Date: Tue, 17 May 2016 Pseudo R-squ.: 2.168e-06
Time: 14:22:48 Log-Likelihood: -2.5700e+06
converged: True LL-Null: -2.5700e+06
LLR p-value: 0.0008442
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0554 0.003 20.336 0.000 0.050 0.061
bmi_r -0.0029 0.001 -3.338 0.001 -0.005 -0.001

In [86]:
model = smf.logit('boy ~ obese', data=df)    
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692870
         Iterations 3
obese                  105.0   104.2 *
Out[86]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3709225
Model: Logit Df Residuals: 3709223
Method: MLE Df Model: 1
Date: Tue, 17 May 2016 Pseudo R-squ.: 2.347e-06
Time: 14:22:53 Log-Likelihood: -2.5700e+06
converged: True LL-Null: -2.5700e+06
LLR p-value: 0.0005139
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0491 0.001 40.976 0.000 0.047 0.051
obese -0.0084 0.002 -3.473 0.001 -0.013 -0.004

If payment was made by Medicaid, the baby is more likely to be a girl. Private insurance and self-payment are associated with more boys; the difference for other payment methods is not statistically significant.


In [87]:
model = smf.logit('boy ~ C(pay_rec)', data=df)    
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692869
         Iterations 3
C(pay_rec)[T.2.0]      104.2   105.1 *
C(pay_rec)[T.3.0]      104.2   106.6 *
C(pay_rec)[T.4.0]      104.2   104.7  
Out[87]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3819768
Model: Logit Df Residuals: 3819764
Method: MLE Df Model: 3
Date: Tue, 17 May 2016 Pseudo R-squ.: 5.306e-06
Time: 14:23:19 Log-Likelihood: -2.6466e+06
converged: True LL-Null: -2.6466e+06
LLR p-value: 3.482e-06
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0416 0.002 26.840 0.000 0.039 0.045
C(pay_rec)[T.2.0] 0.0085 0.002 3.982 0.000 0.004 0.013
C(pay_rec)[T.3.0] 0.0222 0.005 4.272 0.000 0.012 0.032
C(pay_rec)[T.4.0] 0.0047 0.005 0.925 0.355 -0.005 0.015

Adding controls

However, none of the previous results should be taken too seriously. We only tested one variable at a time, and many of these apparent effects disappear when we add control variables.

In particular, if we control for father's race and Hispanic origin, the mother's race has no additional predictive value.


In [88]:
formula = ('boy ~ C(fbrace) + fhisp + C(mbrace) + mhisp')
model = smf.logit(formula, data=df)
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692816
         Iterations 3
C(fbrace)[T.2.0]       105.8   103.1 *
C(fbrace)[T.3.0]       105.8   103.5  
C(fbrace)[T.4.0]       105.8   106.9  
C(mbrace)[T.2]         105.8   105.9  
C(mbrace)[T.3]         105.8   104.5  
C(mbrace)[T.4]         105.8   105.6  
fhisp                  105.8   104.2 *
mhisp                  105.8   106.0  
Out[88]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3231530
Model: Logit Df Residuals: 3231521
Method: MLE Df Model: 8
Date: Tue, 17 May 2016 Pseudo R-squ.: 2.087e-05
Time: 14:24:08 Log-Likelihood: -2.2389e+06
converged: True LL-Null: -2.2389e+06
LLR p-value: 9.292e-17
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0566 0.001 38.234 0.000 0.054 0.060
C(fbrace)[T.2.0] -0.0260 0.006 -4.668 0.000 -0.037 -0.015
C(fbrace)[T.3.0] -0.0221 0.012 -1.793 0.073 -0.046 0.002
C(fbrace)[T.4.0] 0.0097 0.007 1.344 0.179 -0.004 0.024
C(mbrace)[T.2] 0.0004 0.006 0.075 0.940 -0.011 0.012
C(mbrace)[T.3] -0.0130 0.013 -0.994 0.320 -0.039 0.013
C(mbrace)[T.4] -0.0026 0.007 -0.375 0.708 -0.016 0.011
fhisp -0.0156 0.004 -3.591 0.000 -0.024 -0.007
mhisp 0.0018 0.004 0.422 0.673 -0.007 0.010
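
One way to see how an apparent effect can vanish under a control variable is a small worked example with expected counts. The numbers below are made up for illustration and are not taken from the birth data: two groups differ in both sex ratio and in how common a binary variable x is, but x has no effect on sex within either group.

```python
# Made-up population: two groups with different sex ratios and different
# rates of a binary variable x. Within each group, x has no effect at all.
groups = {
    'A': dict(n=1000000, p_boy=0.515, p_x=0.2),
    'B': dict(n=1000000, p_boy=0.505, p_x=0.7),
}

def ratio(boys, girls):
    """Boys per 100 girls."""
    return 100 * boys / girls

def expected_counts(x, names):
    """Expected (boys, girls) with the given value of x, pooled over groups."""
    boys = girls = 0.0
    for name in names:
        g = groups[name]
        n_x = g['n'] * (g['p_x'] if x else 1 - g['p_x'])
        boys += n_x * g['p_boy']
        girls += n_x * (1 - g['p_boy'])
    return boys, girls

# Pooled over both groups, x appears to predict more girls...
pooled = {x: ratio(*expected_counts(x, ['A', 'B'])) for x in (0, 1)}

# ...but within each group the ratio is the same for x=0 and x=1.
within = {name: {x: ratio(*expected_counts(x, [name])) for x in (0, 1)}
          for name in groups}

print(pooled)
print(within)
```

The pooled ratios differ only because x is more common in the group with the lower sex ratio. Conditioning on group removes the difference, which is the pattern the controlled regressions here are checking for.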

In fact, once we control for father's race and Hispanic origin, almost every other variable becomes statistically insignificant, including acknowledged paternity.


In [89]:
formula = ('boy ~ C(fbrace) + fhisp + mar_p')
model = smf.logit(formula, data=df)
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692814
         Iterations 3
C(fbrace)[T.2.0]       108.2   105.5 *
C(fbrace)[T.3.0]       108.2   105.2 *
C(fbrace)[T.4.0]       108.2   109.1  
mar_p[T.Y]             108.2   105.8  
fhisp                  108.2   106.7 *
Out[89]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3112362
Model: Logit Df Residuals: 3112356
Method: MLE Df Model: 5
Date: Tue, 17 May 2016 Pseudo R-squ.: 2.117e-05
Time: 14:24:56 Log-Likelihood: -2.1563e+06
converged: True LL-Null: -2.1563e+06
LLR p-value: 3.558e-18
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0792 0.015 5.155 0.000 0.049 0.109
C(fbrace)[T.2.0] -0.0258 0.003 -7.860 0.000 -0.032 -0.019
C(fbrace)[T.3.0] -0.0283 0.011 -2.594 0.009 -0.050 -0.007
C(fbrace)[T.4.0] 0.0074 0.004 1.662 0.097 -0.001 0.016
mar_p[T.Y] -0.0225 0.015 -1.464 0.143 -0.053 0.008
fhisp -0.0148 0.003 -4.982 0.000 -0.021 -0.009

Being married still predicts more boys.


In [90]:
formula = ('boy ~ C(fbrace) + fhisp + dmar')
model = smf.logit(formula, data=df)
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692814
         Iterations 3
C(fbrace)[T.2.0]       105.0   102.2 *
C(fbrace)[T.3.0]       105.0   101.9 *
C(fbrace)[T.4.0]       105.0   105.9  
fhisp                  105.0   103.4 *
dmar                   105.0   105.7 *
Out[90]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3235798
Model: Logit Df Residuals: 3235792
Method: MLE Df Model: 5
Date: Tue, 17 May 2016 Pseudo R-squ.: 2.183e-05
Time: 14:25:22 Log-Likelihood: -2.2418e+06
converged: True LL-Null: -2.2419e+06
LLR p-value: 1.485e-19
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0492 0.003 14.375 0.000 0.042 0.056
C(fbrace)[T.2.0] -0.0278 0.003 -8.324 0.000 -0.034 -0.021
C(fbrace)[T.3.0] -0.0301 0.011 -2.778 0.005 -0.051 -0.009
C(fbrace)[T.4.0] 0.0081 0.004 1.871 0.061 -0.000 0.017
fhisp -0.0156 0.003 -5.270 0.000 -0.021 -0.010
dmar 0.0062 0.003 2.416 0.016 0.001 0.011

The effect of education disappears.


In [91]:
formula = ('boy ~ C(fbrace) + fhisp + lowed')
model = smf.logit(formula, data=df)
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692816
         Iterations 3
C(fbrace)[T.2.0]       105.8   103.1 *
C(fbrace)[T.3.0]       105.8   102.8 *
C(fbrace)[T.4.0]       105.8   106.5  
fhisp                  105.8   104.2 *
lowed                  105.8   106.0  
Out[91]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3091385
Model: Logit Df Residuals: 3091379
Method: MLE Df Model: 5
Date: Tue, 17 May 2016 Pseudo R-squ.: 2.076e-05
Time: 14:25:47 Log-Likelihood: -2.1418e+06
converged: True LL-Null: -2.1418e+06
LLR p-value: 1.130e-17
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0566 0.001 37.993 0.000 0.054 0.060
C(fbrace)[T.2.0] -0.0259 0.003 -7.838 0.000 -0.032 -0.019
C(fbrace)[T.3.0] -0.0287 0.011 -2.624 0.009 -0.050 -0.007
C(fbrace)[T.4.0] 0.0067 0.004 1.487 0.137 -0.002 0.015
fhisp -0.0152 0.003 -4.927 0.000 -0.021 -0.009
lowed 0.0017 0.004 0.462 0.644 -0.006 0.009

The effect of birth order disappears.


In [92]:
formula = ('boy ~ C(fbrace) + fhisp + highbo')
model = smf.logit(formula, data=df)
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692816
         Iterations 3
C(fbrace)[T.2.0]       105.8   103.2 *
C(fbrace)[T.3.0]       105.8   102.9 *
C(fbrace)[T.4.0]       105.8   106.6  
fhisp                  105.8   104.4 *
highbo                 105.8   105.6  
Out[92]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3221819
Model: Logit Df Residuals: 3221813
Method: MLE Df Model: 5
Date: Tue, 17 May 2016 Pseudo R-squ.: 2.029e-05
Time: 14:26:13 Log-Likelihood: -2.2321e+06
converged: True LL-Null: -2.2322e+06
LLR p-value: 5.072e-18
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0566 0.001 38.815 0.000 0.054 0.060
C(fbrace)[T.2.0] -0.0253 0.003 -7.841 0.000 -0.032 -0.019
C(fbrace)[T.3.0] -0.0284 0.011 -2.616 0.009 -0.050 -0.007
C(fbrace)[T.4.0] 0.0077 0.004 1.758 0.079 -0.001 0.016
fhisp -0.0139 0.003 -4.785 0.000 -0.020 -0.008
highbo -0.0026 0.005 -0.483 0.629 -0.013 0.008

WIC is no longer associated with more girls.


In [93]:
formula = ('boy ~ C(fbrace) + fhisp + wic')
model = smf.logit(formula, data=df)
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692813
         Iterations 3
C(fbrace)[T.2.0]       105.8   103.0 *
C(fbrace)[T.3.0]       105.8   103.0 *
C(fbrace)[T.4.0]       105.8   106.6  
wic[T.Y]               105.8   106.1  
fhisp                  105.8   104.1 *
Out[93]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3040527
Model: Logit Df Residuals: 3040521
Method: MLE Df Model: 5
Date: Tue, 17 May 2016 Pseudo R-squ.: 2.175e-05
Time: 14:27:01 Log-Likelihood: -2.1065e+06
converged: True LL-Null: -2.1066e+06
LLR p-value: 3.031e-18
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0564 0.002 34.772 0.000 0.053 0.060
C(fbrace)[T.2.0] -0.0271 0.003 -7.892 0.000 -0.034 -0.020
C(fbrace)[T.3.0] -0.0267 0.011 -2.405 0.016 -0.048 -0.005
C(fbrace)[T.4.0] 0.0076 0.005 1.670 0.095 -0.001 0.016
wic[T.Y] 0.0025 0.003 0.975 0.330 -0.002 0.007
fhisp -0.0161 0.003 -5.153 0.000 -0.022 -0.010

The effect of obesity disappears.


In [94]:
formula = ('boy ~ C(fbrace) + fhisp + obese')
model = smf.logit(formula, data=df)
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692815
         Iterations 3
C(fbrace)[T.2.0]       105.9   103.3 *
C(fbrace)[T.3.0]       105.9   103.1 *
C(fbrace)[T.4.0]       105.9   106.5  
fhisp                  105.9   104.3 *
obese                  105.9   105.7  
Out[94]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3005073
Model: Logit Df Residuals: 3005067
Method: MLE Df Model: 5
Date: Tue, 17 May 2016 Pseudo R-squ.: 1.947e-05
Time: 14:27:26 Log-Likelihood: -2.0820e+06
converged: True LL-Null: -2.0820e+06
LLR p-value: 5.013e-16
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0571 0.002 35.622 0.000 0.054 0.060
C(fbrace)[T.2.0] -0.0247 0.003 -7.305 0.000 -0.031 -0.018
C(fbrace)[T.3.0] -0.0266 0.011 -2.410 0.016 -0.048 -0.005
C(fbrace)[T.4.0] 0.0056 0.005 1.217 0.224 -0.003 0.015
fhisp -0.0151 0.003 -4.996 0.000 -0.021 -0.009
obese -0.0014 0.003 -0.524 0.600 -0.007 0.004

The effect of payment method is diminished, but self-payment is still associated with more boys.


In [95]:
formula = ('boy ~ C(fbrace) + fhisp + C(pay_rec)')
model = smf.logit(formula, data=df)
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692812
         Iterations 3
C(fbrace)[T.2.0]       106.1   103.3 *
C(fbrace)[T.3.0]       106.1   103.0 *
C(fbrace)[T.4.0]       106.1   106.7  
C(pay_rec)[T.2.0]      106.1   105.7  
C(pay_rec)[T.3.0]      106.1   108.3 *
C(pay_rec)[T.4.0]      106.1   105.4  
fhisp                  106.1   104.4 *
Out[95]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3086812
Model: Logit Df Residuals: 3086804
Method: MLE Df Model: 7
Date: Tue, 17 May 2016 Pseudo R-squ.: 2.500e-05
Time: 14:28:14 Log-Likelihood: -2.1386e+06
converged: True LL-Null: -2.1386e+06
LLR p-value: 3.965e-20
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0593 0.002 25.249 0.000 0.055 0.064
C(fbrace)[T.2.0] -0.0271 0.003 -7.980 0.000 -0.034 -0.020
C(fbrace)[T.3.0] -0.0297 0.011 -2.696 0.007 -0.051 -0.008
C(fbrace)[T.4.0] 0.0056 0.004 1.239 0.216 -0.003 0.014
C(pay_rec)[T.2.0] -0.0043 0.003 -1.680 0.093 -0.009 0.001
C(pay_rec)[T.3.0] 0.0203 0.006 3.331 0.001 0.008 0.032
C(pay_rec)[T.4.0] -0.0063 0.006 -1.094 0.274 -0.018 0.005
fhisp -0.0167 0.003 -5.378 0.000 -0.023 -0.011

But the number of prenatal visits is still a strong predictor of more girls.


In [96]:
formula = ('boy ~ C(fbrace) + fhisp + previs')
model = smf.logit(formula, data=df)
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692778
         Iterations 3
C(fbrace)[T.2.0]       105.8   102.8 *
C(fbrace)[T.3.0]       105.8   102.3 *
C(fbrace)[T.4.0]       105.8   106.4  
fhisp                  105.8   104.0 *
previs                 105.8   104.8 *
Out[96]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3155440
Model: Logit Df Residuals: 3155434
Method: MLE Df Model: 5
Date: Tue, 17 May 2016 Pseudo R-squ.: 7.997e-05
Time: 14:28:40 Log-Likelihood: -2.1860e+06
converged: True LL-Null: -2.1862e+06
LLR p-value: 2.081e-73
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0567 0.001 38.800 0.000 0.054 0.060
C(fbrace)[T.2.0] -0.0295 0.003 -9.008 0.000 -0.036 -0.023
C(fbrace)[T.3.0] -0.0341 0.011 -3.114 0.002 -0.056 -0.013
C(fbrace)[T.4.0] 0.0058 0.004 1.314 0.189 -0.003 0.014
fhisp -0.0172 0.003 -5.862 0.000 -0.023 -0.011
previs -0.0102 0.001 -16.235 0.000 -0.011 -0.009

And the effect is even stronger if we add a boolean to capture the nonlinearity at 0 visits.


In [97]:
formula = ('boy ~ C(fbrace) + fhisp + previs + no_previs')
model = smf.logit(formula, data=df)
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692776
         Iterations 3
C(fbrace)[T.2.0]       105.9   102.8 *
C(fbrace)[T.3.0]       105.9   102.3 *
C(fbrace)[T.4.0]       105.9   106.5  
fhisp                  105.9   104.1 *
previs                 105.9   104.7 *
no_previs              105.9   101.0 *
Out[97]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3155440
Model: Logit Df Residuals: 3155433
Method: MLE Df Model: 6
Date: Tue, 17 May 2016 Pseudo R-squ.: 8.351e-05
Time: 14:29:06 Log-Likelihood: -2.1860e+06
converged: True LL-Null: -2.1862e+06
LLR p-value: 8.674e-76
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0570 0.001 38.973 0.000 0.054 0.060
C(fbrace)[T.2.0] -0.0294 0.003 -8.984 0.000 -0.036 -0.023
C(fbrace)[T.3.0] -0.0342 0.011 -3.123 0.002 -0.056 -0.013
C(fbrace)[T.4.0] 0.0056 0.004 1.270 0.204 -0.003 0.014
fhisp -0.0171 0.003 -5.817 0.000 -0.023 -0.011
previs -0.0111 0.001 -16.625 0.000 -0.012 -0.010
no_previs -0.0469 0.012 -3.936 0.000 -0.070 -0.024

More controls

Now if we control for father's race and Hispanic origin as well as number of prenatal visits, the effect of marriage disappears.


In [98]:
formula = ('boy ~ C(fbrace) + fhisp + previs + dmar')
model = smf.logit(formula, data=df)
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692778
         Iterations 3
C(fbrace)[T.2.0]       105.3   102.1 *
C(fbrace)[T.3.0]       105.3   101.7 *
C(fbrace)[T.4.0]       105.3   106.0  
fhisp                  105.3   103.5 *
previs                 105.3   104.3 *
dmar                   105.3   105.7  
Out[98]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3155440
Model: Logit Df Residuals: 3155433
Method: MLE Df Model: 6
Date: Tue, 17 May 2016 Pseudo R-squ.: 8.045e-05
Time: 14:29:32 Log-Likelihood: -2.1860e+06
converged: True LL-Null: -2.1862e+06
LLR p-value: 6.525e-73
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0521 0.003 15.015 0.000 0.045 0.059
C(fbrace)[T.2.0] -0.0309 0.003 -9.058 0.000 -0.038 -0.024
C(fbrace)[T.3.0] -0.0353 0.011 -3.210 0.001 -0.057 -0.014
C(fbrace)[T.4.0] 0.0062 0.004 1.394 0.163 -0.002 0.015
fhisp -0.0181 0.003 -6.033 0.000 -0.024 -0.012
previs -0.0102 0.001 -16.122 0.000 -0.011 -0.009
dmar 0.0037 0.003 1.446 0.148 -0.001 0.009

The effect of payment method disappears.


In [99]:
formula = ('boy ~ C(fbrace) + fhisp + previs + C(pay_rec)')
model = smf.logit(formula, data=df)
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692777
         Iterations 3
C(fbrace)[T.2.0]       105.8   102.8 *
C(fbrace)[T.3.0]       105.8   102.2 *
C(fbrace)[T.4.0]       105.8   106.3  
C(pay_rec)[T.2.0]      105.8   105.9  
C(pay_rec)[T.3.0]      105.8   106.9  
C(pay_rec)[T.4.0]      105.8   105.0  
fhisp                  105.8   104.0 *
previs                 105.8   104.8 *
Out[99]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3009712
Model: Logit Df Residuals: 3009703
Method: MLE Df Model: 8
Date: Tue, 17 May 2016 Pseudo R-squ.: 8.163e-05
Time: 14:30:20 Log-Likelihood: -2.0851e+06
converged: True LL-Null: -2.0852e+06
LLR p-value: 1.004e-68
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0566 0.002 23.765 0.000 0.052 0.061
C(fbrace)[T.2.0] -0.0295 0.003 -8.509 0.000 -0.036 -0.023
C(fbrace)[T.3.0] -0.0345 0.011 -3.090 0.002 -0.056 -0.013
C(fbrace)[T.4.0] 0.0046 0.005 1.012 0.312 -0.004 0.014
C(pay_rec)[T.2.0] 0.0005 0.003 0.174 0.862 -0.005 0.006
C(pay_rec)[T.3.0] 0.0100 0.006 1.619 0.105 -0.002 0.022
C(pay_rec)[T.4.0] -0.0074 0.006 -1.260 0.208 -0.019 0.004
fhisp -0.0178 0.003 -5.687 0.000 -0.024 -0.012
previs -0.0101 0.001 -15.540 0.000 -0.011 -0.009

For reference, here again is the version with a boolean for no prenatal visits; the age models that follow build on it.


In [100]:
formula = ('boy ~ C(fbrace) + fhisp + previs + no_previs')
model = smf.logit(formula, data=df)
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692776
         Iterations 3
C(fbrace)[T.2.0]       105.9   102.8 *
C(fbrace)[T.3.0]       105.9   102.3 *
C(fbrace)[T.4.0]       105.9   106.5  
fhisp                  105.9   104.1 *
previs                 105.9   104.7 *
no_previs              105.9   101.0 *
Out[100]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3155440
Model: Logit Df Residuals: 3155433
Method: MLE Df Model: 6
Date: Tue, 17 May 2016 Pseudo R-squ.: 8.351e-05
Time: 14:30:47 Log-Likelihood: -2.1860e+06
converged: True LL-Null: -2.1862e+06
LLR p-value: 8.674e-76
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0570 0.001 38.973 0.000 0.054 0.060
C(fbrace)[T.2.0] -0.0294 0.003 -8.984 0.000 -0.036 -0.023
C(fbrace)[T.3.0] -0.0342 0.011 -3.123 0.002 -0.056 -0.013
C(fbrace)[T.4.0] 0.0056 0.004 1.270 0.204 -0.003 0.014
fhisp -0.0171 0.003 -5.817 0.000 -0.023 -0.011
previs -0.0111 0.001 -16.625 0.000 -0.012 -0.010
no_previs -0.0469 0.012 -3.936 0.000 -0.070 -0.024

Now, surprisingly, the mother's age has a small effect.


In [101]:
formula = ('boy ~ C(fbrace) + fhisp + previs + no_previs + mager9')
model = smf.logit(formula, data=df)
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692775
         Iterations 3
C(fbrace)[T.2.0]       106.8   103.6 *
C(fbrace)[T.3.0]       106.8   103.1 *
C(fbrace)[T.4.0]       106.8   107.4  
fhisp                  106.8   104.9 *
previs                 106.8   105.6 *
no_previs              106.8   101.9 *
mager9                 106.8   106.6 *
Out[101]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3155440
Model: Logit Df Residuals: 3155432
Method: MLE Df Model: 7
Date: Tue, 17 May 2016 Pseudo R-squ.: 8.440e-05
Time: 14:31:14 Log-Likelihood: -2.1860e+06
converged: True LL-Null: -2.1862e+06
LLR p-value: 1.043e-75
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0656 0.005 14.344 0.000 0.057 0.075
C(fbrace)[T.2.0] -0.0300 0.003 -9.123 0.000 -0.036 -0.024
C(fbrace)[T.3.0] -0.0351 0.011 -3.200 0.001 -0.057 -0.014
C(fbrace)[T.4.0] 0.0062 0.004 1.413 0.158 -0.002 0.015
fhisp -0.0176 0.003 -5.974 0.000 -0.023 -0.012
previs -0.0110 0.001 -16.456 0.000 -0.012 -0.010
no_previs -0.0468 0.012 -3.926 0.000 -0.070 -0.023
mager9 -0.0019 0.001 -1.970 0.049 -0.004 -9.69e-06

So does the father's age. But both age effects are small and borderline significant.


In [104]:
formula = ('boy ~ C(fbrace) + fhisp + previs + no_previs + fagerrec11')
model = smf.logit(formula, data=df)
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692775
         Iterations 3
C(fbrace)[T.2.0]       106.9   103.7 *
C(fbrace)[T.3.0]       106.9   103.2 *
C(fbrace)[T.4.0]       106.9   107.6  
fhisp                  106.9   105.0 *
previs                 106.9   105.7 *
no_previs              106.9   101.8 *
fagerrec11             106.9   106.7 *
Out[104]:
Logit Regression Results
Dep. Variable: boy No. Observations: 3148537
Model: Logit Df Residuals: 3148529
Method: MLE Df Model: 7
Date: Tue, 17 May 2016 Pseudo R-squ.: 8.517e-05
Time: 14:32:34 Log-Likelihood: -2.1812e+06
converged: True LL-Null: -2.1814e+06
LLR p-value: 2.924e-76
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0663 0.004 15.399 0.000 0.058 0.075
C(fbrace)[T.2.0] -0.0299 0.003 -9.100 0.000 -0.036 -0.023
C(fbrace)[T.3.0] -0.0348 0.011 -3.170 0.002 -0.056 -0.013
C(fbrace)[T.4.0] 0.0067 0.004 1.518 0.129 -0.002 0.015
fhisp -0.0176 0.003 -5.974 0.000 -0.023 -0.012
previs -0.0110 0.001 -16.545 0.000 -0.012 -0.010
no_previs -0.0483 0.012 -4.039 0.000 -0.072 -0.025
fagerrec11 -0.0019 0.001 -2.278 0.023 -0.003 -0.000
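
To put "borderline" in context, the two-sided p-value can be recovered from each z statistic using the standard normal distribution. The sketch below does this with the z values copied from the tables above; the helper name two_sided_p is introduced here for illustration.

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a z statistic under a standard normal."""
    return math.erfc(abs(z) / math.sqrt(2))

# z statistics copied from the two age models above.
z_values = {'mager9': -1.970, 'fagerrec11': -2.278, 'previs': -16.545}

for name, z in z_values.items():
    print('%-12s z=%7.3f  p=%.3g' % (name, z, two_sided_p(z)))
```

With more than 3 million observations, p-values just under 0.05 are weak evidence, especially after testing this many variables. The prenatal visits term is in a different regime entirely.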

What's up with prenatal visits?

The predictive power of prenatal visits is still surprising to me. To make sure we've controlled for race, I'll select cases where both parents are white:


In [110]:
white = df[(df.mbrace==1) & (df.fbrace==1)]
len(white)


Out[110]:
2400787

And compute sex ratios for each level of previs:


In [111]:
var = 'previs'
white[[var, 'boy']].groupby(var).aggregate(series_to_ratio)


Out[111]:
boy
previs
-6 107
-5 110
-4 108
-3 110
-2 108
-1 107
0 105
1 103
2 103
3 102
4 103

The effect holds up. People with fewer than average prenatal visits are substantially more likely to have boys.
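
The ratios in the table above are boys per 100 girls within each level of previs. As a minimal sketch of that calculation, the function below is a hypothetical stand-in for series_to_ratio that works on a plain list of 0/1 flags rather than a pandas Series.

```python
def ratio_from_flags(flags):
    """Boys per 100 girls, rounded, from a sequence of 0/1 'boy' flags."""
    p_boy = sum(flags) / len(flags)
    return round(100 * p_boy / (1 - p_boy))

# Toy data: 51 boys and 49 girls.
flags = [1] * 51 + [0] * 49
print(ratio_from_flags(flags))
```

This version assumes each group contains at least one girl; otherwise the division fails.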


In [112]:
formula = ('boy ~ previs + no_previs')
model = smf.logit(formula, data=white)
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692749
         Iterations 3
previs                 105.5   104.3 *
no_previs              105.5   100.4 *
Out[112]:
Logit Regression Results
Dep. Variable: boy No. Observations: 2346785
Model: Logit Df Residuals: 2346782
Method: MLE Df Model: 2
Date: Tue, 17 May 2016 Pseudo R-squ.: 6.418e-05
Time: 14:40:39 Log-Likelihood: -1.6257e+06
converged: True LL-Null: -1.6258e+06
LLR p-value: 4.790e-46
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0534 0.001 40.728 0.000 0.051 0.056
previs -0.0113 0.001 -14.378 0.000 -0.013 -0.010
no_previs -0.0490 0.015 -3.352 0.001 -0.078 -0.020

In [113]:
inter = results.params['Intercept']
slope = results.params['previs']
inter, slope


Out[113]:
(0.053449172473506806, -0.011302385985286368)

In [114]:
previs = np.arange(-5, 5)
logodds = inter + slope * previs
odds = np.exp(logodds)
odds * 100


Out[114]:
array([ 111.62346508,  110.36895641,  109.12854687,  107.90207798,
        106.68939307,  105.49033723,  104.30475728,  103.13250177,
        101.97342096,  100.82736677])
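
To get a sense of the effect size, the same fitted line can be expressed as the probability of a boy rather than the odds. This sketch uses the rounded intercept and slope from above and, like the cell above, ignores the no_previs term; the function name fitted is introduced here for illustration.

```python
import math

# Rounded parameters from the model fitted to the white subsample above.
inter = 0.053449
slope = -0.011302

def fitted(previs):
    """Return (boys per 100 girls, probability of a boy) at a previs level."""
    odds = math.exp(inter + slope * previs)
    return 100 * odds, odds / (1 + odds)

for previs in range(-5, 5):
    sex_ratio, prob = fitted(previs)
    print('%3d   ratio=%6.2f   P(boy)=%.4f' % (previs, sex_ratio, prob))
```

Across this range the fitted probability of a boy moves by only a few percentage points, even though the coefficient is estimated very precisely.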

In [116]:
formula = ('boy ~ dmar')
model = smf.logit(formula, data=white)
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692788
         Iterations 3
dmar                   105.3   105.5  
Out[116]:
Logit Regression Results
Dep. Variable: boy No. Observations: 2400787
Model: Logit Df Residuals: 2400785
Method: MLE Df Model: 1
Date: Tue, 17 May 2016 Pseudo R-squ.: 7.406e-08
Time: 15:27:21 Log-Likelihood: -1.6632e+06
converged: True LL-Null: -1.6632e+06
LLR p-value: 0.6196
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0518 0.004 13.234 0.000 0.044 0.059
dmar 0.0014 0.003 0.496 0.620 -0.004 0.007

In [117]:
formula = ('boy ~ lowed')
model = smf.logit(formula, data=white)
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692788
         Iterations 3
lowed                  105.6   105.0  
Out[117]:
Logit Regression Results
Dep. Variable: boy No. Observations: 2301234
Model: Logit Df Residuals: 2301232
Method: MLE Df Model: 1
Date: Tue, 17 May 2016 Pseudo R-squ.: 4.759e-07
Time: 15:28:01 Log-Likelihood: -1.5943e+06
converged: True LL-Null: -1.5943e+06
LLR p-value: 0.2180
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0542 0.001 38.603 0.000 0.051 0.057
lowed -0.0051 0.004 -1.232 0.218 -0.013 0.003

In [118]:
formula = ('boy ~ highbo')
model = smf.logit(formula, data=white)
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692788
         Iterations 3
highbo                 105.5   105.6  
Out[118]:
Logit Regression Results
Dep. Variable: boy No. Observations: 2391630
Model: Logit Df Residuals: 2391628
Method: MLE Df Model: 1
Date: Tue, 17 May 2016 Pseudo R-squ.: 4.564e-09
Time: 15:28:25 Log-Likelihood: -1.6569e+06
converged: True LL-Null: -1.6569e+06
LLR p-value: 0.9021
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0535 0.001 40.493 0.000 0.051 0.056
highbo 0.0008 0.006 0.123 0.902 -0.012 0.013

In [119]:
formula = ('boy ~ wic')
model = smf.logit(formula, data=white)
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692786
         Iterations 3
wic[T.Y]               105.6   105.3  
Out[119]:
Logit Regression Results
Dep. Variable: boy No. Observations: 2266424
Model: Logit Df Residuals: 2266422
Method: MLE Df Model: 1
Date: Tue, 17 May 2016 Pseudo R-squ.: 3.840e-07
Time: 15:28:57 Log-Likelihood: -1.5701e+06
converged: True LL-Null: -1.5701e+06
LLR p-value: 0.2721
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0548 0.002 33.369 0.000 0.052 0.058
wic[T.Y] -0.0031 0.003 -1.098 0.272 -0.009 0.002

In [120]:
formula = ('boy ~ obese')
model = smf.logit(formula, data=white)
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692788
         Iterations 3
obese                  105.6   105.3  
Out[120]:
Logit Regression Results
Dep. Variable: boy No. Observations: 2244349
Model: Logit Df Residuals: 2244347
Method: MLE Df Model: 1
Date: Tue, 17 May 2016 Pseudo R-squ.: 1.725e-07
Time: 15:29:20 Log-Likelihood: -1.5549e+06
converged: True LL-Null: -1.5549e+06
LLR p-value: 0.4639
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0542 0.002 35.607 0.000 0.051 0.057
obese -0.0023 0.003 -0.732 0.464 -0.009 0.004

In [123]:
formula = ('boy ~ C(pay_rec)')
model = smf.logit(formula, data=white)
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692786
         Iterations 3
C(pay_rec)[T.2.0]      105.4   105.5  
C(pay_rec)[T.3.0]      105.4   107.1 *
C(pay_rec)[T.4.0]      105.4   105.3  
Out[123]:
Logit Regression Results
Dep. Variable: boy No. Observations: 2295681
Model: Logit Df Residuals: 2295677
Method: MLE Df Model: 3
Date: Tue, 17 May 2016 Pseudo R-squ.: 1.666e-06
Time: 15:30:06 Log-Likelihood: -1.5904e+06
converged: True LL-Null: -1.5904e+06
LLR p-value: 0.1511
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0529 0.002 23.356 0.000 0.048 0.057
C(pay_rec)[T.2.0] 0.0004 0.003 0.147 0.883 -0.005 0.006
C(pay_rec)[T.3.0] 0.0159 0.007 2.235 0.025 0.002 0.030
C(pay_rec)[T.4.0] -0.0013 0.007 -0.197 0.844 -0.015 0.012

In [124]:
formula = ('boy ~ mager9')
model = smf.logit(formula, data=white)
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692786
         Iterations 3
mager9                 107.0   106.7 *
Out[124]:
Logit Regression Results
Dep. Variable: boy No. Observations: 2400787
Model: Logit Df Residuals: 2400785
Method: MLE Df Model: 1
Date: Tue, 17 May 2016 Pseudo R-squ.: 2.516e-06
Time: 15:30:32 Log-Likelihood: -1.6632e+06
converged: True LL-Null: -1.6632e+06
LLR p-value: 0.003813
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0677 0.005 13.452 0.000 0.058 0.078
mager9 -0.0032 0.001 -2.893 0.004 -0.005 -0.001

In [125]:
formula = ('boy ~ youngm + oldm')
model = smf.logit(formula, data=white)
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692787
         Iterations 3
youngm[T.True]         105.6   105.5  
oldm[T.True]           105.6   103.8 *
Out[125]:
Logit Regression Results
Dep. Variable: boy No. Observations: 2400787
Model: Logit Df Residuals: 2400784
Method: MLE Df Model: 2
Date: Tue, 17 May 2016 Pseudo R-squ.: 1.549e-06
Time: 15:31:04 Log-Likelihood: -1.6632e+06
converged: True LL-Null: -1.6632e+06
LLR p-value: 0.07608
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0542 0.001 40.370 0.000 0.052 0.057
youngm[T.True] -0.0011 0.006 -0.170 0.865 -0.013 0.011
oldm[T.True] -0.0173 0.008 -2.268 0.023 -0.032 -0.002

In [126]:
formula = ('boy ~ youngf + oldf')
model = smf.logit(formula, data=white)
results = model.fit()
summarize(results)
results.summary()


Optimization terminated successfully.
         Current function value: 0.692787
         Iterations 3
youngf                 105.5   106.4  
oldf                   105.5   105.7  
Out[126]:
Logit Regression Results
Dep. Variable: boy No. Observations: 2396141
Model: Logit Df Residuals: 2396138
Method: MLE Df Model: 2
Date: Tue, 17 May 2016 Pseudo R-squ.: 2.717e-07
Time: 15:31:50 Log-Likelihood: -1.6600e+06
converged: True LL-Null: -1.6600e+06
LLR p-value: 0.6370
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0534 0.001 40.229 0.000 0.051 0.056
youngf 0.0082 0.009 0.924 0.355 -0.009 0.026
oldf 0.0018 0.008 0.242 0.809 -0.013 0.017
