In regression problems, we want to find a function that describes the relationship between observations and outputs. Our observations are recorded as rows in the matrix $X$, our response values as the column vector $Y$, and our estimated linear parameters as the column vector $\beta$. We think of our linear regression model as having a structural part and a random part: $Y = X\beta + \epsilon$, where $\epsilon \sim N(0, \sigma^{2})$. The first term, $X\beta$, is structural and captures the linear relationship; the second, $\epsilon$, is the noise, or unexplained variance, in the data.
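
For reference, the least-squares estimate is the $\beta$ that minimizes $\|Y - X\beta\|^{2}$; when $X^{T}X$ is invertible, it has the closed form $\hat{\beta} = (X^{T}X)^{-1}X^{T}Y$.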

Linear regression doesn't require that the residuals $r = Y - \hat{Y}$ are normally distributed, but if they are, the least-squares solution has many nice theoretical properties. Let's generate some fake data where the predictor and response are linearly related and the residuals are normally distributed with unit variance.


In [2]:
import numpy as np
from sklearn import cross_validation as cv
import matplotlib.pyplot as plt

%matplotlib inline
# use 100 samples
n_pts = 100

# true slope of the simulated relationship
beta = np.array([3.4])
X = np.expand_dims(np.linspace(0.001, 10, n_pts), axis=1)

def lin_resp_norm_err(beta, X):
    # linear response plus standard normal (unit-variance) noise
    return np.dot(X, beta[:, np.newaxis]) + np.random.randn(X.shape[0], 1)

Y_lin_norm_err = lin_resp_norm_err(beta, X)

Let's split our data into test and train sets so that we can examine how our linear model performs on held-out data.


In [3]:
X_train, X_test, Y_train, Y_test = cv.train_test_split(X, Y_lin_norm_err, test_size=0.33, random_state=42)

We will now train a linear model with an ordinary least squares predictor. We will calculate the mean squared error on the held-out set as well as the $R^{2}$ value for the fit.
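
Concretely, if there are $n$ held-out points with predictions $\hat{y}_{i}$, the mean squared error is $\frac{1}{n}\sum_{i}(\hat{y}_{i} - y_{i})^{2}$, and $R^{2} = 1 - \frac{\sum_{i}(y_{i} - \hat{y}_{i})^{2}}{\sum_{i}(y_{i} - \bar{y})^{2}}$, which is what sklearn's score method reports.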


In [4]:
from sklearn import linear_model

lin_regr = linear_model.LinearRegression()
lin_regr.fit(X_train, Y_train)

# evaluate on the held-out set
residuals = lin_regr.predict(X_test) - Y_test
mean_error = np.mean(residuals ** 2)
regr_score = lin_regr.score(X_test, Y_test)

plt.scatter(X, Y_lin_norm_err)
plt.plot(X, lin_regr.predict(X), color='red')
plt.text(1, 30, '$R^{2}=%.3f$' % regr_score)
plt.xlabel('Predictor Variable X')
plt.ylabel('Response Variable Y')
plt.show()
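
As a sanity check, the closed-form solution from the normal equations should agree with sklearn's fit. A minimal sketch, padding the design matrix with a column of ones to match the intercept sklearn fits by default:

In [ ]:
# solve min ||Y - X beta||^2 directly
X_design = np.hstack([np.ones_like(X_train), X_train])
beta_hat = np.linalg.lstsq(X_design, Y_train)[0]
print beta_hat.ravel()  # [intercept, slope]
print lin_regr.intercept_, lin_regr.coef_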


The residuals on the held-out set should follow a normal distribution.


In [5]:
# histogram of the held-out residuals
plt.hist(residuals, bins=25)
plt.show()
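
If we want a number instead of a picture, here is a sketch of a quantitative check using scipy.stats.normaltest, which tests the null hypothesis that a sample was drawn from a normal distribution:

In [ ]:
from scipy import stats

# a large p-value means no evidence against normality of the residuals
stat, p_value = stats.normaltest(residuals.ravel())
print 'normaltest p-value: %.3f' % p_value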


We commonly think of linear regression as containing a structural ($X\beta$) part and a random ($\epsilon$) part. In generalized linear models, we instead focus on three pieces: the structural component, the link function, and the response distribution. We write $g(\mu) = X\beta$, where $g$ is the link function, $X\beta$ is the structural component, and $\mu$ is the conditional mean of the response distribution at that point in feature space. Note that we can pad the feature vectors of our observations with a '1' to capture the offset in the model, exactly as in ordinary linear regression.
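
For example, Poisson regression uses the log link $g(\mu) = \log \mu$, so the conditional mean is $\mu = e^{X\beta}$, which is always positive, exactly what we want for count data. A minimal sketch of that inverse link:

In [ ]:
# the inverse of the log link maps the linear predictor to a positive mean
def poisson_mean(X, beta):
    return np.exp(np.dot(X, beta))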

If the link function $g$ is the identity and the response is conditionally normally distributed, we are back again at normal linear regression. Here is a pretty great explanation someone wrote about GLMs over on Stack Overflow.
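
Since our fake data was generated with exactly this structure, a Gaussian-family GLM should recover essentially the same fit as the OLS model above. A minimal sketch using statsmodels (with an explicit intercept column, since GLM does not add one for us):

In [ ]:
import statsmodels.api as sm

# Gaussian family with the default identity link is ordinary least squares
glm_gauss = sm.GLM(Y_train, sm.add_constant(X_train), family=sm.families.Gaussian())
print glm_gauss.fit().params  # should be close to lin_regr.intercept_ and lin_regr.coef_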

Count data is fundamentally different from continuous data: it is always discrete (integer-valued) and nonnegative. Let's grab some real data to analyze. Here, we are grabbing grouped individual data where each row is a demographic "cell", indexed by cell number, with the duration of marriage group (dur), place of residence (res: Suva, urban, or rural), education level of the mother (educ: none, lower primary, upper primary, or secondary+), the mean number of children ever born (mean), the variance in children ever born (var), the number of women in the cell (n), and the total number of children ever born, y = mean × n.


In [56]:
import urllib2
import pandas as pd
import StringIO

response = urllib2.urlopen('http://data.princeton.edu/wws509/datasets/ceb.dat')
children_ever_born = pd.read_csv(StringIO.StringIO(response.read()), sep='\s+')
# peek at the response column
children_ever_born[['mean']]


Out[56]:
mean
1 0.50
2 1.14
3 0.90
4 0.73
5 1.17
6 0.85
7 1.05
8 0.69
9 0.97
10 0.96
11 0.97
12 0.74
13 3.10
14 2.67
15 2.04
16 1.73
17 4.54
18 2.65
19 2.68
20 2.29
21 2.44
22 2.71
23 2.47
24 2.24
25 4.08
26 3.67
27 2.90
28 2.00
29 4.17
30 3.33
... ...
41 4.70
42 5.36
43 4.60
44 3.80
45 5.06
46 5.59
47 4.50
48 2.00
49 5.62
50 5.06
51 3.92
52 2.60
53 5.36
54 5.88
55 5.00
56 5.33
57 6.46
58 6.34
59 5.74
60 2.50
61 6.60
62 6.74
63 5.38
64 2.00
65 6.52
66 7.51
67 7.54
69 7.48
70 7.81
71 5.80

70 rows × 1 columns


In [62]:
import statsmodels.api as sm
import statsmodels.formula.api as smf

# dur, res, and educ are categorical strings, so they can't go straight
# into a numeric design matrix; the formula interface (statsmodels >= 0.5)
# dummy-codes them for us. We leave out y, which is just mean * n and
# would leak the response.
glm_ceb = smf.glm('mean ~ dur + res + educ', data=children_ever_born,
                  family=sm.families.Poisson())
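
With the categorical predictors dummy-coded, fitting proceeds as usual; a sketch of the standard fit-and-summarize step:

In [ ]:
res_ceb = glm_ceb.fit()
print res_ceb.summary()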

If we plot the predictor and response variables, we can see what we would expect from the Poisson distribution: as the mean increases, so does the variance in the data.
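
First we need some Poisson-distributed counts to plot; here is a minimal sketch that draws them synthetically, with the log of the rate linear in the predictor X from earlier (the slope of 0.4 is an arbitrary choice):

In [ ]:
# counts whose log-rate is linear in X, matching the canonical log link
lam = np.exp(0.4 * X)
Y_poisson = np.random.poisson(lam)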


In [28]:
X_train, X_test, Y_p_train, Y_p_test = cv.train_test_split(X, Y_poisson, test_size=0.33, random_state=42)
plt.scatter(X_test, Y_p_test)
plt.show()

In [31]:
glm_poisson = sm.GLM(Y_p_train, X_train, family=sm.families.Poisson())
res = glm_poisson.fit()
print res.summary()
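
To see the mean-variance relationship numerically, we can bin the counts by predictor value and compare the within-bin mean and variance (a sketch; the unit-width bins are an arbitrary choice):

In [ ]:
# for Poisson data the variance should track the mean in each slice of X
bins = np.digitize(X.ravel(), np.arange(0, 11))
for b in np.unique(bins):
    y_bin = Y_poisson.ravel()[bins == b]
    print 'bin %d: mean=%.2f, var=%.2f' % (b, y_bin.mean(), y_bin.var())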

For reference, here is the full children ever born table we loaded earlier.


In [51]:
children_ever_born

Out[51]:
dur res educ mean var n y
1 0-4 Suva none 0.50 1.14 8 4.00
2 0-4 Suva lower 1.14 0.73 21 23.94
3 0-4 Suva upper 0.90 0.67 42 37.80
4 0-4 Suva sec+ 0.73 0.48 51 37.23
5 0-4 urban none 1.17 1.06 12 14.04
6 0-4 urban lower 0.85 1.59 27 22.95
7 0-4 urban upper 1.05 0.73 39 40.95
8 0-4 urban sec+ 0.69 0.54 51 35.19
9 0-4 rural none 0.97 0.88 62 60.14
10 0-4 rural lower 0.96 0.81 102 97.92
11 0-4 rural upper 0.97 0.80 107 103.79
12 0-4 rural sec+ 0.74 0.59 47 34.78
13 5-9 Suva none 3.10 1.66 10 31.00
14 5-9 Suva lower 2.67 0.99 30 80.10
15 5-9 Suva upper 2.04 1.87 24 48.96
16 5-9 Suva sec+ 1.73 0.68 22 38.06
17 5-9 urban none 4.54 3.44 13 59.02
18 5-9 urban lower 2.65 1.51 37 98.05
19 5-9 urban upper 2.68 0.97 44 117.92
20 5-9 urban sec+ 2.29 0.81 21 48.09
21 5-9 rural none 2.44 1.93 70 170.80
22 5-9 rural lower 2.71 1.36 117 317.07
23 5-9 rural upper 2.47 1.30 81 200.07
24 5-9 rural sec+ 2.24 1.19 21 47.04
25 10-14 Suva none 4.08 1.72 12 48.96
26 10-14 Suva lower 3.67 2.31 27 99.09
27 10-14 Suva upper 2.90 1.57 20 58.00
28 10-14 Suva sec+ 2.00 1.82 12 24.00
29 10-14 urban none 4.17 2.97 18 75.06
30 10-14 urban lower 3.33 2.99 43 143.19
... ... ... ... ... ... ... ...
41 15-19 urban none 4.70 7.40 23 108.10
42 15-19 urban lower 5.36 2.97 42 225.12
43 15-19 urban upper 4.60 3.83 20 92.00
44 15-19 urban sec+ 3.80 0.70 5 19.00
45 15-19 rural none 5.06 4.91 114 576.84
46 15-19 rural lower 5.59 3.23 86 480.74
47 15-19 rural upper 4.50 3.29 30 135.00
48 15-19 rural sec+ 2.00 0.00 1 2.00
49 20-24 Suva none 5.62 4.15 21 118.02
50 20-24 Suva lower 5.06 4.64 18 91.08
51 20-24 Suva upper 3.92 4.08 12 47.04
52 20-24 Suva sec+ 2.60 4.30 5 13.00
53 20-24 urban none 5.36 7.19 22 117.92
54 20-24 urban lower 5.88 4.44 25 147.00
55 20-24 urban upper 5.00 4.33 13 65.00
56 20-24 urban sec+ 5.33 0.33 3 15.99
57 20-24 rural none 6.46 8.20 117 755.82
58 20-24 rural lower 6.34 5.72 68 431.12
59 20-24 rural upper 5.74 5.20 23 132.02
60 20-24 rural sec+ 2.50 0.50 2 5.00
61 25-29 Suva none 6.60 12.40 47 310.20
62 25-29 Suva lower 6.74 11.66 27 181.98
63 25-29 Suva upper 5.38 4.27 8 43.04
64 25-29 Suva sec+ 2.00 0.00 1 2.00
65 25-29 urban none 6.52 11.45 46 299.92
66 25-29 urban lower 7.51 10.53 45 337.95
67 25-29 urban upper 7.54 12.60 13 98.02
69 25-29 rural none 7.48 11.34 195 1458.60
70 25-29 rural lower 7.81 7.57 59 460.79
71 25-29 rural upper 5.80 7.07 10 58.00

70 rows × 7 columns

