Pandas Examples

MIT License: https://opensource.org/licenses/MIT



In [1]:

    
from __future__ import print_function, division

%matplotlib inline

import numpy as np

import nsfg
import first
import analytic

import thinkstats2

import seaborn

Let's read data from the BRFSS (Behavioral Risk Factor Surveillance System):



In [2]:

    
import brfss
df = brfss.ReadBrfss()



In [3]:

    
df.describe()









    Out[3]:






  
    
      
                 age            sex        wtyrago        finalwt          wtkg2           htm3
count  410856.000000  414509.000000  390399.000000  414509.000000  398484.000000  409129.000000
mean       54.862180       1.624368      79.721319     561.774700      78.992453     168.825190
std        16.737702       0.484286      20.565164    1076.538764      19.546157      10.352653
min        18.000000       1.000000      22.727273       1.695143      20.000000      61.000000
25%        43.000000       1.000000      64.545455      97.006804      64.550000     160.000000
50%        55.000000       2.000000      77.272727     234.010543      77.270000     168.000000
75%        67.000000       2.000000      90.909091     590.775576      90.910000     175.000000
max        99.000000       2.000000     342.272727   60995.111700     309.090000     236.000000

If we group by sex, we get a DataFrameGroupBy object.



In [4]:

    
groupby = df.groupby('sex')
groupby









    Out[4]:





<pandas.core.groupby.DataFrameGroupBy object at 0x7f6c6c1cf2d0>

If we select a particular column from the GroupBy, we get a SeriesGroupBy object.



In [5]:

    
seriesgroupby = groupby.htm3
seriesgroupby









    Out[5]:





<pandas.core.groupby.SeriesGroupBy object at 0x7f6c3c88d590>

If you invoke a reduce method on a DataFrameGroupBy, you get a DataFrame:



In [6]:

    
groupby.mean()

If you invoke a reduce method on a SeriesGroupBy, you get a Series:



In [7]:

    
seriesgroupby.mean()









    Out[7]:





sex
1    178.066221
2    163.223475
Name: htm3, dtype: float64
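
To make the difference between the two result types concrete, here is a minimal sketch on a toy DataFrame. The values are made up for illustration and are not BRFSS data; only the column names mirror the ones used above.

```python
import pandas as pd

# Toy data standing in for the BRFSS columns (values are invented).
toy = pd.DataFrame({
    'sex': [1, 1, 2, 2],
    'htm3': [180.0, 176.0, 162.0, 166.0],
    'wtkg2': [90.0, 84.0, 70.0, 74.0],
})

by_sex = toy.groupby('sex')

# Reducing a DataFrameGroupBy gives one row per group and one column
# per remaining variable, i.e. a DataFrame.
frame_means = by_sex.mean()

# Reducing a SeriesGroupBy gives one value per group, i.e. a Series.
series_means = by_sex['htm3'].mean()

print(type(frame_means).__name__)
print(type(series_means).__name__)
print(series_means)
```

In both cases the group key (sex) becomes the index of the result.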

You can use aggregate to apply a collection of reduce methods:



In [8]:

    
groupby.aggregate(['mean', 'std'])









    Out[8]:






  
    
      
           age               wtyrago                finalwt                   wtkg2                   htm3
          mean        std       mean        std        mean          std       mean        std        mean       std
sex
1    54.114237  16.383712  89.880639  19.260805  727.426119  1370.544678  89.016966  18.240528  178.066221  7.723563
2    55.314401  16.932158  73.278344  18.678964  462.115407   836.457063  72.684712  17.609006  163.223475  7.269156



In [9]:

    
seriesgroupby.aggregate(['mean', 'std'])
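
The cell above is shown without its output, so here is a small sketch of what aggregate with a list of method names returns. The toy values are invented; the point is only the shape of the result.

```python
import pandas as pd

# Toy data (invented values), same column names as in the notebook.
toy = pd.DataFrame({
    'sex': [1, 1, 2, 2],
    'htm3': [180.0, 176.0, 162.0, 166.0],
})

# On a SeriesGroupBy: one row per group, one column per reduce method.
stats = toy.groupby('sex')['htm3'].aggregate(['mean', 'std'])
print(stats)

# On a DataFrameGroupBy: the columns become two-level,
# (variable, reduce method).
wide = toy.groupby('sex').aggregate(['mean', 'std'])
print(wide.columns.tolist())
```

So the SeriesGroupBy version returns a DataFrame with plain column names, while the DataFrameGroupBy version adds a second column level for the variable.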

If the reduce method you want is not available, you can make your own:



In [10]:

    
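def trimmed_mean(series):
    # Note: clip() pulls values outside the 5th-95th percentile range in to
    # those bounds rather than dropping them (strictly, a winsorized mean).
    lower, upper = series.quantile([0.05, 0.95])
    return series.clip(lower, upper).mean()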

Here's how it works when we apply it directly:



In [11]:

    
trimmed_mean(df.htm3)









    Out[11]:





168.65401132650092

And we can use apply to apply it to each group:



In [12]:

    
seriesgroupby.apply(trimmed_mean)









    Out[12]:





sex
1    178.109121
2    163.210210
Name: htm3, dtype: float64
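
Because trimmed_mean clips extreme values instead of discarding them, it may help to compare it with a version that drops them. The sketch below uses two hypothetical helpers, winsorized_mean and dropping_trimmed_mean, which are not part of the notebook, and a made-up series with one outlier.

```python
import pandas as pd

def winsorized_mean(series, p=0.05):
    """Same idea as trimmed_mean above: clip to the p and 1-p quantiles."""
    lower, upper = series.quantile([p, 1 - p])
    return series.clip(lower, upper).mean()

def dropping_trimmed_mean(series, p=0.05):
    """Hypothetical alternative: discard values outside the quantile range."""
    lower, upper = series.quantile([p, 1 - p])
    kept = series[(series >= lower) & (series <= upper)]
    return kept.mean()

# Made-up values with one obvious outlier.
values = pd.Series([1.0, 2.0, 3.0, 4.0, 100.0])

print(values.mean())
print(winsorized_mean(values, p=0.2))
print(dropping_trimmed_mean(values, p=0.2))
```

Both helpers can be passed to apply on a SeriesGroupBy in the same way as trimmed_mean; which one is appropriate depends on how much influence the extreme values should keep.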

The digitize-groupby combo

Let's say we want to group people into deciles (bottom 10%, next 10%, and so on).

We can start by defining the cumulative probabilities that mark the borders between deciles.



In [13]:

    
ps = np.linspace(0, 1, 11)
ps









    Out[13]:





array([ 0. ,  0.1,  0.2,  0.3,  0.4,  0.5,  0.6,  0.7,  0.8,  0.9,  1. ])

And then use quantile to find the values that correspond to those cumulative probabilities.



In [14]:

    
series = df.htm3
bins = series.quantile(ps)
bins









    Out[14]:





0.0     61.0
0.1    157.0
0.2    160.0
0.3    163.0
0.4    165.0
0.5    168.0
0.6    170.0
0.7    175.0
0.8    178.0
0.9    183.0
1.0    236.0
Name: htm3, dtype: float64
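
As a quick check on how the boundaries are counted, here is a sketch with a synthetic series whose quantiles are easy to predict. The data are invented; only linspace and quantile are taken from the cells above.

```python
import numpy as np
import pandas as pd

# Synthetic values 1.0, 2.0, ..., 101.0, so quantiles fall on round numbers.
series = pd.Series(np.arange(1.0, 102.0))

ps = np.linspace(0, 1, 11)     # 11 cumulative probabilities
bins = series.quantile(ps)     # 11 boundaries, indexed by probability

print(bins)
print(len(bins) - 1)           # number of intervals between boundaries
```

Eleven boundaries delimit ten intervals, which is why linspace is called with 11 rather than 10.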

np.digitize takes a series and a sequence of bin boundaries, and computes the bin index for each element in the series.



In [15]:

    
np.digitize(series, bins)









    Out[15]:





array([2, 4, 5, ..., 9, 2, 9])
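
The indexing convention of np.digitize matters for what follows, so here is a small sketch with made-up boundaries. With the default right=False, index i means bins[i-1] <= x < bins[i]; values at or above the last boundary get index len(bins), and NaN ends up there as well.

```python
import numpy as np

# Four boundaries delimit three intervals: [0, 10), [10, 20), [20, 30).
bins = np.array([0.0, 10.0, 20.0, 30.0])

# Includes a value on a lower boundary, the maximum boundary, and a NaN.
values = np.array([0.0, 5.0, 10.0, 29.9, 30.0, np.nan])

idx = np.digitize(values, bins)
print(idx)
```

The last two entries show that a value equal to the top boundary, and a missing value, both land in an extra index one past the last real interval.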

Exercise: Collect the code snippets from the previous cells to write a function called digitize that takes a Series and a number of bin boundaries and returns the results from np.digitize.



In [16]:

    
def digitize(series, n=11):
    ps = np.linspace(0, 1, n)
    bins = series.quantile(ps)
    return np.digitize(series, bins)

Now, if your digitize function is working, we can assign the results to a new column in the DataFrame:



In [17]:

    
df['height_decile'] = digitize(df.htm3)
df.height_decile.describe()









    Out[17]:





count    414509.000000
mean          5.897512
std           2.924237
min           1.000000
25%           4.000000
50%           6.000000
75%           9.000000
max          11.000000
Name: height_decile, dtype: float64
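
The maximum of 11 in this summary comes from the way np.digitize handles the largest value and missing values. If that extra group is unwanted, one possible adjustment is sketched below. The function digitize_clean is a hypothetical variant, not part of the notebook, and the heights are made up.

```python
import numpy as np
import pandas as pd

def digitize_clean(series, n=11):
    """Hypothetical variant of digitize: keeps NaN as NaN and puts the
    maximum value in the top group instead of an extra one."""
    ps = np.linspace(0, 1, n)
    bins = series.quantile(ps).to_numpy()      # quantile skips NaN by default
    idx = np.digitize(series.to_numpy(), bins).astype(float)
    idx[series.isna().to_numpy()] = np.nan     # keep missing values missing
    idx[idx == n] = n - 1                      # fold the maximum into the top group
    return pd.Series(idx, index=series.index)

# Made-up heights, with one missing value; n=5 boundaries gives 4 groups.
heights = pd.Series([150.0, 155.0, 160.0, 165.0, 170.0,
                     175.0, 180.0, 185.0, np.nan])
groups = digitize_clean(heights, n=5)
print(groups)
```

Since groupby drops rows whose key is NaN by default, grouping on a column built this way would leave the missing heights out of every group.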

And then group by height_decile:



In [18]:

    
groupby = df.groupby('height_decile')

Now we can compute means for each variable in each group:



In [19]:

    
groupby.mean()









    Out[19]:






  
    
      
                     age       sex    wtyrago     finalwt      wtkg2        htm3
height_decile
1              59.343555  1.978576  65.732170  482.633403  65.071502  152.147807
2              57.708848  1.976869  68.925145  460.942597  68.415169  157.001891
3              56.693958  1.964062  71.014838  461.123901  70.420280  160.002595
4              55.411945  1.943195  72.962619  481.562164  72.417512  163.000728
5              55.010687  1.890204  74.834351  498.016131  74.251253  165.002383
6              54.469226  1.781678  77.115205  534.779185  76.563233  168.000509
7              53.708454  1.573876  80.716256  570.787161  80.072338  171.405279
8              53.354186  1.306171  85.399642  643.930094  84.663829  175.001213
9              53.838675  1.134095  89.774340  651.359825  88.932201  178.888785
10             51.531986  1.034134  98.338009  714.941775  97.387332  186.214057
11             55.246926  1.758967  71.630761  792.074824  71.236649  236.000000

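Group 11 is not a real decile: np.digitize assigns index 11 both to the maximum height (236 cm) and to missing heights (NaN). Setting it aside, it looks like: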

The shortest people are older than the tallest people, on average.
The shortest people are much more likely to be female (no surprise there).
The shortest people are lighter than the tallest people (wtkg2), and they were lighter last year, too (wtyrago).
Shorter people are more oversampled, so they have lower final weights. This is at least partly, and maybe entirely, due to the relationship with sex.

The fact that all of these variables are associated with height suggests that it will be important to control for age and sex for almost any analysis we want to do with this data.

Nevertheless, we'll start with a simple analysis looking at weights within each height group.



In [20]:

    
weights = groupby.wtkg2
weights









    Out[20]:





<pandas.core.groupby.SeriesGroupBy object at 0x7f6c34e6a6d0>

If we apply quantile to a SeriesGroupBy, we get back a Series with a MultiIndex.



In [21]:

    
quantiles = weights.quantile([0.25, 0.5, 0.75])
quantiles









    Out[21]:





height_decile      
1              0.25     54.55
               0.50     62.73
               0.75     72.73
2              0.25     57.73
               0.50     65.91
               0.75     76.36
3              0.25     59.09
               0.50     68.18
               0.75     78.64
4              0.25     61.36
               0.50     68.18
               0.75     81.36
5              0.25     63.18
               0.50     71.36
               0.75     81.82
6              0.25     64.09
               0.50     72.73
               0.75     84.09
7              0.25     68.18
               0.50     77.27
               0.75     88.64
8              0.25     72.73
               0.50     81.82
               0.75     90.91
9              0.25     77.27
               0.50     86.36
               0.75     97.27
10             0.25     84.09
               0.50     95.45
               0.75    106.82
11             0.25     59.09
               0.50     69.09
               0.75     80.00
dtype: float64



In [22]:

    
type(quantiles.index)









    Out[22]:





pandas.indexes.multi.MultiIndex
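
To see the two index levels more directly, here is a sketch on a toy DataFrame with invented weights. The outer level is the group key and the inner level is the requested quantile, so an individual value can be looked up with a (group, quantile) tuple.

```python
import pandas as pd

# Toy data (invented values): two groups of three weights each.
toy = pd.DataFrame({
    'group': [1, 1, 1, 2, 2, 2],
    'wtkg2': [60.0, 70.0, 80.0, 80.0, 90.0, 100.0],
})

q = toy.groupby('group')['wtkg2'].quantile([0.25, 0.5, 0.75])

print(q)
print(q.index.nlevels)    # number of index levels
print(q.loc[(1, 0.5)])    # median weight in group 1
```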

If you unstack a Series with a MultiIndex, the inner level of the MultiIndex gets broken out into columns.



In [23]:

    
quantiles.unstack()









    Out[23]:






  
    
      
                0.25    0.5    0.75
height_decile
1              54.55  62.73   72.73
2              57.73  65.91   76.36
3              59.09  68.18   78.64
4              61.36  68.18   81.36
5              63.18  71.36   81.82
6              64.09  72.73   84.09
7              68.18  77.27   88.64
8              72.73  81.82   90.91
9              77.27  86.36   97.27
10             84.09  95.45  106.82
11             59.09  69.09   80.00
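
Here is a minimal sketch of unstack on a hand-built Series, using the quartiles of the first two height groups from the output above. By default the innermost level moves to the columns; passing level=0 moves the outer level instead.

```python
import pandas as pd

# Two-level index: height_decile (outer) and quantile (inner).
index = pd.MultiIndex.from_product(
    [[1, 2], [0.25, 0.5, 0.75]], names=['height_decile', None])
s = pd.Series([54.55, 62.73, 72.73, 57.73, 65.91, 76.36], index=index)

wide = s.unstack()               # inner level (quantile) becomes columns
by_outer = s.unstack(level=0)    # outer level (height_decile) becomes columns

print(wide)
print(by_outer)
```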

This makes it convenient to plot each of the columns as a line.



In [24]:

    
quantiles.unstack().plot()









    Out[24]:





<matplotlib.axes._subplots.AxesSubplot at 0x7f6c34e43890>

Another view of this data we might like is the CDF of weight within each height group.

We can use apply with the Cdf constructor from thinkstats2. The result is a Series of Cdf objects.



In [25]:

    
from thinkstats2 import Cdf

cdfs = weights.apply(Cdf)
cdfs









    Out[25]:





height_decile
1     [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
2     [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
3     [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
4     [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
5     [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
6     [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
7     [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
8     [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
9     [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
10    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
11    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
Name: wtkg2, dtype: object
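
The same pattern works with any constructor that accepts a Series. As an illustration that does not depend on thinkstats2, the sketch below uses a hypothetical EmpiricalCdf class as a stand-in for Cdf; it is not the thinkstats2 implementation, and the data are invented.

```python
import numpy as np
import pandas as pd

class EmpiricalCdf:
    """Hypothetical, minimal stand-in for thinkstats2.Cdf."""

    def __init__(self, values):
        # Sort the non-missing values; ps[i] is the fraction <= xs[i].
        xs = np.sort(pd.Series(values).dropna().to_numpy())
        self.xs = xs
        self.ps = np.arange(1, len(xs) + 1) / len(xs)

    def prob(self, x):
        """Fraction of values less than or equal to x."""
        return np.searchsorted(self.xs, x, side='right') / len(self.xs)

# Toy data (invented values).
toy = pd.DataFrame({
    'group': [1, 1, 1, 2, 2, 2],
    'wtkg2': [60.0, 70.0, 80.0, 80.0, 90.0, 100.0],
})

# apply calls the constructor once per group and collects the objects.
cdfs = toy.groupby('group')['wtkg2'].apply(EmpiricalCdf)

print(cdfs.dtype)
print(cdfs.loc[1].prob(70.0))
```

The result has dtype object because each element is an arbitrary Python object rather than a number.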

And now we can plot the CDFs:



In [26]:

    
import thinkplot

thinkplot.Cdfs(cdfs[1:11:2])
thinkplot.Config(xlabel='Weight (kg)', ylabel='Cdf')

Exercise: Plot CDFs of weight for men and women separately, broken out by decile of height.



In [27]:

    
groupby = df.groupby(['sex', 'height_decile'])



In [28]:

    
groupby.mean()









    Out[28]:






  
    
      
      
                         age    wtyrago      finalwt      wtkg2        htm3
sex height_decile
1   1              53.844743  71.977922  1191.234678  71.514913  150.233293
    2              56.595658  71.746929  1003.079353  71.003320  157.025469
    3              54.869454  72.944564   964.772919  72.326525  160.028037
    4              56.042489  74.553472   879.420399  74.016273  163.005126
    5              55.165862  76.385110   917.796857  75.760722  165.013377
    6              56.300059  79.337985   814.286624  78.624502  168.001398
    7              56.170706  82.956049   705.872453  82.235605  171.753016
    8              55.213608  86.572882   706.591295  85.843598  175.001515
    9              54.734034  90.392173   676.266016  89.530755  178.925710
    10             51.677645  98.621484   721.615568  97.668112  186.240144
    11             50.650927  79.295724  1195.072324  78.971384  236.000000
2   1              59.464339  65.595065   467.120026  64.928121  152.189721
    2              57.735134  68.857631   448.105460  68.351735  157.001333
    3              56.762345  70.942543   442.349122  70.347362  160.001647
    4              55.373809  72.864145   457.600597  72.317857  163.000463
    5              54.991480  74.636768   446.241040  74.057330  165.001027
    6              53.955434  76.467033   456.712813  75.963038  168.000260
    7              51.872079  78.972655   470.481305  78.390760  171.147071
    8              49.126449  82.582291   501.930594  81.844168  175.000528
    9              48.030870  85.524509   490.531017  84.845692  178.650349
    10             47.388489  89.769177   526.098271  89.063530  185.475904
    11             56.722523  69.371896   664.090555  68.941935         NaN



In [29]:

    
groupby.wtkg2.mean()









    Out[29]:





sex  height_decile
1    1                71.514913
     2                71.003320
     3                72.326525
     4                74.016273
     5                75.760722
     6                78.624502
     7                82.235605
     8                85.843598
     9                89.530755
     10               97.668112
     11               78.971384
2    1                64.928121
     2                68.351735
     3                70.347362
     4                72.317857
     5                74.057330
     6                75.963038
     7                78.390760
     8                81.844168
     9                84.845692
     10               89.063530
     11               68.941935
Name: wtkg2, dtype: float64



In [30]:

    
cdfs = groupby.wtkg2.apply(Cdf)
cdfs









    Out[30]:





sex  height_decile
1    1                [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
     2                [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
     3                [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
     4                [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
     5                [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
     6                [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
     7                [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
     8                [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
     9                [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
     10               [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
     11               [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
2    1                [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
     2                [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
     3                [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
     4                [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
     5                [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
     6                [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
     7                [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
     8                [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
     9                [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
     10               [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
     11               [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
Name: wtkg2, dtype: object



In [31]:

    
cdfs.unstack()









    Out[31]:






  
    
[Table with 2 rows and 11 columns: rows are sex = 1, 2; columns are height_decile = 1 to 11. Every cell is a Cdf object, displayed as [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...]



In [32]:

    
men = cdfs.unstack().loc[1]
men









    Out[32]:





height_decile
1     [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
2     [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
3     [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
4     [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
5     [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
6     [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
7     [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
8     [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
9     [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
10    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
11    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
Name: 1, dtype: object
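
The steps in cells 27 to 32 (group by two keys, unstack, select one row with loc) can be sketched on a toy DataFrame. The weights are invented, and mean is used instead of Cdf so that the results are plain numbers.

```python
import pandas as pd

# Toy data (invented values): two sexes, two height groups each.
toy = pd.DataFrame({
    'sex':           [1, 1, 1, 1, 2, 2, 2, 2],
    'height_decile': [1, 1, 2, 2, 1, 1, 2, 2],
    'wtkg2':         [70.0, 74.0, 80.0, 84.0, 60.0, 64.0, 70.0, 74.0],
})

# Grouping by two keys gives a Series indexed by (sex, height_decile).
means = toy.groupby(['sex', 'height_decile'])['wtkg2'].mean()

# unstack moves height_decile to the columns, leaving one row per sex.
table = means.unstack()

# Selecting a row gives a Series indexed by height_decile.
men = table.loc[1]
women = table.loc[2]

print(table)
print(men)
```

Selecting the outer level directly, as in means.loc[1], yields the same values without the intermediate unstack.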



In [33]:

    
thinkplot.Cdfs(men[1:11:2])



In [34]:

    
women = cdfs.unstack().loc[2]
women









    Out[34]:





height_decile
1     [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
2     [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
3     [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
4     [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
5     [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
6     [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
7     [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
8     [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
9     [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
10    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
11    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
Name: 2, dtype: object



In [35]:

    
thinkplot.Cdfs(women[1:11:2])


