Predicting babyweight using BigQuery ML

This notebook illustrates:

Machine Learning using BigQuery
Jupyter Magic for BigQuery in Cloud Datalab

Please see this notebook for more context on this problem and how the features were chosen.



In [1]:

    
# change these to try this notebook out
PROJECT = 'cloud-training-demos'
REGION = 'us-central1'



In [2]:

    
import os
os.environ['PROJECT'] = PROJECT
os.environ['REGION'] = REGION



In [3]:

    
%%bash
gcloud config set project $PROJECT
gcloud config set compute/region $REGION









    



Updated property [core/project].
Updated property [compute/region].

Exploring the Data

Here, we will be taking natality data and training on features to predict the birth weight.

The CDC's Natality data has details on US births from 1969 to 2008 and is available in BigQuery as a public data set. More details: https://bigquery.cloud.google.com/table/publicdata:samples.natality?tab=details

Let's start by looking at the data since 2000, keeping only rows where the useful columns have values > 0.



In [4]:

    
%%bigquery
SELECT
    *
FROM
  publicdata.samples.natality
WHERE
  year > 2000
  AND gestation_weeks > 0
  AND mother_age > 0
  AND plurality > 0
  AND weight_pounds > 0
LIMIT 10









    Out[4]:





    source_year year month day wday state is_male child_race weight_pounds plurality apgar_1min apgar_5min mother_residence_state mother_race mother_age gestation_weeks lmp mother_married mother_birth_state cigarette_use cigarettes_per_day alcohol_use drinks_per_week weight_gain_pounds born_alive_alive born_alive_dead born_dead ever_born father_race father_age record_weight
2001 2001 8   4 NY True 9 8.313631900019999 1 99 10 NY 2 20 44 10012000 False NY     False   46 0 0 0 1 2 21 1
2001 2001 3   5 FL True 9 8.24969784404 1 99 9 FL 1 24 41 99999999 True Foreign     False   99 0 0 0 1 1 26 1
2001 2001 5   4 FL True 9 4.31224184472 1 99 9 FL 1 19 38 08992000 True MA     False   25 0 0 0 1 1 21 1
2001 2001 1   2 MO True 9 8.375361333379999 1 99 9 MO 1 30 40 04052000 True MO     False   27 1 0 0 2 1 31 1
2001 2001 12   4 IL True 9 9.25059651352 1 99 9 IL 1 28 41 02222001 True IL     False   40 1 0 2 2 1 28 1
2001 2001 2   4 NY False 9 6.06050758238 1 99 9 NY 1 31 38 05172000 True NY     False   14 1 0 0 2 1 31 1
2001 2001 4   2 OH False 9 7.3524164377 1 99 9 OH 1 24 39 07142000 True KY     False   13 1 0 1 2 1 24 1
2001 2001 11   2 MI False 9 6.75055446244 1 99 10 MI 1 35 42 02012001 True MI     False   15 2 0 1 3 1 30 1
2001 2001 11   5 CA False 9 6.13105550622 1 99 99 CA 1 32 38 02142001 True CA         99 2 0 0 3 1 22 1
2001 2001 3   1 IL False 9 10.37495404972 1 99 9 IL 1 34 39 06162000 True IL     False   42 3 0 1 4 1 33 1
    
(rows: 10, time: 1.9s,    23GB processed, job: job_Jre7EM0iWleBumtVWKqFL6hhPEQ7)

Define Features

Looking over the data set, there are a few columns of interest that could be leveraged into features for a reasonable prediction of approximate birth weight.

Further, some feature engineering may be accomplished with the BigQuery CAST function -- in BQML, all strings are considered categorical features and all numeric types are considered continuous ones.

The hashmonth column is added so that we can split the data repeatably and without leakage: all babies born in the same year and month should fall entirely in either the training set or the test set, never spread between them (otherwise, there would be information leakage when it comes to twins, triplets, etc.).



In [5]:

    
%%bigquery
SELECT
    weight_pounds, -- this is the label; because it is continuous, we need to use regression
    CAST(is_male AS STRING) AS is_male,
    mother_age,
    CAST(plurality AS STRING) AS plurality,
    gestation_weeks,
    FARM_FINGERPRINT(CONCAT(CAST(YEAR AS STRING), CAST(month AS STRING))) AS hashmonth
FROM
  publicdata.samples.natality
WHERE
  year > 2000
  AND gestation_weeks > 0
  AND mother_age > 0
  AND plurality > 0
  AND weight_pounds > 0
LIMIT 10









    Out[5]:





    weight_pounds is_male mother_age plurality gestation_weeks hashmonth
6.686620406459999 true 18 1 43 8904940584331855459
9.36082764452 true 32 1 41 1088037545023002395
6.9996768185 true 23 1 40 1088037545023002395
9.37405538024 true 34 1 40 1525201076796226340
8.37315671076 true 33 1 40 3408502330831153141
8.437090766739999 false 30 1 39 5896567601480310696
6.1244416383599996 false 24 1 40 6244544205302024223
7.12534030784 false 26 1 41 8029892925374153452
6.944561253 false 31 1 40 2126480030009879160
7.1870697412 false 23 1 40 1403073183891835564
    
(rows: 10, time: 1.2s,     6GB processed, job: job__zp60pgtV6soL1xBkFRDRng7iIJ3)
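
The grouping property of the hashmonth split can be illustrated outside BigQuery. The following is a minimal sketch under stated assumptions: it uses SHA-256 from the Python standard library as a stand-in for FARM_FINGERPRINT, so the bucket values will not match what BigQuery produces, and month_bucket and assign_split are hypothetical helper names. It only shows that every record from the same year and month gets the same bucket and therefore lands in the same split.

```python
import hashlib

def month_bucket(year: int, month: int, num_buckets: int = 4) -> int:
    """Map a (year, month) pair to a stable bucket in [0, num_buckets)."""
    # Same key construction as the SQL:
    # CONCAT(CAST(year AS STRING), CAST(month AS STRING))
    key = f"{year}{month}".encode("utf-8")
    # Stand-in hash (assumption): SHA-256, not FARM_FINGERPRINT.
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets

def assign_split(year: int, month: int) -> str:
    """Buckets 0-2 go to training, bucket 3 goes to evaluation."""
    return "train" if month_bucket(year, month) < 3 else "eval"

# Three of these births share the same year and month (2001, 8).
births = [(2001, 8), (2001, 8), (2001, 3), (2002, 3), (2001, 8)]
splits = [assign_split(year, month) for year, month in births]
print(list(zip(births, splits)))
```

Because the bucket depends only on the year and month, rerunning the split always gives the same assignment, which is what makes the training and evaluation sets repeatable.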

Train Model

With the relevant columns chosen, we can now create (train) the model in BigQuery. First, a dataset is needed to store the model. (If this throws an error in Datalab, simply create the dataset from the BigQuery console.)



In [ ]:

    
%%bash
bq --location=US mk -d demo

With the demo dataset ready, we can create and train a linear regression model.

This will take approximately 4 minutes to run and will show Done when complete.



In [ ]:

    
%%bigquery
CREATE or REPLACE MODEL demo.babyweight_model_asis
OPTIONS
  (model_type='linear_reg', labels=['weight_pounds']) AS
  
WITH natality_data AS (
  SELECT
    weight_pounds,-- this is the label; because it is continuous, we need to use regression
    CAST(is_male AS STRING) AS is_male,
    mother_age,
    CAST(plurality AS STRING) AS plurality,
    gestation_weeks,
    FARM_FINGERPRINT(CONCAT(CAST(YEAR AS STRING), CAST(month AS STRING))) AS hashmonth
  FROM
    publicdata.samples.natality
  WHERE
    year > 2000
    AND gestation_weeks > 0
    AND mother_age > 0
    AND plurality > 0
    AND weight_pounds > 0
)

SELECT
    weight_pounds,
    is_male,
    mother_age,
    plurality,
    gestation_weeks
FROM
    natality_data
WHERE
  ABS(MOD(hashmonth, 4)) < 3  -- select 75% of the data as training
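
The ABS in the filter matters because FARM_FINGERPRINT returns a signed 64-bit integer, and BigQuery's MOD returns a remainder with the same sign as its first argument. Python's % operator behaves differently (its result takes the sign of the divisor), so the sketch below reimplements the BigQuery-style remainder. The names bq_mod and in_training_set are hypothetical and exist only for illustration.

```python
def bq_mod(x: int, y: int) -> int:
    """Remainder that keeps the sign of the dividend x (truncated division)."""
    remainder = abs(x) % abs(y)
    return -remainder if x < 0 else remainder

def in_training_set(hashmonth: int) -> bool:
    """Mirror of the SQL filter: ABS(MOD(hashmonth, 4)) < 3."""
    return abs(bq_mod(hashmonth, 4)) < 3

# Small positive and negative stand-ins for real hashmonth values.
training_flags = {h: in_training_set(h) for h in range(-8, 9)}
print(training_flags)
```

Of the four possible values of ABS(MOD(hashmonth, 4)), three (0, 1 and 2) pass the filter. If the hash values are spread evenly, that keeps roughly 75% of the year-month groups for training.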

Training Statistics

During the model training (and after the training), it is possible to see the model's training evaluation statistics.

For each training run, a table named <model_name>_eval is created. This table has basic performance statistics for each iteration.

While the new model is training, you can review the training statistics in the BigQuery UI: https://bigquery.cloud.google.com/

Since these statistics are updated after each iteration of model training, you will see different values for each refresh while the model is training.

The training details may also be viewed after the training completes from this notebook.



In [12]:

    
%%bigquery
SELECT * FROM ML.TRAINING_INFO(MODEL demo.babyweight_model_asis);









    Out[12]:





    training_run iteration loss eval_loss duration_ms learning_rate
0 5 1.13082175649 1.12667393796 98205 0.4
0 4 1.13242258421 1.1284767326 126255 0.8
0 3 1.14355263342 1.1400193056 99166 0.4
0 2 1.17905497472 1.17629137996 105757 0.4
0 1 1.57286336318 1.56866519873 98185 0.4
0 0 9.85574348435 9.86270726649 96382 0.2
    
(rows: 6, time: 1.3s,     0B processed, job: job_No9S9g6EeX4EdQgdOtcF538SKZ2w)

Some of these columns are self-explanatory, but what do the BQML-specific columns mean?

training_run - Will be zero for a newly created model. If the model is re-trained using warm_start, this will increment for each re-training.

iteration - Index of the iteration within the associated training_run, starting with zero for the first iteration.

duration_ms - Indicates how long the iteration took (in ms).

Note: You can also see these stats by refreshing the BigQuery UI window, finding the <model_name> table, selecting it, and then opening the Training Stats sub-header.

Let's plot the training and evaluation loss to see whether the model is overfitting.



In [2]:

    
import google.datalab.bigquery as bq
df = bq.Query("SELECT * FROM ML.TRAINING_INFO(MODEL demo.babyweight_model_asis)").execute().result().to_dataframe()
# plot both lines in same graph
import matplotlib.pyplot as plt
plt.plot( 'iteration', 'loss', data=df, marker='o', color='orange', linewidth=2)
plt.plot( 'iteration', 'eval_loss', data=df, marker='', color='green', linewidth=2, linestyle='dashed')
plt.xlabel('iteration')
plt.ylabel('loss')
plt.legend();









    




As you can see, the training loss and evaluation loss are essentially identical. We do not seem to be overfitting.
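
The same check can be made numerically. The sketch below copies the loss and eval_loss values from the ML.TRAINING_INFO output shown earlier and computes the gap at each iteration; loss_gaps is a hypothetical helper name, and the reading of a "large" gap as a sign of overfitting is a rule of thumb rather than a fixed threshold.

```python
# (iteration, loss, eval_loss) copied from the ML.TRAINING_INFO output above.
training_info = [
    (0, 9.85574348435, 9.86270726649),
    (1, 1.57286336318, 1.56866519873),
    (2, 1.17905497472, 1.17629137996),
    (3, 1.14355263342, 1.1400193056),
    (4, 1.13242258421, 1.1284767326),
    (5, 1.13082175649, 1.12667393796),
]

def loss_gaps(rows):
    """Return {iteration: eval_loss - loss} for each row."""
    return {iteration: eval_loss - loss for iteration, loss, eval_loss in rows}

gaps = loss_gaps(training_info)
max_abs_gap = max(abs(gap) for gap in gaps.values())
print(f"largest |eval_loss - loss| = {max_abs_gap:.4f}")
```

An evaluation loss that climbed well above the training loss in later iterations would suggest overfitting; here the two stay within a small fraction of the loss itself at every iteration.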

Make a Prediction with BQML using the Model

With a trained model, it is now possible to make predictions. The only difference from the second query above is that the query is wrapped in a call to ML.PREDICT that references the model. The results have been limited (LIMIT 100) to reduce the amount of data returned.

When the ML.PREDICT function is used, the output prediction column is named predicted_<label_column_name>.



In [14]:

    
%%bigquery
SELECT
  *
FROM
  ml.PREDICT(MODEL demo.babyweight_model_asis,
      (SELECT
        weight_pounds,
        CAST(is_male AS STRING) AS is_male,
        mother_age,
        CAST(plurality AS STRING) AS plurality,
        gestation_weeks
      FROM
        publicdata.samples.natality
      WHERE
        year > 2000
        AND gestation_weeks > 0
        AND mother_age > 0
        AND plurality > 0
        AND weight_pounds > 0
    ))
LIMIT 100









    Out[14]:





    predicted_weight_pounds weight_pounds is_male mother_age plurality gestation_weeks
3.92758220737 3.62439958728 true 26 1 25
5.91794435659 4.46877005074 true 24 1 33
5.89704728035 4.2108292042 true 23 1 33
6.40508635577 8.31363190002 true 23 1 35
6.42598343201 7.05920162924 true 24 1 35
6.76359127467 6.062712205 true 28 1 36
6.5546205123 5.74965579296 true 18 1 36
7.05940496486 7.3524164377 true 30 1 37
7.01761081238 6.6248909731 true 28 1 37
6.85043420249 6.7571683303 true 20 1 37
7.05940496486 7.68751907594 true 30 1 37
7.48060111246 7.3744626639 true 38 1 38
7.25073327386 9.81277528162 true 27 1 38
7.14624789267 8.12623897732 true 22 1 38
7.12535081644 8.81187661214 true 21 1 38
7.35521865504 6.91369653632 true 32 1 38
7.22983619762 8.68841774542 true 26 1 38
7.29252742633 7.1870697412 true 29 1 38
7.1044537402 6.4374980504 true 20 1 38
7.14624789267 6.9996768185 true 22 1 38
7.25073327386 6.67118804812 true 27 1 38
7.18804204515 6.6248909731 true 24 1 38
7.2957820492 7.25100379718 true 17 1 39
7.50475281157 7.5618555866 true 27 1 39
7.6928264977 7.87491199864 true 36 1 39
    
(rows: 100, time: 1.8s,     0B processed, job: job_e39HFO9u7Ccr-sbTPhIGKvYcODKl)
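
To get a feel for how far the predictions are from the actual weights, the following sketch computes two common error measures on the first five rows of the output above. It is an illustration on a handful of rows, not an evaluation of the model; the helper names are hypothetical.

```python
import math

# (predicted_weight_pounds, weight_pounds) from the first five rows above.
rows = [
    (3.92758220737, 3.62439958728),
    (5.91794435659, 4.46877005074),
    (5.89704728035, 4.2108292042),
    (6.40508635577, 8.31363190002),
    (6.42598343201, 7.05920162924),
]

def mean_absolute_error(pairs):
    """Average of |predicted - actual| over all pairs."""
    return sum(abs(predicted - actual) for predicted, actual in pairs) / len(pairs)

def root_mean_squared_error(pairs):
    """Square root of the average squared difference."""
    mean_sq = sum((predicted - actual) ** 2 for predicted, actual in pairs) / len(pairs)
    return math.sqrt(mean_sq)

mae = mean_absolute_error(rows)
rmse = root_mean_squared_error(rows)
print(f"MAE = {mae:.3f} lb, RMSE = {rmse:.3f} lb")
```

These five rows are all low-gestation births, so the numbers are not representative of the whole dataset.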

More advanced...

In the original example, we were taking into account the idea that if no ultrasound has been performed, some of the features (e.g. is_male) will not be known. Therefore, we augmented the dataset with such masked features and trained a single model to deal with both these scenarios.

In addition, during data exploration, we learned that the data for mothers older than 45 was quite sparse, so we will discretize the mother's age: ages under 18 become LOW, ages over 45 become HIGH, and every age in between is kept as its own category.



In [15]:

    
%%bigquery
SELECT
    weight_pounds,
    CAST(is_male AS STRING) AS is_male,
    IF(mother_age < 18, 'LOW',
         IF(mother_age > 45, 'HIGH',
            CAST(mother_age AS STRING))) AS mother_age,
    CAST(plurality AS STRING) AS plurality,
    CAST(gestation_weeks AS STRING) AS gestation_weeks,
    FARM_FINGERPRINT(CONCAT(CAST(YEAR AS STRING), CAST(month AS STRING))) AS hashmonth
  FROM
    publicdata.samples.natality
  WHERE
    year > 2000
    AND gestation_weeks > 0
    AND mother_age > 0
    AND plurality > 0
    AND weight_pounds > 0
LIMIT 25









    Out[15]:





    weight_pounds is_male mother_age plurality gestation_weeks hashmonth
8.8074673669 true 39 1 42 1088037545023002395
7.3744626639 true 38 1 38 1088037545023002395
8.1350574678 true 20 1 39 6244544205302024223
7.25100379718 true LOW 1 39 1525201076796226340
6.25671899556 true 29 1 41 7146494315947640619
7.62578964258 true 22 1 41 6392072535155213407
7.3524164377 true 30 1 37 8904940584331855459
9.0940683075 true 25 1 40 8904940584331855459
7.87491199864 true 36 1 39 7170969733900686954
7.5618555866 true 27 1 39 6691862025345277042
8.31363190002 true 23 1 35 2126480030009879160
8.06230492134 true LOW 1 40 2126480030009879160
4.46877005074 true 24 1 33 2126480030009879160
8.12623897732 true 32 1 40 2126480030009879160
6.062712205 true 28 1 36 7108882242435606404
7.56846945446 true 22 1 46 1403073183891835564
7.12534030784 true 20 1 40 9068386407968572094
7.43839671988 false 31 1 38 6392072535155213407
5.43659938092 false 35 2 34 6392072535155213407
6.18837569434 false 20 1 40 8904940584331855459
8.06230492134 false 25 1 40 8904940584331855459
8.00057548798 false 27 1 40 7170969733900686954
6.0075966395 false 27 1 39 7170969733900686954
7.12534030784 false 34 1 40 7108882242435606404
6.56316153974 false 29 1 39 3408502330831153141
    
(rows: 25, time: 1.6s,     6GB processed, job: job_0XT7nQDNhs0gRi-S33srgx_5ruwO)
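
The nested IF expression used for mother_age can be written as a small Python function, which makes the bucket boundaries easier to see. This is a sketch that mirrors the SQL above; bucket_mother_age is a hypothetical name.

```python
def bucket_mother_age(age: int) -> str:
    """Mirror of the SQL: IF(age < 18, 'LOW', IF(age > 45, 'HIGH', CAST(age AS STRING)))."""
    if age < 18:
        return "LOW"
    if age > 45:
        return "HIGH"
    # Ages 18 through 45 each stay as their own categorical value.
    return str(age)

for age in (17, 18, 45, 46):
    print(age, "->", bucket_mother_age(age))
```

Note that both boundaries are exclusive: 18 and 45 keep their own categories, while 17 and 46 fall into LOW and HIGH.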

Using the same dataset, we will also suppose that the sex of the child is unknown, to simulate the case where no ultrasound has been performed. In that case plurality is also reduced to Single or Multiple.



In [16]:

    
%%bigquery
SELECT
    weight_pounds,
    'Unknown' AS is_male,
    IF(mother_age < 18, 'LOW',
         IF(mother_age > 45, 'HIGH',
            CAST(mother_age AS STRING))) AS mother_age,
    IF(plurality > 1, 'Multiple', 'Single') AS plurality,
    CAST(gestation_weeks AS STRING) AS gestation_weeks,
    FARM_FINGERPRINT(CONCAT(CAST(YEAR AS STRING), CAST(month AS STRING))) AS hashmonth
  FROM
    publicdata.samples.natality
  WHERE
    year > 2000
    AND gestation_weeks > 0
    AND mother_age > 0
    AND plurality > 0
    AND weight_pounds > 0
LIMIT 25









    Out[16]:





    weight_pounds is_male mother_age plurality gestation_weeks hashmonth
7.06140625186 Unknown 34 Single 37 1088037545023002395
6.9996768185 Unknown 23 Single 40 1088037545023002395
9.36082764452 Unknown 32 Single 41 1088037545023002395
6.12444163836 Unknown 24 Single 40 6244544205302024223
9.37405538024 Unknown 34 Single 40 1525201076796226340
6.2501051277 Unknown 30 Single 38 8904940584331855459
7.1870697412 Unknown 34 Single 39 8904940584331855459
6.68662040646 Unknown 18 Single 43 8904940584331855459
7.8153871879 Unknown 28 Single 38 7170969733900686954
7.62578964258 Unknown 20 Single 34 6691862025345277042
6.944561253 Unknown 31 Single 40 2126480030009879160
6.6248909731 Unknown 35 Single 40 2126480030009879160
6.9996768185 Unknown 37 Single 40 2126480030009879160
7.50012615324 Unknown LOW Single 40 7108882242435606404
7.50012615324 Unknown 30 Single 38 5896567601480310696
8.43709076674 Unknown 30 Single 39 5896567601480310696
7.40532738058 Unknown 22 Single 39 5896567601480310696
7.936641432 Unknown 33 Single 39 1403073183891835564
7.87491199864 Unknown 27 Single 39 1403073183891835564
7.1870697412 Unknown 23 Single 40 1403073183891835564
6.93574276252 Unknown 32 Single 41 1403073183891835564
8.24969784404 Unknown 24 Single 39 9068386407968572094
7.12534030784 Unknown 26 Single 41 8029892925374153452
6.9996768185 Unknown 34 Single 38 3408502330831153141
8.37315671076 Unknown 33 Single 40 3408502330831153141
    
(rows: 25, time: 1.4s,     6GB processed, job: job_hGtrg2opHeilvzVEsw9mNEtKNCzv)

Combining these two datasets gives a single dataset in which is_male is either known (determined with an ultrasound) or Unknown (no ultrasound).



In [17]:

    
%%bigquery
WITH with_ultrasound AS (
  SELECT
    weight_pounds,
    CAST(is_male AS STRING) AS is_male,
    IF(mother_age < 18, 'LOW',
         IF(mother_age > 45, 'HIGH',
            CAST(mother_age AS STRING))) AS mother_age,
    CAST(plurality AS STRING) AS plurality,
    CAST(gestation_weeks AS STRING) AS gestation_weeks,
    FARM_FINGERPRINT(CONCAT(CAST(YEAR AS STRING), CAST(month AS STRING))) AS hashmonth
  FROM
    publicdata.samples.natality
  WHERE
    year > 2000
    AND gestation_weeks > 0
    AND mother_age > 0
    AND plurality > 0
    AND weight_pounds > 0
),

without_ultrasound AS (
  SELECT
    weight_pounds,
    'Unknown' AS is_male,
    IF(mother_age < 18, 'LOW',
         IF(mother_age > 45, 'HIGH',
            CAST(mother_age AS STRING))) AS mother_age,
    IF(plurality > 1, 'Multiple', 'Single') AS plurality,
    CAST(gestation_weeks AS STRING) AS gestation_weeks,
    FARM_FINGERPRINT(CONCAT(CAST(YEAR AS STRING), CAST(month AS STRING))) AS hashmonth
  FROM
    publicdata.samples.natality
  WHERE
    year > 2000
    AND gestation_weeks > 0
    AND mother_age > 0
    AND plurality > 0
    AND weight_pounds > 0
),

preprocessed AS (
  SELECT * from with_ultrasound
  UNION ALL
  SELECT * from without_ultrasound
)

SELECT
    weight_pounds,
    is_male,
    mother_age,
    plurality,
    gestation_weeks
FROM
    preprocessed
WHERE
  ABS(MOD(hashmonth, 4)) < 3
LIMIT 25









    Out[17]:





    weight_pounds is_male mother_age plurality gestation_weeks
4.68702769012 Unknown 30 Multiple 33
7.06361087448 Unknown 32 Single 37
7.5618555866 Unknown 31 Single 37
7.25100379718 Unknown 33 Single 37
5.8312268299 Unknown 27 Single 37
8.75014717878 Unknown 24 Single 38
8.8736060455 Unknown 30 Single 38
6.05389371452 Unknown LOW Single 38
7.50012615324 Unknown 23 Single 39
6.93794738514 Unknown 23 Single 39
10.7254890463 Unknown 28 Single 39
8.6200744442 Unknown 31 Single 39
7.89034435698 Unknown 31 Single 39
6.062712205 Unknown 19 Single 39
7.31273323054 Unknown 32 Single 40
7.6279942652 Unknown 30 Single 40
7.7492485093 Unknown 22 Single 40
8.75014717878 Unknown 34 Single 40
7.7492485093 Unknown 30 Single 40
6.2280589015 Unknown 18 Single 40
7.12534030784 Unknown 25 Single 41
6.2501051277 Unknown 28 Single 41
7.7602716224 Unknown 34 Single 43
6.9114919137 Unknown 24 Single 45
5.62399230362 Unknown 29 Single 47
    
(rows: 25, time: 1.6s,     6GB processed, job: job_QE5oV7jaNvRyUgjJmWGu53cOiywf)
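
The effect of the UNION ALL is that every birth record contributes two training rows: one with the ultrasound-derived features and one with them masked. The sketch below reproduces that transformation for a single record in plain Python. The function names are hypothetical, and the lowercase 'true'/'false' strings follow the is_male values shown in the query output earlier in this notebook.

```python
def _bucket_age(age: int) -> str:
    """Same age bucketing as the SQL: <18 is LOW, >45 is HIGH."""
    if age < 18:
        return "LOW"
    if age > 45:
        return "HIGH"
    return str(age)

def augment(record: dict) -> list:
    """Return two rows for one birth record: features known, then features masked."""
    shared = {
        "weight_pounds": record["weight_pounds"],
        "mother_age": _bucket_age(record["mother_age"]),
        "gestation_weeks": str(record["gestation_weeks"]),
    }
    with_ultrasound = dict(
        shared,
        is_male="true" if record["is_male"] else "false",
        plurality=str(record["plurality"]),
    )
    without_ultrasound = dict(
        shared,
        is_male="Unknown",
        plurality="Multiple" if record["plurality"] > 1 else "Single",
    )
    return [with_ultrasound, without_ultrasound]

# One record taken from the earlier query output (a twin birth).
record = {"weight_pounds": 5.43659938092, "is_male": False,
          "mother_age": 35, "plurality": 2, "gestation_weeks": 34}
rows = augment(record)
print(rows)
```

Both rows carry the same label, so the model sees the same outcome paired with a detailed and a coarse description of the pregnancy.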

Create a new model

With the feature-engineered dataset in place, we can create the model with the CREATE OR REPLACE MODEL statement.

This will take 5-10 minutes and will show Done when complete.



In [18]:

    
%%bigquery
CREATE or REPLACE MODEL demo.babyweight_model_fc
OPTIONS
  (model_type='linear_reg', labels=['weight_pounds']) AS
  
WITH with_ultrasound AS (
  SELECT
    weight_pounds,
    CAST(is_male AS STRING) AS is_male,
    IF(mother_age < 18, 'LOW',
         IF(mother_age > 45, 'HIGH',
            CAST(mother_age AS STRING))) AS mother_age,
    CAST(plurality AS STRING) AS plurality,
    CAST(gestation_weeks AS STRING) AS gestation_weeks,
    FARM_FINGERPRINT(CONCAT(CAST(YEAR AS STRING), CAST(month AS STRING))) AS hashmonth
  FROM
    publicdata.samples.natality
  WHERE
    year > 2000
    AND gestation_weeks > 0
    AND mother_age > 0
    AND plurality > 0
    AND weight_pounds > 0
),

without_ultrasound AS (
  SELECT
    weight_pounds,
    'Unknown' AS is_male,
    IF(mother_age < 18, 'LOW',
         IF(mother_age > 45, 'HIGH',
            CAST(mother_age AS STRING))) AS mother_age,
    IF(plurality > 1, 'Multiple', 'Single') AS plurality,
    CAST(gestation_weeks AS STRING) AS gestation_weeks,
    FARM_FINGERPRINT(CONCAT(CAST(YEAR AS STRING), CAST(month AS STRING))) AS hashmonth
  FROM
    publicdata.samples.natality
  WHERE
    year > 2000
    AND gestation_weeks > 0
    AND mother_age > 0
    AND plurality > 0
    AND weight_pounds > 0
),

preprocessed AS (
  SELECT * from with_ultrasound
  UNION ALL
  SELECT * from without_ultrasound
)

SELECT
    weight_pounds,
    is_male,
    mother_age,
    plurality,
    gestation_weeks
FROM
    preprocessed
WHERE
  ABS(MOD(hashmonth, 4)) < 3









    Out[18]:




Done

Training Statistics

While the new model is training, you can review the training statistics in the BigQuery UI: https://bigquery.cloud.google.com/

The training details may also be viewed after the training completes from this notebook.



In [19]:

    
import google.datalab.bigquery as bq
df = bq.Query("SELECT * FROM ML.TRAINING_INFO(MODEL demo.babyweight_model_fc)").execute().result().to_dataframe()
# plot both lines in same graph
import matplotlib.pyplot as plt
plt.plot( 'iteration', 'loss', data=df, marker='o', color='orange', linewidth=2)
plt.plot( 'iteration', 'eval_loss', data=df, marker='', color='green', linewidth=2, linestyle='dashed')
plt.xlabel('iteration')
plt.ylabel('loss')
plt.legend();

Make a prediction with the new model

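Suppose we want to predict a baby's weight from the following inputs: the baby is male, the mother is 28 years old, it is a single birth, and the baby is born after 38 weeks of pregnancy.

To make this prediction, these values are passed into the SELECT statement as strings, because every feature in this model is categorical. Note that the training data shown earlier encodes is_male as lowercase 'true' or 'false', so the capitalized 'True' passed below may be treated as a category the model has not seen.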



In [20]:

    
%%bigquery
SELECT
  *
FROM
  ml.PREDICT(MODEL demo.babyweight_model_fc,
      (SELECT
          'True' AS is_male,
          '28' AS mother_age,
          '1' AS plurality,
          '38' AS gestation_weeks
    ))









    Out[20]:





    predicted_weight_pounds is_male mother_age plurality gestation_weeks
5.85625668152 True 28 1 38
    
(rows: 1, time: 1.4s,     0B processed, job: job__WDzbusrFc-NctUS5cpZSaBjWZD-)
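
When trying several input combinations, it can be convenient to generate the ML.PREDICT query text from a dictionary of feature values. The following is a minimal sketch: build_predict_query and sql_string_literal are hypothetical helpers, not part of any BigQuery client library, and the column names are inserted without validation, so this is only suitable for values you control in a notebook.

```python
def sql_string_literal(value: str) -> str:
    """Quote a Python string as a single-quoted SQL literal."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"

def build_predict_query(model: str, features: dict) -> str:
    """Build the text of an ML.PREDICT query for one row of string features."""
    columns = ",\n          ".join(
        f"{sql_string_literal(value)} AS {name}" for name, value in features.items()
    )
    return (
        "SELECT\n  *\nFROM\n"
        f"  ML.PREDICT(MODEL {model},\n"
        f"      (SELECT\n          {columns}\n      ))"
    )

query = build_predict_query(
    "demo.babyweight_model_fc",
    {"is_male": "true", "mother_age": "28", "plurality": "1", "gestation_weeks": "38"},
)
print(query)
```

The printed text can be pasted into a %%bigquery cell. Every value is passed as a string to match the categorical features the model was trained on.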

Copyright 2018 Google Inc. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License
