TF-DNNRegressor - ReLU - Spitzer Calibration Data

This script show a simple example of using tf.contrib.learn library to create our model.

The code is divided in following steps:

  • Load CSVs data
  • Filtering Categorical and Continuous features
  • Converting Data into Tensors
  • Selecting and Engineering Features for the Model
  • Defining The Regression Model
  • Training and Evaluating Our Model
  • Predicting output for test data

v0.1: Added code for data loading, modeling and prediction model.

v0.2: Removed unnecessary output logs.

PS: I was able to get a score of 1295.07972 using this script with 70% (of train.csv) data used for training and rest for evaluation. Script took 2hrs for training and 3000 steps were used.


In [1]:
import pandas as pd
import numpy as np
import tensorflow as tf
tf.logging.set_verbosity(tf.logging.ERROR)

import warnings
warnings.filterwarnings("ignore")

%matplotlib inline
from matplotlib import pyplot as plt
from sklearn.cross_validation import train_test_split
from sklearn.preprocessing import StandardScaler, MinMaxScaler, minmax_scale

from sklearn.metrics import r2_score

from time import time
start0 = time()
plt.rcParams['figure.dpi'] = 300


/Users/jonathan/anaconda3/lib/python3.6/site-packages/sklearn/cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)

Load CSVs data

df_train_ori = pd.read_csv('train.csv') df_test_ori = pd.read_csv('test.csv')

In [2]:
nSkip = 20
spitzerDataRaw  = pd.read_csv('pmap_ch2_0p1s_x4_rmulti_s3_7.csv')[::nSkip]

In [3]:
PLDpixels = pd.DataFrame({key:spitzerDataRaw[key] for key in spitzerDataRaw.columns.values if 'pix' in key})
PLDpixels


Out[3]:
pix1 pix2 pix3 pix4 pix5 pix6 pix7 pix8 pix9
0 577.447021 3465.876709 1118.598145 550.165466 2460.376953 994.374207 141.741592 521.385254 694.330688
20 291.785248 696.303345 434.420410 1808.774414 4967.067383 288.906006 465.871857 848.896790 584.660889
40 377.656250 981.755493 432.597595 1676.837646 4871.034668 310.728760 424.352539 825.910522 567.011108
60 957.022339 2441.827637 402.665283 1260.974243 3135.986084 400.791077 353.384857 850.850586 395.431366
80 1322.453613 2773.945801 352.199768 1200.719727 2465.056885 424.106323 333.421387 802.859375 291.649902
100 2007.827271 1886.431885 323.229889 1844.838623 1875.485474 334.029724 427.879150 814.207764 200.617279
120 1566.299927 2280.376465 337.545166 1578.394043 2318.075195 356.124207 381.274872 817.011719 252.403473
140 584.764038 3915.067871 517.304016 605.464722 2961.010010 641.317139 223.332855 682.244873 539.859863
160 543.166992 3069.214844 476.013794 665.751648 3757.745605 596.858459 267.027344 685.150940 605.093628
180 451.328369 2087.002930 467.328674 852.749023 4688.578125 513.216675 318.497223 719.530151 664.354797
200 350.573456 1315.519043 492.374207 951.983887 5410.908691 477.526276 370.131714 732.712585 713.389648
220 348.076660 1362.047607 498.790955 827.281433 5530.880859 632.804199 336.072266 713.381226 794.845093
240 579.250000 3397.274414 475.278137 670.108032 3423.658203 602.615417 258.438812 689.837097 574.808228
260 350.787445 1008.846680 676.405518 669.405518 4093.391113 2214.382080 238.733459 547.705627 893.658569
280 236.634949 493.823120 548.321777 678.572510 4737.148438 2048.604004 353.338562 731.121643 896.231018
300 231.696365 396.443329 532.151245 705.671692 3865.480713 2945.070312 320.593811 652.334106 897.216919
320 388.209534 1274.972656 786.391357 668.139648 3834.410645 2090.838135 220.785294 539.728027 906.975464
340 468.711243 1828.344482 939.319092 633.075684 3413.272949 1758.666504 181.595520 520.476624 891.550415
360 566.607727 3199.396973 1363.388428 557.654236 2245.422607 1175.826294 127.440514 475.335327 718.362183
380 506.795410 3159.170654 717.527649 610.504089 3377.698242 870.390930 211.116119 625.420227 707.206909
400 450.508301 2331.235596 571.043701 660.036743 4318.423340 842.261230 261.374878 651.025085 780.464722
420 579.428467 3765.572754 958.830811 548.711914 2622.649170 899.118591 162.867905 535.411255 627.030457
440 446.872070 1672.167480 1234.078369 597.369141 2740.287354 2181.250977 151.540344 473.339752 862.977661
460 187.009293 472.814728 468.026794 1143.611328 5519.829102 387.314056 518.479492 1198.846924 661.541687
480 379.277435 1345.949463 473.098663 1099.639893 5219.842285 421.992523 382.804688 780.066895 680.612244
500 423.313690 1454.232056 467.154480 1189.994141 4994.313477 392.073364 369.074158 757.771606 614.989807
520 581.744385 1965.211304 448.220428 1174.853149 4270.922852 382.741882 366.949066 792.233032 536.375488
540 1104.622314 3327.730469 378.166748 961.997498 2600.892822 464.678406 319.079376 752.399475 356.152161
560 1313.117676 2412.868164 384.651062 1400.491821 2646.587402 373.034912 362.121490 826.352905 318.729736
580 1036.286621 1455.090576 378.979675 2104.846191 3048.426758 314.207642 425.377472 905.315796 321.185394
... ... ... ... ... ... ... ... ... ...
784700 490.277069 2778.427246 578.676941 669.761353 3909.416016 704.975708 251.191345 651.744385 680.994019
784720 488.245300 2811.830566 570.076233 699.562134 3940.356445 698.549377 253.492432 688.682251 691.294067
784740 528.388733 3077.244873 552.288025 664.305908 3693.994141 690.014465 250.742920 663.641479 670.129395
784760 524.138672 3079.202393 581.365173 653.116577 3608.978760 708.345337 243.585800 631.321350 681.922424
784780 515.400085 3104.853027 577.355774 622.147217 3662.642090 712.164673 222.895447 655.224121 666.488892
784800 507.664368 2975.806396 563.121521 630.466675 3742.873779 729.497925 230.337006 634.210693 702.261719
784820 525.488770 3126.275635 539.566467 651.108948 3622.288330 713.951538 239.975967 661.883240 655.173401
784840 542.380615 3485.441895 591.489502 615.051270 3340.654541 714.595215 213.653717 645.979675 661.064087
784860 564.337219 3457.292480 603.931641 613.116577 3290.934326 704.573792 212.601166 616.944214 636.394165
784880 519.687866 3092.465088 562.407776 678.326660 3722.915771 678.802917 229.890488 667.932495 663.084473
784900 504.459045 2937.398438 579.616455 651.453796 3756.773438 742.832336 243.829163 656.706970 690.176086
784920 535.473450 3312.164062 639.049072 612.776123 3343.647461 759.408752 214.106781 624.147949 656.321777
784940 523.319702 3217.937012 691.253662 631.950073 3356.177002 754.292603 224.973526 623.150574 660.076111
784960 526.920349 3194.552002 628.663696 666.446594 3468.737305 689.300354 223.388428 640.530457 657.530396
784980 523.742554 3142.258545 656.946533 604.927246 3421.854004 794.392700 207.501526 608.047974 692.273865
785000 517.568970 3081.646973 599.225220 677.949646 3638.717285 711.716675 228.419205 651.746582 669.180176
785020 521.389893 3125.872314 584.240967 663.873535 3560.931885 716.694885 225.556595 630.833984 673.392029
785040 486.011292 2723.162598 601.634949 643.771729 3938.919189 764.650757 232.681793 635.134277 714.053467
785060 511.766632 2929.545654 548.445679 666.829224 3825.424072 687.783813 260.151520 680.642822 677.628540
785080 509.366058 3109.286133 620.603149 650.118958 3605.497803 700.597412 226.746719 633.923828 663.708801
785100 515.094360 3108.921387 573.314392 662.386719 3644.168213 743.267822 227.291550 641.129333 696.865051
785120 544.855469 3295.288086 559.269470 645.822754 3474.819336 677.296326 236.733521 635.386780 631.155273
785140 558.862793 3331.642334 663.679565 628.280273 3258.815918 711.148376 218.401016 616.558899 625.158752
785160 493.991119 2989.765137 593.615234 651.431274 3742.384033 740.425659 228.513855 639.505127 705.173889
785180 493.111755 2913.450195 602.129028 640.239563 3805.878418 727.839722 244.469452 632.206238 710.215698
785200 508.138214 2855.612793 582.333862 685.463379 3860.480713 711.120911 244.687363 655.192139 710.882629
785220 497.813751 2773.579590 577.250427 691.648254 3945.376953 726.320862 257.743774 670.169983 696.477234
785240 501.719788 2894.324463 611.618896 634.149353 3797.842285 745.070801 240.009537 643.881836 701.046631
785260 476.991699 2791.731445 579.058228 643.239990 3923.991455 765.078247 235.408035 661.097107 718.653198
785280 506.839264 3055.761230 596.532349 647.959351 3610.522217 731.254211 234.202530 658.012695 675.393433

39265 rows × 9 columns


In [4]:
PLDnorm = np.sum(np.array(PLDpixels),axis=1)

In [5]:
PLDpixels = (PLDpixels.T / PLDnorm).T
PLDpixels


Out[5]:
pix1 pix2 pix3 pix4 pix5 pix6 pix7 pix8 pix9
0 0.054868 0.329321 0.106287 0.052276 0.233781 0.094484 0.013468 0.049541 0.065974
20 0.028092 0.067038 0.041825 0.174144 0.478215 0.027815 0.044853 0.081729 0.056289
40 0.036078 0.093787 0.041326 0.160189 0.465331 0.029684 0.040539 0.078899 0.054167
60 0.093836 0.239420 0.039481 0.123638 0.307482 0.039297 0.034649 0.083425 0.038772
80 0.132691 0.278329 0.035339 0.120477 0.247336 0.042554 0.033455 0.080557 0.029263
100 0.206683 0.194186 0.033273 0.189905 0.193059 0.034384 0.044045 0.083813 0.020651
120 0.158412 0.230632 0.034139 0.159635 0.234445 0.036018 0.038561 0.082631 0.025528
140 0.054803 0.366910 0.048480 0.056743 0.277498 0.060103 0.020930 0.063938 0.050594
160 0.050925 0.287756 0.044629 0.062418 0.352310 0.055959 0.025035 0.064237 0.056731
180 0.041935 0.193913 0.043422 0.079233 0.435637 0.047685 0.029593 0.066855 0.061728
200 0.032415 0.121637 0.045526 0.088023 0.500310 0.044154 0.034224 0.067749 0.065962
220 0.031517 0.123327 0.045163 0.074907 0.500796 0.057298 0.030430 0.064593 0.071970
240 0.054281 0.318357 0.044538 0.062796 0.320830 0.056471 0.024218 0.064644 0.053865
260 0.032804 0.094344 0.063255 0.062600 0.382799 0.207081 0.022325 0.051219 0.083572
280 0.022066 0.046049 0.051131 0.063277 0.441742 0.191033 0.032949 0.068178 0.083574
300 0.021969 0.037589 0.050457 0.066910 0.366512 0.279242 0.030398 0.061852 0.085071
320 0.036246 0.119040 0.073423 0.062382 0.358006 0.195215 0.020614 0.050393 0.084681
340 0.044072 0.171917 0.088323 0.059527 0.320947 0.165366 0.017075 0.048940 0.083832
360 0.054328 0.306766 0.130725 0.053469 0.215297 0.112741 0.012219 0.045576 0.068878
380 0.046987 0.292900 0.066525 0.056602 0.313161 0.080698 0.019573 0.057985 0.065568
400 0.041459 0.214537 0.052551 0.060741 0.397412 0.077511 0.024054 0.059912 0.071824
420 0.054154 0.351935 0.089614 0.051283 0.245116 0.084033 0.015222 0.050040 0.058603
440 0.043135 0.161408 0.119121 0.057662 0.264509 0.210548 0.014628 0.045690 0.083300
460 0.017713 0.044785 0.044331 0.108322 0.522836 0.036686 0.049110 0.113554 0.062661
480 0.035173 0.124818 0.043873 0.101976 0.484068 0.039134 0.035500 0.072340 0.063117
500 0.039700 0.136382 0.043811 0.111601 0.468382 0.036770 0.034613 0.071066 0.057676
520 0.055303 0.186820 0.042610 0.111686 0.406010 0.036385 0.034884 0.075313 0.050990
540 0.107603 0.324160 0.036838 0.093710 0.253357 0.045265 0.031082 0.073292 0.034693
560 0.130815 0.240374 0.038320 0.139520 0.263658 0.037162 0.036075 0.082323 0.031752
580 0.103735 0.145659 0.037937 0.210701 0.305156 0.031453 0.042582 0.090625 0.032152
... ... ... ... ... ... ... ... ... ...
784700 0.045754 0.259291 0.054004 0.062504 0.364839 0.065790 0.023442 0.060823 0.063552
784720 0.045032 0.259344 0.052580 0.064523 0.363431 0.064429 0.023380 0.063519 0.063760
784740 0.048967 0.285174 0.051182 0.061563 0.342330 0.063945 0.023237 0.061501 0.062102
784760 0.048930 0.287454 0.054272 0.060971 0.336911 0.066126 0.022740 0.058936 0.063660
784780 0.047993 0.289115 0.053762 0.057933 0.341054 0.066315 0.020755 0.061013 0.062061
784800 0.047373 0.277691 0.052548 0.058833 0.349271 0.068074 0.021494 0.059182 0.065532
784820 0.048948 0.291203 0.050259 0.060649 0.337405 0.066502 0.022353 0.061652 0.061027
784840 0.050173 0.322418 0.054715 0.056895 0.309025 0.066103 0.019764 0.059756 0.061151
784860 0.052741 0.323108 0.056442 0.057300 0.307560 0.065847 0.019869 0.057658 0.059475
784880 0.048050 0.285929 0.052000 0.062718 0.344220 0.062762 0.021256 0.061757 0.061309
784900 0.046869 0.272910 0.053851 0.060526 0.349037 0.069016 0.022654 0.061014 0.064123
784920 0.050058 0.309632 0.059740 0.057284 0.312575 0.070992 0.020015 0.058347 0.061355
784940 0.048986 0.301217 0.064705 0.059154 0.314157 0.070606 0.021059 0.058330 0.061787
784960 0.049263 0.298666 0.058775 0.062308 0.324300 0.064444 0.020885 0.059885 0.061474
784980 0.049169 0.294994 0.061674 0.056790 0.321242 0.074577 0.019480 0.057083 0.064990
785000 0.048029 0.285969 0.055607 0.062912 0.337663 0.066045 0.021197 0.060480 0.062098
785020 0.048715 0.292062 0.054588 0.062028 0.332711 0.066963 0.021075 0.058941 0.062917
785040 0.045252 0.253553 0.056018 0.059941 0.366752 0.071196 0.021665 0.059137 0.066485
785060 0.047438 0.271550 0.050837 0.061811 0.354593 0.063753 0.024114 0.063091 0.062812
785080 0.047516 0.290049 0.057893 0.060646 0.336338 0.065355 0.021152 0.059136 0.061914
785100 0.047639 0.287532 0.053024 0.061262 0.337035 0.068742 0.021021 0.059296 0.064450
785120 0.050918 0.307953 0.052265 0.060354 0.324730 0.063295 0.022123 0.059378 0.058983
785140 0.052661 0.313934 0.062537 0.059202 0.307072 0.067010 0.020580 0.058097 0.058908
785160 0.045804 0.277220 0.055042 0.060403 0.347005 0.068655 0.021189 0.059297 0.065386
785180 0.045788 0.270527 0.055910 0.059449 0.353393 0.067583 0.022700 0.058703 0.065947
785200 0.046989 0.264068 0.053850 0.063387 0.356992 0.065760 0.022627 0.060588 0.065738
785220 0.045939 0.255951 0.053270 0.063826 0.364086 0.067026 0.023785 0.061844 0.064272
785240 0.046586 0.268748 0.056791 0.058883 0.352643 0.069182 0.022286 0.059787 0.065095
785260 0.044185 0.258607 0.053640 0.059585 0.363492 0.070872 0.021807 0.061240 0.066571
785280 0.047295 0.285146 0.055665 0.060464 0.336913 0.068236 0.021854 0.061402 0.063024

39265 rows × 9 columns

[plt.plot(PLDpixels[key]) for key in PLDpixels.columns.values];

In [6]:
spitzerData = spitzerDataRaw.copy()
for key in spitzerDataRaw.columns: 
    if key in PLDpixels.columns:
        spitzerData[key] = PLDpixels[key]

In [7]:
testPLD = np.array(pd.DataFrame({key:spitzerData[key] for key in spitzerData.columns.values if 'pix' in key}))
assert(not sum(abs(testPLD - np.array(PLDpixels))).all())
print('Confirmed that PLD Pixels have been Normalized to Spec')


Confirmed that PLD Pixels have been Normalized to Spec

In [8]:
notFeatures     = ['flux', 'fluxerr', 'xerr', 'yerr', 'xycov']
feature_columns = spitzerData.drop(notFeatures,axis=1).columns.values
features        = spitzerData.drop(notFeatures,axis=1).values
labels          = spitzerData['flux'].values

In [9]:
stdScaler = StandardScaler()

In [11]:
features_scaled = stdScaler.fit_transform(features)
labels_scaled   = stdScaler.fit_transform(labels[:,None]).ravel()

x_valtest, x_train, y_valtest, y_train = train_test_split(features_scaled, labels_scaled, test_size=0.6, random_state=42)
x_val, x_test, y_val, y_test           = train_test_split(x_valtest, y_valtest, test_size=0.5, random_state=42)

# x_val   = minmax_scale(x_val.astype('float32'))
# x_train = minmax_scale(x_train.astype('float32'))
# x_test  = minmax_scale(x_test.astype('float32'))

# y_val   = minmax_scale(y_val.astype('float32'))
# y_train = minmax_scale(y_train.astype('float32'))
# y_test  = minmax_scale(y_test.astype('float32'))

print(x_val.shape[0]  , 'validation samples')
print(x_train.shape[0], 'train samples')
print(x_test.shape[0] , 'test samples')


7853 validation samples
23559 train samples
7853 test samples

In [12]:
train_df    = pd.DataFrame(np.c_[x_train, y_train], columns=list(feature_columns) + ['flux'])
test_df     = pd.DataFrame(np.c_[x_test , y_test ], columns=list(feature_columns) + ['flux'])
evaluate_df = pd.DataFrame(np.c_[x_val  , y_val  ], columns=list(feature_columns) + ['flux'])
plt.scatter(train_df['xpos'].values, train_df['ypos'].values, c=train_df['flux'].values, alpha=0.1); plt.colorbar();

Filtering Categorical and Continuous features

We store Categorical, Continuous and Target features names in different variables. This will be helpful in later steps.


In [14]:
# categorical_features = [feature for feature in features if 'cat' in feature]
categorical_features  = []
continuous_features   = [feature for feature in train_df.columns]# if 'cat' in feature]
LABEL_COLUMN          = 'flux'

Converting Data into Tensors

When building a TF.Learn model, the input data is specified by means of an Input Builder function. This builder function will not be called until it is later passed to TF.Learn methods such as fit and evaluate. The purpose of this function is to construct the input data, which is represented in the form of Tensors or SparseTensors.

Note that input_fn will be called while constructing the TensorFlow graph, not while running the graph. What it is returning is a representation of the input data as the fundamental unit of TensorFlow computations, a Tensor (or SparseTensor).

More detail on input_fn.


In [15]:
# Converting Data into Tensors
def input_fn(df, training = True):
    # Creates a dictionary mapping from each continuous feature column name (k) to
    # the values of that column stored in a constant Tensor.
    continuous_cols = {k: tf.constant(df[k].values)
                       for k in continuous_features}

    # Creates a dictionary mapping from each categorical feature column name (k)
    # to the values of that column stored in a tf.SparseTensor.
    # categorical_cols = {k: tf.SparseTensor(
    #     indices=[[i, 0] for i in range(df[k].size)],
    #     values=df[k].values,
    #     shape=[df[k].size, 1])
    #     for k in categorical_features}

    # Merges the two dictionaries into one.
    feature_cols = continuous_cols
    # feature_cols = dict(list(continuous_cols.items()) + list(categorical_cols.items()))
    
    if training:
        # Converts the label column into a constant Tensor.
        label = tf.constant(df[LABEL_COLUMN].values)

        # Returns the feature columns and the label.
        return feature_cols, label
    
    # Returns the feature columns    
    return feature_cols

def train_input_fn():
    return input_fn(train_df, training=True)

def eval_input_fn():
    return input_fn(evaluate_df, training=True)

# def test_input_fn():
#     return input_fn(test_df.drop(LABEL_COLUMN,axis=1), training=False)

def test_input_fn():
    return input_fn(test_df, training=False)

Selecting and Engineering Features for the Model

We use tf.learn's concept of FeatureColumn which help in transforming raw data into suitable input features.

These engineered features will be used when we construct our model.


In [16]:
engineered_features = []

for continuous_feature in continuous_features:
    engineered_features.append(
        tf.contrib.layers.real_valued_column(continuous_feature))


# for categorical_feature in categorical_features:
#     sparse_column = tf.contrib.layers.sparse_column_with_hash_bucket(
#         categorical_feature, hash_bucket_size=1000)

#     engineered_features.append(tf.contrib.layers.embedding_column(sparse_id_column=sparse_column, dimension=16,
#                                                                   combiner="sum"))

Defining The Regression Model

Following is the simple DNNRegressor model. More detail about hidden_units, etc can be found here.

model_dir is used to save and restore our model. This is because once we have trained the model we don't want to train it again, if we only want to predict on new data-set.


In [24]:
# train_df = df_train_ori.head(1000)
# evaluate_df = df_train_ori.tail(500)

# test_df = df_test_ori.head(1000)

# MODEL_DIR = "tf_model_spitzer/withNormalization_drop50/relu"
MODEL_DIR = "tf_model_spitzer/adamOptimizer_with_drop50/tanh/"
# MODEL_DIR = "tf_model_spitzer/xgf"

print("train_df.shape = "   , train_df.shape)
print("test_df.shape = "    , test_df.shape)
print("evaluate_df.shape = ", evaluate_df.shape)


train_df.shape =  (23559, 20)
test_df.shape =  (7853, 20)
evaluate_df.shape =  (7853, 20)

In [28]:
nHidden1  = 10
nHidden2  = 5
nHidden3  = 10

regressor = tf.contrib.learn.DNNRegressor(activation_fn=tf.nn.relu, dropout=0.5, optimizer=tf.train.AdamOptimizer,
    feature_columns=engineered_features, hidden_units=[nHidden1, nHidden2, nHidden3], model_dir=MODEL_DIR)

Training and Evaluating Our Model

add progress bar through python logging


In [ ]:
import logging
logging.getLogger().setLevel(logging.INFO)

In [96]:
# Training Our Model
nFitSteps = 100000
start = time()
wrap  = regressor.fit(input_fn=train_input_fn, steps=nFitSteps)
print('TF Regressor took {} seconds'.format(time()-start))


TF Regressor took 1062.6561288833618 seconds

In [29]:
# Evaluating Our Model
print('Evaluating ...')
results = regressor.evaluate(input_fn=eval_input_fn, steps=1)

for key in sorted(results):
    print("{}: {}".format(key, results[key]))

print("Val Acc: {:.3f}".format((1-results['loss'])*100))


Evaluating ...
global_step: 100002
loss: 0.356687068939209
Val Acc: 64.331

Track Scalable Growth

Shrunk data set to 23559 Training samples and 7853 Val/Test samples

n_iters time (s) val acc multicore gpu
100 5.869 6.332 yes no
200 6.380 13.178 yes no
500 8.656 54.220 yes no
1000 12.170 66.596 yes no
2000 19.891 62.996 yes no
5000 43.589 76.586 yes no
10000 80.581 66.872 yes no
20000 162.435 78.927 yes no
50000 535.584 75.493 yes no
100000 1062.656 73.162 yes no

In [98]:
nItersList = [100,200,500,1000,2000,5000,10000,20000,50000,100000]
rtimesList = [5.869, 6.380, 8.656, 12.170, 19.891, 43.589, 80.581, 162.435, 535.584, 1062.656]
valAccList = [6.332, 13.178, 54.220, 66.596, 62.996, 76.586, 66.872, 78.927, 75.493, 73.162]

In [106]:
plt.loglog(nItersList, rtimesList,'o-');
plt.twinx()
plt.semilogx(nItersList, valAccList,'o-', color='orange');


Predicting output for test data

Most of the time prediction script would be separate from training script (we need not to train on same data again) but I am providing both in same script here; as I am not sure if we can create multiple notebook and somehow share data between them in Kaggle.


In [47]:
def de_median(x):
    return x - np.median(x)

In [ ]:
predicted_output = list(regressor.predict(input_fn=test_input_fn))
# x = list(predicted_output)
# print([predicted_output() for _ in range(10)]) plt.plot((predicted_output - np.median(predicted_output)) / np.std(predicted_output),'.',alpha=0.1); plt.plot((test_df['flux'].values - np.median(test_df['flux'].values)) / np.std(test_df['flux'].values),'.',alpha=0.1);
plt.plot(de_median(x - test_df['flux'].values)/x,'.',alpha=0.1); plt.ylim(-1.,1.);

In [ ]:
r2_score(test_df['flux'].values,predicted_output)*100

In [ ]:
print('Full notebook took {} seconds'.format(time()-start0))
saveDir = 'tfSaveModels' regressor.export_savedmodel(saveDir, regressor)
saveDir = 'tfSaveModels' reg_args = {'feature_columns': fc, 'hidden_units': hu_array, ...} regressor = tf.contrib.learn.DNNRegressor(**reg_args) pickle.dump(reg_args, open('reg_args.pkl', 'wb'))
reg_args = pickle.load(open('reg_args.pkl', 'rb')) # On another machine and so my model dir path changed: reg_args['model_dir'] = NEW_MODEL_DIR regressor = tf.contrib.learn.DNNRegressor(**reg_args)