Text mining

Anton Petkov, 25915, Sofia University

Lab 08 Multi-Layer Perceptron on Reuters with reguarization techniques

We are going to train a Neural Network to predict the origin of a document coming from the 20newsgroup dataset.

For this puprose we'll use Tensorflow, and sklearn. Your job is to fill in the missing code into the cells below.

You will find the steps you need to perform in the Task section in each cell.

Homework

Tasks

  1. Load and preprocess the data for 20newsgroups
  2. Create a multi layer perceptron (MLP) with N hidden layers with ReLU activation, and an Softmax output layer.
  3. Compute it's gradients (manually or using tensorflow's API)
  4. Compile the model with 'categorical_crossentropy' loss, and add metric 'accuracy'
  5. Fill the dropout logic (it must be per layer)
  6. Fill the l2 regularization logic (it must be per layer)
  7. Fill the logic for early stopping
  8. Fix the plotting function plot_history()

Submission

You must submit your code with experiments:

  1. Different number of layers
  2. Compare l2, droput and early stopping
  3. Learning curves for train and test metrics (acc, loss) per each experiment

In [1]:
import tensorflow as tf

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.datasets import fetch_20newsgroups
from nltk import TweetTokenizer
from tensorflow.keras import layers
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Model
from tensorflow.keras.losses import categorical_crossentropy
from tensorflow.keras import regularizers, initializers

In [2]:
def next_batch(x_, y_, batch_size, ids = None):
    if (ids is None):
        # Random sample from the dataset. It can be sequential (but must be shuffled) within epoch, that will guarantee that you'll use all the data.
        # The two approaches are practically equal when using a large number of epochs.
        ids = np.random.choice(x_.shape[0], batch_size, replace=False)


    feed_dict = {
      'x': x_[ids],
      'y': y_[ids]
    }

    return feed_dict

def tweet_tokenize(text):
    tknzr = TweetTokenizer(preserve_case=True, strip_handles=True)
    return tknzr.tokenize(text)
  
def evalute_accuracy(x, y):
    return sess.run(accuracy, feed_dict = next_batch(x, y, len(x)))

In [3]:
hparams = tf.contrib.training.HParams(
    batch_size = 32,
    max_epochs = 100,
    max_features = 1000,
    learning_rate = 0.03,
    reg_param = 0.03,
    dropout_keep_prob = 0.9,
    use_dropout = False,
    use_early_stoppoing = True,
    early_stopping_patience = 3,
    use_l2_reg = False,
    layers = 2,
    seed = 42
)


WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
If you depend on functionality not listed there, please file an issue.

Data Loading

We are going to use the 20newsgroup dataset for multi-class text classification with Tensorflow.

First we use the fetch_20newgroup module from sklearn.


In [4]:
print('Loading data...')

# Passing none as we want to train over all the data.
newsgroups_train = fetch_20newsgroups(subset='train',
                                      categories=None)

newsgroups_test = fetch_20newsgroups(subset='test',
                                      categories=None)

print('Data loaded.')


Loading data...
Data loaded.

Preprocessing

In this paragraph you need to pre-process your data and create vectors suitable for feeding the NN. You can try different transorfmations and features, TFIDF would be a good start.

You can use:<

  1. Tokenizer from Keras, and to convert the list in newsgrops_*.data into BOW (Bag-Of-Words) vectors.
  2. Convert the labels to OneHot encoded vectors e.g. for label '2' your vector should look like this [0, 0, ..., 1, 0] helpers can be found here;

Expected output

20 classes

Vectorizing sequence data...

x_train shape: (11314, max_features)

x_test shape: (7532, max_features)

Convert class vector to binary class matrix (for use with categorical_crossentropy)

y_train shape: (11314, 20)

y_test shape: (7532, 20)


In [5]:
num_classes = np.max(newsgroups_train.target) + 1

print(num_classes, 'classes')

print('Vectorizing sequence data...')

tokenizer = Tokenizer(num_words=hparams.max_features)
tokenizer.fit_on_texts(newsgroups_train.data)

x_train = tokenizer.texts_to_matrix(newsgroups_train.data, mode='binary')
x_test = tokenizer.texts_to_matrix(newsgroups_test.data, mode='binary')
print('x_train shape:', x_train.shape)
print('x_test shape:', x_test.shape)

print('Convert class vector to binary class matrix '
      '(for use with categorical_crossentropy)')

y_train = to_categorical(newsgroups_train.target, num_classes)
y_test = to_categorical(newsgroups_test.target, num_classes)
print('y_train shape:', y_train.shape)
print('y_test shape:', y_test.shape)


20 classes
Vectorizing sequence data...
x_train shape: (11314, 1000)
x_test shape: (7532, 1000)
Convert class vector to binary class matrix (for use with categorical_crossentropy)
y_train shape: (11314, 20)
y_test shape: (7532, 20)

Model building

For the architecture you can refer to the picture below.

You can find detailed overview of backprop here.

You can find detailed overview of regularization here.


In [6]:
def create_model(hparams):
    input_layer = layers.Input(shape=(hparams.max_features,), name='input')
    hidden = input_layer

    for i in range(hparams.layers):
        #create layers
        hidden = layers.Dense(
            128,
            activation='relu',
            kernel_regularizer=regularizers.l2(hparams.reg_param) if hparams.use_l2_reg else None,
            kernel_initializer=initializers.glorot_normal(seed=hparams.seed),
            name='dense-{}'.format(i)
        )(hidden)

        if hparams.use_dropout:
            #use hparams.dropout_keep_prob and add dropout mask
            hidden = layers.Dropout(rate=1 - hparams.dropout_keep_prob)(hidden)

    # Softmax over classes for ouput
    output_layer = layers.Dense(
        num_classes,
        activation='softmax',
        kernel_regularizer=regularizers.l2(hparams.reg_param) if hparams.use_l2_reg else None,
        kernel_initializer=initializers.glorot_normal(seed=hparams.seed),
        name='output'
    )(hidden)

    if hparams.use_dropout:
        #use hparams.dropout_keep_prob and add dropout mask
        output_layer = layers.Dropout(rate=1 - hparams.dropout_keep_prob)(output_layer)

    model = Model(inputs=[input_layer], outputs=output_layer)

    # Minimize error using cross entropy
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])

    model.summary()
    return model

model = create_model(hparams)


WARNING:tensorflow:From /home/tony/.local/lib/python3.7/site-packages/tensorflow/python/ops/resource_variable_ops.py:435: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input (InputLayer)           (None, 1000)              0         
_________________________________________________________________
dense-0 (Dense)              (None, 128)               128128    
_________________________________________________________________
dense-1 (Dense)              (None, 128)               16512     
_________________________________________________________________
output (Dense)               (None, 20)                2580      
=================================================================
Total params: 147,220
Trainable params: 147,220
Non-trainable params: 0
_________________________________________________________________

Model training

In this section you'll only need to run the cells, you don't have to modify them!


In [7]:
def train_model(model, hparams):
    full_history = {
        'loss': [[], []],
        'acc': [[], []]
    }

    patience = 0
    best_test_loss = np.inf
    best_epoch = 0

    # Training cycle
    for epoch in range(hparams.max_epochs):
        history = model.fit(
            x=x_train,
            y=y_train,
            batch_size=hparams.batch_size,
            epochs=1,
            shuffle=True
        )

        train_loss = history.history['loss'][0]
        train_acc = history.history['acc'][0]

        test_loss, test_acc = model.evaluate(
            x=x_test,
            y=y_test,
            batch_size=hparams.batch_size
        )

        full_history['loss'][0].append(train_loss)
        full_history['loss'][1].append(test_loss)

        full_history['acc'][0].append(train_acc)
        full_history['acc'][1].append(test_acc)

        if hparams.use_early_stoppoing:
            if test_loss < best_test_loss:
                best_test_loss = test_loss
                best_epoch = epoch
            else:
                if patience < hparams.early_stopping_patience:
                    patience = patience + 1
                else:
                    print('best epoch to stop is: {} with loss: {}'.format(best_epoch, best_test_loss))
                    break

    print("Optimization Finished!")
    return full_history

history = train_model(model, hparams)


WARNING:tensorflow:From /home/tony/.local/lib/python3.7/site-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
11314/11314 [==============================] - 1s 87us/sample - loss: 1.5943 - acc: 0.5354
7532/7532 [==============================] - 0s 27us/sample - loss: 1.1848 - acc: 0.6415
11314/11314 [==============================] - 1s 70us/sample - loss: 0.6609 - acc: 0.8035
7532/7532 [==============================] - 0s 33us/sample - loss: 1.1854 - acc: 0.6434
11314/11314 [==============================] - 1s 72us/sample - loss: 0.4241 - acc: 0.8719
7532/7532 [==============================] - 0s 28us/sample - loss: 1.2581 - acc: 0.6443
11314/11314 [==============================] - 1s 72us/sample - loss: 0.2905 - acc: 0.9180
7532/7532 [==============================] - 0s 27us/sample - loss: 1.3887 - acc: 0.6399
11314/11314 [==============================] - 1s 66us/sample - loss: 0.1900 - acc: 0.9499
7532/7532 [==============================] - 0s 25us/sample - loss: 1.5924 - acc: 0.6241
best epoch to stop is: 0 with loss: 1.1848081243335726
Optimization Finished!

In [8]:
def visualize_history(history, key='loss'):
    plt.plot(history[key][0])
    plt.plot(history[key][1])
    plt.title('model {}'.format(key))
    plt.ylabel(key)
    plt.xlabel('epoch')
    plt.legend(['train', 'test'], loc='upper left')
    return plt

In [9]:
visualize_history(history, key='loss').show()



In [10]:
visualize_history(history, key='acc').show()



In [11]:
def run_experiment(hparams, title='Experiment'):
    print('RUNNING EXPERIMENT: {}'.format(title))
    model = create_model(hparams)
    history = train_model(model, hparams)
    visualize_history(history, key='loss').show()
    visualize_history(history, key='acc').show()
    final_test_loss = history['loss'][1][-1]
    final_test_acc = history['acc'][1][-1]
    print('Final test loss: {}'.format(final_test_loss))
    print('Final test accuracy: {}'.format(final_test_acc))

Experiments

All experiments were carried out with the following parameters set to:

  • batch_size = 32
  • max_epochs = 100
  • max_features = 1000
  • learning_rate = 0.03
  • reg_param = 0.03
  • dropout_keep_prob = 0.9
  • seed = 42

The number of neurons in each layer is 128.

A grid search on all these parameters would be too big, so we emphasize on Layer Count, using L2 Regularization, using Dropout or using Early Stopping.

Table with results

Experiment # Layer count L2 Regularization Dropout Early Stopping Accuracy Loss Comment
1 2 yes no no 0.2493 2.8602
2 2 no yes no 0.0426 NaN gradient exploded
3 2 no no yes 0.6279 1.7183 stopped at first epoch with patience 1 (also with patience 3)
4 2 yes yes no 0.1251 2.8817
5 3 no yes yes 0.6480 1.5360 best result
6 3 yes no yes 0.1225 2.8889 stopped at epoch 10
7 3 yes yes yes 0.0528 2.9902

Analysis

  • Early Stopping is very efficient, because it is computationally cheaper than Dropout and L2 Regularization and also early stopping doesn't modify the model itself. Thus, it is simpler than Dropout and L2.
  • I expected better results with L2, but it has a very low accuracy score and even after 100 epochs the loss doesn't seem to improve.
  • The best result is in bold and uses Early Stopping and Dropout with 3 layers.
  • Each experiment has a graph with the loss and accuracy plotted for each epoch for both test and train datasets.

In [12]:
run_experiment(tf.contrib.training.HParams(
    batch_size = 32,
    max_epochs = 100,
    max_features = 1000,
    learning_rate = 0.03,
    reg_param = 0.03,
    dropout_keep_prob = 0.9,
    use_dropout = False,
    use_early_stoppoing = False,
    early_stopping_patience = 1,
    use_l2_reg = True,
    layers = 2,
    seed = 42
), title="1) 2 Layers, L2")


RUNNING EXPERIMENT: 1) 2 Layers, L2
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input (InputLayer)           (None, 1000)              0         
_________________________________________________________________
dense-0 (Dense)              (None, 128)               128128    
_________________________________________________________________
dense-1 (Dense)              (None, 128)               16512     
_________________________________________________________________
output (Dense)               (None, 20)                2580      
=================================================================
Total params: 147,220
Trainable params: 147,220
Non-trainable params: 0
_________________________________________________________________
11314/11314 [==============================] - 1s 88us/sample - loss: 4.3533 - acc: 0.1921
7532/7532 [==============================] - 0s 29us/sample - loss: 2.9915 - acc: 0.2029
11314/11314 [==============================] - 1s 85us/sample - loss: 2.9260 - acc: 0.2166
7532/7532 [==============================] - 0s 33us/sample - loss: 2.9334 - acc: 0.2307
11314/11314 [==============================] - 1s 88us/sample - loss: 2.8977 - acc: 0.2380
7532/7532 [==============================] - 0s 34us/sample - loss: 2.9262 - acc: 0.2326
11314/11314 [==============================] - 1s 79us/sample - loss: 2.8933 - acc: 0.2290
7532/7532 [==============================] - 0s 29us/sample - loss: 2.9133 - acc: 0.2432
11314/11314 [==============================] - 1s 79us/sample - loss: 2.8813 - acc: 0.2398
7532/7532 [==============================] - 0s 31us/sample - loss: 2.9108 - acc: 0.2459
11314/11314 [==============================] - 1s 78us/sample - loss: 2.8765 - acc: 0.2347
7532/7532 [==============================] - 0s 30us/sample - loss: 2.9193 - acc: 0.2337
11314/11314 [==============================] - 1s 79us/sample - loss: 2.8684 - acc: 0.2436
7532/7532 [==============================] - 0s 30us/sample - loss: 2.9039 - acc: 0.2331
11314/11314 [==============================] - 1s 78us/sample - loss: 2.8644 - acc: 0.2439
7532/7532 [==============================] - 0s 29us/sample - loss: 2.9007 - acc: 0.2350
11314/11314 [==============================] - 1s 81us/sample - loss: 2.8615 - acc: 0.2388
7532/7532 [==============================] - 0s 30us/sample - loss: 2.9012 - acc: 0.2309
11314/11314 [==============================] - 1s 79us/sample - loss: 2.8581 - acc: 0.2359
7532/7532 [==============================] - 0s 30us/sample - loss: 2.8871 - acc: 0.2525
11314/11314 [==============================] - 1s 79us/sample - loss: 2.8561 - acc: 0.2452
7532/7532 [==============================] - 0s 30us/sample - loss: 2.8828 - acc: 0.2426
11314/11314 [==============================] - 1s 80us/sample - loss: 2.8558 - acc: 0.2357
7532/7532 [==============================] - 0s 30us/sample - loss: 2.8863 - acc: 0.2476
11314/11314 [==============================] - 1s 82us/sample - loss: 2.8549 - acc: 0.2416
7532/7532 [==============================] - 0s 30us/sample - loss: 2.8769 - acc: 0.2347
11314/11314 [==============================] - 1s 80us/sample - loss: 2.8481 - acc: 0.2395
7532/7532 [==============================] - 0s 29us/sample - loss: 2.8773 - acc: 0.2435
11314/11314 [==============================] - 1s 79us/sample - loss: 2.8468 - acc: 0.2437
7532/7532 [==============================] - 0s 29us/sample - loss: 2.8919 - acc: 0.2172
11314/11314 [==============================] - 1s 81us/sample - loss: 2.8467 - acc: 0.2431
7532/7532 [==============================] - 0s 29us/sample - loss: 2.8775 - acc: 0.2365
11314/11314 [==============================] - 1s 80us/sample - loss: 2.8461 - acc: 0.2408
7532/7532 [==============================] - 0s 31us/sample - loss: 2.8792 - acc: 0.2359
11314/11314 [==============================] - 1s 80us/sample - loss: 2.8484 - acc: 0.2389
7532/7532 [==============================] - 0s 29us/sample - loss: 2.8834 - acc: 0.2355
11314/11314 [==============================] - 1s 81us/sample - loss: 2.8426 - acc: 0.2423
7532/7532 [==============================] - 0s 29us/sample - loss: 2.8852 - acc: 0.2286
11314/11314 [==============================] - 1s 82us/sample - loss: 2.8422 - acc: 0.2398
7532/7532 [==============================] - 0s 29us/sample - loss: 2.8756 - acc: 0.2419
11314/11314 [==============================] - 1s 80us/sample - loss: 2.8417 - acc: 0.2395
7532/7532 [==============================] - 0s 30us/sample - loss: 2.8726 - acc: 0.2391
11314/11314 [==============================] - 1s 79us/sample - loss: 2.8393 - acc: 0.2461
7532/7532 [==============================] - 0s 31us/sample - loss: 2.8816 - acc: 0.2262
11314/11314 [==============================] - 1s 80us/sample - loss: 2.8401 - acc: 0.2448
7532/7532 [==============================] - 0s 30us/sample - loss: 2.8780 - acc: 0.2370
11314/11314 [==============================] - 1s 80us/sample - loss: 2.8380 - acc: 0.2447
7532/7532 [==============================] - 0s 29us/sample - loss: 2.8767 - acc: 0.2353
11314/11314 [==============================] - 1s 81us/sample - loss: 2.8388 - acc: 0.2442
7532/7532 [==============================] - 0s 30us/sample - loss: 2.8737 - acc: 0.2301
11314/11314 [==============================] - 1s 80us/sample - loss: 2.8366 - acc: 0.2462
7532/7532 [==============================] - 0s 29us/sample - loss: 2.8751 - acc: 0.2272
11314/11314 [==============================] - 1s 79us/sample - loss: 2.8384 - acc: 0.2433
7532/7532 [==============================] - 0s 31us/sample - loss: 2.8652 - acc: 0.2398
11314/11314 [==============================] - 1s 80us/sample - loss: 2.8347 - acc: 0.2462
7532/7532 [==============================] - 0s 29us/sample - loss: 2.8659 - acc: 0.2383
11314/11314 [==============================] - 1s 80us/sample - loss: 2.8353 - acc: 0.2471
7532/7532 [==============================] - 0s 31us/sample - loss: 2.8701 - acc: 0.2403
11314/11314 [==============================] - 1s 80us/sample - loss: 2.8319 - acc: 0.2503
7532/7532 [==============================] - 0s 29us/sample - loss: 2.8670 - acc: 0.2511
11314/11314 [==============================] - 1s 79us/sample - loss: 2.8330 - acc: 0.2519
7532/7532 [==============================] - 0s 30us/sample - loss: 2.8710 - acc: 0.2365
11314/11314 [==============================] - 1s 81us/sample - loss: 2.8321 - acc: 0.2489
7532/7532 [==============================] - 0s 30us/sample - loss: 2.8724 - acc: 0.2367
11314/11314 [==============================] - 1s 79us/sample - loss: 2.8322 - acc: 0.2464
7532/7532 [==============================] - 0s 30us/sample - loss: 2.8717 - acc: 0.2544
11314/11314 [==============================] - 1s 81us/sample - loss: 2.8306 - acc: 0.2511
7532/7532 [==============================] - 0s 30us/sample - loss: 2.8748 - acc: 0.2512
11314/11314 [==============================] - 1s 81us/sample - loss: 2.8308 - acc: 0.2530
7532/7532 [==============================] - 0s 30us/sample - loss: 2.8747 - acc: 0.2431
11314/11314 [==============================] - 1s 80us/sample - loss: 2.8314 - acc: 0.2508
7532/7532 [==============================] - 0s 30us/sample - loss: 2.8638 - acc: 0.2415
11314/11314 [==============================] - 1s 80us/sample - loss: 2.8314 - acc: 0.2472
7532/7532 [==============================] - 0s 29us/sample - loss: 2.8691 - acc: 0.2416
11314/11314 [==============================] - 1s 81us/sample - loss: 2.8322 - acc: 0.2458
7532/7532 [==============================] - 0s 29us/sample - loss: 2.8679 - acc: 0.2497
11314/11314 [==============================] - 1s 81us/sample - loss: 2.8305 - acc: 0.2486
7532/7532 [==============================] - 0s 29us/sample - loss: 2.8612 - acc: 0.2481
11314/11314 [==============================] - 1s 80us/sample - loss: 2.8274 - acc: 0.2499
7532/7532 [==============================] - 0s 30us/sample - loss: 2.8658 - acc: 0.2370
11314/11314 [==============================] - 1s 81us/sample - loss: 2.8284 - acc: 0.2496
7532/7532 [==============================] - 0s 28us/sample - loss: 2.8676 - acc: 0.2412
11314/11314 [==============================] - 1s 77us/sample - loss: 2.8272 - acc: 0.2545
7532/7532 [==============================] - 0s 28us/sample - loss: 2.8655 - acc: 0.2479
11314/11314 [==============================] - 1s 77us/sample - loss: 2.8269 - acc: 0.2494
7532/7532 [==============================] - 0s 29us/sample - loss: 2.8759 - acc: 0.2426
11314/11314 [==============================] - 1s 77us/sample - loss: 2.8274 - acc: 0.2558
7532/7532 [==============================] - 0s 29us/sample - loss: 2.8651 - acc: 0.2431
11314/11314 [==============================] - 1s 76us/sample - loss: 2.8289 - acc: 0.2495
7532/7532 [==============================] - 0s 29us/sample - loss: 2.8686 - acc: 0.2359
11314/11314 [==============================] - 1s 79us/sample - loss: 2.8251 - acc: 0.2532
7532/7532 [==============================] - 0s 30us/sample - loss: 2.8642 - acc: 0.2339
11314/11314 [==============================] - 1s 78us/sample - loss: 2.8275 - acc: 0.2508
7532/7532 [==============================] - 0s 26us/sample - loss: 2.8625 - acc: 0.2464
11314/11314 [==============================] - 1s 75us/sample - loss: 2.8270 - acc: 0.2530
7532/7532 [==============================] - 0s 28us/sample - loss: 2.8596 - acc: 0.2528
11314/11314 [==============================] - 1s 78us/sample - loss: 2.8264 - acc: 0.2500
7532/7532 [==============================] - 0s 29us/sample - loss: 2.8652 - acc: 0.2431
11314/11314 [==============================] - 1s 78us/sample - loss: 2.8254 - acc: 0.2508
7532/7532 [==============================] - 0s 29us/sample - loss: 2.8684 - acc: 0.2461
11314/11314 [==============================] - 1s 80us/sample - loss: 2.8259 - acc: 0.2530
7532/7532 [==============================] - 0s 29us/sample - loss: 2.8649 - acc: 0.2442
11314/11314 [==============================] - 1s 79us/sample - loss: 2.8261 - acc: 0.2520
7532/7532 [==============================] - 0s 28us/sample - loss: 2.8604 - acc: 0.2428
11314/11314 [==============================] - 1s 79us/sample - loss: 2.8259 - acc: 0.2544
7532/7532 [==============================] - 0s 29us/sample - loss: 2.8758 - acc: 0.2365
11314/11314 [==============================] - 1s 79us/sample - loss: 2.8262 - acc: 0.2532
7532/7532 [==============================] - 0s 29us/sample - loss: 2.8640 - acc: 0.2411
11314/11314 [==============================] - 1s 81us/sample - loss: 2.8244 - acc: 0.2544
7532/7532 [==============================] - 0s 29us/sample - loss: 2.8612 - acc: 0.2473
11314/11314 [==============================] - 1s 78us/sample - loss: 2.8230 - acc: 0.2553
7532/7532 [==============================] - 0s 30us/sample - loss: 2.8606 - acc: 0.2537
11314/11314 [==============================] - 1s 80us/sample - loss: 2.8252 - acc: 0.2593
7532/7532 [==============================] - 0s 30us/sample - loss: 2.8634 - acc: 0.2469
11314/11314 [==============================] - 1s 78us/sample - loss: 2.8255 - acc: 0.2511
7532/7532 [==============================] - 0s 29us/sample - loss: 2.8626 - acc: 0.2427
11314/11314 [==============================] - 1s 78us/sample - loss: 2.8241 - acc: 0.2579
7532/7532 [==============================] - 0s 30us/sample - loss: 2.8606 - acc: 0.2496
11314/11314 [==============================] - 1s 78us/sample - loss: 2.8239 - acc: 0.2537
7532/7532 [==============================] - 0s 28us/sample - loss: 2.8715 - acc: 0.2420
11314/11314 [==============================] - 1s 79us/sample - loss: 2.8235 - acc: 0.2568
7532/7532 [==============================] - 0s 29us/sample - loss: 2.8595 - acc: 0.2431
11314/11314 [==============================] - 1s 80us/sample - loss: 2.8251 - acc: 0.2562
7532/7532 [==============================] - 0s 29us/sample - loss: 2.8623 - acc: 0.2423
11314/11314 [==============================] - 1s 79us/sample - loss: 2.8244 - acc: 0.2553
7532/7532 [==============================] - 0s 31us/sample - loss: 2.8729 - acc: 0.2341
11314/11314 [==============================] - 1s 78us/sample - loss: 2.8245 - acc: 0.2522
7532/7532 [==============================] - 0s 29us/sample - loss: 2.8754 - acc: 0.2359
11314/11314 [==============================] - 1s 80us/sample - loss: 2.8226 - acc: 0.2568
7532/7532 [==============================] - 0s 29us/sample - loss: 2.8603 - acc: 0.2527
11314/11314 [==============================] - 1s 80us/sample - loss: 2.8217 - acc: 0.2537
7532/7532 [==============================] - 0s 28us/sample - loss: 2.8695 - acc: 0.2497
11314/11314 [==============================] - 1s 79us/sample - loss: 2.8245 - acc: 0.2571
7532/7532 [==============================] - 0s 29us/sample - loss: 2.8714 - acc: 0.2365
11314/11314 [==============================] - 1s 79us/sample - loss: 2.8223 - acc: 0.2577
7532/7532 [==============================] - 0s 28us/sample - loss: 2.8620 - acc: 0.2507
11314/11314 [==============================] - 1s 79us/sample - loss: 2.8218 - acc: 0.2502
7532/7532 [==============================] - 0s 29us/sample - loss: 2.8644 - acc: 0.2531
11314/11314 [==============================] - 1s 79us/sample - loss: 2.8241 - acc: 0.2510
7532/7532 [==============================] - 0s 30us/sample - loss: 2.8723 - acc: 0.2346
11314/11314 [==============================] - 1s 78us/sample - loss: 2.8225 - acc: 0.2552
7532/7532 [==============================] - 0s 30us/sample - loss: 2.8582 - acc: 0.2512
11314/11314 [==============================] - 1s 80us/sample - loss: 2.8222 - acc: 0.2577
7532/7532 [==============================] - 0s 29us/sample - loss: 2.8725 - acc: 0.2367
11314/11314 [==============================] - 1s 78us/sample - loss: 2.8229 - acc: 0.2557
7532/7532 [==============================] - 0s 29us/sample - loss: 2.8602 - acc: 0.2484
11314/11314 [==============================] - 1s 79us/sample - loss: 2.8225 - acc: 0.2569
7532/7532 [==============================] - 0s 29us/sample - loss: 2.8656 - acc: 0.2350
11314/11314 [==============================] - 1s 79us/sample - loss: 2.8213 - acc: 0.2521
7532/7532 [==============================] - 0s 29us/sample - loss: 2.8622 - acc: 0.2491
11314/11314 [==============================] - 1s 85us/sample - loss: 2.8226 - acc: 0.2551
7532/7532 [==============================] - 0s 29us/sample - loss: 2.8562 - acc: 0.2434
11314/11314 [==============================] - 1s 77us/sample - loss: 2.8215 - acc: 0.2583
7532/7532 [==============================] - 0s 28us/sample - loss: 2.8662 - acc: 0.2379
11314/11314 [==============================] - 1s 78us/sample - loss: 2.8201 - acc: 0.2566
7532/7532 [==============================] - 0s 29us/sample - loss: 2.8743 - acc: 0.2338
11314/11314 [==============================] - 1s 79us/sample - loss: 2.8220 - acc: 0.2576
7532/7532 [==============================] - 0s 29us/sample - loss: 2.8689 - acc: 0.2343
11314/11314 [==============================] - 1s 79us/sample - loss: 2.8217 - acc: 0.2602
7532/7532 [==============================] - 0s 29us/sample - loss: 2.8641 - acc: 0.2443
11314/11314 [==============================] - 1s 78us/sample - loss: 2.8216 - acc: 0.2528
7532/7532 [==============================] - 0s 29us/sample - loss: 2.8598 - acc: 0.2446
11314/11314 [==============================] - 1s 79us/sample - loss: 2.8209 - acc: 0.2553
7532/7532 [==============================] - 0s 29us/sample - loss: 2.8582 - acc: 0.2301
11314/11314 [==============================] - 1s 80us/sample - loss: 2.8214 - acc: 0.2546
7532/7532 [==============================] - 0s 31us/sample - loss: 2.8640 - acc: 0.2523
11314/11314 [==============================] - 1s 81us/sample - loss: 2.8204 - acc: 0.2557
7532/7532 [==============================] - 0s 27us/sample - loss: 2.8644 - acc: 0.2358
11314/11314 [==============================] - 1s 80us/sample - loss: 2.8195 - acc: 0.2578
7532/7532 [==============================] - 0s 28us/sample - loss: 2.8575 - acc: 0.2451
11314/11314 [==============================] - 1s 76us/sample - loss: 2.8221 - acc: 0.2591
7532/7532 [==============================] - 0s 28us/sample - loss: 2.8599 - acc: 0.2477
11314/11314 [==============================] - 1s 76us/sample - loss: 2.8217 - acc: 0.2539
7532/7532 [==============================] - 0s 29us/sample - loss: 2.8621 - acc: 0.2481
11314/11314 [==============================] - 1s 77us/sample - loss: 2.8205 - acc: 0.2576
7532/7532 [==============================] - 0s 29us/sample - loss: 2.8673 - acc: 0.2359
11314/11314 [==============================] - 1s 77us/sample - loss: 2.8188 - acc: 0.2639
7532/7532 [==============================] - 0s 29us/sample - loss: 2.8638 - acc: 0.2485
11314/11314 [==============================] - 1s 74us/sample - loss: 2.8213 - acc: 0.2604
7532/7532 [==============================] - 0s 27us/sample - loss: 2.8594 - acc: 0.2438
11314/11314 [==============================] - 1s 77us/sample - loss: 2.8200 - acc: 0.2586
7532/7532 [==============================] - 0s 28us/sample - loss: 2.8653 - acc: 0.2383
11314/11314 [==============================] - 1s 76us/sample - loss: 2.8215 - acc: 0.2592
7532/7532 [==============================] - 0s 34us/sample - loss: 2.8568 - acc: 0.2497
11314/11314 [==============================] - 1s 78us/sample - loss: 2.8206 - acc: 0.2579
7532/7532 [==============================] - 0s 29us/sample - loss: 2.8607 - acc: 0.2608
11314/11314 [==============================] - 1s 97us/sample - loss: 2.8192 - acc: 0.2607
7532/7532 [==============================] - 0s 37us/sample - loss: 2.8644 - acc: 0.2539
11314/11314 [==============================] - 1s 80us/sample - loss: 2.8200 - acc: 0.2584
7532/7532 [==============================] - 0s 28us/sample - loss: 2.8730 - acc: 0.2554
11314/11314 [==============================] - 1s 88us/sample - loss: 2.8208 - acc: 0.2588
7532/7532 [==============================] - 0s 31us/sample - loss: 2.8612 - acc: 0.2606
11314/11314 [==============================] - 1s 81us/sample - loss: 2.8213 - acc: 0.2542
7532/7532 [==============================] - 0s 30us/sample - loss: 2.8578 - acc: 0.2472
11314/11314 [==============================] - 1s 80us/sample - loss: 2.8204 - acc: 0.2560
7532/7532 [==============================] - 0s 29us/sample - loss: 2.8574 - acc: 0.2493
11314/11314 [==============================] - 1s 81us/sample - loss: 2.8189 - acc: 0.2584
7532/7532 [==============================] - 0s 30us/sample - loss: 2.8676 - acc: 0.2395
11314/11314 [==============================] - 1s 79us/sample - loss: 2.8194 - acc: 0.2603
7532/7532 [==============================] - 0s 31us/sample - loss: 2.8602 - acc: 0.2493
Optimization Finished!
Final test loss: 2.860218814757677
Final test accuracy: 0.24933616816997528

In [13]:
run_experiment(tf.contrib.training.HParams(
    batch_size = 32,
    max_epochs = 100,
    max_features = 1000,
    learning_rate = 0.03,
    reg_param = 0.03,
    dropout_keep_prob = 0.9,
    use_dropout = True,
    use_early_stoppoing = False,
    early_stopping_patience = 1,
    use_l2_reg = False,
    layers = 2,
    seed = 42
), title="2) 2 Layers, Dropout")


RUNNING EXPERIMENT: 2) 2 Layers, Dropout
WARNING:tensorflow:From /home/tony/.local/lib/python3.7/site-packages/tensorflow/python/keras/layers/core.py:143: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input (InputLayer)           (None, 1000)              0         
_________________________________________________________________
dense-0 (Dense)              (None, 128)               128128    
_________________________________________________________________
dropout (Dropout)            (None, 128)               0         
_________________________________________________________________
dense-1 (Dense)              (None, 128)               16512     
_________________________________________________________________
dropout_1 (Dropout)          (None, 128)               0         
_________________________________________________________________
output (Dense)               (None, 20)                2580      
_________________________________________________________________
dropout_2 (Dropout)          (None, 20)                0         
=================================================================
Total params: 147,220
Trainable params: 147,220
Non-trainable params: 0
_________________________________________________________________
11314/11314 [==============================] - 1s 81us/sample - loss: 3.1312 - acc: 0.4314
7532/7532 [==============================] - 0s 28us/sample - loss: 1.2523 - acc: 0.6195
11314/11314 [==============================] - 1s 71us/sample - loss: 2.3319 - acc: 0.6835
7532/7532 [==============================] - 0s 25us/sample - loss: 1.1489 - acc: 0.6483
11314/11314 [==============================] - 1s 74us/sample - loss: 2.0671 - acc: 0.7523
7532/7532 [==============================] - 0s 27us/sample - loss: 1.1652 - acc: 0.6492
11314/11314 [==============================] - 1s 79us/sample - loss: 1.9577 - acc: 0.7858
7532/7532 [==============================] - 0s 28us/sample - loss: 1.2638 - acc: 0.6413
11314/11314 [==============================] - 1s 77us/sample - loss: 1.8881 - acc: 0.8134
7532/7532 [==============================] - 0s 28us/sample - loss: 1.3236 - acc: 0.6407
11314/11314 [==============================] - 1s 74us/sample - loss: 1.8021 - acc: 0.8372
7532/7532 [==============================] - 0s 28us/sample - loss: 1.3979 - acc: 0.6441
11314/11314 [==============================] - 1s 78us/sample - loss: 1.8415 - acc: 0.8445
7532/7532 [==============================] - 0s 27us/sample - loss: 1.4901 - acc: 0.6486
11314/11314 [==============================] - 1s 85us/sample - loss: 1.7454 - acc: 0.8649
7532/7532 [==============================] - 0s 30us/sample - loss: 1.6336 - acc: 0.6361
11314/11314 [==============================] - 1s 89us/sample - loss: 1.7695 - acc: 0.8653
7532/7532 [==============================] - 0s 29us/sample - loss: 1.6942 - acc: 0.6350
11314/11314 [==============================] - 1s 80us/sample - loss: 1.6878 - acc: 0.8736
7532/7532 [==============================] - 0s 29us/sample - loss: 1.7560 - acc: 0.6371
11314/11314 [==============================] - 1s 75us/sample - loss: 1.6563 - acc: 0.8786
7532/7532 [==============================] - 0s 30us/sample - loss: 1.8231 - acc: 0.6389
11314/11314 [==============================] - 1s 82us/sample - loss: 1.7179 - acc: 0.8780
7532/7532 [==============================] - 0s 30us/sample - loss: 1.8975 - acc: 0.6334
11314/11314 [==============================] - 1s 90us/sample - loss: 1.5881 - acc: 0.8885
7532/7532 [==============================] - 0s 29us/sample - loss: 1.9863 - acc: 0.6329
11314/11314 [==============================] - 1s 87us/sample - loss: 1.6793 - acc: 0.8865
7532/7532 [==============================] - 0s 34us/sample - loss: 2.0762 - acc: 0.6353
11314/11314 [==============================] - 1s 79us/sample - loss: 1.6557 - acc: 0.8859
7532/7532 [==============================] - 0s 37us/sample - loss: 2.1668 - acc: 0.6261
11314/11314 [==============================] - 1s 89us/sample - loss: 1.6915 - acc: 0.8834
7532/7532 [==============================] - 0s 32us/sample - loss: 2.2056 - acc: 0.6256
11314/11314 [==============================] - 1s 92us/sample - loss: 1.5060 - acc: 0.8970
7532/7532 [==============================] - 0s 37us/sample - loss: 2.2064 - acc: 0.6356
11314/11314 [==============================] - 1s 88us/sample - loss: 1.7196 - acc: 0.8853
7532/7532 [==============================] - 0s 31us/sample - loss: 2.3038 - acc: 0.6316
11314/11314 [==============================] - 1s 84us/sample - loss: 1.6307 - acc: 0.8892
7532/7532 [==============================] - 0s 30us/sample - loss: 2.3756 - acc: 0.6253
11314/11314 [==============================] - 1s 82us/sample - loss: 1.6323 - acc: 0.8885
7532/7532 [==============================] - 0s 31us/sample - loss: 2.3459 - acc: 0.6287
11314/11314 [==============================] - 1s 82us/sample - loss: 1.6182 - acc: 0.8917
7532/7532 [==============================] - 0s 30us/sample - loss: 2.4403 - acc: 0.6328
11314/11314 [==============================] - 1s 82us/sample - loss: 1.5956 - acc: 0.8933
7532/7532 [==============================] - 0s 30us/sample - loss: 2.3679 - acc: 0.6289
11314/11314 [==============================] - 1s 81us/sample - loss: 1.6403 - acc: 0.8898
7532/7532 [==============================] - 0s 31us/sample - loss: 2.4887 - acc: 0.6273
11314/11314 [==============================] - 1s 82us/sample - loss: 1.6401 - acc: 0.8879
7532/7532 [==============================] - 0s 29us/sample - loss: 2.5090 - acc: 0.6271
11314/11314 [==============================] - 1s 85us/sample - loss: 1.6452 - acc: 0.8889
7532/7532 [==============================] - 0s 29us/sample - loss: 2.5556 - acc: 0.6294
11314/11314 [==============================] - 1s 83us/sample - loss: 1.6190 - acc: 0.8896
7532/7532 [==============================] - 0s 34us/sample - loss: 2.5619 - acc: 0.6284
11314/11314 [==============================] - 1s 78us/sample - loss: 1.6329 - acc: 0.8908
7532/7532 [==============================] - 0s 28us/sample - loss: 2.6036 - acc: 0.6219
11314/11314 [==============================] - 1s 77us/sample - loss: 1.6122 - acc: 0.8927
7532/7532 [==============================] - 0s 29us/sample - loss: 2.6155 - acc: 0.6300
11314/11314 [==============================] - 1s 78us/sample - loss: 1.7915 - acc: 0.8816
7532/7532 [==============================] - 0s 29us/sample - loss: 2.5851 - acc: 0.6341
11314/11314 [==============================] - 1s 86us/sample - loss: 1.6684 - acc: 0.8911
7532/7532 [==============================] - 0s 30us/sample - loss: 2.6829 - acc: 0.6345
11314/11314 [==============================] - 1s 91us/sample - loss: 1.6950 - acc: 0.8862
7532/7532 [==============================] - 0s 31us/sample - loss: 2.7992 - acc: 0.6140
11314/11314 [==============================] - 1s 87us/sample - loss: 1.6889 - acc: 0.8869
7532/7532 [==============================] - 0s 30us/sample - loss: 2.6443 - acc: 0.6312
11314/11314 [==============================] - 1s 87us/sample - loss: 1.6748 - acc: 0.8901
7532/7532 [==============================] - 0s 31us/sample - loss: 2.7467 - acc: 0.6287
11314/11314 [==============================] - 1s 90us/sample - loss: 1.6843 - acc: 0.8897
7532/7532 [==============================] - 0s 29us/sample - loss: 2.8236 - acc: 0.6247
11314/11314 [==============================] - 1s 85us/sample - loss: 1.6230 - acc: 0.8933
7532/7532 [==============================] - 0s 30us/sample - loss: 2.8384 - acc: 0.6265
11314/11314 [==============================] - 1s 84us/sample - loss: 1.6926 - acc: 0.8879
7532/7532 [==============================] - 0s 30us/sample - loss: 2.7998 - acc: 0.6232
11314/11314 [==============================] - 1s 81us/sample - loss: 1.6401 - acc: 0.8915
7532/7532 [==============================] - 0s 29us/sample - loss: 2.8974 - acc: 0.6251
11314/11314 [==============================] - 1s 75us/sample - loss: 1.5891 - acc: 0.8931
7532/7532 [==============================] - 0s 27us/sample - loss: 2.9212 - acc: 0.6183
11314/11314 [==============================] - 1s 75us/sample - loss: nan - acc: 0.7320
7532/7532 [==============================] - 0s 27us/sample - loss: nan - acc: 0.0424
11314/11314 [==============================] - 1s 78us/sample - loss: nan - acc: 0.0424
7532/7532 [==============================] - 0s 28us/sample - loss: nan - acc: 0.0424
11314/11314 [==============================] - 1s 89us/sample - loss: nan - acc: 0.0424
7532/7532 [==============================] - 0s 35us/sample - loss: nan - acc: 0.0424
11314/11314 [==============================] - 1s 88us/sample - loss: nan - acc: 0.0424
7532/7532 [==============================] - 0s 28us/sample - loss: nan - acc: 0.0424
11314/11314 [==============================] - 1s 84us/sample - loss: nan - acc: 0.0424
7532/7532 [==============================] - 0s 30us/sample - loss: nan - acc: 0.0424
11314/11314 [==============================] - 1s 85us/sample - loss: nan - acc: 0.0424
7532/7532 [==============================] - 0s 30us/sample - loss: nan - acc: 0.0424
11314/11314 [==============================] - 1s 82us/sample - loss: nan - acc: 0.0424
7532/7532 [==============================] - 0s 28us/sample - loss: nan - acc: 0.0424
11314/11314 [==============================] - 1s 80us/sample - loss: nan - acc: 0.0424
7532/7532 [==============================] - 0s 27us/sample - loss: nan - acc: 0.0424
11314/11314 [==============================] - 1s 88us/sample - loss: nan - acc: 0.0424
7532/7532 [==============================] - 0s 28us/sample - loss: nan - acc: 0.0424
11314/11314 [==============================] - 1s 84us/sample - loss: nan - acc: 0.0424
7532/7532 [==============================] - 0s 29us/sample - loss: nan - acc: 0.0424
11314/11314 [==============================] - 1s 81us/sample - loss: nan - acc: 0.0424
7532/7532 [==============================] - 0s 36us/sample - loss: nan - acc: 0.0424
11314/11314 [==============================] - 1s 90us/sample - loss: nan - acc: 0.0424
7532/7532 [==============================] - 0s 28us/sample - loss: nan - acc: 0.0424
11314/11314 [==============================] - 1s 85us/sample - loss: nan - acc: 0.0424
7532/7532 [==============================] - 0s 31us/sample - loss: nan - acc: 0.0424
11314/11314 [==============================] - 1s 82us/sample - loss: nan - acc: 0.0424
7532/7532 [==============================] - 0s 29us/sample - loss: nan - acc: 0.0424
11314/11314 [==============================] - 1s 74us/sample - loss: nan - acc: 0.0424
7532/7532 [==============================] - 0s 26us/sample - loss: nan - acc: 0.0424
11314/11314 [==============================] - 1s 69us/sample - loss: nan - acc: 0.0424
7532/7532 [==============================] - 0s 25us/sample - loss: nan - acc: 0.0424
11314/11314 [==============================] - 1s 71us/sample - loss: nan - acc: 0.0424
7532/7532 [==============================] - 0s 25us/sample - loss: nan - acc: 0.0424
11314/11314 [==============================] - 1s 72us/sample - loss: nan - acc: 0.0424
7532/7532 [==============================] - 0s 25us/sample - loss: nan - acc: 0.0424
11314/11314 [==============================] - 1s 70us/sample - loss: nan - acc: 0.0424
7532/7532 [==============================] - 0s 25us/sample - loss: nan - acc: 0.0424
11314/11314 [==============================] - 1s 89us/sample - loss: nan - acc: 0.0424
7532/7532 [==============================] - 0s 32us/sample - loss: nan - acc: 0.0424
11314/11314 [==============================] - 1s 89us/sample - loss: nan - acc: 0.0424
7532/7532 [==============================] - 0s 29us/sample - loss: nan - acc: 0.0424
11314/11314 [==============================] - 1s 87us/sample - loss: nan - acc: 0.0424
7532/7532 [==============================] - 0s 29us/sample - loss: nan - acc: 0.0424
11314/11314 [==============================] - 1s 85us/sample - loss: nan - acc: 0.0424
7532/7532 [==============================] - 0s 29us/sample - loss: nan - acc: 0.0424
11314/11314 [==============================] - 1s 85us/sample - loss: nan - acc: 0.0424
7532/7532 [==============================] - 0s 25us/sample - loss: nan - acc: 0.0424
11314/11314 [==============================] - 1s 89us/sample - loss: nan - acc: 0.0424
7532/7532 [==============================] - 0s 30us/sample - loss: nan - acc: 0.0424
11314/11314 [==============================] - 1s 84us/sample - loss: nan - acc: 0.0424
7532/7532 [==============================] - 0s 28us/sample - loss: nan - acc: 0.0424
11314/11314 [==============================] - 1s 71us/sample - loss: nan - acc: 0.0424
7532/7532 [==============================] - 0s 29us/sample - loss: nan - acc: 0.0424
11314/11314 [==============================] - 1s 84us/sample - loss: nan - acc: 0.0424
7532/7532 [==============================] - 0s 34us/sample - loss: nan - acc: 0.0424
11314/11314 [==============================] - 1s 82us/sample - loss: nan - acc: 0.0424
7532/7532 [==============================] - 0s 29us/sample - loss: nan - acc: 0.0424
11314/11314 [==============================] - 1s 89us/sample - loss: nan - acc: 0.0424
7532/7532 [==============================] - 0s 29us/sample - loss: nan - acc: 0.0424
11314/11314 [==============================] - 1s 84us/sample - loss: nan - acc: 0.0424
7532/7532 [==============================] - 0s 28us/sample - loss: nan - acc: 0.0424
11314/11314 [==============================] - 1s 82us/sample - loss: nan - acc: 0.0424
7532/7532 [==============================] - 0s 29us/sample - loss: nan - acc: 0.0424
11314/11314 [==============================] - 1s 82us/sample - loss: nan - acc: 0.0424
7532/7532 [==============================] - 0s 29us/sample - loss: nan - acc: 0.0424
11314/11314 [==============================] - 1s 72us/sample - loss: nan - acc: 0.0424
7532/7532 [==============================] - 0s 33us/sample - loss: nan - acc: 0.0424
11314/11314 [==============================] - 1s 88us/sample - loss: nan - acc: 0.0424
7532/7532 [==============================] - 0s 30us/sample - loss: nan - acc: 0.0424
11314/11314 [==============================] - 1s 98us/sample - loss: nan - acc: 0.0424
7532/7532 [==============================] - 0s 32us/sample - loss: nan - acc: 0.0424
11314/11314 [==============================] - 1s 92us/sample - loss: nan - acc: 0.0424
7532/7532 [==============================] - 0s 36us/sample - loss: nan - acc: 0.0424
11314/11314 [==============================] - 1s 90us/sample - loss: nan - acc: 0.0424
7532/7532 [==============================] - 0s 31us/sample - loss: nan - acc: 0.0424
11314/11314 [==============================] - 1s 88us/sample - loss: nan - acc: 0.0424
7532/7532 [==============================] - 0s 29us/sample - loss: nan - acc: 0.0424
11314/11314 [==============================] - 1s 92us/sample - loss: nan - acc: 0.0424
7532/7532 [==============================] - 0s 31us/sample - loss: nan - acc: 0.0424
11314/11314 [==============================] - 1s 89us/sample - loss: nan - acc: 0.0424
7532/7532 [==============================] - 0s 28us/sample - loss: nan - acc: 0.0424
11314/11314 [==============================] - 1s 85us/sample - loss: nan - acc: 0.0424
7532/7532 [==============================] - 0s 30us/sample - loss: nan - acc: 0.0424
11314/11314 [==============================] - 1s 94us/sample - loss: nan - acc: 0.0424
7532/7532 [==============================] - 0s 32us/sample - loss: nan - acc: 0.0424
11314/11314 [==============================] - 1s 89us/sample - loss: nan - acc: 0.0424
7532/7532 [==============================] - 0s 32us/sample - loss: nan - acc: 0.0424
11314/11314 [==============================] - 1s 85us/sample - loss: nan - acc: 0.0424
7532/7532 [==============================] - 0s 28us/sample - loss: nan - acc: 0.0424
11314/11314 [==============================] - 1s 76us/sample - loss: nan - acc: 0.0424
7532/7532 [==============================] - 0s 28us/sample - loss: nan - acc: 0.0424
11314/11314 [==============================] - 1s 74us/sample - loss: nan - acc: 0.0424
7532/7532 [==============================] - 0s 28us/sample - loss: nan - acc: 0.0424
11314/11314 [==============================] - 1s 81us/sample - loss: nan - acc: 0.0424
7532/7532 [==============================] - 0s 27us/sample - loss: nan - acc: 0.0424
11314/11314 [==============================] - 1s 77us/sample - loss: nan - acc: 0.0424
7532/7532 [==============================] - 0s 28us/sample - loss: nan - acc: 0.0424
11314/11314 [==============================] - 1s 74us/sample - loss: nan - acc: 0.0424
7532/7532 [==============================] - 0s 26us/sample - loss: nan - acc: 0.0424
11314/11314 [==============================] - 1s 71us/sample - loss: nan - acc: 0.0424
7532/7532 [==============================] - 0s 24us/sample - loss: nan - acc: 0.0424
11314/11314 [==============================] - 1s 71us/sample - loss: nan - acc: 0.0424
7532/7532 [==============================] - 0s 25us/sample - loss: nan - acc: 0.0424
11314/11314 [==============================] - 1s 79us/sample - loss: nan - acc: 0.0424
7532/7532 [==============================] - 0s 33us/sample - loss: nan - acc: 0.0424
11314/11314 [==============================] - 1s 88us/sample - loss: nan - acc: 0.0424
7532/7532 [==============================] - 0s 30us/sample - loss: nan - acc: 0.0424
11314/11314 [==============================] - 1s 86us/sample - loss: nan - acc: 0.0424
7532/7532 [==============================] - 0s 30us/sample - loss: nan - acc: 0.0424
11314/11314 [==============================] - 1s 84us/sample - loss: nan - acc: 0.0424
7532/7532 [==============================] - 0s 35us/sample - loss: nan - acc: 0.0424
11314/11314 [==============================] - 1s 89us/sample - loss: nan - acc: 0.0424
7532/7532 [==============================] - 0s 31us/sample - loss: nan - acc: 0.0424
11314/11314 [==============================] - 1s 83us/sample - loss: nan - acc: 0.0424
7532/7532 [==============================] - 0s 31us/sample - loss: nan - acc: 0.0424
11314/11314 [==============================] - 1s 87us/sample - loss: nan - acc: 0.0424
7532/7532 [==============================] - 0s 31us/sample - loss: nan - acc: 0.0424
11314/11314 [==============================] - 1s 82us/sample - loss: nan - acc: 0.0424
7532/7532 [==============================] - 0s 33us/sample - loss: nan - acc: 0.0424
11314/11314 [==============================] - 1s 90us/sample - loss: nan - acc: 0.0424
7532/7532 [==============================] - 0s 34us/sample - loss: nan - acc: 0.0424
11314/11314 [==============================] - 1s 88us/sample - loss: nan - acc: 0.0424
7532/7532 [==============================] - 0s 32us/sample - loss: nan - acc: 0.0424
Optimization Finished!
Final test loss: nan
Final test accuracy: 0.04235262796282768

In [14]:
run_experiment(tf.contrib.training.HParams(
    batch_size = 32,
    max_epochs = 100,
    max_features = 1000,
    learning_rate = 0.03,
    reg_param = 0.03,
    dropout_keep_prob = 0.9,
    use_dropout = False,
    use_early_stoppoing = True,
    early_stopping_patience = 3,
    use_l2_reg = False,
    layers = 2,
    seed = 42
), title="3) 2 Layers, Early Stopping")


RUNNING EXPERIMENT: 3) 2 Layers, Early Stopping
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input (InputLayer)           (None, 1000)              0         
_________________________________________________________________
dense-0 (Dense)              (None, 128)               128128    
_________________________________________________________________
dense-1 (Dense)              (None, 128)               16512     
_________________________________________________________________
output (Dense)               (None, 20)                2580      
=================================================================
Total params: 147,220
Trainable params: 147,220
Non-trainable params: 0
_________________________________________________________________
11314/11314 [==============================] - 1s 84us/sample - loss: 1.5789 - acc: 0.5424
7532/7532 [==============================] - 0s 29us/sample - loss: 1.2134 - acc: 0.6318
11314/11314 [==============================] - 1s 77us/sample - loss: 0.6599 - acc: 0.8048
7532/7532 [==============================] - 0s 34us/sample - loss: 1.1829 - acc: 0.6491
11314/11314 [==============================] - 1s 77us/sample - loss: 0.4242 - acc: 0.8746
7532/7532 [==============================] - 0s 33us/sample - loss: 1.2607 - acc: 0.6402
11314/11314 [==============================] - 1s 76us/sample - loss: 0.2918 - acc: 0.9129
7532/7532 [==============================] - 0s 31us/sample - loss: 1.3681 - acc: 0.6394
11314/11314 [==============================] - 1s 82us/sample - loss: 0.1889 - acc: 0.9487
7532/7532 [==============================] - 0s 31us/sample - loss: 1.5764 - acc: 0.6283
11314/11314 [==============================] - 1s 81us/sample - loss: 0.1239 - acc: 0.9684
7532/7532 [==============================] - 0s 29us/sample - loss: 1.7183 - acc: 0.6279
best epoch to stop is: 1 with loss: 1.1829048868701995
Optimization Finished!
Final test loss: 1.7182685572988354
Final test accuracy: 0.6278544664382935

In [15]:
run_experiment(tf.contrib.training.HParams(
    batch_size = 32,
    max_epochs = 100,
    max_features = 1000,
    learning_rate = 0.03,
    reg_param = 0.03,
    dropout_keep_prob = 0.9,
    use_dropout = True,
    use_early_stoppoing = False,
    early_stopping_patience = 3,
    use_l2_reg = True,
    layers = 2,
    seed = 42
), title="4) 2 Layers, L2, Dropout")


RUNNING EXPERIMENT: 4) 2 Layers, L2, Dropout
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input (InputLayer)           (None, 1000)              0         
_________________________________________________________________
dense-0 (Dense)              (None, 128)               128128    
_________________________________________________________________
dropout_3 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense-1 (Dense)              (None, 128)               16512     
_________________________________________________________________
dropout_4 (Dropout)          (None, 128)               0         
_________________________________________________________________
output (Dense)               (None, 20)                2580      
_________________________________________________________________
dropout_5 (Dropout)          (None, 20)                0         
=================================================================
Total params: 147,220
Trainable params: 147,220
Non-trainable params: 0
_________________________________________________________________
11314/11314 [==============================] - 1s 99us/sample - loss: 5.5866 - acc: 0.1396
7532/7532 [==============================] - 0s 49us/sample - loss: 2.9565 - acc: 0.1150
11314/11314 [==============================] - 1s 94us/sample - loss: 4.1357 - acc: 0.1158
7532/7532 [==============================] - 0s 33us/sample - loss: 2.9223 - acc: 0.1167
11314/11314 [==============================] - 1s 98us/sample - loss: 4.2017 - acc: 0.1179
7532/7532 [==============================] - 0s 32us/sample - loss: 2.9122 - acc: 0.1223
11314/11314 [==============================] - 1s 93us/sample - loss: 4.1584 - acc: 0.1168
7532/7532 [==============================] - 0s 42us/sample - loss: 2.9104 - acc: 0.1198
11314/11314 [==============================] - 1s 89us/sample - loss: 4.1069 - acc: 0.1133
7532/7532 [==============================] - 0s 36us/sample - loss: 2.9256 - acc: 0.1118
11314/11314 [==============================] - 1s 90us/sample - loss: 4.2008 - acc: 0.1142
7532/7532 [==============================] - 0s 33us/sample - loss: 2.9034 - acc: 0.1232
11314/11314 [==============================] - 1s 92us/sample - loss: 4.1237 - acc: 0.1172
7532/7532 [==============================] - 0s 33us/sample - loss: 2.8978 - acc: 0.1155
11314/11314 [==============================] - 1s 89us/sample - loss: 4.1473 - acc: 0.1185
7532/7532 [==============================] - 0s 32us/sample - loss: 2.9049 - acc: 0.1190
11314/11314 [==============================] - 1s 88us/sample - loss: 4.1833 - acc: 0.1115
7532/7532 [==============================] - 0s 33us/sample - loss: 2.9136 - acc: 0.1106
11314/11314 [==============================] - 1s 93us/sample - loss: 4.1633 - acc: 0.1162
7532/7532 [==============================] - 0s 33us/sample - loss: 2.8933 - acc: 0.1148
11314/11314 [==============================] - 1s 92us/sample - loss: 4.1487 - acc: 0.1207
7532/7532 [==============================] - 0s 36us/sample - loss: 2.8988 - acc: 0.1111
11314/11314 [==============================] - 1s 90us/sample - loss: 4.0773 - acc: 0.1144
7532/7532 [==============================] - 0s 40us/sample - loss: 2.8955 - acc: 0.1192
11314/11314 [==============================] - 1s 95us/sample - loss: 4.1192 - acc: 0.1185
7532/7532 [==============================] - 0s 32us/sample - loss: 2.8914 - acc: 0.1179
11314/11314 [==============================] - 1s 91us/sample - loss: 4.1745 - acc: 0.1175
7532/7532 [==============================] - 0s 32us/sample - loss: 2.8890 - acc: 0.1228
11314/11314 [==============================] - 1s 93us/sample - loss: 4.1445 - acc: 0.1186
7532/7532 [==============================] - 0s 36us/sample - loss: 2.8982 - acc: 0.1138
11314/11314 [==============================] - 1s 95us/sample - loss: 4.1427 - acc: 0.1229
7532/7532 [==============================] - 0s 33us/sample - loss: 2.9031 - acc: 0.1089
11314/11314 [==============================] - 1s 94us/sample - loss: 4.0736 - acc: 0.1151
7532/7532 [==============================] - 0s 32us/sample - loss: 2.8992 - acc: 0.1223
11314/11314 [==============================] - 1s 98us/sample - loss: 4.1574 - acc: 0.1183
7532/7532 [==============================] - 0s 33us/sample - loss: 2.8867 - acc: 0.1279
11314/11314 [==============================] - 1s 92us/sample - loss: 4.1206 - acc: 0.1145
7532/7532 [==============================] - 0s 32us/sample - loss: 2.8883 - acc: 0.1267
11314/11314 [==============================] - 1s 90us/sample - loss: 4.1718 - acc: 0.1183
7532/7532 [==============================] - 0s 33us/sample - loss: 2.8887 - acc: 0.1191
11314/11314 [==============================] - 1s 97us/sample - loss: 4.1782 - acc: 0.1159
7532/7532 [==============================] - 0s 39us/sample - loss: 2.8931 - acc: 0.1219
11314/11314 [==============================] - 1s 93us/sample - loss: 4.0678 - acc: 0.1219
7532/7532 [==============================] - 0s 32us/sample - loss: 2.8912 - acc: 0.1178
11314/11314 [==============================] - 1s 90us/sample - loss: 4.1692 - acc: 0.1199
7532/7532 [==============================] - 0s 33us/sample - loss: 2.8824 - acc: 0.1202
11314/11314 [==============================] - 1s 94us/sample - loss: 4.2044 - acc: 0.1218
7532/7532 [==============================] - 0s 45us/sample - loss: 2.8888 - acc: 0.1198
11314/11314 [==============================] - 1s 91us/sample - loss: 4.1290 - acc: 0.1229
7532/7532 [==============================] - 0s 33us/sample - loss: 2.8863 - acc: 0.1176
11314/11314 [==============================] - 1s 106us/sample - loss: 4.1396 - acc: 0.1179
7532/7532 [==============================] - 0s 36us/sample - loss: 2.9009 - acc: 0.1106
11314/11314 [==============================] - 1s 95us/sample - loss: 4.1119 - acc: 0.1171
7532/7532 [==============================] - 0s 33us/sample - loss: 2.8878 - acc: 0.1200
11314/11314 [==============================] - 1s 98us/sample - loss: 4.2213 - acc: 0.1176
7532/7532 [==============================] - 0s 35us/sample - loss: 2.9009 - acc: 0.1094
11314/11314 [==============================] - 1s 88us/sample - loss: 4.1386 - acc: 0.1206
7532/7532 [==============================] - 0s 31us/sample - loss: 2.8858 - acc: 0.1220
11314/11314 [==============================] - 1s 82us/sample - loss: 4.1615 - acc: 0.1227
7532/7532 [==============================] - 0s 31us/sample - loss: 2.8950 - acc: 0.1147
11314/11314 [==============================] - 1s 87us/sample - loss: 4.0659 - acc: 0.1210
7532/7532 [==============================] - 0s 33us/sample - loss: 2.8884 - acc: 0.1192
11314/11314 [==============================] - 1s 86us/sample - loss: 4.1297 - acc: 0.1189
7532/7532 [==============================] - 0s 32us/sample - loss: 2.8952 - acc: 0.1158
11314/11314 [==============================] - 1s 97us/sample - loss: 4.1527 - acc: 0.1155
7532/7532 [==============================] - 0s 41us/sample - loss: 2.8847 - acc: 0.1175
11314/11314 [==============================] - 1s 99us/sample - loss: 4.1177 - acc: 0.1231
7532/7532 [==============================] - 0s 32us/sample - loss: 2.8895 - acc: 0.1142
11314/11314 [==============================] - 1s 91us/sample - loss: 4.1867 - acc: 0.1176
7532/7532 [==============================] - 0s 34us/sample - loss: 2.8977 - acc: 0.1131
11314/11314 [==============================] - 1s 90us/sample - loss: 4.2095 - acc: 0.1197
7532/7532 [==============================] - 0s 33us/sample - loss: 2.8830 - acc: 0.1221
11314/11314 [==============================] - 1s 90us/sample - loss: 4.0965 - acc: 0.1208
7532/7532 [==============================] - 0s 33us/sample - loss: 2.8873 - acc: 0.1148
11314/11314 [==============================] - 1s 91us/sample - loss: 4.1272 - acc: 0.1218
7532/7532 [==============================] - 0s 33us/sample - loss: 2.8875 - acc: 0.1147
11314/11314 [==============================] - 1s 90us/sample - loss: 4.1873 - acc: 0.1234
7532/7532 [==============================] - 0s 33us/sample - loss: 2.8826 - acc: 0.1136
11314/11314 [==============================] - 1s 89us/sample - loss: 4.1344 - acc: 0.1171
7532/7532 [==============================] - 0s 32us/sample - loss: 2.8851 - acc: 0.1310
11314/11314 [==============================] - 1s 95us/sample - loss: 4.1098 - acc: 0.1148
7532/7532 [==============================] - 0s 35us/sample - loss: 2.8897 - acc: 0.1194
11314/11314 [==============================] - 1s 99us/sample - loss: 4.1383 - acc: 0.1188
7532/7532 [==============================] - 0s 32us/sample - loss: 2.8857 - acc: 0.1255
11314/11314 [==============================] - 1s 90us/sample - loss: 4.1134 - acc: 0.1163
7532/7532 [==============================] - 0s 31us/sample - loss: 2.8807 - acc: 0.1174
11314/11314 [==============================] - 1s 98us/sample - loss: 4.0738 - acc: 0.1183
7532/7532 [==============================] - 0s 34us/sample - loss: 2.8837 - acc: 0.1233
11314/11314 [==============================] - 1s 92us/sample - loss: 4.1327 - acc: 0.1138
7532/7532 [==============================] - 0s 37us/sample - loss: 2.8829 - acc: 0.1172
11314/11314 [==============================] - 1s 96us/sample - loss: 4.1320 - acc: 0.1204
7532/7532 [==============================] - 0s 33us/sample - loss: 2.8939 - acc: 0.1156
11314/11314 [==============================] - 1s 93us/sample - loss: 4.0966 - acc: 0.1156
7532/7532 [==============================] - 0s 33us/sample - loss: 2.8849 - acc: 0.1188
11314/11314 [==============================] - 1s 93us/sample - loss: 4.1573 - acc: 0.1220
7532/7532 [==============================] - 0s 36us/sample - loss: 2.8859 - acc: 0.1109
11314/11314 [==============================] - 1s 88us/sample - loss: 4.0915 - acc: 0.1211
7532/7532 [==============================] - 0s 33us/sample - loss: 2.8881 - acc: 0.1154
11314/11314 [==============================] - 1s 83us/sample - loss: 4.1318 - acc: 0.1197
7532/7532 [==============================] - 0s 30us/sample - loss: 2.8869 - acc: 0.1195
11314/11314 [==============================] - 1s 85us/sample - loss: 4.1835 - acc: 0.1177
7532/7532 [==============================] - 0s 31us/sample - loss: 2.9000 - acc: 0.1078
11314/11314 [==============================] - 1s 87us/sample - loss: 4.1629 - acc: 0.1176
7532/7532 [==============================] - 0s 32us/sample - loss: 2.8841 - acc: 0.1228
11314/11314 [==============================] - 1s 83us/sample - loss: 4.1267 - acc: 0.1235
7532/7532 [==============================] - 0s 30us/sample - loss: 2.8886 - acc: 0.1279
11314/11314 [==============================] - 1s 87us/sample - loss: 4.1210 - acc: 0.1178
7532/7532 [==============================] - 0s 32us/sample - loss: 2.8871 - acc: 0.1213
11314/11314 [==============================] - 1s 86us/sample - loss: 4.1046 - acc: 0.1231
7532/7532 [==============================] - 0s 29us/sample - loss: 2.8838 - acc: 0.1229
11314/11314 [==============================] - 1s 85us/sample - loss: 4.0971 - acc: 0.1199
7532/7532 [==============================] - 0s 30us/sample - loss: 2.8866 - acc: 0.1118
11314/11314 [==============================] - 1s 87us/sample - loss: 4.1207 - acc: 0.1174
7532/7532 [==============================] - 0s 32us/sample - loss: 2.8910 - acc: 0.1179
11314/11314 [==============================] - 1s 83us/sample - loss: 4.1009 - acc: 0.1197
7532/7532 [==============================] - 0s 30us/sample - loss: 2.8842 - acc: 0.1150
11314/11314 [==============================] - 1s 83us/sample - loss: 4.1240 - acc: 0.1170
7532/7532 [==============================] - 0s 30us/sample - loss: 2.8840 - acc: 0.1164
11314/11314 [==============================] - 1s 86us/sample - loss: 4.0967 - acc: 0.1194
7532/7532 [==============================] - 0s 32us/sample - loss: 2.8844 - acc: 0.1314
11314/11314 [==============================] - 1s 85us/sample - loss: 4.1960 - acc: 0.1208
7532/7532 [==============================] - 0s 32us/sample - loss: 2.8855 - acc: 0.1273
11314/11314 [==============================] - 1s 85us/sample - loss: 4.1063 - acc: 0.1191
7532/7532 [==============================] - 0s 32us/sample - loss: 2.8846 - acc: 0.1220
11314/11314 [==============================] - 1s 85us/sample - loss: 4.0920 - acc: 0.1144
7532/7532 [==============================] - 0s 31us/sample - loss: 2.8968 - acc: 0.1217
11314/11314 [==============================] - 1s 84us/sample - loss: 4.1494 - acc: 0.1161
7532/7532 [==============================] - 0s 30us/sample - loss: 2.8825 - acc: 0.1272
11314/11314 [==============================] - 1s 85us/sample - loss: 4.2121 - acc: 0.1167
7532/7532 [==============================] - 0s 30us/sample - loss: 2.8857 - acc: 0.1171
11314/11314 [==============================] - 1s 84us/sample - loss: 4.1331 - acc: 0.1208
7532/7532 [==============================] - 0s 30us/sample - loss: 2.8847 - acc: 0.1341
11314/11314 [==============================] - 1s 87us/sample - loss: 4.1185 - acc: 0.1169
7532/7532 [==============================] - 0s 31us/sample - loss: 2.8865 - acc: 0.1275
11314/11314 [==============================] - 1s 86us/sample - loss: 4.1520 - acc: 0.1133
7532/7532 [==============================] - 0s 32us/sample - loss: 2.8873 - acc: 0.1232
11314/11314 [==============================] - 1s 86us/sample - loss: 4.1139 - acc: 0.1244
7532/7532 [==============================] - 0s 32us/sample - loss: 2.8850 - acc: 0.1217
11314/11314 [==============================] - 1s 87us/sample - loss: 4.1456 - acc: 0.1207
7532/7532 [==============================] - 0s 31us/sample - loss: 2.8823 - acc: 0.1138
11314/11314 [==============================] - 1s 87us/sample - loss: 4.1346 - acc: 0.1163
7532/7532 [==============================] - 0s 32us/sample - loss: 2.8906 - acc: 0.1168
11314/11314 [==============================] - 1s 87us/sample - loss: 4.1528 - acc: 0.1219
7532/7532 [==============================] - 0s 32us/sample - loss: 2.8861 - acc: 0.1191
11314/11314 [==============================] - 1s 83us/sample - loss: 4.1379 - acc: 0.1209
7532/7532 [==============================] - 0s 31us/sample - loss: 2.8806 - acc: 0.1143
11314/11314 [==============================] - 1s 84us/sample - loss: 4.1570 - acc: 0.1193
7532/7532 [==============================] - 0s 30us/sample - loss: 2.8869 - acc: 0.1186
11314/11314 [==============================] - 1s 83us/sample - loss: 4.0560 - acc: 0.1179
7532/7532 [==============================] - 0s 31us/sample - loss: 2.8879 - acc: 0.1140
11314/11314 [==============================] - 1s 84us/sample - loss: 4.0629 - acc: 0.1160
7532/7532 [==============================] - 0s 30us/sample - loss: 2.8865 - acc: 0.1131
11314/11314 [==============================] - 1s 85us/sample - loss: 4.1716 - acc: 0.1185
7532/7532 [==============================] - 0s 30us/sample - loss: 2.8936 - acc: 0.1289
11314/11314 [==============================] - 1s 85us/sample - loss: 4.2010 - acc: 0.1180
7532/7532 [==============================] - 0s 30us/sample - loss: 2.9047 - acc: 0.1134
11314/11314 [==============================] - 1s 89us/sample - loss: 4.1434 - acc: 0.1194
7532/7532 [==============================] - 0s 37us/sample - loss: 2.8845 - acc: 0.1223
11314/11314 [==============================] - 1s 91us/sample - loss: 4.2157 - acc: 0.1201
7532/7532 [==============================] - 0s 33us/sample - loss: 2.8864 - acc: 0.1158
11314/11314 [==============================] - 1s 88us/sample - loss: 4.1079 - acc: 0.1186
7532/7532 [==============================] - 0s 32us/sample - loss: 2.8827 - acc: 0.1211
11314/11314 [==============================] - 1s 88us/sample - loss: 4.1153 - acc: 0.1191
7532/7532 [==============================] - 0s 33us/sample - loss: 2.8821 - acc: 0.1271
11314/11314 [==============================] - 1s 88us/sample - loss: 4.0792 - acc: 0.1218
7532/7532 [==============================] - 0s 32us/sample - loss: 2.8949 - acc: 0.1194
11314/11314 [==============================] - 1s 87us/sample - loss: 4.2039 - acc: 0.1184
7532/7532 [==============================] - 0s 32us/sample - loss: 2.8879 - acc: 0.1176
11314/11314 [==============================] - 1s 86us/sample - loss: 4.1161 - acc: 0.1199
7532/7532 [==============================] - 0s 32us/sample - loss: 2.8824 - acc: 0.1231
11314/11314 [==============================] - 1s 85us/sample - loss: 4.1848 - acc: 0.1149
7532/7532 [==============================] - 0s 32us/sample - loss: 2.8897 - acc: 0.1160
11314/11314 [==============================] - 1s 86us/sample - loss: 4.1535 - acc: 0.1202
7532/7532 [==============================] - 0s 32us/sample - loss: 2.8798 - acc: 0.1191
11314/11314 [==============================] - 1s 87us/sample - loss: 4.1546 - acc: 0.1185
7532/7532 [==============================] - 0s 32us/sample - loss: 2.8926 - acc: 0.1172
11314/11314 [==============================] - 1s 87us/sample - loss: 4.1716 - acc: 0.1153
7532/7532 [==============================] - 0s 31us/sample - loss: 2.8978 - acc: 0.1256
11314/11314 [==============================] - 1s 86us/sample - loss: 4.1723 - acc: 0.1190
7532/7532 [==============================] - 0s 31us/sample - loss: 2.8822 - acc: 0.1297
11314/11314 [==============================] - 1s 87us/sample - loss: 4.1409 - acc: 0.1168
7532/7532 [==============================] - 0s 32us/sample - loss: 2.9241 - acc: 0.1121
11314/11314 [==============================] - 1s 86us/sample - loss: 4.1315 - acc: 0.1187
7532/7532 [==============================] - 0s 31us/sample - loss: 2.8815 - acc: 0.1297
11314/11314 [==============================] - 1s 87us/sample - loss: 4.1322 - acc: 0.1165
7532/7532 [==============================] - 0s 32us/sample - loss: 2.8903 - acc: 0.1196
11314/11314 [==============================] - 1s 87us/sample - loss: 4.1582 - acc: 0.1143
7532/7532 [==============================] - 0s 31us/sample - loss: 2.8909 - acc: 0.1188
11314/11314 [==============================] - 1s 87us/sample - loss: 4.1343 - acc: 0.1213
7532/7532 [==============================] - 0s 32us/sample - loss: 2.8869 - acc: 0.1143
11314/11314 [==============================] - 1s 84us/sample - loss: 4.1945 - acc: 0.1152
7532/7532 [==============================] - 0s 31us/sample - loss: 2.8911 - acc: 0.1195
11314/11314 [==============================] - 1s 87us/sample - loss: 4.1647 - acc: 0.1158
7532/7532 [==============================] - 0s 32us/sample - loss: 2.8923 - acc: 0.1123
11314/11314 [==============================] - 1s 86us/sample - loss: 4.0852 - acc: 0.1210
7532/7532 [==============================] - 0s 32us/sample - loss: 2.8876 - acc: 0.1150
11314/11314 [==============================] - 1s 87us/sample - loss: 4.0902 - acc: 0.1176
7532/7532 [==============================] - 0s 32us/sample - loss: 2.8845 - acc: 0.1140
11314/11314 [==============================] - 1s 84us/sample - loss: 4.1967 - acc: 0.1196
7532/7532 [==============================] - 0s 30us/sample - loss: 2.8817 - acc: 0.1251
Optimization Finished!
Final test loss: 2.881667776431926
Final test accuracy: 0.1250663846731186

In [16]:
run_experiment(tf.contrib.training.HParams(
    batch_size = 32,
    max_epochs = 100,
    max_features = 1000,
    learning_rate = 0.03,
    reg_param = 0.03,
    dropout_keep_prob = 0.9,
    use_dropout = True,
    use_early_stoppoing = True,
    early_stopping_patience = 3,
    use_l2_reg = False,
    layers = 3,
    seed = 42
), title="5) 3 Layers, Dropout, Early Stopping")


RUNNING EXPERIMENT: 5) 3 Layers, Dropout, Early Stopping
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input (InputLayer)           (None, 1000)              0         
_________________________________________________________________
dense-0 (Dense)              (None, 128)               128128    
_________________________________________________________________
dropout_6 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense-1 (Dense)              (None, 128)               16512     
_________________________________________________________________
dropout_7 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense-2 (Dense)              (None, 128)               16512     
_________________________________________________________________
dropout_8 (Dropout)          (None, 128)               0         
_________________________________________________________________
output (Dense)               (None, 20)                2580      
_________________________________________________________________
dropout_9 (Dropout)          (None, 20)                0         
=================================================================
Total params: 163,732
Trainable params: 163,732
Non-trainable params: 0
_________________________________________________________________
11314/11314 [==============================] - 1s 100us/sample - loss: 3.2222 - acc: 0.3730
7532/7532 [==============================] - 0s 38us/sample - loss: 1.3161 - acc: 0.5912
11314/11314 [==============================] - 1s 90us/sample - loss: 2.3678 - acc: 0.6583
7532/7532 [==============================] - 0s 33us/sample - loss: 1.2028 - acc: 0.6321
11314/11314 [==============================] - 1s 89us/sample - loss: 2.1317 - acc: 0.7370
7532/7532 [==============================] - 0s 29us/sample - loss: 1.2260 - acc: 0.6427
11314/11314 [==============================] - 1s 86us/sample - loss: 2.0109 - acc: 0.7753
7532/7532 [==============================] - 0s 32us/sample - loss: 1.4335 - acc: 0.6093
11314/11314 [==============================] - 1s 88us/sample - loss: 1.8735 - acc: 0.8074
7532/7532 [==============================] - 0s 32us/sample - loss: 1.4318 - acc: 0.6452
11314/11314 [==============================] - 1s 93us/sample - loss: 1.8594 - acc: 0.8327
7532/7532 [==============================] - 0s 34us/sample - loss: 1.5360 - acc: 0.6480
best epoch to stop is: 1 with loss: 1.202825215795986
Optimization Finished!
Final test loss: 1.5360263143520183
Final test accuracy: 0.6480350494384766

In [17]:
run_experiment(tf.contrib.training.HParams(
    batch_size = 32,
    max_epochs = 100,
    max_features = 1000,
    learning_rate = 0.03,
    reg_param = 0.03,
    dropout_keep_prob = 0.9,
    use_dropout = False,
    use_early_stoppoing = True,
    early_stopping_patience = 3,
    use_l2_reg = True,
    layers = 3,
    seed = 42
), title="6) 3 Layers, L2, Early Stopping")


RUNNING EXPERIMENT: 6) 3 Layers, L2, Early Stopping
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input (InputLayer)           (None, 1000)              0         
_________________________________________________________________
dense-0 (Dense)              (None, 128)               128128    
_________________________________________________________________
dense-1 (Dense)              (None, 128)               16512     
_________________________________________________________________
dense-2 (Dense)              (None, 128)               16512     
_________________________________________________________________
output (Dense)               (None, 20)                2580      
=================================================================
Total params: 163,732
Trainable params: 163,732
Non-trainable params: 0
_________________________________________________________________
11314/11314 [==============================] - 1s 103us/sample - loss: 4.8431 - acc: 0.0880
7532/7532 [==============================] - 0s 40us/sample - loss: 3.0186 - acc: 0.0898
11314/11314 [==============================] - 1s 89us/sample - loss: 2.9595 - acc: 0.1141
7532/7532 [==============================] - 0s 34us/sample - loss: 2.9658 - acc: 0.1198
11314/11314 [==============================] - 1s 92us/sample - loss: 2.9254 - acc: 0.1251
7532/7532 [==============================] - 0s 34us/sample - loss: 2.9245 - acc: 0.1172
11314/11314 [==============================] - 1s 90us/sample - loss: 2.9069 - acc: 0.1244
7532/7532 [==============================] - 0s 35us/sample - loss: 2.9098 - acc: 0.1265
11314/11314 [==============================] - 1s 89us/sample - loss: 2.8946 - acc: 0.1282
7532/7532 [==============================] - 0s 35us/sample - loss: 2.9117 - acc: 0.1263
11314/11314 [==============================] - 1s 89us/sample - loss: 2.8884 - acc: 0.1287
7532/7532 [==============================] - 0s 34us/sample - loss: 2.9029 - acc: 0.1350
11314/11314 [==============================] - 1s 88us/sample - loss: 2.8831 - acc: 0.1267
7532/7532 [==============================] - 0s 34us/sample - loss: 2.8940 - acc: 0.1304
11314/11314 [==============================] - 1s 89us/sample - loss: 2.8800 - acc: 0.1260
7532/7532 [==============================] - 0s 35us/sample - loss: 2.8883 - acc: 0.1336
11314/11314 [==============================] - 1s 91us/sample - loss: 2.8764 - acc: 0.1298
7532/7532 [==============================] - 0s 31us/sample - loss: 2.8897 - acc: 0.1293
11314/11314 [==============================] - 1s 89us/sample - loss: 2.8723 - acc: 0.1314
7532/7532 [==============================] - 0s 35us/sample - loss: 2.8964 - acc: 0.1264
11314/11314 [==============================] - 1s 89us/sample - loss: 2.8681 - acc: 0.1343
7532/7532 [==============================] - 0s 36us/sample - loss: 2.8869 - acc: 0.1281
11314/11314 [==============================] - 1s 89us/sample - loss: 2.8685 - acc: 0.1332
7532/7532 [==============================] - 0s 35us/sample - loss: 2.8889 - acc: 0.1225
best epoch to stop is: 10 with loss: 2.8868789524710436
Optimization Finished!
Final test loss: 2.8889461337285542
Final test accuracy: 0.1225438117980957

In [18]:
run_experiment(tf.contrib.training.HParams(
    batch_size = 32,
    max_epochs = 100,
    max_features = 1000,
    learning_rate = 0.03,
    reg_param = 0.03,
    dropout_keep_prob = 0.9,
    use_dropout = True,
    use_early_stoppoing = True,
    early_stopping_patience = 3,
    use_l2_reg = True,
    layers = 3,
    seed = 42
), title="7) 3 Layers, L2, Dropout, Early Stopping")


RUNNING EXPERIMENT: 7) 3 Layers, L2, Dropout, Early Stopping
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input (InputLayer)           (None, 1000)              0         
_________________________________________________________________
dense-0 (Dense)              (None, 128)               128128    
_________________________________________________________________
dropout_10 (Dropout)         (None, 128)               0         
_________________________________________________________________
dense-1 (Dense)              (None, 128)               16512     
_________________________________________________________________
dropout_11 (Dropout)         (None, 128)               0         
_________________________________________________________________
dense-2 (Dense)              (None, 128)               16512     
_________________________________________________________________
dropout_12 (Dropout)         (None, 128)               0         
_________________________________________________________________
output (Dense)               (None, 20)                2580      
_________________________________________________________________
dropout_13 (Dropout)         (None, 20)                0         
=================================================================
Total params: 163,732
Trainable params: 163,732
Non-trainable params: 0
_________________________________________________________________
11314/11314 [==============================] - 1s 117us/sample - loss: 6.0774 - acc: 0.0651
7532/7532 [==============================] - 0s 43us/sample - loss: 2.9988 - acc: 0.0522
11314/11314 [==============================] - 1s 99us/sample - loss: 4.1731 - acc: 0.0513
7532/7532 [==============================] - 0s 38us/sample - loss: 2.9916 - acc: 0.0522
11314/11314 [==============================] - 1s 102us/sample - loss: 4.1949 - acc: 0.0503
7532/7532 [==============================] - 0s 36us/sample - loss: 2.9909 - acc: 0.0522
11314/11314 [==============================] - 1s 100us/sample - loss: 4.1623 - acc: 0.0476
7532/7532 [==============================] - 0s 37us/sample - loss: 2.9907 - acc: 0.0524
11314/11314 [==============================] - 1s 100us/sample - loss: 4.2385 - acc: 0.0501
7532/7532 [==============================] - 0s 37us/sample - loss: 2.9905 - acc: 0.0526
11314/11314 [==============================] - 1s 99us/sample - loss: 4.1656 - acc: 0.0487
7532/7532 [==============================] - 0s 37us/sample - loss: 2.9904 - acc: 0.0528
11314/11314 [==============================] - 1s 100us/sample - loss: 4.2129 - acc: 0.0485
7532/7532 [==============================] - 0s 37us/sample - loss: 2.9903 - acc: 0.0526
11314/11314 [==============================] - 1s 101us/sample - loss: 4.2686 - acc: 0.0502
7532/7532 [==============================] - 0s 38us/sample - loss: 2.9903 - acc: 0.0528
11314/11314 [==============================] - 1s 100us/sample - loss: 4.3060 - acc: 0.0500
7532/7532 [==============================] - 0s 36us/sample - loss: 2.9903 - acc: 0.0523
11314/11314 [==============================] - 1s 100us/sample - loss: 4.2156 - acc: 0.0477
7532/7532 [==============================] - 0s 37us/sample - loss: 2.9903 - acc: 0.0526
11314/11314 [==============================] - 1s 98us/sample - loss: 4.2391 - acc: 0.0474
7532/7532 [==============================] - 0s 37us/sample - loss: 2.9903 - acc: 0.0530
11314/11314 [==============================] - 1s 100us/sample - loss: 4.2075 - acc: 0.0512
7532/7532 [==============================] - 0s 38us/sample - loss: 2.9903 - acc: 0.0530
11314/11314 [==============================] - 1s 100us/sample - loss: 4.1767 - acc: 0.0519
7532/7532 [==============================] - 0s 37us/sample - loss: 2.9903 - acc: 0.0528
11314/11314 [==============================] - 1s 101us/sample - loss: 4.1545 - acc: 0.0506
7532/7532 [==============================] - 0s 38us/sample - loss: 2.9903 - acc: 0.0523
11314/11314 [==============================] - 1s 98us/sample - loss: 4.2097 - acc: 0.0514
7532/7532 [==============================] - 0s 37us/sample - loss: 2.9903 - acc: 0.0528
11314/11314 [==============================] - 1s 101us/sample - loss: 4.2141 - acc: 0.0512
7532/7532 [==============================] - 0s 37us/sample - loss: 2.9903 - acc: 0.0528
11314/11314 [==============================] - 1s 101us/sample - loss: 4.1768 - acc: 0.0494
7532/7532 [==============================] - 0s 36us/sample - loss: 2.9903 - acc: 0.0528
11314/11314 [==============================] - 1s 100us/sample - loss: 4.1679 - acc: 0.0502
7532/7532 [==============================] - 0s 38us/sample - loss: 2.9903 - acc: 0.0527
11314/11314 [==============================] - 1s 102us/sample - loss: 4.2338 - acc: 0.0494
7532/7532 [==============================] - 0s 38us/sample - loss: 2.9903 - acc: 0.0528
best epoch to stop is: 16 with loss: 2.990277012168127
Optimization Finished!
Final test loss: 2.9902841645988834
Final test accuracy: 0.05284121260046959