Loading the required modules and the Titanic data

As in part-1, we load the data and build the training and test datasets.


In [1]:
import pandas
# note: in scikit-learn 0.18+ train_test_split moved to sklearn.model_selection
from sklearn.cross_validation import train_test_split
from sklearn.metrics import accuracy_score
from tensorflow.contrib import skflow

In [2]:
train = pandas.read_csv('data/titanic_train.csv')

In [3]:
# pick three numeric features; fillna(0) replaces missing values with 0
y, X = train['Survived'], train[['Age', 'SibSp', 'Fare']].fillna(0)
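
Before filling, it is worth checking which of these columns actually contain missing values; in the Kaggle Titanic training set it is typically only Age. A quick check, assuming the same train DataFrame (not part of the original notebook):

train[['Age', 'SibSp', 'Fare']].isnull().sum()  # Age should show the only nonzero count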

In [4]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

Titanic test using a deep neural network


In [5]:
classifier = skflow.TensorFlowDNNClassifier(
    hidden_units=[10, 20, 10],   # three hidden layers of 10, 20, and 10 units
    n_classes=2,                 # survived / did not survive
    batch_size=128,
    steps=500,
    learning_rate=0.05)

In [6]:
classifier.fit(X_train, y_train)


Step #100, epoch #16, avg. train loss: 0.75368
Step #200, epoch #33, avg. train loss: 0.61258
Step #300, epoch #50, avg. train loss: 0.60673
Step #400, epoch #66, avg. train loss: 0.60201
Step #500, epoch #83, avg. train loss: 0.60029
Out[6]:
TensorFlowDNNClassifier(batch_size=128, class_weight=None, clip_gradients=5.0,
            config=None, continue_training=False, dropout=None,
            hidden_units=[10, 20, 10], learning_rate=0.05, n_classes=2,
            optimizer='Adagrad', steps=500, verbose=1)

In [7]:
print(accuracy_score(y_test, classifier.predict(X_test)))


0.664804469274
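
For context, roughly 62% of passengers in the training data did not survive, so a majority-class baseline already scores around 0.62. The sketch below (not in the original notebook) computes that baseline with scikit-learn's DummyClassifier:

from sklearn.dummy import DummyClassifier

baseline = DummyClassifier(strategy='most_frequent')  # always predict the majority class
baseline.fit(X_train, y_train)
print(accuracy_score(y_test, baseline.predict(X_test)))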

Test using the tanh activation function


In [8]:
import tensorflow as tf

In [9]:
def dnn_tanh(X, y):
    # three hidden layers of 10, 20, 10 units, this time with tanh activations
    layers = skflow.ops.dnn(X, [10, 20, 10], tf.tanh)
    # logistic regression (softmax) on top of the last hidden layer
    return skflow.models.logistic_regression(layers, y)
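
The model_fn contract here is simple: the function receives the feature and target tensors and returns the prediction and loss ops. As a minimal sketch (a hypothetical variant, not in the original notebook), dropping the hidden layers entirely reduces the model to plain logistic regression:

def linear_model(X, y):
    # no hidden layers: logistic regression directly on the raw features
    return skflow.models.logistic_regression(X, y)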

In [10]:
classifier = skflow.TensorFlowEstimator(
    model_fn=dnn_tanh,
    n_classes=2,
    batch_size=128,
    steps=500,
    learning_rate=0.05)

In [11]:
classifier.fit(X_train, y_train)


Step #100, epoch #16, avg. train loss: 0.62736
Step #200, epoch #33, avg. train loss: 0.61053
Step #300, epoch #50, avg. train loss: 0.60340
Step #400, epoch #66, avg. train loss: 0.60201
Step #500, epoch #83, avg. train loss: 0.60182
Out[11]:
TensorFlowEstimator(batch_size=128, class_weight=None, clip_gradients=5.0,
          config=None, continue_training=False, learning_rate=0.05,
          model_fn=<function dnn_tanh at 0x1127b6f28>, n_classes=2,
          optimizer='Adagrad', steps=500, verbose=1)

In [12]:
print(accuracy_score(y_test, classifier.predict(X_test)))


0.698324022346

Digit recognition test


In [13]:
import random
from sklearn import datasets

In [14]:
random.seed(42)  # seeds Python's random module (note: does not seed TensorFlow)

In [15]:
digits = datasets.load_digits()

In [16]:
X = digits.images

In [17]:
y = digits.target

In [18]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
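
Unlike the flattened digits.data matrix, digits.images keeps each sample as an 8x8 array, which is what the convolutional model below expects. A quick sanity check (not in the original notebook):

print(X_train.shape, y_train.shape)  # roughly (1437, 8, 8) (1437,) for the 1797-sample dataset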

In [19]:
def conv_model(X, y):
    # add a channel dimension: (batch, 8, 8) -> (batch, 8, 8, 1)
    X = tf.expand_dims(X, 3)
    # 12 convolution filters of size 3x3, then global max pooling over height and width
    features = tf.reduce_max(skflow.ops.conv2d(X, 12, [3, 3]), [1, 2])
    # flatten to (batch, 12) for the classifier
    features = tf.reshape(features, [-1, 12])
    return skflow.models.logistic_regression(features, y)
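
The reduce_max over dimensions [1, 2] is a global max pooling: each of the 12 feature maps is collapsed to its single largest activation, so every image ends up as a 12-dimensional vector regardless of where the activation occurred. The same operation in plain NumPy, as an illustration only:

import numpy as np

fm = np.random.rand(4, 8, 8, 12)   # a batch of 4 images, 12 feature maps each
pooled = fm.max(axis=(1, 2))       # take the maximum over height and width
print(pooled.shape)                # (4, 12)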

In [20]:
classifier = skflow.TensorFlowEstimator(
    model_fn=conv_model,
    n_classes=10,        # digits 0-9
    batch_size=128,
    steps=500,
    learning_rate=0.05)

In [21]:
classifier.fit(X_train, y_train)


Step #100, epoch #8, avg. train loss: 2.66842
Step #200, epoch #16, avg. train loss: 1.42305
Step #300, epoch #25, avg. train loss: 1.07725
Step #400, epoch #33, avg. train loss: 0.88358
Step #500, epoch #41, avg. train loss: 0.74869
Out[21]:
TensorFlowEstimator(batch_size=128, class_weight=None, clip_gradients=5.0,
          config=None, continue_training=False, learning_rate=0.05,
          model_fn=<function conv_model at 0x113313ea0>, n_classes=10,
          optimizer='Adagrad', steps=500, verbose=1)

In [22]:
print(accuracy_score(y_test, classifier.predict(X_test)))


0.747222222222
