Chapter 13: Neural Networks

13.1 Building, Compiling, and Running Expressions with Theano

13.1.1 What Is Theano?

13.1.2 First Steps with Theano

$ pip install Theano

In [1]:
import theano
from theano import tensor as T

# Initialization: the scalar method creates a scalar (a simple array)
x1 = T.scalar()
w1 = T.scalar()
w0 = T.scalar()
z1 = w1 * x1 + w0

# Compile
net_input = theano.function(inputs=[w1, x1, w0], outputs=z1)

# Execute
net_input(2.0, 1.0, 0.5)


Out[1]:
array(2.5)

13.1.3 Configuring Theano


In [2]:
# Check the default floating-point type
print(theano.config.floatX)

# Set the floating-point type to float32
# (required when computing on the GPU)
theano.config.floatX = 'float32'

# Check whether computation runs on the CPU or the GPU
print(theano.config.device)


float64
cpu
  • Settings can also be changed via an environment variable:
export THEANO_FLAGS=floatX=float32
  • To run a script on the CPU:
THEANO_FLAGS=device=cpu,floatX=float64 python <python script>
  • To run a script on the GPU:
THEANO_FLAGS=device=gpu,floatX=float32 python <python script>
  • Default settings can also be placed in a ~/.theanorc file:
[global]
floatX=float32
device=gpu

13.1.4 Working with Array Structures


In [3]:
import numpy as np

# Initialization
# If Theano runs in 64-bit mode (floatX=float64), use dmatrix instead of fmatrix
x = T.fmatrix(name='x')
x_sum = T.sum(x, axis=0)

# Compile
calc_sum = theano.function(inputs=[x], outputs=x_sum)

# Execute (with a Python list)
ary = [[1, 2, 3], [1, 2, 3]]
print('Column sum:', calc_sum(ary))

# Execute (with a NumPy array)
ary = np.array([[1, 2, 3], [1, 2, 3]], dtype=theano.config.floatX)
print('Column sum:', calc_sum(ary))


Column sum: [ 2.  4.  6.]
Column sum: [ 2.  4.  6.]
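Reductions along the other axis work the same way. A minimal sketch reusing the compile-then-call pattern above (the name calc_rowsum is ours):

x_rowsum = T.sum(x, axis=1)
calc_rowsum = theano.function(inputs=[x], outputs=x_rowsum)
print('Row sum:', calc_rowsum(ary))  # expected: [ 6.  6.]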

In [4]:
import theano
from theano import tensor as T

# Initialization
x = T.fmatrix('x')
w = theano.shared(np.asarray([[0.0, 0.0, 0.0]], dtype=theano.config.floatX))
z = x.dot(w.T)
update = [[w, w + 1.0]]

# Compile
net_input = theano.function(inputs=[x], updates=update, outputs=z)

# Execute
data = np.array([[1, 2, 3]], dtype=theano.config.floatX)
for i in range(5):
    print('z{}:'.format(i), net_input(data))


z0: [[ 0.]]
z1: [[ 6.]]
z2: [[ 12.]]
z3: [[ 18.]]
z4: [[ 24.]]
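The current contents of a shared variable can be read back with get_value(). A quick check (a sketch, assuming it runs right after the loop above):

# After five updates of w <- w + 1.0, every weight should be 5.0
print(w.get_value())  # expected: [[ 5.  5.  5.]]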

In [5]:
import theano
from theano import tensor as T

# Initialization
data = np.array([[1, 2, 3]], dtype=theano.config.floatX)
x = T.fmatrix('x')
w = theano.shared(np.asarray([[0.0, 0.0, 0.0]], dtype=theano.config.floatX))
z = x.dot(w.T)
update = [[w, w + 1.0]]

# Compile
net_input = theano.function(inputs=[], updates=update, givens={x: data}, outputs=z)

# Execute
for i in range(5):
    print('z{}:'.format(i), net_input())


z0: [[ 0.]]
z1: [[ 6.]]
z2: [[ 12.]]
z3: [[ 18.]]
z4: [[ 24.]]
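The point of givens over inputs: the data is bound into the compiled graph once, so it does not have to be transferred (for example from CPU to GPU memory) on every function call.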

13.1.5 A Linear Regression Example


In [6]:
import numpy as np
# Create a one-dimensional dataset with ten training samples
X_train = np.asarray([[0.0], [1.0], [2.0], [3.0], [4.0],
                      [5.0], [6.0], [7.0], [8.0], [9.0]], 
                     dtype=theano.config.floatX)

y_train = np.asarray([1.0, 1.3, 3.1, 2.0, 5.0, 
                      6.3, 6.6, 7.4, 8.0, 9.0], 
                     dtype=theano.config.floatX)

In [7]:
import theano
from theano import tensor as T
import numpy as np

def training_linreg(X_train, y_train, eta, epochs):
    costs = []
    # Initialize arrays
    eta0 = T.fscalar('eta0') # float32 scalar instance
    y = T.fvector(name='y')  # float32 vector instance
    X = T.fmatrix(name='X')  # float32 matrix instance
    # Create the weights w as a shared variable the function can reference
    w = theano.shared(np.zeros(shape=(X_train.shape[1] + 1), dtype=theano.config.floatX), name='w')

    # Compute the cost
    net_input = T.dot(X, w[1:]) + w[0] # net input computed from the weights
    errors = y - net_input             # error between y and the net input
    cost = T.sum(T.pow(errors, 2))     # sum of squared errors

    # Update the weights
    gradient = T.grad(cost, wrt=w)      # gradient of the cost
    update = [(w, w - eta0 * gradient)] # scale the gradient by the learning rate and update w

    # Compile the model
    train = theano.function(inputs=[eta0], outputs=cost, updates=update, givens={X: X_train, y: y_train})
    
    for _ in range(epochs):
        costs.append(train(eta))
        
    return costs, w

In [8]:
import matplotlib.pyplot as plt
costs, w = training_linreg(X_train, y_train, eta=0.001, epochs=10)

plt.plot(range(1, len(costs) + 1), costs)
plt.xlabel('Epochs')
plt.ylabel('Cost')
plt.tight_layout()
plt.show()



In [9]:
# Predict from the input features
def predict_linreg(X, w):
    Xt = T.matrix(name='X')
    net_input = T.dot(Xt, w[1:]) + w[0]
    # w is a shared variable, so it is supplied via givens rather than as an input
    predict = theano.function(inputs=[Xt], givens={w: w}, outputs=net_input)
    
    return predict(X)

In [10]:
import matplotlib.pyplot as plt
plt.scatter(X_train, y_train, marker='s', s=50)
plt.plot(range(X_train.shape[0]), predict_linreg(X_train, w), color='gray', marker='o', markersize=4, linewidth=3)
plt.xlabel('x')
plt.ylabel('y')
plt.show()


13.2 Choosing Activation Functions for Feedforward Neural Networks

13.2.1 Logistic Function Recap


In [11]:
import numpy as np

X = np.array([[1, 1.4, 1.5]])
w = np.array([0.0, 0.2, 0.4])

def net_input(X, w):
    z = X.dot(w)
    return z

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_activation(X, w):
    z = net_input(X, w)
    return logistic(z)
print('P(y=1|x) = {:.3f}'.format(logistic_activation(X, w)[0]))


P(y=1|x) = 0.707

In [12]:
# W : array, shape = [n_output_units, n_hidden_units+1]
# Weight matrix: hidden layer -> output layer
# Note that the first column (W[:, 0]) holds the bias units
W = np.array([[1.1, 1.2, 1.3, 0.5],
              [0.1, 0.2, 0.4, 0.1],
              [0.2, 0.5, 2.1, 1.9]])

# A : array, shape = [n_hidden+1, n_samples]
# Activation of the hidden layer
# Note that the first element (A[0][0] = 1) is the bias unit

A = np.array([[1.0], 
              [0.1], 
              [0.3], 
              [0.7]])

# Z : array, shape = [n_output_units, n_samples]
# Net input of the output layer

Z = W.dot(A) 
y_probas = logistic(Z)
print('Probabilities:\n', y_probas)


Probabilities:
 [[ 0.87653295]
 [ 0.57688526]
 [ 0.90114393]]

In [13]:
y_class = np.argmax(Z, axis=0)
print('predicted class label: %d' % y_class[0])


predicted class label: 2
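Note that these logistic activations do not sum to 1, so they cannot be read directly as class probabilities. A quick check (a sketch reusing y_probas from In [12]):

# The logistic outputs form no probability distribution over the classes
print(y_probas.sum())  # roughly 2.35 for the values above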

13.2.2 Estimating Class Probabilities in Multiclass Classification via the Softmax Function


In [14]:
def softmax(z):
    return np.exp(z) / np.sum(np.exp(z))

def softmax_activation(X, w):
    z = net_input(X, w)
    return softmax(z)

y_probas = softmax(Z)
print(y_probas)
print(y_probas.sum())


[[ 0.40386493]
 [ 0.07756222]
 [ 0.51857284]]
1.0

In [15]:
y_class = np.argmax(Z, axis=0)
y_class[0]


Out[15]:
2

13.2.3 Broadening the Output Spectrum Using a Hyperbolic Tangent

  • Hyperbolic tangent (tanh); compared with the logistic function in the sketch below
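A minimal comparison sketch of the two activation functions (the manual tanh definition and the plotting choices are ours; np.tanh would work equally well). tanh maps the net input to (-1, 1), while the logistic function maps it to (0, 1):

import numpy as np
import matplotlib.pyplot as plt

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    e_p = np.exp(z)
    e_m = np.exp(-z)
    return (e_p - e_m) / (e_p + e_m)

z = np.arange(-5, 5, 0.005)
plt.plot(z, tanh(z), linewidth=2, label='tanh')
plt.plot(z, logistic(z), linewidth=2, label='logistic')
plt.axhline(1, color='black', linestyle='--')
plt.axhline(0, color='black', linestyle='--')
plt.axhline(-1, color='black', linestyle='--')
plt.xlabel('net input $z$')
plt.ylabel(r'activation $\phi(z)$')
plt.legend(loc='lower right')
plt.tight_layout()
plt.show()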

13.3 Training Neural Networks Efficiently with Keras

$ pip install Keras
  • MNIST dataset: http://yann.lecun.com/exdb/mnist/ (unzip the archives after downloading; see the note after this list)
    • train-images-idx3-ubyte.gz: training set images (9,912,422 bytes)
    • train-labels-idx1-ubyte.gz: training set labels (28,881 bytes)
    • t10k-images-idx3-ubyte.gz: test set images (1,648,877 bytes)
    • t10k-labels-idx1-ubyte.gz: test set labels (4,542 bytes)
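load_mnist below reads the uncompressed files, so unzip the four archives after downloading them into the local mnist/ directory used in In [17]:

$ gunzip mnist/*ubyte.gz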

In [16]:
import os
import struct
import numpy as np
 
def load_mnist(path, kind='train'):
    """Load MNIST data from `path`"""
    labels_path = os.path.join(path, 
                               '%s-labels-idx1-ubyte' % kind)
    images_path = os.path.join(path, 
                               '%s-images-idx3-ubyte' % kind)
        
    with open(labels_path, 'rb') as lbpath:
        # Header: magic number and item count (big-endian unsigned ints)
        magic, n = struct.unpack('>II', 
                                 lbpath.read(8))
        labels = np.fromfile(lbpath, 
                             dtype=np.uint8)

    with open(images_path, 'rb') as imgpath:
        # Header: magic number, image count, rows, columns
        magic, num, rows, cols = struct.unpack(">IIII", 
                                               imgpath.read(16))
        # Flatten each 28x28 image into a 784-dimensional row vector
        images = np.fromfile(imgpath, 
                             dtype=np.uint8).reshape(len(labels), 784)
 
    return images, labels

In [17]:
X_train, y_train = load_mnist('mnist', kind='train')
print('Rows: %d, columns: %d' % (X_train.shape[0], X_train.shape[1]))


Rows: 60000, columns: 784

In [18]:
X_test, y_test = load_mnist('mnist', kind='t10k')
print('Rows: %d, columns: %d' % (X_test.shape[0], X_test.shape[1]))


Rows: 10000, columns: 784

In [19]:
import theano 

theano.config.floatX = 'float32'
X_train = X_train.astype(theano.config.floatX)
X_test = X_test.astype(theano.config.floatX)

In [20]:
from keras.utils import np_utils

print('First 3 labels: ', y_train[:3])

y_train_ohe = np_utils.to_categorical(y_train) 
print('\nFirst 3 labels (one-hot):\n', y_train_ohe[:3])


Using Theano backend.
First 3 labels:  [5 0 4]

First 3 labels (one-hot):
 [[ 0.  0.  0.  0.  0.  1.  0.  0.  0.  0.]
 [ 1.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  1.  0.  0.  0.  0.  0.]]

In [21]:
from keras.models import Sequential
from keras.layers.core import Dense
from keras.optimizers import SGD

np.random.seed(1) 

# Initialize the model
model = Sequential()

# Add the first hidden layer
model.add(Dense(input_dim=X_train.shape[1], # number of input units
                output_dim=50, # number of output units
                init='uniform', # initialize the weights from a uniform distribution
                activation='tanh')) # activation function (hyperbolic tangent)

# Add the second hidden layer
model.add(Dense(input_dim=50, 
                output_dim=50, 
                init='uniform', 
                activation='tanh'))

# Add the output layer
model.add(Dense(input_dim=50, 
                output_dim=y_train_ohe.shape[1], 
                init='uniform', 
                activation='softmax'))

# Configure the optimizer used when compiling the model
# SGD: stochastic gradient descent
# Arguments: learning rate, weight-decay constant, and momentum
sgd = SGD(lr=0.001, decay=1e-7, momentum=.9)
# Compile the model
model.compile(loss='categorical_crossentropy', # cost function
              optimizer=sgd, # optimizer
              metrics=['accuracy']) # model evaluation metric

In [22]:
model.fit(X_train, # training data
          y_train_ohe, # target labels (one-hot)
          nb_epoch=50, # number of epochs
          batch_size=300, # batch size
          verbose=1, # print progress during training
          validation_split=0.1) # fraction of data held out for validation


Train on 54000 samples, validate on 6000 samples
Epoch 1/50
54000/54000 [==============================] - 2s - loss: 2.2290 - acc: 0.3595 - val_loss: 2.1092 - val_acc: 0.5347
Epoch 2/50
54000/54000 [==============================] - 2s - loss: 1.8848 - acc: 0.5292 - val_loss: 1.6075 - val_acc: 0.5588
Epoch 3/50
54000/54000 [==============================] - 1s - loss: 1.3915 - acc: 0.5868 - val_loss: 1.1670 - val_acc: 0.6640
Epoch 4/50
54000/54000 [==============================] - 1s - loss: 1.0612 - acc: 0.6949 - val_loss: 0.9042 - val_acc: 0.7650
Epoch 5/50
54000/54000 [==============================] - 1s - loss: 0.8575 - acc: 0.7705 - val_loss: 0.7342 - val_acc: 0.8242
Epoch 6/50
54000/54000 [==============================] - 1s - loss: 0.7237 - acc: 0.8125 - val_loss: 0.6145 - val_acc: 0.8625
Epoch 7/50
54000/54000 [==============================] - 1s - loss: 0.6247 - acc: 0.8454 - val_loss: 0.5443 - val_acc: 0.8752
Epoch 8/50
54000/54000 [==============================] - 1s - loss: 0.5552 - acc: 0.8614 - val_loss: 0.4762 - val_acc: 0.8880
Epoch 9/50
54000/54000 [==============================] - 1s - loss: 0.5000 - acc: 0.8754 - val_loss: 0.4247 - val_acc: 0.8980
Epoch 10/50
54000/54000 [==============================] - 2s - loss: 0.4582 - acc: 0.8844 - val_loss: 0.3914 - val_acc: 0.9085
Epoch 11/50
54000/54000 [==============================] - 2s - loss: 0.4267 - acc: 0.8906 - val_loss: 0.3651 - val_acc: 0.9122
Epoch 12/50
54000/54000 [==============================] - 2s - loss: 0.3994 - acc: 0.8971 - val_loss: 0.3357 - val_acc: 0.9155
Epoch 13/50
54000/54000 [==============================] - 1s - loss: 0.3742 - acc: 0.9020 - val_loss: 0.3320 - val_acc: 0.9162
Epoch 14/50
54000/54000 [==============================] - 1s - loss: 0.3633 - acc: 0.9019 - val_loss: 0.3170 - val_acc: 0.9188
Epoch 15/50
54000/54000 [==============================] - 1s - loss: 0.3464 - acc: 0.9052 - val_loss: 0.2985 - val_acc: 0.9222
Epoch 16/50
54000/54000 [==============================] - 1s - loss: 0.3372 - acc: 0.9074 - val_loss: 0.2878 - val_acc: 0.9192
Epoch 17/50
54000/54000 [==============================] - 1s - loss: 0.3253 - acc: 0.9105 - val_loss: 0.2899 - val_acc: 0.9207
Epoch 18/50
54000/54000 [==============================] - 1s - loss: 0.3214 - acc: 0.9108 - val_loss: 0.2739 - val_acc: 0.9253
Epoch 19/50
54000/54000 [==============================] - 1s - loss: 0.3076 - acc: 0.9147 - val_loss: 0.2612 - val_acc: 0.9300
Epoch 20/50
54000/54000 [==============================] - 1s - loss: 0.2962 - acc: 0.9163 - val_loss: 0.2515 - val_acc: 0.9327
Epoch 21/50
54000/54000 [==============================] - 1s - loss: 0.2946 - acc: 0.9165 - val_loss: 0.2513 - val_acc: 0.9288
Epoch 22/50
54000/54000 [==============================] - 1s - loss: 0.2830 - acc: 0.9193 - val_loss: 0.2571 - val_acc: 0.9275
Epoch 23/50
54000/54000 [==============================] - 1s - loss: 0.2803 - acc: 0.9202 - val_loss: 0.2499 - val_acc: 0.9268
Epoch 24/50
54000/54000 [==============================] - 1s - loss: 0.2758 - acc: 0.9215 - val_loss: 0.2549 - val_acc: 0.9293
Epoch 25/50
54000/54000 [==============================] - 2s - loss: 0.2699 - acc: 0.9234 - val_loss: 0.2251 - val_acc: 0.9373
Epoch 26/50
54000/54000 [==============================] - 1s - loss: 0.2601 - acc: 0.9247 - val_loss: 0.2321 - val_acc: 0.9358
Epoch 27/50
54000/54000 [==============================] - 1s - loss: 0.2613 - acc: 0.9252 - val_loss: 0.2215 - val_acc: 0.9385
Epoch 28/50
54000/54000 [==============================] - 1s - loss: 0.2697 - acc: 0.9221 - val_loss: 0.2310 - val_acc: 0.9358
Epoch 29/50
54000/54000 [==============================] - 1s - loss: 0.2574 - acc: 0.9260 - val_loss: 0.2257 - val_acc: 0.9352
Epoch 30/50
54000/54000 [==============================] - 2s - loss: 0.2586 - acc: 0.9266 - val_loss: 0.2197 - val_acc: 0.9367
Epoch 31/50
54000/54000 [==============================] - 1s - loss: 0.2531 - acc: 0.9274 - val_loss: 0.2132 - val_acc: 0.9413
Epoch 32/50
54000/54000 [==============================] - 1s - loss: 0.2447 - acc: 0.9296 - val_loss: 0.2244 - val_acc: 0.9375
Epoch 33/50
54000/54000 [==============================] - 1s - loss: 0.2495 - acc: 0.9276 - val_loss: 0.2131 - val_acc: 0.9393
Epoch 34/50
54000/54000 [==============================] - 1s - loss: 0.2444 - acc: 0.9292 - val_loss: 0.2160 - val_acc: 0.9397
Epoch 35/50
54000/54000 [==============================] - 1s - loss: 0.2413 - acc: 0.9297 - val_loss: 0.2132 - val_acc: 0.9382
Epoch 36/50
54000/54000 [==============================] - 1s - loss: 0.2377 - acc: 0.9301 - val_loss: 0.2226 - val_acc: 0.9348
Epoch 37/50
54000/54000 [==============================] - 1s - loss: 0.2327 - acc: 0.9321 - val_loss: 0.2059 - val_acc: 0.9417
Epoch 38/50
54000/54000 [==============================] - 1s - loss: 0.2288 - acc: 0.9332 - val_loss: 0.2188 - val_acc: 0.9373
Epoch 39/50
54000/54000 [==============================] - 1s - loss: 0.2339 - acc: 0.9312 - val_loss: 0.2035 - val_acc: 0.9388
Epoch 40/50
54000/54000 [==============================] - 1s - loss: 0.2183 - acc: 0.9365 - val_loss: 0.1975 - val_acc: 0.9450
Epoch 41/50
54000/54000 [==============================] - 1s - loss: 0.2266 - acc: 0.9340 - val_loss: 0.2092 - val_acc: 0.9417
Epoch 42/50
54000/54000 [==============================] - 1s - loss: 0.2182 - acc: 0.9351 - val_loss: 0.1980 - val_acc: 0.9435
Epoch 43/50
54000/54000 [==============================] - 1s - loss: 0.2193 - acc: 0.9349 - val_loss: 0.1987 - val_acc: 0.9430
Epoch 44/50
54000/54000 [==============================] - 1s - loss: 0.2146 - acc: 0.9366 - val_loss: 0.1945 - val_acc: 0.9475
Epoch 45/50
54000/54000 [==============================] - 1s - loss: 0.2119 - acc: 0.9363 - val_loss: 0.1890 - val_acc: 0.9460
Epoch 46/50
54000/54000 [==============================] - 1s - loss: 0.2108 - acc: 0.9383 - val_loss: 0.1964 - val_acc: 0.9445
Epoch 47/50
54000/54000 [==============================] - 1s - loss: 0.2130 - acc: 0.9372 - val_loss: 0.1955 - val_acc: 0.9437
Epoch 48/50
54000/54000 [==============================] - 1s - loss: 0.2118 - acc: 0.9377 - val_loss: 0.1985 - val_acc: 0.9460
Epoch 49/50
54000/54000 [==============================] - 1s - loss: 0.2149 - acc: 0.9361 - val_loss: 0.1947 - val_acc: 0.9438
Epoch 50/50
54000/54000 [==============================] - 1s - loss: 0.2133 - acc: 0.9375 - val_loss: 0.1976 - val_acc: 0.9420
Out[22]:
<keras.callbacks.History at 0x117d35320>

In [23]:
y_train_pred = model.predict_classes(X_train, verbose=0)
print('First 3 predictions: ', y_train_pred[:3])


First 3 predictions:  [5 0 4]

In [24]:
train_acc = np.sum(y_train == y_train_pred, axis=0) / X_train.shape[0]
print('Training accuracy: %.2f%%' % (train_acc * 100))


Training accuracy: 93.81%

In [25]:
y_test_pred = model.predict_classes(X_test, verbose=0)
test_acc = np.sum(y_test == y_test_pred, axis=0) / X_test.shape[0]
print('Test accuracy: %.2f%%' % (test_acc * 100))


Test accuracy: 93.71%
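As a cross-check (a sketch, not from the original notebook), Keras can compute the same test metrics directly; model.evaluate returns the loss plus the metrics configured at compile time:

# One-hot encode the test labels, then let Keras compute loss and accuracy
y_test_ohe = np_utils.to_categorical(y_test)
loss, acc = model.evaluate(X_test, y_test_ohe, verbose=0)
print('Test accuracy: %.2f%%' % (acc * 100))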