Deep Learning - Feature Learning

Feature Engineering vs. Feature Learning

The First Deep Neural Network

Ivakhnenko and Lapa (1965): layers trained one at a time; the backpropagation algorithm did not exist yet.

Backpropagation of Errors (BP)

  • An early, incomplete version of BP in the 1960s
  • Linnainmaa 1970 (a master's thesis!):
    • The first modern version
    • Implemented in FORTRAN
    • Neural networks were not mentioned at the time
  • Rumelhart, Hinton, Williams (1985): first results for BP applied to neural networks
  • LeCun 1989 (Bell Labs): first "practical" application of BP to neural networks

Yann LeCun - LeNet

Y. LeCun, L. D. Jackel, B. Boser, J. S. Denker, H. P. Graf, I. Guyon, D. Henderson, R. E. Howard, and W. Hubbard. Handwritten digit recognition: Applications of neural net chips and automatic learning. IEEE Communications Magazine, pages 41-46, November 1989. Invited paper.

Cortes and Vapnik (1995): the rise of SVMs overshadows neural networks

The Vanishing Gradient Problem

  • In deep neural networks, weights were typically constrained to $[0,1]$ or $[-1,1]$.
  • Multiplying many small values (which is exactly what happens in BP across many layers) makes the propagated values vanish.
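
A minimal numerical sketch of this effect (illustrative values; the derivative of the sigmoid activation is at most 0.25, so the per-layer factors are small):

In [ ]:
from __future__ import print_function
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# The sigmoid derivative is sigma'(x) = sigma(x) * (1 - sigma(x)) <= 0.25.
# In BP, the gradient reaching an early layer contains one such factor
# per layer, so it shrinks roughly geometrically with depth.
x = 0.5                   # an arbitrary pre-activation value
s = sigmoid(x)
factor = s * (1.0 - s)    # about 0.235

for depth in [1, 5, 10, 20, 50]:
    print(depth, factor ** depth)
# The product drops below 1e-6 after ~10 layers and below 1e-30 after ~50,
# which is why gradients in the first layers become effectively zero.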

Solutions:

  • New optimization algorithms (Rprop, RMSprop)
  • ReLU
  • Dropout
  • Better weight initialization
  • LSTM (Long Short-Term Memory)
  • ...

GPUs: Increased computational power

ImageNet: Alex Krizhevsky, Ilya Sutskever, Geoffrey Hinton (2012)

Krizhevsky, A., Sutskever, I., and Hinton, G. E. ImageNet Classification with Deep Convolutional Neural Networks. NIPS 2012: Neural Information Processing Systems, Lake Tahoe, Nevada.

Model           Top-1 error   Top-5 error
Sparse Coding   47.1%         28.2%
SIFT+FVs        45.7%         25.7%
CNN             37.5%         17.0%

Convolutional Networks

MNIST - Feedforward Networks in Keras (Recap)


In [21]:
from __future__ import print_function
import numpy as np
np.random.seed(1337)  # for reproducibility

from keras.datasets import mnist
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.optimizers import SGD, Adam, RMSprop
from keras.utils import np_utils


batch_size = 128
nb_classes = 10
nb_epoch = 20

# the data, shuffled and split between train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()

X_train = X_train.reshape(60000, 784)
X_test = X_test.reshape(10000, 784)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)

model = Sequential()
model.add(Dense(512, input_shape=(784,)))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(10))
model.add(Activation('softmax'))

rms = RMSprop()
model.compile(loss='categorical_crossentropy', optimizer=rms)

model.fit(X_train, Y_train,
          batch_size=batch_size, nb_epoch=nb_epoch,
          show_accuracy=True, verbose=2,
          validation_data=(X_test, Y_test))
score = model.evaluate(X_test, Y_test,
                       show_accuracy=True, verbose=0)
print('Test score:', score[0])
print('Test accuracy:', score[1])


60000 train samples
10000 test samples
Train on 60000 samples, validate on 10000 samples
Epoch 1/20
4s - loss: 0.2745 - acc: 0.9164 - val_loss: 0.1153 - val_acc: 0.9645
Epoch 2/20
4s - loss: 0.1135 - acc: 0.9659 - val_loss: 0.0928 - val_acc: 0.9708
Epoch 3/20
4s - loss: 0.0796 - acc: 0.9749 - val_loss: 0.0864 - val_acc: 0.9748
Epoch 4/20
4s - loss: 0.0631 - acc: 0.9802 - val_loss: 0.0650 - val_acc: 0.9796
Epoch 5/20
4s - loss: 0.0501 - acc: 0.9844 - val_loss: 0.0736 - val_acc: 0.9777
Epoch 6/20
4s - loss: 0.0411 - acc: 0.9868 - val_loss: 0.0585 - val_acc: 0.9827
Epoch 7/20
4s - loss: 0.0339 - acc: 0.9889 - val_loss: 0.0663 - val_acc: 0.9820
Epoch 8/20
4s - loss: 0.0302 - acc: 0.9902 - val_loss: 0.0626 - val_acc: 0.9823
Epoch 9/20
4s - loss: 0.0260 - acc: 0.9918 - val_loss: 0.0653 - val_acc: 0.9822
Epoch 10/20
4s - loss: 0.0211 - acc: 0.9935 - val_loss: 0.0620 - val_acc: 0.9820
Epoch 11/20
4s - loss: 0.0204 - acc: 0.9934 - val_loss: 0.0568 - val_acc: 0.9847
Epoch 12/20
4s - loss: 0.0177 - acc: 0.9944 - val_loss: 0.0727 - val_acc: 0.9819
Epoch 13/20
4s - loss: 0.0163 - acc: 0.9948 - val_loss: 0.0562 - val_acc: 0.9853
Epoch 14/20
4s - loss: 0.0126 - acc: 0.9959 - val_loss: 0.0620 - val_acc: 0.9851
Epoch 15/20
4s - loss: 0.0125 - acc: 0.9957 - val_loss: 0.0618 - val_acc: 0.9849
Epoch 16/20
4s - loss: 0.0131 - acc: 0.9956 - val_loss: 0.0680 - val_acc: 0.9832
Epoch 17/20
4s - loss: 0.0098 - acc: 0.9965 - val_loss: 0.0682 - val_acc: 0.9849
Epoch 18/20
4s - loss: 0.0098 - acc: 0.9964 - val_loss: 0.0664 - val_acc: 0.9854
Epoch 19/20
4s - loss: 0.0092 - acc: 0.9970 - val_loss: 0.0637 - val_acc: 0.9834
Epoch 20/20
4s - loss: 0.0090 - acc: 0.9971 - val_loss: 0.0685 - val_acc: 0.9855
Test score: 0.0684883500228
Test accuracy: 0.9855

Convolutions in Neural Networks (1D)

Convolutional Layer in Keras (1D)


In [11]:
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation, Flatten
from keras.layers.embeddings import Embedding
from keras.layers.convolutional import Convolution1D, MaxPooling1D

model = Sequential()
# Map word indices (vocabulary of 5000) into 100-dimensional vectors,
# for input sequences padded/truncated to length 100.
model.add(Embedding(5000, 100, input_length=100))
model.add(Dropout(0.25))

# 250 filters, each spanning 3 consecutive embedding vectors (stride 1).
model.add(Convolution1D(nb_filter=250,
                        filter_length=3,
                        border_mode='valid',
                        activation='relu',
                        subsample_length=1))
# Max pooling over pairs of neighbouring positions halves the sequence length.
model.add(MaxPooling1D(pool_length=2))

In [ ]:
from __future__ import print_function
import numpy as np
np.random.seed(1337)  # for reproducibility

from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation, Flatten
from keras.layers.embeddings import Embedding
from keras.layers.convolutional import Convolution1D, MaxPooling1D
from keras.datasets import imdb


# set parameters:
max_features = 5000
maxlen = 100
batch_size = 32
embedding_dims = 100
nb_filter = 250
filter_length = 3
hidden_dims = 250
nb_epoch = 2

print('Loading data...')
(X_train, y_train), (X_test, y_test) = imdb.load_data(nb_words=max_features,
                                                      test_split=0.2)
print(len(X_train), 'train sequences')
print(len(X_test), 'test sequences')

print('Pad sequences (samples x time)')
X_train = sequence.pad_sequences(X_train, maxlen=maxlen)
X_test = sequence.pad_sequences(X_test, maxlen=maxlen)
print('X_train shape:', X_train.shape)
print('X_test shape:', X_test.shape)

print('Build model...')
model = Sequential()

# we start off with an efficient embedding layer which maps
# our vocab indices into embedding_dims dimensions
model.add(Embedding(max_features, embedding_dims, input_length=maxlen))
model.add(Dropout(0.25))

# we add a Convolution1D, which will learn nb_filter
# word group filters of size filter_length:
model.add(Convolution1D(nb_filter=nb_filter,
                        filter_length=filter_length,
                        border_mode='valid',
                        activation='relu',
                        subsample_length=1))
# we use standard max pooling (halving the output of the previous layer):
model.add(MaxPooling1D(pool_length=2))

# We flatten the output of the conv layer,
# so that we can add a vanilla dense layer:
model.add(Flatten())

# We add a vanilla hidden layer:
model.add(Dense(hidden_dims))
model.add(Dropout(0.25))
model.add(Activation('relu'))

# We project onto a single unit output layer, and squash it with a sigmoid:
model.add(Dense(1))
model.add(Activation('sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              class_mode='binary')
model.fit(X_train, y_train, batch_size=batch_size,
          nb_epoch=nb_epoch, show_accuracy=True,
          validation_data=(X_test, y_test))

Convolutions in Image Processing

2D Convolutions, Formally

$$ \left[\begin{array}{ccc} a & b & c\\ d & e & f\\ g & h & i\\ \end{array}\right] * \left[\begin{array}{ccc} 1 & 2 & 3\\ 4 & 5 & 6\\ 7 & 8 & 9\\ \end{array}\right] =\\ (1 \cdot i)+(2\cdot h)+(3\cdot g)+(4 \cdot f)+(5\cdot e)\\+(6\cdot d)+(7\cdot c)+(8\cdot b)+(9\cdot a) $$

More: https://en.wikipedia.org/wiki/Kernel_(image_processing)
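
A small sanity check of the formula above, sketched with scipy.signal.convolve2d (the values placed in the letter matrix are arbitrary):

In [ ]:
from __future__ import print_function
import numpy as np
from scipy.signal import convolve2d

# The "image" patch [[a, b, c], [d, e, f], [g, h, i]] with arbitrary values:
img = np.array([[1., 2., 0.],
                [4., 5., 6.],
                [7., 8., 9.]])
# The kernel from the formula above:
kernel = np.array([[1., 2., 3.],
                   [4., 5., 6.],
                   [7., 8., 9.]])

# 'valid' mode keeps only positions where the (flipped) kernel fits
# entirely inside the image - here a single number.
out = convolve2d(img, kernel, mode='valid')

# Convolution flips the kernel, so the result equals
# 1*i + 2*h + 3*g + 4*f + 5*e + 6*d + 7*c + 8*b + 9*a.
a, b, c, d, e, f, g, h, i = img.ravel()
manual = 1*i + 2*h + 3*g + 4*f + 5*e + 6*d + 7*c + 8*b + 9*a
print(out[0, 0], manual)  # both print the same value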

Convolutions in Neural Networks (2D)

Structure of a Convolutional Unit
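
A rough sketch (plain NumPy, illustrative sizes) of what a single convolutional unit computes: one small set of weights is slid over the input, a bias is added at each position, and the result passes through a nonlinearity such as ReLU:

In [ ]:
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def conv_unit(image, W, b):
    """One convolutional feature map: the same 3x3 weights W and bias b
    are applied at every spatial position (valid padding, stride 1)."""
    H, Wd = image.shape
    out = np.zeros((H - 2, Wd - 2))
    for i in range(H - 2):
        for j in range(Wd - 2):
            patch = image[i:i+3, j:j+3]
            out[i, j] = relu(np.sum(patch * W) + b)
    return out

image = np.random.rand(28, 28)       # e.g. a single MNIST image
W = np.random.randn(3, 3) * 0.1      # shared weights of one filter
b = 0.0
print(conv_unit(image, W, b).shape)  # (26, 26) - one feature map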

ImageNet: Alex Krizhevsky, Ilya Sutskever, Geoffrey Hinton (2012)

Krizhevsky, A., Sutskever, I., and Hinton, G. E. ImageNet Classification with Deep Convolutional Neural Networks. NIPS 2012: Neural Information Processing Systems, Lake Tahoe, Nevada.

Model           Top-1 error   Top-5 error
Sparse Coding   47.1%         28.2%
SIFT+FVs        45.7%         25.7%
CNN             37.5%         17.0%

Convolutional Layer and Max-Pooling in Keras (2D)


In [9]:
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation, Flatten
from keras.layers.convolutional import Convolution2D, MaxPooling2D

model = Sequential()
# 32 filters of size 3x3 over a single-channel 28x28 image (channels first).
model.add(Convolution2D(32, 3, 3, border_mode='valid', input_shape=(1, 28, 28)))
model.add(Activation('relu'))
# 2x2 max pooling halves both spatial dimensions.
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

# Flatten the feature maps and classify with a small fully connected network.
model.add(Flatten())
model.add(Dense(128))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(10))
model.add(Activation('softmax'))

MNIST - Convolutional Networks in Keras


In [22]:
from __future__ import print_function
import numpy as np
np.random.seed(1337)  # for reproducibility

from keras.datasets import mnist
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation, Flatten
from keras.layers.convolutional import Convolution2D, MaxPooling2D
from keras.utils import np_utils

batch_size = 128
nb_classes = 10
nb_epoch = 12

# input image dimensions
img_rows, img_cols = 28, 28
# number of convolutional filters to use
nb_filters = 32
# size of pooling area for max pooling
nb_pool = 2
# convolution kernel size
nb_conv = 3

# the data, shuffled and split between train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()

X_train = X_train.reshape(X_train.shape[0], 1, img_rows, img_cols)
X_test = X_test.reshape(X_test.shape[0], 1, img_rows, img_cols)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
print('X_train shape:', X_train.shape)
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)

model = Sequential()

model.add(Convolution2D(nb_filters, nb_conv, nb_conv,
                        border_mode='valid',
                        input_shape=(1, img_rows, img_cols)))
model.add(Activation('relu'))
model.add(Convolution2D(nb_filters, nb_conv, nb_conv))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(nb_pool, nb_pool)))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(128))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy', optimizer='Adadelta')

model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=nb_epoch,
          show_accuracy=True, verbose=1, validation_data=(X_test, Y_test))
score = model.evaluate(X_test, Y_test, show_accuracy=True, verbose=0)
print('Test score:', score[0])
print('Test accuracy:', score[1])


X_train shape: (60000, 1, 28, 28)
60000 train samples
10000 test samples
Train on 60000 samples, validate on 10000 samples
Epoch 1/12
60000/60000 [==============================] - 9s - loss: 0.2332 - acc: 0.9285 - val_loss: 0.0558 - val_acc: 0.9832
Epoch 2/12
60000/60000 [==============================] - 8s - loss: 0.0871 - acc: 0.9738 - val_loss: 0.0408 - val_acc: 0.9860
Epoch 3/12
60000/60000 [==============================] - 9s - loss: 0.0675 - acc: 0.9802 - val_loss: 0.0329 - val_acc: 0.9890
Epoch 4/12
60000/60000 [==============================] - 9s - loss: 0.0537 - acc: 0.9835 - val_loss: 0.0316 - val_acc: 0.9892
Epoch 5/12
60000/60000 [==============================] - 9s - loss: 0.0483 - acc: 0.9851 - val_loss: 0.0313 - val_acc: 0.9895
Epoch 6/12
60000/60000 [==============================] - 9s - loss: 0.0406 - acc: 0.9870 - val_loss: 0.0330 - val_acc: 0.9886
Epoch 7/12
60000/60000 [==============================] - 9s - loss: 0.0388 - acc: 0.9880 - val_loss: 0.0282 - val_acc: 0.9913
Epoch 8/12
60000/60000 [==============================] - 9s - loss: 0.0344 - acc: 0.9892 - val_loss: 0.0282 - val_acc: 0.9905
Epoch 9/12
60000/60000 [==============================] - 8s - loss: 0.0288 - acc: 0.9911 - val_loss: 0.0264 - val_acc: 0.9924
Epoch 10/12
60000/60000 [==============================] - 9s - loss: 0.0290 - acc: 0.9910 - val_loss: 0.0268 - val_acc: 0.9920
Epoch 11/12
60000/60000 [==============================] - 8s - loss: 0.0281 - acc: 0.9910 - val_loss: 0.0264 - val_acc: 0.9923
Epoch 12/12
60000/60000 [==============================] - 9s - loss: 0.0249 - acc: 0.9915 - val_loss: 0.0324 - val_acc: 0.9917
Test score: 0.0323923342442
Test accuracy: 0.9917

RNNs and LSTMs - Sequence Processing

Long-Distance Dependencies

Text Generation (character by character)

A "Shakespeare" Generator

PANDARUS:
Alas, I think he shall be come approached and the day
When little srain would be attain'd into being never fed,
And who is but a chain and subjects of his death,
I should not sleep.

Second Senator:
They are away this miseries, produced upon my soul,
Breaking and strongly should be buried, when I perish
The earth and thoughts of many states.

DUKE VINCENTIO:
Well, your wit is in the care of side and that.

Second Lord:
They would be ruled after this chamber, and
my fair nues begun out of the fact, to be conveyed,
Whose noble souls I'll have the heart of the wars.
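
Samples like the one above come from a character-level language model. A minimal sketch of such a model in the same (older) Keras API used in this notebook, assuming the text has already been cut into fixed-length windows of one-hot encoded characters (the window length and vocabulary size below are illustrative):

In [ ]:
from keras.models import Sequential
from keras.layers.core import Dense, Activation, Dropout
from keras.layers.recurrent import LSTM

maxlen = 40     # length of each input character window (assumed)
n_chars = 60    # size of the character vocabulary (assumed)

# Input: one-hot encoded windows of maxlen characters;
# target: the one-hot encoded next character.
model = Sequential()
model.add(LSTM(128, input_shape=(maxlen, n_chars)))
model.add(Dropout(0.2))
model.add(Dense(n_chars))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')

# Training (X of shape (n_samples, maxlen, n_chars), y of shape (n_samples, n_chars)):
# model.fit(X, y, batch_size=128, nb_epoch=20)
# Text is then generated by repeatedly sampling the next character from
# model.predict and appending it to the current window.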

Neural Machine Translation

NMT

Caption Generation

Network Diagram


In [15]:
from IPython.display import YouTubeVideo
YouTubeVideo("8BFzu9m52sc")


Out[15]: