In [1]:
# %load /Users/facai/Study/book_notes/preconfig.py
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(color_codes=True)
sns.set(font='SimHei', font_scale=2.5)
plt.rcParams['axes.grid'] = False
#import numpy as np
#import pandas as pd
#pd.options.display.max_rows = 20
#import sklearn
#import itertools
#import logging
#logger = logging.getLogger()
#from IPython.display import SVG
def show_image(filename, figsize=None, res_dir=True):
    if figsize:
        plt.figure(figsize=figsize)
    if res_dir:
        filename = './res/{}'.format(filename)
    plt.imshow(plt.imread(filename))
In [2]:
show_image('fig10_2.png', figsize=(12, 5))
two major advantages of the unfolded graph:
the learned model has the same input size regardless of sequence length, and the same transition function $f$ with the same parameters is used at every time step. The unfolded view also makes computing gradients straightforward.
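A minimal sketch of this unfolding, with illustrative sizes: the same transition function $f$ (same $W$, $U$, $b$) is applied at every step, so sequences of any length reuse the same parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
n_hidden, n_input = 4, 3
W = rng.normal(size=(n_hidden, n_hidden))  # hidden-to-hidden (shared)
U = rng.normal(size=(n_hidden, n_input))   # input-to-hidden (shared)
b = np.zeros(n_hidden)

def f(h_prev, x):
    """One step of the shared transition function."""
    return np.tanh(W @ h_prev + U @ x + b)

def unfold(xs):
    """Apply f repeatedly; works for any sequence length tau."""
    h = np.zeros(n_hidden)
    states = []
    for x in xs:
        h = f(h, x)
        states.append(h)
    return states

# The same fixed-size model handles sequences of different lengths:
short = unfold(rng.normal(size=(2, n_input)))
long_ = unfold(rng.normal(size=(7, n_input)))
print(len(short), len(long_))  # 2 7
```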
In [6]:
# A.
show_image('fig10_3.png', figsize=(10, 8))
The total loss for a given sequence of $x$ values paired with a sequence of $y$ values is just the sum of the losses over all the time steps:
\begin{align} &L \left ( \{x^1, \cdots, x^\tau\}, \{y^1, \cdots, y^\tau\} \right ) \\ &= \sum_t L^t \\ &= - \sum_t \log p_{\text{model}} \left ( y^t \, | \, \{x^1, \cdots, x^t\} \right ) \\ \end{align}So the back-propagation algorithm needs $O(\tau)$ running time, moving right to left through the graph, and also $O(\tau)$ memory to store the intermediate states. => back-propagation through time (BPTT): powerful but also expensive to train
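The $O(\tau)$ cost can be sketched as follows; this is a minimal BPTT example under a simplified squared loss (not the log-likelihood above), with all sizes and names made up for illustration. The forward pass stores all $\tau$ hidden states ($O(\tau)$ memory); the backward pass sweeps right to left ($O(\tau)$ time):

```python
import numpy as np

rng = np.random.default_rng(1)
n_h, n_x, tau = 3, 2, 5
W = rng.normal(scale=0.5, size=(n_h, n_h))
U = rng.normal(scale=0.5, size=(n_h, n_x))
xs = rng.normal(size=(tau, n_x))
ys = rng.normal(size=(tau, n_h))

def forward(W, U):
    hs = [np.zeros(n_h)]             # h^0
    for t in range(tau):             # left to right: O(tau) time,
        hs.append(np.tanh(W @ hs[-1] + U @ xs[t]))  # stored: O(tau) memory
    loss = 0.5 * sum(np.sum((hs[t + 1] - ys[t]) ** 2) for t in range(tau))
    return hs, loss

def bptt(W, U):
    hs, loss = forward(W, U)
    dW, dU = np.zeros_like(W), np.zeros_like(U)
    dh_next = np.zeros(n_h)
    for t in reversed(range(tau)):   # right to left through the graph
        dh = (hs[t + 1] - ys[t]) + dh_next
        da = dh * (1 - hs[t + 1] ** 2)   # back through tanh
        dW += np.outer(da, hs[t])
        dU += np.outer(da, xs[t])
        dh_next = W.T @ da           # pass gradient on to step t-1
    return loss, dW, dU

loss, dW, dU = bptt(W, U)

# Sanity-check one entry of dW against a finite difference:
eps = 1e-6
Wp = W.copy(); Wp[0, 0] += eps
num = (forward(Wp, U)[1] - loss) / eps
print(abs(num - dW[0, 0]) < 1e-4)  # True
```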
In [2]:
# B.
show_image('fig10_4.png', figsize=(10, 8))
advantage of eliminating hidden-to-hidden recurrence: it decouples all the time steps (during training, the prediction at step $t-1$ is replaced by its ground truth from the sample) => teacher forcing
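A minimal sketch of teacher forcing, assuming a network whose only recurrence is output-to-hidden, $h^t = \tanh(W y^{t-1} + U x^t)$; all names and sizes are illustrative. Because every step conditions on the true $y^{t-1}$, the $\tau$ steps become independent supervised problems:

```python
import numpy as np

rng = np.random.default_rng(2)
n_h, n_x, n_y, tau = 4, 3, 2, 6
W = rng.normal(scale=0.5, size=(n_h, n_y))  # output-to-hidden
U = rng.normal(scale=0.5, size=(n_h, n_x))  # input-to-hidden
V = rng.normal(scale=0.5, size=(n_y, n_h))  # hidden-to-output
xs = rng.normal(size=(tau, n_x))
ys = rng.normal(size=(tau, n_y))            # ground-truth targets

def step(y_prev, x):
    h = np.tanh(W @ y_prev + U @ x)
    return V @ h                             # prediction o^t

# Teacher forcing: feed the ground-truth y^{t-1}, not the model's
# own previous prediction, so the steps are fully decoupled.
y_prev = np.zeros(n_y)                       # placeholder for y^0
preds = []
for t in range(tau):
    preds.append(step(ys[t - 1] if t > 0 else y_prev, xs[t]))
```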
In [4]:
show_image('fig10_6.png', figsize=(10, 8))
disadvantage: when the network is later used open-loop (outputs fed back as inputs), the inputs it sees at test time can be quite different from those seen during training.
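A toy sketch of that mismatch, with an illustrative output-to-hidden model: during training every step sees the ground-truth $y^{t-1}$, while at test time each step sees the model's own previous output, so the two input streams diverge after the first step:

```python
import numpy as np

rng = np.random.default_rng(3)
n_h, n_x, n_y, tau = 4, 3, 2, 6
W = rng.normal(scale=0.5, size=(n_h, n_y))
U = rng.normal(scale=0.5, size=(n_h, n_x))
V = rng.normal(scale=0.5, size=(n_y, n_h))
xs = rng.normal(size=(tau, n_x))
ys = rng.normal(size=(tau, n_y))

def step(y_prev, x):
    return V @ np.tanh(W @ y_prev + U @ x)

# Training-time inputs: ground truth fed in (teacher forcing).
forced = [step(np.zeros(n_y) if t == 0 else ys[t - 1], xs[t])
          for t in range(tau)]

# Test-time inputs: the model's own previous output fed back.
free = []
y_prev = np.zeros(n_y)
for t in range(tau):
    y_prev = step(y_prev, xs[t])
    free.append(y_prev)

# Identical at t=0, then the prediction streams drift apart:
gap = [float(np.linalg.norm(f - g)) for f, g in zip(forced, free)]
print(gap[0], gap[1] > 0)  # 0.0 True
```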
In [7]:
# C.
show_image('fig10_5.png', figsize=(10, 8))
In [8]:
show_image("formula_gradient.png", figsize=(12, 8))
In [9]:
show_image('fig10_7.png', figsize=(10, 8))
RNNs obtain the same full connectivity as above, but with an efficient parametrization, as illustrated below:
In [10]:
show_image('fig10_8.png', figsize=(10, 8))
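For intuition about the saving, a back-of-envelope count (numbers illustrative): a direct tabular parametrization of the joint distribution over a length-$\tau$ sequence of $k$-valued outputs grows as $O(k^\tau)$, while the RNN's shared parameters do not grow with $\tau$ at all:

```python
# Direct graphical-model table over the whole sequence: one
# probability per joint outcome, minus one for normalization.
k, tau = 10, 8
table_params = k ** tau - 1
print(table_params)  # 99999999

# The RNN instead reuses one fixed parameter set at every step,
# so its parameter count is independent of tau.
```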
determining the length of the sequence:
In [13]:
# fixed-length vector x, share R
show_image('fig10_9.png', figsize=(10, 8))