Nuts and bolts of deep learning

Andrew Ng

It makes sense to build an AI team and a systems team because, as the following trend shows, the state of the art is moving toward larger, higher-performance, and more complex deep learning models.


In [51]:
# Simple trend visualization
import matplotlib.pyplot as plt
import numpy as np
x = np.arange(0., 5., 0.2)
# Larger networks benefit more from additional data, so they get the steeper curves
plt.plot(x, x/2, '-k', label='large NN')
plt.plot(x, x/3, '-r', label='med. NN')
plt.plot(x, x/4, '-b', label='small NN')
plt.plot(x, x/5, '-g', label='trad. ML')
plt.legend(loc='upper right')
plt.xlim(0, 5)
plt.ylim(0, 5)

plt.xlabel('Data')
plt.ylabel('Performance')
plt.title('Deep Learning Trend')
plt.show()


DL can generally be categorized into a few buckets:

General DL

  • FC Layers

Sequence Models

  • RNN/LSTM/GRU

Image

  • 2D/3D CNNs

Other:

  • RL

Of these, Andrew thinks that RL is the future of machine learning, while current industry work centers on the first three buckets.

End to end DL

$$review \rightarrow sentiment_{0/1}$$
$$image \rightarrow object_{1..n}$$
$$audio \rightarrow transcript$$
$$language_1 \rightarrow language_2$$
$$parameter \rightarrow image$$

Speech

Traditional: $$audio \rightarrow phonemes \rightarrow transcript$$ Deep: $$audio \rightarrow transcript$$

A drawback of the end-to-end approach is the need for huge amounts of labeled data.

Images

Traditional: $$image \rightarrow bone \ lengths \rightarrow age$$ Deep: $$image \rightarrow age$$

Self-driving car

Traditional: $$image \rightarrow cars \mid pedestrians \rightarrow trajectory \rightarrow steering$$ Deep: $$image \rightarrow steering$$

With the caution that, for something like driving, the dataset required for end-to-end training could be prohibitively massive.

Example goal: build a human-level speech system

  • Obtain dataset, split into train/dev/test
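The split step above can be sketched as follows; `train_dev_test_split` and the 80/10/10 fractions are illustrative choices, not from the talk:

```python
import numpy as np

def train_dev_test_split(data, dev_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle a dataset and split it into train/dev/test portions."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(data))
    n_dev = int(len(data) * dev_frac)
    n_test = int(len(data) * test_frac)
    dev = [data[i] for i in idx[:n_dev]]
    test = [data[i] for i in idx[n_dev:n_dev + n_test]]
    train = [data[i] for i in idx[n_dev + n_test:]]
    return train, dev, test

train, dev, test = train_dev_test_split(list(range(1000)))
print(len(train), len(dev), len(test))  # 800 100 100
```

The dev and test sets should come from the same distribution as the data you expect at deployment time, even if the training set does not.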

Measure the following things:

A

  • Human-level error 1% <- Bias

  • Training set error 5% <- Bias

  • Dev set error 6%

B

  • Human-level error 1%

  • Training set error 2% <- Variance

  • Dev set error 6% <- Variance

C

  • Human-level error 1% <- Bias

  • Training set error 5% <- Bias / Variance

  • Dev set error 10% <- Variance
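The diagnosis rule behind scenarios A-C can be sketched as a small helper: bias is the gap between human-level and training error, variance the gap between training and dev error. The function name and the 1.5-point cutoff are illustrative assumptions, not from the talk:

```python
def diagnose(human_err, train_err, dev_err, threshold=1.5):
    """Return the dominant problem(s) given three error rates in percent.

    Bias:     training error well above human-level (avoidable bias).
    Variance: dev error well above training error.
    `threshold` is an arbitrary cutoff (percentage points) for "well above".
    """
    problems = []
    if train_err - human_err > threshold:
        problems.append('bias')
    if dev_err - train_err > threshold:
        problems.append('variance')
    return problems or ['neither']

print(diagnose(1, 5, 6))   # scenario A -> ['bias']
print(diagnose(1, 2, 6))   # scenario B -> ['variance']
print(diagnose(1, 5, 10))  # scenario C -> ['bias', 'variance']
```

The diagnosis drives the fix: high bias suggests a bigger model or longer training; high variance suggests more data or regularization.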

An example workflow

Data synthesis:

  • OCR
  • Speech Recognition
  • NLP
  • Video Games (RL)
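For speech recognition, a common synthesis trick is mixing clean recordings with background noise at a target signal-to-noise ratio to multiply the training set. A minimal sketch, assuming raw waveforms as NumPy arrays (the helper name and SNR math are illustrative, not from the talk):

```python
import numpy as np

def synthesize_noisy_audio(clean, noise, snr_db=10.0, seed=0):
    """Mix a clean waveform with a random crop of a noise track at a
    target SNR (dB), producing a new labeled training example."""
    rng = np.random.default_rng(seed)
    start = rng.integers(0, len(noise) - len(clean) + 1)
    crop = noise[start:start + len(clean)]
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(crop ** 2)
    # Scale the noise so that clean_power / noise_power hits the target SNR
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * crop

# Stand-ins for real data: a 1 s, 440 Hz tone and white noise
clean = np.sin(2 * np.pi * 440 * np.linspace(0, 1, 16000))
noise = np.random.default_rng(1).normal(size=160000)
noisy = synthesize_noisy_audio(clean, noise)
```

Each (noisy audio, transcript) pair reuses the original transcript as its label, which is what makes synthesis cheap compared to collecting new labeled speech.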

Have a unified data warehouse