It makes sense to build both an AI team and a systems team because, as the trend below shows, the state of the art is moving toward larger, higher-performance, and more complex deep learning models.
In [51]:
# Simple trend visualization: larger networks keep improving as data grows
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(0., 5., 0.2)
# Steeper slope = performance grows faster with more data
plt.plot(x, x/2, '-k', label='large NN')
plt.plot(x, x/3, '-b', label='med. NN')
plt.plot(x, x/4, '-r', label='small NN')
plt.plot(x, x/5, '-g', label='trad. ML')
plt.legend(loc='upper left')
plt.xlim(0, 5)
plt.ylim(0, 5)
plt.xlabel('Data')
plt.ylabel('Performance')
plt.title('Deep Learning Trend')
plt.show()
DL can generally be categorized into a few buckets.
Andrew thinks that RL is the future of machine learning, while current industry work centers on the first three buckets.
End to end DL
$$review \rightarrow sentiment_{0/1}$$
$$image \rightarrow object_{1..n}$$
$$audio \rightarrow transcript$$
$$language_1 \rightarrow language_2$$
$$parameter \rightarrow image$$

Speech
Traditional: $$audio \rightarrow phonemes \rightarrow transcript$$ Deep: $$audio \rightarrow transcript$$
A drawback of the end-to-end approach is that it requires huge amounts of labeled data
Images
Traditional: $$image \rightarrow bone \ lengths \rightarrow age$$ Deep: $$image \rightarrow age$$
Self-driving car
Traditional: $$image \rightarrow cars \mid pedestrians \rightarrow trajectory \rightarrow steering$$ Deep: $$image \rightarrow steering$$
One caution: for something like driving, the dataset required for end-to-end training could be prohibitively large
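To make the contrast concrete, here is a toy sketch (illustrative only; the data, task, and model choices are synthetic stand-ins, not from the lecture): a "traditional" approach fits a simple model on a hand-engineered intermediate feature, while an end-to-end model must learn the whole mapping from the raw input, which generally takes more data and capacity.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(1000, 1))                      # raw input
y = 2 * np.sin(x).ravel() + rng.normal(0, 0.1, size=1000)   # target

# Traditional: hand-engineer the intermediate representation (here sin(x)),
# then fit a simple model on top of it
traditional = LinearRegression().fit(np.sin(x), y)

# End to end: a single model learns the full mapping from raw input to target
end_to_end = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000,
                          random_state=0).fit(x, y)

print('traditional R^2:', traditional.score(np.sin(x), y))
print('end-to-end R^2:', end_to_end.score(x, y))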
To diagnose bias and variance, measure the following error rates:
| Case | Human-level error | Training set error | Dev set error | Diagnosis |
|------|-------------------|--------------------|---------------|-----------|
| A | 1% | 5% | 6% | High bias (human → training gap) |
| B | 1% | 2% | 6% | High variance (training → dev gap) |
| C | 1% | 5% | 10% | High bias and high variance |
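As a minimal sketch (not from the original notes), the two gaps can be turned into a rough diagnosis; the function name and the 0.5% threshold below are arbitrary choices for illustration.
def diagnose(human_err, train_err, dev_err, tol=0.5):
    """Rough bias/variance diagnosis from error rates given in percent."""
    avoidable_bias = train_err - human_err   # human-level -> training gap
    variance = dev_err - train_err           # training -> dev gap
    issues = []
    if avoidable_bias > tol:
        issues.append('high bias')
    if variance > tol:
        issues.append('high variance')
    return ', '.join(issues) or 'neither'

print('A:', diagnose(1, 5, 6))    # high bias
print('B:', diagnose(1, 2, 6))    # high variance
print('C:', diagnose(1, 5, 10))   # high bias, high variance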
Data synthesis: generate additional labeled training data artificially (see the sketch below)
Have a unified data warehouse
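As one concrete form of data synthesis (a hedged sketch, not from the original notes; the signals below are toy stand-ins for real recordings), clean speech can be mixed with background noise at a chosen signal-to-noise ratio to produce extra labeled training audio.
import numpy as np

def mix_with_noise(clean, noise, snr_db=10.0):
    """Overlay background noise on clean audio at a target SNR (toy sketch)."""
    noise = np.resize(noise, clean.shape)        # tile/trim noise to match length
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise

clean = np.sin(np.linspace(0, 100, 16000))       # stand-in for clean speech
noise = np.random.randn(8000)                    # stand-in for background noise
augmented = mix_with_noise(clean, noise, snr_db=5.0)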