Artificial Neural Networks are a computational approach that mimics brain function: a large collection of linked neural units.
To train a network we minimize a cost function $J(\theta)$ over its parameters $\theta$. The gradient can be approximated numerically with a central difference:

$$\frac{\partial J}{\partial \theta} \approx \frac{J(\theta + \epsilon) - J(\theta - \epsilon)}{2\epsilon}$$
In [2]:
# plot y = x-squared
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
x = np.linspace(-5,5,1000)
y = x**2
plt.plot(x,y);
In [3]:
# create our function
def f(x):
    return x**2
In [4]:
# define values
epsilon = 1e-5
x = 3
In [5]:
# calculate delta y / delta x
gradient = (f(x+epsilon) - f(x-epsilon)) / (2*epsilon)
In [79]:
# compare with our known calculus solution: d/dx x**2 = 2x, so we expect 2*3 = 6
gradient
Out[79]:
We can use gradient descent to minimize a cost function, thereby optimizing our weights.
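A minimal sketch of the idea, reusing f and epsilon from the cells above (the learning rate and step count are illustrative choices, not tuned values):

learning_rate = 0.1       # illustrative step size
x_current = 3.0           # arbitrary starting point
for _ in range(100):
    # numerical gradient via the same central difference as above
    grad = (f(x_current + epsilon) - f(x_current - epsilon)) / (2 * epsilon)
    x_current = x_current - learning_rate * grad
print(x_current)          # approaches 0, the minimizer of x**2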
Multi-layer Perceptron (MLP) models in sklearn
The advantages of MLP are:
- Capability to learn non-linear models.
- Capability to learn models in real time (on-line learning) using partial_fit.

The disadvantages of MLP include:
- MLP with hidden layers has a non-convex loss function with more than one local minimum, so different random weight initializations can lead to different validation accuracy.
- MLP requires tuning a number of hyperparameters, such as the number of hidden neurons, layers, and iterations.
- MLP is sensitive to feature scaling.
In [63]:
# build simple neural net with sklearn: An "OR" gate
from sklearn.neural_network import MLPClassifier
X = [[0., 0.], [1., 1.], [1., 0.], [0., 1.]]
y = [0, 1, 1, 1]
clf = MLPClassifier(hidden_layer_sizes=(5, 2),
                    solver='lbfgs',
                    random_state=42)
clf.fit(X,y)
Out[63]:
In [64]:
# predict new observations
clf.predict([[0,1]])
Out[64]:
In [65]:
# find parameters
print([coef.shape for coef in clf.coefs_])
clf.coefs_
Out[65]:
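The shapes reflect the architecture: 2 input features feed 5 hidden units, then 2 hidden units, then a single output unit, so coefs_ holds weight matrices of shape (2, 5), (5, 2), and (2, 1). The corresponding bias vectors are stored in intercepts_, one per non-input layer:

print([b.shape for b in clf.intercepts_])   # [(5,), (2,), (1,)]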
In [77]:
clf.predict([[2,2]])
Out[77]:
In [76]:
clf.predict([[-2,2]])
Out[76]:
In [78]:
clf.predict([[-2,-2]])
Out[78]:
Multi-layer Perceptron is sensitive to feature scaling, so it is highly recommended to scale your data.
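A minimal sketch of one common way to do this, wrapping StandardScaler and MLPClassifier in a pipeline so the scaler is fit on the training data only and the same transformation is applied at predict time (the synthetic dataset and its parameters are purely illustrative):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier

# synthetic data, purely for illustration
X_demo, y_demo = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X_demo, y_demo, random_state=0)

pipe = make_pipeline(StandardScaler(),
                     MLPClassifier(hidden_layer_sizes=(5, 2),
                                   solver='lbfgs',
                                   max_iter=1000,
                                   random_state=42))
pipe.fit(X_train, y_train)          # scaler statistics come from X_train only
print(pipe.score(X_test, y_test))   # the same scaling is re-applied here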
L-BFGS converges faster and finds better solutions on small datasets. For relatively large datasets, Adam is fast and robust. SGD with momentum or Nesterov's momentum can outperform both if the learning rate is correctly tuned.
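For reference, the solver is chosen through MLPClassifier's solver parameter; the settings below are illustrative assumptions, not tuned values:

from sklearn.neural_network import MLPClassifier

small_data_clf = MLPClassifier(solver='lbfgs', random_state=42)   # small datasets
large_data_clf = MLPClassifier(solver='adam', random_state=42)    # larger datasets (sklearn's default solver)
sgd_clf = MLPClassifier(solver='sgd',
                        learning_rate_init=0.01,     # SGD needs this tuned to compete
                        momentum=0.9,
                        nesterovs_momentum=True,     # enables Nesterov's momentum
                        random_state=42)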