How a regression network is traditionally trained

This network is trained using a data set $D = \{ {\bf x}^{(n)}, {\bf t}^{(n)} \}$ by adjusting ${\bf w}$ so as to minimize an error function, e.g.,

$$ E_D({\bf w}) = \sum_n\sum_i (y_i({\bf x}^{(n)};{\bf w}) - t_i^{(n)})^2 $$

This objective function is a sum of terms, one for each input/target pair $\{ {\bf x}, {\bf t} \}$, measuring how close the output ${\bf y}({\bf x}; {\bf w})$ is to the target ${\bf t}$:

$$ E_D({\bf w}) = \sum_n E_{\bf x}^{(n)}, \quad E_{\bf x}^{(n)}=\sum_i (y_i({\bf x}^{(n)};{\bf w}) - t_i^{(n)})^2 $$

This minimization is based on repeated evaluation of the gradient of $E_D$. This gradient can be efficiently computed using the backpropagation algorithm which uses the chain rule to find the derivatives, as we discuss below.

Often, regularization (also known as weight decay) is included, modifying the objective function to:

$$ M({\bf w})=\alpha E_D({\bf w}) + \beta E_W({\bf w}), $$

where $E_W = \frac{1}{2}\sum_i w_i^2$.
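As a minimal sketch of how these quantities could be evaluated, the snippet below computes $E_D$, $E_W$ and $M$ with NumPy. The function names, the toy linear map standing in for the network, and the values of $\alpha$ and $\beta$ are assumptions made for illustration only; they do not come from the text above.

```python
import numpy as np

def data_error(y_pred, targets):
    """E_D: squared differences summed over all examples n and outputs i."""
    return np.sum((y_pred - targets) ** 2)

def weight_penalty(w):
    """E_W: half the sum of the squared weights."""
    return 0.5 * np.sum(w ** 2)

def objective(y_pred, targets, w, alpha=1.0, beta=0.01):
    """M(w) = alpha * E_D(w) + beta * E_W(w)."""
    return alpha * data_error(y_pred, targets) + beta * weight_penalty(w)

# Toy stand-in for a network: a linear map y = X @ w (illustrative assumption).
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # three inputs x^(n)
t = np.array([[1.0], [2.0], [3.0]])                  # three targets t^(n)
w = np.array([[1.0], [1.0]])                         # weights

y = X @ w                                    # outputs: [[1], [1], [2]]
E_D = data_error(y, t)                       # 0^2 + (-1)^2 + (-1)^2
E_W = weight_penalty(w)                      # 0.5 * (1 + 1)
M = objective(y, t, w, alpha=1.0, beta=0.01)
print(E_D, E_W, M)
```

Increasing `beta` relative to `alpha` puts more weight on keeping the weights small, at the cost of a looser fit to the data.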

Gradient descent

(From Wikipedia) Cool animations at http://www.benfrederickson.com/numerical-optimization/

Gradient descent is a first-order iterative optimization algorithm for finding the minimum of a function. To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or of the approximate gradient) of the function at the current point.

Gradient descent is based on the observation that if the multi-variable function $ F(\mathbf {x} )$ is defined and differentiable in a neighborhood of a point $ \mathbf {a}$ , then $ F(\mathbf {x} )$ decreases fastest if one goes from $ \mathbf {a}$ in the direction of the negative gradient of $F$ at $ \mathbf {a}$ , $ -\nabla F(\mathbf {a} )$. It follows that, if

$$\mathbf {a} _{n+1}=\mathbf {a} _{n}-\eta \nabla F(\mathbf {a} _{n})$$

for $\eta$ small enough, then $F(\mathbf {a_{n}} )\geq F(\mathbf {a_{n+1}} )$. In other words, the term $\eta \nabla F(\mathbf {a} )$ is subtracted from $ \mathbf {a}$ because we want to move against the gradient, namely down toward the minimum. With this observation in mind, one starts with a guess $\mathbf {x} _{0}$ for a local minimum of $F$, and considers the sequence $\mathbf {x} _{0},\mathbf {x} _{1},\mathbf {x} _{2},\dots$ such that

$$\mathbf {x} _{n+1}=\mathbf {x} _{n}-\eta _{n}\nabla F(\mathbf {x} _{n}),\ n\geq 0.$$

Provided each step is small enough, we have $F(\mathbf {x} _{0})\geq F(\mathbf {x} _{1})\geq F(\mathbf {x} _{2})\geq \cdots$, so hopefully the sequence $(\mathbf {x} _{n})$ converges to the desired local minimum. Note that the value of the step size $\eta$ is allowed to change at every iteration.
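The iteration can be sketched on a simple bowl-shaped function. The choice $F(x,y)=x^2+3y^2$, the starting point, the fixed step size and the number of iterations are all assumptions picked for illustration; with this step size the recorded values of $F$ should never increase.

```python
import numpy as np

def F(p):
    """Bowl-shaped test function F(x, y) = x^2 + 3 y^2 (illustrative choice)."""
    x, y = p
    return x**2 + 3.0 * y**2

def grad_F(p):
    """Analytic gradient of F."""
    x, y = p
    return np.array([2.0 * x, 6.0 * y])

eta = 0.1                   # fixed step size, small enough for this F
p = np.array([2.0, 1.0])    # initial guess x_0
values = [F(p)]             # F(x_0), F(x_1), ...

for n in range(50):
    p = p - eta * grad_F(p)  # x_{n+1} = x_n - eta * grad F(x_n)
    values.append(F(p))

print("final point:", p)
print("F at start and end:", values[0], values[-1])
```

For this particular function a much larger step (for example `eta = 0.4`) would overshoot in the $y$ direction and the sequence would diverge, which is why the step size has to be chosen with some care.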

This process is illustrated in the adjacent picture. Here $F$ is assumed to be defined on the plane, and its graph is assumed to have a bowl shape. The blue curves are the contour lines, that is, the curves along which the value of $F$ is constant. A red arrow originating at a point shows the direction of the negative gradient at that point. Note that the (negative) gradient at a point is orthogonal to the contour line going through that point. We see that gradient descent leads us to the bottom of the bowl, that is, to the point where the value of the function $F$ is minimal.

Illustration of the gradient descent procedure on a series of iterations down a bowl-shaped surface

The "Zig-Zagging" nature of the method is also evident below, where the gradient descent method is applied to $$F(x,y)=\sin \left({\frac {1}{2}}x^{2}-{\frac {1}{4}}y^{2}+3\right)\cos(2x+1-e^{y})$$



In [4]:

    
%matplotlib inline
from matplotlib import pyplot
pyplot.rcParams['image.cmap'] = 'jet'
import numpy as np

x0 = -1.4
y0 = 0.5
x = [x0] # The algorithm starts at x0, y0
y = [y0] 

eta = 0.1 # step size multiplier
precision = 0.00001

def f(x,y):
    f1 = x**2/2-y**2/4+3
    f2 = 2*x+1-np.exp(y)
    return np.sin(f1)*np.cos(f2)

def gradf(x,y):
    f1 = x**2/2-y**2/4+3
    f2 = 2*x+1-np.exp(y)
    dx = np.cos(f1)*np.cos(f2)*x-np.sin(f1)*np.sin(f2)*2.
    dy = np.cos(f1)*np.cos(f2)*(-y/2.)-np.sin(f1)*np.sin(f2)*(-np.exp(y))
    return (dx,dy)

err = 100.
while err > precision:
    (step_x, step_y) = gradf(x0, y0)
    x0 -= eta*step_x
    y0 -= eta*step_y
    x.append(x0)
    y.append(y0)
    err = eta*(abs(step_x)+abs(step_y))


print(x0,y0)

#### All this below is just to visualize the process
dx = 0.05
dy = 0.05
xx = np.arange(-1.5, 1.+dx, dx)
yy = np.arange(0., 2.+dy, dy)
V = np.zeros(shape=(len(yy),len(xx)))

for iy in range(0,len(yy)):
    for ix in range(0,len(xx)):
        V[iy,ix] = f(xx[ix],yy[iy])

X, Y = np.meshgrid(xx, yy)
pyplot.contour(X, Y, V)

#pyplot.plot(x,y,linestyle='--', lw=3);
pyplot.scatter(x,y);

pyplot.ylabel("y")
pyplot.xlabel("x");









    



0.3226478037930326 1.602369170618785

Stochastic gradient descent (SGD)

Stochastic gradient descent (often shortened to SGD), also known as incremental gradient descent, is a stochastic approximation of the gradient descent method for minimizing an objective function that is written as a sum of differentiable functions.

There are a number of challenges in applying the gradient descent rule. To understand what the problem is, let's look back at the quadratic cost $E_D$. Notice that this cost function has the form $E=\sum_n E_{\bf x}^{(n)}$. In practice, to compute the gradient $\nabla E_D$ we need to compute the gradients $\nabla E_{\bf x}^{(n)}$ separately for each training input ${\bf x}^{(n)}$ and then sum them. Unfortunately, when the number of training inputs is very large this can take a long time, and learning thus occurs slowly.

Stochastic gradient descent can be used to speed up learning. The idea is to estimate the gradient $\nabla E$ by computing $\nabla E_{\bf x}$ for a small sample of randomly chosen training inputs. By averaging over this small sample it turns out that we can quickly get a good estimate of the true gradient.
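The following sketch compares the full-data gradient with mini-batch estimates for a straight-line model. The synthetic data, the current parameter values, the batch size and the helper name `per_example_grads` are assumptions made for illustration. Because $E_D$ is defined here as a sum, the averaged gradients below correspond to $\nabla E_D$ divided by the number of training inputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data scattered around y = 1 + 2x (illustrative assumption).
n = 1000
x = rng.uniform(0.0, 20.0, size=n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, size=n)

w = np.array([3.0, 3.0])  # current parameters (w1, w2)

def per_example_grads(w, x, y):
    """Gradient of E_n = (w1 + w2*x_n - y_n)^2 for every n; shape (len(x), 2)."""
    r = w[0] + w[1] * x - y
    return np.stack([2.0 * r, 2.0 * r * x], axis=1)

grads = per_example_grads(w, x, y)
full_mean = grads.mean(axis=0)  # gradient averaged over all training inputs

# Shuffle once and split into mini-batches of size m.
m = 50
perm = rng.permutation(n)
batch_means = np.array([grads[perm[k:k + m]].mean(axis=0)
                        for k in range(0, n, m)])

print("full-data mean gradient:      ", full_mean)
print("estimate from one mini-batch: ", batch_means[0])
print("mean of all mini-batch values:", batch_means.mean(axis=0))
```

Each mini-batch estimate uses only 50 of the 1000 gradients, yet it typically lands close to the full-data value, and averaging the estimates over one full pass recovers the full-data mean.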

To connect this explicitly to learning in neural networks, suppose $w_k$ and $b_l$ denote the weights and biases in our neural network. Then stochastic gradient descent works by picking out a randomly chosen mini-batch of training inputs, and training with those,

$$ w_k \rightarrow w_k - \eta \sum_{j=1}^m \frac{\partial E_{\bf x}^{(j)}}{\partial w_k} $$

$$ b_l \rightarrow b_l - \eta \sum_{j=1}^m \frac{\partial E_{\bf x}^{(j)}}{\partial b_l} $$

where the sums are over all the training examples in the current mini-batch. Then we pick out another randomly chosen mini-batch and train with those. And so on, until we have exhausted the training inputs, which is said to complete an epoch of training. At that point we start over with a new training epoch.

The pseudocode would look like:

Choose an initial vector of parameters $w$ and learning rate $\eta$.

Repeat until an approximate minimum is obtained:

    Randomly shuffle examples in the training set.
    For i=1,2,...,n , do:

$\quad \quad \quad \quad \quad w:=w-\eta \nabla E_{i}(w).$

Example: linear regression

As seen previously, the objective function to be minimized is:

$$ \begin{aligned} E(w)=\sum _{i=1}^{n}E_{i}(w)=\sum _{i=1}^{n}\left(w_{1}+w_{2}x_{i}-y_{i}\right)^{2}. \end{aligned} $$

The stochastic gradient descent update for a single example $i$ can then be written in matrix form as:

$$ \begin{bmatrix}w_{1}\\w_{2}\end{bmatrix}:={\begin{bmatrix}w_{1}\\w_{2}\end{bmatrix}}-\eta {\begin{bmatrix}2(w_{1}+w_{2}x_{i}-y_{i})\\2x_{i}(w_{1}+w_{2}x_{i}-y_{i})\end{bmatrix}}. $$
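A minimal sketch of this per-example update is given below. To keep the expected answer known, it uses noise-free synthetic points on the line $y = 1 + 2x$ with $x$ in $[-1, 1]$; the data, the learning rate and the number of passes are assumptions chosen for illustration and are not the values used elsewhere in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noise-free points on y = 1 + 2x (illustrative assumption).
x = rng.uniform(-1.0, 1.0, size=100)
y = 1.0 + 2.0 * x

w = np.array([0.0, 0.0])  # (w1, w2): intercept and slope
eta = 0.05                # learning rate

for epoch in range(200):
    # Visit the examples in a fresh random order on every pass.
    for i in rng.permutation(len(x)):
        r = w[0] + w[1] * x[i] - y[i]                       # residual for example i
        w = w - eta * np.array([2.0 * r, 2.0 * x[i] * r])   # update from the matrix form

print(w)  # should approach the intercept 1 and slope 2
```

With noisy data the iterates would keep fluctuating around the least-squares solution instead of settling exactly, and inputs with a larger range would require a smaller learning rate for the update to remain stable.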

We'll generate a series of 100 random points aligned more or less along the line $y=a+bx$ with $a=1$ and $b=2$.



In [5]:

    
%matplotlib inline
from matplotlib import pyplot
import numpy as np

a = 1
b = 2
num_points = 100
np.random.seed(637163) # we make sure we always generate the same sequence
x_data = np.random.rand(num_points)*20.
y_data = x_data*b+a+3*(2.*np.random.rand(num_points)-1)

pyplot.scatter(x_data,y_data)
pyplot.plot(x_data, b*x_data+a)

#### Least squares fit
sum_x = np.sum(x_data)
sum_y = np.sum(y_data)
sum_x2 = np.sum(x_data**2)
sum_xy = np.sum(x_data*y_data)
det = num_points*sum_x2-sum_x**2
fit_a = (sum_y*sum_x2-sum_x*sum_xy)/det
fit_b = (num_points*sum_xy-sum_x*sum_y)/det
print(fit_a,fit_b)

pyplot.xlim(-1,22)
pyplot.ylim(-1,24)
pyplot.plot(x_data, fit_b*x_data+fit_a);









    



1.1637760980701564 2.001777141438794
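As a brief aside, the closed-form values fit_a and fit_b computed in the cell above are consistent with the standard normal equations for a straight-line fit. Setting the derivatives of $E(a,b)=\sum_i (a+bx_i-y_i)^2$ with respect to $a$ and $b$ to zero gives

$$ n\,a + b\sum_i x_i = \sum_i y_i, \qquad a\sum_i x_i + b\sum_i x_i^2 = \sum_i x_i y_i, $$

and solving this $2\times 2$ system yields

$$ a = \frac{\sum_i y_i \sum_i x_i^2 - \sum_i x_i \sum_i x_i y_i}{\Delta}, \quad b = \frac{n\sum_i x_i y_i - \sum_i x_i \sum_i y_i}{\Delta}, \quad \Delta = n\sum_i x_i^2 - \Big(\sum_i x_i\Big)^2, $$

where $n$ is num_points and $\Delta$ corresponds to det in the code.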

We now write an SGD code for this problem. The training_data is a list of tuples (x, y) representing the training inputs and corresponding desired outputs. The variables epochs and mini_batch_size are what you'd expect: the number of epochs to train for, and the size of the mini-batches to use when sampling. eta is the learning rate, $\eta$. After each epoch of training the program prints the current values of the coefficients a and b. This is useful for tracking progress, but slows things down.

The code works as follows. In each epoch, it starts by randomly shuffling the training data, and then partitions it into mini-batches of the appropriate size. This is an easy way of sampling randomly from the training data. Then for each mini_batch we apply a single step of gradient descent. This is done by the call update_mini_batch(mini_batch, eta), which updates the coefficients a and b according to a single iteration of gradient descent, using just the training data in mini_batch.



In [8]:

    
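epochs = 1000
mini_batch_size = 10
# Learning rate, scaled down by the batch size because the update sums over the batch
eta = 0.01/mini_batch_size

# Initial guess for the intercept a and the slope b
a = 3.
b = 3.

def update_mini_batch(mini_batch, eta):
    """Apply one gradient descent step to a and b using only the points in mini_batch."""
    global a, b
    # Freeze the parameters so every point in the batch is evaluated at the same (a, b)
    a0 = a
    b0 = b
    for x, y in mini_batch:
        # Residual scaled by eta; the factor 2 of the gradient is absorbed into eta
        e = eta*(a0+b0*x-y)
        a -= e
        b -= x*e

training_data = list(zip(x_data,y_data))
for j in range(epochs):
    # Reshuffle, then cut the data into consecutive mini-batches
    np.random.shuffle(training_data)
    mini_batches = [training_data[k:k+mini_batch_size]
                    for k in range(0, len(training_data), mini_batch_size)]
    for mini_batch in mini_batches:
        update_mini_batch(mini_batch, eta)
    print("Epoch {0}: {1} {2}".format(j,a,b))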









    



Epoch 0: 2.851091506514999 1.7876531781856742
Epoch 1: 2.827178748563592 1.963573397229449
Epoch 2: 2.7801714492150196 1.8619634110879069
Epoch 3: 2.756668947213542 2.0288955459781213
Epoch 4: 2.7076355088367525 1.880872788736252
Epoch 5: 2.6709938639494206 1.9094861291196814
Epoch 6: 2.6459565191117687 2.083968448182042
Epoch 7: 2.592571941252409 1.867303977943171
Epoch 8: 2.5668650012203917 1.9282486257504263
Epoch 9: 2.5260855862259803 1.8254093290530535
Epoch 10: 2.4977119308055653 1.8393082159024072
Epoch 11: 2.461504007375818 1.8187446556619598
Epoch 12: 2.440510503849123 1.9639665902712513
Epoch 13: 2.4192661745417667 2.056408083610859
Epoch 14: 2.3778520184589746 1.9200785689241224
Epoch 15: 2.352636754920605 1.9557702061187
Epoch 16: 2.32668522997869 1.985365726392641
Epoch 17: 2.29093524056995 1.9302755099169442
Epoch 18: 2.264373631096743 1.9014850942257417
Epoch 19: 2.240116842945883 1.935758005663645
Epoch 20: 2.2192529930462856 2.0084713102139746
Epoch 21: 2.1882155215215175 1.954607851080249
Epoch 22: 2.1610945265017794 1.871860736902566
Epoch 23: 2.1460869019987685 2.0191055043901693
Epoch 24: 2.1134743012209274 1.917120770897182
Epoch 25: 2.093876809726992 1.9695147135445203
Epoch 26: 2.0732373408257976 2.009659333749805
Epoch 27: 2.0545056929658063 1.9990817647672674
Epoch 28: 2.0358849885566888 2.0684880037349314
Epoch 29: 2.0127961055519528 1.981549450254667
Epoch 30: 1.9912110003344137 1.9953084240830377
Epoch 31: 1.9702356965968113 1.9337881228020706
Epoch 32: 1.958931435902377 1.9703136538892496
Epoch 33: 1.9320635446309764 1.8826305504717291
Epoch 34: 1.9183934975941435 1.9108781471703922
Epoch 35: 1.9002148717519194 1.933657775282379
Epoch 36: 1.8765188176622247 1.8501242260953568
Epoch 37: 1.8636240317182582 1.8732118340436419
Epoch 38: 1.8580542540860803 2.043103451641688
Epoch 39: 1.8404926684574494 1.9833198887817372
Epoch 40: 1.818471335258018 1.885853165755979
Epoch 41: 1.8104060287044144 1.946463310907274
Epoch 42: 1.7945224315136403 1.955383791204891
Epoch 43: 1.782283463654412 2.0023405102821905
Epoch 44: 1.749520717458214 1.8281773503780525
Epoch 45: 1.7453813582714632 2.003640838074842
Epoch 46: 1.728623006260412 1.9766876117635683
Epoch 47: 1.7084093749222484 1.934436123013543
Epoch 48: 1.6882385559184097 1.8316669036210635
Epoch 49: 1.6872347550750442 1.9027956243651138
Epoch 50: 1.6799350406816318 2.0067968405782595
Epoch 51: 1.6544625765529073 1.8881985255370595
Epoch 52: 1.6437316979583543 1.9255046345948543
Epoch 53: 1.6381622976293797 2.0030543868483233
Epoch 54: 1.619288123392432 1.9054952135108394
Epoch 55: 1.6081107386909623 1.8842249840685168
Epoch 56: 1.606385744453914 1.995354114039655
Epoch 57: 1.590951010569201 1.935539287963154
Epoch 58: 1.5907159977092078 2.0039496119285936
Epoch 59: 1.583681348958318 2.0712979889802097
Epoch 60: 1.5630265005446518 1.9298084540897775
Epoch 61: 1.564720140787769 2.0808310251200517
Epoch 62: 1.5466686232306155 1.9187231079896427
Epoch 63: 1.5388524762367308 1.9621349285397989
Epoch 64: 1.5405490598559062 2.0291358831335318
Epoch 65: 1.5208453821228052 1.8248931026367041
Epoch 66: 1.5213798489903287 1.8930706301200002
Epoch 67: 1.529153914875216 2.1490931535209667
Epoch 68: 1.5022716493694421 1.9262311478789624
Epoch 69: 1.500068778772673 2.0144696446422423
Epoch 70: 1.4895292591761131 1.9988696565984023
Epoch 71: 1.4750010294727158 1.9334703018634691
Epoch 72: 1.468734739081002 1.9509592360886463
Epoch 73: 1.4602541491491496 1.9535588676349709
Epoch 74: 1.4629985193934691 2.087981232637853
Epoch 75: 1.4482903637054756 1.9769627087994208
Epoch 76: 1.4441169372446487 1.9911884358343
Epoch 77: 1.4374914169928699 1.9697513594833118
Epoch 78: 1.4290881203003065 1.9810815298729503
Epoch 79: 1.4246361485906045 2.021522320675344
Epoch 80: 1.416276663539712 1.997969411686069
Epoch 81: 1.4088098511794314 1.9930406863592567
Epoch 82: 1.4068185837259672 2.0164782110987383
Epoch 83: 1.3962324622369873 1.8977840351410191
Epoch 84: 1.401436092664027 1.9950621893919915
Epoch 85: 1.387471145834617 1.8824887521430227
Epoch 86: 1.4014093218694679 2.1053832817280766
Epoch 87: 1.3846125002362668 1.961879809671709
Epoch 88: 1.3803176460734246 1.9693072109387666
Epoch 89: 1.3733167787595022 1.9193370938253795
Epoch 90: 1.3719395040783267 1.9529063749285855
Epoch 91: 1.3657851409032489 1.9519469364018414
Epoch 92: 1.365731695836303 2.0299060306472323
Epoch 93: 1.352704037805701 1.9068562249348497
Epoch 94: 1.358872635570531 1.9505330043142486
Epoch 95: 1.352406342772186 1.924937800945274
Epoch 96: 1.3529921471002395 1.9582904963736072
Epoch 97: 1.3467058766687356 1.964421345479529
Epoch 98: 1.3441056012495736 1.9585772301210544
Epoch 99: 1.338307075761737 1.9641242330382802
Epoch 100: 1.3389750343328595 2.0331089097262622
Epoch 101: 1.3292056062449498 1.970316569830166
Epoch 102: 1.3286221627701793 2.0548925899273556
Epoch 103: 1.3249933581558049 2.0102278112773337
Epoch 104: 1.3141363246433797 1.9167494489192494
Epoch 105: 1.3112904405797088 1.9230166472931127
Epoch 106: 1.3155463891174177 2.0363423473576034
Epoch 107: 1.3066507352411303 2.002921226309002
Epoch 108: 1.3012267385619987 1.992156333232618
Epoch 109: 1.2998811762793303 2.0836475234636054
Epoch 110: 1.2869768488136015 2.0326476069467847
Epoch 111: 1.2782409042449163 1.955308271763259
Epoch 112: 1.2702227407769626 1.843641680619616
Epoch 113: 1.2841364346903865 2.0428258403939537
Epoch 114: 1.2799771181355302 2.027691157552291
Epoch 115: 1.272749198706809 1.9838436304512466
Epoch 116: 1.2751164588056392 2.048106822415394
Epoch 117: 1.2709851254712097 1.9917962774665723
Epoch 118: 1.2762427149034599 2.136723976667836
Epoch 119: 1.2614286952500717 1.9537916421283819
Epoch 120: 1.268051811501334 2.003348786243271
Epoch 121: 1.2705624559612398 2.05644114955335
Epoch 122: 1.2488428579582682 1.824502349600589
Epoch 123: 1.2609746890519609 2.046167589066578
Epoch 124: 1.2649024819516497 2.073455837171682
Epoch 125: 1.2553435081447994 1.9933812628787015
Epoch 126: 1.2598887147760574 2.0313796669882813
Epoch 127: 1.2599147404360962 2.062627907835911
Epoch 128: 1.249692892431034 1.975233671612488
Epoch 129: 1.2442021578463776 1.9552628960704093
Epoch 130: 1.2386105150995876 1.982099666924853
Epoch 131: 1.2355586187412317 1.9757114294223843
Epoch 132: 1.2443294895643744 2.101623933536461
Epoch 133: 1.2299004597831222 1.9768187344562476
Epoch 134: 1.2276038542035699 1.9822626470879492
Epoch 135: 1.2277446415672508 1.9615810105868339
Epoch 136: 1.2286379676808046 2.0355041539229313
Epoch 137: 1.2211714193515004 1.9701469343010707
Epoch 138: 1.2298287656838378 2.024296033663489
Epoch 139: 1.227733776450261 2.035072556742687
Epoch 140: 1.2188039214924324 1.9586362914585522
Epoch 141: 1.221958994511311 2.018941730208499
Epoch 142: 1.2157233735073354 1.9671959386100284
Epoch 143: 1.2124366156263169 1.9592948983993996
Epoch 144: 1.2147369220284092 2.0442536562207776
Epoch 145: 1.205955712341693 1.9694423842214128
Epoch 146: 1.2049971023807025 1.9859694713761042
Epoch 147: 1.2075221330905666 2.009002905561291
Epoch 148: 1.2115221193191605 2.059401540299177
Epoch 149: 1.2025709919887162 1.982096858804897
Epoch 150: 1.2025820405123533 1.9921201608490595
Epoch 151: 1.2044366839866598 2.0146807988967903
Epoch 152: 1.206892541459285 2.037165558406217
Epoch 153: 1.2084186753224047 2.0514455070803828
Epoch 154: 1.2089888077827122 2.051536688143954
Epoch 155: 1.2076227373011457 1.9988099365677854
Epoch 156: 1.2037083483319981 1.9950064987376799
Epoch 157: 1.206526240140426 2.074083444964129
Epoch 158: 1.1969956856897452 1.9783641111066796
Epoch 159: 1.1988308898232631 1.9543139803979717
Epoch 160: 1.1995691446186438 1.937559081054288
Epoch 161: 1.2038438870466412 2.015030834060468
Epoch 162: 1.1985588246541923 1.9698469983800948
Epoch 163: 1.1958191873606725 1.9663497539993224
Epoch 164: 1.186928424021493 1.86455626410724
Epoch 165: 1.1928061814986093 2.0497622851939536
Epoch 166: 1.1865071366714997 2.0408127489658647
Epoch 167: 1.182851618741269 2.0110138481333046
Epoch 168: 1.18782422094513 2.0552683461751196
Epoch 169: 1.1897908718277397 2.0792826146138967
Epoch 170: 1.1895586261808533 2.020543573678281
Epoch 171: 1.1825696897860893 1.9801100085432106
Epoch 172: 1.181820009229163 1.9799370528048312
Epoch 173: 1.1868381738615787 2.0495345787064054
Epoch 174: 1.1839766393778706 2.0473350252553804
Epoch 175: 1.1850485909968755 2.0716526425398674
Epoch 176: 1.1824535090862134 2.011224705734455
Epoch 177: 1.180767591043016 2.004658322219915
Epoch 178: 1.1821388960311063 1.988800626421598
Epoch 179: 1.1880522464685088 2.067303807558048
Epoch 180: 1.1759645896758149 1.9546667268531908
Epoch 181: 1.1790533790141011 1.9842002311282236
Epoch 182: 1.1836537100256175 2.0460500757626106
Epoch 183: 1.1822863317308494 2.025085525889144
Epoch 184: 1.172530841497514 1.9242741669889663
Epoch 185: 1.1823675397686417 2.0441537662254428
Epoch 186: 1.177468047390862 1.9588840402287566
Epoch 187: 1.1801243667511367 1.970880411546426
Epoch 188: 1.1867122725568682 2.071167968819989
Epoch 189: 1.175747819017282 1.9120770900835262
Epoch 190: 1.178488236180694 1.9711391056919325
Epoch 191: 1.1834159610241293 2.076521736901644
Epoch 192: 1.1840720138080807 2.0695169723793168
Epoch 193: 1.1878662150844443 2.1054361965492254
Epoch 194: 1.172204080715583 1.9193349078207402
Epoch 195: 1.1796266578616825 2.021273557852105
Epoch 196: 1.1756283433183137 1.978397476038657
Epoch 197: 1.179347870801012 2.057485247243262
Epoch 198: 1.1771510299609378 1.9839566352038045
Epoch 199: 1.1734142144784492 1.9413638781685132
Epoch 200: 1.1796858491714215 2.0257042728471295
Epoch 201: 1.1828027565096388 2.073049797618982
Epoch 202: 1.1836229294305753 2.105296552012404
Epoch 203: 1.1748330785870187 1.97842429934267
Epoch 204: 1.1710094849765988 1.9225888313967678
Epoch 205: 1.1750019318878657 2.0161089101204266
Epoch 206: 1.1688822960668843 1.9173751852467291
Epoch 207: 1.171000593729568 1.9496922473163611
Epoch 208: 1.1719395299814825 1.9846125819542637
Epoch 209: 1.1653718472989494 1.9657401115851765
Epoch 210: 1.1666777708673137 1.9969331887938948
Epoch 211: 1.1647159687365278 1.9909138128673325
Epoch 212: 1.1754452887943576 2.0857264475141384
Epoch 213: 1.169688967309767 2.054910054131244
Epoch 214: 1.169127580623336 2.0601206215450856
Epoch 215: 1.1652074626875797 1.9973034692573273
Epoch 216: 1.1702852193446573 2.0519593792424358
Epoch 217: 1.1646162669879867 1.968465416447997
Epoch 218: 1.1707704669426298 2.0249056512171877
Epoch 219: 1.1724593556038376 2.082380488563087
Epoch 220: 1.1600188422806863 1.9533301129439185
Epoch 221: 1.1603990597387488 1.9908556837124307
Epoch 222: 1.172306977616543 2.149832223460105
Epoch 223: 1.1533139870665958 1.897368373790175
Epoch 224: 1.1603077278460978 1.9740605449537711
Epoch 225: 1.1610865668922927 2.0156478512874445
Epoch 226: 1.1598149090178131 2.0062030866511695
Epoch 227: 1.1629546106433142 2.010542546698093
Epoch 228: 1.1636366920994934 2.0645587753976815
Epoch 229: 1.158441789962946 1.9732275213983936
Epoch 230: 1.1594068941620521 2.0211588360306543
Epoch 231: 1.157992091904049 1.9899786212760588
Epoch 232: 1.1586025563761901 1.9669302651828673
Epoch 233: 1.158935020774732 1.9624153218725968
Epoch 234: 1.1780146089004881 2.2084053499839253
Epoch 235: 1.1624086023948017 1.9904183594049172
Epoch 236: 1.1687114571640016 2.0243325338089573
Epoch 237: 1.163780074230273 2.0115919918004996
Epoch 238: 1.1593855584185195 1.9610645305963619
Epoch 239: 1.1557913057904843 1.9313205589116578
Epoch 240: 1.1604861873712111 2.0147368305307953
Epoch 241: 1.1643855379493093 2.0308919296252905
Epoch 242: 1.1684656530568671 2.0836439974908796
Epoch 243: 1.156170351707462 1.9437864267698877
Epoch 244: 1.1632490253614063 2.0331559937553387
Epoch 245: 1.1625416949346654 1.9878681854250837
Epoch 246: 1.1743225954897536 2.139668084828036
Epoch 247: 1.160922425872473 1.9673081971292015
Epoch 248: 1.158370283489093 1.9186952135315374
Epoch 249: 1.1590132165288352 1.9575008370562035
Epoch 250: 1.1692487934322693 2.047704473254694
Epoch 251: 1.1661088708383454 2.0272174059642936
Epoch 252: 1.164244940978814 1.9922728331857886
Epoch 253: 1.1740138877833493 2.144034922963351
Epoch 254: 1.1627478961724935 1.9765857849724182
Epoch 255: 1.157342812461918 1.9176534632661586
Epoch 256: 1.167310602022143 2.023937327632803
Epoch 257: 1.1664092885855957 2.0225094450202428
Epoch 258: 1.166403114048588 2.0214241796771435
Epoch 259: 1.1640833336509528 1.9501414768095904
Epoch 260: 1.172884748612123 2.047402314044348
Epoch 261: 1.1733694938120844 2.0709007251204166
Epoch 262: 1.1703883076248496 1.9846698128270954
Epoch 263: 1.1721027556301264 2.037220774305596
Epoch 264: 1.17839549649102 2.1112297457663614
Epoch 265: 1.1654972084590798 1.9444411088542253
Epoch 266: 1.1770102224348051 2.0984330461842684
Epoch 267: 1.1661324837193872 1.9735288321449527
Epoch 268: 1.1684228835987054 2.0136744531303976
Epoch 269: 1.162849000284166 1.9527235425674798
Epoch 270: 1.1734050786860557 2.0986326550647374
Epoch 271: 1.160698086530356 1.968205613519547
Epoch 272: 1.1718205074971704 2.087588337895705
Epoch 273: 1.1707801461385712 2.05541134597421
Epoch 274: 1.1691796980186806 2.0359205380058287
Epoch 275: 1.1681172510742326 2.0465446260179836
Epoch 276: 1.1668021632958008 2.055020724857271
Epoch 277: 1.151646609766444 1.9027054110156603
Epoch 278: 1.1630268716903074 2.0320960711046787
Epoch 279: 1.1532204157848187 1.9267965144776447
Epoch 280: 1.1545101829463142 1.9504711296639747
Epoch 281: 1.1539752359172013 1.9155125002617734
Epoch 282: 1.1605993185017476 1.9984865154380536
Epoch 283: 1.16832157879913 2.0808959291191385
Epoch 284: 1.1571226626221311 1.9496603321626576
Epoch 285: 1.1637392503157475 1.966644596396405
Epoch 286: 1.1703061954315055 2.0508837977273955
Epoch 287: 1.1661373161151396 1.9581123052424907
Epoch 288: 1.1716489545640907 2.0660943356351633
Epoch 289: 1.162210911845686 2.000755633795154
Epoch 290: 1.1665233458674846 2.103888995353846
Epoch 291: 1.152381620806598 2.041867334236522
Epoch 292: 1.1533313698555427 2.0270061674169306
Epoch 293: 1.1540479557480243 2.0564307797064973
Epoch 294: 1.1530786586022215 2.015652244367046
Epoch 295: 1.1586921442683673 2.0977453777519637
Epoch 296: 1.1466277814629358 1.9802845942481904
Epoch 297: 1.1484481049977502 1.9713593809492327
Epoch 298: 1.145805258441764 1.9447097646418092
Epoch 299: 1.1476998763032493 1.9471699750330864
Epoch 300: 1.1475673482226003 1.946809335925523
Epoch 301: 1.1450321497622773 1.9366124455646785
Epoch 302: 1.1610396227941684 2.084448184315205
Epoch 303: 1.15617334776939 2.066528471403558
Epoch 304: 1.157186834932992 2.039126665822382
Epoch 305: 1.1518226564498195 2.0724202431858814
Epoch 306: 1.1475072952738203 2.0187104774884315
Epoch 307: 1.144911046485268 1.9691664109997982
Epoch 308: 1.1436634433193602 1.9981925681435333
Epoch 309: 1.1440556245610136 1.9958492184228103
Epoch 310: 1.1416845744986281 1.9567346183934688
Epoch 311: 1.1465242589209208 1.9965803402508104
Epoch 312: 1.1402822568281583 1.9572394426584003
Epoch 313: 1.1444256744712749 1.9917700404530183
Epoch 314: 1.1341670575342746 1.8563959255071167
Epoch 315: 1.1451735809680927 1.9990284579202906
Epoch 316: 1.1448462684191392 1.946829763991842
Epoch 317: 1.1460292976769055 1.9591527701272857
Epoch 318: 1.1524783638351412 2.0395256491258884
Epoch 319: 1.1481255014746177 1.955782751168601
Epoch 320: 1.1535458673634562 2.018376299089275
Epoch 321: 1.1564232403894963 2.077434874460244
Epoch 322: 1.147254973345927 1.9583338917817799
Epoch 323: 1.1516395714656882 2.009468785579987
Epoch 324: 1.1557673220619185 2.063228215257118
Epoch 325: 1.1534744287737841 1.9935416955109886
Epoch 326: 1.150181727132005 1.949609916674194
Epoch 327: 1.1441988979348574 1.869514673151032
Epoch 328: 1.1501336543379006 1.9673135310095184
Epoch 329: 1.1511906713198927 1.9199337926780946
Epoch 330: 1.1494617877918734 1.9073749427857871
Epoch 331: 1.1469113985657917 1.9805302170310206
Epoch 332: 1.1541405113958176 2.0325597852045463
Epoch 333: 1.1402980906326248 1.8390940750249538
Epoch 334: 1.1531753874448596 1.9989140653485673
Epoch 335: 1.1567599623902975 2.017653960628913
Epoch 336: 1.1581622391493147 2.015007567430858
Epoch 337: 1.1590143212810402 2.0609980775168237
Epoch 338: 1.1510751732507323 2.001375491561636
Epoch 339: 1.1481402813804618 1.9660002677298571
Epoch 340: 1.1493973273100468 1.9400570063599067
Epoch 341: 1.1583282367595962 2.0790651999854237
Epoch 342: 1.1546969443141344 2.019734985064594
Epoch 343: 1.1527831700666127 1.9674135205163883
Epoch 344: 1.1549645414004304 2.0057042393205355
Epoch 345: 1.1576926450047145 2.0576406101941864
Epoch 346: 1.161981814267257 2.0685419449014533
Epoch 347: 1.1618604580758003 2.0649695637689844
Epoch 348: 1.1562319540905006 1.9578249376360854
Epoch 349: 1.1625951733893936 1.9825105915855945
Epoch 350: 1.157983377883249 1.9215557186068493
Epoch 351: 1.1642282348431414 1.9658373090480363
Epoch 352: 1.1528841361924658 1.8699415500500947
Epoch 353: 1.1588619403644715 1.973303504547195
Epoch 354: 1.1644781193218436 2.0546850735644964
Epoch 355: 1.1603904375276721 2.025002971930877
Epoch 356: 1.1549509698733604 1.9856711113014198
Epoch 357: 1.1448840032833865 1.8643261327266034
Epoch 358: 1.1526553868308187 1.9910940344948775
Epoch 359: 1.1590624478940748 2.0905241266197585
Epoch 360: 1.1438425028437778 1.8713452738982141
Epoch 361: 1.1401137304292206 1.8118780505269045
Epoch 362: 1.1495959026983391 1.9370362546355204
Epoch 363: 1.1484417214953213 1.9652754975670976
Epoch 364: 1.1488545201409164 1.990225168446491
Epoch 365: 1.1446320914497148 1.9126347675717308
Epoch 366: 1.149701200028218 1.9804858077778578
Epoch 367: 1.143652647759062 1.867703685220218
Epoch 368: 1.156057605455258 2.0361489292896566
Epoch 369: 1.1523585704588584 2.0066252854364075
Epoch 370: 1.1545886779431278 2.0608739349648895
Epoch 371: 1.1513732490007436 1.9781191873402264
Epoch 372: 1.153112913284324 1.9985838970000227
Epoch 373: 1.1520276656465205 1.9937025191088742
Epoch 374: 1.153933670869458 2.031596796668612
Epoch 375: 1.1538351175449262 2.037474678249571
Epoch 376: 1.1520646242862507 2.017049789453241
Epoch 377: 1.1466829516420303 1.9419215391653144
Epoch 378: 1.163010896069329 2.079237976454214
Epoch 379: 1.1500419394047414 1.9335876359293516
Epoch 380: 1.157359178919251 2.0173619548181474
Epoch 381: 1.1553387148870204 1.9869264142006289
Epoch 382: 1.150777670711536 1.89310014361363
Epoch 383: 1.1604969603315158 2.0407271442227106
Epoch 384: 1.1523358843246543 1.9426426565179926
Epoch 385: 1.1635756840442424 2.004076157846764
Epoch 386: 1.1673614420569172 2.0596411901504927
Epoch 387: 1.1585544994201846 1.9538972375490153
Epoch 388: 1.1635871495397658 2.02419123848167
Epoch 389: 1.1626716170881763 1.9844917942397182
Epoch 390: 1.1640985607676797 1.9942870242566482
Epoch 391: 1.1667433246462153 2.012710781308925
Epoch 392: 1.166057448474898 2.013157912048576
Epoch 393: 1.1625924218958799 1.9780533169102879
Epoch 394: 1.1677546394375398 2.0679503119367917
Epoch 395: 1.165376031086863 2.040136965156445
Epoch 396: 1.158296853822276 1.9564699927587872
Epoch 397: 1.1606700470871392 1.9480817414503098
Epoch 398: 1.168912601731494 2.041150736836026
Epoch 399: 1.1706081338896337 2.021184077249987
Epoch 400: 1.1801842754484062 2.1239415606354117
Epoch 401: 1.1677212057163933 2.012395110885921
Epoch 402: 1.1672601497312562 2.0311179189639055
Epoch 403: 1.1446005340238736 1.745576950488077
Epoch 404: 1.16644155750266 2.0371263086647105
Epoch 405: 1.160096559777111 1.9573745798179463
Epoch 406: 1.1561184001485691 1.9270445802835297
Epoch 407: 1.1564428794892025 1.9534130611724356
Epoch 408: 1.157832333324879 1.9990185814286516
Epoch 409: 1.1542536578440337 1.9603170999631656
Epoch 410: 1.1579641202641253 2.0197871817756243
Epoch 411: 1.1483763991029288 1.9513250981321313
Epoch 412: 1.1404922532824875 1.9209794035824443
Epoch 413: 1.1507849734020454 2.0644281753554914
Epoch 414: 1.1522729146307125 2.014082678488246
Epoch 415: 1.1539226907918823 2.0452288186933716
Epoch 416: 1.1553935535896283 2.0516110385635793
Epoch 417: 1.1522612541246897 2.0132793132417066
Epoch 418: 1.1445911719870643 1.8998976720242748
Epoch 419: 1.15111755242973 1.9942442880638716
Epoch 420: 1.1492918953099753 1.9723618584340097
Epoch 421: 1.1508398588156288 2.0285906972912917
Epoch 422: 1.1457613876618533 1.9442137226381164
Epoch 423: 1.1544611539760208 2.033125521968535
Epoch 424: 1.158597511625977 2.0883628018192923
Epoch 425: 1.1571397943655337 2.030533907318707
Epoch 426: 1.1485143165837068 1.9784233945860508
Epoch 427: 1.154701391933309 1.9971076092250635
Epoch 428: 1.153064423920393 1.9959524721314952
Epoch 429: 1.1567851179708466 2.019159503275338
Epoch 430: 1.1571300814502 2.0130031186865063
Epoch 431: 1.1557711599737877 1.961595185581823
Epoch 432: 1.1546534087018567 1.9417015113839
Epoch 433: 1.1600875830711923 1.9985945740120918
Epoch 434: 1.1621550174809294 2.0081173436065765
Epoch 435: 1.1647858119071102 2.0069931180021467
Epoch 436: 1.171922539030513 2.0754662230190997
Epoch 437: 1.167478389331363 1.9709286580435443
Epoch 438: 1.1584175000364914 1.8390082675323287
Epoch 439: 1.1702472757144489 1.9891024656575065
Epoch 440: 1.171445495576249 2.0087085929409083
Epoch 441: 1.1707096902503504 1.9762975241864922
Epoch 442: 1.1637849791651869 1.9026433988163767
Epoch 443: 1.171386642660623 1.9761437663208812
Epoch 444: 1.1779935021986057 2.046505549169825
Epoch 445: 1.1671231314509358 1.9652553164672595
Epoch 446: 1.1684116342751572 2.0495697045197634
Epoch 447: 1.1647003781266703 1.9978788500721527
Epoch 448: 1.1622405847121584 1.9532584123270598
Epoch 449: 1.163145617594456 1.97476781288562
Epoch 450: 1.1660072402971797 1.9742775233108107
Epoch 451: 1.1754628552358457 2.0751210976336836
Epoch 452: 1.1735655090109642 2.0469302285938347
Epoch 453: 1.169287003886958 2.002149720380278
Epoch 454: 1.1635890708472378 1.9611595279040226
Epoch 455: 1.1715093753124384 2.069328916004224
Epoch 456: 1.1680462317101417 2.0319407602859423
Epoch 457: 1.172094998018456 2.0318137992892273
Epoch 458: 1.160355532966944 1.8619506742616743
Epoch 459: 1.173269399075153 2.027583132221297
Epoch 460: 1.1684714341198927 1.963550735443599
Epoch 461: 1.170713830400304 2.044528597373633
Epoch 462: 1.1771672554199253 2.152225397121842
Epoch 463: 1.165837688657203 2.0185833196372154
Epoch 464: 1.1737217995420748 2.1373669706733183
Epoch 465: 1.1578672638403995 1.9871013310791956
Epoch 466: 1.155718232390874 1.9557080632755361
Epoch 467: 1.1557765390859782 1.9816049106330598
Epoch 468: 1.1618268652269887 2.070405227570197
Epoch 469: 1.161491829599467 2.055096537614472
Epoch 470: 1.1501254958475469 1.9661249599952544
Epoch 471: 1.1572012124584645 2.0097660872796412
Epoch 472: 1.1589604474043982 2.0302610947822126
Epoch 473: 1.1570301975614823 2.023791417809254
Epoch 474: 1.154208144563905 1.9919144916032592
Epoch 475: 1.1519935009767621 1.9600780058834755
Epoch 476: 1.1523555948256758 1.9765881895154538
Epoch 477: 1.157402986756551 1.9904666517287626
Epoch 478: 1.1633983269171395 2.052477744992646
Epoch 479: 1.1599612891715394 1.977522581958186
Epoch 480: 1.1630221140374917 2.0175865662949612
Epoch 481: 1.1603236425247734 2.0134881202380304
Epoch 482: 1.1622428147093367 2.041719852029051
Epoch 483: 1.1663182676893415 2.053700356965903
Epoch 484: 1.1715021454571897 2.1250912513664217
Epoch 485: 1.1693220355043468 2.096288016694156
Epoch 486: 1.162186265069271 2.007679351313094
Epoch 487: 1.156178143974997 1.977949535688021
Epoch 488: 1.1602071602457047 1.994003641212097
Epoch 489: 1.1628305001357682 2.015400203164338
Epoch 490: 1.161280006087401 2.003279628817588
Epoch 491: 1.1599510493939122 2.002753704350214
Epoch 492: 1.1601241418913693 1.98766468931923
Epoch 493: 1.1622395894987398 1.9936463538182703
Epoch 494: 1.157830575420715 1.9005282558070824
Epoch 495: 1.1624944319141972 1.9714438683768225
Epoch 496: 1.1622015660822151 1.9369276388742183
Epoch 497: 1.1688861721589185 1.9886871017866947
Epoch 498: 1.1696200308857707 2.0001774086437893
Epoch 499: 1.1751592349446145 2.0805927706849436
Epoch 500: 1.1726821241284315 2.0608937547750488
Epoch 501: 1.1693244711262787 2.0588277961979427
Epoch 502: 1.164318861403886 1.9568837861962476
Epoch 503: 1.1674620721851714 2.001643097605654
Epoch 504: 1.1686390240785633 2.0403778965299755
Epoch 505: 1.160558346790356 1.9947000458052078
Epoch 506: 1.1550101382922815 1.9553919771250783
Epoch 507: 1.1633863462773795 2.047664305171632
Epoch 508: 1.161234130215018 2.004647106635081
Epoch 509: 1.163735588214571 2.0240105576567906
Epoch 510: 1.154880290041301 1.9186242647350635
Epoch 511: 1.165342490479345 2.0470743379697582
Epoch 512: 1.1643114503808094 2.0418126189070493
Epoch 513: 1.157790152976131 1.9940304144216299
Epoch 514: 1.16612453996357 2.0405570445462846
Epoch 515: 1.1624344094319572 1.9741268566492267
Epoch 516: 1.1655260982797748 2.001969136200936
Epoch 517: 1.1724045019844676 2.091767385080294
Epoch 518: 1.1716510265738609 2.0843143345325106
Epoch 519: 1.1686269961900118 2.073748595118983
Epoch 520: 1.1638694305193875 2.0280956923017217
Epoch 521: 1.1636480533268407 2.0332406500489677
Epoch 522: 1.1603089324484057 1.9955956478220662
Epoch 523: 1.1655246105364943 2.0737222183350674
Epoch 524: 1.1662126905080017 2.076627691101107
Epoch 525: 1.1568865515059628 1.974100652504382
Epoch 526: 1.168518075087314 2.116093664338233
Epoch 527: 1.1739213365871974 2.1025021700328166
Epoch 528: 1.1640961527037137 2.0044410213216626
Epoch 529: 1.1672718428903104 2.0129850523107686
Epoch 530: 1.161172217703352 1.9262876678915872
Epoch 531: 1.1585188388036571 1.9180818748253303
Epoch 532: 1.1653540395516186 2.0225110646631754
Epoch 533: 1.157019824202749 1.9078821673186914
Epoch 534: 1.1661352704968753 2.005397500866252
Epoch 535: 1.1627653941287703 1.9065647246932755
Epoch 536: 1.1578514209429478 1.8395637428079765
Epoch 537: 1.1662191899152756 2.00156943830715
Epoch 538: 1.168201945668109 2.023471497362851
Epoch 539: 1.1665228526276126 2.0644991500670145
Epoch 540: 1.1658816613974488 2.058665702998422
Epoch 541: 1.1643447355846515 1.986067069948219
Epoch 542: 1.162476253404076 1.9906159550427986
Epoch 543: 1.1747341048941513 2.15537441684528
Epoch 544: 1.158303334439897 1.940222766571229
Epoch 545: 1.167750097576072 2.047987879505625
Epoch 546: 1.161499969447288 1.9796463160817586
Epoch 547: 1.1618397809182983 2.0145491275750587
Epoch 548: 1.1530248211180012 1.9542380628345206
Epoch 549: 1.1531222017080704 1.9157917081991016
Epoch 550: 1.1635858828490209 2.0209805841913973
Epoch 551: 1.1579552887038964 1.9522790453630605
Epoch 552: 1.1649831808558848 2.046686158613785
Epoch 553: 1.1640074373615312 2.003159153474343
Epoch 554: 1.1651457482691459 2.0102017022980583
Epoch 555: 1.1715635029370517 2.0164581257660914
Epoch 556: 1.1764307339401388 2.0724774274837436
Epoch 557: 1.1669823043140815 1.9717705460185897
Epoch 558: 1.1744484834107063 1.997186379174134
Epoch 559: 1.169286604704071 1.9482003074049572
Epoch 560: 1.1708232269684675 1.9670896160116247
Epoch 561: 1.177918040159993 2.0833842383961665
Epoch 562: 1.175450588919051 2.0409732812502694
Epoch 563: 1.1757013259264324 2.036028007875955
Epoch 564: 1.169852341967423 1.9178011529882681
Epoch 565: 1.1839854524569744 2.1097700742101275
Epoch 566: 1.1699901890362303 1.9443972610595948
Epoch 567: 1.1784189477903166 2.0134108575047995
Epoch 568: 1.1724111060404596 1.9501752190008588
Epoch 569: 1.1793644855150411 2.058851187496867
Epoch 570: 1.1710704634361726 1.9844881108378327
Epoch 571: 1.170637349147944 2.0128491214203437
Epoch 572: 1.1674859871552459 1.995827220735414
Epoch 573: 1.1602092563525992 1.937681733563966
Epoch 574: 1.1562973452235896 1.9081278689182501
Epoch 575: 1.1601815573497287 1.9425865690783073
Epoch 576: 1.1618195147540886 1.9472110697687859
Epoch 577: 1.1708741364461133 2.043250021263253
Epoch 578: 1.1676068093394385 2.003299759582118
Epoch 579: 1.1651953384573908 1.9480119950750352
Epoch 580: 1.1729481290413073 2.0578489946616196
Epoch 581: 1.1756518105924687 2.081163220123037
Epoch 582: 1.1665099587772048 1.9747648880773814
Epoch 583: 1.167405836182548 1.9592179552107116
Epoch 584: 1.1746998654284155 2.0028697591643874
Epoch 585: 1.1773365248807184 2.0719121468963726
Epoch 586: 1.174884035757471 2.0468189831104544
Epoch 587: 1.166853113745297 1.9770294711982148
Epoch 588: 1.164852928893286 1.9380153940723435
Epoch 589: 1.1641376525904508 1.9377478395940595
Epoch 590: 1.1599106893044446 1.9057579256872386
Epoch 591: 1.1709976070406567 2.029604329245804
Epoch 592: 1.1716528984848746 2.0380568684025455
Epoch 593: 1.1686722556889133 2.0331959031481532
Epoch 594: 1.1632339738919797 1.9820065033475909
Epoch 595: 1.1564164096740799 1.9271097108640185
Epoch 596: 1.1666694828568867 2.0724520045260295
Epoch 597: 1.1537707827222201 1.9181065128036707
Epoch 598: 1.1571075122251715 1.9705058791561187
Epoch 599: 1.1721611168487915 2.1617614972119306
Epoch 600: 1.155183953593385 1.9749704550186957
Epoch 601: 1.1554366688928757 1.9844079087563904
Epoch 602: 1.1647280640469377 2.081337702842052
Epoch 603: 1.1658496108379754 2.1060936098291947
Epoch 604: 1.14758503546801 1.8717290219119311
Epoch 605: 1.1510322758027156 1.9657134666956575
Epoch 606: 1.1535227944197683 1.989369661635029
Epoch 607: 1.1538889452189671 2.0018450944565944
Epoch 608: 1.1635852958354194 2.1179681261409202
Epoch 609: 1.1554217599794048 1.9053692359601369
Epoch 610: 1.1697281053661879 2.063683085157363
Epoch 611: 1.1654273181045691 2.0749523365744253
Epoch 612: 1.1532446422864804 1.9181586657893415
Epoch 613: 1.163051758110536 2.046855736204119
Epoch 614: 1.151106979641301 1.9490453567949784
Epoch 615: 1.1589446331299682 2.0621825131187985
Epoch 616: 1.156008845475296 2.0593705707766103
Epoch 617: 1.1561201349749786 2.077364593638815
Epoch 618: 1.1479035736525844 2.008954274865795
Epoch 619: 1.144524703842288 1.980690356318616
Epoch 620: 1.136992737736608 1.920267931915191
Epoch 621: 1.1449055065162845 2.0256836009563326
Epoch 622: 1.1369548617973584 1.9139524101299534
Epoch 623: 1.1394590802641777 1.9412799156168985
Epoch 624: 1.1440634230707285 1.9759853212911
Epoch 625: 1.1377341050508412 1.8056331460596537
Epoch 626: 1.154621556420888 2.0517240402014973
Epoch 627: 1.1437171562739505 1.9261258346472023
Epoch 628: 1.150385307559757 2.027778902322227
Epoch 629: 1.1530967199002244 2.0626847385825093
Epoch 630: 1.1531172636195939 2.065527734414501
Epoch 631: 1.1490742275494856 1.9638324313626334
Epoch 632: 1.155315310074967 2.102823402912834
Epoch 633: 1.1417601690775259 1.9172810647771976
Epoch 634: 1.1510159534567739 2.010997298421388
Epoch 635: 1.149005741676144 1.9937017899446772
Epoch 636: 1.155222921847631 2.01113908577509
Epoch 637: 1.150820535642392 1.9526606615164677
Epoch 638: 1.1551409338435366 2.000433609996005
Epoch 639: 1.1578834103979951 2.0468988071021372
Epoch 640: 1.1565305856473265 2.0128183635846515
Epoch 641: 1.1591606428192935 2.01835623644454
Epoch 642: 1.1569524851263293 2.045253459625615
Epoch 643: 1.1514787531975645 1.9551687657003172
Epoch 644: 1.163649621435769 2.0838699038768436
Epoch 645: 1.1472411948456802 1.9178902458696894
Epoch 646: 1.1447468750038605 1.8467790295194404
Epoch 647: 1.165686895471076 2.1111351565553234
Epoch 648: 1.1576923239875891 1.9899596857567805
Epoch 649: 1.161485272855309 2.025151149884347
Epoch 650: 1.1597597440083867 1.908835408981916
Epoch 651: 1.1650400581070193 2.0420247761455457
Epoch 652: 1.1675470577075604 2.062979899616677
Epoch 653: 1.159614412837413 1.994903243221295
Epoch 654: 1.1582505049578176 1.9941648150426208
Epoch 655: 1.1634176578590663 2.057592596278545
Epoch 656: 1.1611885297523123 2.007311825230074
Epoch 657: 1.154168506409637 1.9096176697993068
Epoch 658: 1.1658267998253125 2.0735413463909116
Epoch 659: 1.1622960295547577 1.9920497664560168
Epoch 660: 1.1605113725197804 1.9769764654165227
Epoch 661: 1.1688240775030034 2.0720466891643783
Epoch 662: 1.166474635389868 2.043905242100123
Epoch 663: 1.1561023168581162 1.891245447819258
Epoch 664: 1.170902381424022 2.0453216165097943
Epoch 665: 1.1735445476514155 2.0428657951344187
Epoch 666: 1.172054013314705 2.0262399185341877
Epoch 667: 1.1708034841598174 2.0146171031004765
Epoch 668: 1.1671142915453008 1.9348631427426084
Epoch 669: 1.1775445636369115 2.0477901564859926
Epoch 670: 1.1644419422264496 1.919474895221312
Epoch 671: 1.1741559338165275 2.074967660671925
Epoch 672: 1.1731059472314012 2.0687478375680235
Epoch 673: 1.1625465689120127 1.9400170806618209
Epoch 674: 1.1621233434513776 1.9331084122401405
Epoch 675: 1.172690681588269 2.078074424231667
Epoch 676: 1.1673711696972886 2.0179073495059945
Epoch 677: 1.1706102870624606 2.0697200328933634
Epoch 678: 1.1703580494763104 1.9874324564763488
Epoch 679: 1.1619811693165076 1.8821263438324085
Epoch 680: 1.1760639541364406 2.0506844372123743
Epoch 681: 1.1768429181245823 2.0749670618770857
Epoch 682: 1.1636128240680195 1.8879778704396915
Epoch 683: 1.1709499629950786 1.9835964000124204
Epoch 684: 1.176966935674532 2.0451598736206993
Epoch 685: 1.1748268450232888 2.010486718394343
Epoch 686: 1.172105455781458 1.9888604323678445
Epoch 687: 1.1773147388707024 2.0552232315545726
Epoch 688: 1.174981523032786 2.025749691408062
Epoch 689: 1.175802764810576 2.0272108047081865
Epoch 690: 1.1741388731261873 2.0438495359399673
Epoch 691: 1.1645637415696226 1.914852964763105
Epoch 692: 1.1775437201929273 2.0837327430593087
Epoch 693: 1.171021545689541 2.0380823234432075
Epoch 694: 1.1720299111368093 2.077853785353434
Epoch 695: 1.166607405867305 2.0266663943476817
Epoch 696: 1.1616206086636185 2.0041200728231723
Epoch 697: 1.1650256609464475 2.0303823418621296
Epoch 698: 1.164814414701397 1.9838895901176739
Epoch 699: 1.1743731417049934 2.121649344069409
Epoch 700: 1.1729325183302142 2.090510587257664
Epoch 701: 1.167460302805175 1.9853618003596634
Epoch 702: 1.171378151734284 2.053763389219159
Epoch 703: 1.1702239862054735 2.055377866431336
Epoch 704: 1.1655974083142302 1.99677623915332
Epoch 705: 1.1637079902216065 1.9796444231160475
Epoch 706: 1.1657164709848369 1.9887143991472445
Epoch 707: 1.1675292832336142 2.0241719840109442
Epoch 708: 1.1698232737370793 2.035350913839469
Epoch 709: 1.1734883786457513 2.058245749818177
Epoch 710: 1.178176428028091 2.0956221073814563
Epoch 711: 1.1707212472550819 2.0256728162977207
Epoch 712: 1.1575486624890918 1.8194403182077832
Epoch 713: 1.177523685965058 2.0673595592156992
Epoch 714: 1.1747159994855776 2.043155876208187
Epoch 715: 1.1710098799379844 1.9762692647653097
Epoch 716: 1.1640454907921152 1.8903064571396975
Epoch 717: 1.1762622377686232 2.0574860911681907
Epoch 718: 1.1787825645581391 2.002981652644582
Epoch 719: 1.1760841297560203 1.9935265856839561
Epoch 720: 1.1809428938255737 2.0650923158709085
Epoch 721: 1.1765205975787607 1.9578502910758795
Epoch 722: 1.1774642175778434 2.019709090157791
Epoch 723: 1.1680699833280164 1.898312445095025
Epoch 724: 1.1761840770108003 2.0462077852453637
Epoch 725: 1.1718479222055256 2.0058071438445033
Epoch 726: 1.1664817877999039 1.9096610999007464
Epoch 727: 1.1724817886067012 1.9513913121667488
Epoch 728: 1.177878687532817 2.0048043751218154
Epoch 729: 1.1805702427200604 2.038253110489647
Epoch 730: 1.1731762805250165 1.9773643250294353
Epoch 731: 1.1719481861256338 1.9681530248838734
Epoch 732: 1.170958734170868 1.931736000361683
Epoch 733: 1.1809837229509716 2.076098698308606
Epoch 734: 1.1814920816011996 2.1008648023647707
Epoch 735: 1.1739589961909729 1.9850473133869209
Epoch 736: 1.175209416452425 2.010759548096611
Epoch 737: 1.17394430015845 1.980847883428966
Epoch 738: 1.1742166094964246 1.9785416003352714
Epoch 739: 1.176104874697144 1.9923927012366787
Epoch 740: 1.1713424931242489 1.8749667813350874
Epoch 741: 1.179252499460744 1.9830882619410506
Epoch 742: 1.1719552117063976 1.8948350185906075
Epoch 743: 1.1823893626823063 2.0678006992277806
Epoch 744: 1.1780966097356345 1.998036497360876
Epoch 745: 1.179165780753876 2.004630972509234
Epoch 746: 1.1835778630200546 2.0950285707921648
Epoch 747: 1.1766786813161292 2.044100782536718
Epoch 748: 1.160163503077738 1.9177065378896447
Epoch 749: 1.1675883035849126 2.014212936737423
Epoch 750: 1.1634300571662926 1.9721611100038532
Epoch 751: 1.1715877858447237 2.043941039719496
Epoch 752: 1.1584171500053826 1.8746409041865049
Epoch 753: 1.1763127461801561 2.0918118518975723
Epoch 754: 1.1659716692903694 1.9742224619202564
Epoch 755: 1.1646604634177344 1.9361122857977011
Epoch 756: 1.160620833248367 1.8662711354233066
Epoch 757: 1.1661982405581681 1.968819352593975
Epoch 758: 1.1629045745496742 1.9550035485315502
Epoch 759: 1.1647548221153303 1.9730341239745295
Epoch 760: 1.1663056539956702 1.965890594274193
Epoch 761: 1.1779151897816287 2.081608721625768
Epoch 762: 1.1693962316786333 1.941286889886656
Epoch 763: 1.1805267074161756 2.0743145136767094
Epoch 764: 1.1800985705084266 2.0668825274432767
Epoch 765: 1.1726861160529571 1.956012847673854
Epoch 766: 1.1877647813139627 2.083771880721844
Epoch 767: 1.1759959529950046 1.9502345357584432
Epoch 768: 1.180106600845589 2.030203465317662
Epoch 769: 1.1785547996239625 2.0075038810866097
Epoch 770: 1.1721198204166714 1.9509697428767063
Epoch 771: 1.1748246698164564 1.9928669913981039
Epoch 772: 1.1730788942530228 1.9950697028851319
Epoch 773: 1.175246930868462 1.9668602729713875
Epoch 774: 1.1734417451383476 1.9514775324234275
Epoch 775: 1.1865747041781565 2.098993134805362
Epoch 776: 1.188501736263222 2.0506954757968603
Epoch 777: 1.182852656139179 1.9646481114324323
Epoch 778: 1.185967165253824 2.013231001970683
Epoch 779: 1.1844395490522908 2.0910341881077166
Epoch 780: 1.1737915821547968 1.9646164138616604
Epoch 781: 1.191758869935296 2.18962453706787
Epoch 782: 1.1831959356758552 2.1198312684358753
Epoch 783: 1.1723311593442 1.9859166627501075
Epoch 784: 1.1744433846975588 2.019754884079452
Epoch 785: 1.1719566576658207 2.019318801231734
Epoch 786: 1.1710418213323723 1.988913309003219
Epoch 787: 1.1620316669598556 1.8707335762957873
Epoch 788: 1.1703379354775691 1.997393590357857
Epoch 789: 1.1719740907252663 2.025390894214763
Epoch 790: 1.1751951340536277 2.073895434476763
Epoch 791: 1.1649917139926171 1.9780477435131272
Epoch 792: 1.1703869198290446 2.025718385462993
Epoch 793: 1.1702765273424296 2.0614739778898983
Epoch 794: 1.1625825879514962 1.9226274177688791
Epoch 795: 1.1655086323726378 1.9208708143722197
Epoch 796: 1.1691174033705247 1.9006829597676416
Epoch 797: 1.1729928727329737 1.9640889316484784
Epoch 798: 1.1778463811902438 1.9921573108254367
Epoch 799: 1.1773514612461151 1.9667117251989434
Epoch 800: 1.177124613518903 2.0343926448254734
Epoch 801: 1.1749925961056273 1.985964658981048
Epoch 802: 1.179957250611606 2.047084039569667
Epoch 803: 1.178775628421662 2.0210086054432637
Epoch 804: 1.175220208981832 2.02350131862926
Epoch 805: 1.1673621446963132 1.960305978293802
Epoch 806: 1.1791041023504112 2.1120544016014757
Epoch 807: 1.178017568433982 2.0771402984542813
Epoch 808: 1.165647287412458 1.9602070874952493
Epoch 809: 1.1581522666958617 1.9120867281927372
Epoch 810: 1.1574695893847933 1.9553021737969036
Epoch 811: 1.1532611898613325 1.9119749320993886
Epoch 812: 1.1614700136038754 1.98716206728456
Epoch 813: 1.1671994116082518 2.0275299821752375
Epoch 814: 1.1666677665583243 2.0417365532450047
Epoch 815: 1.1697040029415398 2.0734529891667592
Epoch 816: 1.1686166547942265 2.0558238071783412
Epoch 817: 1.1601948873090822 2.010241210464497
Epoch 818: 1.150618385029723 1.9258408855428137
Epoch 819: 1.1636429174708893 2.0233682903580497
Epoch 820: 1.1622884142555794 2.0014636002126447
Epoch 821: 1.1555748556733847 1.906875567540085
Epoch 822: 1.1664335200140223 2.0571278055304023
Epoch 823: 1.1583224588173926 1.9869757506956676
Epoch 824: 1.1702089173813757 2.1543993462777062
Epoch 825: 1.1616518864819756 2.0773275184740485
Epoch 826: 1.1551967684697095 2.0063270888888423
Epoch 827: 1.152287628406903 1.9779246087420448
Epoch 828: 1.157850677359235 2.057287178586857
Epoch 829: 1.1556715647828006 2.0239885485233216
Epoch 830: 1.1565890096397193 2.0503171254560946
Epoch 831: 1.1595389084512515 1.98795193268577
Epoch 832: 1.1589800571798163 2.011033267696134
Epoch 833: 1.1556691202822504 1.9693737537898452
Epoch 834: 1.157665567164785 2.0208205094235487
Epoch 835: 1.1595353260212122 2.063046521990212
Epoch 836: 1.1557062861486722 1.9994191106493338
Epoch 837: 1.1565809689923208 1.983921711149201
Epoch 838: 1.1623608098363054 2.0646902809727394
Epoch 839: 1.1581531743891427 1.987295732082542
Epoch 840: 1.1590402311588939 1.9689879988643828
Epoch 841: 1.1646506455863843 2.0480950713908213
Epoch 842: 1.162215740353453 1.9799758074439537
Epoch 843: 1.1571640072959855 1.921655076066351
Epoch 844: 1.1580560500600952 1.9277735300278482
Epoch 845: 1.1649186113385384 1.9996409990007527
Epoch 846: 1.1646426722474732 2.003677002909093
Epoch 847: 1.163233802067977 1.9591389964503692
Epoch 848: 1.1707356847623558 2.063112166983802
Epoch 849: 1.1649031859386627 1.9707133701977613
Epoch 850: 1.1669768604197768 2.003781069027834
Epoch 851: 1.1666918279123084 1.9670802100651719
Epoch 852: 1.177111771987998 2.082291134554474
Epoch 853: 1.1714490445121664 1.9742670079602334
Epoch 854: 1.1690847946719853 1.9717368670033464
Epoch 855: 1.1704196198091974 2.001142604751988
Epoch 856: 1.1643601918228275 1.9346720864323468
Epoch 857: 1.169987727850821 1.9859981783905607
Epoch 858: 1.1679002555552322 1.9558523607403306
Epoch 859: 1.1700790689829543 1.9659696296969515
Epoch 860: 1.1702633562751767 2.035529723782505
Epoch 861: 1.1668381815422824 1.9730851200413737
Epoch 862: 1.1710780627357755 2.0021502399044393
Epoch 863: 1.1778865245313674 2.0584634789808436
Epoch 864: 1.1706303717546014 1.9793857058274713
Epoch 865: 1.1720638650705177 2.002020048112528
Epoch 866: 1.177027344548512 2.094128393383114
Epoch 867: 1.1590711620519718 1.930179604717959
Epoch 868: 1.1595880558962133 1.9178769953065604
Epoch 869: 1.163754641289815 1.9709385703174107
Epoch 870: 1.1626672733179608 1.9655228206886322
Epoch 871: 1.1630797077085382 2.0002004967166447
Epoch 872: 1.1660986739522754 2.071909935186307
Epoch 873: 1.160184645422692 1.9995763508452455
Epoch 874: 1.160804198929034 2.0300445505981894
Epoch 875: 1.1585661002935455 2.0156828588956004
Epoch 876: 1.1608319160411553 2.0008201812945963
Epoch 877: 1.1675099494614467 2.11487769035713
Epoch 878: 1.1590479574905752 2.005462536710277
Epoch 879: 1.1599498451300045 2.0019678492327615
Epoch 880: 1.1551786199453369 2.0250694826853666
Epoch 881: 1.1446999750255613 1.9146614964687192
Epoch 882: 1.1607304073807792 2.100712842396314
Epoch 883: 1.1587528035888388 2.075430378710201
Epoch 884: 1.1450873628445297 1.9458641513033033
Epoch 885: 1.1516276042155942 1.9869954395285911
Epoch 886: 1.1558943781481392 1.9879195865884405
Epoch 887: 1.1588996968992071 2.0238636993884156
Epoch 888: 1.1589243138387106 1.9685674227858476
Epoch 889: 1.1574874914236424 1.9914169005594886
Epoch 890: 1.1458185129034988 1.8453262503642136
Epoch 891: 1.149657736705517 1.9360649678219166
Epoch 892: 1.1525349621243426 2.0003436914457113
Epoch 893: 1.1566643190248354 2.0899291699728164
Epoch 894: 1.1538013519485895 2.0285061124476362
Epoch 895: 1.1494894712829409 1.9841138725718106
Epoch 896: 1.1467280116705567 1.9875843391309578
Epoch 897: 1.1457867423798365 1.965955036334711
Epoch 898: 1.1500922673760217 2.0171676734984088
Epoch 899: 1.1461515369907496 1.9924543962760368
Epoch 900: 1.1561367888381597 2.124404315519627
Epoch 901: 1.143623481118157 1.9879421847563254
Epoch 902: 1.1371309103687557 1.9300487974583553
Epoch 903: 1.1392961111037059 1.923134522407426
Epoch 904: 1.1407954023431104 1.9720533933578297
Epoch 905: 1.1447308147935196 1.9998299887445616
Epoch 906: 1.1464585126021198 2.0148125135693653
Epoch 907: 1.1404403740371 1.9297813616045891
Epoch 908: 1.1463488767108843 2.0096770631354377
Epoch 909: 1.1546029622929534 2.0832874132502397
Epoch 910: 1.150169572833268 2.03405456699914
Epoch 911: 1.1461421224307 1.940212083531217
Epoch 912: 1.1461235515941184 1.9879469239195826
Epoch 913: 1.1437277165448931 1.9458425306722948
Epoch 914: 1.1465826613349566 1.963993027082908
Epoch 915: 1.153700095135566 2.061082796293308
Epoch 916: 1.145535162769503 1.9399497199454492
Epoch 917: 1.1514181013196756 2.00126064285679
Epoch 918: 1.145316405324972 1.8983230429796274
Epoch 919: 1.14742180546 1.9248133926280373
Epoch 920: 1.1453945133821228 1.9258806454553379
Epoch 921: 1.1564822331850053 2.0417494917942447
Epoch 922: 1.16532666842355 2.111873507098976
Epoch 923: 1.1540591309061792 1.9727663628126484
Epoch 924: 1.1511551928026063 1.9398995635622858
Epoch 925: 1.153896463580002 1.958796187945312
Epoch 926: 1.1585792016906362 1.9770359681201741
Epoch 927: 1.1638372257027831 1.987788305134844
Epoch 928: 1.1693475765031072 2.0466449244121923
Epoch 929: 1.1613297289146602 1.9509056905792623
Epoch 930: 1.1602256163595408 1.907791824877854
Epoch 931: 1.1762509516408628 2.111297411782571
Epoch 932: 1.1699169121556179 2.0205901983640597
Epoch 933: 1.1684207492079968 1.9272494985274782
Epoch 934: 1.1775275555358624 2.0234431331006006
Epoch 935: 1.1835057205367496 2.096536323344183
Epoch 936: 1.1808714871626944 2.06793651984597
Epoch 937: 1.1711764282577117 1.944775568850033
Epoch 938: 1.1765847908624938 2.059669845699896
Epoch 939: 1.1689618571823044 1.99094115269792
Epoch 940: 1.1702445180640104 2.0274149394537373
Epoch 941: 1.1673014771928076 2.006856804873804
Epoch 942: 1.1680741730951625 2.0138214259752067
Epoch 943: 1.1632470351155013 1.9746100270936742
Epoch 944: 1.164775740105387 1.9920699533172106
Epoch 945: 1.1688593880528286 2.0088814901064453
Epoch 946: 1.1747648520414604 2.064169474455046
Epoch 947: 1.166189301475967 1.9512623293150524
Epoch 948: 1.1723293093984961 1.9991619097278583
Epoch 949: 1.1724157247810891 2.0119349303671377
Epoch 950: 1.1749165447698378 2.056822162737071
Epoch 951: 1.1736709326235968 2.0236610959914625
Epoch 952: 1.1750017021555834 2.0385020828048606
Epoch 953: 1.1767926748338744 2.078712474345494
Epoch 954: 1.1712354911892442 1.988543257494375
Epoch 955: 1.1740383287480203 2.053345404770091
Epoch 956: 1.1751871446465831 2.0933373481889297
Epoch 957: 1.1641647182469066 1.9155137587626623
Epoch 958: 1.1691004333247237 1.9414331144327648
Epoch 959: 1.1772536725425993 2.067633262103231
Epoch 960: 1.1779986870370482 2.086082028772622
Epoch 961: 1.1738897563079478 2.047915322270108
Epoch 962: 1.1619161143374852 1.8923934588954494
Epoch 963: 1.1697457741957606 1.9844759110197736
Epoch 964: 1.1697351096822857 1.9741557080092809
Epoch 965: 1.1674910142105261 1.970702981080298
Epoch 966: 1.1708913097774833 2.035510742775975
Epoch 967: 1.163560206727453 1.959034424532293
Epoch 968: 1.1645150122282628 2.0045453242661675
Epoch 969: 1.1679107423551707 2.063905013621921
Epoch 970: 1.1677195713875725 2.0433237073388213
Epoch 971: 1.17002548380566 2.087239453154267
Epoch 972: 1.1669677656803277 2.00776004658578
Epoch 973: 1.1693951222961703 2.0054310660716648
Epoch 974: 1.163591461957682 1.9957651646191144
Epoch 975: 1.1631281436776817 1.996814233291856
Epoch 976: 1.1657782005156223 2.0337180441688014
Epoch 977: 1.1613277185536075 1.9761369889578149
Epoch 978: 1.1715340660463078 2.0889472256757897
Epoch 979: 1.1643274955849614 2.0325757476315434
Epoch 980: 1.169053706321655 2.1097168551532635
Epoch 981: 1.162086973500796 1.9555235042178425
Epoch 982: 1.1659416789106982 2.0106789994605276
Epoch 983: 1.169102413154393 2.0281705562256747
Epoch 984: 1.1554108232659615 1.9228184709910991
Epoch 985: 1.1588930741118912 1.921251824994387
Epoch 986: 1.1550587111137056 1.8840277756684038
Epoch 987: 1.1638976152823144 2.025125792607056
Epoch 988: 1.1671422605177448 2.097307271135845
Epoch 989: 1.1556821951123852 1.9115550838186852
Epoch 990: 1.160829285965992 1.940623122785074
Epoch 991: 1.165446317556802 2.000186893753661
Epoch 992: 1.1718972107106718 2.076132788705562
Epoch 993: 1.163709861587923 1.9829789758500083
Epoch 994: 1.1624906108985786 2.000615035032854
Epoch 995: 1.1613760832770657 1.9794077021416954
Epoch 996: 1.1678270786062839 2.0712825199502247
Epoch 997: 1.1632112053389412 2.0243115044550732
Epoch 998: 1.1598619426016632 1.9867915874642743
Epoch 999: 1.1667041528951323 2.064176860885111

Challenge 14.2

Use SGD to train the single neuron from the previous notebook on a linearly separable set of 100 points, divided by the line $-\frac{5}{2}x+\frac{3}{2}y+3=0$.



In [23]:

    
### We provide a set of randomly generated training points 
num_points = 100
w1 = -2.5
w2 = 1.5
w0 = 3.
np.random.seed(637163) # we make sure we always generate the same sequence
x_data = np.random.rand(num_points)*10.
y_data = np.random.rand(num_points)*10.
z_data = np.zeros(num_points)
for i in range(len(z_data)):
    if (y_data[i] > (-w0-w1*x_data[i])/w2):
        z_data[i] = 1.

pyplot.scatter(x_data,y_data,c=z_data,marker='o',linewidth=1.5,edgecolors='black')
pyplot.plot(x_data,(-w1*x_data-w0)/w2)
pyplot.gray()
pyplot.xlim(0,10)
pyplot.ylim(0,10);

You will need the following auxiliary functions:



In [25]:

    
def sigmoid(z):
    """The sigmoid function."""
    return 1.0/(1.0+np.exp(-z))

def sigmoid_prime(z):
    """Derivative of the sigmoid function."""
    return sigmoid(z)*(1-sigmoid(z))
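
As a quick sanity check (a sketch, not part of the original notebook), the analytic derivative can be compared with a central finite difference. The two functions are repeated so that the block is self-contained; `numerical_derivative` and the step size `h` are illustrative names chosen here.

```python
import numpy as np

def sigmoid(z):
    """The sigmoid function."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    """Derivative of the sigmoid function."""
    return sigmoid(z) * (1 - sigmoid(z))

def numerical_derivative(f, z, h=1e-5):
    """Central finite-difference approximation of f'(z) (illustrative helper)."""
    return (f(z + h) - f(z - h)) / (2 * h)

# Compare the two derivatives on a small grid of inputs.
z = np.linspace(-5.0, 5.0, 11)
max_gap = np.max(np.abs(sigmoid_prime(z) - numerical_derivative(sigmoid, z)))
print(max_gap)  # should be very small if sigmoid_prime is correct
```

A small gap suggests that `sigmoid_prime` is safe to use inside the gradient computation of the SGD update.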

A simple network to classify handwritten digits

Most of this section has been taken from M. Nielsen's free on-line book: "Neural Networks and Deep Learning" http://neuralnetworksanddeeplearning.com/

In this section we discuss a neural network that can solve a more interesting and difficult problem: recognizing individual handwritten digits.

The input layer of the network contains neurons encoding the values of the input pixels. Our training data for the network will consist of many 28 by 28 pixel images of scanned handwritten digits, and so the input layer contains $784 = 28 \times 28$ neurons. The input pixels are greyscale, with a value of 0.0 representing white, a value of 1.0 representing black, and in-between values representing gradually darkening shades of grey.

The second layer of the network is a hidden layer. We denote the number of neurons in this hidden layer by $n$, and we'll experiment with different values for $n$. The example shown illustrates a small hidden layer, containing just $n=15$ neurons.

The output layer of the network contains 10 neurons. If the first neuron fires, i.e., has an output $\sim 1$, that indicates that the network thinks the digit is a 0. If the second neuron fires, that indicates that the network thinks the digit is a 1. And so on. A little more precisely, we number the output neurons from 0 through 9, and figure out which neuron has the highest activation value. If that neuron is, say, neuron number 6, then our network will guess that the input digit was a 6. And so on for the other output neurons.

Network to identify single digits. The output layer has 10 neurons, one for each digit.
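
The rule for reading off the network's guess can be sketched in a couple of lines. The activation values below are made up for illustration and do not come from a trained network.

```python
import numpy as np

# Hypothetical activations of output neurons 0..9 (made-up values).
output_activations = np.array(
    [0.02, 0.01, 0.05, 0.03, 0.10, 0.04, 0.91, 0.02, 0.07, 0.01]
)

# The guessed digit is the index of the neuron with the highest activation.
predicted_digit = int(np.argmax(output_activations))
print(predicted_digit)  # 6
```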

The first thing we'll need is a data set to learn from, a so-called training data set. We'll use the MNIST data set, which contains tens of thousands of scanned images of handwritten digits, together with their correct classifications. MNIST's name comes from the fact that it is a modified subset of two data sets collected by NIST, the United States' National Institute of Standards and Technology. Here are a few images from MNIST:

The MNIST data comes in two parts. The first part contains 60,000 images to be used as training data. These images are scanned handwriting samples from 250 people, half of whom were US Census Bureau employees, and half of whom were high school students. The images are greyscale and 28 by 28 pixels in size. The second part of the MNIST data set is 10,000 images to be used as test data. Again, these are 28 by 28 greyscale images. We'll use the test data to evaluate how well our neural network has learned to recognize digits. To make this a good test of performance, the test data was taken from a different set of 250 people than the original training data (albeit still a group split between Census Bureau employees and high school students). This helps give us confidence that our system can recognize digits from people whose writing it didn't see during training.

In practice, we are going to split the data a little differently. We'll leave the test images as is, but split the 60,000-image MNIST training set into two parts: a set of 50,000 images, which we'll use to train our neural network, and a separate 10,000-image validation set.
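
The bookkeeping of such a split can be sketched with index arrays. This is only an illustration: it assumes the first 50,000 samples go to training and the remaining 10,000 to validation, and it uses indices as stand-ins rather than real image data.

```python
import numpy as np

n_total = 60000   # size of the original MNIST training set
n_train = 50000   # images kept for training (assumed: the first 50,000)

# Indices standing in for the images; no real data is loaded here.
indices = np.arange(n_total)
train_idx = indices[:n_train]
val_idx = indices[n_train:]

print(len(train_idx), len(val_idx))  # 50000 10000
```

The same slicing could be applied to the image and label arrays, so that each image stays paired with its label.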

We'll use the notation $x$ to denote a training input. It'll be convenient to regard each training input $x$ as a $28 \times 28 = 784$-dimensional vector. Each entry in the vector represents the grey value for a single pixel in the image. We'll denote the corresponding desired output by $y = y(x)$, where $y$ is a 10-dimensional vector. For example, if a particular training image, $x$, depicts a 6, then $y(x)=(0,0,0,0,0,0,1,0,0,0)^T$ is the desired output from the network. Note that $T$ here is the transpose operation, turning a row vector into an ordinary (column) vector.



In [45]:

    
"""
mnist_loader
~~~~~~~~~~~~

A library to load the MNIST image data.  For details of the data
structures that are returned, see the doc strings for ``load_data``
and ``load_data_wrapper``.  In practice, ``load_data_wrapper`` is the
function usually called by our neural network code.
"""

#### Libraries
# Standard library
import pickle
import gzip

# Third-party libraries
import numpy as np

def load_data():
    """Return the MNIST data as a tuple containing the training data,
    the validation data, and the test data.

    The ``training_data`` is returned as a tuple with two entries.
    The first entry contains the actual training images.  This is a
    numpy ndarray with 50,000 entries.  Each entry is, in turn, a
    numpy ndarray with 784 values, representing the 28 * 28 = 784
    pixels in a single MNIST image.

    The second entry in the ``training_data`` tuple is a numpy ndarray
    containing 50,000 entries.  Those entries are just the digit
    values (0...9) for the corresponding images contained in the first
    entry of the tuple.

    The ``validation_data`` and ``test_data`` are similar, except
    each contains only 10,000 images.

    This is a nice data format, but for use in neural networks it's
    helpful to modify the format of the ``training_data`` a little.
    That's done in the wrapper function ``load_data_wrapper()``, see
    below.
    """
    with gzip.open('data/mnist.pkl.gz', 'rb') as f:
        # encoding='latin1' is needed to unpickle NumPy arrays pickled by Python 2.
        training_data, validation_data, test_data = pickle.load(f, encoding='latin1')
    return (training_data, validation_data, test_data)

def load_data_wrapper():
    """Return a tuple containing ``(training_data, validation_data,
    test_data)``. Based on ``load_data``, but the format is more
    convenient for use in our implementation of neural networks.

    In particular, ``training_data`` is a list containing 50,000
    2-tuples ``(x, y)``.  ``x`` is a 784-dimensional numpy.ndarray
    containing the input image.  ``y`` is a 10-dimensional
    numpy.ndarray representing the unit vector corresponding to the
    correct digit for ``x``.

    ``validation_data`` and ``test_data`` are lists containing 10,000
    2-tuples ``(x, y)``.  In each case, ``x`` is a 784-dimensional
    numpy.ndarray containing the input image, and ``y`` is the
    corresponding classification, i.e., the digit values (integers)
    corresponding to ``x``.

    Obviously, this means we're using slightly different formats for
    the training data and the validation / test data.  These formats
    turn out to be the most convenient for use in our neural network
    code."""
    tr_d, va_d, te_d = load_data()
    training_inputs = [np.reshape(x, (784, 1)) for x in tr_d[0]]
    training_results = [vectorized_result(y) for y in tr_d[1]]
    training_data = list(zip(training_inputs, training_results))
    validation_inputs = [np.reshape(x, (784, 1)) for x in va_d[0]]
    validation_data = list(zip(validation_inputs, va_d[1]))
    test_inputs = [np.reshape(x, (784, 1)) for x in te_d[0]]
    test_data = list(zip(test_inputs, te_d[1]))
    return (training_data, validation_data, test_data)

def vectorized_result(j):
    """Return a 10-dimensional unit vector with a 1.0 in the jth
    position and zeroes elsewhere.  This is used to convert a digit
    (0...9) into a corresponding desired output from the neural
    network."""
    e = np.zeros((10, 1))
    e[j] = 1.0
    return e

Note also that the biases and weights are stored as lists of Numpy matrices. So, for example net.weights[1] is a Numpy matrix storing the weights connecting the second and third layers of neurons. (It's not the first and second layers, since Python's list indexing starts at 0.) Since net.weights[1] is rather verbose, let's just denote that matrix $w$ . It's a matrix such that $w_{jk}$ is the weight for the connection between the $k^{th}$ neuron in the second layer, and the $j^{th}$ neuron in the third layer. This ordering of the $j$ and $k$ indices may seem strange. The big advantage of using this ordering is that it means that the vector of activations of the third layer of neurons is: $$a'=\mathrm {sigmoid}(wa+b)$$

There's quite a bit going on in this equation, so let's unpack it piece by piece. $a$ is the vector of activations of the second layer of neurons. To obtain $a'$ we multiply $a$ by the weight matrix $w$ , and add the vector $b$ of biases. We then apply the function sigmoid elementwise to every entry in the vector $wa+b$.
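To make the shapes concrete, here is a minimal sketch that builds the weight and bias lists the same way the Network constructor does for layer sizes [784, 30, 10], and then evaluates $a'=\mathrm{sigmoid}(wa+b)$ for the last pair of layers. The random values and the activation vector `a` are placeholders, not trained parameters or real data.

```python
import numpy as np

def sigmoid(z):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Same list construction as Network.__init__ for sizes [784, 30, 10].
sizes = [784, 30, 10]
biases = [rng.standard_normal((y, 1)) for y in sizes[1:]]
weights = [rng.standard_normal((y, x)) for x, y in zip(sizes[:-1], sizes[1:])]

print([w.shape for w in weights])  # [(30, 784), (10, 30)]

# weights[1] connects the second layer (30 neurons) to the third (10 neurons):
# row j indexes a third-layer neuron, column k a second-layer neuron.
w = weights[1]
b = biases[1]
a = rng.random((30, 1))  # made-up second-layer activations
a_prime = sigmoid(np.dot(w, a) + b)
print(a_prime.shape)     # (10, 1)
```

Because the rows of $w$ index the receiving layer, the product $wa$ needs no transpose.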

Of course, the main thing we want our Network objects to do is to learn. To that end we'll give them an SGD method which implements stochastic gradient descent.
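The core bookkeeping in each epoch of the SGD method is to shuffle the training data and slice it into mini-batches. A rough sketch of that step, using a toy list of integers in place of the real `(x, y)` training pairs, might look like this:

```python
import random

# Toy stand-in for the training data: 25 items instead of 50,000 (x, y) pairs.
training_data = list(range(25))
mini_batch_size = 10

# Shuffle in place, then cut the list into consecutive slices.
shuffler = random.Random(0)  # seeded only so the sketch is reproducible
shuffler.shuffle(training_data)
mini_batches = [
    training_data[k:k + mini_batch_size]
    for k in range(0, len(training_data), mini_batch_size)
]

print([len(mb) for mb in mini_batches])  # [10, 10, 5]
```

Every item lands in exactly one mini-batch per epoch, and the last mini-batch is smaller whenever the batch size does not divide the number of training examples.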

Most of the work in the update_mini_batch method is done by the line

            delta_nabla_b, delta_nabla_w = self.backprop(x, y)

This invokes something called the backpropagation algorithm, which is a fast way of computing the gradient of the cost function. So update_mini_batch works simply by computing these gradients for every training example in the mini_batch, and then updating self.weights and self.biases appropriately.
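In symbols, the update that update_mini_batch applies can be written as follows. Here $m$ is the number of examples in the mini-batch and $C_x$ is the cost for a single training example $x$ (the symbol $m$ is introduced here for illustration); the sum runs over the examples in the mini-batch, and $\eta$ is the learning rate:

$$ w \rightarrow w - \frac{\eta}{m}\sum_x \nabla_w C_x, \qquad b \rightarrow b - \frac{\eta}{m}\sum_x \nabla_b C_x $$

The code accumulates the two sums in nabla_w and nabla_b, one backprop call per example, and then scales them by eta/len(mini_batch).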

The activation $a^l_j$ of the $j^{th}$ neuron in the $l^{th}$ layer is related to the activations in the $(l-1)^{th}$ layer by the equation $$a^l_j=\mathrm{sigmoid}\left(\sum_k w_{jk}^l a^{l-1}_k+b^l_j\right)$$ where the sum is over all neurons $k$ in the $(l-1)^{th}$ layer. To rewrite this expression in a matrix form we define a weight matrix $w^l$ for each layer $l$. The entries of the weight matrix $w^l$ are just the weights connecting to the $l^{th}$ layer of neurons, that is, the entry in the $j^{th}$ row and $k^{th}$ column is $w^l_{jk}$. Similarly, for each layer $l$ we define a bias vector, $b^l$. You can probably guess how this works: the components of the bias vector are just the values $b^l_j$, one component for each neuron in the $l^{th}$ layer. And finally, we define an activation vector $a^l$ whose components are the activations $a^l_j$.

With these notations in mind, these equations can be rewritten in the beautiful and compact vectorized form $$a^l=\mathrm{sigmoid}(w^la^{l-1}+b^l).$$ This expression gives us a much more global way of thinking about how the activations in one layer relate to activations in the previous layer: we just apply the weight matrix to the activations, then add the bias vector, and finally apply the sigmoid function.
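One way to convince yourself that the two forms agree is to compute both on a small made-up layer. The sketch below uses a layer of 4 neurons feeding a layer of 3, with random placeholder weights, biases, and activations; the sizes and names are assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
n_prev, n_curr = 4, 3                      # neurons in layers l-1 and l
w = rng.standard_normal((n_curr, n_prev))  # w[j, k] plays the role of w^l_{jk}
b = rng.standard_normal((n_curr, 1))
a_prev = rng.random((n_prev, 1))           # activations a^{l-1}

# Component form: one explicit sum over k for each neuron j.
a_loop = np.zeros((n_curr, 1))
for j in range(n_curr):
    total = 0.0
    for k in range(n_prev):
        total += w[j, k] * a_prev[k, 0]
    a_loop[j, 0] = sigmoid(total + b[j, 0])

# Vectorized form: a^l = sigmoid(w^l a^{l-1} + b^l).
a_vec = sigmoid(np.dot(w, a_prev) + b)

print(np.allclose(a_loop, a_vec))  # True
```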

Apart from self.backprop the program is self-explanatory - all the heavy lifting is done in self.SGD and self.update_mini_batch, which we've already discussed. The self.backprop method makes use of a few extra functions to help in computing the gradient, namely sigmoid_prime, which computes the derivative of the sigmoid function, and self.cost_derivative. You can get the gist of these (and perhaps the details) just by looking at the code and documentation strings. Note that while the program appears lengthy, much of the code is documentation strings intended to make the code easy to understand. In fact, the program contains just 74 lines of non-whitespace, non-comment code.
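A quick sanity check on sigmoid_prime is to compare it with a central finite difference of sigmoid. The sketch below redefines both helpers so that it runs on its own; the step size `h` and the sample points are arbitrary choices.

```python
import numpy as np

def sigmoid(z):
    """The sigmoid function."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    """Derivative of the sigmoid function."""
    return sigmoid(z) * (1 - sigmoid(z))

z = np.linspace(-5.0, 5.0, 11)  # sample points
h = 1e-5                        # finite-difference step

# Central difference approximation to the derivative at each sample point.
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)
max_gap = float(np.max(np.abs(numeric - sigmoid_prime(z))))

print(max_gap)  # a very small number if the analytic derivative is right
```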



In [50]:

    
"""
network.py
~~~~~~~~~~

A module to implement the stochastic gradient descent learning
algorithm for a feedforward neural network.  Gradients are calculated
using backpropagation.  Note that I have focused on making the code
simple, easily readable, and easily modifiable.  It is not optimized,
and omits many desirable features.
"""

#### Libraries
# Standard library
import random

# Third-party libraries
import numpy as np

class Network(object):

    def __init__(self, sizes):
        """The list ``sizes`` contains the number of neurons in the
        respective layers of the network.  For example, if the list
        was [2, 3, 1] then it would be a three-layer network, with the
        first layer containing 2 neurons, the second layer 3 neurons,
        and the third layer 1 neuron.  The biases and weights for the
        network are initialized randomly, using a Gaussian
        distribution with mean 0, and variance 1.  Note that the first
        layer is assumed to be an input layer, and by convention we
        won't set any biases for those neurons, since biases are only
        ever used in computing the outputs from later layers."""
        self.num_layers = len(sizes)
        self.sizes = sizes
        self.biases = [np.random.randn(y, 1) for y in sizes[1:]]
        self.weights = [np.random.randn(y, x)
                        for x, y in zip(sizes[:-1], sizes[1:])]

    def feedforward(self, a):
        """Return the output of the network if ``a`` is input."""
        for b, w in zip(self.biases, self.weights):
            a = sigmoid(np.dot(w, a)+b)
        return a

    def SGD(self, training_data, epochs, mini_batch_size, eta,
            test_data=None):
        """Train the neural network using mini-batch stochastic
        gradient descent.  The ``training_data`` is a list of tuples
        ``(x, y)`` representing the training inputs and the desired
        outputs.  The other non-optional parameters are
        self-explanatory.  If ``test_data`` is provided then the
        network will be evaluated against the test data after each
        epoch, and partial progress printed out.  This is useful for
        tracking progress, but slows things down substantially."""
        if test_data: n_test = len(test_data)
        n = len(training_data)
        for j in range(epochs):
            random.shuffle(training_data)
            mini_batches = [
                training_data[k:k+mini_batch_size]
                for k in range(0, n, mini_batch_size)]
            for mini_batch in mini_batches:
                self.update_mini_batch(mini_batch, eta)
            if test_data:
                print("Epoch {0}: {1} / {2}".format(
                    j, self.evaluate(test_data), n_test))
            else:
                print("Epoch {0} complete".format(j))

    def update_mini_batch(self, mini_batch, eta):
        """Update the network's weights and biases by applying
        gradient descent using backpropagation to a single mini batch.
        The ``mini_batch`` is a list of tuples ``(x, y)``, and ``eta``
        is the learning rate."""
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        for x, y in mini_batch:
            delta_nabla_b, delta_nabla_w = self.backprop(x, y)
            nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
            nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
        self.weights = [w-(eta/len(mini_batch))*nw
                        for w, nw in zip(self.weights, nabla_w)]
        self.biases = [b-(eta/len(mini_batch))*nb
                       for b, nb in zip(self.biases, nabla_b)]

    def backprop(self, x, y):
        """Return a tuple ``(nabla_b, nabla_w)`` representing the
        gradient for the cost function C_x.  ``nabla_b`` and
        ``nabla_w`` are layer-by-layer lists of numpy arrays, similar
        to ``self.biases`` and ``self.weights``."""
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        # feedforward
        activation = x
        activations = [x] # list to store all the activations, layer by layer
        zs = [] # list to store all the z vectors, layer by layer
        for b, w in zip(self.biases, self.weights):
            z = np.dot(w, activation)+b
            zs.append(z)
            activation = sigmoid(z)
            activations.append(activation)
        # backward pass
        delta = self.cost_derivative(activations[-1], y) * \
            sigmoid_prime(zs[-1])
        nabla_b[-1] = delta
        nabla_w[-1] = np.dot(delta, activations[-2].transpose())
        # Note that the variable l in the loop below is used a little
        # differently to the notation in Chapter 2 of the book.  Here,
        # l = 1 means the last layer of neurons, l = 2 is the
        # second-last layer, and so on.  It's a renumbering of the
        # scheme in the book, used here to take advantage of the fact
        # that Python can use negative indices in lists.
        for l in range(2, self.num_layers):
            z = zs[-l]
            sp = sigmoid_prime(z)
            delta = np.dot(self.weights[-l+1].transpose(), delta) * sp
            nabla_b[-l] = delta
            nabla_w[-l] = np.dot(delta, activations[-l-1].transpose())
        return (nabla_b, nabla_w)

    def evaluate(self, test_data):
        """Return the number of test inputs for which the neural
        network outputs the correct result. Note that the neural
        network's output is assumed to be the index of whichever
        neuron in the final layer has the highest activation."""
        test_results = [(np.argmax(self.feedforward(x)), y)
                        for (x, y) in test_data]
        return sum(int(x == y) for (x, y) in test_results)

    def cost_derivative(self, output_activations, y):
        r"""Return the vector of partial derivatives \partial C_x /
        \partial a for the output activations."""
        return (output_activations-y)

#### Miscellaneous functions
def sigmoid(z):
    """The sigmoid function."""
    return 1.0/(1.0+np.exp(-z))

def sigmoid_prime(z):
    """Derivative of the sigmoid function."""
    return sigmoid(z)*(1-sigmoid(z))

We first load the MNIST data:



In [51]:

    
training_data, validation_data, test_data = load_data_wrapper()

After loading the MNIST data, we'll set up a Network with 30 hidden neurons.



In [52]:

    
net = Network([784, 30, 10])

Finally, we'll use stochastic gradient descent to learn from the MNIST training_data over 30 epochs, with a mini-batch size of 10, and a learning rate of $\eta = 3.0$:



In [53]:

    
net.SGD(training_data, 30, 10, 3.0, test_data=test_data)









    



Epoch 0: 9125 / 10000
Epoch 1: 9201 / 10000
Epoch 2: 9285 / 10000
Epoch 3: 9317 / 10000
Epoch 4: 9299 / 10000
Epoch 5: 9388 / 10000
Epoch 6: 9394 / 10000
Epoch 7: 9397 / 10000
Epoch 8: 9425 / 10000
Epoch 9: 9395 / 10000
Epoch 10: 9408 / 10000
Epoch 11: 9440 / 10000
Epoch 12: 9448 / 10000
Epoch 13: 9460 / 10000
Epoch 14: 9445 / 10000
Epoch 15: 9459 / 10000
Epoch 16: 9467 / 10000
Epoch 17: 9466 / 10000
Epoch 18: 9434 / 10000
Epoch 19: 9450 / 10000
Epoch 20: 9463 / 10000
Epoch 21: 9472 / 10000
Epoch 22: 9465 / 10000
Epoch 23: 9482 / 10000
Epoch 24: 9487 / 10000
Epoch 25: 9458 / 10000
Epoch 26: 9481 / 10000
Epoch 27: 9479 / 10000
Epoch 28: 9476 / 10000
Epoch 29: 9479 / 10000

Challenge 14.3

Try creating a network with just two layers - an input and an output layer, no hidden layer - with 784 and 10 neurons, respectively. Train the network using stochastic gradient descent. What classification accuracy can you achieve?

Number of hidden layers

Suppose that we want to approximate a set of functions to a given accuracy. How many hidden layers do we need? At most two: arbitrary accuracy can be obtained given enough units per layer. It has also been shown that a single hidden layer is enough to approximate any continuous function. These results do not tell us how many units are needed, however. That number is not known in general, and it may grow exponentially with the number of input units.
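The effect of adding units to a single hidden layer can be seen in a small experiment. The sketch below is not how the network above is trained: to keep it short, the hidden-layer weights and biases are random and fixed, and only the output weights are fitted, by linear least squares. The target function $\sin(x)$, the unit counts, and all variable names are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)  # inputs, shape (200, 1)
target = np.sin(x)                                  # function to approximate

# One hidden layer of sigmoid units with fixed random weights and biases.
max_units = 50
w_hidden = 2.0 * rng.standard_normal((1, max_units))
b_hidden = 2.0 * rng.standard_normal((1, max_units))
hidden = sigmoid(x @ w_hidden + b_hidden)           # shape (200, max_units)

rms_errors = {}
for n_units in (2, 10, 50):
    # Use only the first n_units hidden units, so the smaller layers are
    # nested inside the larger ones.
    features = hidden[:, :n_units]
    # Fit the linear output weights by least squares.
    w_out, *_ = np.linalg.lstsq(features, target, rcond=None)
    residual = features @ w_out - target
    rms_errors[n_units] = float(np.sqrt(np.mean(residual ** 2)))
    print(n_units, rms_errors[n_units])
```

The error on the sample points shrinks as units are added, but nothing in the experiment says in advance how many units a given accuracy requires.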


