This Jupyter notebook contains a working example of running stochastic gradient descent on MNIST using dreaml. It performs PCA and generates random kitchen sink features while running mini-batched stochastic gradient descent. You can adjust the hyperparameters (e.g., the step size) or generate more features and observe the resulting performance.


In [1]:
# Import libraries
import cPickle, gzip
import numpy as np
from time import sleep
import dreaml as dm
from dreaml.server import start
from dreaml.loss import Softmax
import dreaml.transformations as trans


In [ ]:
# Load data from files
f = gzip.open('mnist.pkl.gz', 'rb')
train_set, valid_set, test_set = cPickle.load(f)
f.close()
n_train = 1000
n_test = 100
X_train = train_set[0][0:n_train,:]
y_train = train_set[1][0:n_train,None]
X_test = valid_set[0][0:n_test,:]   # slice with n_test, not n_train
y_test = valid_set[1][0:n_test,None]
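
As a quick sanity check, the slices should have the shapes below before we load them into dreaml; this cell is plain NumPy and assumes nothing about dreaml itself.


In [ ]:
# Confirm the shapes of the subsampled MNIST slices
print(X_train.shape)   # expected: (1000, 784)
print(y_train.shape)   # expected: (1000, 1)
print(X_test.shape)    # expected: (100, 784)
print(y_test.shape)    # expected: (100, 1)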

First, we initialize the dataframe and start the web frontend for visualization.


In [ ]:
df = dm.DataFrame()
start(df)

Here, we load the data into the dataframe.


In [ ]:
df["data/train/", "input/raw/"] = dm.DataFrame.from_matrix(X_train)
df["data/train/", "input/label/"] = dm.DataFrame.from_matrix(y_train)
df["data/test/", "input/raw/"] = dm.DataFrame.from_matrix(X_test)
df["data/test/", "input/label/"] = dm.DataFrame.from_matrix(y_test)

Next, we run PCA on the input and place the resulting features within the features folder. Note that the PCA transformation computes the PCA basis vectors as part of its subroutine; these are stored in an automatically generated directory.


In [ ]:
df["data/", "features/pca/"] = trans.PCA(df["data/train/", "input/raw/"], 
                                         df["data/","input/raw/"],
                                         num_bases=50)
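
For intuition, here is a minimal pure-NumPy sketch of what a PCA projection computes: center the data, take the top singular directions, and project onto them. The names mu, V, and X_train_pca are hypothetical, and dreaml's PCA transformation may differ in details such as centering.


In [ ]:
# Illustrative only: PCA via SVD in plain NumPy, not dreaml's implementation
mu = X_train.mean(axis=0)             # per-pixel mean of the training data
U, S, Vt = np.linalg.svd(X_train - mu, full_matrices=False)
V = Vt[:50].T                         # top 50 principal directions (784 x 50)
X_train_pca = (X_train - mu).dot(V)   # projected features (1000 x 50)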

From the PCA features, we also generate an initial set of 1000 kitchen sink features.


In [ ]:
df["data/", "features/ks1/"] = trans.KitchenSinks(df["data/","features/pca/"],
                                                  num_features=1000)
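
Random kitchen sinks approximate a kernel feature map by projecting the input through random directions and applying a nonlinearity (Rahimi and Recht's random Fourier features). Below is a minimal sketch of that recipe in plain NumPy, reusing the hypothetical X_train_pca from the PCA sketch above; dreaml's KitchenSinks transformation may choose the distribution, bandwidth, and scaling differently.


In [ ]:
# Illustrative only: random Fourier features in plain NumPy
d, k = 50, 1000                          # input dim (PCA features), output dim
W = np.random.randn(d, k)                # random projection directions
b = np.random.uniform(0, 2 * np.pi, k)   # random phases
Z = np.cos(X_train_pca.dot(W) + b)       # (1000, 1000) random features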

Here, we start the stochastic gradient descent process using Softmax loss.


In [ ]:
df["weights/", "features/"] = trans.SGD(Softmax,
                                      np.zeros((50,1000)),
                                      df["data/train/", "features/"],
                                      df["data/train/","input/label/"],
                                      batch_size=50,
                                      reg=0.01,
                                      step_size=1)
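
To make the update concrete, here is what one minibatch step of softmax SGD computes, written in plain NumPy. This is a sketch of the underlying math under an L2 regularizer, using hypothetical names; it is not dreaml's implementation.


In [ ]:
# Illustrative only: one minibatch softmax SGD step in plain NumPy
def softmax_sgd_step(Theta, Xb, yb, reg, step_size):
    """Theta: (d, k) weights; Xb: (m, d) minibatch; yb: (m,) labels in 0..k-1."""
    scores = Xb.dot(Theta)                       # (m, k) class scores
    scores -= scores.max(axis=1, keepdims=True)  # stabilize the exponentials
    P = np.exp(scores)
    P /= P.sum(axis=1, keepdims=True)            # (m, k) class probabilities
    P[np.arange(len(yb)), yb] -= 1.0             # d(loss)/d(scores) for softmax
    grad = Xb.T.dot(P) / len(yb) + reg * Theta   # mean gradient plus L2 term
    return Theta - step_size * grad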

Let's compute some metrics on each datapoint to evaluate the progress of our model: in this case, the softmax loss and the multiclass error.


In [ ]:
df["data/","metrics/"] = trans.Metrics([Softmax.f_vec, Softmax.err],
                                       df["weights/", "features/"],
                                       df["data/", "features/"],
                                       df["data/", "input/label/"],
                                       reg=0.01,
                                       metrics_names=["SoftmaxLoss",
                                                      "MulticlassError"])

We can plot a sequence of evaluations of arbitrary functions f that return a pair of lists (ys, xs) of numbers to plot. In this case, we compute the average softmax loss over all training examples as a function of the number of iterations, as well as the train and test errors over the same iterations. These plots show up on the web frontend.


In [ ]:
def softmax_average():
    # Average softmax loss over the training examples, paired with the
    # current SGD iteration count for the x-axis.
    metrics = df["data/","metrics/"].get_matrix()
    n = df["data/train/","metrics/"].shape()[0]
    niters = df["weights/", "features/"].T().niters
    return ([np.mean(metrics[0:n,0])],[niters])

def traintest_average():
    # Average multiclass error on the train (rows 0..n-1) and test
    # examples; test rows start at index n, not n+1.
    metrics = df["data/","metrics/"].get_matrix()
    n = df["data/train/","metrics/"].shape()[0]
    niters = df["weights/", "features/"].T().niters
    return ([np.mean(metrics[0:n,1]),np.mean(metrics[n:,1])],[niters,niters])

df["plot/","loss/"] = dm.Plotter(softmax_average,
                                 "objective loss",
                                 legend=["softmax"])

df["plot/","err/"] = dm.Plotter(traintest_average,
                                "train and test err",
                                legend=["train","test"],
                                colors=["blue","green"])

Now that we have our plots set up, we can make changes to the model and watch dreaml reactively apply those changes in real time!

Example 1: You might have noticed that the training and testing errors are all over the place. The following code retrieves the running SGD transformation and reduces its step size.


In [ ]:
df["weights/", "features/"].T().step_size = 1e-2

Example 2: We can continue to generate more random kitchen sink features. Try it and see how it affects the model's performance. Note that all existing transformations see the change and restart accordingly, if needed.


In [ ]:
df["data/", "features/ks2/"] = trans.KitchenSinks(df["data/","features/pca/"],
                                                  num_features=1000)

In [ ]:
df["data/", "features/ks3/"] = trans.KitchenSinks(df["data/","features/pca/"],
                                                  num_features=1000)

In [ ]:
df["data/", "features/ks4/"] = trans.KitchenSinks(df["data/","features/pca/"],
                                                  num_features=1000)
