graphAttack Tutorial #01

Introduction

This tutorial demonstrates the basic graph framework used in graphAttack and applies it to a simple linear model.

You should be familiar with basic linear algebra, Python and the Jupyter Notebook editor. It also helps if you have a basic understanding of Machine Learning and classification.

Imports


In [1]:
import numpy as np
import graphAttack as ga

This tutorial was developed using Python 3.6.1.

Nodes

graphAttack uses Nodes to store data instead of plain Python variables. Nodes are classified into two categories:

Variables

These objects store numerical data to be fed into the graph, using numpy arrays under the hood. Here we initialize some Variables:


In [2]:
a = np.array([2])
b = np.array([3])

aVar = ga.Variable(a)
bVar = ga.Variable(b)

We can now extract the stored data by calling:


In [3]:
aData = aVar.getValue()
print(aVar.getValue())


[2]

and print information about a Variable by:


In [4]:
print(aVar)


<Variable with outputs: ()>

Operations

Operations usually take other Nodes (Variables or Operations) as input and can be used to extract the output of the underlying mathematical operation. Let's start with simple addition:


In [5]:
addOp = ga.AddOperation(aVar, bVar)

Again, we can obtain the value of the operation with .getValue(), as well as some information about it by printing it:


In [6]:
print("2 + 3 = ", addOp.getValue())
print("Details of addOp:", addOp)


2 + 3 =  [5]
Details of addOp: <AddOperation with inputs: (Variable, Variable) and outputs: ()>

Operations can also be defined using operators:


In [7]:
addOp2 = aVar + bVar
print("2 + 3 = ", addOp2.getValue())
print("Details of addOp2:", addOp2)


2 + 3 =  [5]
Details of addOp2: <AddOperation with inputs: (Variable, Variable) and outputs: ()>

We can chain operations together to build something slightly more complicated:


In [8]:
aVar = ga.Variable(np.array([2]))
bVar = ga.Variable(np.array([3]))
cVar = ga.Variable(np.array([4]))

addOp = ga.AddOperation(aVar, bVar)
multOp = ga.MultiplyOperation(addOp, cVar)

print("(2 + 3) * 4 = ", multOp.getValue())


(2 + 3) * 4 =  [20]

Again, the same can be achieved with operators:


In [9]:
aVar = ga.Variable(np.array([2]))
bVar = ga.Variable(np.array([3]))
cVar = ga.Variable(np.array([4]))

addOp = aVar + bVar
multOp = addOp * cVar

print("(2 + 3) * 4 = ", multOp.getValue())


(2 + 3) * 4 =  [20]

In [10]:
_ = """
The complete list of supported operators is:

+ : AddOperation
@ : MatMatmulOperation
* : MultiplyOperation
/ : DivideOperation
"""

Differentiation

graphAttack supports automatic differentiation to ease the burden of optimising your parameters. Let's look at our previous example; this time we ask for the derivative of the final operation of the graph with respect to one of the Variables:


In [11]:
aVar = ga.Variable(np.array([2]))
bVar = ga.Variable(np.array([3]))
cVar = ga.Variable(np.array([4]))

addOp = aVar + bVar
multOp = addOp * cVar

print("(2 + 3) * 4 = ", multOp.getValue())
print("The derivative with respect to cVar is:", cVar.getGradient())


(2 + 3) * 4 =  [20]
The derivative with respect to cVar is: [ 5.]

Each time we ask for a gradient, graphAttack looks for derivatives up the chain of operations until it reaches a dead end, and then uses the chain rule to propagate the gradients backwards through the whole graph. Normally we ask for derivatives of Variables, but we can also find the derivative of an individual Operation, as long as we provide the input with respect to which the derivative should be taken.


In [12]:
print("The derivative of multOp respect to cVar is:", multOp.getGradient(cVar))


The derivative of multOp with respect to cVar is: [ 5.]

Graph

Quite an important part of this library is the Graph object: it helps store and organise your operations. It can be initialized by:


In [13]:
mainGraph = ga.Graph()

Now we can add operations to it using the .addOperation method, which takes a couple of parameters:


In [14]:
def addOperationDUMMY(self,
                      operation,
                      doGradient=False,
                      finalOperation=False,
                      feederOperation=False):
    """Add an operation to the graph
    
    Parameters
    ----------
    operation : ga.Operation
        Operation or Variable to be added
    doGradient : bool
        When true, calculate the derivative of the final operation with respect to this
        operation when self.getGradients() is called
    finalOperation : bool
        When true, specify this operation as final of the graph
    feederOperation : bool
        When true, specify this operation as the data feeder operation
        
    Returns
    -------
    ga.Operation
        the handle for added operation
        
    Raises
    ------
    ValueError
        Graph can only provide gradients with respect to variables!
        Only variables can be feeders
    """

We can compose our previous example using a graph:


In [15]:
mainGraph = ga.Graph(printWarnings=False)

a = np.array([2])
b = np.array([3])
c = np.array([4])

aVar = mainGraph.addOperation(ga.Variable(a), doGradient=True)
bVar = mainGraph.addOperation(ga.Variable(b), doGradient=True)
addOp = mainGraph.addOperation(aVar + bVar)
cVar = mainGraph.addOperation(ga.Variable(c), doGradient=True)
multOp = mainGraph.addOperation(addOp * cVar, finalOperation=True)

print(mainGraph)


Computation Graph:
<op0-Variable with outputs: (op2-AddOperation, )>
<op1-Variable with outputs: (op2-AddOperation, )>
<op2-AddOperation with inputs: (op0-Variable, op1-Variable) and outputs: (op4-MultiplyOperation, )>
<op3-Variable with outputs: (op4-MultiplyOperation, )>
<op4-MultiplyOperation with inputs: (op2-AddOperation, op3-Variable) and outputs: ()>

Do not worry too much about the hidden warnings for now. We can now ask for the value of the final operation and the gradients with respect to the specified Variables:


In [16]:
print("Final value of (2 + 3) * 4 = ", mainGraph.getValue())
print("Gradients are:")
for g in mainGraph.getGradients():
    print(g)


Final value of (2 + 3) * 4 =  [20]
Gradients are:
(0, 'op0-Variable', array([ 4.]))
(1, 'op1-Variable', array([ 4.]))
(3, 'op3-Variable', array([ 5.]))
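
As a quick sanity check of these numbers: the final operation computes f = (a + b) * c, so the derivative with respect to a (and likewise b) is c = 4, while the derivative with respect to c is a + b = 5, exactly as reported above.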

Linear Regression

Let's use the library to run a simple linear regression model that recovers the function y = 2 * x1 + 3 * x2 + 4.

First we need to prepare the dataset in the right shapes. Shapes are important and worth keeping track of, since in Python column vectors often end up being row vectors.


In [17]:
x = np.random.random((100, 2))
y = (x[:, 0] * 2 + x[:, 1] * 3 + 4).reshape(-1, 1)
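
Since shape bookkeeping is a common source of bugs here, a quick check with plain numpy confirms that the features form a 100 x 2 matrix and the labels a 100 x 1 column vector:


In [ ]:
print(x.shape)   # (100, 2) -- one row per example, one column per feature
print(y.shape)   # (100, 1) -- labels as a column vector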

Next, we need to compose a graph that represents our model:


In [18]:
mainGraph = ga.Graph()

w = np.random.random((1, 2)).T
b = np.random.random(1)

print(w, b)
feed = mainGraph.addOperation(ga.Variable(x), feederOperation=True)
wop = mainGraph.addOperation(ga.Variable(w), doGradient=True)
matmulop = mainGraph.addOperation(feed @ wop)
bop = mainGraph.addOperation(ga.Variable(b), doGradient=True)
addop = mainGraph.addOperation(matmulop + bop)
costop = mainGraph.addOperation(ga.QuadraticCostOperation(addop, y), finalOperation=True)

print(mainGraph)


[[ 0.14455163]
 [ 0.63695709]] [ 0.39654949]
Computation Graph:
<op0-Variable with outputs: (op2-MatMatmulOperation, )>
<op1-Variable with outputs: (op2-MatMatmulOperation, )>
<op2-MatMatmulOperation with inputs: (op0-Variable, op1-Variable) and outputs: (op4-AddOperation, )>
<op3-Variable with outputs: (op4-AddOperation, )>
<op4-AddOperation with inputs: (op2-MatMatmulOperation, op3-Variable) and outputs: (op5-QuadraticCostOperation, )>
<op5-QuadraticCostOperation with input: (op4-AddOperation) and outputs: ()>

We can find the cost and gradients with respect to our parameters by:


In [19]:
print("Cost:", mainGraph.getValue())
print(costop.nExamples)
print("Gradients are:")
for g in mainGraph.getGradients():
    print(g)


Cost: 17.1062476698
100
Gradients are:
(1, 'op1-Variable', array([[-3.43530367],
       [-2.95852954]]))
(3, 'op3-Variable', array([-5.78538111]))

Clearly, the random parameters are not terrific. In order to optimize them, we need to construct a function that returns the cost and the flattened gradients, so that we can feed it into a minimizer of choice. The code below is explained shortly afterwards.


In [20]:
def fprime(p, data, labels):
    # assign the data to the feeder Variable
    mainGraph.feederOperation.assignData(data)
    # reset the values of all operations so that shapes can be updated
    mainGraph.resetAll()
    # assign the labels to the cost operation
    mainGraph.finalOperation.assignLabels(labels)
    # attach the flattened parameter vector to the weight/bias Variables
    mainGraph.attachParameters(p)
    # evaluate the cost, then unroll the gradients to match the parameters vector
    c = mainGraph.getValue()
    mainGraph.getGradients()
    g = mainGraph.unrollGradients()
    return c, g

First, the function arguments: fprime takes as input a flattened vector of parameters, the data, and the labels. While data and labels are quite self-explanatory, let's discuss the parameters vector. It is a flattened vector containing all the weights and biases. It can be extracted from a graph by calling mainGraph.unrollGradientParameters() and attached to a graph (updating the values of the weight/bias Variables) by mainGraph.attachParameters(p).


In [21]:
print("Current parameters", wop.getValue(), bop.getValue())
param0 = mainGraph.unrollGradientParameters()
print("Unrolled parameters", param0)


Current parameters [[ 0.14455163]
 [ 0.63695709]] [ 0.39654949]
Unrolled parameters [ 0.14455163  0.63695709  0.39654949]
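
The reverse direction works the same way. As a minimal sketch using the calls mentioned above, attaching the unchanged vector back simply restores the current values of the weight and bias Variables:


In [ ]:
mainGraph.attachParameters(param0)
print("Parameters after re-attaching:", wop.getValue(), bop.getValue())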

Inside the function, we first assign the data to the feeder Variable and reset the values of all operations (not Variables) so that the shapes are updated and the flow can be re-computed later. Next, we assign the labels to the cost operation and attach the provided parameters. Finally, the cost and gradients are extracted from the graph, with the gradients unrolled to match the parameters vector.


In [22]:
print("Gradients are:")
for g in mainGraph.getGradients():
    print(g)
print("Unrolled gradients", mainGraph.unrollGradients())


Gradients are:
(1, 'op1-Variable', array([[-3.43530367],
       [-2.95852954]]))
(3, 'op3-Variable', array([-5.78538111]))
Unrolled gradients [-3.43530367 -2.95852954 -5.78538111]

We can pass our function to a minimizer of choice; for now let's use the one from scipy:


In [23]:
import scipy.optimize
res = scipy.optimize.minimize(fprime,
                              param0,
                              args=(x, y),
                              method="BFGS",
                              jac=True)
print(res)


      fun: 3.264686774974691e-11
 hess_inv: array([[ 12.83002729,  -0.39440031,  -6.97098302],
       [ -0.39440031,  11.93526501,  -5.3375061 ],
       [ -6.97098302,  -5.3375061 ,   7.24783374]])
      jac: array([ -3.56653015e-06,  -2.85192804e-06,  -7.39635625e-06])
  message: 'Optimization terminated successfully.'
     nfev: 12
      nit: 9
     njev: 12
   status: 0
  success: True
        x: array([ 2.00000811,  3.00000801,  3.99998417])
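
The fitted vector x is close to the true parameters [2, 3, 4]. As a quick sanity check (plain numpy, using res from the cell above), the recovered weights and bias should reproduce the labels:


In [ ]:
wFit = res.x[:2].reshape(-1, 1)   # first two entries are the weights
bFit = res.x[2]                   # last entry is the bias
print(np.allclose(x @ wFit + bFit, y, atol=1e-4))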

SGD

In ML models, it is often useful to use stochastic gradient descent as your minimizer due to the large number of data points. graphAttack comes with a simple implementation of the Adam method, which is considered quite robust. The details of the algorithm can be found at https://arxiv.org/pdf/1412.6980.pdf.
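
For reference, the core Adam update from that paper can be sketched in a few lines of plain numpy. This is only an illustration of the algorithm; the hypothetical adamStep below is not part of graphAttack's API:


In [ ]:
def adamStep(params, grad, m, v, t,
             learningRate=1e-3, beta1=0.9, beta2=0.999, epsilon=1e-8):
    """One Adam update step; t is the 1-based iteration counter."""
    m = beta1 * m + (1 - beta1) * grad            # update biased first moment
    v = beta2 * v + (1 - beta2) * grad ** 2       # update biased second moment
    mHat = m / (1 - beta1 ** t)                   # bias-corrected first moment
    vHat = v / (1 - beta2 ** t)                   # bias-corrected second moment
    params = params - learningRate * mHat / (np.sqrt(vHat) + epsilon)
    return params, m, v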

Here we will simply use the built-in minimizer; it can be initialized by:


In [24]:
adamGrad = ga.adaptiveSGD(trainingData=x,
                          trainingLabels=y,
                          param0=param0,
                          epochs=1e3,
                          miniBatchSize=5,
                          initialLearningRate=1e-3,
                          beta1=0.9,
                          beta2=0.999,
                          epsilon=1e-8,
                          testFrequency=1e1,
                          function=fprime)

Its parameters are explained below:


In [25]:
_ = """
Class that holds most of the functionalities of the
adaptive SGD, currently using the ADAM algorithm

Attributes
----------
trainDataset : np.array
    features for each example
trainLabels : np.array
    labels for each example
param0 : np.array
    Initial parameters
epochs : int
    number of epochs to run the minimizer
miniBatchSize : int
    size of the mini batch

initialLearningRate : float
    Initial learning rate (typical choice: 1e-3)
beta1 : float
    Beta1 Adam parameter (typical choice: 0.9)
beta2 : float
    Beta2 Adam parameter (typical choice: 0.999)
epsilon : float
    epsilon Adam parameter (typical choice: 1e-8)

testFrequency : int
    How many minibatches to average the cost over for testing
function : callable
    Function to minimize that is of form
    (cost, gradient) = function(params, FeaturesMatrix, LabelsMatrix)
"""

The minimization is performed by calling:


In [26]:
params = adamGrad.minimize(printTrainigCost=True)
print(params)


Mibatch: 0 out of 20 from epoch: 0 out of 1000, iterCost is: 8.230740e-03
Mibatch: 0 out of 20 from epoch: 100 out of 1000, iterCost is: 8.600308e+00
Mibatch: 0 out of 20 from epoch: 200 out of 1000, iterCost is: 1.032220e+00
Mibatch: 0 out of 20 from epoch: 300 out of 1000, iterCost is: 4.926280e-02
Mibatch: 0 out of 20 from epoch: 400 out of 1000, iterCost is: 2.071159e-02
Mibatch: 0 out of 20 from epoch: 500 out of 1000, iterCost is: 6.539681e-03
Mibatch: 0 out of 20 from epoch: 600 out of 1000, iterCost is: 5.052960e-04
Mibatch: 0 out of 20 from epoch: 700 out of 1000, iterCost is: 1.563774e-06
Mibatch: 0 out of 20 from epoch: 800 out of 1000, iterCost is: 7.268542e-13
Mibatch: 0 out of 20 from epoch: 900 out of 1000, iterCost is: 2.565849e-30
[ 2.  3.  4.]

License (MIT)

MIT License

Copyright (c) 2017 Jacek Golebiowski

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.