This tutorial demonstrates the basic graph framework used in graphAttack and applies it to a simple linear model.
You should be familiar with basic linear algebra, Python and the Jupyter Notebook editor. It also helps if you have a basic understanding of Machine Learning and classification.
In [1]:
import numpy as np
import graphAttack as ga
This was developed using Python 3.6.1
graphAttack uses Nodes to store data instead of plain Python variables; the Nodes are classified into Variables and Operations.
Variables are used to store numerical data to be fed into the graph; internally they hold numpy arrays. Here we initialize some variables:
In [2]:
a = np.array([2])
b = np.array([3])
aVar = ga.Variable(a)
bVar = ga.Variable(b)
We can now extract the stored data by calling:
In [3]:
aData = aVar.getValue()
print(aVar.getValue())
and print the information about a variable by:
In [4]:
print(aVar)
Operations usually take other Nodes (Variables or Operations) as input and can be used to extract the output of the underlying mathematical operation. Let's start with simple addition:
In [5]:
addOp = ga.AddOperation(aVar, bVar)
Again, we can obtain the value of the operation with .getValue(), and some information about it by printing it:
In [6]:
print("2 + 3 = ", addOp.getValue())
print("Details of addOp:", addOp)
Operations can also be defined using operators:
In [7]:
addOp2 = aVar + bVar
print("2 + 3 = ", addOp2.getValue())
print("Details of addOp2:", addOp2)
We can chain operations together to get something slightly more complicated:
In [8]:
aVar = ga.Variable(np.array([2]))
bVar = ga.Variable(np.array([3]))
cVar = ga.Variable(np.array([4]))
addOp = ga.AddOperation(aVar, bVar)
multOp = ga.MultiplyOperation(addOp, cVar)
print("(2 + 3) * 4 = ", multOp.getValue())
Again, the same could be achieved with operators:
In [9]:
aVar = ga.Variable(np.array([2]))
bVar = ga.Variable(np.array([3]))
cVar = ga.Variable(np.array([4]))
addOp = aVar + bVar
multOp = addOp * cVar
print("(2 + 3) * 4 = ", multOp.getValue())
In [10]:
_ = """
The complete list of supported operators is:
+ : AddOperation
@ : MatMatmulOperation
* : MultiplyOperation
/ : DivideOperation
"""
graphAttack supports automatic differentiation to ease the burden of optimising your parameters. Let's look at our previous example; this time we ask for the derivative of the final operation of the graph with respect to one of the Variables:
In [11]:
aVar = ga.Variable(np.array([2]))
bVar = ga.Variable(np.array([3]))
cVar = ga.Variable(np.array([4]))
addOp = aVar + bVar
multOp = addOp * cVar
print("(2 + 3) * 4 = ", multOp.getValue())
print("The derivative with respect to cVar is:", cVar.getGradient())
Each time we ask for a gradient, graphAttack looks for the derivatives up the operations chain until it reaches a dead end and then uses the chain rule to propagate the gradients backwards through the whole graph. Normally we would ask for derivatives of Variables, but we can also find the derivative of an individual operation as long as we provide the input with respect to which the differentiation should be made.
In [12]:
print("The derivative of multOp respect to cVar is:", multOp.getGradient(cVar))
Quite an important part of this library is the Graph object. It helps store and organise your operations, and it can be initialized by:
In [13]:
mainGraph = ga.Graph()
Now we can add operations to it using the .addOperation method, which has a couple of parameters:
In [14]:
def addOperationDUMMY(self,
                      operation,
                      doGradient=False,
                      finalOperation=False,
                      feederOperation=False):
    """Add an operation to the graph

    Parameters
    ----------
    operation : ga.Operation
        Operation or Variable to be added
    doGradient : bool
        When true, calculate the derivative of the final operation with respect to this
        operation when self.getGradients() is called
    finalOperation : bool
        When true, specify this operation as final of the graph
    feederOperation : bool
        When true, specify this operation as the data feeder operation

    Returns
    -------
    ga.Operation
        the handle for the added operation

    Raises
    ------
    ValueError
        Graph can only provide gradients with respect to variables!
        Only variables can be feeders
    """
We can compose our previous example using a graph:
In [15]:
mainGraph = ga.Graph(printWarnings=False)
a = np.array([2])
b = np.array([3])
c = np.array([4])
aVar = mainGraph.addOperation(ga.Variable(a), doGradient=True)
bVar = mainGraph.addOperation(ga.Variable(b), doGradient=True)
addOp = mainGraph.addOperation(aVar + bVar)
cVar = mainGraph.addOperation(ga.Variable(c), doGradient=True)
multOp = mainGraph.addOperation(addOp * cVar, finalOperation=True)
print(mainGraph)
Do not worry too much about the hidden warnings for now. We can now ask for the value of the final operation and the gradients with respect to the specified variables:
In [16]:
print("Final value of (2 + 3) * 4 = ", mainGraph.getValue())
print("Gradients are:")
for g in mainGraph.getGradients():
    print(g)
Let's try to use the library to run a simple linear regression model that will recover the function y = 2*x1 + 3*x2 + 4.
Initially we need to prepare the dataset in the right shapes. The shapes are important, and it is worth keeping track of them, since in Python column vectors often end up being row vectors.
In [17]:
x = np.random.random((100, 2))
y = (x[:, 0] * 2 + x[:, 1] * 3 + 4).reshape(-1, 1)
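A quick sanity check of the shapes (plain numpy, nothing graphAttack-specific): the features should form a (100, 2) matrix and the labels a (100, 1) column vector.
In [ ]:
# Verify that x is a (100, 2) feature matrix and y a (100, 1) column of labels
print(x.shape, y.shape)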
Next, we need to compose a graph that will represent our model:
In [18]:
mainGraph = ga.Graph()
w = np.random.random((1, 2)).T
b = np.random.random(1)
print(w, b)
feed = mainGraph.addOperation(ga.Variable(x), feederOperation=True)
wop = mainGraph.addOperation(ga.Variable(w), doGradient=True)
matmulop = mainGraph.addOperation(feed @ wop)
bop = mainGraph.addOperation(ga.Variable(b), doGradient=True)
addop = mainGraph.addOperation(matmulop + bop)
costop = mainGraph.addOperation(ga.QuadraticCostOperation(addop, y), finalOperation=True)
print(mainGraph)
We can find the cost and gradients with respect to our parameters by:
In [19]:
print("Cost:", mainGraph.getValue())
print(costop.nExamples)
print("Gradients are:")
for g in mainGraph.getGradients():
    print(g)
We can see that the random parameters are not terrific. In order to optimize them, we need to construct a function that returns the cost and the flattened gradients, so that we can feed it into a minimizer of choice. The code below is explained in the paragraphs that follow.
In [20]:
def fprime(p, data, labels):
    # Assign the data to the feeder operation and reset all operations
    mainGraph.feederOperation.assignData(data)
    mainGraph.resetAll()
    # Assign the labels to the cost operation and attach the flattened parameters
    mainGraph.finalOperation.assignLabels(labels)
    mainGraph.attachParameters(p)
    # Evaluate the cost, back-propagate, and unroll the gradients to match p
    c = mainGraph.getValue()
    mainGraph.getGradients()
    g = mainGraph.unrollGradients()
    return c, g
First, the function parameters: fprime takes as input a flattened vector of parameters, the data, and the labels. While data and labels are quite self-explanatory, let's discuss the parameters vector. It is a flattened vector containing all the weights and biases. It can be extracted from a graph by calling mainGraph.unrollGradientParameters() and attached to a graph (updating the values of the weight/bias Variables) with mainGraph.attachParameters(p).
In [21]:
print("Current parameters", wop.getValue(), bop.getValue())
param0 = mainGraph.unrollGradientParameters()
print("Unrolled parameters", param0)
Inside the function, we first assign the data to the feeder Variable and reset the values of all operations (not Variables) so that the shapes are updated and the flow can be re-computed later. Next, we assign the labels to the cost operation and attach the provided parameters. Finally, the cost and gradients are extracted from the graph, with the gradients being unrolled to match the parameters vector.
In [22]:
print("Gradients are:")
for g in mainGraph.getGradients():
    print(g)
print("Unrolled gradients", mainGraph.unrollGradients())
We can pass our function to a minimizer of choice; for now let's use the one in scipy.
In [23]:
import scipy.optimize
res = scipy.optimize.minimize(fprime,
                              param0,
                              args=(x, y),
                              method="BFGS",
                              jac=True)
print(res)
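The optimised parameters are returned in res.x; we can attach them back to the graph and compare against the true values w = [2, 3] and b = 4 (a short check; it only re-uses attachParameters and getValue from above):
In [ ]:
# Attach the optimised parameters and inspect the fitted weights and bias;
# they should be close to w = [[2], [3]] and b = [4]
mainGraph.attachParameters(res.x)
print("Fitted weights:", wop.getValue().ravel())
print("Fitted bias:", bop.getValue())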
In ML models it is often useful to use stochastic gradient descent as your minimizer due to the large number of data points. graphAttack comes with a simple implementation of the Adam method, which is considered quite robust. The details of the algorithm can be found at https://arxiv.org/pdf/1412.6980.pdf
Here we will simply use the minimizer; it can be initialized by:
In [24]:
adamGrad = ga.adaptiveSGD(trainingData=x,
                          trainingLabels=y,
                          param0=param0,
                          epochs=1e3,
                          miniBatchSize=5,
                          initialLearningRate=1e-3,
                          beta1=0.9,
                          beta2=0.999,
                          epsilon=1e-8,
                          testFrequency=1e1,
                          function=fprime)
With the explanation below:
In [25]:
_ = """
Class that holds most of the functionalities of the
adaptive SGD, currently using the ADAM algorightm
Attributes
----------
trainDataset : np.array
features for each example
trainLabels : np.array
labels for each example
param0 : np.array
Initial parameters
epochs : int
number of epochs to run the minimizer
miniBatchSize : int
size of the mini batch
initialLearningRate : flooat
Initial learning rate (typical choice: 1e-3)
beta1 : float
Beta1 Adam paramater (typical choice: 0.9)
beta2 : float
Beta2 Adam parameter (typical choice: 0.999)
epsilon : float
epsilon Adam parameter (typical choice: 1e-8)
testFrequency : int
How many minibatches to average the cost over for testing
function : float
Function to minimize that is of form
(cost, gradient) = function(params, FeaturesMatrix, LabelsMatrix)
"""
The minimization is performed by calling:
In [26]:
params = adamGrad.minimize(printTrainigCost=True)
print(params)
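As before, we can attach the parameters found by the SGD run and check that they roughly recover the true weights and bias (this re-uses attachParameters and getValue exactly as above):
In [ ]:
# Inspect the parameters found by adaptiveSGD; they should again be close to
# w = [[2], [3]] and b = [4]
mainGraph.attachParameters(params)
print("Fitted weights:", wop.getValue().ravel())
print("Fitted bias:", bop.getValue())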
MIT License
Copyright (c) 2017 Jacek Golebiowski
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.