Caffe2 provides a high-level interface that helps create Python ops. Consider the following example:
In [19]:
from caffe2.python import core, workspace
import numpy as np
def f(inputs, outputs):
    outputs[0].feed(2 * inputs[0].data)
workspace.ResetWorkspace()
net = core.Net("tutorial")
net.Python(f)(["x"], ["y"])
workspace.FeedBlob("x", np.array([3.]))
workspace.RunNetOnce(net)
print(workspace.FetchBlob("y"))
As seen in the example, the net.Python() function returns a callable that can be used just like any other operator. In this example, we add a new Python operator to the net with input "x" and output "y". Note that you can save the return value of net.Python() and call it multiple times to add multiple Python operators (with possibly different inputs and outputs).
Let's take a closer look at the net.Python() function and the corresponding body of the new Python operator (f). Every time net.Python(f) is called, it serializes the given function f and saves it in a global registry under a known key (a token, passed to the PythonOp as an argument). net.Python() then returns a lambda that accepts positional and keyword arguments (typically inputs, outputs, and extra arguments) and attaches a new Python operator to the net that calls f on the given lists of inputs and outputs.
A Python operator's function (f) expects two positional arguments: a list of inputs and a list of outputs. When the operator is executed, it transparently converts Caffe2 blobs into the elements of these lists. CPU tensor blobs are converted into TensorCPU objects, which act as wrappers around NumPy arrays. Let's take a closer look at the relationship between a Caffe2 CPU tensor, the Python TensorCPU object, and a NumPy array:
In [27]:
def f_reshape(inputs, outputs):
    outputs[0].reshape(inputs[0].shape)
    outputs[0].data[...] = 2 * inputs[0].data
workspace.ResetWorkspace()
net = core.Net("tutorial")
net.Python(f_reshape)(["x"], ["z"])
workspace.FeedBlob("x", np.array([3.]))
workspace.RunNetOnce(net)
print(workspace.FetchBlob("z"))
This example works correctly because the "reshape" method updates the underlying Caffe2 tensor, and a subsequent access to the ".data" property returns a NumPy array that shares memory with that tensor. The last line of "f_reshape" copies data into this shared memory location.
net.Python() accepts several additional arguments. When pass_workspace=True is passed, the workspace is passed to the operator's Python function:
In [28]:
def f_workspace(inputs, outputs, workspace):
    outputs[0].feed(2 * workspace.blobs["x"].fetch())
workspace.ResetWorkspace()
net = core.Net("tutorial")
net.Python(f_workspace, pass_workspace=True)([], ["y"])
workspace.FeedBlob("x", np.array([3.]))
workspace.RunNetOnce(net)
print(workspace.FetchBlob("y"))
Another important net.Python() argument is "grad_f", a Python function for the corresponding gradient operator:
In [29]:
def f(inputs, outputs):
    outputs[0].reshape(inputs[0].shape)
    outputs[0].data[...] = inputs[0].data * 2

def grad_f(inputs, outputs):
    # Ordering of inputs is [fwd inputs, fwd outputs, grad_outputs]
    grad_output = inputs[2]
    grad_input = outputs[0]
    grad_input.reshape(grad_output.shape)
    grad_input.data[...] = grad_output.data * 2
workspace.ResetWorkspace()
net = core.Net("tutorial")
net.Python(f, grad_f)(["x"], ["y"])
workspace.FeedBlob("x", np.array([3.]))
net.AddGradientOperators(["y"])
workspace.RunNetOnce(net)
print(workspace.FetchBlob("x_grad"))
When net.Python() is called with a gradient function specified, it also registers a serialized gradient function that is used by the corresponding gradient Python operator (PythonGradient). This operator executes the gradient function, which also expects two arguments: an input list and an output list. The input list contains all forward function inputs, followed by all of its outputs, followed by the gradients of the forward function outputs. The output list contains the gradients of the forward function inputs. Note: net.Python()'s grad_output_indices/grad_input_indices arguments allow specifying the indices of the gradient output/input blobs that the gradient function reads/writes.
The PythonOp implementation is CPU-specific: it uses NumPy arrays, which require CPU memory storage. To make Python operators usable with GPU tensors, a CUDA version of PythonOp is defined using GPUFallbackOp. This operator wraps a CPU operator and adds GPU-to-CPU (and CPU-to-GPU) copy operations. Thus, when a PythonOp is used with a CUDA device option, all CUDA input tensors are automatically copied to CPU memory, and all CPU output tensors are copied back to the GPU.