Caffe2 provides a high-level interface that helps create Python ops. Consider the following example:
In [19]:
from caffe2.python import core, workspace
import numpy as np
def f(inputs, outputs):
    outputs[0].feed(2 * inputs[0].data)
workspace.ResetWorkspace()
net = core.Net("tutorial")
net.Python(f)(["x"], ["y"])
workspace.FeedBlob("x", np.array([3.]))
workspace.RunNetOnce(net)
print(workspace.FetchBlob("y"))
As seen in the example, the net.Python() function returns a callable that can be used just like any other operator. In this example, we add a new Python operator to the net with input "x" and output "y". Note that you can save the return value of net.Python() and call it multiple times to add multiple Python operators (with possibly different inputs and outputs).
Let's take a closer look at the net.Python() function and the corresponding body of the new Python operator (f). Every time net.Python(f) is called, it serializes the given function f and saves it in a global registry under a known key (a token, passed to the PythonOp as an argument). net.Python() then returns a lambda that accepts positional and keyword arguments (typically inputs, outputs, and extra arguments) and attaches a new Python operator to the net that calls f on the given lists of inputs and outputs.
A Python operator's function (f) expects two positional arguments: a list of inputs and a list of outputs. When the operator is executed, it transparently converts Caffe2 blobs into the elements of these lists. CPU tensor blobs are converted into TensorCPU objects, which act as wrappers around NumPy arrays. Let's take a closer look at the relationship between a Caffe2 CPU tensor, the Python TensorCPU object, and a NumPy array:
In [27]:
def f_reshape(inputs, outputs):
    outputs[0].reshape(inputs[0].shape)
    outputs[0].data[...] = 2 * inputs[0].data
workspace.ResetWorkspace()
net = core.Net("tutorial")
net.Python(f_reshape)(["x"], ["z"])
workspace.FeedBlob("x", np.array([3.]))
workspace.RunNetOnce(net)
print(workspace.FetchBlob("z"))
This example works correctly because the "reshape" method updates the underlying Caffe2 tensor, and a subsequent access to the ".data" property returns a NumPy array that shares memory with that tensor. The last line of "f_reshape" copies data into this shared memory location.
net.Python() accepts several additional arguments. When pass_workspace=True is passed, the workspace is passed to the operator's Python function:
In [28]:
def f_workspace(inputs, outputs, workspace):
    outputs[0].feed(2 * workspace.blobs["x"].fetch())
workspace.ResetWorkspace()
net = core.Net("tutorial")
net.Python(f_workspace, pass_workspace=True)([], ["y"])
workspace.FeedBlob("x", np.array([3.]))
workspace.RunNetOnce(net)
print(workspace.FetchBlob("y"))
Another important net.Python() argument is "grad_f", a Python function for the corresponding gradient operator:
In [29]:
def f(inputs, outputs):
    outputs[0].reshape(inputs[0].shape)
    outputs[0].data[...] = inputs[0].data * 2

def grad_f(inputs, outputs):
    # Ordering of inputs is [fwd inputs, fwd outputs, grad_outputs]
    grad_output = inputs[2]
    grad_input = outputs[0]
    grad_input.reshape(grad_output.shape)
    grad_input.data[...] = grad_output.data * 2
workspace.ResetWorkspace()
net = core.Net("tutorial")
net.Python(f, grad_f)(["x"], ["y"])
workspace.FeedBlob("x", np.array([3.]))
net.AddGradientOperators(["y"])
workspace.RunNetOnce(net)
print(workspace.FetchBlob("x_grad"))
When net.Python() is called with a gradient function specified, it also registers a serialized gradient function that is used by the corresponding gradient Python operator (PythonGradient). This operator executes the gradient function, which also expects two arguments: an input list and an output list. The input list contains all forward function inputs, followed by all of its outputs, followed by the gradients of the forward function outputs. The output list contains the gradients of the forward function inputs. Note: net.Python()'s grad_output_indices/grad_input_indices arguments allow specifying the indices of the gradient output/input blobs that the gradient function reads/writes.
The PythonOp implementation is CPU-specific: it uses NumPy arrays, which require CPU memory storage. To make Python operators usable with GPU tensors, a CUDA version of PythonOp is defined using GPUFallbackOp. This operator wraps a CPU operator and adds GPU-to-CPU (and CPU-to-GPU) copy operations. Thus, when a PythonOp is used with a CUDA device option, all CUDA input tensors are automatically copied to CPU memory, and all CPU output tensors are copied back to the GPU.