Using a single memory pool for Cupy and PyTorch or TensorFlow

Requesting memory directly from a GPU device is expensive, so most deep learning libraries over-allocate and maintain an internal memory pool that they hold on to rather than returning to the device. This means the libraries don't play well together by default: each one expects to be the sole consumer of the GPU's memory, so it hogs it selfishly. If you use two frameworks together, you can get unexpected out-of-memory errors.

Thinc's internal models use cupy for GPU operations, and cupy offers a nice solution to this problem. You can provide cupy with a custom memory allocation function, which lets us route cupy's memory requests through another library. This avoids the memory problem when you use cupy and PyTorch together, or cupy and TensorFlow together. We don't yet have a similar solution for using PyTorch and TensorFlow together, however.

To start with, we call the require_gpu() function, which tells Thinc and PyTorch to allocate on the GPU.


In [ ]:
!pip install "thinc>=8.0.0a0" torch "tensorflow>=2.0"

In [ ]:
from thinc.api import require_gpu

require_gpu()

We then call use_pytorch_for_gpu_memory() to set up the allocation strategy. Now when cupy tries to request GPU memory, it will do so by asking PyTorch, rather than asking the GPU directly.


In [ ]:
from thinc.api import use_pytorch_for_gpu_memory

use_pytorch_for_gpu_memory()
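
Under the hood, this relies on cupy's hook for custom allocators, cupy.cuda.set_allocator(). To make the mechanism concrete, here's a rough sketch of the idea (this is not Thinc's exact implementation, just an illustration): the allocator function receives a size in bytes, satisfies the request by allocating a PyTorch tensor, and hands the tensor's device pointer back to cupy wrapped in a MemoryPointer.


In [ ]:
import cupy
import torch

def pytorch_backed_allocator(size_in_bytes):
    # Illustrative sketch only. Allocate the requested bytes through PyTorch,
    # then wrap the device pointer so cupy can use the same block. Passing the
    # tensor as the "owner" keeps it alive for as long as cupy uses the memory.
    tensor = torch.zeros((max(1, size_in_bytes),), dtype=torch.uint8, device="cuda")
    memory = cupy.cuda.UnownedMemory(tensor.data_ptr(), size_in_bytes, tensor)
    return cupy.cuda.MemoryPointer(memory, 0)

# We don't install this sketch, because use_pytorch_for_gpu_memory() has
# already registered Thinc's own allocator:
# cupy.cuda.set_allocator(pytorch_backed_allocator)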

To check that it's working, we write a little function that allocates an array with cupy and prints its length alongside the peak number of bytes PyTorch has allocated so far. Notice the over-allocation: PyTorch grabs a much bigger chunk of memory than our little array actually needs. That's exactly why we want a single memory pool.


In [ ]:
import cupy
import torch.cuda

def allocate_cupy_tensor(size):
    # Allocate a float32 array via cupy and report its length (in elements)
    # together with the peak number of bytes PyTorch has allocated so far.
    array = cupy.zeros((size,), dtype="f")
    print(array.size, torch.cuda.max_memory_allocated())
    return array

allocate_cupy_tensor(16)

We can also see that even when we free the array, the memory isn't immediately handed back to the device. On the other hand, the second small allocation can reuse that memory, so the pool doesn't need to grow.


In [ ]:
import tensorflow

with tensorflow.device('/device:GPU:0'):
    arr = allocate_cupy_tensor(1000)
    arr = None
    arr = allocate_cupy_tensor(1000)
    arr = None

If we make a huge allocation, though, the pool will have to grow. Let's make sure the pool resizes properly, and that memory is freed and reused when the arrays are dropped.


In [ ]:
arr = allocate_cupy_tensor(1000)
for _ in range(100):
    # Rebinding arr2 drops the previous array, so its memory goes back to the
    # pool and can be reused by the next allocation instead of growing the pool.
    arr2 = allocate_cupy_tensor(900000)
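
As a final check, we can drop our references and look at what PyTorch thinks is still allocated. The exact figures depend on how the caching allocators on both sides recycle their blocks, so treat this as a rough indication rather than an exact accounting: the point is that the freed arrays go back into the shared pool to be reused, instead of growing GPU usage with every iteration of the loop above.


In [ ]:
import torch.cuda

# Drop the references so the arrays can be freed back into the pool.
arr = None
arr2 = None

# Current vs. peak bytes allocated through PyTorch. The exact numbers depend
# on how the caching allocators recycle blocks.
print("currently allocated:", torch.cuda.memory_allocated())
print("peak allocated:     ", torch.cuda.max_memory_allocated())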