Why PyCUDA?

So far we have seen that, although CUDA is not an impossible language to learn, juggling many pointers and managing memory in such a rudimentary way can become a headache.

There are, however, alternatives that let us work in friendlier environments. One of them is PyCUDA, created by Andreas Klöckner. In essence, PyCUDA maps all of CUDA into Python.

As an example, a simple program looks like this:


In [1]:
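import pycuda.autoinit  # creates a CUDA context on import
import pycuda.driver as drv
import numpy

from pycuda.compiler import SourceModule

# CUDA C source for the kernel: each thread multiplies one pair of elements.
mod = SourceModule("""
__global__ void multiplicar(float *dest, float *a, float *b)
{
  const int i = threadIdx.x;
  dest[i] = a[i] * b[i];
}
""")

multiplicar = mod.get_function("multiplicar")

a = numpy.random.randn(400).astype(numpy.float32)
b = numpy.random.randn(400).astype(numpy.float32)

dest = numpy.zeros_like(a)

print(dest)  # all zeros before the kernel runs

# drv.In copies host -> device before the launch; drv.Out copies device -> host after it.
multiplicar(
        drv.Out(dest), drv.In(a), drv.In(b),
        block=(400,1,1), grid=(1,1))

print(dest)  # the products computed on the GPU

print(dest-a*b)  # difference against the CPU result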


[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.]
[ -2.23093435e-01   7.71138608e-01  -1.62466490e+00  -9.32421863e-01
  -1.10088444e+00   6.76056206e-01  -1.51221408e-02   2.10633740e-01
  -1.39220560e+00   6.93142833e-03  -9.32331622e-01   3.02768320e-01
   5.41956782e-01  -7.21880674e-01  -4.27147537e-01   1.88914979e+00
  -3.20988566e-01  -1.28624272e+00   5.96209288e-01  -4.46770936e-01
  -6.68728292e-01   1.01363969e+00   3.71933699e-01  -1.23209929e+00
  -3.97991896e-01   5.50299227e-01  -5.86081017e-03   2.00373673e+00
  -5.41762233e-01   3.23940441e-02  -8.31922051e-03   9.98587832e-02
  -9.43861976e-02  -3.03527212e+00  -2.07167578e+00  -8.80579874e-02
  -2.50485063e+00   1.93559602e-01   2.53015816e-01   3.91247869e-01
  -8.53050411e-01  -3.98777202e-02   1.39256179e+00  -2.20480915e-02
  -9.79267657e-01   9.32142675e-01  -1.96398824e-01   1.10271084e+00
   4.89762425e-01   6.21928930e-01  -4.68936920e-01  -1.77025902e+00
   1.61385071e+00  -9.19178188e-01   2.78751945e+00  -6.34354949e-01
  -4.46722060e-02  -5.47955275e-01   8.18227112e-01  -1.75317691e-05
  -1.09817171e-02   1.24889195e-01  -5.10717213e-01  -1.73888773e-01
   1.20121169e+00  -2.52370983e-01   5.46664476e-01  -2.35158384e-01
   9.90039259e-02   1.84710339e-01   9.40681815e-01  -4.87923682e-01
   4.83251989e-01  -3.77808303e-01   9.36705470e-01  -1.52405059e+00
  -1.79788377e-02  -2.81568199e-01   5.97173795e-02  -1.43277556e-01
  -1.87328053e+00   1.21282613e+00  -5.11281073e-01  -1.74878943e+00
   5.77167690e-01   6.02475286e-01   5.42142212e-01   1.33045375e-01
  -1.12367201e+00   6.49231747e-02   1.60744816e-01  -8.57812405e-01
   4.89366648e-04   3.16790074e-01  -4.95854706e-01   1.88671183e-02
   5.21268323e-02   9.00193751e-01   2.21465135e+00   1.33235991e+00
   6.15344584e-01   2.99904794e-01  -1.15651798e+00  -2.24848127e+00
  -4.27697301e-01  -2.75955290e-01  -4.69827056e-01  -1.11235291e-01
  -3.81987989e-01   2.56066054e-01   1.94903061e-01   8.71874928e-01
   4.18583959e-01  -5.12448132e-01  -2.79282853e-02   7.58963883e-01
   6.23363733e-01  -2.23189279e-01  -3.48707581e+00  -4.68175076e-02
   4.95050818e-01   1.07239473e+00   1.32306486e-01   6.18495166e-01
   4.15356696e-01  -1.24451756e+00   9.31818306e-01  -3.35362577e+00
   9.98254195e-02  -3.06033552e-01  -1.37546134e+00  -4.85503942e-01
   6.82494700e-01  -6.27095029e-02  -1.03054487e-03  -1.41750183e-02
   1.26088098e-01  -4.28531021e-01  -1.09709494e-01  -1.49764698e-02
  -3.15558362e+00  -1.82246529e-02  -4.67530131e-01  -2.57462919e-01
  -5.66141773e-03   3.19395989e-01   1.47326958e+00   1.15540016e+00
   3.89225245e-01   5.04462793e-03   3.36519450e-01  -4.22188044e-01
   3.68091494e-01   2.04593158e+00   2.96112914e-02  -1.32439891e-02
   1.28681511e-01   5.78986228e-01   2.59274244e-01  -2.31971025e+00
   2.16971226e-02  -1.63532168e-01  -1.23101689e-01  -3.31149411e+00
   9.66133714e-01  -5.11316657e-01   1.34220481e+00  -2.78830457e+00
   2.95583725e-01  -3.56204472e-02   1.87914655e-01   1.77598357e+00
   7.33621791e-02  -1.89232874e+00  -1.26144677e-01   8.98270383e-02
  -4.81581330e-01   4.73246574e-01  -2.06406370e-01   2.13020085e-03
   1.85043126e-01   2.47785851e-01   9.13514912e-01   1.84481461e-02
  -3.65972489e-01   2.81253695e-01  -4.19901252e-01   4.27739352e-01
   5.74206635e-02  -1.80134684e-01  -2.41042331e-01   5.67991018e-01
  -3.46471608e-01  -4.40766245e-01  -2.37729773e-03   2.13661027e+00
   8.69406834e-02   5.32891974e-02   5.89102097e-02  -2.29308677e+00
   8.01202238e-01   1.14783573e+00   1.00583881e-01   6.93704844e-01
   2.08954830e-02  -2.33266711e+00  -1.14522791e+00  -6.19596899e-01
   2.26983041e-01  -3.10265779e-01  -4.07066703e-01   5.76317906e-01
  -9.23607722e-02   2.13114351e-01  -5.31915307e-01  -7.65151344e-03
  -9.82670963e-01  -1.00366271e+00  -1.24221873e+00   2.26679981e-01
   5.12899280e-01  -8.54598463e-01   2.64241815e-01  -2.90651709e-01
  -2.70586491e-01   8.07394326e-01  -3.32982481e-01  -2.44420707e-01
  -4.06483293e-01  -1.61623394e+00   3.08610487e+00   3.90143067e-01
   1.04483914e+00   2.32398301e-01  -1.81912041e+00   2.41185260e+00
   1.38732219e+00   2.94656813e-01   8.39051545e-01  -2.20835066e+00
   3.07041228e-01  -1.52762079e+00   3.78000170e-01   3.32728088e-01
   4.55939621e-01  -2.69515943e-02  -1.10657883e+00   5.31137347e-01
   3.29975900e-03   2.56742209e-01   1.10293591e+00  -1.25127241e-01
  -6.07312083e-01  -5.09980740e-03  -1.01451182e+00   4.65393972e+00
  -4.63319838e-01   8.05306137e-01   1.06400296e-01   6.88281059e-02
  -8.54881585e-01  -2.86873603e+00  -1.78573709e-02   1.10349965e+00
  -9.32966471e-01  -1.30772090e+00  -3.14193219e-01   2.26954743e-01
  -3.19956452e-01   1.25444448e+00   6.87867522e-01  -1.52005069e-02
  -6.73608959e-01   1.82131141e-01  -2.34017625e-01   5.03092349e-01
   1.15182839e-01  -5.32620847e-02   8.01535100e-02   3.67249346e+00
   1.97036132e-01  -7.55788505e-01   6.97196722e-01  -8.40918362e-01
  -1.83792323e-01  -3.92277772e-03  -4.35655326e-01  -3.30601096e-01
  -5.51108411e-03  -1.01614738e+00  -9.99516807e-03   1.22893167e+00
   1.55224025e-01  -1.27889797e-01  -1.77164435e+00  -2.15255599e-02
   2.53130108e-01   1.46393645e+00  -2.37813443e-02   1.88755706e-01
  -3.84346694e-01  -9.36430931e-01   6.88675404e-01   3.98286819e-01
   1.68030933e-01  -2.80782253e-01   7.01844454e-01  -2.37076748e-02
   2.02387840e-01  -9.21126842e-01   2.13446081e-01  -2.31383741e-01
   3.33836526e-02   7.11176038e-01  -6.68884695e-01   8.94702226e-02
   2.07690030e-01  -5.88553011e-01  -9.80316475e-03  -5.12965977e-01
  -6.79002643e-01   1.60504830e+00  -3.63385022e-01  -1.57416546e+00
  -2.06378639e-01   8.00736845e-01   8.47820118e-02   2.11665869e-01
  -1.99302030e+00  -6.11883821e-03  -4.25845623e-01  -1.81322873e+00
   8.57481267e-03  -9.84930813e-01  -1.55273944e-01  -1.03869331e+00
  -5.03384233e-01   2.86864352e+00   4.37786616e-02  -1.12882817e+00
  -1.06297567e-01   1.54476988e+00  -1.30843818e-01  -1.01778996e+00
   4.16008532e-01   1.01629364e+00   6.24441449e-03   7.01418042e-01
   3.43267731e-02  -1.75705835e-01  -8.75325322e-01   1.60338536e-01
   3.26878041e-01  -2.55433917e-02   6.34551942e-01  -3.71364458e-03
  -3.91968638e-01   1.60392839e-02   1.14506865e+00   1.15264654e-01
   2.07620883e+00  -4.02823716e-01   2.70710737e-01   1.65674880e-01
   2.60421038e-01   5.95079623e-02   1.15429068e+00   5.75600229e-02
  -8.12964439e-01  -6.00483239e-01  -1.93542138e-01  -7.90636659e-01
   5.71399271e-01  -2.44854882e-01   2.18541119e-02  -3.93011086e-02
   5.02089381e-01  -4.67524618e-01  -2.22658023e-01  -2.50810862e-01
   7.52323344e-02   5.52335918e-01   1.09623475e-02   1.56716816e-02
  -3.52521315e-02   2.29123250e-01   6.50775852e-03   2.21132994e+00
   3.39943647e-01   2.13047601e-02   1.37103236e+00   2.37887168e+00
   5.33069670e-01   2.96517871e-02   1.62335008e-01  -3.68504152e-02
   3.05504411e-01   2.02799812e-01  -2.21257353e+00   4.38355543e-02]
[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.]

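Running this program ends by printing a bunch of zeros, which does not look very interesting: the last array is dest-a*b, so all zeros simply means the GPU result matches the product NumPy computes on the CPU. Behind the scenes, however, something interesting did happen: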

  • PyCUDA compiled the source code and loaded it onto the card.
  • Memory was allocated automatically, and the data was copied from the CPU to the GPU and back.
  • Finally, cleanup (freeing the memory) happens on its own.

Useful, right?
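
To make the role of each thread concrete, here is a minimal CPU-only sketch of the same computation in plain Python. It is not how PyCUDA works internally: the `launch` helper and the `thread_idx_x` parameter are hypothetical stand-ins for the launch with block=(400,1,1) and for threadIdx.x, and the loop runs sequentially, whereas on the GPU the threads run in parallel.

```python
import random


def multiplicar(dest, a, b, thread_idx_x):
    """CPU stand-in for the kernel body: one call plays the role of one thread."""
    i = thread_idx_x
    dest[i] = a[i] * b[i]


def launch(kernel, block_x, dest, a, b):
    """Hypothetical helper: call the kernel once per thread of a single 1-D block."""
    for thread_idx_x in range(block_x):
        kernel(dest, a, b, thread_idx_x=thread_idx_x)


n = 400
a = [random.gauss(0.0, 1.0) for _ in range(n)]
b = [random.gauss(0.0, 1.0) for _ in range(n)]
dest = [0.0] * n

launch(multiplicar, n, dest, a, b)

# Same check as the last print of the example: result minus the direct product.
difference = [d - x * y for d, x, y in zip(dest, a, b)]
print(all(v == 0.0 for v in difference))
```

Each of the 400 simulated threads writes exactly one element of dest, which is why a single block of 400 threads is enough to cover the arrays in the example.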

Using PyCUDA

To get started, we need to import and initialize PyCUDA:


In [2]:
import pycuda.driver as cuda
import pycuda.autoinit
from pycuda.compiler import SourceModule

Transferring data

The next step is to transfer data to the GPU, usually in the form of NumPy arrays. For example, let's take a $4 \times 4$ array of random numbers:


In [3]:
import numpy
a = numpy.random.randn(4,4)

However, our array a consists of double-precision numbers. Since not all NVIDIA GPUs support double precision, we convert it to single precision:


In [4]:
a = a.astype(numpy.float32)

Next, we need somewhere to transfer the data to, so we allocate memory on the device:


In [5]:
a_gpu = cuda.mem_alloc(a.nbytes)
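
As a rough check of how much memory that call requests, the following standard-library sketch computes the size in bytes of a $4 \times 4$ array in both precisions. It assumes, as is the case for NumPy's float64 and float32, that the elements occupy 8 and 4 bytes respectively; the variable names are illustrative only.

```python
import struct

rows, cols = 4, 4

# "=" selects the standard sizes: 8 bytes for a double, 4 bytes for a float.
bytes_per_float64 = struct.calcsize("=d")
bytes_per_float32 = struct.calcsize("=f")

nbytes_float64 = rows * cols * bytes_per_float64
nbytes_float32 = rows * cols * bytes_per_float32

print(nbytes_float64, nbytes_float32)  # prints "128 64"
```

So after the conversion to float32, a.nbytes is 64, half of what the double-precision array would have needed.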

As the last step, we transfer the data to the GPU:


In [6]:
cuda.memcpy_htod(a_gpu, a)

Running kernels

Throughout this chapter we will focus on a very simple example: code that doubles every entry of an array. We write the kernel in CUDA C and pass it to the constructor of pycuda.compiler.SourceModule:


In [7]:
mod = SourceModule("""
  __global__ void duplicar(float *a)
  {
    int idx = threadIdx.x + threadIdx.y*4;
    a[idx] *= 2;
  }
  """)

If there are no errors, the code has now been compiled and loaded onto the device. We get a reference to our pycuda.driver.Function and call it, passing a_gpu as the argument and a block size of $4\times 4$:


In [8]:
mod


Out[8]:
<pycuda.compiler.SourceModule at 0x7f31a1fc0410>

In [9]:
func = mod.get_function("duplicar")
func(a_gpu, block=(4,4,1))
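
With block=(4,4,1), threadIdx.x and threadIdx.y each range over 0 to 3. The pure-Python sketch below reproduces the kernel's index expression on the CPU to show which array element each thread touches. The names `flat_index`, `BLOCK_DIM_X` and `BLOCK_DIM_Y` are hypothetical and exist only for this illustration.

```python
BLOCK_DIM_X = 4  # matches block=(4,4,1) in the launch
BLOCK_DIM_Y = 4


def flat_index(thread_idx_x, thread_idx_y):
    """Same expression as in the kernel: idx = threadIdx.x + threadIdx.y*4."""
    return thread_idx_x + thread_idx_y * 4


# One row per threadIdx.y, one column per threadIdx.x.
indices = [[flat_index(x, y) for x in range(BLOCK_DIM_X)]
           for y in range(BLOCK_DIM_Y)]

for row in indices:
    print(row)
```

The 16 threads cover the indices 0 to 15 exactly once. Because the NumPy array is stored in row-major order, the thread with (threadIdx.x, threadIdx.y) = (x, y) doubles the entry a[y][x]. The hard-coded 4 in the kernel is the row width, so this indexing only fits arrays with four columns.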

In [10]:
func


Out[10]:
<pycuda._driver.Function at 0x7f31a1fc3938>

Finally, we fetch the data back from the GPU and display it together with the original a:


In [11]:
a_duplicado = numpy.empty_like(a)
cuda.memcpy_dtoh(a_duplicado, a_gpu)
print(a_duplicado)
print(a)


[[-2.97226572  0.20464714  2.41393161 -2.09682822]
 [-1.63446426  0.7785539   1.59411359  0.42689195]
 [-1.14897263 -0.54843307  4.48510981  0.04625186]
 [-0.64690572  2.06020832 -1.02194703  5.90874815]]
[[-1.48613286  0.10232357  1.2069658  -1.04841411]
 [-0.81723213  0.38927695  0.79705679  0.21344598]
 [-0.57448632 -0.27421653  2.2425549   0.02312593]
 [-0.32345286  1.03010416 -0.51097351  2.95437407]]

In [12]:
print(type(a))
print(type(a_gpu))
print(type(a_duplicado))


<type 'numpy.ndarray'>
<class 'pycuda._driver.DeviceAllocation'>
<type 'numpy.ndarray'>

In [13]:
a_duplicado


Out[13]:
array([[-2.97226572,  0.20464714,  2.41393161, -2.09682822],
       [-1.63446426,  0.7785539 ,  1.59411359,  0.42689195],
       [-1.14897263, -0.54843307,  4.48510981,  0.04625186],
       [-0.64690572,  2.06020832, -1.02194703,  5.90874815]], dtype=float32)

And that's it! The job is done. PyCUDA takes care of all the cleanup for us.