Texture And Surface on GPU


Prerequisites

To get the most out of this lab, you should already be able to:

  • Write, compile, and run C# programs that both call CPU functions and launch GPU kernels.
  • Control parallel thread hierarchy using execution configuration.
  • Understand basic image concepts.

Objectives

By the time you complete this lab, you will be able to:

  • Accelerate image processing algorithms with Texture and Surface memory

Sobel Filter

The Sobel operator, sometimes called the Sobel–Feldman operator or Sobel filter, is used in image processing and computer vision, particularly within edge detection algorithms where it creates an image emphasising edges.

Technically, it is a discrete differentiation operator, computing an approximation of the gradient of the image intensity function. At each point in the image, the result of the Sobel–Feldman operator is either the corresponding gradient vector or the norm of this vector. The Sobel–Feldman operator is based on convolving the image with a small, separable, and integer-valued filter in the horizontal and vertical directions and is therefore relatively inexpensive in terms of computations. On the other hand, the gradient approximation that it produces is relatively crude, in particular for high-frequency variations in the image.
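
As a reference for the rest of this lab, here is a minimal pure-Python sketch of the operator. It applies the standard 3x3 Sobel kernels and returns the gradient norm at each interior pixel. The function name sobel_magnitude and the choice to leave border pixels at zero are assumptions made for illustration; they are not taken from the lab's C# files.

```python
import math

# Standard 3x3 Sobel kernels (horizontal and vertical derivative estimates).
GX = [[-1, 0, 1],
      [-2, 0, 2],
      [-1, 0, 1]]
GY = [[-1, -2, -1],
      [0, 0, 0],
      [1, 2, 1]]

def sobel_magnitude(image):
    """Return the gradient norm of a 2D list of intensities.

    Border pixels are left at 0.0 (an assumption made for this sketch).
    """
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gy = 0.0
            for j in range(3):
                for i in range(3):
                    pixel = image[y + j - 1][x + i - 1]
                    gx += GX[j][i] * pixel
                    gy += GY[j][i] * pixel
            out[y][x] = math.sqrt(gx * gx + gy * gy)
    return out

# A vertical edge: dark columns on the left, bright columns on the right.
step = [[0, 0, 1, 1] for _ in range(4)]
edges = sobel_magnitude(step)
```

On this small step image the interior pixels next to the edge get a large response, while a constant image yields zero everywhere.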


GPU

The first implementation runs the filter directly on the GPU. The program 01-gpu.cs is already working and computes the Sobel filter.


In [ ]:
import os
os.makedirs("./output-01-gpu", exist_ok=True)  # create the output directory on any platform
    
!hybridizer-cuda ./01-GPU/01-gpu.cs graybitmap.cs -o ./01-GPU/01-gpu.exe -run

# convert bmp to png to have interactive display
from PIL import Image
img = Image.open('./output-01-gpu/sobel.bmp')
img.save('./output-01-gpu/sobel.png', 'png')
from IPython.display import Image
Image(filename="./output-01-gpu/sobel.png", width=384, height=384)

Texture

Texture memory resides in device memory and is cached in the texture cache, so a texture fetch costs one read from device memory only on a cache miss; otherwise it costs one read from the texture cache. The texture cache is optimized for 2D spatial locality, so threads of the same warp that read texture or surface addresses close together in 2D achieve the best performance. It is also designed for streaming fetches with constant latency: a cache hit reduces DRAM bandwidth demand but not fetch latency. Texture memory is read-only from within a kernel.

With the Hybridizer, you need to use the texture object API. The texture functions are declared in the TextureHelpers.cs file, where each one is mapped to its CUDA counterpart with the [IntrinsicFunction("cuda function")] attribute.

To create a texture object, you need to:

  • Create an IntPtr from the float array you want to copy into texture memory (the runner must be created first, because its marshaller produces this IntPtr).
  • Create a cudaChannelFormatDesc to describe the format of the values. Its fields determine which data type the texture holds (float, float2, ushort, ushort4, ...).
  • Create the CUDA array with the CreateCudaArray function, passing the cudaChannelFormatDesc and the IntPtr you created, plus the width and the height; the function allocates the 2D array.
  • Create the resource descriptor from the CUDA array.
  • Create the texture descriptor with the CreateCudaTextureDesc() function.
  • Finally, declare the texture object and call cuda.CreateTextureObject with the texture object, the resource descriptor, and the texture descriptor. The texture is then ready to use.

// create the source pointer
IntPtr src = runner.Marshaller.MarshalManagedToNative(imageFloat);

// describe the value format: a single 32-bit float channel
cudaChannelFormatDesc channelDescTex = TextureHelpers.cudaCreateChannelDesc(32, 0, 0, 0, cudaChannelFormatKind.cudaChannelFormatKindFloat);
// create the CUDA array
cudaArray_t cuArrayTex = TextureHelpers.CreateCudaArray(channelDescTex, src, (int)width, (int)height);
// create the resource descriptor
cudaResourceDesc resDescTex = TextureHelpers.CreateCudaResourceDesc(cuArrayTex);

// create the texture descriptor
cudaTextureDesc texDesc = TextureHelpers.CreateCudaTextureDesc();

// create the texture object
cudaTextureObject_t texObj;
cuda.CreateTextureObject(out texObj, ref resDescTex, ref texDesc);

You can then read from the texture with the tex2D function, passing the texture object and the x and y position of the value you want.

tex2D(cudaTextureObject_t texObj, float x, float y)
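
To make the access pattern concrete, the sketch below emulates a texture fetch on the CPU in Python. It assumes unnormalized coordinates, point sampling, and clamp addressing (out-of-range coordinates snap to the nearest edge). The real behaviour depends on the descriptor returned by CreateCudaTextureDesc(), which is not shown here, and the helper name tex2d_clamp is hypothetical.

```python
import math

def tex2d_clamp(texture, x, y):
    """Emulate a point-sampled 2D fetch with clamp addressing.

    `texture` is a list of rows; x selects the column and y the row.
    Coordinates outside the image are clamped to the nearest edge value.
    This is an illustrative CPU model, not the CUDA implementation.
    """
    height, width = len(texture), len(texture[0])
    xi = min(max(math.floor(x), 0), width - 1)
    yi = min(max(math.floor(y), 0), height - 1)
    return texture[yi][xi]

image = [[10.0, 20.0, 30.0],
         [40.0, 50.0, 60.0]]

center = tex2d_clamp(image, 1, 1)   # inside the image
left = tex2d_clamp(image, -1, 0)    # column clamped to 0
below = tex2d_clamp(image, 2, 5)    # row clamped to the last row
```

Under that clamp assumption, a 3x3 filter could read the neighbours of a border pixel without explicit bounds checks.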

Now modify 02-texture.cs so that the Sobel computation reads the image through the texture.

Should you need it, refer to the solution.


In [ ]:
import os
os.makedirs("./output-02-texture", exist_ok=True)  # create the output directory on any platform
    
!hybridizer-cuda ./02-Texture/02-texture.cs graybitmap.cs TextureHelpers.cs -o ./02-Texture/02-texture.exe -run

# convert bmp to png to have interactive display
from PIL import Image
img = Image.open('./output-02-texture/sobel.bmp')
img.save('./output-02-texture/sobel.png', 'png')
from IPython.display import Image
Image(filename="./output-02-texture/sobel.png", width=384, height=384)

Surface

Surface memory behaves much like texture memory: it resides in device memory and is cached in the texture cache, so a surface read costs one read from device memory only on a cache miss; otherwise it costs one read from the texture cache. Unlike texture memory, a surface can also be written from a kernel.

As with textures, the Hybridizer requires you to use the surface object API. The surface functions are declared in the TextureHelpers.cs file, where each one is mapped to its CUDA counterpart with the [IntrinsicFunction("cuda function")] attribute.

To create a surface object, you need to:

  • Create a cudaChannelFormatDesc to describe the format of the values.
  • Declare the CUDA array.
  • Allocate the CUDA array with cuda.MallocArray, passing the CUDA array, the cudaChannelFormatDesc, the width and the height (for a 2D array), and the cudaArraySurfaceLoadStore flag, which specifies that the array will back a surface.
  • Create the resource descriptor from the CUDA array.
  • Finally, declare the surface object and call cuda.CreateSurfaceObject with the surface object and the resource descriptor. Unlike a texture, a surface takes no texture descriptor. The surface is then ready to use.

// describe the value format: a single 32-bit float channel
cudaChannelFormatDesc channelDescSurf = TextureHelpers.cudaCreateChannelDesc(32, 0, 0, 0, cudaChannelFormatKind.cudaChannelFormatKindFloat);
cudaArray_t cuArraySurf;
// allocate the CUDA array with the surface load/store flag
cuda.MallocArray(out cuArraySurf, ref channelDescSurf, width, height, cudaMallocArrayFlags.cudaArraySurfaceLoadStore);

// create the resource descriptor for the surface
cudaResourceDesc resDescSurf = TextureHelpers.CreateCudaResourceDesc(cuArraySurf);

// create the surface object
cudaSurfaceObject_t surfObj;
cuda.CreateSurfaceObject(out surfObj, ref resDescSurf);

You can now write to the surface with the surf2Dwrite(float data, cudaSurfaceObject_t surfObj, int x, int y) function, where data is the value to write, surfObj is the surface object, x is the x coordinate in bytes, and y is the y coordinate (a row index, not a byte offset).

TextureHelpers.surf2Dwrite(1.0F, surfObj, sizeof(float) * 2, 4); // write 1.0F to surfObj at position (2, 4): x is in bytes, y is a row index
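
The byte-based x coordinate is easy to get wrong, so here is a small CPU-side Python sketch of the addressing rule. It models a surface as a flat byte buffer of 4-byte floats with tightly packed rows. That layout is an assumption for illustration only (real CUDA arrays use an opaque layout), and the helper names surf2d_write and surf2d_read are hypothetical.

```python
import struct

FLOAT_SIZE = struct.calcsize("f")  # size in bytes of a 32-bit float

def surf2d_write(surface, width, value, x_bytes, y):
    """Write a float at byte offset x_bytes within row y of the buffer."""
    offset = y * width * FLOAT_SIZE + x_bytes
    struct.pack_into("f", surface, offset, value)

def surf2d_read(surface, width, x, y):
    """Read back the float stored at element (x, y)."""
    offset = (y * width + x) * FLOAT_SIZE
    return struct.unpack_from("f", surface, offset)[0]

width, height = 8, 8
surface = bytearray(width * height * FLOAT_SIZE)

# Element (2, 4): x is scaled by the element size, y is a plain row index.
surf2d_write(surface, width, 1.0, FLOAT_SIZE * 2, 4)
value = surf2d_read(surface, width, 2, 4)
```

Passing x = 2 instead of FLOAT_SIZE * 2 would target a byte offset in the middle of element 0, which is why the element size must be applied to x only.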

Finally, to get the data back on the host, create a float array, pin it to obtain an IntPtr, and then call cuda.MemcpyFromArray to copy the surface contents into it.

// pin a float array so the surface contents can be copied into it on the host
// (GCHandle is defined in System.Runtime.InteropServices)
float[] imageCompute = new float[width * height];
GCHandle handle = GCHandle.Alloc(imageCompute, GCHandleType.Pinned);
IntPtr dest = handle.AddrOfPinnedObject();

cuda.MemcpyFromArray(dest, cuArraySurf, 0, 0, width * height * sizeof(float), cudaMemcpyKind.cudaMemcpyDeviceToHost);

// release the pin once the copy is complete
handle.Free();
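
After the copy, imageCompute is a flat array of width * height values. Assuming the usual row-major layout, where pixel (x, y) is stored at index y * width + x, the mapping back to rows can be sketched in Python as follows. The helper name to_rows is hypothetical and the layout is an assumption, not something the snippet above states.

```python
def to_rows(flat, width, height):
    """Split a flat row-major buffer into a list of `height` rows."""
    if len(flat) != width * height:
        raise ValueError("buffer size does not match width * height")
    return [flat[y * width:(y + 1) * width] for y in range(height)]

width, height = 3, 2
flat = [0.0, 1.0, 2.0,
        3.0, 4.0, 5.0]
rows = to_rows(flat, width, height)

# Pixel (x=2, y=1) lives at flat index y * width + x.
pixel = flat[1 * width + 2]
```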

Now modify 03-surface.cs so that the Sobel computation writes its result through the surface.

Should you need it, refer to the solution.


In [ ]:
import os
os.makedirs("./output-03-surface", exist_ok=True)  # create the output directory on any platform
    
!hybridizer-cuda ./03-Surface/03-surface.cs graybitmap.cs TextureHelpers.cs -o ./03-Surface/03-surface.exe -run

# convert bmp to png to have interactive display
from PIL import Image
img = Image.open('./output-03-surface/sobel.bmp')
img.save('./output-03-surface/sobel.png', 'png')
from IPython.display import Image
Image(filename="./output-03-surface/sobel.png", width=384, height=384)