Monday, April 28, 2014

Using AMD's Static C++ Kernel Language in OpenCL

AMD provides an implementation of an OpenCL extension which adds support for certain C++ features (e.g. templates) in OpenCL source code. It's called the "OpenCL Static C++ Kernel Language" extension and the specification can be found here (PDF). This extension is quite useful (especially for people coming from NVIDIA's CUDA who see the C99-based OpenCL kernel language as lacking).

In this blog post I go over how to use C++ templates directly in OpenCL kernels with this extension. I hope this is useful to others as, to me at least, how to use this extension is not immediately apparent.

First, we define our templated function (the prototypical square() function template):

template<typename T>
inline T square(const T x)
    return x * x;

Next we define a templated kernel which calls the square() function:

template<typename T>
__kernel void square_kernel(__global T *data)
    const uint i = get_global_id(0);
    data[i] = square(data[i]);

Now, in order to use the templated kernel, we must explicitly instantiate it and give it a name that can be used to call it from the host. We do that by explicitly stating the template-types and using the mangled_name attribute as so:

template __attribute__((mangled_name(square_kernel_int)))
__kernel void square_kernel(__global int *data);

The kernel can be instantiated multiple times for different types (though each must be given a unique mangled name). For example, to define the square kernel for float's:

template __attribute__((mangled_name(square_kernel_float)))
__kernel void square_kernel(__global float *data);

That's it for the OpenCL code. To compile the program on the host you must pass the "-x clc++" compile option to the clBuildProgram() function. In Boost.Compute, that is done like so (where source is a string containing all of the OpenCL source code from above and context is the OpenCL context object):

compute::program square_program =
    compute::program::build_with_source(source, context, "-x clc++");

Once built, the templated kernels can be instantiated by passing their mangled name to the clCreateKernel() function. In Boost.Compute, you would do the following:

compute::kernel square_int_kernel(square_program, "square_kernel_int");

The square_int_kernel can now be used just like any other kernel object (e.g. passed to clEnqueueNDRangeKernel()for execution on the device).

A fully implemented and compilable example demonstrating the code above can be found here.

Saturday, April 12, 2014

Using OpenCL with Boost.Compute on Amazon EC2

This post details the setup required to run code written with OpenCL and Boost.Compute on Amazon EC2.

Currently, Amazon offers two different NVIDIA-based GPU instance types 
(G2 and CG1). The G2 instances come with a Kepler GK104 GPU while the CG1 instances have an older Tesla M2050. I went with the G2 instance. In my nearest zone (US West), rates ran ~70 cents/hour.

Surprisingly (compared with the last time I used EC2 a couple years ago) its comes with fairly recent software including GCC 4.8 and Boost 1.53. This makes setup much easier than it used to be.

First, install the necessary dependencies from the package manager:

sudo yum install gcc48-c++ cmake git boost-devel

Then, clone Boost.Compute:

git clone

Next, create a build directory:

mkdir compute-build && cd compute-build

And then run cmake:

cmake -DOPENCL_INCLUDE_DIRS=/opt/nvidia/cuda/include/

Now run make to compile everything:

make -j24

If successful, running the list_devices example should show the NVIDIA GPU:

$ ./example/list_devices
Platform 'NVIDIA CUDA'
  GPU Device: GRID K520

All in all, it was a fairly painless process to get up and running (and much cheaper than buying a Kepler-class Tesla card!).