Sunday, December 14, 2014

Boost.Compute v0.4 Released

I'm proud to announce the release of Boost.Compute version 0.4!

Boost.Compute is a header-only C++ library for GPGPU and parallel-computing based on OpenCL. It is available on GitHub and instructions for getting started can be found in the documentation.

Since v0.3 (released in July), over 100 commits have been made by thirteen different authors. Many improvements have been made including updated algorithms, new example applications, and numerous bug fixes.

I'm also happy to announce that the formal Boost peer-review for Boost.Compute is scheduled to begin on Monday (12/15). Thanks to Antony Polukhin for volunteering to be the review manager.

As always, we look forward to hearing user feedback. If you find any bugs or issues, please report them to the issue tracker. Any questions or comments about Boost.Compute can be posted on the mailing list.

Friday, October 31, 2014

Ubuntu PPA for Boost.Compute

Boost.Compute now has an Ubuntu package and is available through a PPA.

Just run the following commands to install Boost.Compute on an Ubuntu system:

# add the boost.compute ppa
sudo add-apt-repository ppa:kylelutz/compute

# update and install boost-compute
sudo apt-get update && sudo apt-get install boost-compute

Tuesday, July 15, 2014

Boost.Compute v0.3 Released

I'm proud to announce the release of Boost.Compute version 0.3!

Boost.Compute is a header-only C++ library for GPGPU and parallel-computing based on OpenCL. It is available on GitHub and instructions for getting started can be found in the documentation.

Since version 0.2 (released about two months ago), there have been over a hundred commits by eight different authors. Quite a few new algorithms have been implemented and many bugs have been fixed (thanks to all who reported issues!). Notably, much work has already been done by Roshan as a part of his Google Summer of Code project for Boost.Compute. A variety of new examples have also been contributed which help demonstrate the usage of the Boost.Compute API (including how to use Boost.Compute and OpenCL together with third-party APIs like OpenGL and OpenCV).

In the coming months the focus will be on general performance improvements and preparing the library for Boost peer review. See the project road map posted here on the mailing list for more details.

We look forward to any and all feedback. If you find any bugs or issues, please report them to the issue tracker. Any questions or comments about Boost.Compute can be posted on the mailing list.

Sunday, May 11, 2014

Boost.Compute v0.2 Released

I'm happy to announce that version 0.2 of Boost.Compute has been tagged and released!

Boost.Compute is a header-only C++ library for GPGPU and parallel-computing based on OpenCL. It is available on GitHub and instructions for getting started can be found in the documentation.

Since version 0.1 (released almost two months ago), there have been 81 new commits created by 4 different authors. A number of bugs have been fixed including bug #105 which caused crashes when using OpenCL v1.1 devices with code compiled against OpenCL v1.2 headers. Also, new algorithms including unique(), search() and find_end() have been added by Roshan Raghupathy as part of his ongoing work for the Boost.Compute GSoC project.

Many new examples have also been added. The mandelbrot example shows how to calculate the Mandelbrot fractal and display it directly on the GPU using the OpenCL-OpenGL sharing extension. The new k_means example shows how to quickly calculate clusters for a set of 2D points on the GPU. Also, let me know (or submit an issue) if you have any ideas/requests for additional examples.

And, as always, I look forward to any and all feedback. You can submit an issue through the issue tracker or contact me directly (email can be found in my profile).

P.S. I'll be at the C++Now conference in Aspen all next week. Let me know if you'd like to meet up.

Monday, April 28, 2014

Using AMD's Static C++ Kernel Language in OpenCL

AMD provides an implementation of an OpenCL extension which adds support for certain C++ features (e.g. templates) in OpenCL source code. It's called the "OpenCL Static C++ Kernel Language" extension and the specification can be found here (PDF). This extension is quite useful (especially for people coming from NVIDIA's CUDA who see the C99-based OpenCL kernel language as lacking).

In this blog post I go over how to use C++ templates directly in OpenCL kernels with this extension. I hope this is useful to others as, to me at least, how to use this extension is not immediately apparent.

First, we define our templated function (the prototypical square() function template):

template<typename T>
inline T square(const T x)
{
    return x * x;
}

Next we define a templated kernel which calls the square() function:

template<typename T>
__kernel void square_kernel(__global T *data)
{
    const uint i = get_global_id(0);
    data[i] = square(data[i]);
}

Now, in order to use the templated kernel, we must explicitly instantiate it and give it a name that can be used to call it from the host. We do that by explicitly stating the template types and using the mangled_name attribute, like so:

template __attribute__((mangled_name(square_kernel_int)))
__kernel void square_kernel(__global int *data);

The kernel can be instantiated multiple times for different types (though each must be given a unique mangled name). For example, to define the square kernel for floats:

template __attribute__((mangled_name(square_kernel_float)))
__kernel void square_kernel(__global float *data);

That's it for the OpenCL code. To compile the program on the host you must pass the "-x clc++" compile option to the clBuildProgram() function. In Boost.Compute, that is done like so (where source is a string containing all of the OpenCL source code from above and context is the OpenCL context object):

compute::program square_program =
    compute::program::build_with_source(source, context, "-x clc++");

Once built, the templated kernels can be instantiated by passing their mangled name to the clCreateKernel() function. In Boost.Compute, you would do the following:

compute::kernel square_int_kernel(square_program, "square_kernel_int");

The square_int_kernel can now be used just like any other kernel object (e.g. passed to clEnqueueNDRangeKernel() for execution on the device).

A fully implemented and compilable example demonstrating the code above can be found here.
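When trying this out, it can help to have a host-side reference to compare against the values read back from the device. The following is a minimal sketch in plain C++ (no OpenCL required); square_reference() is a hypothetical helper name, and the loop index simply stands in for get_global_id(0). The explicit instantiations at the end are the standard C++ counterpart of the mangled_name instantiations shown above.

```cpp
#include <cstddef>

// Same function template as in the OpenCL source above.
template<typename T>
inline T square(const T x)
{
    return x * x;
}

// CPU counterpart of square_kernel (hypothetical helper, not part of any library).
// Each loop iteration does the work of one work-item; i plays the role of
// get_global_id(0).
template<typename T>
void square_reference(T *data, std::size_t count)
{
    for(std::size_t i = 0; i < count; i++){
        data[i] = square(data[i]);
    }
}

// Explicit instantiations for int and float, analogous to the
// square_kernel_int and square_kernel_float instantiations.
template void square_reference<int>(int *, std::size_t);
template void square_reference<float>(float *, std::size_t);
```

Running square_reference() over a copy of the input and comparing it element by element with the device output is a quick sanity check that the right instantiation was launched.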

Saturday, April 12, 2014

Using OpenCL with Boost.Compute on Amazon EC2

This post details the setup required to run code written with OpenCL and Boost.Compute on Amazon EC2.

Currently, Amazon offers two different NVIDIA-based GPU instance types (G2 and CG1). The G2 instances come with a Kepler GK104 GPU while the CG1 instances have an older Tesla M2050. I went with the G2 instance. In my nearest zone (US West), rates ran ~70 cents/hour.

Surprisingly (compared with the last time I used EC2 a couple years ago), the instance comes with fairly recent software, including GCC 4.8 and Boost 1.53. This makes setup much easier than it used to be.

First, install the necessary dependencies from the package manager:

sudo yum install gcc48-c++ cmake git boost-devel

Then, clone Boost.Compute:

git clone

Next, create a build directory:

mkdir compute-build && cd compute-build

And then run cmake:

cmake -DOPENCL_INCLUDE_DIRS=/opt/nvidia/cuda/include/

Now run make to compile everything:

make -j24

If successful, running the list_devices example should show the NVIDIA GPU:

$ ./example/list_devices
Platform 'NVIDIA CUDA'
  GPU Device: GRID K520

All in all, it was a fairly painless process to get up and running (and much cheaper than buying a Kepler-class Tesla card!).

Sunday, March 16, 2014

Boost.Compute v0.1 Released

I'm proud to announce the initial release (version 0.1) of Boost.Compute! It is available on GitHub and instructions for getting started can be found in the documentation.

Boost.Compute is a C++ library for GPGPU and parallel-computing based on OpenCL. There have been 292 commits by 4 authors comprising ~26,000 lines of code. I am very thankful to those who have contributed and would encourage any others who are interested to shoot me an email.

I look forward to any feedback. You can submit an issue through the issue tracker or contact me directly (email can be found in my profile).

Sunday, March 9, 2014

Custom OpenCL functions in C++ with Boost.Compute

Due to OpenCL's C99-based programming and compilation model, defining and using custom functions from C++ on the GPU can be challenging. However, Boost.Compute provides a few utilities to simplify function creation and execution from C++ (without ever having to touch a raw source code string!).

The BOOST_COMPUTE_FUNCTION() macro creates function objects in C++ which can then be executed on the GPU by OpenCL with the Boost.Compute algorithms (e.g. transform(), sort()).

As arguments, the macro takes the function's return type, name, argument list, and source. The first three arguments (return type, name, and argument list) are all C++ types/expressions. The source argument contains the body of the function which will be inserted into an OpenCL program when executed with one of the Boost.Compute algorithms.

The return type, name, and argument list are used by Boost.Compute to automatically generate the OpenCL function declaration as well as to instantiate the C++ function<> object with the correct signature. This ensures type-safety in C++ (e.g. calling the function with the wrong number of arguments will result in a C++ compile-time error).

The following example shows how to create a comparison function which can be passed to the sort() algorithm in order to sort a list of vectors by their length.
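A minimal sketch of that comparison, written as plain C++ so it runs on the host without an OpenCL device. The vec4 struct and the sort_by_length() helper are illustrative names, not part of Boost.Compute. The comment shows roughly how the same body would be wrapped with BOOST_COMPUTE_FUNCTION(), assuming the argument-list form of the macro described in the edit note at the end of this post.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Host-side stand-in for OpenCL's float4 (illustrative, not a Boost.Compute type).
struct vec4 { float x, y, z, w; };

// Euclidean length, mirroring OpenCL's built-in length() for a float4.
inline float length(const vec4 &v)
{
    return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z + v.w * v.w);
}

// The predicate body below is what would go in the macro's source argument.
// With Boost.Compute it would look roughly like this (sketch, not compiled here):
//
//   BOOST_COMPUTE_FUNCTION(bool, compare_lengths,
//                          (const float4_ a, const float4_ b),
//   {
//       return length(a) < length(b);
//   });
//   compute::sort(vectors.begin(), vectors.end(), compare_lengths, queue);
inline bool compare_lengths(const vec4 &a, const vec4 &b)
{
    return length(a) < length(b);
}

// Sorts the vectors in place, shortest first (CPU reference).
inline void sort_by_length(std::vector<vec4> &vectors)
{
    std::sort(vectors.begin(), vectors.end(), compare_lengths);
}
```

The function body is identical in both versions; only the wrapper and the place where it executes differ.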

The BOOST_COMPUTE_CLOSURE() macro is similar to the function macro but additionally allows for a set of C++ variables to be "captured" and then used in the OpenCL function source. This is similar to passing variables to C++11 lambda functions with the capture list. For now, only value types (e.g. float, int4) can be captured. In the future I plan on extending this to allow memory-object types (e.g. buffer, image2d) as well.

The following example shows how to create a function which determines if a 2D cartesian point is contained in a circle with its radius and center given in C++.
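A minimal sketch of the idea, using the C++11 lambda analogy from above so that it runs on the host. The point2 struct and make_is_in_circle() are illustrative names, not part of Boost.Compute. The comment shows roughly how the closure macro would be written, assuming its arguments are the return type, name, argument list, capture list, and source, in that order.

```cpp
#include <cmath>
#include <functional>

// Host-side stand-in for OpenCL's float2 (illustrative, not a Boost.Compute type).
struct point2 { float x, y; };

// With Boost.Compute the closure would look roughly like this
// (sketch, not compiled here):
//
//   float radius = 5.0f;
//   float2_ center(0.0f, 0.0f);
//   BOOST_COMPUTE_CLOSURE(bool, is_in_circle, (const float2_ p), (center, radius),
//   {
//       return sqrt(pow(p.x - center.x, 2) + pow(p.y - center.y, 2)) < radius;
//   });
//
// The host-side analog below captures center and radius by value in a
// C++11 lambda, which is the same idea as the (center, radius) capture list.
inline std::function<bool(const point2 &)>
make_is_in_circle(point2 center, float radius)
{
    return [center, radius](const point2 &p) {
        const float dx = p.x - center.x;
        const float dy = p.y - center.y;
        return std::sqrt(dx * dx + dy * dy) < radius;
    };
}
```

Because the values are captured by copy, changing the original center or radius afterwards has no effect on a predicate that was already created.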

As you can see, the C++ center and radius variables have been captured by the closure function and made available for use in the OpenCL source code. Under the hood this is accomplished by invisibly passing the captured values to OpenCL when the function is invoked.

In addition to these macros, Boost.Compute also contains a lambda-expression framework which allows for one-line C++ expressions to be converted to OpenCL source-code and executed on the GPU. This is similar to the Boost.Lambda library and based on the Boost.Proto library.

The previous example showing how to sort vectors by their length could also be written using a lambda-expression as follows:
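With the lambda framework the call would look roughly like sort(vectors.begin(), vectors.end(), length(_1) < length(_2), queue), assuming the _1 and _2 placeholders and the length() wrapper from the boost::compute::lambda namespace. To give a feel for how a C++ expression can turn into OpenCL source, here is a toy expression-template sketch using only the standard library. It is not how Boost.Compute or Boost.Proto is implemented; the toy namespace, the expr type, and make_function() are made up for illustration.

```cpp
#include <string>

namespace toy {

// An expression node that simply carries the source text generated so far.
struct expr { std::string source; };

// Placeholders that name the first and second function argument.
const expr _1{"_1"};
const expr _2{"_2"};

// length(e) does not compute anything; it emits a call to OpenCL's length().
inline expr length(const expr &e)
{
    return expr{"length(" + e.source + ")"};
}

// operator< builds the text of a comparison instead of evaluating one.
inline expr operator<(const expr &a, const expr &b)
{
    return expr{"(" + a.source + " < " + b.source + ")"};
}

// Wraps an expression in a complete two-argument boolean function definition.
inline std::string make_function(const std::string &name,
                                 const std::string &arg_type,
                                 const expr &body)
{
    std::string src = "inline bool " + name + "(";
    src += arg_type + " _1, " + arg_type + " _2)";
    src += " { return " + body.source + "; }";
    return src;
}

} // namespace toy
```

For example, toy::make_function("compare_lengths", "float4", length(_1) < length(_2)) produces a string containing a small OpenCL-style function whose body is the comparison, which is the kind of text that could then be handed to the OpenCL compiler.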

Together these macros and the lambda-expression framework provide a powerful way to create OpenCL functions interspersed with C++ code (and, notably, they don't require a special compiler or any non-standard compiler extensions!).

Edit: As of this commit on 4/20/2014 the BOOST_COMPUTE_FUNCTION() macro now uses a list of arguments including their type and name. The old, auto-generated names (e.g. _1, _2) are no longer used. The new version allows for clearer code with more descriptive variable names. The examples above have been updated to reflect the new API.

Sunday, February 2, 2014

Update 2014

It's been a while since I've posted anything here. I'll try to make 2014 different.

A lot has happened over the past year. I traveled to Europe, sailed around the Caribbean, moved back to California (the Bay Area this time) with an amazing girl, and started a new job as a software engineer at Google.

Work on Boost.Compute continues slowly, but steadily. It's fairly feature complete at this point and the main focus now is performance. Hopefully it will be ready to propose for review somewhat soon.

Anyway, here's a few of my favorite pictures from 2013: