Sunday, December 20, 2015

Boost.Compute v0.5 Released

I'm happy to announce the v0.5 release of Boost.Compute!

Boost.Compute is a header-only C++ library for GPGPU and parallel-computing based on OpenCL. It is available on GitHub and instructions for getting started can be found in the documentation.
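
To give a taste of what the library looks like, here's a minimal sketch of the canonical example: sorting a vector of floats on the default compute device (error handling omitted):

#include <algorithm>
#include <cstdlib>
#include <vector>
#include <boost/compute/algorithm/copy.hpp>
#include <boost/compute/algorithm/sort.hpp>
#include <boost/compute/container/vector.hpp>
#include <boost/compute/system.hpp>

namespace compute = boost::compute;

int main()
{
    // get the default compute device and set up a context and queue
    compute::device device = compute::system::default_device();
    compute::context context(device);
    compute::command_queue queue(context, device);

    // generate random data on the host
    std::vector<float> host_vector(10000);
    std::generate(host_vector.begin(), host_vector.end(), rand);

    // create a vector on the device and transfer the data
    compute::vector<float> device_vector(host_vector.size(), context);
    compute::copy(host_vector.begin(), host_vector.end(),
                  device_vector.begin(), queue);

    // sort the data on the device
    compute::sort(device_vector.begin(), device_vector.end(), queue);

    // copy the sorted data back to the host
    compute::copy(device_vector.begin(), device_vector.end(),
                  host_vector.begin(), queue);

    return 0;
}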

Since v0.4 (released last December), over 150 commits have been made by 18 different authors. During the past year there have been many improvements to the Boost.Compute algorithms and core library, as well as various bug fixes.

I'd like to thank all of the developers, reviewers, and users for their contributions to the library.

Sunday, December 14, 2014

Boost.Compute v0.4 Released

I'm proud to announce the release of Boost.Compute version 0.4!

Boost.Compute is a header-only C++ library for GPGPU and parallel-computing based on OpenCL. It is available on GitHub and instructions for getting started can be found in the documentation.

Since v0.3 (released in July), over 100 commits have been made by thirteen different authors. Many improvements have been made including updated algorithms, new example applications, and numerous bug fixes.

I'm also happy to announce that the formal Boost peer-review for Boost.Compute is scheduled to begin on Monday (12/15). Thanks to Antony Polukhin for volunteering to be the review manager.

As always, we look forward to hearing user feedback. If you find any bugs or issues, please report them to the issue tracker. Any questions or comments about Boost.Compute can be posted on the mailing list.


Friday, October 31, 2014

Ubuntu PPA for Boost.Compute

Boost.Compute now has an Ubuntu package and is available through a PPA.

Just run the following commands to install Boost.Compute on an Ubuntu system:

# add the boost.compute ppa
sudo add-apt-repository ppa:kylelutz/compute

# update and install boost-compute
sudo apt-get update && sudo apt-get install boost-compute
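
Once installed, compiling a program against the packaged headers should only require linking with the system OpenCL library (a sketch; my_program.cpp is a placeholder and the exact paths depend on the package layout):

# compile a program using the installed Boost.Compute headers
g++ -o my_program my_program.cpp -lOpenCL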

Tuesday, July 15, 2014

Boost.Compute v0.3 Released

I'm proud to announce the release of Boost.Compute version 0.3!

Boost.Compute is a header-only C++ library for GPGPU and parallel-computing based on OpenCL. It is available on GitHub and instructions for getting started can be found in the documentation.

Since version 0.2 (released about two months ago), there have been over a hundred commits by eight different authors. Quite a few new algorithms have been implemented and many bugs have been fixed (thanks to all who reported issues!). Notably, much work has already been done by Roshan as part of his Google Summer of Code project for Boost.Compute. A variety of new examples have also been contributed which help demonstrate usage of the Boost.Compute API (including how to use Boost.Compute and OpenCL together with other third-party APIs like OpenGL and OpenCV).

Over the next few months the focus will be on general performance improvements and on preparing the library for Boost peer review. See the project road map posted here on the mailing list for more details.

We look forward to any and all feedback. If you find any bugs or issues, please report them to the issue tracker. Any questions or comments about Boost.Compute can be posted on the mailing list.

Sunday, May 11, 2014

Boost.Compute v0.2 Released

I'm happy to announce that version 0.2 of Boost.Compute has been tagged and released!

Boost.Compute is a header-only C++ library for GPGPU and parallel-computing based on OpenCL. It is available on GitHub and instructions for getting started can be found in the documentation.

Since version 0.1 (released almost two months ago), there have been 81 new commits by four different authors. A number of bugs have been fixed, including bug #105, which caused crashes when using OpenCL v1.1 devices with code compiled against OpenCL v1.2 headers. Also, new algorithms including unique(), search(), and find_end() have been added by Roshan Raghupathy as part of his ongoing work for the Boost.Compute GSoC project.
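
As a quick illustration, here's a minimal sketch of the new unique() algorithm in action (device setup follows the usual Boost.Compute pattern; error handling is omitted):

#include <iostream>
#include <iterator>
#include <boost/compute/algorithm/unique.hpp>
#include <boost/compute/container/vector.hpp>
#include <boost/compute/system.hpp>

namespace compute = boost::compute;

int main()
{
    // set up the default device, context, and queue
    compute::device device = compute::system::default_device();
    compute::context context(device);
    compute::command_queue queue(context, device);

    // host data containing consecutive duplicates
    int data[] = { 1, 1, 2, 2, 2, 3, 4, 4 };

    // copy the data to the device
    compute::vector<int> vec(data, data + 8, queue);

    // remove consecutive duplicates on the device
    compute::vector<int>::iterator end =
        compute::unique(vec.begin(), vec.end(), queue);

    // prints "unique values: 4"
    std::cout << "unique values: "
              << std::distance(vec.begin(), end) << std::endl;

    return 0;
}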

Many new examples have also been added. The mandelbrot example shows how to compute the Mandelbrot fractal and display it directly on the GPU using the OpenCL-OpenGL sharing extension. The new k_means example shows how to quickly calculate clusters for a set of 2D points on the GPU. Let me know (or submit an issue) if you have any ideas or requests for additional examples.

And, as always, I look forward to any and all feedback. You can submit an issue through the issue tracker or contact me directly (email can be found in my profile).

P.S. I'll be at the C++Now conference in Aspen all next week. Let me know if you'd like to meet up.

Monday, April 28, 2014

Using AMD's Static C++ Kernel Language in OpenCL

AMD provides an implementation of an OpenCL extension which adds support for certain C++ features (e.g. templates) in OpenCL source code. It's called the "OpenCL Static C++ Kernel Language" extension and the specification can be found here (PDF). This extension is quite useful (especially for people coming from NVIDIA's CUDA who see the C99-based OpenCL kernel language as lacking).

In this blog post I go over how to use C++ templates directly in OpenCL kernels with this extension. I hope this is useful to others, since how to use the extension was not immediately apparent to me at first.

First, we define our templated function (the prototypical square() function template):

template<typename T>
inline T square(const T x)
{
    return x * x;
}

Next we define a templated kernel which calls the square() function:

template<typename T>
__kernel void square_kernel(__global T *data)
{
    const uint i = get_global_id(0);
    data[i] = square(data[i]);
}

Now, in order to use the templated kernel, we must explicitly instantiate it and give it a name that can be used to call it from the host. We do that by explicitly stating the template types and using the mangled_name attribute, like so:

template __attribute__((mangled_name(square_kernel_int)))
__kernel void square_kernel(__global int *data);

The kernel can be instantiated multiple times for different types (though each must be given a unique mangled name). For example, to define the square kernel for floats:

template __attribute__((mangled_name(square_kernel_float)))
__kernel void square_kernel(__global float *data);

That's it for the OpenCL code. To compile the program on the host you must pass the "-x clc++" compile option to the clBuildProgram() function. In Boost.Compute, that is done like so (where source is a string containing all of the OpenCL source code from above and context is the OpenCL context object):

compute::program square_program =
    compute::program::build_with_source(source, context, "-x clc++");
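
For reference, the equivalent raw OpenCL API calls look roughly like this (a sketch with error checking omitted; source_str, context, and device are assumed to already be set up):

// create the program and build it with the C++ kernel language enabled
cl_program program =
    clCreateProgramWithSource(context, 1, &source_str, NULL, NULL);
clBuildProgram(program, 1, &device, "-x clc++", NULL, NULL);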

Once built, the templated kernels can be instantiated by passing their mangled name to the clCreateKernel() function. In Boost.Compute, you would do the following:

compute::kernel square_int_kernel(square_program, "square_kernel_int");

The square_int_kernel can now be used just like any other kernel object (e.g. passed to clEnqueueNDRangeKernel() for execution on the device).
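
For example, here is a minimal sketch of setting the kernel's argument and launching it with Boost.Compute (the buffer size and the queue object here are placeholders):

// create a buffer holding 1024 ints on the device (data transfer omitted)
compute::buffer data_buffer(context, 1024 * sizeof(int));

// bind the buffer to the kernel's __global int* parameter
square_int_kernel.set_arg(0, data_buffer);

// enqueue the kernel for execution over 1024 work-items
queue.enqueue_1d_range_kernel(square_int_kernel, 0, 1024, 0);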

A fully implemented and compilable example demonstrating the code above can be found here.

Saturday, April 12, 2014

Using OpenCL with Boost.Compute on Amazon EC2

This post details the setup required to run code written with OpenCL and Boost.Compute on Amazon EC2.

Currently, Amazon offers two different NVIDIA-based GPU instance types (G2 and CG1). The G2 instances come with a Kepler GK104 GPU while the CG1 instances have an older Tesla M2050. I went with the G2 instance. In my nearest zone (US West), rates ran ~70 cents/hour.

Surprisingly (compared with the last time I used EC2, a couple of years ago), it comes with fairly recent software, including GCC 4.8 and Boost 1.53. This makes setup much easier than it used to be.

First, install the necessary dependencies from the package manager:

sudo yum install gcc48-c++ cmake git boost-devel

Then, clone Boost.Compute:

git clone https://github.com/kylelutz/compute.git

Next, create a build directory:

mkdir compute-build && cd compute-build

And then run cmake:

cmake -DOPENCL_INCLUDE_DIRS=/opt/nvidia/cuda/include/ \
      -DOPENCL_LIBRARIES=/usr/lib64/libOpenCL.so \
      -DBOOST_COMPUTE_BUILD_EXAMPLES=ON ../compute

Now run make to compile everything:

make -j24

If successful, running the list_devices example should show the NVIDIA GPU:

$ ./example/list_devices
Platform 'NVIDIA CUDA'
  GPU Device: GRID K520

All in all, it was a fairly painless process to get up and running (and much cheaper than buying a Kepler-class Tesla card!).