Use OpenCL to execute native kernels


Today is a big day for Clover: we can finally use it to execute native kernels on the processor, asynchronously, through a command queue, and several of them can run in parallel. A native kernel is a plain C/C++ function that we queue for execution on a CPU device, so there is no compiler, no bitcode, etc.

Here is some sample code executing a simple kernel (original in tests/test_kernel.cpp):

#include <CL/cl.h>

struct args
{
    size_t buffer_size;
    char *buffer;
};

static void native_kernel(void *a)
{
    struct args *data = (struct args *)a;

    // Bitwise NOT of every byte in the buffer
    for (size_t i=0; i<data->buffer_size; ++i)
        data->buffer[i] = ~data->buffer[i];
}

int main(int argc, char **argv)
{
    cl_platform_id platform = 0;
    cl_device_id device;
    cl_context ctx;
    cl_command_queue queue;
    cl_event events[2];
    cl_mem buf1, buf2;
    char s1[] = "Lorem ipsum dolor sit amet";
    char s2[] = "I want to tell you that you rock";

    // Initialize the context
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_DEFAULT, 1, &device, 0);
    ctx = clCreateContext(0, 1, &device, 0, 0, 0);

    // And the command queue
    queue = clCreateCommandQueue(ctx, device,
                                 CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE, 0);

    // Create two buffers
    buf1 = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR,
                          sizeof(s1), (void *)&s1, 0);
    buf2 = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR,
                          sizeof(s2), (void *)&s2, 0);

    // Enqueue native kernels
    struct args a;
    // Location in the struct where OpenCL must store the buffer pointer
    const void *mem_loc = (const void *)&a.buffer;

    a.buffer_size = sizeof(s1);
    clEnqueueNativeKernel(queue, &native_kernel, &a, sizeof(a),
                          1, &buf1, &mem_loc, 0, 0, &events[0]);

    a.buffer_size = sizeof(s2);
    clEnqueueNativeKernel(queue, &native_kernel, &a, sizeof(a),
                          1, &buf2, &mem_loc, 0, 0, &events[1]);

    // Wait for events
    clWaitForEvents(2, events);

    // Finished
    return 0;
}

The code has been pushed to git. You can look at the diff to see how I implemented this. The code is full of casts, which I don’t really like, but it’s low-level and I try to keep the code quality good.
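To make the mem_loc trick from the sample a bit more concrete, here is a minimal sketch of what a runtime conceptually has to do with the arguments of clEnqueueNativeKernel: copy the argument block, then store a real data pointer at each location listed in args_mem_loc. This is not the code from the diff. run_native and host_ptrs are hypothetical names made up for the illustration, and host_ptrs simply stands in for the storage behind each cl_mem object, so the sketch runs without any OpenCL library.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct args
{
    size_t buffer_size;
    char *buffer;
};

// Same kernel as in the sample: bitwise NOT of every byte
static void native_kernel(void *a)
{
    struct args *data = (struct args *)a;

    for (size_t i = 0; i < data->buffer_size; ++i)
        data->buffer[i] = ~data->buffer[i];
}

// Hypothetical stand-in for the runtime side of clEnqueueNativeKernel.
// host_ptrs[i] plays the role of the storage behind the i-th cl_mem object.
static int run_native(void (*fn)(void *), const void *args, size_t cb_args,
                      size_t num_mem, void **host_ptrs,
                      const void **args_mem_loc)
{
    // Work on a private copy so the caller may reuse its struct right away
    void *copy = malloc(cb_args);
    if (!copy)
        return -1;
    memcpy(copy, args, cb_args);

    for (size_t i = 0; i < num_mem; ++i)
    {
        // args_mem_loc[i] points inside the caller's struct: turn it into an offset
        size_t offset = (size_t)((const char *)args_mem_loc[i] - (const char *)args);
        assert(offset + sizeof(void *) <= cb_args);

        // Store the data pointer at the same offset in the copy
        memcpy((char *)copy + offset, &host_ptrs[i], sizeof(void *));
    }

    fn(copy);
    free(copy);
    return 0;
}
```

Calling run_native(native_kernel, &a, sizeof(a), 1, host_ptrs, &mem_loc) with mem_loc pointing at a.buffer inverts the bytes behind host_ptrs[0] and leaves the caller's struct untouched. That copy is also why the sample can reuse the same struct args for both enqueue calls.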

I’ll now polish what I have already done (for instance by implementing clEnqueueUnmapMemObject, an easy function), then I’ll begin the OpenCL C kernels. My exams end on Tuesday, so I will finally have plenty of time to work on Clover before my vacation (starting July 3).

Have fun parallelizing your applications!

