Day 8: OpenCL blocks

How do you write a function that takes a callback as a parameter?

OpenCL C does not allow function pointers, even when the target could be resolved at compile time.

The following is therefore not valid OpenCL (whereas CUDA has accepted the equivalent for a long time):

void do_operation(const int* A, const int* B, int* C, int (*op)(const int a, const int b))
{
    C[get_global_id(0)] = op(A[get_global_id(0)], B[get_global_id(0)]);
}

int op_add(const int a, const int b)
{
    return a + b;
}

void kernel simple_add(global const int* A, global const int* B, global int* C)
{
    do_operation(A, B, C, &op_add);
}

OpenCL 2.0 to the rescue? OpenCL 2.0 does so many things better than OpenCL 1.2 that it surely supports function pointers. Actually no, it still does not. It does, however, adopt the clang block syntax (a closure-like construct).

clang blocks

OpenCL blocks [1] come from clang blocks [2], which themselves come from Apple [3]. Rewritten with a block, the example becomes:

void do_operation(const int* A, const int* B, int* C, int (^op)(const int a, const int b))
{
    C[get_global_id(0)] = op(A[get_global_id(0)], B[get_global_id(0)]);
}

void kernel simple_add(global const int* A, global const int* B, global int* C)
{
    do_operation(A, B, C, ^(const int a, const int b) { return a + b; });
}

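This does not solve the problem. The source compiles and links, but the device build fails because the call through the block variable cannot be resolved statically: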

$ ./simple_add
Using platform: Intel(R) OpenCL
Using device: Intel(R) Core(TM) i7-5500U CPU @ 2.40GHz
 Error building:
Compilation started
Compilation done
Linking started
Linking done
Device build started
Failed to build device program
Error: unresolved pointer calls in function(s):
do_operation
_Z12do_operationPU3AS1iS0_S0_Pi
CompilerException Dynamic block variable call detected.

good old macro

The only solution I found (please leave a comment if you know a better one) is to generate the code with C macros, as suggested by [4]:

#define CAT(X,Y) X##_##Y   // concatenate words
#define TEMPLATE(X,Y) CAT(X,Y)

#define OP_NAME op_add
#define OP_BLOCK ^(const int a, const int b) { return a + b; }
typedef int (^do_operation_op_block_t)(const int a, const int b);
void TEMPLATE(do_operation, OP_NAME) (const int* A, const int* B, int* C)
{
    do_operation_op_block_t op = OP_BLOCK;
    C[get_global_id(0)] = op(A[get_global_id(0)], B[get_global_id(0)]);
}

void kernel simple_add(global const int* A, global const int* B, global int* C)
{
    TEMPLATE(do_operation, op_add)(A, B, C);
}
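
The two-level CAT/TEMPLATE pair matters because the `##` operator pastes its operands without expanding them first. The sketch below is plain host-side C rather than OpenCL, and illustrates the difference by turning both results into strings. The helpers `STR`, `STR_`, `name_via_template` and `name_via_cat` are hypothetical names introduced only for this illustration.

```c
#define CAT(X,Y) X##_##Y          /* pastes X and Y as written, no expansion */
#define TEMPLATE(X,Y) CAT(X,Y)    /* expands X and Y first, then pastes */

/* Hypothetical helpers: stringify a macro's fully expanded result. */
#define STR_(x) #x
#define STR(x) STR_(x)

#define OP_NAME op_add

/* OP_NAME is expanded to op_add before the paste happens. */
const char *name_via_template(void)
{
    return STR(TEMPLATE(do_operation, OP_NAME));
}

/* The literal token OP_NAME is pasted, so the macro is never expanded. */
const char *name_via_cat(void)
{
    return STR(CAT(do_operation, OP_NAME));
}
```

Here `name_via_template()` returns "do_operation_op_add", while `name_via_cat()` returns "do_operation_OP_NAME". That is why the function definition goes through TEMPLATE rather than calling CAT directly.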

Note: you have to put the "template" function in a separate header file, and include it again, with OP_NAME and OP_BLOCK redefined, each time a new specialization of the template is needed.
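
A possible variation, which is not what [4] suggests, is to let a single function-like macro stamp out each specialization, so that no header has to be included repeatedly. The sketch below is plain host-side C so that it can be compiled anywhere, and it is untested on an OpenCL compiler. The macro `DEFINE_DO_OPERATION`, the explicit index parameter `i` (standing in for `get_global_id(0)`) and the `mul` specialization are all assumptions made for the example.

```c
#include <stddef.h>

/* Hypothetical helper: generates op_<NAME> and do_operation_<NAME>.
   EXPR is the operation, written in terms of a and b. */
#define DEFINE_DO_OPERATION(NAME, EXPR)                                 \
    static int op_##NAME(const int a, const int b) { return (EXPR); }   \
    void do_operation_##NAME(const int* A, const int* B, int* C,        \
                             size_t i)                                  \
    {                                                                   \
        C[i] = op_##NAME(A[i], B[i]);                                   \
    }

/* One line per specialization. */
DEFINE_DO_OPERATION(add, a + b)
DEFINE_DO_OPERATION(mul, a * b)

/* Host-side stand-in for running the kernel over n work-items. */
void simple_add(const int* A, const int* B, int* C, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        do_operation_add(A, B, C, i);
}
```

The trade-off is that the whole function body lives inside a backslash-continued macro, which is harder to read and debug than the header approach once the body grows.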