The return of the forgotten Master's thesis

The Master's thesis returns after seven years!

Seven years ago...

Seven years ago, I wrote my Master's thesis [1] about using GPGPU for processing queries on XML trees.

Unfortunately, the result was not impressive. There were two major drawbacks:

  • XML metadata first had to be copied to GPU memory → only worthwhile for complicated queries and/or large XML datasets
  • Search threads were divergent, which forced the use of one compute unit per thread, and the GPUs available at the time did not feature many of them → the NVIDIA Quadro FX 4800 used in the tests only had 8

Annoying implementation difficulties had to be solved [2] for that solution:

  • Pointers in GPU and CPU memory were incompatible → had to use offsets instead of pointers in all data structures
  • No debugging possible at all on the GPU → had to develop an abstraction layer in order to build the parallel kernel for CPU execution and debug there
  • Dynamic memory allocation was complicated → had to develop a memory manager framework
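
To illustrate the first and last workarounds, here is a minimal C sketch. It is not the code from the thesis: the names Arena, Node, arena_alloc and count_tag are hypothetical, and a second heap buffer stands in for GPU memory. Nodes refer to each other by 32-bit offsets from the start of one buffer, and a simple bump allocator hands out those offsets, so a plain byte copy keeps the tree valid wherever the buffer ends up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NIL 0u  /* offset 0 is reserved and plays the role of NULL */

typedef struct {
    uint8_t *base;      /* start of the buffer; differs per address space */
    uint32_t capacity;  /* buffer size in bytes */
    uint32_t used;      /* bump pointer, stored as an offset from base */
} Arena;

typedef struct {
    uint32_t tag;           /* hypothetical numeric id of the element name */
    uint32_t first_child;   /* offset of the first child, or NIL */
    uint32_t next_sibling;  /* offset of the next sibling, or NIL */
} Node;

/* The buffer must be 8-byte aligned (malloc/calloc memory is). */
static void arena_init(Arena *a, void *buffer, uint32_t capacity) {
    assert(capacity >= 8);
    a->base = buffer;
    a->capacity = capacity;
    a->used = 8;  /* skip the first bytes so that no node lives at offset 0 */
}

/* Bump allocation: returns an 8-byte aligned offset, or NIL when full. */
static uint32_t arena_alloc(Arena *a, uint32_t size) {
    if (size == 0 || size > UINT32_MAX - 7u) return NIL;
    uint32_t aligned = (size + 7u) & ~7u;
    if (aligned > a->capacity - a->used) return NIL;
    uint32_t off = a->used;
    a->used += aligned;
    return off;
}

/* Turns an offset into a pointer that is valid for this arena only. */
static Node *node_at(const Arena *a, uint32_t off) {
    return off == NIL ? NULL : (Node *)(a->base + off);
}

/* Allocates a node and prepends it to the children of parent (if any). */
static uint32_t node_new(Arena *a, uint32_t parent, uint32_t tag) {
    uint32_t off = arena_alloc(a, sizeof(Node));
    if (off == NIL) return NIL;
    Node *n = node_at(a, off);
    n->tag = tag;
    n->first_child = NIL;
    n->next_sibling = NIL;
    if (parent != NIL) {
        Node *p = node_at(a, parent);
        n->next_sibling = p->first_child;
        p->first_child = off;
    }
    return off;
}

/* Counts nodes with the given tag in the subtree(s) starting at off. */
static uint32_t count_tag(const Arena *a, uint32_t off, uint32_t tag) {
    uint32_t count = 0;
    for (; off != NIL; off = node_at(a, off)->next_sibling) {
        const Node *n = node_at(a, off);
        count += (n->tag == tag);
        count += count_tag(a, n->first_child, tag);
    }
    return count;
}

/* Stand-in for a host-to-device upload: a byte copy, no pointer fix-ups.
   dst_buffer must be at least src->capacity bytes and 8-byte aligned. */
static Arena arena_copy_to(const Arena *src, void *dst_buffer) {
    Arena dst = *src;
    memcpy(dst_buffer, src->base, src->used);
    dst.base = dst_buffer;
    return dst;
}
```

Because only the base field changes after the copy, the same root offset can be passed to count_tag on either side. With real pointers inside Node, every link would have had to be rewritten after each transfer.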

Seven years later... all those points are solved by the Intel NUC Skull Canyon NUC6i7KYK:

  • Unified GPU and CPU memory (and pointers).
  • 4 CPU + 72 GPU compute units.
  • Debugging on GPU and CPU with the same software.

There is even a bonus point:

  • OpenCL makes the implementation portable across different targets: from cheap ARM CPU-only single-board computers [3], via Intel heterogeneous CPU + GPU workstations [4] or AMD open-source CPU + GPU + Discrete_GPU solutions [5], to Altera dedicated FPGA designs [6].

The Master's thesis has returned, and it is even angrier than in the first episode...