The Master thesis returns after seven years!
Seven years ago...
Seven years ago, I wrote my Master thesis about using GPGPU for processing queries on XML trees.
Unfortunately, the result was not impressive. There were two major drawbacks:
- The XML metadata first had to be copied to GPU memory → only worthwhile for complex queries and/or large XML datasets
- Search threads were divergent, forcing the use of one compute unit per thread, while the GPUs available at the time did not feature many of them → an NVIDIA Quadro FX 4800, as used in the tests, only had 8
Several annoying implementation difficulties also had to be solved:
- Pointers in GPU and CPU memory were incompatible → offsets had to be used instead of pointers in all data structures
- No debugging at all was possible on the GPU → an abstraction layer had to be developed to build the parallel kernel for CPU execution and debug it there
- Dynamic memory allocation was complicated → a memory manager framework had to be developed
Seven years later... all those points are solved by the Intel Skull Canyon NUC (NUC6i7KYK):
- Unified GPU and CPU memory (and pointers).
- 4 CPU + 72 GPU compute units.
- Debugging on GPU and CPU with the same software.
There is even a bonus point:
- OpenCL makes the implementation portable across different targets: from cheap ARM CPU-only single-board computers, via Intel heterogeneous CPU + GPU workstations or AMD open-source CPU + GPU + discrete GPU solutions, to Altera dedicated FPGA designs.
The Master thesis returned and is even angrier than in the first episode...
|||Vincent Jordan, Master thesis, "Parallel XML query processing using GPGPU", https://vjordan.info/research/docs/master-thesis/report.xhtml|
|||gputwigstack, source code repository, https://git.vjordan.info/vjp/gputwigstack|
|||ARM, Compute library, https://developer.arm.com/technologies/compute-library|
|||Intel, OpenCL SDK, https://software.intel.com/en-us/intel-opencl|
|||AMD, Radeon Open Compute, https://github.com/RadeonOpenCompute/ROCm|