Let's make a friendly array for CPU/GPU processing
(A) Very well supported, but impractical to use and error-prone.
(B) Cannot be memcpy'ed. Contains a pointer. Indexing is indirect.
(C) Perfect if the array does not change.
(D) Need to unpack. No direct indexing possible.
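To make the difference between (B) and (C) concrete, here is a minimal sketch in C. It assumes that (B) is a struct holding a pointer to heap storage and that (C) is a struct with its elements stored inline at a fixed capacity. Both readings are guesses based only on the captions above. The names `PtrArray`, `InlineArray`, `CAPACITY` and the two helper functions are hypothetical, made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define CAPACITY 8 /* hypothetical fixed capacity, chosen for illustration */

/* Pointer-based layout (assumed reading of (B)): elements live on the heap. */
typedef struct {
    size_t count;
    float *data;
} PtrArray;

/* Inline layout (assumed reading of (C)): elements are part of the struct. */
typedef struct {
    size_t count;
    float data[CAPACITY];
} InlineArray;

/* Returns 1 if a memcpy'ed PtrArray still aliases the original storage. */
int ptr_copy_shares_storage(void) {
    PtrArray original = { CAPACITY, malloc(CAPACITY * sizeof(float)) };
    assert(original.data != NULL);
    original.data[0] = 1.0f;

    PtrArray copy;
    memcpy(&copy, &original, sizeof original); /* copies the pointer only */

    original.data[0] = 42.0f;                  /* visible through the copy */
    int shared = (copy.data == original.data) && (copy.data[0] == 42.0f);
    free(original.data);
    return shared;
}

/* Returns 1 if a memcpy'ed InlineArray keeps its own elements. */
int inline_copy_is_independent(void) {
    InlineArray original = { CAPACITY, { 1.0f } };
    InlineArray copy;
    memcpy(&copy, &original, sizeof original); /* copies count and elements */

    original.data[0] = 42.0f;                  /* does not affect the copy */
    return copy.data[0] == 1.0f && copy.count == CAPACITY;
}
```

Under these assumptions, the pointer-based struct copies only an address, which would mean nothing on a device with its own memory space, while the inline struct can be copied byte for byte. The price is that its capacity is fixed at compile time.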
In the "thesis reboot" post, I brought the dusty implementation of my master's thesis back from the dead. Seeing that it still (almost) worked, I planned a week off work to prepare its adaptation to the almighty OpenCL. This post summarizes what that week was all about.
Everything starts with this article: Getting Started with Debugging with Intel® Parallel Studio XE 2016. Unfortunately, the article is quite old now ("Last updated on August 25, 2015") and no longer up to date.
A multi-core CPU or multiple CPUs (in a multi-socket machine) constitute a single OpenCL device. Separate cores are compute units.
Work-group [is] a collection of related work-items that execute on …