Intel IGP, OpenCL compute units, and divergent threads

How many OpenCL compute units does an Intel integrated graphics processor have? In other words, how many divergent threads can run in parallel?

In the quest to find the OpenCL-compatible device with the largest number of compute units (CU), it seems that the latest Nvidia architectures top out at around 30 CU (called multiprocessors in Nvidia's terminology) and pack more cores into each of them instead, while AMD/ATI's architectures stay at 64 CU for the latest Vega GPU (TODO: insert the AMD name for a CU here).

But wait! There is another option: Intel's IGP [1]. And, surprise surprise, the "Intel® Iris™ Pro Graphics 580" has 72 EU (execution units) [2].

Note: this is not Intel's latest GPU. Later Iris Pro GPUs (6xx) actually have fewer EU, and probably more processing cores in each of them, following the same trend as Nvidia. Their eDRAM is also smaller.

The question is now: is an Intel EU [3] just another name for an OpenCL CU [4]? And is the Intel NUC Skull Canyon (NUC6i7KYK) worth buying for interesting parallel computing with divergent threads?

Intel OpenCL

Useful links: the Intel OpenCL driver [5], installation notes [6], release notes [7], a (badly formatted) architecture description [8], and other developer guides [9].

Ubuntu Server needs the hardware enablement (HWE) stack for the Linux kernel [10].

Enable i915 when no monitor is attached (headless)

Several sources on the internet describe the GPU being disabled when no screen is attached, for example [11]. There are even tutorials for building dummy plugs for graphics cards [12].

I noticed by chance on my Gigabyte BRIX [13] that clinfo does not find any GPU as long as I have not logged in locally (with a real keyboard). The local login works even when no screen is attached: you only need to type your user name and password blindly, et voilà, clinfo in your ssh session suddenly has a revelation about the existence of your integrated GPU.

This key finding leads to the answer [14]: the user must be in the video group.
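
A quick way to check this from an ssh session is sketched below. The helper name in_group is made up for illustration, and the usermod hint assumes a standard Linux user database. A group change only takes effect after logging in again.

```shell
#!/bin/sh
# in_group USER GROUP (hypothetical helper): succeeds if USER belongs to GROUP.
in_group() {
    # id -nG prints the user's group names separated by spaces;
    # put one per line and look for an exact, literal match.
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qxF "$2"
}

# Report whether the current user is a member of the video group.
user=$(id -un)
if in_group "$user" video; then
    echo "$user is in the video group"
else
    echo "$user is not in the video group; try: sudo usermod -aG video $user"
fi
```

Once the user is in the group, clinfo should list the GPU over ssh without any local login.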

Playing with Celeron N3150

$ grep -m 1 name /proc/cpuinfo
model name  : Intel(R) Celeron(R) CPU  N3150  @ 1.60GHz
$ lspci -nn | grep VGA
00:02.0 VGA compatible controller [0300]: Intel Corporation Device [8086:22b1] (rev 21)

Conclusion

Yes: on the Celeron N3150, the 12 EU (execution units) are seen as 12 CU (compute units), and the 4 CPU cores are seen as 4 CU as well.
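
To repeat this check on another machine, the compute unit counts can be pulled out of clinfo's output. The sketch below assumes that clinfo labels the relevant lines "Device Name" and "Max compute units", as the commonly packaged clinfo tool does. The helper name list_compute_units is hypothetical.

```shell
#!/bin/sh
# list_compute_units (hypothetical helper): reads clinfo-style text on stdin
# and prints one "<device name>: <N> CU" line per device.
list_compute_units() {
    awk '
        # Remember the most recent device name, stripped of its label.
        /^ *Device Name/       { sub(/^ *Device Name */, ""); name = $0 }
        # The compute unit count is the last field on its line.
        /^ *Max compute units/ { print name ": " $NF " CU" }
    '
}

# Only query the driver when clinfo is actually installed.
if command -v clinfo >/dev/null 2>&1; then
    clinfo | list_compute_units
fi
```

If both the GPU and the CPU OpenCL runtimes are installed, each device should appear on its own line with its CU count.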