Playing with Intel OpenCL SDK and Ethereum mining
About the hardware
The Intel 6th-generation Core i7-6770HQ [1] is a neat little beast. Here are lspci [LSPCI], cpuinfo [CPUINFO], and clinfo [CLINFO] from an Intel NUC6i7KYK with Ubuntu server 16.04.
OpenCL "Installable Client Drivers" (ICD)
To prevent every OpenCL runtime library from fighting the others to become the one and only libOpenCL.so of the system, a generic/dummy library was created. It dispatches OpenCL calls to one of the ICDs available on the system: from AMD, NVidia, Intel or others. More details at [4].
ICDs register themselves in /etc/OpenCL/vendors/ by creating a simple text file that contains the path to the actual OpenCL library. For example, the path for the Intel runtime is:
~$ cat /etc/OpenCL/vendors/intel.icd
/opt/intel/opencl/libIntelOpenCL.so
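To see at a glance which ICDs are registered, the directory can be scanned with a few lines of Python. This is only a sketch: the function name list_icds is made up for illustration, and it assumes each .icd file holds the library path on its first non-empty line, as in the Intel example above.

```python
from pathlib import Path


def list_icds(vendors_dir="/etc/OpenCL/vendors"):
    """Return {icd_file_name: library_path} for every *.icd file found."""
    icds = {}
    directory = Path(vendors_dir)
    if not directory.is_dir():
        return icds  # no vendors directory: no ICD is registered
    for icd_file in sorted(directory.glob("*.icd")):
        # Assumption: the first non-empty line is the library path or name.
        lines = [ln.strip() for ln in icd_file.read_text().splitlines() if ln.strip()]
        if lines:
            icds[icd_file.name] = lines[0]
    return icds


if __name__ == "__main__":
    for name, library in list_icds().items():
        state = "exists" if Path(library).exists() else "not found at this path"
        print(f"{name}: {library} ({state})")
```

An ICD whose library is reported as not found is a good first suspect when clinfo lists fewer platforms than expected.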
First step, install the generic OpenCL library, along with the official OpenCL headers (only needed for development). clinfo is also very useful to list the available devices:
sudo apt install ocl-icd-libopencl1
sudo apt install opencl-headers
sudo apt install clinfo
Of course, if clinfo is executed now, no device is found because there is no real OpenCL library yet:
~$ clinfo
Number of platforms 0
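If you want to script this check (for example in a provisioning step), the platform count can be pulled out of the clinfo output. A minimal sketch, assuming the "Number of platforms" line keeps the format shown above; platform_count is a hypothetical helper name.

```python
import re
import shutil
import subprocess


def platform_count(clinfo_output):
    """Return the platform count reported by clinfo, or None if the line is absent."""
    match = re.search(r"^Number of platforms\s+(\d+)\s*$", clinfo_output, re.MULTILINE)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    if shutil.which("clinfo") is None:
        print("clinfo is not installed")
    else:
        result = subprocess.run(["clinfo"], capture_output=True, text=True)
        print("OpenCL platforms:", platform_count(result.stdout))
```

A count of 0 means the generic loader works but no ICD is registered yet.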
Intel OpenCL (GPU+CPU)
Intel OpenCL SDK and runtime are found at [5]. Direct link to the package [6], installation notes [7] and release notes [8].
From the installation notes
Intel OpenCL driver requires Linux kernel 4.8
Ubuntu 16.04 with the default kernel works fairly well but some core features (i.e. device enqueue, SVM memory coherency, VTune support) won’t work without kernel patches. This configuration has been minimally validated to prove that it is viable to suggest for experimental use, but it is not fully supported or certified.
What is SVM memory coherency? See [9].

If you prefer to avoid patching your Linux kernel, you will need kernel 4.8 or newer. On Ubuntu Server 16.04, you might need to install the Hardware Enablement (HWE) stack to get a more recent kernel:
sudo apt install linux-generic-hwe-16.04
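A quick way to tell whether the running kernel already meets the 4.8 requirement is to compare the release string against it. The sketch below assumes the release string starts with "major.minor" (as in 4.4.0-96-generic); the helper names are made up for illustration.

```python
import platform
import re


def kernel_version(release):
    """Extract (major, minor) from a release string such as '4.4.0-96-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def needs_patch_or_newer_kernel(release, minimum=(4, 8)):
    """True when the kernel is older than the version the driver expects."""
    return kernel_version(release) < minimum


if __name__ == "__main__":
    release = platform.release()
    if needs_patch_or_newer_kernel(release):
        print(f"{release}: older than 4.8, patch the kernel or install a newer one")
    else:
        print(f"{release}: 4.8 or newer, no kernel patch needed")
```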
intel-opencl-r5.0-63503.x86_64.tar.xz
    etc/
        ld.so.conf.d/
            libintelopencl.conf         /opt/intel/opencl
        OpenCL/
            vendors/
                intel.icd               /opt/intel/opencl/libIntelOpenCL.so
    opt/
        LICENSE                         docs
        NOTICES                         docs
        igdclbif.bin
        kernel-4.4-xcode.patch          kernel patch (no patch needed for linux-4.8 and later)
        kernel-4.7.patch                kernel patch (no patch needed for linux-4.8 and later)
        libIntelOpenCL.so               main OpenCL library
        libOpenCL.so                    If you don't use OpenCL ICD
        libOpenCL.so.1                  If you don't use OpenCL ICD
        libcommon_clang.so              OpenCL kernels are built on-demand for each device
        libcommon_clang_legacy.so
        libiga64.so
        libigdbcl.so
        libigdccl.so
        libigdfcl.so
        libigdfcl_legacy.so
        libigdmcl.so
        libigdrcl.so
        libmd.so
After moving the files to the right places, do not forget to run sudo ldconfig to refresh the ld cache.
You may notice that the CPU is missing from the list of OpenCL devices. Running sudo strace clinfo shows which libraries are searched for but not found. For example, libnuma.so was missing on my system:
sudo apt install libnuma1
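The strace output is long, so filtering it helps. The sketch below assumes the trace was saved to a file (strace writes to stderr, so something like sudo strace -f clinfo 2> trace.log; the file name is arbitrary) and that failed lookups appear as open(...) or openat(...) calls returning -1. A library is reported only if none of the attempted paths could be opened, because the loader normally tries several directories before finding one.

```python
import os
import re
import sys

# Matches open()/openat() calls and captures the path and the return value.
OPEN_CALL = re.compile(r'open(?:at)?\([^"]*"([^"]+)"[^)]*\)\s*=\s*(-?\d+)')


def missing_libraries(strace_output):
    """Return shared-library names that were looked up but never opened."""
    attempted = []   # library names, in first-seen order
    opened = set()   # names opened successfully at least once
    for line in strace_output.splitlines():
        match = OPEN_CALL.search(line)
        if match is None:
            continue
        path, result = match.group(1), int(match.group(2))
        name = os.path.basename(path)
        if ".so" not in name:
            continue  # not a shared library
        if name not in attempted:
            attempted.append(name)
        if result >= 0:
            opened.add(name)
    return [name for name in attempted if name not in opened]


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as trace:
        for library in missing_libraries(trace.read()):
            print("never found:", library)
```

Some entries can be optional plugins that the runtime probes for, so treat the list as a set of candidates rather than a list of packages to install.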
AMD OpenCL (CPU-only)
AMD's OpenCL CPU SDK [10] also works with Intel CPUs.
Ethereum mining
Two pieces of software are needed: an Ethereum client and an Ethereum miner. There are multiple implementations of each. Here is an (outdated?) post about getting started with mining [11].
To create a new account with the Ethereum client:
~$ geth account new
WARN [09-30|11:09:00] No etherbase set and no accounts found as default
Your new account is locked with a password. Please give a password. Do not forget this password.
Passphrase:
Repeat passphrase:
Address: {ed318e9348d616594f1286072c3b6fc8b61a8fdf}
A pre-built binary of an open-source Ethereum miner can be found at [12].
My command to mine:
sudo ./ethminer -G -S eu1.ethermine.org:4444 -FS us1.ethermine.org:4444 -O 0xed318e9348d616594f1286072c3b6fc8b61a8fdf.nuc --farm-recheck 200
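The value given to -O in the command above is the account address printed by geth, with the braces removed and a 0x prefix added, followed by a dot and what I take to be a free-form worker name (nuc here). A small sketch of that conversion; payout_address is a hypothetical helper, and the address.worker convention is my reading of the command rather than something checked against the pool documentation.

```python
import re


def payout_address(geth_address, worker=None):
    """Turn '{ed31...}' as printed by geth into '0xed31...' or '0xed31....worker'."""
    hex_part = geth_address.strip().strip("{}")
    if hex_part.lower().startswith("0x"):
        hex_part = hex_part[2:]
    # An Ethereum address is 20 bytes, i.e. 40 hexadecimal digits.
    if not re.fullmatch(r"[0-9a-fA-F]{40}", hex_part):
        raise ValueError(f"not a 40-digit hexadecimal address: {geth_address!r}")
    address = "0x" + hex_part
    return f"{address}.{worker}" if worker else address


if __name__ == "__main__":
    print(payout_address("{ed318e9348d616594f1286072c3b6fc8b61a8fdf}", "nuc"))
```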
You will notice that ethminer only uses GPU devices even if CPU devices are available with sufficient memory (> 2GB).
The reason is in this file [13]:
_platforms[platform_num].getDevices(
CL_DEVICE_TYPE_GPU | CL_DEVICE_TYPE_ACCELERATOR,
&devices
);
Add CL_DEVICE_TYPE_CPU to the bitfield here, to select CPU as well.
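For readers not used to OpenCL bitfields, here is how the mask behaves. The constant values are the ones defined in the Khronos CL/cl.h header; the selected helper is only an illustration of the filtering the runtime performs, not ethminer code.

```python
# Device type bits as defined in CL/cl.h.
CL_DEVICE_TYPE_CPU = 1 << 1          # 2
CL_DEVICE_TYPE_GPU = 1 << 2          # 4
CL_DEVICE_TYPE_ACCELERATOR = 1 << 3  # 8


def selected(device_type, mask):
    """True when a device of this type is returned for the given mask."""
    return bool(device_type & mask)


# Mask used in the snippet above: GPUs and accelerators only.
original_mask = CL_DEVICE_TYPE_GPU | CL_DEVICE_TYPE_ACCELERATOR
# Mask after OR-ing in the CPU bit.
patched_mask = original_mask | CL_DEVICE_TYPE_CPU

if __name__ == "__main__":
    print("original mask:", bin(original_mask),
          "CPU selected:", selected(CL_DEVICE_TYPE_CPU, original_mask))
    print("patched mask: ", bin(patched_mask),
          "CPU selected:", selected(CL_DEVICE_TYPE_CPU, patched_mask))
```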
At around 1.3 MHash/second, you are losing money [14]. This Intel chip is not well suited to the task because of its limited CPU/GPU-to-memory bandwidth, which is the most important factor for Ethereum mining. High-end GPUs with on-package HBM2 memory achieve 30 MHash/sec. Memory bandwidth of the Intel CPU/GPU: 34GB/s. Memory bandwidth of the NVidia Quadro GP100: 720GB/s.
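To put these bandwidth figures in perspective, here is a back-of-the-envelope bound. It assumes the Ethash parameters of 64 DAG accesses of 128 bytes per hash (8 KiB of memory traffic per hash), treats 1 GB/s as 10^9 bytes per second, and ignores caches and every other bottleneck, so it is an upper limit rather than a prediction.

```python
ETHASH_ACCESSES = 64     # DAG reads per hash (assumed Ethash parameter)
ETHASH_MIX_BYTES = 128   # bytes fetched per read (assumed Ethash parameter)
BYTES_PER_HASH = ETHASH_ACCESSES * ETHASH_MIX_BYTES  # 8192


def bandwidth_bound_mhash(bandwidth_gb_per_s):
    """Upper bound on the hash rate (MHash/s) if memory bandwidth were the only limit."""
    hashes_per_second = bandwidth_gb_per_s * 1e9 / BYTES_PER_HASH
    return hashes_per_second / 1e6


if __name__ == "__main__":
    for name, bandwidth, measured in [("Intel i7-6770HQ", 34, 1.3),
                                      ("NVidia Quadro GP100", 720, 30)]:
        bound = bandwidth_bound_mhash(bandwidth)
        print(f"{name}: bound {bound:.1f} MHash/s, quoted {measured} MHash/s "
              f"({measured / bound:.0%} of the bound)")
```

Under these assumptions both devices land at a similar fraction of their bound, which fits the claim that memory bandwidth dominates.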

Wall power consumption at full load is 77W [15].
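The power figure can be turned into an energy budget. In this sketch the 77 W and 1.3 MHash/s come from the text, while the electricity price is a placeholder to replace with your own tariff. Note that 77 W is the full-load figure from the review in [15], not a measurement taken while mining, so the result is a rough estimate.

```python
def daily_energy_kwh(watts):
    """Energy drawn in 24 hours at constant power, in kWh."""
    return watts * 24 / 1000


def joules_per_mhash(watts, mhash_per_s):
    """Energy spent per million hashes (1 W = 1 J/s)."""
    return watts / mhash_per_s


def daily_cost(watts, price_per_kwh):
    """Electricity cost per day, in the currency of price_per_kwh."""
    return daily_energy_kwh(watts) * price_per_kwh


if __name__ == "__main__":
    PRICE_PER_KWH = 0.20  # placeholder tariff, not from the article
    print(f"{daily_energy_kwh(77):.3f} kWh per day")
    print(f"{joules_per_mhash(77, 1.3):.1f} J per MHash")
    print(f"{daily_cost(77, PRICE_PER_KWH):.2f} per day at {PRICE_PER_KWH}/kWh")
```

Comparing the daily cost with the daily revenue given by a calculator such as [14] shows how far from break-even this setup is.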
Intel GPU tools
It would be nice to have a kind of top command for the GPU, like there is for the CPU, right? See [16].
Check intel_gpu_top [17].
References
[1] | Intel ARK, Core i7-6770HQ, https://ark.intel.com/products/93341/Intel-Core-i7-6770HQ-Processor-6M-Cache-up-to-3_50-GHz |
[2] | Geekbench, Intel Skull Canyon, https://browser.geekbench.com/v4/compute/223790 |
[3] | GfxBench, Intel® Core™ i7-6770HQ CPU with Iris™ Pro Graphics 580, https://compubench.com/device.jsp?D=Intel%28R%29+Core%28TM%29+i7-6770HQ+CPU+with+Iris%28TM%29+Pro+Graphics+580&testgroup=overall |
[4] | Andreas Klöckner's wiki, How to set up OpenCL in Linux, https://wiki.tiker.net/OpenCLHowTo |
[5] | OpenCL™ Drivers and Runtimes for Intel® Architecture, https://software.intel.com/en-us/articles/opencl-drivers |
[6] | http://registrationcenter-download.intel.com/akdlm/irc_nas/11396/SRB5.0_linux64.zip |
[7] | http://registrationcenter-download.intel.com/akdlm/irc_nas/11396/SRB5.0_intel-opencl-installation.pdf |
[8] | http://registrationcenter-download.intel.com/akdlm/irc_nas/11396/SRB5.0_intel-opencl-release-notes.pdf |
[9] | https://community.arm.com/processors/b/blog/posts/exploring-how-cache-coherency-accelerates-heterogeneous-compute |
[10] | AMD APPSDK, http://developer.amd.com/amd-accelerated-parallel-processing-app-sdk/ |
[11] | https://www.meebey.net/posts/ethereum_gpu_mining_on_linux_howto/ |
[12] | https://github.com/ethereum-mining/ethminer/releases |
[13] | https://github.com/ethereum-mining/ethminer/blob/master/libethash-cl/CLMiner.cpp |
[14] | https://etherscan.io/ether-mining-calculator |
[15] | https://www.anandtech.com/show/10343/the-intel-skull-canyon-nuc6i7kyk-minipc-review/7 |
[16] | http://www.rkblog.rk.edu.pl/w/p/monitoring-amd-intel-and-nvidia-graphics-card-usage-under-linux/ |
[17] | https://01.org/linuxgraphics/gfx-docs/igt/ |
Appendix
[LSPCI] | List of PCI peripherals |
$ lspci
00:00.0 Host bridge: Intel Corporation Sky Lake Host Bridge/DRAM Registers (rev 0a)
00:02.0 VGA compatible controller: Intel Corporation Sky Lake Integrated Graphics (rev 09)
00:08.0 System peripheral: Intel Corporation Sky Lake Gaussian Mixture Model
00:14.0 USB controller: Intel Corporation Sunrise Point-H USB 3.0 xHCI Controller (rev 31)
00:14.2 Signal processing controller: Intel Corporation Sunrise Point-H Thermal subsystem (rev 31)
00:16.0 Communication controller: Intel Corporation Sunrise Point-H CSME HECI #1 (rev 31)
00:1c.0 PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root Port #1 (rev f1)
00:1c.1 PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root Port #2 (rev f1)
00:1c.2 PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root Port #3 (rev f1)
00:1c.4 PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root Port #5 (rev f1)
00:1d.0 PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root Port #9 (rev f1)
00:1d.4 PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root Port #13 (rev f1)
00:1f.0 ISA bridge: Intel Corporation Sunrise Point-H LPC Controller (rev 31)
00:1f.2 Memory controller: Intel Corporation Sunrise Point-H PMC (rev 31)
00:1f.3 Audio device: Intel Corporation Sunrise Point-H HD Audio (rev 31)
00:1f.4 SMBus: Intel Corporation Sunrise Point-H SMBus (rev 31)
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (2) I219-LM (rev 31)
02:00.0 SD Host controller: O2 Micro, Inc. Device 8621 (rev 01)
03:00.0 Network controller: Intel Corporation Wireless 8260 (rev 3a)
3e:00.0 Non-Volatile memory controller: Device 1987:5007 (rev 01)
[CPUINFO] | CPU info (same info repeated for the 8 logical processors) |
ina@nuc:~$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 94
model name      : Intel(R) Core(TM) i7-6770HQ CPU @ 2.60GHz
stepping        : 3
microcode       : 0x8a
cpu MHz         : 845.101
cache size      : 6144 KB
physical id     : 0
siblings        : 8
core id         : 0
cpu cores       : 4
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 22
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
bugs            :
bogomips        : 5183.87
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:
[CLINFO] | List of OpenCL-compatible devices |
Platforms
Number of platforms                               2
  Platform Name                                   Intel(R) OpenCL
  Platform Vendor                                 Intel(R) Corporation
  Platform Version                                OpenCL 2.0
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_image2d_from_buffer cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_spir
  Platform Extensions function suffix             INTEL

  Platform Name                                   AMD Accelerated Parallel Processing
  Platform Vendor                                 Advanced Micro Devices, Inc.
  Platform Version                                OpenCL 2.0 AMD-APP (1800.8)
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
  Platform Extensions function suffix             AMD
platform 0, device 0
  Platform Name                                   Intel(R) OpenCL
Number of devices                                 2
  Device Name                                     Intel(R) HD Graphics
  Device Vendor                                   Intel(R) Corporation
  Device Vendor ID                                0x8086
  Device Version                                  OpenCL 2.0
  Driver Version                                  r5.0.63503
  Device OpenCL C Version                         OpenCL C 2.0
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Max compute units                               72
  Max clock frequency                             950MHz
  Device Partition                                (core)
    Max number of sub-devices                     0
    Supported partition types                     by <unknown> (0x7F4600000000)
  Max work item dimensions                        3
  Max work item sizes                             256x256x256
  Max work group size                             256
  Preferred work group size multiple              32
  Preferred / native vector sizes
    char                                          16 / 16
    short                                         8 / 8
    int                                           4 / 4
    long                                          1 / 1
    half                                          8 / 8 (cl_khr_fp16)
    float                                         1 / 1
    double                                        1 / 1 (cl_khr_fp64)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Address bits                                    64, Little-Endian
  Global memory size                              6595090842 (6.142GiB)
  Error Correction support                        No
  Max memory allocation                           3297545421 (3.071GiB)
  Unified memory for Host and Device              Yes
  Shared Virtual Memory (SVM) capabilities        (core)
    Coarse-grained buffer sharing                 Yes
    Fine-grained buffer sharing                   No
    Fine-grained system sharing                   No
    Atomics                                       No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Preferred alignment for atomics
    SVM                                           64 bytes
    Global                                        64 bytes
    Local                                         64 bytes
  Max size for global variable                    65536 (64KiB)
  Preferred total size of global vars             3297545421 (3.071GiB)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        1572864
  Global Memory cache line                        64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            206096588 pixels
    Max 1D or 2D image array size                 2048 images
    Base address alignment for 2D image buffers   4 bytes
    Pitch alignment for 2D image buffers          4 bytes
    Max 2D image size                             16384x16384 pixels
    Max 3D image size                             16384x16384x2048 pixels
    Max number of read image args                 128
    Max number of write image args                128
    Max number of read/write image args           128
  Max number of pipe args                         16
  Max active pipe reservations                    1
  Max pipe packet size                            1024
  Local memory type                               Local
  Local memory size                               65536 (64KiB)
  Max constant buffer size                        3297545421 (3.071GiB)
  Max number of constant args                     8
  Max size of kernel argument                     1024
  Queue properties (on host)
    Out-of-order execution                        Yes
    Profiling                                     Yes
  Queue properties (on device)
    Out-of-order execution                        Yes
    Profiling                                     Yes
    Preferred size                                131072 (128KiB)
    Max size                                      67108864 (64MiB)
  Max queues on device                            1
  Max events on device                            1024
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      83ns
  Execution capabilities
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    SPIR versions                                 1.2
  printf() buffer size                            4194304 (4MiB)
  Built-in kernels                                block_motion_estimate_intel;block_advanced_motion_estimate_check_intel;block_advanced_motion_estimate_bidirectional_check_intel
  Motion Estimation accelerator version (Intel)   2
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Device Extensions                               cl_intel_accelerator cl_intel_advanced_motion_estimation cl_intel_device_side_avc_motion_estimation cl_intel_driver_diagnostics cl_intel_media_block_io cl_intel_motion_estimation cl_intel_planar_yuv cl_intel_packed_yuv cl_intel_required_subgroup_size cl_intel_subgroups cl_intel_subgroups_short cl_intel_va_api_media_sharing cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_fp16 cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_image2d_from_buffer cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_khr_spir cl_khr_subgroups
platform 0, device 1
  Device Name                                     Intel(R) Core(TM) i7-6770HQ CPU @ 2.60GHz
  Device Vendor                                   Intel(R) Corporation
  Device Vendor ID                                0x8086
  Device Version                                  OpenCL 2.0 (Build 475)
  Driver Version                                  1.2.0.475
  Device OpenCL C Version                         OpenCL C 2.0
  Device Type                                     CPU
  Device Profile                                  FULL_PROFILE
  Max compute units                               8
  Max clock frequency                             2600MHz
  Device Partition                                (core)
    Max number of sub-devices                     8
    Supported partition types                     by counts, equally, by names (Intel)
  Max work item dimensions                        3
  Max work item sizes                             8192x8192x8192
  Max work group size                             8192
  Preferred work group size multiple              128
  Preferred / native vector sizes
    char                                          1 / 32
    short                                         1 / 16
    int                                           1 / 8
    long                                          1 / 4
    half                                          0 / 0 (n/a)
    float                                         1 / 8
    double                                        1 / 4 (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Address bits                                    64, Little-Endian
  Global memory size                              8254349312 (7.687GiB)
  Error Correction support                        No
  Max memory allocation                           2063587328 (1.922GiB)
  Unified memory for Host and Device              Yes
  Shared Virtual Memory (SVM) capabilities        (core)
    Coarse-grained buffer sharing                 Yes
    Fine-grained buffer sharing                   No
    Fine-grained system sharing                   No
    Atomics                                       No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Preferred alignment for atomics
    SVM                                           64 bytes
    Global                                        64 bytes
    Local                                         0 bytes
  Max size for global variable                    65536 (64KiB)
  Preferred total size of global vars             65536 (64KiB)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        262144
  Global Memory cache line                        64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             480
    Max size for 1D images from buffer            128974208 pixels
    Max 1D or 2D image array size                 2048 images
    Base address alignment for 2D image buffers   64 bytes
    Pitch alignment for 2D image buffers          64 bytes
    Max 2D image size                             16384x16384 pixels
    Max 3D image size                             2048x2048x2048 pixels
    Max number of read image args                 480
    Max number of write image args                480
    Max number of read/write image args           480
  Max number of pipe args                         16
  Max active pipe reservations                    32767
  Max pipe packet size                            1024
  Local memory type                               Global
  Local memory size                               32768 (32KiB)
  Max constant buffer size                        131072 (128KiB)
  Max number of constant args                     480
  Max size of kernel argument                     3840 (3.75KiB)
  Queue properties (on host)
    Out-of-order execution                        Yes
    Profiling                                     Yes
    Local thread execution (Intel)                Yes
  Queue properties (on device)
    Out-of-order execution                        Yes
    Profiling                                     Yes
    Preferred size                                4294967295 (4GiB)
    Max size                                      4294967295 (4GiB)
  Max queues on device                            4294967295
  Max events on device                            4294967295
  Prefer user sync for interop                    No
  Profiling timer resolution                      1ns
  Execution capabilities
    Run OpenCL kernels                            Yes
    Run native kernels                            Yes
    SPIR versions                                 1.2
  printf() buffer size                            1048576 (1024KiB)
  Built-in kernels
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Device Extensions                               cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_intel_exec_by_local_thread cl_khr_spir cl_khr_fp64 cl_khr_image2d_from_buffer
platform 1, device 0
  Platform Name                                   AMD Accelerated Parallel Processing
Number of devices                                 1
  Device Name                                     Intel(R) Core(TM) i7-6770HQ CPU @ 2.60GHz
  Device Vendor                                   GenuineIntel
  Device Vendor ID                                0x1002
  Device Version                                  OpenCL 1.2 AMD-APP (1800.8)
  Driver Version                                  1800.8 (sse2,avx)
  Device OpenCL C Version                         OpenCL C 1.2
  Device Type                                     CPU
  Device Profile                                  FULL_PROFILE
  Device Board Name (AMD)
  Device Topology (AMD)                           (n/a)
  Max compute units                               8
  Max clock frequency                             899MHz
  Device Partition                                (core, cl_ext_device_fission)
    Max number of sub-devices                     8
    Supported partition types                     equally, by counts, by affinity domain
    Supported affinity domains                    L3 cache, L2 cache, L1 cache, next partitionable
    Supported partition types (ext)               equally, by counts, by affinity domain
    Supported affinity domains (ext)              L3 cache, L2 cache, L1 cache, next fissionable
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x1024
  Max work group size                             1024
  Preferred work group size multiple              1
  Preferred / native vector sizes
    char                                          16 / 16
    short                                         8 / 8
    int                                           4 / 4
    long                                          2 / 2
    half                                          4 / 4 (n/a)
    float                                         8 / 8
    double                                        4 / 4 (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Address bits                                    64, Little-Endian
  Global memory size                              8254349312 (7.687GiB)
  Error Correction support                        No
  Max memory allocation                           2147483648 (2GiB)
  Unified memory for Host and Device              Yes
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        32768
  Global Memory cache line                        64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            65536 pixels
    Max 1D or 2D image array size                 2048 images
    Max 2D image size                             8192x8192 pixels
    Max 3D image size                             2048x2048x2048 pixels
    Max number of read image args                 128
    Max number of write image args                64
  Local memory type                               Global
  Local memory size                               32768 (32KiB)
  Max constant buffer size                        65536 (64KiB)
  Max number of constant args                     8
  Max size of kernel argument                     4096 (4KiB)
  Queue properties
    Out-of-order execution                        No
    Profiling                                     Yes
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      1ns
  Profiling timer offset since Epoch (AMD)        1506759285803052634ns (Sat Sep 30 10:14:45 2017)
  Execution capabilities
    Run OpenCL kernels                            Yes
    Run native kernels                            Yes
    SPIR versions                                 1.2
  printf() buffer size                            65536 (64KiB)
  Built-in kernels
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Device Extensions                               cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_spir cl_khr_gl_event