Other articles

  1. How to make my array

    Let's make a friendly array for CPU/GPU processing

    (A) Very well-supported, but impractical to use and error-prone.
    (B) Cannot be memcpy'ed. Contains a pointer. Indexing is indirect.
    (C) Perfect if the array does not change.
    (D) Need to unpack. No direct indexing possible.

    (C) is memcpy-able, does no …

  2. OpenCL zerocopy


    This is both CPU and GPU main memory. How to avoid copying buffers from and to the same memory?

    Allocate aligned memory

    // note: buffer_size is re-written in place
    void *zerocopy_alloc(size_t &buffer_size)
        // OpenCL zero-copy buffers need to be aligned on a page size (i.e., 4KB),
        // and the allocated …
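
The alignment requirement mentioned in the excerpt can be sketched as follows. This is only an illustration, not the post's actual `zerocopy_alloc`: the helper names `round_up_to_page` and `aligned_alloc_sketch` are hypothetical, the 4 KiB page size is taken from the comment above, and padding the size to a whole number of pages is an assumption made here so that `std::aligned_alloc` (C++17, not available on every toolchain) accepts the request.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Assumption: 4 KiB pages, as in the excerpt. Real code may want to
// query the operating system for the page size instead.
constexpr std::size_t kPageSize = 4096;

// Round n up to the next multiple of kPageSize (a power of two).
constexpr std::size_t round_up_to_page(std::size_t n) {
    return (n + kPageSize - 1) & ~(kPageSize - 1);
}

// Hypothetical helper: returns page-aligned memory and rewrites
// buffer_size in place with the padded size. Release with std::free.
void *aligned_alloc_sketch(std::size_t &buffer_size) {
    if (buffer_size == 0) {
        buffer_size = 1;  // avoid a zero-sized request
    }
    buffer_size = round_up_to_page(buffer_size);
    // std::aligned_alloc requires size to be a multiple of the alignment.
    return std::aligned_alloc(kPageSize, buffer_size);
}
```

A buffer obtained this way is the kind of host pointer that is typically handed to `clCreateBuffer` with `CL_MEM_USE_HOST_PTR`, so that a device sharing main memory with the CPU can use it without a copy.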
  3. OpenCL with pocl

    pocl converts your OpenCL kernels into AVX-vectorized CPU binaries

    pocl stands for Portable Computing Language [1].

    Building pocl

    For sweet and crispy pocl, you'll need:

    • warm up your favorite C++ compiler
    • prepare the following baking packages (if you have an Ubuntu 16.04 kitchen)
    # llvm clang 5.0 or …
  4. OpenCL 2.0 overview

    What's possible with OpenCL 2.0 today?


    AMDGPU-PRO is AMD's closed-source implementation of OpenCL 2.x. It only targets AMD GPUs. AMD used to provide a CPU target, but it is OpenCL 1.2 only.

    ROCm is AMD's heterogeneous computing effort, and is open-source. It …

  5. Day 9: Week summary

    In the "thesis reboot" post, I brought the dusty implementation of my master's thesis back from the dead. Seeing that it was still (almost) working, I planned a week off work to prepare its adaptation to the almighty OpenCL. This post summarizes what this week was all about.

    TL;DR …

  6. Day 9: OpenCL target debugging with gdb-igfx

    How to use Intel's gdb-igfx to debug code running on an Intel GPU.

    Everything starts with this article: Getting Started with Debugging with Intel® Parallel Studio XE 2016 [1]. Unfortunately, the article is quite old now ("Last updated on August 25, 2015") and no longer really up to date.

    No, wait, there is a newer version …

  7. Day 8: OpenCL blocks

    How to have a function taking a callback as a parameter?

    You cannot use function pointers in OpenCL, even if they can be resolved during compilation.

    It is not possible to write this (although it has worked in CUDA since the beginning):

    void do_operation(const int* A, const int* B, int* C …
  8. Day 6: OpenCL malloc

    How to make OpenCL memory allocation functions more like normal malloc/memcpy?

    The problem

    The problem with clCreateBuffer [1] is that it returns an opaque cl_mem object, rather than a good old pointer to the allocated memory.

    cl_mem clCreateBuffer (
        cl_context context,
        cl_mem_flags flags,
        size_t size,
        void *host_ptr,
        cl_int …
  9. Day 7: OpenCL on ARM Mali

    Surprisingly, ARM Mali and Intel HD Graphics GPUs are similar in their design, compared with NVIDIA's and AMD's designs.

    Intel and ARM are friendlier to divergent threads

    On the left, ARM Mali [ARM], on the right Intel IGP [INTEL].

    The similarity seems to stem from the two companies having …

  10. Day 5: OpenCL basics

    What is a device, a compute unit (CU), a work-group, a work-item, a command-queue?

    Work-group vs Work-item

    From [1]:
    A multi-core CPU or multiple CPUs (in a multi-socket machine) constitute a single OpenCL device. Separate cores are compute units. Work-group [is] a collection of related work-items that execute on …
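
To make the work-group / work-item relationship concrete, here is a small host-side sketch that enumerates the work-items of a one-dimensional range. It does not call OpenCL; it only mirrors the identity that holds inside a kernel when there is no global offset, `get_global_id(0) == get_group_id(0) * get_local_size(0) + get_local_id(0)`. The `WorkItem` struct and `enumerate_work_items` function are hypothetical names, and the sketch assumes the global size is a multiple of the local size.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record describing where one work-item sits in a 1D range.
struct WorkItem {
    std::size_t group_id;   // which work-group it belongs to
    std::size_t local_id;   // its index inside that work-group
    std::size_t global_id;  // its index in the whole range
};

// Enumerate every work-item of a 1D range split into work-groups of
// local_size items. Returns an empty vector when local_size is zero or
// does not evenly divide global_size (an assumption of this sketch).
std::vector<WorkItem> enumerate_work_items(std::size_t global_size,
                                           std::size_t local_size) {
    std::vector<WorkItem> items;
    if (local_size == 0 || global_size % local_size != 0) {
        return items;
    }
    const std::size_t num_groups = global_size / local_size;
    for (std::size_t group = 0; group < num_groups; ++group) {
        for (std::size_t local = 0; local < local_size; ++local) {
            items.push_back({group, local, group * local_size + local});
        }
    }
    return items;
}
```

With a global size of 8 and a local size of 4, the range splits into two work-groups of four work-items each, and the work-item with global index 5 is item 1 of group 1.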
  11. Intel IGP, OpenCL, and GDB

    So nice!

    Just discovered that Intel provides a special version of GDB for debugging OpenCL kernels on their GPU [1]. What a great move!

    Knowing Intel, I was already imagining that they would have a fully proprietary solution like the Intel debugger [2]. Fortunately, they deprecated it in 2013 in …

  12. Intel IGP, OpenCL compute units, and divergent threads

    How many OpenCL compute units does an Intel integrated graphics processor have? In other words, how many divergent threads can be run in parallel?

    On the quest to find the OpenCL-compatible device with the largest number of compute units (CU), it seems that, at around 30 CU, the latest NVIDIA architectures tend to feature a …

