A part of the GPU program development cycle is to incrementally optimize a GPU program by applying manual or semi-automatic optimizations to the source code prior to compilation. This process is tedious and error-prone. One way of preventing these errors is by verifying the GPU program using a deductive verifier such as VerCors, where the user adds annotations to the GPU program to capture functional properties.

In addition to optimizing the GPU program, one also needs to change the annotations to get an optimized GPU program that is also verifiable. The more complex the source program gets (due to incrementally optimizing it), the more challenging it becomes to annotate it.

This is where Alpinist comes in. Alpinist adds deductive verification into the GPU program development cycle and aids the developers in incrementally applying optimizations to a GPU program while preserving the provability of the optimized program.

The approach of Alpinist
The approach of Alpinist

The figure above shows a visual representation of Alpinist's approach in four steps. First, the developer starts with a naive or semi-optimized version of the GPU program. The developer annotates this GPU program and verifies it with VerCors (the deductive verifier). Second, the developer identifies parts of the code that can be optimized and places special opt annotations indicating which optimization Alpinist (the annotation aware program optimizer) should apply. Third, this the annotated, verfied GPU program along with the special opt annotations is given to Alpinist. The output of Alpinist is an annotated, optimized GPU program where the specified optimizations have been performed on the code and annotations. This avoids having to re-annotate the program every time it is optimized for a specific GPU device. Last, this annotated, optimized GPU program can be reverified with VerCors.

This approach has been applied to six frequently used optimizations, namely:

  1. Loop unrolling: Executing loop iterations before the loop.
  2. Iteration merging: Executing multiple loop iterations in one single iteration.
  3. Data prefetching: Prefetching data residing in global memory into registers.
  4. Matrix linearization: Changing a matrix into an array.
  5. Tiling: Increasing the work done by each thread.
  6. Kernel fusion: Joining two consecutive kernels.

Internally, Alpinist applies the optmization in four phases: The parsing phase, the applicability checking phase, the transformation phase and the output phase. The strength of Alpinist lies in the applicability checking phase. The transformation phase consists of applying a specific optimization, such as the optimization in the list above. However, this assumes that the optimization is applicable, which is where the applicability checking phase comes in.

In the applicability checking phase, various techniques are used to determine whether an optimization is applicable. These techniques can range from static analysis of the code to encoding a problem to solve with VerCors. An example of an applicability check is the unrollability of a loop with the loop unrolling optimization. A loop is safe to unroll k times (without guards) if the loop is executed at least k times. This property of k-unrollability can be encoded in a program that can be checked by VerCors.

The interal design of Alpinist
The interal design of Alpinist