Advanced Compilers: Loop Transformations and High-Level Synthesis
Tomofumi Yuki (CR Inria, contact person) and
Christophe Alias (CR Inria).
(possibly with the help of Alain Darte)
Prerequisites
- Undergraduate compiler and architecture course.
- Strong interest in both applied and theoretical research.
Overview
The main goal of this course is to understand the high-level (loop-level)
transformations employed by "optimizing" compilers and the techniques
these compilers use to reason about such optimizations. The two
key performance principles in optimizing compilers are parallelism and
data locality, and compilers must reason at the level of loop nests to
identify and fully exploit opportunities for improvement. These
principles apply to most architectures in use today: general-purpose
CPUs, GPUs, distributed-memory machines, and even FPGAs/hardware,
thanks to the increasing maturity of High-Level Synthesis tools.
In this course, we will cover various techniques and frameworks for
reasoning about programs, with an emphasis on the polyhedral model,
one of the most successful formalisms for automatic parallelization
today. We will also work with a number of research tools that
implement the techniques covered in the course.
Course Outline
Part I. Loop Transformations (8 lectures)
- Introduction
- Target architectures: importance of parallelism and locality
- Optimizing (source-to-source) compilers
- Slides: L01
- HW: paper-review (due in 3 weeks)
- HW Instruction: HW01
- HW Detailed Spec: paper-review-spec
- Dependence Analysis
- Analysis of loops with array accesses
- Low-cost, inexact (approximate) methods (Banerjee test, GCD test, ...)
- Exact dependence analysis (PRDG, array dataflow analysis)
- Affine loop nests
- Loop view, algebraic view, geometric view
- Mathematical objects (parameterized polyhedra, affine lattices, ...)
- Slides: L02
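To make one of the inexact tests named above concrete, here is a minimal sketch of the classic GCD test for a pair of one-dimensional affine accesses. The function names are ours, not from any particular compiler.

```c
#include <assert.h>
#include <stdlib.h>

/* GCD dependence test for a write A[a*i + b] and a read A[c*j + d]:
   an integer solution of a*i - c*j = d - b can exist only if
   gcd(a, c) divides (d - b). The test is conservative: it ignores
   loop bounds, so it may report a dependence that never occurs. */
static int gcd(int x, int y) {
    x = abs(x);
    y = abs(y);
    while (y != 0) { int t = x % y; x = y; y = t; }
    return x;
}

int gcd_test_may_depend(int a, int b, int c, int d) {
    int g = gcd(a, c);
    if (g == 0)                 /* both subscripts are constants */
        return b == d;
    return (d - b) % g == 0;
}
```

For example, accesses A[2*i] and A[2*i+1] are independent because gcd(2, 2) = 2 does not divide 1, whereas for A[2*i] and A[4*j+2] the test must conservatively report a possible dependence.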
- Transformations
- Unimodular and Affine Transformations
- Loop Tiling
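As a small illustration of the simplest unimodular transformation, loop interchange (transformation matrix [[0,1],[1,0]]) permutes the two loops so that the innermost loop becomes stride-1. The array name and size below are illustrative.

```c
#define N 8

/* Original nest: the innermost loop walks a column (stride N). */
long sum_by_columns(int A[N][N]) {
    long s = 0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            s += A[i][j];
    return s;
}

/* Interchanged nest: the same iterations in a different order.
   Legal here because no dependence is reversed (additions into s
   commute), and the innermost loop is now stride-1. */
long sum_by_rows(int A[N][N]) {
    long s = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            s += A[i][j];
    return s;
}
```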
- Scheduling
- Affine scheduling (1D case, nD case)
- Greedy algorithm, optimality
- Slides: L04
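The 1D legality condition can be sketched concretely: a schedule theta(i) = t1*i + t0 is valid for a uniform dependence of distance d only if theta(i + d) >= theta(i) + 1, i.e., t1*d >= 1. A minimal checker, with naming of our own choosing:

```c
/* Checks t1*d >= 1 for every uniform dependence distance d: the
   1D affine schedule theta(i) = t1*i + t0 must place the source
   of each dependence strictly before its target. */
int schedule_is_legal(int t1, const int *dist, int ndeps) {
    for (int k = 0; k < ndeps; k++)
        if (t1 * dist[k] < 1)
            return 0;
    return 1;
}
```

Note that t0 drops out of the condition: translating a schedule in time never changes its legality.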
- Loop Tiling
- Loop tiling, efficiency criteria (time and communication)
- Finding (good) tilable schedules (Pluto algorithm)
- Tile size selection, tile shapes, handling parametric tile sizes
- Slides: L05
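The shape of tiled code can be previewed with a sketch: the outer loops enumerate tile origins with step T, and the inner loops scan one tile, clamped at the boundary. The sizes below are illustrative, and N need not be a multiple of T.

```c
#define N 32
#define T 12   /* tile size; deliberately not a divisor of N */

/* Tiled traversal of an N x N iteration space: (ii, jj) ranges
   over tile origins, (i, j) scans one T x T tile, and the
   "&& i < N" guards clamp partial tiles at the boundary. */
long tiled_sum(int A[N][N]) {
    long s = 0;
    for (int ii = 0; ii < N; ii += T)
        for (int jj = 0; jj < N; jj += T)
            for (int i = ii; i < ii + T && i < N; i++)
                for (int j = jj; j < jj + T && j < N; j++)
                    s += A[i][j];
    return s;
}
```

Every point is visited exactly once, in a different order than the original nest; the payoff is that each tile's working set can fit in cache.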
- Code Generation
- Projection algorithm (perfect loop nest)
- Projection/separation algorithm (imperfect loop nests)
- Tiled Code Generation
- Slides: L06
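The projection idea can be seen on a triangular domain {(i, j) : 0 <= i <= j <= n}: projecting the polyhedron onto i yields the outer bounds 0..n, and the remaining constraints on j for a fixed i yield the inner bounds i..n. A sketch of the loop nest such an algorithm would emit:

```c
/* Scans the polyhedron {(i, j) : 0 <= i <= j <= n}.
   Outer bounds: the projection of the domain onto i.
   Inner bounds: the constraints left on j once i is fixed.
   Returns the point count, which must equal (n+1)(n+2)/2. */
long scan_triangle(int n) {
    long count = 0;
    for (int i = 0; i <= n; i++)       /* projection onto i */
        for (int j = i; j <= n; j++)   /* residual constraints on j */
            count++;
    return count;
}
```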
- Beyond Loop Transformations
- Sparse matrix vector product
- Inspector/Executor pattern
- Sparse Polyhedral Framework
- Simplifying Reductions: reducing computational complexity
- Semantic tiling
- Slides: L07
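The inspector/executor pattern named above can be sketched on the sparse matrix-vector product itself: the inspector scans the matrix once at run time to build a compressed (CSR-style) structure, and the executor then iterates only over nonzeros. All names and sizes here are illustrative.

```c
#define N 4
#define MAXNZ (N * N)

/* Inspector: scans the matrix once and records its nonzeros in
   CSR form (row pointers, column indices, values). */
void inspect(double A[N][N], int rowptr[N + 1],
             int col[MAXNZ], double val[MAXNZ]) {
    int nz = 0;
    for (int i = 0; i < N; i++) {
        rowptr[i] = nz;
        for (int j = 0; j < N; j++)
            if (A[i][j] != 0.0) {
                col[nz] = j;
                val[nz] = A[i][j];
                nz++;
            }
    }
    rowptr[N] = nz;
}

/* Executor: y = A * x, touching only the recorded nonzeros. */
void execute(const int rowptr[N + 1], const int col[],
             const double val[], const double x[N], double y[N]) {
    for (int i = 0; i < N; i++) {
        y[i] = 0.0;
        for (int k = rowptr[i]; k < rowptr[i + 1]; k++)
            y[i] += val[k] * x[col[k]];
    }
}
```

The inspection cost is paid once and amortized over the (typically many) executor runs, which is the economics the pattern relies on.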
- Memory + Power
- Memory Allocation
- Universal Occupancy Vectors
- Legal Allocation given a Schedule
- Power and Energy
- What can be done in the compiler?
- Compiling for Energy = Compiling for Speed
- Slides: L08
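A hedged sketch of allocation given a schedule: in a 1D Jacobi-style sweep executed time step by time step, the full (STEPS+1) x N value array can be contracted to two rows through the modular mapping t -> t mod 2, which is legal for this schedule because each value is dead once the following time step completes. The sizes are illustrative.

```c
#define N 10
#define STEPS 4

/* Contracted allocation: the value A[t][i] of the full array is
   mapped to buf[t % 2][i]. Legal for the step-by-step schedule
   because a value produced at step t - 1 is last read at step t.
   Boundary cells (i = 0, i = N-1) are held fixed, so both rows
   must be initialized with the boundary values. */
void jacobi_contracted(double buf[2][N]) {
    for (int t = 1; t <= STEPS; t++)
        for (int i = 1; i < N - 1; i++)
            buf[t % 2][i] = (buf[(t - 1) % 2][i - 1] +
                             buf[(t - 1) % 2][i + 1]) / 2.0;
}
```

The memory footprint drops from O(STEPS * N) to O(N); a different schedule (e.g., tiled in time) would require a different, and possibly larger, legal mapping.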
Part II. Application to High-Level Synthesis (4 lectures)
- Execution Models (KPN, CRP, PPN, ...)
- Definition, correctness and determinism
- Direct translation (mux/demux)
- Control & Data-path Synthesis
- Piece-wise affine functions: simplification, pipelining, speculation
- Loop tiling for pipelined datapath
- Internal Communications
- Buffer sizing and allocation
- Synchronization generation
- FIFO detection
- Tool: Bee
- External Communications
- Loop tiling and I/O pipelining
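As a sequential sketch of the process-network execution models behind this part: a producer and a consumer communicate only through a bounded FIFO, and since each process blocks solely on its own channel operations, the result is independent of the interleaving (Kahn's determinism condition). The buffer bound chosen here (4) is illustrative, not a computed minimum.

```c
#define CAP 4          /* illustrative buffer bound */
#define NITEMS 16

/* Bounded FIFO channel: the only communication between processes. */
typedef struct { int buf[CAP]; int head, tail, count; } fifo_t;

static int fifo_push(fifo_t *f, int v) {
    if (f->count == CAP) return 0;           /* producer would block */
    f->buf[f->tail] = v;
    f->tail = (f->tail + 1) % CAP;
    f->count++;
    return 1;
}

static int fifo_pop(fifo_t *f, int *v) {
    if (f->count == 0) return 0;             /* consumer would block */
    *v = f->buf[f->head];
    f->head = (f->head + 1) % CAP;
    f->count--;
    return 1;
}

/* One sequential schedule of the two-process network: alternate the
   producer (emits i*i) and the consumer (accumulates) until NITEMS
   tokens have flowed. Any other fair schedule yields the same sum. */
long run_network(void) {
    fifo_t ch = { {0}, 0, 0, 0 };
    int produced = 0, consumed = 0, v;
    long sum = 0;
    while (consumed < NITEMS) {
        if (produced < NITEMS && fifo_push(&ch, produced * produced))
            produced++;
        if (fifo_pop(&ch, &v)) { sum += v; consumed++; }
    }
    return sum;
}
```

In hardware synthesis the analogous questions are exactly the ones listed above: is a plain FIFO sufficient for a given channel, and how large must its buffer be to avoid deadlock?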
Evaluation
Homework and a presentation of a research paper.