- Undergraduate course on compilers and computer architecture.
- Strong interest in both applied and theoretical research.

Since the end of Dennard scaling, processor architectures have relied
increasingly on parallelism to gain performance. Implementing and
debugging parallel programs is error prone, so programmers want to
rely as much as possible on the compiler to automate this task.
This course is about *automatic parallelization*, the process of
automatically translating a sequential program into a parallel
program. Automatic parallelization raises many challenges: How to
represent the computation? How much parallelism is available, and of
which kind? How should resources be allocated? How to generate the
target code or circuit?
This course will present the current challenges, trends and
state-of-the-art parallelizing technologies, with a special focus on
*high-level circuit synthesis for FPGA*, a fundamental application of
automatic parallelization.
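To make the goal concrete, here is a minimal sketch (not part of the course material) of what a parallelizing compiler produces: the iterations of this loop carry no dependence, so they may legally run concurrently. The OpenMP pragma stands in for the compiler's output; the function name `vadd` is illustrative.

```c
#include <stddef.h>

/* Sequential vector addition: every iteration writes a distinct c[i]
 * and reads no element written by another iteration, so an automatic
 * parallelizer may legally run the iterations concurrently.
 * The pragma is simply ignored when compiled without OpenMP support. */
void vadd(const double *a, const double *b, double *c, size_t n) {
    #pragma omp parallel for
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}
```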

**Introduction**
- Parallel architectures and optimization issues
- Parallel models of computation: KPN, PPN, DPN
- Challenges and trends of automatic parallelization

**Polyhedral Model**
- Polyhedral traces and instance-wise program analysis
- Dependence Analysis: PRDG, Array dataflow analysis
- Loop Transformations: linear and affine scheduling, communication-optimal loop tiling
- Code Generation: linear case, affine case.
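As a small illustration of instance-wise dependence analysis and affine scheduling (a sketch, not course material), consider a loop nest whose only dependence has distance vector (0,1). Loop interchange, the simplest affine transformation, turns it into (1,0), which is still lexicographically positive, so the new schedule is legal. The function names are illustrative.

```c
/* S(i,j): A[i][j] = A[i][j-1] + 1.
 * Flow dependence from instance (i,j-1) to (i,j): distance vector (0,1). */
void scan_rows(int A[4][4]) {
    for (int i = 0; i < 4; i++)
        for (int j = 1; j < 4; j++)
            A[i][j] = A[i][j-1] + 1;
}

/* Loop interchange (an affine schedule (j,i) instead of (i,j)):
 * the distance vector becomes (1,0), still lexicographically positive,
 * so every dependence is preserved and the result is unchanged. */
void scan_rows_interchanged(int A[4][4]) {
    for (int j = 1; j < 4; j++)
        for (int i = 0; i < 4; i++)
            A[i][j] = A[i][j-1] + 1;
}
```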

**Topics in High-Level Synthesis for FPGA**
- FPGA technology: challenges and trends
- Control & Data-path Synthesis
- Generation of buffers, FIFOs and synchronizations
- Loop tiling and I/O pipelining
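The following sketch (illustrative, with made-up sizes: 4x4 tiles of an 8x8 matrix) shows the loop tiling pattern referred to above. On an FPGA, each tile would typically be buffered in on-chip memory so that off-chip I/O is performed tile by tile rather than with a strided access pattern.

```c
/* Tiled copy-transpose: the two outer loops enumerate tile origins,
 * the two inner loops sweep one 4x4 tile. Each tile of src and dst is
 * small enough to sit in fast local memory (on an FPGA, on-chip BRAM),
 * so every element crosses the off-chip interface exactly once. */
void transpose_tiled(const int src[8][8], int dst[8][8]) {
    for (int ii = 0; ii < 8; ii += 4)           /* tile origin, rows */
        for (int jj = 0; jj < 8; jj += 4)       /* tile origin, cols */
            for (int i = ii; i < ii + 4; i++)   /* intra-tile rows */
                for (int j = jj; j < jj + 4; j++)
                    dst[j][i] = src[i][j];
}
```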

**Hybrid analysis**
- Dynamic/static dependence graph, static analysis, profiling tools
- Performance bugs: hot regions, fat regions
- Alias disambiguation: non-overlapping intervals
- Dependence disambiguation

**Sparse data-flow analysis**
- Static Single Assignment
- Static Single Information
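The contrast below (an illustrative sketch, not course material) shows the idea behind SSA: each variable is assigned exactly once, and values merging at a control-flow join are combined by a phi function, written here as a conditional select so the code remains executable.

```c
/* Non-SSA: x is assigned twice, so its reaching definition
 * depends on the control-flow path taken. */
int max_plain(int a, int b) {
    int x = a;
    if (b > a)
        x = b;
    return x;
}

/* SSA form: every variable has a single definition; the phi at the
 * join point is expressed as a conditional select. */
int max_ssa(int a, int b) {
    int x1 = a;
    int x2 = b;
    int x3 = (b > a) ? x2 : x1;  /* x3 = phi(x1, x2) */
    return x3;
}
```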

**Data-movement complexity analysis**
- Data movement versus computation. Roofline model. Complexity analysis
- Convex graph partitioning. Upper bound analysis for I/O complexity
- Projections and volumes. Lower bound analysis for I/O complexity
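The roofline model mentioned above bounds attainable performance by the minimum of peak compute throughput and memory bandwidth times arithmetic intensity. A minimal sketch, with hypothetical machine numbers:

```c
/* Roofline model: attainable performance (GFLOP/s) is capped both by
 * peak compute and by the memory bound bw * intensity, where intensity
 * is the arithmetic intensity of the kernel in flops per byte. */
double roofline(double peak_gflops, double bw_gbytes, double intensity) {
    double mem_bound = bw_gbytes * intensity;
    return (mem_bound < peak_gflops) ? mem_bound : peak_gflops;
}
```

For instance, a (hypothetical) machine with 100 GFLOP/s peak and 10 GB/s bandwidth running a kernel of intensity 2 flops/byte is memory bound at 20 GFLOP/s; at intensity 50 the same kernel is compute bound at 100 GFLOP/s.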

**Step 1: Choose a paper (deadline: asap).** Choose a paper among the following and notify C. Alias
(email above) as soon as possible:

- Execution-Cache-Memory Performance Model: Introduction and Validation (**Victor Lutfalla**)
- The Polyhedral Model Is More Widely Applicable Than You Think (**Laureline Pinault**)
- Enabling Automatic Pipeline Utilization Improvement in Polyhedral Process Network Implementations
- A roofline model of energy
- A theoretical framework for algorithm-architecture co-design
- Integer Set Coalescing
- Automatic Tiling of "Mostly-Tileable" Loop Nests (**Clement Sartori**)
- Improved Derivation of Process Networks (**Maverick Chardet**)
- Communication-optimal parallel 2.5D matrix multiplication and LU factorization algorithms
- McPAT: An Integrated Power, Area, and Timing Modeling Framework for Multicore and Manycore Architectures
- On Program Equivalence with Reductions (**Agathe Herrou**)
- Program Locality Analysis Using Reuse Distance (**Raphael Monat**)
- Valgrind: A Framework for Heavyweight Dynamic Binary Instrumentation (**Remy Grunblatt**)

**Step 2: Report (deadline: January 12).** Write a short report (3-4 pages) using the LaTeX template here. **Send your report to Christophe Alias and Fabrice Rastello (email above)**.

**Step 3: Slides (deadline: January 12).** Prepare a presentation
(15 minutes + 5 minutes of questions). The timing is strict:
presentations that run over will be stopped, as at any conference. A
good rule of thumb is to spend 1-2 minutes per slide, so 10 slides
should be a maximum.
**Send your slides to Christophe Alias and Fabrice Rastello (mail above)**.

**Step 4: Presentation.** Friday, January 13, 14:00 - 17:00, **amphi B**.