# Data-Aware Algorithms

Loris Marchal, Fall 2019

In many today's computing systems, the cost, both in time and energy, of moving the data has become predominant compared to the cost of computing. Part of the solution for this problem can be found in algorithms: over the years, several computation models and algorithmic strategies have been proposed to optimize data movement instead of (or in addition to) the classical objective of computational complexity. The goal of this course is to present a variety of these solutions.

## Modeling register allocations

In this first part, we will review studies which dates back to the 1960s, the pioneering work of Hong and King on I/O complexity, and their extensions.

• Register allocation on DAGs
• Pebble games
• I/O complexity, lower bounds
• Space-time tradeoffs

## Cache-oblivious algorithms and to external memory models

In some cases, it is possible to design algorithms that minimize the data accesses whatever the size of the memory, or the cache, even when this size in unkown to the algorithm.

• Cache oblivious algorithms for simple operations
• External memory and parallel external memory models

## Communication-avoiding algorithms

In numerical linear algebra, some algorithms have been proposed who reach the lower bounds on data movement. We will study some of these algorithms, both for the sequential case (I/O minimization) and the parallel case (communication minimization).

• Principle of communication-avoiding algorithms
• Communication-avoiding matrix multiplication
• Extension to other linear algebra operations

## Memory aware DAGs scheduling

Several scientific computations can be modeled as DAGs of tasks. The way the DAG is scheduled (traversed) influences the amount of memory needed for the computation.

• Scheduling algorithms for trees
• Extension to more general DAGs
• Extension to parallel scheduling

## Algorithms for Big Data

Some data are so large that we have to drastically limit the data access when analizing them, possibly with an impact on the accuracy of the result: it is sometimes better to have a quick good estimate than a precise result that takes forever to compute. Models have been proposed that either allow a simple linear scan of the data, and/or limit the temporary storage of the analysis.

• Data strem model
• Approximate counting (Morris' algorithm)
• Similarity search
• Finding frequent items

### Evaluation

The evaluation will consist in two parts:

• Written evaluations on the models and algorithms seen during classes (~two evaluations of 1h each).
• Presentation of proposed research articles (one paper per student, scattered during the course depending on the chosen article)

Created: 2018-12-06 Thu 11:05

Emacs 25.3.50.1 (Org mode 8.2.10)

Validate