# Data-Aware Algorithms

In many today's computing systems, the cost, both in time and energy, of moving the data has become predominant compared to the cost of computing. Part of the solution for this problem can be found in algorithms: over the years, several computation models and algorithmic strategies have been proposed to optimize data movement instead of (or in addition to) the classical objective of computational complexity. The goal of this course is to present a variety of these solutions.

## Part 1: Modeling register allocations

In this first part, we will review studies which dates back to the 1960s, the pioneering work of Hong and King on I/O complexity, and their extensions.

- Register allocation on DAGs
- Pebble games
- I/O complexity, lower bounds
- Space-time tradeoffs

#### Available material:

#### Other resources:

More information on the pebble games in the book "Models Of Computation" by John Savage (pdf available online): (black) pebble game in Chapter 10, red-blue pebble game and hierarchical memory pebble game in Chapter 11.

#### Some articles where results are taken from:

The tight analysis of the matrix product lower bound comes from: Tyler Smith, Bradley Lowery, Julien Langou and Robert van de Geijn. 2019. A Tight I/O Lower Bound for Matrix Multiplication (arXiv preprint).

## Part 2: Cache-oblivious algorithms and to external memory models

In some cases, it is possible to design algorithms that minimize the data accesses whatever the size of the memory, or the cache, even when this size in unkown to the algorithm.

- External memory algorithms
- Cache oblivious algorithms
- Parallel external memory models
- Schedulers for multithreaded computations (parallel cache oblivious)

#### Available material:

#### Other courses/resources:

- External Memory Algorithms and Data Structures by Gerth Stølting Brodal and Rolf Fagerberg, Aarhus University, 2003.
- Cormen et al., Introduction on Algorithms, Chap. 18: B-Trees
- Lecture notes by Erik Demaine on Cache-Oblivious Algorithms and Data Structures at the 2002 EEF Summer School on Massive Data Sets
- Course by Nodari Sitchinava Algorithms for Memory Hierarchies at KIT, lecture notes written by students. Covers both external memory, cache oblivious and PEM and PCO models.
- Proof of LRU competitive ratio (fr) from my previous course on scheduling

#### Some articles where results are taken from:

- Sleator & Tarjan (1985) for the proof of LRU competitive ratio
- Frigo et al. (1999) for the justification of the ideal cache model (Section 6), first cache-oblivious algorithms (matrix mult., etc.)
- Aggarwal and Witter for the proof of I/O lower bounds (sorting and permuting)
- Brodal and Fagerberg (2003) for the proof of searching lower bound (Lemma 1)
- Survey on External Memory Algorithms and Data Structures by Jeffrey Scott Vitter (1998)
- Brodal and Fagerberg (2002): Funnel Sort and Distribution Sweeping
- Arge et al. (2008): PEM model, all-prefix-sum, sorting
- Blumofe and Leiserson (1999): multithreaded computations and work-stealing scheduler

## Part 3: Memory aware DAGs scheduling

Several scientific computations can be modeled as DAGs of tasks. The way the DAG is scheduled (traversed) influences the amount of memory needed for the computation.

- Scheduling algorithms for trees
- Extension to more general DAGs
- Extension to parallel scheduling

#### Available material:

#### Some articles where results are taken from:

- Agullo et al. (2010) analysis of postorder tree traversal for minimizing I/Os
- Liu (1987) optimal algorithms for minimizing memory of trees
- Kayaaslan et al. (2018) optimal algorithms for minimizing memory of SP-graphs
- Jacquelin et al. (2011) analysis of postorder and general traversals for trees
- Marchal et al. (2018) computing and bounding maximum parallel memory

## Part 4: Communication-avoiding algorithms

In numerical linear algebra, some algorithms have been proposed who reach the lower bounds on data movement. We will study some of these algorithms, both for the sequential case (I/O minimization) and the parallel case (communication minimization).

- Principle of communication-avoiding algorithms
- Communication-avoiding matrix multiplication
- Extension to other linear algebra operations

#### Available material:

#### Other courses/resources:

- Lecture at Argonne (video) by Jim Demmel
- Course at UC Berkeley by Jim Demmel and Laura Grigori

#### Some articles where results are taken from:

- Ballard et al. (2011) analysis of generalized matrix computations and algorithm design
- Solomonik et al. (2011) 2.5D matrix multiplication algorithm

## Part 5: Algorithms for Big Data (by Olivier Beaumont)

Some data are so large that we have to drastically limit the data access when analising them, possibly with an impact on the accuracy of the result: it is sometimes better to have a quick good estimate than a precise result that takes forever to compute. Models have been proposed that either allow a simple linear scan of the data, and/or limit the temporary storage of the analysis.

- Data strem model
- Approximate counting
- Similarity search

#### Available material:

#### Other resources:

- Wikipedia entries on Morris approximate counting algorithm, Count-distinct problems and Flagolet-Martin distinct counting algorithm
- Chapter Finding Similar Items from "Mining of Massive Datasets" book, for plagiarism detection (see also slides and videos on the book website)

## Evaluation

The evaluation will consist in two parts:

- Written evaluations on the models and algorithms seen during classes (two evaluations of ~1.5h each).
- Research project per group of 1/2 students:
- Read one research paper (given)
- Explain research question and algorithmic solution(s)
- Contribute to the experimental validation!
- Code some algorithm (100 lines of python is OK)
- Find or generate relevant dataset
- Plot/print relevant results
- Draw (partial) conclusions

- Outcome:
- Report (5/10 pages) due for January 5
- Presentation in class (January 8 or 15)

### Research papers for the project

Pebbles games and extensions:

- Paper 1:
*(chosen by Tran Xuan Thang)*Red-Blue Pebbling Revisited: Near Optimal Parallel Matrix-Matrix Multiplication by G. Kwasniewski , M. Kabic, M. Besta, J. VandeVondele, R. Solca, T. Hoefler (SC 2019) - Paper 2: On characterizing the data access complexity of programs by V. Elango, F. Rastello, K.N. Pouchet, J. Ramanujam, and P. Sadayappan (ACM SIGPLAN Notices 2015)

External memory, cache oblivious algorithms:

- Paper 3:
*(chosen by Lucas Perotin and Lucas Venturini)*Lower Bounds for External Memory Dictionaries by G.S. Brodal, R. Fagerberg (SODA 2003) - Paper 4:
*(chosen by Sébastien Michelland)*Write-Optimized Skip Lists by M.A. Bender, M. Farach-Colton, R. Johnson, S. Mauras, T. Mayer, C.A. Phillips, Helen Xu (PODS 2017)

Work stealing and other schedulers:

- Paper 5: The Data Locality of Work Stealing by U.A. Acar, G.E. Blelloch, and R.D. Blumofe (TCS journal, 2002)
- Paper 6:
*(chosen by Teresa Signati and Matthieu Vieira)*Efficient Load Balancing for Wide-Area Divide-and-Conquer Applications by R.V. van Nieuwpoort, T. Kielmann, H.E. Bal (PPoPP 2001)

Task graph scheduling:

- Paper 7:
*(chosen by Rémy Neveu)*Bounded Memory Scheduling Of Dynamic Task Graphs by D. Sbirlea, Z. Budimlic, V. Sarkar (PACT 2014) - Paper 8:
*(chosen by Nicolas Chappe)*Memory-optimal evaluation of expression trees involving large objects by C.-C. Chi-Chung Lam, T. Rauber, G. Baumgartner, D. Cociorva, P. Sadayappan (CLSS journal, 2011)

Randomized linear algebra:

- Paper 9: Fast Monte Carlo Algorithms for Matrices I: Approximating Matrix Multiplication by P. Drineas, R. Kannan, M. Mahoney (SIAM J. Computing 2006)

Recommendation system:

- Paper 10:
*(chosen by Abou Haidar Calvin and Stéphane Pouget)*Methods for large scale SVD with missing values by M. Kurucz, A.A. Benczur, K. Csalogany (KDD cup and workshops, 2007) with introductory paper: Netflix prize and SVD by S. Gower (IEEE Computer, 2014)