# Data-Aware Algorithms

In many of today's computing systems, the cost of moving data, both in time and energy, has become predominant compared to the cost of computing. Part of the solution to this problem can be found in algorithms: over the years, several computation models and algorithmic strategies have been proposed to optimize data movement instead of (or in addition to) the classical objective of computational complexity. The goal of this course is to present a variety of these solutions.

NB: you may check the webpage of this course for Fall 2019.

## Part 1: Modeling register allocation

In this first part, we will review studies dating back to the 1960s, the pioneering work of Hong and Kung on I/O complexity, and their extensions.

- Register allocation on DAGs
- Pebble games
- I/O complexity, lower bounds
- Space-time tradeoffs
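
To make the (black) pebble game concrete, here is a minimal sketch in Python (our own toy illustration, not course material; the function name and representation are ours): a node of the DAG may be pebbled only when all its predecessors carry pebbles, and the peak number of simultaneously placed pebbles models the number of registers needed.

```python
# Toy (black) pebble game on a DAG: placing a pebble = computing a value,
# removing a pebble = freeing a register. The peak pebble count of a
# pebbling strategy models its register/memory requirement.

def peak_pebbles(preds, order):
    """preds: dict node -> list of predecessors.
    order: sequence of ('place', node) / ('remove', node) moves.
    Returns the peak number of pebbles, or raises on an illegal move."""
    pebbled = set()
    peak = 0
    for action, node in order:
        if action == 'place':
            if not all(p in pebbled for p in preds[node]):
                raise ValueError(f"cannot pebble {node}: missing predecessor")
            pebbled.add(node)
            peak = max(peak, len(pebbled))
        else:  # 'remove'
            pebbled.discard(node)
    return peak

# Example DAG: c depends on a and b (think c = a + b).
preds = {'a': [], 'b': [], 'c': ['a', 'b']}
strategy = [('place', 'a'), ('place', 'b'), ('place', 'c'),
            ('remove', 'a'), ('remove', 'b')]
print(peak_pebbles(preds, strategy))  # 3: a, b, c pebbled simultaneously
```

Finding a strategy that minimizes the peak is the hard part: minimizing the number of pebbles over all valid pebblings is the optimization problem studied in this part.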

#### Available material:

#### Other resources:

More information on pebble games can be found in the book "Models of Computation" by John Savage (PDF available online): the (black) pebble game in Chapter 10, the red-blue pebble game and the hierarchical memory pebble game in Chapter 11.

## Part 2: Cache-oblivious algorithms and external memory models

In some cases, it is possible to design algorithms that minimize data accesses whatever the size of the memory or cache, even when this size is unknown to the algorithm.

- External memory algorithms
- Cache oblivious algorithms
- Parallel external memory models
- Schedulers for multithreaded computations (parallel cache oblivious)
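
As a taste of the cache-oblivious technique, the sketch below (our own toy example, not from the course material) transposes a matrix by recursively splitting the larger dimension: once a sub-block fits in cache, the rest of the recursion on it touches only cached data, and this holds for any cache size without the algorithm ever knowing it.

```python
# Toy cache-oblivious matrix transpose: recurse on the larger dimension.
# The code never uses the cache size, yet the recursion eventually works
# on blocks that fit in any given cache, giving good locality throughout.

def co_transpose(A, B, i0=0, j0=0, n=None, m=None):
    """Write the transpose of the n-by-m block of A at (i0, j0) into B,
    so that B[j][i] = A[i][j]."""
    if n is None:
        n, m = len(A), len(A[0])
    if n == 0 or m == 0:
        return
    if n == 1 and m == 1:
        B[j0][i0] = A[i0][j0]
    elif n >= m:                       # split the rows
        h = n // 2
        co_transpose(A, B, i0, j0, h, m)
        co_transpose(A, B, i0 + h, j0, n - h, m)
    else:                              # split the columns
        h = m // 2
        co_transpose(A, B, i0, j0, n, h)
        co_transpose(A, B, i0, j0 + h, n, m - h)

A = [[1, 2, 3], [4, 5, 6]]
B = [[0] * 2 for _ in range(3)]
co_transpose(A, B)
print(B)  # [[1, 4], [2, 5], [3, 6]]
```

Compare this with a naive row-by-row transpose, which incurs one cache miss per element on the column-major side once the matrix is larger than the cache.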

#### Available material:

#### Other courses/resources:

- Lecture notes by Erik Demaine on Cache-Oblivious Algorithms and Data Structures at the 2002 EEF Summer School on Massive Data Sets
- Series of lectures at MIT by Erik Demaine, available on YouTube:
- Cormen et al., Introduction to Algorithms, Chap. 18: B-Trees

- Proof of the LRU competitive ratio (in French) from my previous course on scheduling; the proof is also available (in English) in Savage's book (see above), at the end of Chapter 11 (Theorem 11.10.1)

#### Some articles where results are taken from:

- Sleator & Tarjan (1985) for the proof of LRU competitive ratio
- Frigo et al. (1999) for the justification of the ideal cache model (Section 6), first cache-oblivious algorithms (matrix mult., etc.)
- Aggarwal and Vitter (1988) for the proof of I/O lower bounds (sorting and permuting)
- Brodal and Fagerberg (2003) for the proof of searching lower bound (Lemma 1)
- Survey on External Memory Algorithms and Data Structures by Jeffrey Scott Vitter (1998)
- Brodal and Fagerberg (2002): Funnel Sort and Distribution Sweeping

## Part 3: Algorithms for Big Data (by Olivier Beaumont)

Some data sets are so large that we must drastically limit data accesses when analyzing them, possibly at the expense of the accuracy of the result: it is sometimes better to have a quick, good estimate than a precise result that takes forever to compute. Models have been proposed that allow only a single linear scan of the data and/or limit the temporary storage used by the analysis.

- Data stream model
- Approximate counting
- Similarity search
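
As a preview of approximate counting, here is a sketch of Morris's algorithm (the Wikipedia entry cited below gives the full analysis; the function name is ours): the counter stores only an exponent X, incremented with probability 2^-X, and 2^X - 1 is an unbiased estimate of the number of events, using roughly log log n bits instead of log n.

```python
import random

# Sketch of Morris's approximate counter: instead of counting n events
# exactly, keep an exponent x and increment it with probability 2^-x.
# The estimator 2^x - 1 has expectation equal to the true count.

def morris_count(n_events):
    x = 0
    for _ in range(n_events):
        if random.random() < 2.0 ** (-x):
            x += 1
    return 2 ** x - 1  # unbiased: expectation equals n_events

random.seed(0)
# A single counter has high variance; averaging independent copies helps.
estimates = [morris_count(10000) for _ in range(100)]
print(sum(estimates) / len(estimates))  # close to 10000 in expectation
```

The variance of a single counter is on the order of n^2/2, which is why practical versions average several independent counters (or use a base smaller than 2) to trade space for accuracy.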

#### Available material:

- Slides of all four lectures (Sep. 30, Oct. 1, Oct. 7 & Oct. 8)
- Homework for this part (updated on Oct. 20, deadline: November 2nd)

#### Other resources:

- Wikipedia entries on Morris's approximate counting algorithm, the count-distinct problem, and the Flajolet-Martin distinct counting algorithm

- Course on Algorithms for Big Data at Harvard by Jelani Nelson (see in particular Lectures 1 & 2)
- Lecture on Counting Distinct Elements by Jeffrey D. Ullman

## Part 4: Memory-aware DAG scheduling

Several scientific computations can be modeled as DAGs of tasks. The way the DAG is scheduled (traversed) influences the amount of memory needed for the computation.

- Scheduling algorithms for trees
- Extension to more general DAGs
- Extension to parallel scheduling
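
To see why the traversal order matters, consider a toy memory model (a simplification of our own, in the spirit of the papers listed below): each task produces an output of a given size; executing a task requires its children's outputs plus its own output in memory, and children outputs are freed right afterwards. The peak memory of the natural postorder traversal can then be computed recursively:

```python
# Toy peak-memory computation for a postorder traversal of a task tree.
# Model (simplified): running node v needs the outputs of v's children
# plus v's own output in memory; children outputs are freed after v runs.

def postorder_peak_memory(children, out, root):
    """children: dict node -> list of child nodes; out: dict node -> output size.
    Returns the peak memory of the left-to-right postorder traversal."""
    def visit(v):
        held = 0   # memory held by already-computed children outputs
        peak = 0
        for c in children.get(v, []):
            child_peak = visit(c)
            peak = max(peak, held + child_peak)  # while computing subtree c
            held += out[c]                       # keep c's output afterwards
        # executing v itself: all children outputs + v's own output
        peak = max(peak, held + out[v])
        return peak
    return visit(root)

# Tiny example: root r with two leaves a and b.
children = {'r': ['a', 'b']}
out = {'r': 1, 'a': 2, 'b': 3}
print(postorder_peak_memory(children, out, 'r'))  # 6 = out[a] + out[b] + out[r]
```

In this model, processing the subtrees of a node in a different order can change the peak, which is precisely what the optimal tree-traversal algorithms studied in this part exploit.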

#### Available material:

#### Some articles where results are taken from:

- Agullo et al. (2010) analysis of postorder tree traversal for minimizing I/Os
- Kayaaslan et al. (2018) optimal algorithms for minimizing memory of SP-graphs
- Jacquelin et al. (2011) analysis of postorder and general traversals for trees
- Marchal et al. (2018) computing and bounding maximum parallel memory

## Part 5: Communication-avoiding algorithms

In numerical linear algebra, algorithms have been proposed that attain the lower bounds on data movement. We will study some of these algorithms, both in the sequential case (I/O minimization) and in the parallel case (communication minimization).

- Principle of communication-avoiding algorithms
- Communication-avoiding matrix multiplication
- Extension to other linear algebra operations
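
The classical starting point is blocked (tiled) matrix multiplication: if b-by-b tiles are chosen so that three tiles fit in fast memory, each loaded tile is reused b times, reducing the data moved from O(n^3) to O(n^3/b). A plain-Python sketch (illustrative only; real implementations rely on tuned BLAS kernels):

```python
# Blocked (tiled) matrix multiplication, the building block of
# communication-avoiding algorithms: pick the tile size b so that three
# b*b tiles fit in fast memory; each tile is then reused b times.

def blocked_matmul(A, B, b):
    """Multiply square matrices A and B (lists of rows) with b*b tiles."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, b):
        for jj in range(0, n, b):
            for kk in range(0, n, b):
                # C[ii:ii+b, jj:jj+b] += A[ii:ii+b, kk:kk+b] @ B[kk:kk+b, jj:jj+b]
                for i in range(ii, min(ii + b, n)):
                    for k in range(kk, min(kk + b, n)):
                        a = A[i][k]
                        for j in range(jj, min(jj + b, n)):
                            C[i][j] += a * B[k][j]
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(blocked_matmul(A, B, 1))  # [[19.0, 22.0], [43.0, 50.0]]
```

With M words of fast memory, taking b on the order of sqrt(M/3) gives O(n^3 / sqrt(M)) data movement, which matches the I/O lower bound for matrix multiplication seen in Part 1.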

#### Available material:

#### Other courses/resources:

- Lecture at Argonne (video) by Jim Demmel
- Course at UC Berkeley by Jim Demmel and Laura Grigori

#### Some articles where results are taken from:

- Ballard et al. (2011) analysis of generalized matrix computations and algorithm design
- Solomonik et al. (2011) 2.5D matrix multiplication algorithm

## Evaluation

The evaluation will consist of two parts:

- 2 homeworks: exercises given in class
- Research project, in groups of 1 or 2 students:
  - Paper/topic to choose from the list below
  - Outcome:
    - Report (5-10 pages)
    - Presentation in class
  - If the topic includes programming:
    - Implement the algorithm
    - Perform interesting experiments
    - Report the important points/difficulties encountered when coding it
    - Report the results and draw some conclusions
  - If the topic includes theoretical work (a proof):
    - Read and understand the proof
    - Present the proof, possibly skipping some lemmas and intermediate steps
  - In both cases:
    - Present the context and motivation of the paper
    - Summarize the whole paper and its contributions

The **presentations** will take place on **November 5 and 12**, and the
**reports are due by November 2**. Presentations can be held remotely
(through BBB) if needed. If the research project includes coding, the
code must be sent with the report in an archive (zip, tgz) or made
available online.

### Schedule of the presentations

Each group will give a presentation of 15-20 minutes, followed by 5-10 minutes of questions/comments. Here is the schedule; send me an email to register.

November 5:

- 10:00: (free)
- 10:30: (free)
- 11:00: V. Pasquale (project 9)
- 11:30: (free)
- 13:30: (free)
- 14:00: (free)
- 14:30: (free)
- 15:00: (free)
- 15:30: (free)

November 12:

- 10:00: H. Thievenaz & V. Siguret (project 5)
- 10:30: Q. Corradi (project 8)
- 11:00: J. Bertrand & N. Champseix (project 1)
- 11:30: B. Allorant & P.-M. Marcille (project 10)
- 13:00: (free)
- 13:30: O. Cedron (project 4)
- 14:00: (free)
- 14:30: T. Marette & D. El Zein (project 2)
- 15:00: (free)
- 15:30: (free)

### Research papers for the project

Pebble games and extensions:

**Project 1**: (*assigned to J. Bertrand and N. Champseix*) On the Hardness of Red-Blue Pebble Games by P.A. Papp and R. Wattenhofer (SPAA 2020)
- Topic: `proof` of Theorem 2 (NP-hardness)

**Project 2**: (*assigned to D. El Zein and T. Marette*) Move Rules and Trade-Offs in the Pebble Game by P. van Emde Boas and J. van Leeuwen (Theoretical Computer Science, 1979)
- Topic: `proof` of Theorem D (removing one pebble leads to exponential time)

External memory, cache oblivious algorithms:

**Project 3**: Lower Bounds for External Memory Dictionaries by G.S. Brodal and R. Fagerberg (SODA 2003)
- Topic: `proof` of Lemma 2

**Project 4**: (*assigned to O. Cedron*) The Buffer Tree: A New Technique for Optimal I/O Algorithms by L. Arge (WADS 1995)
- Topic: `code` the buffer tree and compare it to a B-tree

**Project 5**: (*assigned to V. Siguret and H. Thievenaz*) Write-Optimized Skip Lists by M.A. Bender, M. Farach-Colton, R. Johnson, S. Mauras, T. Mayer, C.A. Phillips and H. Xu (PODS 2017)
- Topic: `code` and test the skip list algorithm

**Project 6**: Cache-Adaptive Algorithms by M.A. Bender et al. (SODA 2014)
- Topic: some `proof` (your choice, possibly several small results)

Work stealing and other schedulers:

**Project 7**: The Data Locality of Work Stealing by U.A. Acar, G.E. Blelloch and R.D. Blumofe (TCS journal, 2002)
- Topic: some `proof` (your choice)

**Project 8**: (*assigned to Q. Corradi*) Scheduling Multithreaded Computations by Work Stealing by R.D. Blumofe and C.E. Leiserson
- Topic: `proof` of the analysis of the work-stealing algorithm

**Project 9**: (*assigned to V. Pasquale*) Hierarchical Work-Stealing by J.-N. Quintin and F. Wagner (Euro-Par 2010)
- Topic: `code` a (simple, centralized) work-stealing simulator and test the data locality of several work-stealing variants

Task graph scheduling:

**Project 10**: (*assigned to P.-M. Marcille and B. Allorant*) Memory-optimal evaluation of expression trees involving large objects by C.-C. Lam, T. Rauber, G. Baumgartner, D. Cociorva and P. Sadayappan (Computer Languages, Systems & Structures journal, 2011)
- Topic: `code` the algorithm and compare it to a simpler strategy (optional: compare to the paper An application of generalized tree pebbling to sparse matrix factorization by Joseph W. H. Liu, SIAM J. Algebraic Discrete Methods, 1987)

Streaming algorithms:

**Project 11**: Video of the seminar Random Matrices, Dimensionality Reduction, Faster Numerical Algebra Algorithms by Jelani Nelson (until 25:20)
- Topic: In the case \(l=2\), detail the analysis of the proof (from 22:00), explaining how to choose the random coefficients (k-wise independence: which k for δ? which for σ?). `Code` to verify some approximation of the analysis part (from 18:20), on matrices of reasonable size that fit in memory.