Data-Aware Algorithms

Loris Marchal (and Olivier Beaumont), Fall 2020

In many today's computing systems, the cost, both in time and energy, of moving the data has become predominant compared to the cost of computing. Part of the solution for this problem can be found in algorithms: over the years, several computation models and algorithmic strategies have been proposed to optimize data movement instead of (or in addition to) the classical objective of computational complexity. The goal of this course is to present a variety of these solutions.

NB: you may check the webpage of this course for Fall 2019.

Part 1: Modeling register allocations

In this first part, we will review studies which dates back to the 1960s, the pioneering work of Hong and King on I/O complexity, and their extensions.

Register allocation on DAGs
Pebble games
I/O complexity, lower bounds
Space-time tradeoffs

Available material:

Slides of the first lecture (September 9)
Notes and proofs of the first lecture
Slides of the second lecture (September 10)
Notes and proofs of the second lecture

Other resources:

More information on the pebble games in the book "Models Of Computation" by John Savage (pdf available online): (black) pebble game in Chapter 10, red-blue pebble game and hierarchical memory pebble game in Chapter 11.

Part 2: Cache-oblivious algorithms and to external memory models

In some cases, it is possible to design algorithms that minimize the data accesses whatever the size of the memory, or the cache, even when this size in unkown to the algorithm.

External memory algorithms
Cache oblivious algorithms
Parallel external memory models
Schedulers for multithreaded computations (parallel cache oblivious)

Available material:

Other courses/resources:

Lecture notes by Erik Demaine on Cache-Oblivious Algorithms and Data Structures at the 2002 EEF Summer School on Massive Data Sets
Series of lectures at MIT by Erik Demaine, available on YouTube:
- Cache-Oblivious Algorithms: Medians & Matrices
- Cache-Oblivious Algorithms: Searching & Sorting
Cormen et al., Introduction on Algorithms, Chap. 18: B-Trees

Proof of LRU competitive ratio (in French) from my previous course on scheduling, the proof is also available (in English) in Savage's book (see above), at the end of Chapter 11 (Theorem 11.10.1)

Some articles where results are taken from:

Sleator & Tarjan (1985) for the proof of LRU competitive ratio
Frigo et al. (1999) for the justification of the ideal cache model (Section 6), first cache-oblivious algorithms (matrix mult., etc.)
Aggarwal and Witter for the proof of I/O lower bounds (sorting and permuting)
Brodal and Fagerberg (2003) for the proof of searching lower bound (Lemma 1)
Survey on External Memory Algorithms and Data Structures by Jeffrey Scott Vitter (1998)
Brodal and Fagerberg (2002): Funnel Sort and Distribution Sweeping

Part 3: Algorithms for Big Data (by Olivier Beaumont)

Some data are so large that we have to drastically limit the data access when analising them, possibly with an impact on the accuracy of the result: it is sometimes better to have a quick good estimate than a precise result that takes forever to compute. Models have been proposed that either allow a simple linear scan of the data, and/or limit the temporary storage of the analysis.

Data strem model
Approximate counting
Similarity search

Available material:

Slides of all four lectures (Sep 30, Oct. 1st, Oct. 7 & Oct 8)
Homework for this part (updated on Oct. 20, deadline: November 2nd)

Other resources:

Wikipedia entries on Morris approximate counting algorithm, Count-distinct problems and Flajolet-Martin distinct counting algorithm

Course on Algorithms for Big Data at Harvard by Jelani Nelson (see in particular Lectures 1 & 2)
Lecture on Counting Distinct Elements by Jeffrey D. Ullman

Part 4: Memory aware DAGs scheduling

Several scientific computations can be modeled as DAGs of tasks. The way the DAG is scheduled (traversed) influences the amount of memory needed for the computation.

Scheduling algorithms for trees
Extension to more general DAGs
Extension to parallel scheduling

Available material:

Slides course 1 & 2 (Oct. 13 & 15)

Some articles where results are taken from:

Agullo et al. (2010) analysis of postorder tree traversal for minimizing I/Os

Kayaaslan et al. (2018) optimal algorithms for minimizing memory of SP-graphs
Jacquelin et al. (2011) analysis of postorder and general traversals for trees
Marchal et al. (2018) computing and bounding maximum parallel memory

Part 5: Communication-avoiding algorithms

In numerical linear algebra, some algorithms have been proposed who reach the lower bounds on data movement. We will study some of these algorithms, both for the sequential case (I/O minimization) and the parallel case (communication minimization).

Principle of communication-avoiding algorithms
Communication-avoiding matrix multiplication
Extension to other linear algebra operations

Available material:

Slides (Oct. 22)

Other courses/resources:

Lecture at Argonne (video) by Jim Demmel
Course at UC Berkeley by Jim Demmel and Laura Grigori

Some articles where results are taken from:

Ballard et al. (2011) analysis of generalized matrix computations and algorithm design
Solomonik et al. (2011) 2.5D matrix multiplication algorithm

Evaluation

The evaluation will consist in two parts:

2 Homeworks: exercices given in class
Research project per group of 1/2 students:
- Paper+Topic to choose in the following list
- Outcome:
  - Report (5/10 pages)
  - Presentation in class
- If topic includes programming:
  - Implement the algorithm
  - Perform interesting experiments
  - Report the important points/difficulties when coding it
  - Report the results and draw some conclusions
- If topic includes theoretical work (proof)
  - Read and understand the proof
  - Present the proof, possibly skipping some lemmas and intermediate points
  - In both cases:
    - Present the context and motivation of the paper
    - Summarize the whole paper and its contributions

The presentations will take place on November 5 and 12, and the reports are due for November 2. Presentations can be held remotely (throught BBB) if needed. If the research project includes coding, the code must be sent with the report in a archive (zip, tgz or available online).

Schedule of the presentations

Each group will do a presentation of 15-20 minutes, followed by 5-10 minutes of questions/comments. Here is the schedule, send me a mail to register.

November 5:

10:00: (free)
10:30: (free)
11:00: V. Pasquale (project 9)
11:30: (free)
13:30: (free)
14:00: (free)
14:30: (free)
15:00: (free)
15:30: (free)

November 12:

10:00: H. Thievenaz & V. Siguret (project 5)
10:30: Q. Corradi (project 8)
11:00: J. Bertrand & N. Champseix (project 1)
11:30: B. Allorant & P.-M. Marcille (project 10)
13:00: (free)
13:30: O. Cedron (project 4)
14:00: (free)
14:30: T. Marette & D. El Zein (project 2)
15:00: (free)
15:30: (free)

Research papers for the project

Pebbles games and extensions:

Project 1: (assigned to J. Bertrand and N. Champseix) On the hardness of Red-Blue Pebble Games by P.A. Papp and R. Wattenhofer, SPAA 2020
- Topic: proof of Theorem 2 (NP-hardness)

Project 2: (assigned to D. El Zein and T. Marette) Move Rules and Trade-Offs in the Pebble Game by P. van Emde Boas, J. van Leeuwen (Theoretical Computer Science 1979)
- Topic: proof of Theorem D (removing one pebble leads to exponential time)

External memory, cache oblivious algorithms:

Project 3: Lower Bounds for External Memory Dictionaries by G.S. Brodal, R. Fagerberg (SODA 2003)
- Topic: proof of Lemma 2
Project 4: (assigned to by O. Cedron) The Buffer Tree: A New Technique for Optimal I/O Algorithms by L. Arge. (WADS 1995)
- Topic: code the buffer tree and compare it to B-tree
Project 5: (assigned to V. Siguret and H. Thievenaz) Write-Optimized Skip Lists by M.A. Bender, M. Farach-Colton, R. Johnson, S. Mauras, T. Mayer, C.A. Phillips, Helen Xu (PODS 2017)
- Topic: code and test the skip list algorithm
Project 6: Cache-Adaptive Algorithms by M.A. Bender et al. (SODA 2014)
- Topic: some proof (your choice, possibly several small results)

Work stealing and other schedulers:

Project 7: The Data Locality of Work Stealing by U.A. Acar, G.E. Blelloch, and R.D. Blumofe (TCS journal, 2002)
- Topic: some proof (your choice)
Project 8: (assigned to Q. Corradi) Scheduling Multithreaded Computations by Work Stealing by R.D. Blumofe and C.E. Leiserson
- Topic: proof of the analysis of the work-stealing algorithm

Project 9: (assigned to V. Pasquale) Hierarchical Work-Stealing by J.-N. Quintin and F. Wagner (Euro-Par 2010)
- Topic: code a (simple, centralized) work-stealing simulator and test the data locality of several work-stealing variants

Task graph scheduling:

Project 10: (assigned P.-M. Marcille & B. Allorant) Memory-optimal evaluation of expression trees involving large objects by C.-C. Chi-Chung Lam, T. Rauber, G. Baumgartner, D. Cociorva, P. Sadayappan (CLSS journal, 2011)
- Topic: code the algorithm, compare it to a simpler strategy (optional: compare to the paper An application of generalized tree pebbling to sparse matrix factorization by Joseph W. H. Liu (SIAM J. Algebraic Discrete Methods, 1987)

Streaming algorithms:

Project 11: Video of the seminar Random Matrices, Dimensionality Reduction, Faster Numerical Algebra Algorithms by Jelani Nelson until 25:20
- Topic: In case \(l=2\), detail the analysis of the proof (from 22:00), explaining how to choose the random coefficients (k-wise independence for which k for δ ? for σ)? Code to verify some approximation of the analysis part (from 18:20), on matrices of reasonable size to fit in the memory.