M2 course, ENS Lyon:
Concentration of measure in probability and high-dimensional statistical learning

MATH5107 & INFO5161: Master of Advanced Mathematics, Probability and Statistics track, and course CR15 of the M2SIF.

News: homeworks

Two homework assignments will each count for 25% of the final grade. Homework 1 and Homework 2 are both due on November 15th.

Lecturers

Guillaume Aubrun, Aurélien Garivier, Rémi Gribonval

Course description

This course will introduce the notion of concentration of measure and highlight its applications, notably in high-dimensional data processing and machine learning. The course will start from deviation inequalities for averages of independent variables, and illustrate their interest for the analysis of random graphs and of random projections for dimension reduction. It will then be shown how other high-dimensional random functions concentrate, and what guarantees this concentration yields for randomized algorithms and for machine learning procedures trained on large data collections.
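As a concrete preview of the first lectures, here is a minimal numerical sketch (in Python with NumPy; the sample size, threshold and distribution are illustrative choices, not course material). It compares the empirical probability that the average of n independent [0, 1]-valued variables deviates from its mean by at least t with the Chernoff-Hoeffding bound 2 exp(-2nt^2).

```python
import numpy as np

rng = np.random.default_rng(0)

n = 200            # sample size
t = 0.05           # deviation threshold
n_trials = 10_000  # Monte Carlo repetitions

# X_i ~ Uniform[0, 1], so the mean is 0.5 and each X_i is bounded in [0, 1]
means = rng.uniform(0.0, 1.0, size=(n_trials, n)).mean(axis=1)

# Empirical probability of a deviation of at least t
empirical = np.mean(np.abs(means - 0.5) >= t)

# Hoeffding bound: P(|mean - mu| >= t) <= 2 exp(-2 n t^2)
bound = 2 * np.exp(-2 * n * t**2)

print(f"empirical deviation probability: {empirical:.4f}")
print(f"Chernoff-Hoeffding upper bound:  {bound:.4f}")
```

The bound is distribution-free: it only uses the boundedness of the variables, which is why it is loose for any particular distribution but holds uniformly.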
Presentation slides

Weekly schedule: Mondays 13:30-15:30, Fridays 10:15-12:15

  1. [09.07, Amphi A] (Gribonval) [INFO5161 only] Learning in high dimension – introduction – basic concentration results – Chernoff-Hoeffding
  2. [09.11, Amphi F] (Garivier) [INFO5161 only] Martingales, Azuma, McDiarmid and applications
  3. [09.14, Amphi A] (Gribonval) [MATH5107 & INFO5161 from now on] McDiarmid inequality – the PAC framework for statistical learning – general principles
  4. [09.18, Amphi F] (Gribonval) Agnostic PAC bounds for ERM – sub-Gaussian and sub-exponential variables
  5. [09.21, Amphi D] (Garivier) Uniform convergence, VC dimension and the fundamental theorem of PAC learning
  6. [09.25, Amphi H] (Aubrun) Geometry of high-dimensional spaces
  7. [09.28, Amphi D] (Garivier) Lower bounds for large deviations, and No Free Lunch theorems
  8. [10.02, Amphi H] (Aubrun) Gaussian concentration
  9. [10.05, Amphi D] (Garivier) Negative association, sequential analysis
  10. [10.09, Amphi H] (Aubrun) Correction of the exercises
  11. [10.12, Amphi D] (Gribonval) Dimensionality reduction
  12. [10.16, Amphi H] (Aubrun) Chaining
  13. [10.19, Amphi D] (Gribonval) Random projections (see the sketch after this schedule)
  14. [10.23, Amphi H] (Aubrun) Analysis of Boolean functions
  15. [11.02, Amphi D] (Garivier) Case study: optimal discovery and the Good-Turing estimator
  16. [11.06, Amphi H] (Aubrun) Hypercontractivity
  17. [11.09, Amphi F] Student presentations / exam
  18. [11.13, Amphi H] Student presentations / exam
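A similar concentration phenomenon underlies lectures 11 and 13: a suitably scaled random matrix approximately preserves all pairwise distances of a point cloud, in the spirit of the Johnson-Lindenstrauss lemma. Below is a minimal numerical sketch (Python with NumPy; the dimensions and the point cloud are illustrative choices, not course material).

```python
import numpy as np

rng = np.random.default_rng(1)

d, m, n_points = 2_000, 200, 50     # ambient dim., target dim., number of points
X = rng.normal(size=(n_points, d))  # an arbitrary cloud of points in R^d

# Gaussian random projection, scaled so squared norms are preserved in expectation
P = rng.normal(size=(m, d)) / np.sqrt(m)
Y = X @ P.T                         # projected points in R^m

def pairwise_distances(Z):
    # Euclidean distances between all pairs of rows of Z
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

pairs = np.triu_indices(n_points, k=1)  # each pair of distinct points, once
ratios = pairwise_distances(Y)[pairs] / pairwise_distances(X)[pairs]
print(f"distance ratios after projection: min={ratios.min():.3f}, max={ratios.max():.3f}")
```

With high probability all ratios are close to 1 even though m is much smaller than d, and the target dimension m needs to grow only logarithmically with the number of points.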

Following the course remotely

  1. Connect to the course web page https://etudes.ens-lyon.fr/course/view.php?id=4270
  2. Click on the link “Link for the virtual class - cours ‘Concentration’”
  3. Click on “Enter the session” / “Entrer dans la session”

Exercises

Exercise sheet number 1
Exercise sheet number 2
Exercise sheet number 3

Internship proposals

  1. Generative models for embedding-based dimension reduction methods, with Franck Picard at the LBMC
  2. Forecasting of weather extremes with ML/DL methods, with Clément Dombry (LmB), Philippe Naveau (LSCE) and Maxime Taillardat (CNRM, Météo-France)
  3. Open Internship positions in Artificial Intelligence and Machine learning in Toulouse

Research articles

Schedule for the defenses on 11.13:
10:15 video Clément Legrand-Duchesne, Antoine Gonon, Paul Rosa: Rahimi, A., & Recht, B. (2007). Random Features for Large Scale Kernel Machines. Advances in Neural Information Processing Systems (NIPS), (1), 1–8. https://people.eecs.berkeley.edu/~brecht/papers/07.rah.rec.nips.pdf
10:30 video Nathan Sauldubois, Konrad Dierks, Yema Paul, Léo Parpais: Le, Q., Sarlós, T., & Smola, A. (2013). Fastfood – Approximating Kernel Expansions in Loglinear Time. http://proceedings.mlr.press/v28/le13.pdf
10:45 video Mathieu Roget, Thomas Buc-d'Alché and Baptiste Gros: Tropp, J. A. (2011). User-Friendly Tail Bounds for Sums of Random Matrices. Foundations of Computational Mathematics. http://doi.org/10.1007/s10208-011-9099-z
11:00 video Gabriel Bathie, Jules Bertrand, Justine Sauvage, Zoé Varin: McAllester, D. and Ortiz, L. (2003). Concentration Inequalities for the Missing Mass and for Histogram Rule Error. Journal of Machine Learning Research. https://www.jmlr.org/papers/volume4/mcallester03a/mcallester03a.pdf
11:15 video Nina Vesseron, Yann-Situ Gazull and Eliot Tron: Baraniuk, R., Davenport, M., DeVore, R. A., & Wakin, M. B. (2008). A Simple Proof of the Restricted Isometry Property for Random Matrices. Constructive Approximation, 28(3), 253–263. https://www.math.tamu.edu/~rdevore/publications/131.pdf
11:30 video Victor Mercklé, Hadrien Brochet, Corentin Lunel, Victor Boone: Arlot, S. Model selection and estimator selection for statistical learning (Cours Peccot), first lecture: statistical learning and estimator selection. https://www.imo.universite-paris-saclay.fr/~arlot/enseign/2011Peccot/peccot_notes_cours1.pdf
11:45 video Vincent Alouin, Corentin Faipeur, Valodimir Mitzrchuk and Josselin Pomat: Boucheron, S. and Thomas, M. Concentration inequalities for order statistics. https://projecteuclid.org/euclid.ecp/1465263184
12:00 video Liu Mike: Chandrasekaran, V., Recht, B., Parrilo, P. A., & Willsky, A. S. (2012). The Convex Geometry of Linear Inverse Problems. Found. Comput. Math., 12(6), 805–849. http://doi.org/10.1007/s10208-012-9135-7

Articles to choose

Each group of 4 students chooses one of the following articles:
  1. Achlioptas, D. (2001). Database-friendly random projections. In Principles of Database Systems (PODS), pp. 274–281. New York, NY, USA: ACM Press. http://doi.org/10.1145/375551.375608
  2. Puy, G., Davies, M. E., & Gribonval, R. (2017). Recipes for Stable Linear Embeddings From Hilbert Spaces to $\mathbb{R}^m$. IEEE Transactions on Information Theory, 63(4), 2171–2187. http://doi.org/10.1109/tit.2017.2664858
  3. Tropp, J. A. (2008). On the conditioning of random subdictionaries. Appl. Comput. Harmon. Anal., 25(1), 1–24. https://www.sciencedirect.com/science/article/pii/S1063520307000863
  4. Bach, F. (2015). On the Equivalence between Kernel Quadrature Rules and Random Feature Expansions. https://jmlr.org/papers/volume18/15-178/15-178.pdf

Prerequisites

Basic knowledge of probability theory, linear algebra and analysis over the reals.

Bibliography

  1. Concentration Inequalities: A Nonasymptotic Theory of Independence, by Stéphane Boucheron, Gábor Lugosi and Pascal Massart
  2. High-Dimensional Probability – An Introduction with Applications in Data Science, by Roman Vershynin
  3. Understanding Machine Learning: From Theory to Algorithms, by Shai Shalev-Shwartz and Shai Ben-David

Evaluation

Two homework assignments, 25% each: Homework 1 and Homework 2, both due on November 15th.

Project: paper / chapter presentation

End of September

Distribution and allocation of topics
Students organize themselves into groups of 4 and allocate the proposed topics among the groups
In case of disagreement, the allocation will be adjusted by the course lecturers

Sunday November 8th

Submission deadline for videos + slides + written reports (half-page summary)

Friday November 13th

Oral session (10 minutes of questions per group)

EXPECTED WORK
a/ Written document: half-page summary

Written in French or English, on half a page, in the format of a conference-paper abstract, describing the topic, its stakes, the difficulties, the novelties, and the main results.
Take care with writing style, grammar and spelling.

b/ Video presentation and slides

A 15-minute lecture, to be made available to the course lecturers as a video recording, taking care to balance speaking time so that the participation of each student in the group is visible.
A PDF version of the slides, with page numbers, must also be made available to the course lecturers.
Each student is invited to watch the videos of the other groups before the oral session.
Participation of all students in the oral session, to answer questions from the course lecturers, is mandatory.
It is important to respect the allotted duration of the video lecture: do not hesitate to use a timer!
A lecture of limited duration cannot contain the same information as a 20-to-30-page paper or chapter: you have to make choices and bring us a viewpoint on the topic. Count roughly 1 minute 30 per slide (so about 10 slides for a 15-minute video), and seek a good balance between giving a global view and being technically precise on selected aspects.
Adapting the content to the expected background of the target audience is important: here the audience consists of the course lecturers as well as all the students who followed the course.
As an example, a standard lecture structure could be: the considered problem and its context; state-of-the-art approaches and/or approaches studied during the course; the proposed approach; synthesis and discussion, possibly highlighting your own contributions (critical perspective, complementary bibliographical study, implementation, difficulties…); conclusion.
Among other things, the grade will reflect an evaluation of:
Your understanding of the topic and of the studied papers / chapters.
Your ability to explain them: many people in the audience are not necessarily experts.
Your critical analysis of the studied papers / chapters.
Possible complementary bibliographical references bringing additional viewpoints.

PLANNED PRACTICALITIES

Submission of written + video material: via the « portail des études » if technically feasible
Oral session: on site if possible, otherwise in hybrid mode or by videoconference (BBB of the « portail des études »)
Details will be given as soon as possible, depending on technical and health constraints.