M2 course, ENS Lyon:
Concentration of measure in probability and high-dimensional statistical learning

MATH5107 & INFO5161: Master of Advanced Mathematics, track Probability and Statistics, and course CR01 of the M2IF.

Lecturers

Jean-Christophe Mourrat, Aurélien Garivier, Rémi Gribonval

Course description

This course will introduce the notion of concentration of measure and highlight its applications, notably in high-dimensional data processing and machine learning. It will start from deviation inequalities for averages of independent variables, and illustrate their use in the analysis of random graphs and of random projections for dimension reduction. It will then show how other high-dimensional random functions concentrate, and what guarantees this concentration yields for randomized algorithms and for machine learning procedures that learn from large training collections.
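As a minimal illustration of the starting point (this Python sketch is not part of the course material), one can compare the empirical deviation probability of an average of independent bounded variables with the exponential bound given by Hoeffding's inequality:

  import numpy as np

  rng = np.random.default_rng(0)
  n, trials, t = 100, 50_000, 0.3

  # Empirical means of n i.i.d. Rademacher (+/-1) variables, over many trials.
  means = rng.choice([-1.0, 1.0], size=(trials, n)).mean(axis=1)

  # Hoeffding's inequality for variables bounded in [-1, 1]:
  #   P(mean >= t) <= exp(-n * t^2 / 2)
  empirical = (means >= t).mean()
  bound = np.exp(-n * t**2 / 2)
  print(f"P(mean >= {t}) ~= {empirical:.1e}  <=  Hoeffding bound {bound:.1e}")

The empirical probability (about 1e-3 here) indeed falls below the bound (about 1e-2): the bound holds for any bounded variables, at the price of not being tight.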
Presentation slides

Weekly schedule 2024-2025: Fridays 14:00-16:00

  1. [09.13] (Gribonval) Deviation bounds and Chernoff's method
  2. [09.20] (Gribonval)
  3. [09.27] (Garivier) notes for lectures 3, 4 and 5
  4. [10.04] (Garivier)
  5. [10.11] (Garivier)
  6. [10.18] (Mourrat)
  7. [10.25] (Mourrat)
  8. [11.08] (Garivier)
  9. [11.15] (Mourrat)
  10. [11.22] (Mourrat)
  11. [11.29] (Gribonval)
  12. [12.06] (Gribonval)
  13. [12.13] Project Defenses (13:30-17:30)

Exercises

Exercise sheet number 1
Exercise sheet number 2
Exercise sheet number 3

Notebooks
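For illustration (this sketch is not one of the course notebooks), the following Python snippet shows the concentration phenomenon behind random projections: a scaled Gaussian matrix from R^d to R^m approximately preserves all pairwise distances between a set of points, as quantified by the Johnson-Lindenstrauss lemma.

  import numpy as np

  rng = np.random.default_rng(0)
  d, m, npts = 2_000, 200, 40

  # npts random points in R^d, projected to R^m by a Gaussian matrix
  # scaled by 1/sqrt(m) so that squared norms are preserved in expectation.
  X = rng.standard_normal((npts, d))
  Y = X @ (rng.standard_normal((d, m)) / np.sqrt(m))

  # Ratios of projected to original pairwise distances concentrate around 1.
  i, j = np.triu_indices(npts, k=1)
  ratios = np.linalg.norm(Y[i] - Y[j], axis=1) / np.linalg.norm(X[i] - X[j], axis=1)
  print(f"all pairwise distance ratios in [{ratios.min():.3f}, {ratios.max():.3f}]")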

Research articles

Each group of 4 students chooses one of the following articles:
  1. Rahimi, A., & Recht, B. (2007). Random Features for Large-Scale Kernel Machines. Advances in Neural Information Processing Systems (NIPS). https://people.eecs.berkeley.edu/~brecht/papers/07.rah.rec.nips.pdf
  2. Le, Q., Sarlós, T., & Smola, A. (2013). Fastfood: Approximating Kernel Expansions in Loglinear Time. International Conference on Machine Learning (ICML). http://proceedings.mlr.press/v28/le13.pdf
  3. Tropp, J. A. (2011). User-Friendly Tail Bounds for Sums of Random Matrices. Foundations of Computational Mathematics. https://doi.org/10.1007/s10208-011-9099-z
  4. McAllester, D., & Ortiz, L. (2003). Concentration Inequalities for the Missing Mass and for Histogram Rule Error. Journal of Machine Learning Research, 4. https://www.jmlr.org/papers/volume4/mcallester03a/mcallester03a.pdf
  5. Baraniuk, R., Davenport, M., DeVore, R. A., & Wakin, M. B. (2008). A Simple Proof of the Restricted Isometry Property for Random Matrices. Constructive Approximation, 28(3), 253–263. https://www.math.tamu.edu/~rdevore/publications/131.pdf
  6. Arlot, S. (2011). Model Selection and Estimator Selection for Statistical Learning (Cours Peccot), first lecture: Statistical Learning and Estimator Selection (lecture notes, in French). https://www.imo.universite-paris-saclay.fr/~arlot/enseign/2011Peccot/peccot_notes_cours1.pdf
  7. Boucheron, S., & Thomas, M. (2012). Concentration Inequalities for Order Statistics. Electronic Communications in Probability. https://projecteuclid.org/euclid.ecp/1465263184
  8. Chandrasekaran, V., Recht, B., Parrilo, P. A., & Willsky, A. S. (2012). The Convex Geometry of Linear Inverse Problems. Foundations of Computational Mathematics, 12(6), 805–849. https://doi.org/10.1007/s10208-012-9135-7
  9. Garivier, A., & Pilliat, E. (2024). On Sparsity and Sub-Gaussianity in the Johnson-Lindenstrauss Lemma.
  10. Puy, G., Davies, M. E., & Gribonval, R. (2017). Recipes for Stable Linear Embeddings from Hilbert Spaces to $\mathbb{R}^m$. IEEE Transactions on Information Theory, 63(4), 2171–2187. https://doi.org/10.1109/TIT.2017.2664858
  11. Tropp, J. A. (2008). On the Conditioning of Random Subdictionaries. Applied and Computational Harmonic Analysis, 25(1), 1–24. https://www.sciencedirect.com/science/article/pii/S1063520307000863
  12. Bach, F. (2017). On the Equivalence between Kernel Quadrature Rules and Random Feature Expansions. Journal of Machine Learning Research, 18. https://jmlr.org/papers/volume18/15-178/15-178.pdf
  13. Chatterjee, S. (2006). Stein's Method for Concentration Inequalities. https://arxiv.org/abs/math/0604352
  14. Spencer, J. (1985). Six Standard Deviations Suffice. Transactions of the American Mathematical Society, 289, 679–706. https://www.ams.org/journals/tran/1985-289-02/S0002-9947-1985-0784009-0

Prerequisites

Basic knowledge of probability theory, linear algebra and analysis over the reals.

Bibliography

  1. Concentration Inequalities: A Nonasymptotic Theory of Independence, by Stéphane Boucheron, Gábor Lugosi and Pascal Massart
  2. High-Dimensional Probability: An Introduction with Applications in Data Science, by Roman Vershynin
  3. Understanding Machine Learning: From Theory to Algorithms, by Shai Shalev-Shwartz and Shai Ben-David

Evaluation

Three homework assignments count for 50% of the final grade in total; they will be made available progressively on the portail des études. The project on a research article accounts for the other 50%.

Project: paper / chapter presentation

End of September

Distribution and allocation of topics
Students organize an allocation of the proposed topics among groups of 4
In case of disagreement, the allocation will be adjusted by the course lecturers

Monday November 1st

Submission deadline for videos + slides + written reports (half-page summary)

Monday November 8th

Oral session (10 minutes of questions per group)

EXPECTED WORK
a/ Written document: half-page summary

Written in French or English, on half a page, in the format of a conference-paper abstract, describing the topic, its stakes, the difficulties, the novelties, and the main results.
Care must be taken with writing style, grammar, and spelling.

b/ Video presentation and slides

A 15-minute lecture, to be made available to the course lecturers as a video recording, taking care to ensure a balanced share of speaking time highlighting the participation of each student in the group.
A PDF version of the slides, with page numbers, to be made available to the course lecturers.
Each student is invited to watch the videos of the other groups before the oral session.
Participation of all students in the oral session, to answer questions from the course lecturers, is mandatory.
It is important to respect the allotted duration of the video lecture: do not hesitate to use a timer!
A lecture of limited duration cannot contain the same information as a 20- to 30-page paper or chapter: you have to make choices in order to bring us a viewpoint on the topic. Count roughly 1 minute 30 per slide (about 10 slides for 15 minutes), and seek a good balance between giving a global view and being technically precise on selected aspects.
Adapting the content to the expected background of the target audience is important: here the audience consists of the course lecturers as well as all the students who followed the course.
As an example, a standard lecture structure could be: the problem considered and its context; state-of-the-art approaches and/or approaches studied during the course; the proposed approach; synthesis and discussion, possibly highlighting your own contributions (critical perspective, complementary bibliographical study, implementation, difficulties, …); conclusion.
Among other things, the grade will reflect an evaluation of:
Your understanding of the topic and of the studied papers / chapters.
Your ability to explain them: many people in the audience are not necessarily experts.
Your critical analysis of the studied papers / chapters.
Possible complementary bibliographical references bringing additional viewpoints.

PLANNED PRACTICALITIES

Submission of written + video material: via the « portail des études » if technically suitable
Oral session: on site if possible, otherwise in hybrid mode or by videoconference (BBB of the « portail des études »)
Details will be given as soon as possible, depending on technical and health constraints.

Internship proposals

Some of these proposals date back to last year, but some may remain active or relevant. Otherwise, you can find here a list of researchers likely to offer good internships in statistics and/or statistical learning. The list is by no means comprehensive: it gathers some proposals that we and a few colleagues from Lyon have received. Other possibilities exist.
  1. ImaginEcology@Alpes: images, ecology, and machine learning for the study of Alpine wildlife, with Vincent Miele (Univ Lyon 1)
  2. Generative models for embedding-based dimension reduction methods, with Franck Picard (LBMC)
  3. Forecasting meteorological extremes with ML/DL methods, with Clément Dombry (LmB), Philippe Naveau (LSCE), and Maxime Taillardat (CNRM, Météo-France)
  4. Open internship positions in Artificial Intelligence and Machine Learning in Toulouse
  5. Predicting the need for intensive care from recordings of respiratory vital signs, with julie.josse@inria.fr, antoine.liutkus@inria.fr, and claire.boyer@sorbonne-universite.fr
  6. Automatic identification and classification of the style of born-digital French-language literary texts, with julien.velcin@univ-lyon2.fr
  7. Machine learning for the geometrical identification of landmarks in shoot apical meristem images, with Guillaume Cerutti (ENSL)
  8. Machine-learning-assisted turbulence modeling in dense gas flows, with alexis.giauque@ec-lyon.fr and christophe.corre@ec-lyon.fr (ECL, LMFA)
  9. Continuous testing for spatial point processes, with Julien Chevallier and Jean-François Coeurjolly (LJK Grenoble)
  10. Quantized and robust neural networks, with François Malgouyres (IMT, Univ. Toulouse)
  11. Deep learning: propagation & bioacoustics, with Antoine Liutkus, Ricard Marxer, and Hervé Glotin, between the LIS (Toulon, DYNI team) and Inria (Montpellier, Zenith team)
  12. Texture segmentation with deep learning, with Patrice Abry and Nelly Pustelnik (SYSIPH, LPC, ENSL)
  13. Sparse learning for fetal heart rate characterization, with Patrice Abry and Nelly Pustelnik (SYSIPH, LPC, ENSL)
  14. Interpretability in machine learning and human/machine interaction, with Julien Velcin and Stéphane Chrétien (ERIC, Lyon 2)
  15. Artificial Intelligence to Decode the Genomic Replication Programme of Human Cells, with Benjamin Audit (LP, CNRS, ENSL)
  16. Game-theoretical analysis of deep neural networks, with Charlotte Laclau and Ievgen Redko (Univ. Saint-Étienne)
  17. Mixed-data temporal clustering for modelling longitudinal surveys, with Julien Jacques and Isabelle Prim-Allaz (Lyon 2)
  18. Simulation, estimation and control of structured and spatialized epidemiological models, with Clémentine Prieur (UGA, LJK) and Didier Georges (INP, GIPSA-lab)
  19. Dynamic Precision Training of Deep Neural Network Models on the Edge, at IRISA (Rennes) and/or LS2N (Nantes), with Silviu Filip (Inria, IRISA), Anastasia Volkova (Univ. Nantes), and Elisa Riccetti (ENS Lyon)
Other offers are available on the M2MA page.

Weekly schedule 2021-2022: Mondays 13:30-15:30, Fridays 10:15-12:15

  1. [09.06] (Gribonval) [INFO5161 only] Learning in high-dimension – introduction – basic concentration results – Chernoff-Hoeffding
  2. [09.10] (Garivier) [INFO5161 only] Martingales, Azuma, McDiarmid and applications
  3. [09.13] (Gribonval) [MATH5107 & INFO5161 from now on] McDiarmid inequality - The PAC framework for statistical learning - general principles
  4. [09.17] (Gribonval) Agnostic PAC bounds for ERM - sub-Gaussian and sub-Exponential variables
  5. [09.20] (Garivier) Uniform convergence, VC dimension and the fundamental theorem of PAC learning
  6. [09.24] (Aubrun) Geometry of high-dimensional spaces
  7. [09.27] (Garivier) Lower bounds for large deviations, and No Free Lunch theorems
  8. [10.01] (Aubrun) Gaussian Concentration
  9. [10.04] (Gribonval) Dimensionality reduction
  10. [10.08] (Gribonval) Random projections
  11. [10.11] (Aubrun) Correction of the exercises
  12. [10.15] (Garivier) Negative association, Sequential analysis
  13. [10.18] (Garivier) Case study: optimal discovery and the Good-Turing estimator
  14. [10.22] (Aubrun) Chaining
  15. [10.25] (Aubrun) Analysis of Boolean functions
  16. [10.29] (Aubrun) Hypercontractivity
  17. [11.08] student presentations / exam
  18. [11.12] student presentations / exam