M2 course, ENS Lyon:
Risk-aware Reinforcement Learning

Master 2 Informatique : Concepts et Applications

Course description

Beyond the classical theory, this course will cover some recent extensions of the theory of Markov Decision Processes (MDPs) to the optimization of various risk measures. Starting from Bellman's dynamic programming, we will introduce the distributional approach. We will then discuss how risk measures are usually defined and examine their properties. We will next show how classical algorithms based on backward induction can, or cannot, be adapted. The theory will be illustrated on classical or less classical benchmarks such as labyrinths or the inventory management problem. We will finally consider the foundations of reinforcement learning, starting with the vanilla bandit problem and then considering general approaches for MDPs. The course will be balanced between theory and practice, with exercise sessions dedicated to the implementation of the course's algorithmic content.
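As a taste of the planning part of the course, here is a minimal value-iteration sketch. The two-state, two-action MDP below is purely illustrative (it is not one of the course benchmarks): the Bellman optimality backup is iterated until the value function converges, and a greedy policy is read off.

```python
import numpy as np

# Illustrative 2-state, 2-action MDP (not from the course material):
# P[a][s, s'] = transition probability, R[a][s] = expected reward of action a in state s.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # action 0
    [[0.5, 0.5], [0.7, 0.3]],   # action 1
])
R = np.array([
    [1.0, 0.0],  # action 0
    [0.5, 2.0],  # action 1
])
gamma = 0.9  # discount factor

V = np.zeros(2)
for _ in range(1000):
    # Bellman optimality backup: V(s) = max_a [ R(s,a) + gamma * sum_s' P(s'|s,a) V(s') ]
    Q = R + gamma * P @ V        # shape (actions, states)
    V_new = Q.max(axis=0)
    if np.max(np.abs(V_new - V)) < 1e-10:   # stop at (numerical) fixed point
        V = V_new
        break
    V = V_new

policy = Q.argmax(axis=0)  # greedy policy w.r.t. the converged values
print(V, policy)
```

The contraction property of the Bellman operator (seen in class) is what guarantees that this iteration converges to the unique optimal value function.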
Presentation slides

Weekly schedule 2025-2026

Monday 10:15-12:15 and Thursday 10:15-12:15
From September 11th to November 13th

Hands-On Sessions


The hands-on homework assignments will count for 50% of the final grade in total.

Class Notes by Yanis Dziki

Lecture 1: see [Sutton and Barto, chap. 1-2]

Lecture 2: inspired from this research paper

Lecture 3: see [Bertsekas, chap. 1,4]

Lecture 4: see [Bertsekas, chap. 4]

Lecture 5: see [Bellemare et al., chap. 3-6]

Lecture 6: see [Szepesvari, chap. 2]

Lectures 7,8,9: see [Szepesvari, chap. 4 and Sutton and Barto, chap. 9-11]

Outline:

  1. Classical planning in Markov Decision Processes
    • Bellman equations
    • Planning: Value Iteration, Policy Iteration
    • Distributional Reinforcement Learning
  2. Risk measures
    • How to measure a risk? Coherence, etc.
    • Classical risk measures
    • Entropic risk
  3. Planning for different risk measures
    • Extending the state space
    • Coherent risk measures and dynamic programming
    • Using the entropic risk, directly and as a proxy
  4. Reinforcement Learning
    • The bandit problem and regret
    • Q-learning and optimistic algorithms
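To make the risk-measure part of the outline concrete, here is a sketch of two sample-based estimators: CVaR at level α (the mean of the worst α-fraction of losses) and the entropic risk (1/θ) log E[exp(θX)]. The sampled data and the helper names `cvar` and `entropic_risk` are illustrative, not taken from the course material.

```python
import numpy as np

rng = np.random.default_rng(0)
losses = rng.normal(loc=0.0, scale=1.0, size=100_000)  # sampled losses X

def cvar(x, alpha=0.05):
    """Conditional Value-at-Risk: mean of the worst alpha-fraction of losses."""
    var = np.quantile(x, 1 - alpha)      # Value-at-Risk threshold
    return x[x >= var].mean()

def entropic_risk(x, theta=1.0):
    """Entropic risk (1/theta) * log E[exp(theta * X)]; theta > 0 is the risk aversion."""
    return np.log(np.mean(np.exp(theta * x))) / theta

print(cvar(losses), entropic_risk(losses))
```

For a standard Gaussian loss, the entropic risk with θ = 1 is exactly θ/2 = 0.5, strictly above the mean 0: a risk-averse agent penalizes variability even when the expectation is unchanged.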

Prerequisite

Notions of Machine Learning, elementary probability and statistics, elementary linear algebra.

Bibliography

  1. Algorithms for Reinforcement Learning, by Csaba Szepesvari (2010)
  2. Reinforcement Learning: Theory and Algorithms, by Alekh Agarwal, Nan Jiang, Sham M. Kakade, and Wen Sun (2024)
  3. Bandit Algorithms, by Tor Lattimore and Csaba Szepesvári (2016)
  4. Markov Decision Processes: Discrete Stochastic Dynamic Programming, by Martin L. Puterman (2005)
  5. Reinforcement Learning: An Introduction, by Richard S. Sutton and Andrew G. Barto (2018)
  6. Dynamic Programming and Optimal Control, by Dimitri Bertsekas
  7. Deep Reinforcement Learning, by Aske Plaat
  8. Distributional Reinforcement Learning, by Marc Bellemare, Will Dabney and Mark Rowland (2023)

Evaluation

50% Hands-On, 50% final exam.