M2 course, ENS Lyon:
Risk-aware Reinforcement Learning

Master 2 Informatique : Concepts et Applications (Master's in Computer Science: Concepts and Applications)

Course description

Beyond the classical theory, this course will cover recent extensions of the theory of Markov Decision Processes (MDPs) to the optimization of various risk measures. Starting from Bellman's Dynamic Programming, we will introduce the distributional approach. We will then discuss how risk measures are usually defined and which properties they satisfy, and show how classical algorithms based on backward induction can, or cannot, be adapted to them. The theory will be illustrated on classical and less classical benchmarks, such as labyrinths or the inventory management problem. Finally, we will consider the foundations of reinforcement learning, starting with the vanilla bandit problem and then moving to general approaches for MDPs. The course will balance theory and practice, with exercise sessions dedicated to implementing the algorithms presented in the course.
Presentation slide
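To make the idea of adapting backward induction to a risk measure concrete, here is a minimal Python sketch, not taken from the course material. It assumes a finite-horizon, undiscounted MDP stored as plain dictionaries; the toy MDP, the function names and the parameter `beta` are hypothetical. The only difference between the risk-neutral and the risk-aware versions is the aggregator applied to the distribution of "reward + next value". For the entropic risk this substitution is exact, because the exponential turns the sum of rewards into a product that the tower property splits step by step.

```python
import math

# Hypothetical toy MDP (illustration only): transitions[s][a] is a list of
# (probability, next_state, reward) triples; "end" is absorbing with reward 0.
TRANSITIONS = {
    "start": {
        "safe": [(1.0, "end", 1.0)],
        "risky": [(0.5, "end", 3.0), (0.5, "end", 0.0)],
    },
    "end": {
        "stay": [(1.0, "end", 0.0)],
    },
}


def expectation(outcomes):
    """Risk-neutral aggregator: E[Z] for a list of (prob, value) pairs."""
    return sum(p * z for p, z in outcomes)


def entropic(beta):
    """Entropic certainty equivalent -(1/beta) log E[exp(-beta Z)].

    Risk-averse for beta > 0; tends to E[Z] as beta -> 0.
    """
    def aggregate(outcomes):
        return -math.log(sum(p * math.exp(-beta * z) for p, z in outcomes)) / beta
    return aggregate


def backward_induction(transitions, horizon, aggregate):
    """Finite-horizon, undiscounted backward induction.

    V_H(s) = 0 and V_t(s) = max_a aggregate({(p, r + V_{t+1}(s'))}).
    Returns the time-0 values and the greedy policy, policy[t][s] = action.
    """
    values = {s: 0.0 for s in transitions}
    policy = []
    for _ in range(horizon):
        new_values, decision = {}, {}
        for s, actions in transitions.items():
            # Score each action by aggregating reward + value of the next state.
            scores = {
                a: aggregate([(p, r + values[s2]) for p, s2, r in outcomes])
                for a, outcomes in actions.items()
            }
            best = max(scores, key=scores.get)
            new_values[s], decision[s] = scores[best], best
        values = new_values
        policy.insert(0, decision)  # built backwards in time
    return values, policy


if __name__ == "__main__":
    for name, agg in [("expectation", expectation), ("entropic, beta=1", entropic(1.0))]:
        v, pi = backward_induction(TRANSITIONS, 1, agg)
        print(name, pi[0]["start"], round(v["start"], 4))
```

With the expectation, the risky action wins (mean 1.5 against 1.0); with the entropic risk and `beta = 1`, its certainty equivalent drops to about 0.64 and the safe action is preferred. By contrast, plugging a static CVaR into the same recursion does not in general optimize the CVaR of the total return, since that criterion is not time-consistent; the recursion then optimizes a nested, different objective.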

Weekly schedule 2025-2026

Monday 10:15-12:15 and Thursday 10:15-12:15
From September 11th to November 13th

Hands-On Sessions


Three homeworks count for 50% in total of the final note: they will be progressively available on the portail des études.

Outline:

  1. Classical planning in Markov Decision Processes
    • Bellman equations
    • Planning: Value Iteration, Policy Iteration
    • Distributional Reinforcement Learning
  2. Risk measures
    • How to measure risk? Coherence and other properties
    • Classical risk measures
    • Entropic risk
  3. Planning for different risk measures
    • Extending the state space
    • Coherent risk measures and dynamic programming
    • Using the entropic risk, directly and as a proxy
  4. Reinforcement Learning
    • The bandit problem and regret
    • Q-learning and optimistic algorithms
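The risk measures listed in part 2 can be compared on a small discrete distribution. The sketch below is illustrative only and uses its own assumptions: a gain convention (larger outcomes are better and risk lives in the lower tail), VaR as the lower α-quantile, and CVaR as the average of the worst α-fraction of outcomes. Many texts use a loss convention with signs and tails flipped, so the course's definitions may differ. The function names and the example distribution are hypothetical.

```python
import math

# A distribution is a list of (value, probability) pairs (gain convention).


def mean(dist):
    """Expected value E[X]."""
    return sum(x * p for x, p in dist)


def value_at_risk(dist, alpha):
    """Lower alpha-quantile: the smallest x with P(X <= x) >= alpha."""
    cumulative = 0.0
    for x, p in sorted(dist):
        cumulative += p
        if cumulative >= alpha - 1e-12:  # tolerance for float round-off
            return x
    return max(x for x, _ in dist)


def cvar(dist, alpha):
    """Average of the worst alpha-fraction of outcomes,
    i.e. (1/alpha) * integral from 0 to alpha of the quantile function of X."""
    remaining, total = alpha, 0.0
    for x, p in sorted(dist):
        take = min(p, remaining)  # split the boundary atom if needed
        total += take * x
        remaining -= take
        if remaining <= 1e-12:
            break
    return total / alpha


def entropic_risk(dist, beta):
    """Certainty equivalent -(1/beta) log E[exp(-beta X)], risk-averse for beta > 0."""
    return -math.log(sum(p * math.exp(-beta * x) for x, p in dist)) / beta


if __name__ == "__main__":
    d = [(-10.0, 0.1), (0.0, 0.4), (5.0, 0.5)]
    print(mean(d), value_at_risk(d, 0.2), cvar(d, 0.2), entropic_risk(d, 1.0))
```

On this example the mean is 1.5, VaR at level 0.2 is 0 and CVaR at level 0.2 is -5: the mean hides the 10% chance of losing 10, which CVaR exposes. Regarding coherence, VaR can violate subadditivity and is therefore not coherent in general, CVaR is coherent, and the entropic risk is convex but not positively homogeneous.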

Prerequisite

Notions of Machine Learning, elementary probability and statistics, elementary linear algebra.

Bibliography

  1. Algorithms for Reinforcement Learning, by Csaba Szepesvári (2010)
  2. Reinforcement Learning: Theory and Algorithms, by Alekh Agarwal, Nan Jiang, Sham M. Kakade and Wen Sun (2024)
  3. Bandit Algorithms, by Tor Lattimore and Csaba Szepesvári (2020)
  4. Markov Decision Processes: Discrete Stochastic Dynamic Programming, by Martin L. Puterman (2005)
  5. Reinforcement Learning: An Introduction, by Richard S. Sutton and Andrew G. Barto (2018)

Evaluation

50% Hands-On, 50% final exam.

Research articles

Each group of four students chooses one of the following articles: