Risk-aware Reinforcement Learning - Master 2 Informatique : Concepts et Applications
Course description
Beyond the classical theory, this course will cover some recent extensions of the theory of Markov Decision Processes (MDPs) to the optimization of various risk measures. Starting from Bellman's dynamic programming, we will introduce the distributional approach. We will then discuss how risk measures are usually defined and what properties they have, and show how classical algorithms based on backward induction can, or cannot, be adapted. The theory will be illustrated on benchmarks, both classical and less classical, such as labyrinths and the inventory management problem. Finally, we will consider the foundations of reinforcement learning, starting with the vanilla bandit problem and then moving on to general approaches for MDPs.
The course will be balanced between theory and practice, with exercise sessions dedicated to the implementation of the course’s algorithmic content.
Presentation slide
Weekly schedule 2025-2026
Monday 10:15-12:15 and Thursday 10:15-12:15
From September 11th to November 13th
Hands-On Sessions
Three homework assignments count for 50% of the final grade in total; they will be made available progressively on the portail des études.
Outline:
- Classical planning in Markov Decision Processes
  - Bellman equations
  - Planning: Value Iteration, Policy Iteration
- Distributional Reinforcement Learning
- Risk measures
  - How to measure a risk? Coherence, etc.
  - Classical risk measures
  - Entropic risk
- Planning for different risk measures
  - Extending the state space
  - Coherent risk measures and dynamic programming
  - Using the entropic risk, directly and as a proxy
- Reinforcement Learning
  - The bandit problem and regret
  - Q-learning and optimistic algorithms
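To give a flavor of the "Entropic risk" item in the outline, here is a minimal sketch in Python. It assumes the reward-based convention (1/beta) * log E[exp(beta * X)], where beta < 0 is risk-averse and beta > 0 is risk-seeking; the sign convention used in the course may differ. The function name `entropic_risk` and the toy lottery are hypothetical and chosen for illustration only.

```python
import math

def entropic_risk(values, probs, beta):
    """Entropic risk (1/beta) * log E[exp(beta * X)] of a discrete reward X.

    Assumed convention: beta < 0 is risk-averse, beta > 0 is risk-seeking,
    and beta == 0 is treated as the plain expectation (the limit beta -> 0).
    """
    if beta == 0:
        return sum(p * v for v, p in zip(values, probs))
    # Shift by the largest exponent (log-sum-exp trick) for numerical stability.
    m = max(beta * v for v in values)
    s = sum(p * math.exp(beta * v - m) for v, p in zip(values, probs))
    return (m + math.log(s)) / beta

# Toy lottery: reward 0 or 10 with equal probability.
values = [0.0, 10.0]
probs = [0.5, 0.5]

mean = entropic_risk(values, probs, 0.0)      # risk-neutral value
averse = entropic_risk(values, probs, -1.0)   # penalizes the bad outcome
seeking = entropic_risk(values, probs, 1.0)   # favors the good outcome
print(mean, averse, seeking)
```

Under this convention the risk-averse value lies below the mean and the risk-seeking value above it, which is the sense in which the parameter beta tunes the attitude toward risk.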
Prerequisites
Notions of Machine Learning, elementary probability and statistics, elementary linear algebra.
Bibliography
- Algorithms for Reinforcement Learning, by Csaba Szepesvári (2010)
- Reinforcement Learning: Theory and Algorithms, by Alekh Agarwal, Nan Jiang, Sham M. Kakade, and Wen Sun (2024)
- Bandit Algorithms, by Tor Lattimore and Csaba Szepesvári (2020)
- Markov Decision Processes: Discrete Stochastic Dynamic Programming, by Martin L. Puterman (2005)
- Reinforcement Learning: An Introduction, by Richard S. Sutton and Andrew G. Barto (2018)
Evaluation
50% Hands-On, 50% final exam.
Research articles
Each group of 4 students chooses one among the following articles: