M2 course, ENS Lyon:
Risk-aware Reinforcement Learning

Risk-aware Reinforcement Learning - Master 2 Informatique : Concepts et Applications

Course description

Beyond the classical theory, this course covers recent extensions of the theory of Markov Decision Processes (MDPs) to the optimization of various risk measures. Starting from Bellman's dynamic programming, we introduce the distributional approach. We then discuss how risk measures are usually defined and what properties they have, and show how classical algorithms based on backward induction can, or cannot, be adapted. The theory is illustrated on classical and less classical benchmarks, such as labyrinths and the inventory management problem. Finally, we consider the foundations of reinforcement learning, starting with the vanilla bandit problem and then moving to general approaches for MDPs. The course balances theory and practice, with exercise sessions dedicated to implementing the course's algorithmic content.
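To give a concrete feel for what "optimizing a risk measure" can mean, here is a minimal sketch in Python that compares the mean of a sample of returns with two common risk measures, the entropic risk and an empirical lower-tail CVaR. The function names, the sign convention (rewards rather than costs, with a negative parameter `beta` meaning risk aversion) and the sample values are assumptions made for this illustration; they are not taken from the course material.

```python
import math

def mean(xs):
    """Empirical expectation of a list of returns."""
    return sum(xs) / len(xs)

def entropic_risk(xs, beta):
    """Empirical entropic risk (1/beta) * log E[exp(beta * X)].

    Assumed convention: X is a reward, so beta < 0 is risk-averse
    and beta -> 0 recovers the mean.
    """
    if beta == 0:
        return mean(xs)
    # Log-sum-exp trick for numerical stability.
    m = max(beta * x for x in xs)
    log_mean_exp = m + math.log(sum(math.exp(beta * x - m) for x in xs) / len(xs))
    return log_mean_exp / beta

def cvar_lower(xs, alpha):
    """Average of the worst alpha-fraction of the samples (simple empirical estimator)."""
    k = max(1, math.ceil(alpha * len(xs)))
    worst = sorted(xs)[:k]
    return sum(worst) / k

# Hypothetical returns: usually +10, occasionally a loss of 20.
returns = [10.0, 10.0, 10.0, -20.0]
print(mean(returns))                 # risk-neutral criterion
print(entropic_risk(returns, -0.1))  # penalizes the rare loss
print(cvar_lower(returns, 0.25))     # looks only at the worst 25%
```

Both risk measures rank this sample below its mean, which is the kind of behaviour a risk-averse planner is meant to capture.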
Presentation slide

Weekly schedule 2025-2026

Monday 10:15-12:15 and Thursday 10:15-12:15
From September 11th to November 13th

Hands-On Sessions

Notebook 1: The Bakery
Here is a baseline submission. Submissions will be evaluated in various scenarios; here is one of them.
=> please upload your code here before Monday September 15th
Notebook 2: The Photo Booth
=> please upload your code here before Monday September 29th
Notebook 3: Windy Hike Around The Pit
=> please upload your code here before October 6th
Notebook 4: Retail Store Management
=> please upload your code here before October 13th
Notebook 5: Machine Replacement
=> please upload your code here before October 20th
Hands-On 6: The Cartpole - taking inspiration from this code, propose a solution for another environment from the Gymnasium library
=> please upload your code here before November 3rd
Notebook 7: Treatments
=> please upload your code here before November 10th

The hands-on homework counts for 50% of the final grade.

Class Notes by Yanis Dziki

Lecture 1: see [Sutton and Barto, chap. 1-2]

Lecture 2: inspired by this research paper

Lecture 3: see [Bertsekas, chap. 1,4]

Lecture 4: see [Bertsekas, chap. 4]

Lecture 5: see [Bellemare et al., chap. 3-6]

Lecture 6: see [Szepesvari, chap. 2]

Lectures 7, 8, 9: see [Szepesvari, chap. 4] and [Sutton and Barto, chap. 9-11]

Outline:

Classical planning in Markov Decision Processes

Planning: Value Iteration, Policy Iteration
Distributional Reinforcement Learning

Risk measures
- How to measure a risk? Coherence, etc.
- Classical risk measures
- Entropic risk
Planning for different risk measures
- Extending the state space
- Coherent risk measures and dynamic programming
- Using the entropic risk, directly and as a proxy
Reinforcement Learning
- The bandit problem and regret
- Q-learning and optimistic algorithms
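
As an illustration of the classical planning part of the outline, here is a minimal sketch of value iteration on a toy two-state MDP. The transition format `P[s][a]` (a list of `(probability, next_state, reward)` triples), the toy MDP itself and the discount factor are assumptions chosen for this example, not the notation used in the lectures.

```python
def value_iteration(P, gamma=0.9, tol=1e-8):
    """Iterate the Bellman optimality operator until successive values differ by less than tol.

    P[s][a] is a list of (probability, next_state, reward) triples.
    Returns the value function and a greedy policy.
    """
    V = [0.0] * len(P)
    while True:
        # Q[s][a] = expected reward plus discounted value of the next state.
        Q = [[sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
              for outcomes in actions]
             for actions in P]
        V_new = [max(q) for q in Q]
        gap = max(abs(a - b) for a, b in zip(V, V_new))
        V = V_new
        if gap < tol:
            policy = [q.index(max(q)) for q in Q]
            return V, policy

# Hypothetical toy MDP with deterministic transitions.
TOY_MDP = [
    [[(1.0, 0, 0.0)], [(1.0, 1, 1.0)]],  # state 0: stay (reward 0) or move to state 1 (reward 1)
    [[(1.0, 1, 2.0)]],                   # state 1: stay (reward 2)
]

V, policy = value_iteration(TOY_MDP)
print(V, policy)
```

In this toy problem, staying in state 1 forever is worth 2 / (1 - 0.9) = 20, so moving from state 0 is worth 1 + 0.9 × 20 = 19, and the greedy policy moves to state 1. The risk-sensitive variants discussed in the course change the criterion being optimized, which is where this simple backward recursion may or may not carry over.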

Prerequisite

Basic knowledge of machine learning, elementary probability and statistics, and elementary linear algebra.

Bibliography

Algorithms for Reinforcement Learning, by Csaba Szepesvari (2010)
Reinforcement Learning: Theory and Algorithms, by Alekh Agarwal, Nan Jiang, Sham M. Kakade and Wen Sun (2024)
Bandit Algorithms, by Tor Lattimore and Csaba Szepesvári (2020)
Markov Decision Processes: Discrete Stochastic Dynamic Programming, by Martin L. Puterman (2005)
Reinforcement Learning: An Introduction, by Richard S. Sutton and Andrew G. Barto (2018)
Dynamic Programming and Optimal Control, by Dimitri Bertsekas
Deep Reinforcement Learning, by Aske Plaat
Distributional Reinforcement Learning, by Marc Bellemare, Will Dabney and Mark Rowland

Evaluation

50% Hands-On, 50% final exam.
