This repository contains Python code for reproducing the benchmark experiments presented in the paper: "Semismooth Newton Methods for Risk-Averse Dynamic Programming" arXiv:2501.13612
Risk-Averse Dynamic Programming via Semismooth Newton Methods This repository provides a Python implementation of novel semismooth Newton algorithms to solve risk-averse Markov Decision Processes (MDPs) under Markovian risk measures—particularly the Conditional Value at Risk (CVaR) and Mean-Upper-Semideviation of order 1 (MUS1).
-
Implements three semismooth Newton methods for risk-averse MDPs:
- SNM I: Sequential risk-neutral MDP solves with perturbed transitions.
- SNM II: A risk-averse policy-iteration–style Newton solve.
- SNM III: A linearized, piecewise-smooth Newton update.
-
Includes risk-averse optimistic policy iteration as a comparative baseline.
-
Supports CVaR and MUS1 as risk measures, with theoretical guarantees of quadratic convergence under certain conditions.
-
Benchmarked on synthetic MDPs with varied sizes and discount factors, with empirical performance data included.
Follow these steps to set up your development environment and install the package:
-
Create and activate a Conda environment (recommended):
conda create -n riskaverse python=3.10 conda activate riskaverse
-
Install project dependencies:
pip install -r requirements.txt
-
Install the package in editable mode:
pip install -e . --use-pep517 -
Verify installation:
python -c "from riskaverse_DP.mdps import mdp; print('Success')"
This will ensure your riskaverse_DP package is installed and ready for development.
Basic usage pattern:
import numpy as np
from riskaverse_DP.mdps import mdp
n_states = 10
n_actions = 3
# Initialize with transition and stage-cost data
my_mdp = mdp(n_states, n_actions)
gamma_ = 0.9
alpha_ = 0.1
v_0 = np.zeros(n_states, )
from riskaverse_DP.CVaR_solvers import SNMI
my_SNMI = SNMI(gamma_, alpha_)
# Solve using one of the methods: "SNMI", "SNMII", "SNMIII", or "OPI"
v_opt, res = my_SNMI.solve(my_mdp, v_0)Check the examples/ folder for detailed scripts on synthetic MDPs.
Experiments on MDPs with sizes up to 100 states and diversified action spaces show:
- SNM methods reach convergence in fewer than 10 iterations, whereas risk-averse VI needs over 150.
- CPU times: SNM I and SNM III are most efficient; SNM II may face numeric issues depending on solver accuracy.
This project implements the Mean-Upper-Semideviation risk measure of order 1 (MUS1), a coherent risk measure used in risk-averse optimization and decision-making.
Given a random variable X and a parameter kappa in [0, 1], the MUS1 risk measure is defined as:
rho(X) = E[X] + kappa * E[(X - E[X])_+]
Here, (x)_+ denotes the positive part of x, i.e., max(x, 0). The parameter kappa controls the degree of risk sensitivity to values above the mean.
For a finite probability space with outcomes X_i and probabilities p_i, the MUS1 risk measure has the following dual representation as a linear program (LP):
Linear Program:
Maximize: sum_i p_i * xi_i * X_i
Subject to: sum_i p_i * xi_i = 1
0 <= xi_i <= 1 + kappa for all i
This LP computes the worst-case expected value under a distorted probability distribution defined by the weights xi_i * p_i, consistent with the MUS1 risk measure.
| MUS1 SNMs | MUS1 OPI |
|---|---|
![]() |
![]() |
For a full theoretical foundation—including proofs of convergence and algorithmic derivations—see the arXiv preprint “Semismooth Newton Methods for Risk-Averse Markov Decision Processes”.
Contributions are welcome! Feel free to open issues or pull requests for:
- New risk measures or benchmarks
- Extensions to large/continuous state spaces
- Performance optimizations or parallel implementations
This project is released under the MIT License. See LICENSE for complete details.

