# ORPO Implementation
This repository implements ORPO, a model-based offline RL framework that generates Optimistic model Rollouts for Pessimistic offline policy Optimization. The implementation in this repository is used in the paper “Optimistic Model Rollouts for Pessimistic Offline Policy Optimization”, accepted at AAAI 2024.
## Implemented Baselines
## Environment Setup
We recommend running the code within a `conda` virtual environment.
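A minimal sketch for creating and activating the environment; the environment name `orpo` and the Python version are assumptions, not taken from the repository, so adjust them as needed:

```bash
# Create the conda environment (the name "orpo" and Python 3.8 are assumptions)
conda create -n orpo python=3.8

# Activate the environment
conda activate orpo
```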
First, install the following dependencies:

Then, install the remaining Python dependencies by running the following command in the root directory of this repository (inside the virtual environment):
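A minimal sketch, assuming the dependencies are listed in a `requirements.txt` at the repository root (check the repository for the actual file name):

```bash
# Install Python dependencies (assumes a requirements.txt exists at the repo root)
pip install -r requirements.txt
```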
## Training Examples
### Toy experiments
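The repository's actual entry point for the toy experiments is not reproduced here; as a purely hypothetical sketch (script name and flags are illustrative placeholders):

```bash
# Hypothetical invocation; replace the script name and flags with the repository's actual ones
python run_toy.py --algo orpo --seed 0
```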
### D4RL
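For D4RL benchmarks, a typical offline RL invocation specifies a dataset/task identifier and a random seed. The script name and flags below are assumptions rather than the repository's confirmed interface; `halfcheetah-medium-v2` is a standard D4RL task ID:

```bash
# Hypothetical example: train ORPO on a D4RL task (script name and flags are placeholders)
python train.py --task halfcheetah-medium-v2 --algo orpo --seed 0
```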
### Tasks requiring policies to generalize
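These tasks would be launched the same way with a different task identifier; the command below is a placeholder sketch, not the repository's documented usage:

```bash
# Hypothetical example; substitute the actual generalization task name used by the repository
python train.py --task <generalization-task> --algo orpo --seed 0
```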
## Plotting Examples
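Plotting scripts in offline RL repositories commonly aggregate logged returns across seeds; the script name, log directory, and flags below are assumptions:

```bash
# Hypothetical example: plot learning curves from logged results (paths and flags are placeholders)
python plot.py --logdir ./logs --task halfcheetah-medium-v2
```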
## References

- Optimistic Model Rollouts for Pessimistic Offline Policy Optimization. AAAI 2024.