
Meta-Critic Network in RL

PyTorch implementation of the NeurIPS 2020 paper "Online Meta-Critic Learning for Off-Policy Actor-Critic Methods".

Getting Started

Prerequisites

The environment can be run locally using conda; you need to have Miniconda3 installed. Most of our environments also require a MuJoCo license.

cd ${Miniconda3_PATH}
bash Miniconda3-latest-Linux-x86_64.sh
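
After installation, open a new shell (or source your shell profile) and verify that conda is on your PATH:

conda --version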

Conda Installation

1 Download and install MuJoCo 1.31 (used by the rllab environments) and MuJoCo 1.50 from the MuJoCo website. For some experiments you also need to install rllab.

2 We assume the MuJoCo files are extracted to the default locations (~/.mujoco/mjpro150 and ~/.mujoco/mjpro131). The tested versions are gym 0.14.0 and mujoco_py 1.50.1.68.
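
For example, assuming the archives were downloaded as mjpro131_linux.zip and mjpro150_linux.zip (adjust the names to match your downloads):

mkdir -p ~/.mujoco
unzip mjpro131_linux.zip -d ~/.mujoco
unzip mjpro150_linux.zip -d ~/.mujoco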

3 Copy your MuJoCo license key (mjkey.txt) to ~/.mujoco/mjkey.txt:
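
cp ${path_to_mjkey}/mjkey.txt ~/.mujoco/mjkey.txt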

4 Edit your PYTHONPATH to include the rllab directory, then run the rllab setup scripts. Have the MuJoCo 1.31 zip file and the license file ready:

export PYTHONPATH=path_to_rllab:$PYTHONPATH
./scripts/setup_linux.sh
./scripts/setup_mujoco.sh

5 Create and activate the conda environment, and install meta-critic to enable the command-line interface.

cd ${Meta_Critic_PATH}
conda env create -f environment.yaml
conda activate meta_critic
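
As a quick sanity check (not part of the repository's scripts), confirm that the pinned versions resolved; this should print 0.14.0:

python -c "import mujoco_py, gym; print(gym.__version__)"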

Examples

Training and simulating a policy agent with DDPG_MC

1 Enter the directory of TD3_DDPG_MC

cd ${TD3_DDPG_MC_PATH}

2 Auxiliary loss network design h_w(pi(s)):

python main.py --env_name HalfCheetahEnv --method DDPG_MC

3 Auxiliary loss network design h_w(pi(s), s, a):

python main.py --env_name HalfCheetahEnv --method DDPG_MC_sa 

Training and simulating a policy agent with TD3_MC

1 Enter the directory of TD3_DDPG_MC

cd ${TD3_DDPG_MC_PATH}

2 Auxiliary loss network design h_w(pi(s)):

python main.py --env_name HalfCheetahEnv --method TD3_MC

3 Auxiliary loss network design h_w(pi(s), s, a):

python main.py --env_name HalfCheetahEnv --method TD3_MC_sa 

Training and simulating a policy agent with SAC_MC

1 Enter the directory of SAC_MC

cd ${SAC_MC_PATH}

2 Auxiliary loss network design h_w(pi(s)):

python main.py --env_name HalfCheetahEnv --method SAC_MC

3 Auxiliary loss network design h_w(pi(s), s, a):

python main.py --env_name HalfCheetahEnv --method SAC_MC_sa 
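
The _MC and _MC_sa variants above differ only in the input of the auxiliary loss network h_w: the policy's action pi(s) alone, or pi(s) together with the state s and a replayed action a. A minimal PyTorch sketch of the two designs, with hypothetical module and dimension names (the repository's actual classes may differ):

import torch
import torch.nn as nn

class AuxLossNet(nn.Module):
    # h_w(pi(s)): scores the policy's action alone.
    def __init__(self, action_dim, hidden_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(action_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1))

    def forward(self, pi_s):
        # pi_s: batch of policy actions pi(s); one scalar loss term per sample.
        return self.net(pi_s)

class AuxLossNetSA(nn.Module):
    # h_w(pi(s), s, a): additionally conditions on the state and a replayed action.
    def __init__(self, state_dim, action_dim, hidden_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * action_dim + state_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1))

    def forward(self, pi_s, s, a):
        return self.net(torch.cat([pi_s, s, a], dim=-1))

In both cases the network's scalar output is added to the actor's loss as a learned auxiliary term, and its weights w are meta-learned online alongside training, which is the core idea of the paper.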