diffusion-ppo.github.io - Diffusion Policy Policy Optimization

TL;DR: We introduce DPPO, an algorithmic framework and set of best practices for fine-tuning diffusion-based policies in continuous control and robot learning tasks. DPPO shows marked improvements over diffusion and non-diffusion baselines alike, across a variety of tasks and in sim-to-real transfer.

DPPO introduces a two-layer Diffusion Policy MDP, in which the inner MDP represents the denoising process and the outer MDP represents the environment; each step of the combined MDP has a Gaussian likelihood and can therefore be optimized with policy gradients. DPPO builds upon Proximal Policy Optimization (PPO) and proposes a set of best practices, including modifications to the denoising schedule, to ensure fine-tuning efficiency and stability.
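
To make the idea concrete, here is a minimal sketch of how a single denoising step can be treated as a Gaussian action whose log-likelihood feeds a standard PPO clipped surrogate. All names (DenoisingPolicy, the fixed sigma, the toy data) are illustrative assumptions for this sketch, not DPPO's actual implementation or hyperparameters.

```python
# Sketch: one denoising transition a_k -> a_{k-1} treated as a Gaussian action,
# optimized with a PPO-style clipped objective. Illustrative only.
import torch
import torch.nn as nn

class DenoisingPolicy(nn.Module):
    """Predicts the mean of the next (less noisy) action given the current
    noisy action, the denoising step index, and the observation."""
    def __init__(self, obs_dim, act_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def predict_mean(self, obs, noisy_act, k):
        k_feat = torch.full_like(noisy_act[..., :1], float(k))
        return self.net(torch.cat([obs, noisy_act, k_feat], dim=-1))

def denoising_log_prob(policy, obs, a_k, a_km1, k, sigma):
    """Gaussian log-likelihood of one denoising transition a_k -> a_{k-1}."""
    mean = policy.predict_mean(obs, a_k, k)
    dist = torch.distributions.Normal(mean, sigma)
    return dist.log_prob(a_km1).sum(dim=-1)

def ppo_clip_loss(logp_new, logp_old, advantage, clip_eps=0.2):
    """Standard PPO clipped surrogate, applied per denoising step."""
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantage
    return -torch.min(unclipped, clipped).mean()

# Toy usage with random data standing in for rolled-out denoising transitions.
obs_dim, act_dim, batch = 8, 2, 16
policy = DenoisingPolicy(obs_dim, act_dim)
obs = torch.randn(batch, obs_dim)
a_k = torch.randn(batch, act_dim)     # noisy action at denoising step k
a_km1 = torch.randn(batch, act_dim)   # sampled action at step k-1
logp_old = denoising_log_prob(policy, obs, a_k, a_km1, k=3, sigma=0.1).detach()
adv = torch.randn(batch)              # advantage estimated in the outer (env) MDP
loss = ppo_clip_loss(
    denoising_log_prob(policy, obs, a_k, a_km1, k=3, sigma=0.1),
    logp_old, adv,
)
loss.backward()
```

The key point the sketch illustrates is that the advantage comes from the outer environment MDP, while the likelihood ratio is computed per inner denoising step, so the familiar PPO machinery applies unchanged to the full two-layer MDP.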

DPPO yields consistent and marked improvements in training stability and final performance compared to other diffusion-based RL algorithms and common policy parameterizations such as Gaussian and Gaussian Mixture. Most remarkably, DPPO achieves robust zero-shot sim-to-real transfer (no use of real data) in a state-based, long-horizon assembly task, while a Gaussian policy shows a significant sim-to-real gap and consistently causes hardware errors.