We propose a straightforward algorithm to train general locomotion controllers for multiple humanoid robots.
Reinforcement learning combined with sim-to-real transfer offers a general framework for developing locomotion controllers for legged robots. To facilitate successful deployment in the real world, smoothing techniques, such as low-pass filters and smoothness rewards, are often employed to develop policies with smooth behaviors. However, these techniques are non-differentiable and involve a large set of hyperparameters, so they tend to require tedious manual tuning for each robot.
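To make the conventional baseline concrete, here is a minimal sketch of one such smoothing technique, an exponential-moving-average low-pass filter applied to policy actions at deployment time. The class name and the `alpha` parameter are illustrative assumptions, not identifiers from the paper.

```python
import numpy as np

class ActionLowPassFilter:
    """First-order (exponential moving average) low-pass filter on actions.

    alpha in (0, 1]: smaller values smooth more aggressively but add lag,
    and alpha itself becomes one more hyperparameter to tune per robot.
    Applied as post-processing outside the training objective, this is the
    kind of hand-tuned smoothing that LCP aims to replace.
    """

    def __init__(self, alpha: float = 0.2):
        self.alpha = alpha
        self.prev = None  # filter state: last smoothed action

    def __call__(self, action: np.ndarray) -> np.ndarray:
        if self.prev is None:
            self.prev = action.copy()
        # Blend the new action with the previous smoothed action.
        self.prev = self.alpha * action + (1.0 - self.alpha) * self.prev
        return self.prev
```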
RL-based policies are prone to producing jittery behaviors. Lipschitz continuity offers a way to characterize the smoothness of a function: a function f is K-Lipschitz if ‖f(x₁) − f(x₂)‖ ≤ K‖x₁ − x₂‖ for all inputs x₁, x₂. We propose Lipschitz-Constrained Policies (LCP), a simple method to train policies that produce smooth behaviors by enforcing a Lipschitz constraint on the policy. This constraint is implemented as a gradient penalty, which is differentiable and can be easily integrated into existing RL training pipelines with only a few lines of code. LCP provides a simple and effective alternative to conventional smoothing techniques.
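A minimal sketch of such a gradient penalty in PyTorch, assuming the policy network maps observations to mean actions; the function name `lipschitz_gradient_penalty` and the weight `penalty_weight` are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def lipschitz_gradient_penalty(policy: torch.nn.Module,
                               obs: torch.Tensor,
                               penalty_weight: float = 0.01) -> torch.Tensor:
    """Soft Lipschitz constraint: penalize the norm of d(action)/d(obs).

    Illustrative sketch; assumes `policy(obs)` returns mean actions [B, A].
    """
    obs = obs.detach().requires_grad_(True)
    actions = policy(obs)
    # Summing the outputs gives one input gradient per sample; for vector
    # actions this penalizes the gradient of the summed action components,
    # a cheap proxy for the full Jacobian norm.
    grads = torch.autograd.grad(
        outputs=actions.sum(),
        inputs=obs,
        create_graph=True,  # keep the graph so the penalty is trainable
    )[0]                     # shape [B, obs_dim]
    # Mean squared L2 norm of the observation gradients across the batch.
    return penalty_weight * grads.pow(2).sum(dim=-1).mean()
```

In use, the penalty is simply added to the actor loss, e.g. `loss = actor_loss + lipschitz_gradient_penalty(policy, obs_batch)`, so no change to the environment or reward design is needed.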