Y

YouLibs

Remove Touch Overlay

Trajectory-based Probabilistic Policy Gradient for Learning Locomotion Behaviors

Duration: 00:57Views: 8.8KLikes: 0Date Created: May, 2019

Tags: parks deep latent environments disney locomotion gradient resorts research policy imagineering

Description: We propose a trajectory-based reinforcement learning method named deep latent policy gradient (DLPG) for learning locomotion skills. We define the policy function as a probability distribution over trajectories and train the policy using a deep latent variable model to achieve sample efficient skill learning. We first evaluate the sample efficiency of DLPG compared to the state-of-the-art reinforcement learning methods in simulated environments. Then, we apply the proposed method to a four-legged walking robot named Snapbot to learn three basic locomotion skills of turn left, go straight, and turn right. We demonstrate that, by properly designing two reward functions for curriculum learning, Snapbot successfully learns the desired locomotion skills with moderate sample complexity. Link to publication: la.disneyresearch.com/publication/trajectory-based-probabilistic-policy-gradient-for-learning-locomotion-behaviors

Swipe Gestures On Overlay

Home