SVRG: Stochastic Variance Reduction Policy Gradient Estimation

  • Contributors and Collaborators: Tianbing Xu (Baidu Research, CA), Qiang Liu (UT Austin), Jian Peng (UIUC)

Contributions:

The variance of policy gradient estimates obtained from simulation is often excessive, leading to poor sample efficiency. In this work, we apply stochastic variance reduced gradient (SVRG) estimation to model-free policy gradients to improve sample efficiency. The SVRG estimate is incorporated into a trust-region Newton conjugate gradient framework for policy optimization.
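For illustration only, below is a minimal sketch (not the repository's implementation; all function and variable names are made up here) of the generic SVRG control-variate correction of Johnson and Zhang (2013) applied to a policy-gradient estimate: g = g_B(theta) - g_B(theta_snapshot) + g_full(theta_snapshot), where theta_snapshot is a periodically refreshed copy of the parameters, g_B is a mini-batch gradient, and g_full is the full-batch gradient at the snapshot. In the policy-gradient setting, evaluating the snapshot gradient on trajectories sampled from the current policy additionally requires importance weighting (Owen and Zhou, 2000).

# Minimal sketch of an SVRG-style variance-reduced policy gradient estimate.
# Illustrative only; names are not taken from the repository.
import numpy as np

def svrg_policy_gradient(grad_on_batch, theta, theta_snapshot, full_grad_snapshot):
    """Return a variance-reduced gradient estimate at `theta`.

    grad_on_batch(params) -- stochastic policy gradient evaluated on the same
                             mini-batch of trajectories (importance-weighted
                             when `params` differ from the behavior policy).
    theta_snapshot        -- parameters at which the snapshot was taken.
    full_grad_snapshot    -- full-batch gradient computed at theta_snapshot.
    """
    g_new = np.asarray(grad_on_batch(theta))           # noisy gradient at current params
    g_old = np.asarray(grad_on_batch(theta_snapshot))  # same mini-batch, snapshot params
    # Control variate: subtracting the snapshot's mini-batch gradient and adding
    # back its full-batch gradient keeps the estimate unbiased while reducing variance.
    return g_new - g_old + np.asarray(full_grad_snapshot)

In the repository, an estimate of this kind is used inside a trust-region Newton conjugate gradient update rather than plain gradient descent, as described above.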

Dependencies

Refer to requirements.txt for more details.

Running Command

  • After launching the virtual env, set up PYTHONPATH and the MuJoCo path:
source start.sh
  • Run the experiment:
cd sandbox/rocky/tf/launchers/
python trpo_gym_swimmer.py

Results (MuJoCo Robotics Tasks)

(Learning-curve figures for the Half-Cheetah and Hopper tasks; see the repository for the images.)

References

  • Tianbing Xu, Qiang Liu, Jian Peng, "Stochastic Variance Reduction for Policy Gradient Estimation", arXiv, 2017.
  • S. S. Du, J. Chen, L. Li, L. Xiao, and D. Zhou, "Stochastic Variance Reduction Methods for Policy Evaluation", ICML, 2017.
  • R. Johnson and T. Zhang, "Accelerating Stochastic Gradient Descent Using Predictive Variance Reduction", NIPS, 2013.
  • A. Owen and Y. Zhou, "Safe and Effective Importance Sampling", JASA, 2000.
  • Yan Duan, Xi Chen, Rein Houthooft, John Schulman, Pieter Abbeel, "Benchmarking Deep Reinforcement Learning for Continuous Control", ICML, 2016.
