Add rollout_buffer_class parameter to on-policy algorithms #1720
Hi, I need this to be able to decouple the device of the rollout_buffer (CPU) and the batch of training data (GPU). Would this be possible with this PR? Does anybody know what is left before this can be merged?
This is already possible: the rollout buffer uses NumPy, so it is always on the CPU. See stable-baselines3/stable_baselines3/common/buffers.py, lines 503 to 520 in aab5459.
Looks good. Still missing: a doc update, probably a quick test, and a similar PR for SB3 contrib.
LGTM, thanks =)
Could you do the same PR to SB3 contrib?
* Add rollout_buffer_class and rollout_buffer_kwargs parameters to OnPolicyAlgorithm
* Add rollout_buffer_class and rollout_buffer_kwargs to PPO.
* Add rollout_buffer_class and rollout_buffer_kwargs to A2C.
* Make use of the rollout buffer kwargs.
* Update version
* Add test and update doc
---------
Co-authored-by: Antonin Raffin <[email protected]>
Description
This PR adds a rollout_buffer_class and a rollout_buffer_kwargs parameter to the on-policy algorithms.
Motivation and Context
This feature allows injecting a custom rollout buffer into the on-policy algorithms.
The off-policy algorithms already allow this with the replay_buffer_class parameter, so why should the on-policy algorithms not support it too?
Concretely, we need this for an implementation of a preference learning algorithm.
Types of changes
Checklist
* make format (required)
* make check-codestyle and make lint (required)
* make pytest and make type both pass. (required)
* make doc (required)
Note: You can run most of the checks using make commit-checks.
Note: we are using a maximum length of 127 characters per line