Add rollout_buffer_class parameter to on-policy algorithms #1720

Merged
merged 6 commits into from
Oct 27, 2023

Conversation

ernestum
Collaborator

Description

This PR adds a rollout_buffer_class and a rollout_buffer_kwargs parameter to the on-policy algorithms.

Motivation and Context

This feature makes it possible to inject a custom rollout buffer into the on-policy algorithms.
The off-policy algorithms already allow this through the replay_buffer_class parameter, so the on-policy algorithms should support it too.
Concretely, we need this for an implementation of a preference learning algorithm.

  • I have raised an issue to propose this change (required for new features and bug fixes)

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (update in the documentation)

Checklist

  • I've read the CONTRIBUTION guide (required)
  • I have updated the changelog accordingly (required).
  • My change requires a change to the documentation.
  • I have updated the tests accordingly (required for a bug fix or a new feature).
  • I have updated the documentation accordingly.
  • I have opened an associated PR on the SB3-Contrib repository (if necessary)
  • I have opened an associated PR on the RL-Zoo3 repository (if necessary)
  • I have reformatted the code using make format (required)
  • I have checked the codestyle using make check-codestyle and make lint (required)
  • I have ensured make pytest and make type both pass. (required)
  • I have checked that the documentation builds using make doc (required)

Note: You can run most of the checks using make commit-checks.

Note: we are using a maximum length of 127 characters per line

@ernestum ernestum force-pushed the add-buffer-param-to-ppo branch 2 times, most recently from d424fbe to d1e30bb Compare October 20, 2023 19:39
@ernestum
Collaborator Author

We seem to have a cyclic dependency here, though it is unclear why the changes in this PR introduced it, since the cycle seems to be in a different part of the code:

image

@ernestum ernestum force-pushed the add-buffer-param-to-ppo branch 3 times, most recently from ceca087 to 0379421 Compare October 23, 2023 13:37
@EloyAnguiano

EloyAnguiano commented Oct 24, 2023

Hi, I need this to be able to decouple the device of the rollout_buffer (cpu) from the device of the training batch (gpu). Would this be possible with this PR? Does anybody know what is left before this can be merged?

@araffin
Member

araffin commented Oct 24, 2023

rollout_buffer (cpu) and the batch of training data (gpu). Would this be possible with this PR? Does anybody know what is left for merging this?

This is already possible: the rollout buffer uses numpy, so it is always on the cpu. Samples are only converted to torch tensors when a minibatch is drawn:

        while start_idx < self.buffer_size * self.n_envs:
            yield self._get_samples(indices[start_idx : start_idx + batch_size])
            start_idx += batch_size

    def _get_samples(
        self,
        batch_inds: np.ndarray,
        env: Optional[VecNormalize] = None,
    ) -> RolloutBufferSamples:
        data = (
            self.observations[batch_inds],
            self.actions[batch_inds],
            self.values[batch_inds].flatten(),
            self.log_probs[batch_inds].flatten(),
            self.advantages[batch_inds].flatten(),
            self.returns[batch_inds].flatten(),
        )
        return RolloutBufferSamples(*tuple(map(self.to_torch, data)))

Member

@araffin araffin left a comment

Looks good. Still missing: a doc update, probably a quick test, and a similar PR for SB3 contrib.

Member

@araffin araffin left a comment

LGTM, thanks =)

Could you do the same PR to SB3 contrib?

@araffin araffin merged commit 69afefc into master Oct 27, 2023
4 checks passed
@araffin araffin deleted the add-buffer-param-to-ppo branch October 27, 2023 15:36
friedeggs pushed a commit to friedeggs/stable-baselines3 that referenced this pull request Jul 22, 2024
* Add rollout_buffer_class and rollout_buffer_kwargs parameters to OnPolicyAlgorithm

* Add rollout_buffer_class and rollout_buffer_kwargs to PPO.

* Add rollout_buffer_class and rollout_buffer_kwargs to A2C.

* Make use of the rollout buffer kwargs.

* Update version

* Add test and update doc

---------

Co-authored-by: Antonin Raffin <[email protected]>
4 participants