Add rollout_buffer_class parameter to on-policy algorithms #1720

ernestum · 2023-10-20T19:20:03Z

Description

This PR adds a rollout_buffer_class and a rollout_buffer_kwargs parameter to the on-policy algorithms.

Motivation and Context

This feature makes it possible to inject a custom rollout buffer into the on-policy algorithms.
The off-policy algorithms already allow this with the replay_buffer_class parameter, so why should the on-policy algorithms not support it too?
Concretely, we need this for an implementation of a preference learning algorithm.
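
To illustrate the pattern this PR describes, here is a minimal, dependency-free sketch of injecting a buffer class together with extra keyword arguments. Only the parameter names rollout_buffer_class and rollout_buffer_kwargs come from this PR; ToyRolloutBuffer, PreferenceRolloutBuffer, ToyOnPolicyAlgorithm and segment_length are hypothetical stand-ins, not the actual stable-baselines3 classes or their signatures.

```python
from typing import Any, Dict, Optional, Type


class ToyRolloutBuffer:
    """Stand-in for a rollout buffer (hypothetical, not the SB3 class)."""

    def __init__(self, buffer_size: int, n_envs: int = 1) -> None:
        self.buffer_size = buffer_size
        self.n_envs = n_envs


class PreferenceRolloutBuffer(ToyRolloutBuffer):
    """Hypothetical custom buffer that needs one extra keyword argument."""

    def __init__(self, buffer_size: int, n_envs: int = 1, segment_length: int = 10) -> None:
        super().__init__(buffer_size, n_envs)
        self.segment_length = segment_length


class ToyOnPolicyAlgorithm:
    """Stand-in algorithm showing how the two parameters could be consumed."""

    def __init__(
        self,
        n_steps: int,
        n_envs: int = 1,
        rollout_buffer_class: Optional[Type[ToyRolloutBuffer]] = None,
        rollout_buffer_kwargs: Optional[Dict[str, Any]] = None,
    ) -> None:
        # Fall back to the default buffer when no custom class is given.
        self.rollout_buffer_class = rollout_buffer_class or ToyRolloutBuffer
        self.rollout_buffer_kwargs = rollout_buffer_kwargs or {}
        # The algorithm supplies its own arguments; the user-provided kwargs
        # are forwarded on top of them.
        self.rollout_buffer = self.rollout_buffer_class(
            n_steps, n_envs=n_envs, **self.rollout_buffer_kwargs
        )


algo = ToyOnPolicyAlgorithm(
    n_steps=128,
    n_envs=4,
    rollout_buffer_class=PreferenceRolloutBuffer,
    rollout_buffer_kwargs={"segment_length": 25},
)
```

Passing the class plus a kwargs dict, rather than a ready-made instance, lets the algorithm keep control of the arguments it already knows (such as the number of steps and environments) while the caller only supplies what is specific to the custom buffer.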

I have raised an issue to propose this change (required for new features and bug fixes)

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation (update in the documentation)

Checklist

Note: You can run most of the checks using make commit-checks.

Note: we are using a maximum length of 127 characters per line

ernestum · 2023-10-23T08:52:57Z

We seem to have a cyclic dependency here, though it is unclear why the changes from this PR introduced it, since the cycle seems to be in a different part of the code than the changes:

stable_baselines3/common/on_policy_algorithm.py

EloyAnguiano · 2023-10-24T07:33:39Z

Hi, I need this to be able to decouple the device of the rollout_buffer (cpu) and the batch of training data (gpu). Would this be possible with this PR? Does anybody know what is left for merging this?

araffin · 2023-10-24T07:52:10Z

> rollout_buffer (cpu) and the batch of training data (gpu). Would this be possible with this PR? Does anybody know what is left for merging this?

This is already possible: the rollout buffer uses NumPy, so it is always on the CPU.

stable-baselines3/stable_baselines3/common/buffers.py

Lines 503 to 520 in aab5459

    
            while start_idx < self.buffer_size * self.n_envs:
                yield self._get_samples(indices[start_idx : start_idx + batch_size])
                start_idx += batch_size

        def _get_samples(
            self,
            batch_inds: np.ndarray,
            env: Optional[VecNormalize] = None,
        ) -> RolloutBufferSamples:
            data = (
                self.observations[batch_inds],
                self.actions[batch_inds],
                self.values[batch_inds].flatten(),
                self.log_probs[batch_inds].flatten(),
                self.advantages[batch_inds].flatten(),
                self.returns[batch_inds].flatten(),
            )
            return RolloutBufferSamples(*tuple(map(self.to_torch, data)))
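
The excerpt above keeps the stored data in CPU-side arrays and converts only the sampled minibatch, one batch at a time. A minimal sketch of that loop, using plain Python lists instead of NumPy arrays and a caller-supplied to_device callable in place of to_torch, might look like this. The names iter_minibatches, storage, to_device and seed are assumptions made for illustration and are not part of stable-baselines3.

```python
import random
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
B = TypeVar("B")


def iter_minibatches(
    storage: Sequence[T],
    batch_size: int,
    to_device: Callable[[List[T]], B],
    seed: int = 0,
) -> Iterator[B]:
    """Yield shuffled minibatches, converting each one only when it is sampled."""
    # Shuffle the indices once, as a stand-in for a random permutation.
    indices = list(range(len(storage)))
    random.Random(seed).shuffle(indices)

    start_idx = 0
    while start_idx < len(storage):
        batch_inds = indices[start_idx : start_idx + batch_size]
        # The full storage stays where it is; only this slice is converted.
        yield to_device([storage[i] for i in batch_inds])
        start_idx += batch_size


# Usage: `tuple` plays the role of the conversion step here.
storage = list(range(10))
batches = list(iter_minibatches(storage, batch_size=4, to_device=tuple))
```

Because the conversion happens per batch, the memory used on the target device scales with the batch size rather than with the size of the whole buffer.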

araffin

Looks good. Still missing: a doc update, probably a quick test, and a similar PR for SB3 contrib.

araffin

LGTM, thanks =)

Could you do the same PR to SB3 contrib?

* Add rollout_buffer_class and rollout_buffer_kwargs parameters to OnPolicyAlgorithm
* Add rollout_buffer_class and rollout_buffer_kwargs to PPO.
* Add rollout_buffer_class and rollout_buffer_kwargs to A2C.
* Make use of the rollout buffer kwargs.
* Update version
* Add test and update doc

---------

Co-authored-by: Antonin Raffin <[email protected]>

ernestum force-pushed the add-buffer-param-to-ppo branch 2 times, most recently from d424fbe to d1e30bb Compare October 20, 2023 19:39

ernestum force-pushed the add-buffer-param-to-ppo branch 3 times, most recently from ceca087 to 0379421 Compare October 23, 2023 13:37

qgallouedec reviewed Oct 23, 2023

stable_baselines3/common/on_policy_algorithm.py

araffin mentioned this pull request Oct 23, 2023

Update dependencies (shimmy, sphinx), remove sphinx_autodoc_typehints #1724

Merged

16 tasks

araffin mentioned this pull request Oct 24, 2023

[Question] I do not understand the GPU and memory usage of SB3 #1630

Open

4 tasks

ernestum added 4 commits October 24, 2023 21:29

Add rollout_buffer_class and rollout_buffer_kwargs parameters to OnPolicyAlgorithm

c1cc523

Add rollout_buffer_class and rollout_buffer_kwargs to PPO.

07385d7

Add rollout_buffer_class and rollout_buffer_kwargs to A2C.

35b92a1

Make use of the rollout buffer kwargs.

2d46250

ernestum force-pushed the add-buffer-param-to-ppo branch from d300b99 to 2d46250 Compare October 24, 2023 19:29

araffin reviewed Oct 24, 2023

araffin added 2 commits October 25, 2023 10:19

Update version

cf8ad5d

Add test and update doc

4961b53

araffin approved these changes Oct 27, 2023

araffin merged commit 69afefc into master Oct 27, 2023
4 checks passed

araffin deleted the add-buffer-param-to-ppo branch October 27, 2023 15:36

ernestum mentioned this pull request Oct 30, 2023

Add rollout_buffer_class to TRPO Stable-Baselines-Team/stable-baselines3-contrib#214

Merged

15 tasks
