Unclear sentence about the net_arch argument for on policy algorithms #1408

Closed
2 tasks done
EBoguslawski opened this issue Mar 24, 2023 · 2 comments
Labels
documentation Improvements or additions to documentation

Comments

@EBoguslawski

📚 Documentation

Hello,
I was reading the documentation about how to customize the networks' architecture here:
https://stable-baselines3.readthedocs.io/en/master/guide/custom_policy.html#on-policy-algorithms

  1. I found this sentence:
    "Otherwise, to have actor and critic that share the same network architecture, you only need to specify net_arch=[128, 128] (here, two hidden layers of 128 units each, this is equivalent to net_arch=dict(pi=[128, 128], vf=[128, 128]))."
    I think this may cause confusion because, unless I am missing something, the actor and critic share weights in the first case (net_arch=[128, 128]) but not in the second (net_arch=dict(pi=[128, 128], vf=[128, 128])). The two syntaxes are therefore not really equivalent.

  2. The example just below the sentence gives the impression that weights are not shared when using net_arch=[128, 128].
    "Same architecture for actor and critic with two layers of size 128: net_arch=[128, 128]"

        obs
   /            \
 <128>          <128>
  |              |
 <128>          <128>
  |              |
action         value

Maybe you could use:

        obs
         |
        <128>
         |
        <128>
   /            \
action         value
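
The difference between the two diagrams can be made concrete by counting trainable parameters. The sketch below is plain Python, not SB3 code: the helper name `mlp_param_count` and the sizes `obs_dim = 4` and `n_actions = 2` are assumptions chosen only for illustration.

```python
def mlp_param_count(input_dim, hidden_sizes):
    """Count weights and biases of a fully connected stack of layers."""
    total = 0
    prev = input_dim
    for size in hidden_sizes:
        total += prev * size + size  # weight matrix plus bias vector
        prev = size
    return total


# Hypothetical sizes, chosen only for illustration.
obs_dim = 4
n_actions = 2
hidden = [128, 128]

trunk = mlp_param_count(obs_dim, hidden)
action_head = hidden[-1] * n_actions + n_actions
value_head = hidden[-1] * 1 + 1

# First diagram: actor and critic each own a separate copy of the hidden layers.
separate_total = 2 * trunk + action_head + value_head

# Second diagram: one stack of hidden layers feeds both output heads.
shared_total = trunk + action_head + value_head

print("separate:", separate_total)
print("shared:", shared_total)
```

Under these assumed sizes, the separate layout carries exactly one extra copy of the hidden layers, which is why the two layouts are not interchangeable even though the layer sizes match.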

I hope I did not misunderstand something.
Eva


@EBoguslawski EBoguslawski added the documentation Improvements or additions to documentation label Mar 24, 2023
@araffin
Member

araffin commented Mar 24, 2023

"The example just below the sentence gives the impression that weights are not shared when using net_arch=[128, 128]."

It's not an impression, it is the case. We changed the behavior in SB3 v1.8.0+ to match the off-policy algorithms and simplify the code: see #1292 and #1252.

@EBoguslawski
Author

Indeed, I am using SB3 v1.7.0. Thank you for your answer!
