Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hashicorp Vault support #209

Merged
merged 20 commits into from
Nov 23, 2021
Merged

Hashicorp Vault support #209

merged 20 commits into from
Nov 23, 2021

Conversation

zivkovicmilos
Copy link
Contributor

@zivkovicmilos zivkovicmilos commented Nov 11, 2021

Description

This PR adds support for various KMS integrations, such as Hashicorp Vault, AWS SecretsManager and Azure Key Vault.

SecretsManager

SecretsManagerMain

The SecretsManager is the central place for keeping secrets.

Currently, the Polygon SDK is concerned with keeping 2 major runtime secrets:

  • The validator private key used by the node, if the node is a validator
  • The networking private key used by libp2p, for participating and communicating with other peers

The modules of the Polygon SDK should not need to know how to keep secrets. Ultimately, a module should not care if a secret is stored on a far-away server or locally on the node's disk.

Everything a module needs to know about secret keeping is knowing to use the secret, knowing which secrets to get or save. The finer implementation details of these operations are delegated away to the SecretsManager, which of course is an abstraction.

The node operator that's starting the Polygon SDK can now specify which secrets manager they want to use, and as soon as the correct secrets manager is instantiated, the modules deal with the secrets through the mentioned interface - without caring if the secrets are stored on disk or on a server.

Interface methods

  • Setup - used for the initial setup of the secrets manager
  • SetSecret - used for setting a secret by name to a specific value
  • GetSecret - used for grabbing the secret by name from a SecretsManager
  • HasSecret - used for checking if the SecretsManager has the secret by name
  • RemoveSecret - used for removing a secret by name from the SecretsManager

Local Secrets Manager

Prior to this PR, the Polygon SDK kept all secrets on disk. The paths were hard-coded, and the SDK modules that wanted to use secrets needed to know how to properly manage them. This lead to the situation where the networking package for example needed to know how to do disk reads / writes, or the crypto package needing to interact with the disk to do crypto operations.

The LocalSecretsManager is this implementation, but with the logic moved out to a more appropriate place.
It stores secrets on disk, in the familiar folder structure that has already existed within the SDK.
This mode of secrets management is default, meaning node operators don't need to change anything in their setup procedure to continue using it.

Hashicorp Vault Secrets Manager

A new addition to secrets management is the inclusion of Hashicorp Vault support.
Now a node operator can set up a Hashicorp Vault server, and allow the Polygon SDK to store sensitive information (the previously mentioned keys) securely.

The flow for setting up the Polygon SDK to work with a Hashicorp Vault server is outlined in the following documentation.

To facilitate the support of not just Hashicorp Vault, but future services such as AWS Secrets manager and Azure Key Vault, the Polygon SDK has added a new command:
secrets-manager generate

The generate command takes in different arguments in order to generate a configuration JSON file, which can be used when initializing the IBFT / networking keys (ibft init command) and when starting the Polygon SDK (server command).

Specifically for Hashicorp Vault, the node operator needs to provide the following information:

  • Access token - the generated token the SDK can use to access the Hashicorp Vault instance
  • Server URL - the URL of the Hashicorp Vault server instance
  • Node name - the name of the current node being set up, used for prefixing and managing Vault storage

Of course, the node operator needs to have already set up a Vault instance, generated the required access token and enabled key-value storage (KV-V2).

Support for other services

Support for other KMS services is planned for the future (namely Azure Key Vault and AWS Secrets Manager), as only the methods listed in the SecretsManager would need to be implemented and the service would be ready to go 🚀

Changes include

  • Bugfix (non-breaking change that solves an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (change that is not backwards-compatible and/or changes current functionality)

Checklist

  • I have assigned this PR to myself
  • I have added at least 1 reviewer
  • I have tested this code
  • I have updated the README and other relevant documents (guides...)
  • I have added sufficient documentation both in code, as well as in the READMEs

Additional Comments

As the previous method of secrets storage was storing the values on disk, a large chunk of the codebase that deals with secrets needed to be refactored to use a SecretsManager.
The testing framework has been adapted to use the LocalSecretsManager.

Docs PR

The documentation PR is opened here.

@zivkovicmilos zivkovicmilos self-assigned this Nov 11, 2021
@ferranbt
Copy link
Contributor

I will give my thoughts here since this is something I have been looking at lately.

I think using the secrets module to store the libp2p key is a bit of an over killing. That identity is considered something temporal and it does not require big security measures. Actually, a new one could be generated every time the node starts and neither the network nor the node would suffer. This was not done initially because it would complicate deterministic deployments and tests.

Then, it is a matter of securing signing private keys for the validators/sealers, which is the core issue. Then, it should be really important to introduce a threat model regarding all the possible security implications of the design. For example, does it require TLS to connect with Vault securely? where is the private key stored once is downloaded from the KMS?

Regarding the interface, I think it would make sense to only support the getter methods from the Secret providers: getSecret, hasSecret. It should be up to another external workflow to set up this secrets in AWS, Vault... Consider that most likely the user running the client might not be the one with access to the KMS. Besides, this would avoid adding a lot of complex user workflows to set the keys in the sdk.

Copy link
Contributor

@lazartravica lazartravica left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please take a look at my comments.

My biggest concern is that we haven't achieved any kind of separation of concern with this PR, since parts of the system that are reading/generating the secrets for some reason need to be aware of the underlying implementation which should not be the case.

secrets/hashicorpvault/hashicorpVault.go Outdated Show resolved Hide resolved
command/secretsManager/secret_manager.go Outdated Show resolved Hide resolved
network/server.go Outdated Show resolved Hide resolved
consensus/consensus.go Show resolved Hide resolved
command/ibft/ibft_init.go Outdated Show resolved Hide resolved
@lazartravica
Copy link
Contributor

I will give my thoughts here since this is something I have been looking at lately.

I think using the secrets module to store the libp2p key is a bit of an over killing. That identity is considered something temporal and it does not require big security measures. Actually, a new one could be generated every time the node starts and neither the network nor the node would suffer. This was not done initially because it would complicate deterministic deployments and tests.

Then, it is a matter of securing signing private keys for the validators/sealers, which is the core issue. Then, it should be really important to introduce a threat model regarding all the possible security implications of the design. For example, does it require TLS to connect with Vault securely? where is the private key stored once is downloaded from the KMS?

Regarding the interface, I think it would make sense to only support the getter methods from the Secret providers: getSecret, hasSecret. It should be up to another external workflow to set up this secrets in AWS, Vault... Consider that most likely the user running the client might not be the one with access to the KMS. Besides, this would avoid adding a lot of complex user workflows to set the keys in the sdk.

@zivkovicmilos please see Ferran's comments around the threeat model, to better understand which parts are missing in the docs.


I agree about the Libp2p key, although it is something due to the way private networks are setup we need in order to authenticate the specified bootnodes. There are pros and cons as for everything :)


As for the software not reading and writing: That's excellent insight, if the user who is deploying the network has a good KMS setup, he will not allow the token for Polygon-SDK nodes to read and write the keys, but only to read them.

That's why we need to start treating the ibft init as something that's most commonly being run once from a different environment altogether, when setting up the network.

I'd go right ahead and start discussing a small refactor of moving the ibft init secret generation logic to the secret subcommand, basically the secret subcommand (using this name secret for illustration purposes) would be the utility that can:

  • Generate and persist secrets to any backend (Vault / local fs)
  • Generate the secretmanager config for use by the polygon-sdk, as it currently does

Let's discuss this @zivkovicmilos , and if we see this is too much work, let's establish a compromise and create a task for a later date.

@zivkovicmilos
Copy link
Contributor Author

Hey @ferranbt ,

Thank you for the feedback and for raising some concerns regarding the way the SecretsManager is currently organized.

I just wanted to address a few of these concerns, namely the reason why the networking key is considered a secret, and how the SecretsManager implementation interacts with the Vault server.

There is nothing temporal about the networking key, in a sense that it is used in 2 major mechanisms within the Polygon SDK:

  • Bootnode connection
  • Peer reconnect (connection lost redial mechanism)

The libp2p peer identity is directly derived (and tied to) the private key the Polygon SDK generates / reads upon setting up the networking layer, and it needs to stay fixed if users expect the initial bootnode startup dial to work (hardcoded in the genesis file) and for peer reconnection to succeed quickly, instead of just relying on discovery (which in the case of a single bootnode that is not inaccessible is completely useless).
This is the reason why this private key is kept secret, as it is in fact something that contributes to the identity of the peer in the network, and as such is used by core modules.

The second part I wanted to address was the way the SecretsManager implementation interacts with the corresponding service (be it Hashicorp Vault, Azure Key Vault, AWS Secrets Manager..).
Essentially, it boils down to a single thing -> all of these services, apart from exposing their APIs for queries, also provide a custom built package that allows any 3rd party app (such as the Polygon SDK) to interact with their services. This means that the package handles any kind of calls and security measures needed for successful interaction, while the 3rd party app only needs to authenticate.

Hashicorp Vault is no exception.

The Polygon SDK doesn’t construct any custom calls to the Vault server, but instead relies on the provided package to interact with Vault.
The user only needs to provide an address and an access token, and the package takes care of the rest.

An additional thing I’d like to point out is that the Polygon SDK doesn’t ever store these keys while running locally, even temporarily (longer than a few nano seconds) in memory. The modules that require secret reads always interact with the SecretsManager directly, and fetch the secret which is used instantly and not stored previously for easier grabs.
This means that the SDK always goes through the process of authentication with the provided high level api package in order to grab the secret in memory, which is forgotten as soon as the method finishes executing.

Of course, the Local Secrets Manager works directly with the FS, so it stores the data there as it has been doing for the past year.

Regarding the actual interface methods of the SecretsManager, the user using the Polygon SDK expects the SDK to provide a way to generate the required keys automatically. If the user wishes, they can skip the initial ibft init step (which simply generates 2 keys) and just start the SDK with a secrets manager configuration - provided they have set up the keys manually on the external service such as Vault.

The addition of this feature adds 0 complexity to the current work flow (which stays exactly the same if you want to store secrets locally), and improves the codebase in a sense where the entire SDK doesn't need to rely anymore on 2 powerhouse methods to instantiate core modules (ReadLibp2pKey in the networking package and GenerateOrReadPrivateKey in crypto).

@ferranbt
Copy link
Contributor

There is nothing temporal about the networking key, in a sense that it is used in 2 major mechanisms within the Polygon SDK:

IMHO this is not correct and it is not considered like that in any other Ethereum P2P network.

Bootnodes are a special case, specific nodes that are controlled by a centralized entity and should change as less as possible. Even with that, there are other alternatives to Bootnodes (like DNS discovery in Ethereum) that remove the need to have a fixed identity.

Regarding the other, I understand that having a fixed P2P identity provides a better UX discovery when you want to retry again but even if keys would change all the time (which is not the case though), worse case scenario you just discover other nodes during the discovery protocol like Ethereum does.

Again, I am not saying you should replace P2P keys all the time, that is only the extreme case to make the point that P2P keys do not require a complex security setup. Then, if we only thing about protecting Ethereum keys, the SecretsManager interface could be simplified a lot.

@ferranbt
Copy link
Contributor

BTW, I also remembered this, really important. It is a common wrapper around many of the KMS providers https://github.com/hashicorp/go-kms-wrapping This can streamline the development for other providers and help on the interface.

@zivkovicmilos
Copy link
Contributor Author

zivkovicmilos commented Nov 17, 2021

@lazartravica

I've done the mini refactor you've suggested: moving out the ibft init logic to a more appropriate place (secrets init command).
Also, I've moved out initialization logic to the implementation of the Secrets Manager itself, so modules who request a factory don't need to do additional setup for config / params.

I'll update the documentation and post the commit link here shortly 👍

9ce1965

EDIT:
Updated the docs PR to reflect these changes

@lazartravica
Copy link
Contributor

Thanks @zivkovicmilos! I need to take this PR for a spin before merging any changes just to be double sure that the UX is the same as it was.

Copy link
Contributor

@lazartravica lazartravica left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zivkovicmilos Please solve the conflicts and we can go forward with merging this.

@zivkovicmilos zivkovicmilos merged commit bbe108c into develop Nov 23, 2021
@zivkovicmilos zivkovicmilos deleted the feature/vault-support branch November 27, 2021 23:43
peterbroadhurst added a commit to kaleido-io/polygon-edge that referenced this pull request Jul 21, 2022
peterbroadhurst added a commit to kaleido-io/polygon-edge that referenced this pull request Jul 21, 2022
igorcrevar pushed a commit that referenced this pull request Jun 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants