-
Notifications
You must be signed in to change notification settings - Fork 227
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for obtaining deposit snapshots during trustedNodeSync #4303
Conversation
beacon_chain/eth1/eth1_monitor.nim
Outdated
fatal "Failed to initialize web3 provider", | ||
depositContract = depositContractAddress, | ||
web3Url, err = providerRes.error | ||
quit 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we close the db when quit
runs? also, this prevents starting eth1 and eth2 in "random" order which is not nice - ie I should be able to start the two services in any order and they should be "prone" to resume as soon as the "other" is up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree in general with the suggestions, but this requires more invasive changes to the Eth1Monitor that will be delivered in another PR.
depositContract = depositContractAddress, | ||
deploymentBlock = $depositContractDeployedAt, | ||
err = e.msg | ||
quit 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto... and how does this work when you start nimbus with an unsynced eth1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This merely maintains the Status quo as Nimbus would already exit at start-up when paired with an EL that hasn't synced the deposit contract deployment block.
I've improved the behavior in the multi EL branch where the BN is able to wait for the EL to sync before the rest of the Eth1Monitor activities start.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how do you mean, shut down? we can run the BN without an EL entirely (in optmistic mode). Also, geth doesn't even start syncing without a BN. Since when do we require a synced EL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in fact, we're quite buggy in this aspect: we run the deposit monitoring even when we know the EL is not synced - see #4057 - this causes many support requests in general because of all the error / warning logging that ensues
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The deposit contract deployment block is part of the PoW history that Geth will sync on its own.
Otherwise, yes, the general behavior can be improved, but as I explained these are more invasive changes that are currently part of another PR.
beacon_chain/trusted_node_sync.nim
Outdated
@@ -229,6 +267,16 @@ proc doTrustedNodeSync*( | |||
databaseDir, head = shortLog(dbHead.get()) | |||
(headSlot, dbHead.get()) | |||
|
|||
# Fetch deposit snapshot. This API endpoint is still optional. | |||
let depositSnapshot = await fetchDepositSnapshot(client) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is better to put behind a flag (default false) while v0 is a thing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The failure is relatively benign. Only a INFO
level message is printed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nonetheless, it's a draft feature - also, disabling it completely is useful in case something breaks - this was part of the rationale for moving TNS to a finalized state alone: this is the minimal set of data needed to get started, and we should keep the possibility to start like this regardless of what the server supports - blocks and deposit snapshots are optional extras.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added the suggested CLI flag, but I'm contemplating removing it since the API is now officially in the spec.
since no snapshot sources support the new snapshots, isn't this a regression? it seems reasonable to keep the old support until the new version has reached some reasonable penetration (basically when it's supported by https://eth-clients.github.io/checkpoint-sync-endpoints/) |
The old support was baking a snapshot in the binary. I haven't been updating these snapshots in a while and every running client will have a more recent record in its database. New clients that didn't obtain a snapshot from trusted node sync will be initialized to the deposit contract deployment block which seems reasonable (i.e. this is the behavior of all other clients) |
this is the regression indeed - there's no support in servers for the eip, and there won't be fore the foreseeable future - since the code for reading the baked-in snapshot is still there, it seems reasonable to keep using it until the eip has reached penetration (which can take anything from days to months) - I'd prefer seeing this feature as a pure addon until the standard has settled (both EIP and beacon API are drafts at this point - it's quite possible there will be changes in the format meaning that we might have to go through a potentially painful DB migration, so a way out is needed) |
beacon_chain/trusted_node_sync.nim
Outdated
# Fetch deposit snapshot. This API endpoint is still optional. | ||
let depositSnapshot = await fetchDepositSnapshot(client) | ||
if depositSnapshot.isOk: | ||
info "Writing deposit contracts snapshot", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't the snapshot be validated against the state before writing to disk?
I still don't think this is a regression. The non-genesis snapshots were tested only in the testnets. The metadata for mainnet was never changed after the genesis event. |
the difference would be that we don't need to download deposits for the genesis validators, no? |
d4b3288
to
d0a208d
Compare
37d6c7c
to
051df25
Compare
ce12332
to
6c9b2be
Compare
else: | ||
warn "The downloaded deposit snapshot does not agree with the downloaded state" | ||
else: | ||
warn "Deposit tree snapshot was not imported", reason = depositSnapshot.error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the download is optional and nothing will happen if it doesn't work except a small delay, so this is a notice at most
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, with the new CLI option the user explicitly requested the download of a deposit snapshot and it didn't work. I think a warning is reasonable. The user should check their configuration.
return await client.getBlockV2(BlockIdent.init(slot), cfg) | ||
return awaitWithTimeout(client.getBlockV2(BlockIdent.init(slot), cfg), | ||
smallRequestsTimeout): | ||
raise newException(CatchableError, "Request timed out") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if there's a timeout, we should reattempt - that's what the retries are for
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There will be retry on this error. An exception is raised here, so it can be handled in the except block below, which will produce the following message:
warn "Retrying download of block", slot, err = "Request timed out"
I didn't want to duplicate the logic in multiple places and the cost of raising an exception is not that high in this infrequent code path.
Other changes: * More optimal search for TTD block. * Add timeouts to all REST requests during trusted node sync. Fixes #4037 * Removed support for storing a deposit snapshot in the network metadata.
6c9b2be
to
bd3554a
Compare
EIP-4881 was never correctly implemented, the `DepositTreeSnapshot` structure has nothing to do with its actual definition. Reflect that by renaming the type to a Nimbus-specific `DepositContractSnapshot`, so that an actual EIP-4881 implementation can use the correct names. - https://eips.ethereum.org/EIPS/eip-4881#specification Notably, `DepositTreeSnapshot` contains a compressed sequence in `finalized`, only containing the minimally required intermediate roots. That also explains the incorrect REST response reported in #5508. The non-canonical representation was introduced in #4303 and is also persisted in the database. We'll have to maintain it for a while.
EIP-4881 was never correctly implemented, the `DepositTreeSnapshot` structure has nothing to do with its actual definition. Reflect that by renaming the type to a Nimbus-specific `DepositContractSnapshot`, so that an actual EIP-4881 implementation can use the correct names. - https://eips.ethereum.org/EIPS/eip-4881#specification Notably, `DepositTreeSnapshot` contains a compressed sequence in `finalized`, only containing the minimally required intermediate roots. That also explains the incorrect REST response reported in #5508. The non-canonical representation was introduced in #4303 and is also persisted in the database. We'll have to maintain it for a while.
Other changes:
More optimal search for TTD block.
Add timeouts to all REST requests during trusted node sync. Fixes trustedNodeSync may hang forever due to lack of timeout handling in all REST requests #4037
Removed support for storing a deposit snapshot in the network metadata.