
use LRU strategy for shuffling/epoch caches #4196

Merged · 3 commits · Sep 29, 2022
Changes from 2 commits
8 changes: 6 additions & 2 deletions beacon_chain/consensus_object_pools/block_pools_types.nim
@@ -65,6 +65,10 @@ type
     # unnecessary overhead.
     data*: BlockRef
 
+  LRUCache*[I: static[int], T] = object
+    entries*: array[I, tuple[value: T, lastUsed: uint32]]
+    timestamp*: uint32
+
   ChainDAGRef* = ref object
     ## ChainDAG validates, stores and serves chain history of valid blocks
     ## according to the beacon chain state transition. From genesis to the
@@ -189,9 +193,9 @@ type
 
     cfg*: RuntimeConfig
 
-    shufflingRefs*: array[16, ShufflingRef]
+    shufflingRefs*: LRUCache[16, ShufflingRef]
 
-    epochRefs*: array[32, EpochRef]
+    epochRefs*: LRUCache[32, EpochRef]
       ## Cached information about a particular epoch ending with the given
      ## block - we limit the number of held EpochRefs to put a cap on
      ## memory usage
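
As a back-of-envelope check on that memory cap (a hedged sketch — exact figures depend on target and padding; assumes a 64-bit build where a `ref` is 8 bytes):

```nim
# Each entry is tuple[value: T, lastUsed: uint32] -> 8 + 4, padded to 16 bytes.
# LRUCache[32, EpochRef]     -> 32 * 16 + 4 (timestamp) + padding, ~520 bytes
# LRUCache[16, ShufflingRef] -> 16 * 16 + 4 (timestamp) + padding, ~264 bytes
echo sizeof(LRUCache[32, EpochRef])  # prints the exact inline size on a target
```

The inline bookkeeping is negligible either way; the cap matters because the `EpochRef` and `ShufflingRef` values behind the pointers are what hold the large per-epoch data.
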
104 changes: 56 additions & 48 deletions beacon_chain/consensus_object_pools/blockchain_dag.nim
@@ -265,6 +265,55 @@ func atSlot*(dag: ChainDAGRef, bid: BlockId, slot: Slot): Opt[BlockSlotId] =
   else:
     dag.getBlockIdAtSlot(slot)
 
+func nextTimestamp[I, T](cache: var LRUCache[I, T]): uint32 =
+  if cache.timestamp == uint32.high:
Member: most of us lazy people would have gone with a uint64 and no reset loop :)
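
A minimal sketch of that alternative (hypothetical — not what was merged), assuming `timestamp` and `lastUsed` are widened to `uint64`:

```nim
# With a 64-bit counter, overflow is unreachable in practice (2^64 increments),
# so the wraparound reset loop disappears entirely.
func nextTimestamp64[I, T](cache: var LRUCache[I, T]): uint64 =
  inc cache.timestamp  # assumes timestamp*: uint64 in the LRUCache type
  cache.timestamp
```

The merged version keeps `uint32` counters; the reset loop it pays for runs only once per 2^32 insertions.
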

+    for i in 0 ..< I:
+      template e: untyped = cache.entries[i]
+      if e.lastUsed != 0:
+        e.lastUsed = 1
+    cache.timestamp = 1
+  inc cache.timestamp
+  cache.timestamp
+
+template findIt[I, T](cache: var LRUCache[I, T], predicate: untyped): Opt[T] =
+  block:
+    var res: Opt[T]
+    for i in 0 ..< I:
+      template e: untyped = cache.entries[i]
+      template it: untyped {.inject, used.} = e.value
+      if e.lastUsed != 0 and predicate:
+        e.lastUsed = cache.nextTimestamp
+        res.ok it
+        break
+    res
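
For reference, a rough sketch of what a call like `dag.epochRefs.findIt(it.key == key)` expands to — the `untyped` predicate is spliced into the scan with `it` bound to each live entry's value (simplified; assumes `Opt` from stew/results):

```nim
# Approximate expansion of `dag.epochRefs.findIt(it.key == key)`:
block:
  var res: Opt[EpochRef]
  for i in 0 ..< 32:                            # I = 32 for epochRefs
    template e: untyped = dag.epochRefs.entries[i]
    if e.lastUsed != 0 and e.value.key == key:  # predicate, `it` -> e.value
      e.lastUsed = dag.epochRefs.nextTimestamp  # touch: now most recently used
      res.ok e.value
      break
  res
```

A `lastUsed` of 0 marks an empty slot, which is why `nextTimestamp` never returns 0 and the wraparound loop above resets live entries to 1 rather than 0.
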

+template delIt[I, T](cache: var LRUCache[I, T], predicate: untyped) =
+  block:
+    for i in 0 ..< I:
+      template e: untyped = cache.entries[i]
+      template it: untyped {.inject, used.} = e.value
+      if e.lastUsed != 0 and predicate:
+        e.reset()
+
+func put[I, T](cache: var LRUCache[I, T], value: T) =
+  var lru = 0
+  block:
+    var min = uint32.high
+    for i in 0 ..< I:
+      template e: untyped = cache.entries[i]
+      if e.lastUsed < min:
+        min = e.lastUsed
+        lru = i
+        if min == 0:
+          break
+    if min != 0:
+      {.noSideEffect.}:
+        debug "Cache full - evicting LRU", cache = typeof(T).name, capacity = I
etan-status (Contributor, Author): Seeing it for both ShufflingRef and EpochRef on Raspi around ~10 minutes behind wall clock.

Member: weird, means there's deep forks happening at a high rate or we're creating shufflings / states unnecessarily .. is this mainnet? I don't remember seeing that much churn

Member: ie might be worth looking into why we're even creating these shufflings / epochrefs

etan-status (Contributor, Author) on Sep 29, 2022: Mainnet with in-BN validators and Besu (slow processing / pessimistic sync)

Member: yeah, but why the shufflings? until we're synced, we shouldn't be making shufflings and after, we should only make them for legit forks

etan-status (Contributor, Author) on Sep 29, 2022: "We're synced" depends on syncHorizon, so there are 3 parallel tasks going on:

  1. Catching up, this is around ~10 minutes behind when full caches are observed
  2. Gossip, validating those triggers a new EpochRef to get the proposer key, and as part of EpochRef init it triggers shuffling
  3. Validator duties, based on however far it is caught up, but forward propagated to current slot

Member:

> Catching up, this is around ~10 minutes behind when full caches are observed

This should certainly not trigger EpochRef starvation - ie when we're syncing it's a single history where the time-based eviction should work just fine.

> Gossip, validating those triggers a new EpochRef to get the proposer key, and as part of EpochRef init it triggers shuffling

does this trigger before we've validated the parent? because if we've validated the parent, we shouldn't need a new shuffling except once per epoch.

Ditto attestations: we can't gossip-check those until the block being voted for has been synced, so head-compatible votes shouldn't cause EpochRefs.

> Validator duties, based on however far it is caught up, but forward propagated to current slot

this one is uglier indeed - computing proposers while syncing is messy but necessary, but why isn't the epoch-based cache working out here? it should be roughly equivalent to LRU...


+
+  template e: untyped = cache.entries[lru]
+  e.value = value
+  e.lastUsed = cache.nextTimestamp
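
Taken together, a minimal usage sketch of the resulting semantics (hypothetical values; assumes the `LRUCache` type and the templates above are in scope, plus stew/results and chronicles imports):

```nim
var cache: LRUCache[2, string]                  # capacity 2 for illustration
cache.put "epoch100"                            # kept hot by the catch-up task
cache.put "epoch110"                            # wall-clock epoch, via gossip
discard cache.findIt(it == "epoch100")          # catch-up touches it again
cache.put "epoch111"                            # evicts "epoch110", the stalest
doAssert cache.findIt(it == "epoch100").isSome  # the busy entry survived
```

Under the previous lowest-epoch eviction, `epoch100` — the entry the catch-up task keeps hitting — would have been the one evicted on every insert, which is the churn pattern discussed in the thread above.
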

 func epochAncestor(dag: ChainDAGRef, bid: BlockId, epoch: Epoch):
     Opt[BlockSlotId] =
   ## The epoch ancestor is the last block that has an effect on the epoch-
@@ -314,11 +363,8 @@ func findShufflingRef*(
     dependent_bsi = dag.atSlot(bid, dependent_slot).valueOr:
       return Opt.none(ShufflingRef)
 
-  for s in dag.shufflingRefs:
-    if s == nil: continue
-    if s.epoch == epoch and dependent_bsi.bid.root == s.attester_dependent_root:
-      return Opt.some s
-  Opt.none(ShufflingRef)
+  dag.shufflingRefs.findIt(
+    it.epoch == epoch and dependent_bsi.bid.root == it.attester_dependent_root)

 func putShufflingRef*(dag: ChainDAGRef, shufflingRef: ShufflingRef) =
   ## Store shuffling in the cache
@@ -327,55 +373,23 @@ func putShufflingRef*(dag: ChainDAGRef, shufflingRef: ShufflingRef) =
     # are seldomly used (ie RPC), so no need to cache
     return

-  # Because we put a cap on the number of shufflingRef we store, we want to
-  # prune the least useful state - for now, we'll assume that to be the
-  # oldest shufflingRef we know about.
-  var
-    oldest = 0
-  for x in 0..<dag.shufflingRefs.len:
-    let candidate = dag.shufflingRefs[x]
-    if candidate == nil:
-      oldest = x
-      break
-    if candidate.epoch < dag.shufflingRefs[oldest].epoch:
-      oldest = x
-
-  dag.shufflingRefs[oldest] = shufflingRef
+  dag.shufflingRefs.put shufflingRef

 func findEpochRef*(
     dag: ChainDAGRef, bid: BlockId, epoch: Epoch): Opt[EpochRef] =
   ## Lookup an EpochRef in the cache, returning `none` if it's not present - see
   ## `getEpochRef` for a version that creates a new instance if it's missing
   let key = ? dag.epochKey(bid, epoch)
 
-  for e in dag.epochRefs:
-    if e == nil: continue
-    if e.key == key:
-      return Opt.some e
-
-  Opt.none(EpochRef)
+  dag.epochRefs.findIt(it.key == key)

 func putEpochRef(dag: ChainDAGRef, epochRef: EpochRef) =
   if epochRef.epoch < dag.finalizedHead.slot.epoch():
     # Only cache epoch information for unfinalized blocks - earlier states
     # are seldomly used (ie RPC), so no need to cache
     return
 
-  # Because we put a cap on the number of epochRefs we store, we want to
-  # prune the least useful state - for now, we'll assume that to be the
-  # oldest epochRef we know about.
-
-  var
-    oldest = 0
-  for x in 0..<dag.epochRefs.len:
-    let candidate = dag.epochRefs[x]
-    if candidate == nil:
-      oldest = x
-      break
-    if candidate.key.epoch < dag.epochRefs[oldest].epoch:
-      oldest = x
-
-  dag.epochRefs[oldest] = epochRef
+  dag.epochRefs.put epochRef

 func init*(
     T: type ShufflingRef, state: ForkedHashedBeaconState,
@@ -1666,14 +1680,8 @@ proc pruneStateCachesDAG*(dag: ChainDAGRef) =
   block: # Clean up old EpochRef instances
     # After finalization, we can clear up the epoch cache and save memory -
     # it will be recomputed if needed
-    for i in 0..<dag.epochRefs.len:
-      if dag.epochRefs[i] != nil and
-          dag.epochRefs[i].epoch < dag.finalizedHead.slot.epoch:
-        dag.epochRefs[i] = nil
-    for i in 0..<dag.shufflingRefs.len:
-      if dag.shufflingRefs[i] != nil and
-          dag.shufflingRefs[i].epoch < dag.finalizedHead.slot.epoch:
-        dag.shufflingRefs[i] = nil
+    dag.epochRefs.delIt(it.epoch < dag.finalizedHead.slot.epoch)
+    dag.shufflingRefs.delIt(it.epoch < dag.finalizedHead.slot.epoch)

   let epochRefPruneTick = Moment.now()
4 changes: 2 additions & 2 deletions tests/test_blockchain_dag.nim
@@ -525,8 +525,8 @@ suite "chain DAG finalization tests" & preset():
       not finalER.isErr()
 
     block:
-      for er in dag.epochRefs:
-        check: er == nil or er.epoch >= dag.finalizedHead.slot.epoch
+      for er in dag.epochRefs.entries:
+        check: er.value == nil or er.value.epoch >= dag.finalizedHead.slot.epoch

     block:
       let tmpStateData = assignClone(dag.headState)