Background

Today, we store beacon blocks in Prysm’s database across the different Ethereum beacon chain fork versions. Persisting blocks is critical for many reasons, including but not limited to:

serving blocks to peers over p2p networking while they sync
starting the chain from a certain state/block when a node is restarted
responding to API requests for historical data
replaying blocks when using state generation
replaying state transitions onto a new chain when the current chain undergoes a reorg

Previously, we had Phase0 beacon blocks and then Altair beacon blocks. The Ethereum “merge” is coming during the Bellatrix hard fork of the beacon chain, which introduces major changes to the beacon block type. Namely, each beacon block will now embed an execution payload produced by an execution client such as go-ethereum. This payload contains raw transaction data, which significantly increases the size of the beacon blocks Prysm stores.

type BeaconBlockBellatrix struct {
	Slot uint64
	// ... (other block fields elided)
	ExecutionPayload struct {
		Transactions [][]byte // Raw transactions: can be huge!
	}
}
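To make that growth concrete, the Go sketch below counts how many bytes the `Transactions` field alone adds when a block is SSZ-serialized. In SSZ, a list of variable-size items is encoded as one 4-byte offset per item followed by the items themselves, so the cost grows linearly with both the number and the size of transactions. The payload shape in `main` (200 transactions of 500 bytes each) is a made-up example rather than measured data, and the count ignores the payload's other fields.

```go
package main

import "fmt"

// bytesPerLengthOffset is SSZ's BYTES_PER_LENGTH_OFFSET: every element of a
// list of variable-size items is referenced by a 4-byte offset.
const bytesPerLengthOffset = 4

// transactionsSSZSize returns the serialized size in bytes of an SSZ list of
// variable-size transactions: the offsets followed by the raw transaction bytes.
func transactionsSSZSize(txs [][]byte) int {
	size := 0
	for _, tx := range txs {
		size += bytesPerLengthOffset + len(tx)
	}
	return size
}

func main() {
	// Hypothetical payload: 200 transactions of 500 bytes each.
	txs := make([][]byte, 200)
	for i := range txs {
		txs[i] = make([]byte, 500)
	}
	fmt.Println(transactionsSSZSize(txs)) // prints 100800
}
```

Every one of those bytes is also kept by the execution client, which is the duplication discussed next.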

The issue is that go-ethereum’s database also grows rapidly, in part because it must store all blocks and their transactions. Post-merge, the same transaction data would be stored twice: once in Prysm and once in its associated execution client. We consider this duplication untenable and believe there must be a better approach. If this is not resolved in Prysm before the merge, we expect disk requirements to increase significantly. This document explores our approach to solving this issue.

Solution

Our solution involves converting Bellatrix beacon blocks, before storing them in our beacon node’s database, into a format in which the execution payload’s transaction list is replaced by its SSZ hash tree root. This format is officially referred to as a blinded beacon block; the Ethereum specifications also use it for the proposer-builder separation proposal.

type BlindedBeaconBlock struct {
	Slot uint64
	// ... (other block fields elided)
	ExecutionPayloadHeader struct {
		TransactionsRoot [32]byte // Fixed to 32 bytes: hash_tree_root(transactions)
	}
}
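As a minimal sketch of the conversion, the Go program below computes `hash_tree_root(transactions)` by following the SSZ merkleization rules (each transaction is a `ByteList[MAX_BYTES_PER_TRANSACTION]`, and the payload holds a `List[Transaction, MAX_TRANSACTIONS_PER_PAYLOAD]`), then builds a blinded block from a full one. The block types and the helper names (`merkleize`, `mixInLength`, `transactionsRoot`, `blind`) are simplified, hypothetical stand-ins for the structs above. A real blinded block’s `ExecutionPayloadHeader` also carries the payload’s other fields, and production code would rely on an optimized SSZ library instead of this naive merkleizer.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// SSZ limits for Bellatrix execution payloads, as defined in the consensus specs.
const (
	maxBytesPerTransaction    = 1 << 30 // MAX_BYTES_PER_TRANSACTION
	maxTransactionsPerPayload = 1 << 20 // MAX_TRANSACTIONS_PER_PAYLOAD
)

// zeroHashes[i] is the root of a depth-i subtree whose leaves are all zero chunks.
var zeroHashes [64][32]byte

func init() {
	for i := 1; i < len(zeroHashes); i++ {
		zeroHashes[i] = hashPair(zeroHashes[i-1], zeroHashes[i-1])
	}
}

func hashPair(a, b [32]byte) [32]byte {
	return sha256.Sum256(append(a[:], b[:]...))
}

// merkleize returns the SSZ Merkle root of chunks, virtually padded with zero
// chunks up to the next power of two of limit.
func merkleize(chunks [][32]byte, limit uint64) [32]byte {
	if uint64(len(chunks)) > limit {
		panic("merkleize: more chunks than limit")
	}
	depth := 0
	for uint64(1)<<depth < limit {
		depth++
	}
	if len(chunks) == 0 {
		return zeroHashes[depth]
	}
	layer := append([][32]byte(nil), chunks...)
	for d := 0; d < depth; d++ {
		if len(layer)%2 == 1 {
			layer = append(layer, zeroHashes[d]) // pad odd layers with a zero subtree
		}
		next := make([][32]byte, len(layer)/2)
		for i := range next {
			next[i] = hashPair(layer[2*i], layer[2*i+1])
		}
		layer = next
	}
	return layer[0]
}

// mixInLength hashes a list root together with its length (32-byte little-endian).
func mixInLength(root [32]byte, length uint64) [32]byte {
	var lenChunk [32]byte
	binary.LittleEndian.PutUint64(lenChunk[:8], length)
	return hashPair(root, lenChunk)
}

// transactionsRoot computes
// hash_tree_root(List[ByteList[MAX_BYTES_PER_TRANSACTION], MAX_TRANSACTIONS_PER_PAYLOAD]).
func transactionsRoot(txs [][]byte) [32]byte {
	txRoots := make([][32]byte, len(txs))
	for i, tx := range txs {
		chunks := make([][32]byte, (len(tx)+31)/32)
		for j := range chunks {
			copy(chunks[j][:], tx[j*32:]) // the last chunk is right-padded with zeros
		}
		txRoots[i] = mixInLength(merkleize(chunks, (maxBytesPerTransaction+31)/32), uint64(len(tx)))
	}
	return mixInLength(merkleize(txRoots, maxTransactionsPerPayload), uint64(len(txs)))
}

// Simplified, hypothetical stand-ins for the structs shown above.
type BeaconBlockBellatrix struct {
	Slot             uint64
	ExecutionPayload struct{ Transactions [][]byte }
}

type BlindedBeaconBlock struct {
	Slot                   uint64
	ExecutionPayloadHeader struct{ TransactionsRoot [32]byte }
}

// blind drops the raw transactions and keeps only their hash tree root.
func blind(b *BeaconBlockBellatrix) *BlindedBeaconBlock {
	out := &BlindedBeaconBlock{Slot: b.Slot}
	out.ExecutionPayloadHeader.TransactionsRoot = transactionsRoot(b.ExecutionPayload.Transactions)
	return out
}

func main() {
	full := &BeaconBlockBellatrix{Slot: 42}
	full.ExecutionPayload.Transactions = [][]byte{[]byte("tx-1"), make([]byte, 100)}
	blinded := blind(full)
	fmt.Printf("slot %d, transactions root %x\n", blinded.Slot, blinded.ExecutionPayloadHeader.TransactionsRoot)
}
```

Because the root is derived deterministically from the transactions, anyone holding the full payload can recompute it and compare the result against the stored `TransactionsRoot`.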

This keeps each stored beacon block small and predictably sized compared to storing full transaction data in our database. However, this approach raises the following questions:

How do we use full Bellatrix beacon blocks in our codebase?

We define full Bellatrix beacon blocks as blocks containing complete execution payloads with transaction data in them.

We need full Bellatrix beacon blocks when we broadcast them to beacon chain peers over p2p gossipsub. That is, when we receive a new Bellatrix beacon block with a full execution payload, we process it through the state transition function and then broadcast it.

<aside> 💡 Why do we need to gossip full blocks and not just blinded ones?

Broadcasting full Bellatrix blocks over the network is critical because gossiped data represents validators’ cryptoeconomic commitments to its entire contents. For example, broadcasting only a block hash does not express a validator’s full confidence in a block the way broadcasting the complete data, including the full execution payload, does. We trust that honest beacon nodes cryptographically verify data before propagating it over the network, so gossiping full Bellatrix blocks lets peers rely on the data within.

</aside>

Additionally, two parts of the codebase must provide full Bellatrix beacon blocks retrieved from the database:

P2P blocks by range and blocks by root requests from other beacon node peers