05.07.2024 | Storm Slivkoff, Georgios Konstantopoulos
History growth is currently the biggest bottleneck for scaling Ethereum. Somewhat unexpectedly, history growth has become a much larger problem than state growth. Within a couple of years, history data will exceed the storage capacity of many Ethereum nodes.
The good news is that:
In this post we continue our investigation of Ethereum scaling from Part 1, now turning our attention from state growth to history growth. Using high resolution datasets, our goal is to 1) build a technical understanding of Ethereum’s scaling bottlenecks, and 2) help frame the discussion around what Ethereum gas limit is optimal.
This article is part 2 in a blogpost series about Ethereum scaling. Part 1 is about state growth, part 2 is about history growth, part 3 is about state access, and part 4 is about the gas limit.
History is the set of all blocks and transactions that Ethereum has executed throughout its lifetime. This is the data needed to sync the chain from the Genesis block to the current tip of the chain. History growth is the accumulation of new blocks and new transactions over time.
Figure 1 shows how history growth relates to various protocol metrics and Ethereum node hardware constraints. History growth is limited by a different set of hardware constraints than state growth. History growth puts stress on Network IO, because new blocks and transactions must be transmitted throughout the network. History growth also puts stress on a node’s Storage Space because every Ethereum node stores a complete copy of the history. If history grows quickly enough to exceed these hardware constraints, a node will no longer be able to achieve stable consensus with its peers. Refer to Part 1 of this article series for an overview of state growth and other scaling bottlenecks.
Until recently, the majority of each node’s network throughput was used for transmitting history (e.g. new blocks and transactions). This situation has changed with the introduction of blobs in the Dencun hard fork. Blobs now occupy a significant portion of a node’s network activity. However, blobs are not considered part of history because 1) they are only stored by a node for 2 weeks before being discarded and 2) they are not needed for replaying the chain from Genesis. Thanks to (1), blobs do not significantly contribute to the storage burden of each Ethereum node. We will discuss blobs in a later section of this post.
In this article we will focus on history growth and also touch on the relationship between history and state. Since state growth and history growth share some overlapping hardware constraints, they are related problems, and addressing one problem can help address the other.
Figure 2 shows the history growth rate over time since Ethereum’s Genesis. Each vertical bar represents one month of growth. The y-axis represents the number of gigabytes that history grew during that month. Transactions are categorized by their “to address” and sized using their RLP byte representation. Contracts that could not be easily identified are categorized as “Unknown”. The “Other” category includes a long tail of small categories such as infrastructure and gaming.
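The aggregation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used to produce the figure: the label table `CATEGORY_BY_ADDRESS`, the placeholder addresses, and the transaction field names (`timestamp`, `to_address`, `rlp_size`) are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical label table mapping a lowercase "to address" to a category.
# The addresses are placeholders, not real contracts.
CATEGORY_BY_ADDRESS = {
    "0x" + "aa" * 20: "DEX",
    "0x" + "bb" * 20: "Rollups",
}


def categorize(to_address):
    """Return the category for a "to address", or "Unknown" if unlabeled."""
    if to_address is None:
        return "Unknown"
    return CATEGORY_BY_ADDRESS.get(to_address.lower(), "Unknown")


def monthly_growth_bytes(transactions):
    """Sum RLP-encoded transaction sizes per (month, category).

    Each transaction is a dict with assumed keys: timestamp (unix seconds),
    to_address (hex string or None), and rlp_size (bytes).
    """
    totals = defaultdict(int)
    for tx in transactions:
        moment = datetime.fromtimestamp(tx["timestamp"], tz=timezone.utc)
        month = moment.strftime("%Y-%m")
        totals[(month, categorize(tx["to_address"]))] += tx["rlp_size"]
    return dict(totals)


# Made-up sample transactions, for illustration only.
sample = [
    {"timestamp": 1711929600, "to_address": "0x" + "aa" * 20, "rlp_size": 120},
    {"timestamp": 1711929700, "to_address": "0x" + "AA" * 20, "rlp_size": 80},
    {"timestamp": 1714521600, "to_address": "0x" + "cc" * 20, "rlp_size": 300},
]
growth = monthly_growth_bytes(sample)
```

Each key of `growth` corresponds to one colored segment of one monthly bar; dividing the byte totals by 10^9 gives gigabytes.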
Figure 2: Ethereum history growth rate over time
A few key takeaways from this chart:
The amount of history generated by each contract category reveals how Ethereum usage patterns have evolved over time. Figure 3 shows the relative contributions of various contract categories. This is the same data as Figure 2, normalized to 100%.
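As a small sketch of the normalization step, each category's bytes for a month can be divided by that month's total. The function name and the input numbers below are hypothetical and do not come from the dataset.

```python
def normalize_to_percent(bytes_by_category):
    """Convert one month's per-category byte totals into percentage shares."""
    total = sum(bytes_by_category.values())
    if total == 0:
        # An empty month has no meaningful shares; report zeros.
        return {category: 0.0 for category in bytes_by_category}
    return {
        category: 100.0 * size / total
        for category, size in bytes_by_category.items()
    }


# Made-up byte totals for a single month.
shares = normalize_to_percent({"Rollups": 1, "DEX": 3})
```

Applying this independently to every month yields stacked bars that always sum to 100%, which makes shifts in usage visible even when absolute growth changes.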
Figure 3: Contributions to history growth
This data reveals four distinct epochs of Ethereum usage patterns:
Each era represents a more complex Ethereum usage pattern than the one before it. Complexification over time can be seen as a form of Ethereum scaling that is not captured by simple metrics like transactions per second.
In the most recent month of data, April 2024, rollups are no longer generating the majority of history. It is unclear whether future history will originate from DEXs and DeFi, or whether some new pattern of usage will emerge.
The introduction of blobs in the Dencun hard fork significantly altered history growth dynamics by allowing rollups to post their data using cheap blobs instead of history. Figure 4 zooms into the history growth rate around the date of the Dencun upgrade. The chart is similar to Figure 2, except each vertical bar represents one day instead of one month.
Figure 4: Effect of Dencun on history growth
A couple of key takeaways from this graph:
Although blobs have reduced history growth, they are still a recent addition to Ethereum. It’s unclear where history growth will stabilize in the presence of blobs.
Raising the gas limit will increase the history growth rate. Proposals to raise the gas limit (e.g. Pump the Gas) must therefore account for the relationship between history growth and each node’s hardware bottlenecks.
To figure out an acceptable rate of history growth, it is helpful to start by examining how long the current status quo can be maintained by modern node hardware for networking and storage. Networking hardware can probably sustain the status quo indefinitely, because the history growth rate is unlikely to return to its pre-Dencun peak until the gas limit is increased. However, the storage burden of history continually increases over time. Under current storage policies it is inevitable that each node’s storage drives eventually become filled by history.
Figure 5 shows an Ethereum node’s storage burden over time, and it also projects how this storage burden may grow over the next 3 years. Projections were made using the April 2024 growth rate. It is possible that this rate may rise or fall with future changes to usage patterns or the gas limit.
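A constant-rate projection of this kind amounts to a straight-line extrapolation. The sketch below assumes a fixed monthly growth rate; the function names are hypothetical, and the numbers are placeholders rather than measured values.

```python
import math


def project_storage_gb(current_gb, monthly_growth_gb, months):
    """Project storage size for each month, assuming a constant growth rate."""
    return [current_gb + monthly_growth_gb * m for m in range(months + 1)]


def months_until_full(current_gb, monthly_growth_gb, capacity_gb):
    """Return the number of months until storage reaches capacity.

    Returns 0 if the drive is already full, and None if storage is not
    growing and therefore never fills under this simple model.
    """
    if current_gb >= capacity_gb:
        return 0
    if monthly_growth_gb <= 0:
        return None
    return math.ceil((capacity_gb - current_gb) / monthly_growth_gb)


# Placeholder inputs chosen only to exercise the functions.
projection = project_storage_gb(current_gb=100, monthly_growth_gb=10, months=3)
deadline = months_until_full(current_gb=100, monthly_growth_gb=10, capacity_gb=125)
```

Because the rate is held fixed, any change in usage patterns or the gas limit would shift the slope of the projected line.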
Figure 5: Size of history, state, and total full node storage burden
A few key takeaways from this figure:
Unlike state data, history data is append-only and is accessed much less aggressively. Thus it is theoretically possible to store history data separately from state data on cheaper storage media. This can be done with some clients like geth.
Beyond storage capacity, Network IO is the other main hardware constraint on history growth. Unlike storage capacity, network IO limitations will not cause problems for nodes in the short term, but these limitations will become important for future increases to the gas limit.
To know how much history growth can be supported by a typical Ethereum node’s network capacity, it is necessary to characterize the relationship between history growth and various network health metrics such as reorg rate, slot misses, finality misses, attestation misses, sync committee misses, and block submission delays. Analysis of these metrics is beyond the scope of this post, but more information can be found in previous investigations of consensus layer health [1] [2] [3] [4]. Additionally, the Ethereum Foundation’s Xatu project has been building public datasets that should expedite these types of analyses.
History growth is an easier problem than state growth. It is solved almost entirely by the candidate proposal EIP-4444. This EIP changes each node from preserving the entire Ethereum history to just preserving one year of history. After EIP-4444 is implemented, data storage will no longer be a bottleneck on Ethereum scaling, even in the long term with substantial gas limit increases. EIP-4444 is necessary for the long term sustainability of the network, because otherwise the history will grow fast enough to require regular hardware updates in network nodes.
Figure 6 shows how EIP-4444 affects each node’s storage burden over the next 3 years. This is the same as Figure 5, with the added lighter lines representing storage burdens post-EIP-4444.
Figure 6: Effect of EIP-4444 on Ethereum node storage burden
Some key takeaways from this figure:
After EIP-4444 has been implemented, history growth will still impose some amount of storage burden because nodes will store a year’s worth of history. However, this burden will not be difficult to address, even as Ethereum reaches global scale. The year-long expiration time of EIP-4444 can likely be reduced to months, weeks, or even shorter once the history preservation approaches are shown to be reliable.
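The effect of a rolling retention window can be illustrated with a small model: without expiry, the stored history is the cumulative sum of all monthly growth, while with expiry it is the sum over only the most recent months. The function name, the monthly granularity, and the sample numbers are assumptions made for this sketch.

```python
def history_burden(monthly_growth_gb, retention_months=None):
    """Return the stored history size after each month.

    retention_months=None models today's policy of keeping all history.
    An integer models an expiry window of that many months, for example
    12 for the one-year window described in the text.
    """
    burden = []
    for i in range(len(monthly_growth_gb)):
        if retention_months is None:
            start = 0
        else:
            start = max(0, i + 1 - retention_months)
        burden.append(sum(monthly_growth_gb[start:i + 1]))
    return burden


# Made-up monthly growth values in gigabytes.
growth = [1, 2, 3, 4]
keep_everything = history_burden(growth)
keep_two_months = history_burden(growth, retention_months=2)
```

With a fixed window, the burden tracks the recent growth rate instead of accumulating without bound, which is why shortening the window directly lowers the storage requirement.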
EIP-4444 raises the question of how history should be preserved if not by the Ethereum nodes themselves. History plays a central role in the validation, accounting, and analysis of Ethereum, and so it is vital that it be preserved. Luckily, history preservation is an easy problem that requires only 1/n honest data providers. This is in contrast to state consensus problems that require between 1/3 and 2/3 of data participants to be honest. A node operator can validate the authenticity of any history dataset by 1) replaying all of its transactions from Genesis and 2) checking whether those transactions reproduce the same state root as the current chain tip.
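The replay-and-compare check can be sketched with a deliberately simplified toy model. Real Ethereum executes transactions in the EVM and commits to state with a Merkle Patricia trie hashed with Keccak-256; here, as an assumption for illustration only, state is a dictionary of balances, transactions are plain transfers, and the "state root" is a SHA-256 hash of the sorted state. All function names are hypothetical.

```python
import hashlib
import json


def toy_state_root(state):
    """Hash a canonical encoding of the toy state (stand-in for a state root)."""
    encoded = json.dumps(state, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


def apply_transaction(state, tx):
    """Apply a toy transfer (sender, recipient, amount) to the state in place."""
    sender, recipient, amount = tx
    if state.get(sender, 0) < amount:
        raise ValueError("insufficient balance")
    state[sender] = state.get(sender, 0) - amount
    state[recipient] = state.get(recipient, 0) + amount


def replay(genesis_state, blocks):
    """Replay every transaction of every block, starting from genesis."""
    state = dict(genesis_state)
    for block in blocks:
        for tx in block:
            apply_transaction(state, tx)
    return state


def history_is_authentic(genesis_state, blocks, trusted_tip_root):
    """Check that replaying the dataset reproduces the trusted tip root."""
    try:
        state = replay(genesis_state, blocks)
    except ValueError:
        return False
    return toy_state_root(state) == trusted_tip_root


# Made-up history from one provider, plus a tampered copy from another.
genesis = {"alice": 10}
honest = [[("alice", "bob", 4)], [("bob", "carol", 1)]]
tampered = [[("alice", "bob", 5)], [("bob", "carol", 1)]]
tip_root = toy_state_root(replay(genesis, honest))
```

Because the operator already trusts the current chain tip, a single honest provider suffices: any altered dataset fails the comparison, which is the intuition behind the 1/n requirement.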
There are multiple approaches for preserving history. Each of these should probably be deployed in parallel to maximize the likelihood of preservation.
The remaining implementation challenges are more social than technical. The Ethereum community needs to coordinate around specific implementation details so that they can be directly integrated into each node client. In particular, performing a full sync from Genesis (not a snap sync) will then require retrieving the history from history providers instead of Ethereum nodes. These changes do not technically require a hard fork, and so they could be implemented sooner than Ethereum’s next hard fork, Pectra.
All of these history preservation approaches could also be used by L2s to preserve the blob data they post to mainnet. Compared to history preservation, blob preservation is 1) more difficult due to the total data size being much larger and 2) less important because blobs are not necessary for replaying mainnet history. However, blob preservation is still necessary for each L2 to replay its own history. Thus, some form of blob preservation will be important to the Ethereum ecosystem as a whole. Additionally, if L2s develop robust blob storage infrastructure, they may also be able to easily store L1 history data.
It is helpful to directly compare the datasets stored by various node configurations before and after EIP-4444. Figure 7 shows the storage burden across Ethereum node types. State Data is accounts and contracts, History Data is blocks and transactions, and Archive Data is a set of optional data indices. The byte counts in this table are based on a recent reth snapshot, but numbers for other node clients should be roughly comparable.
Figure 7: Storage burden across Ethereum node types
To put this into words,
Finally, there are some additional EIPs that would limit the history growth rate rather than merely accommodating the current rate. This would be helpful both in the short term for staying within network IO constraints and in the long term for staying within storage constraints. Although EIP-4444 is still necessary for the long term sustainability of the network, these other EIPs would help Ethereum scale more efficiently in the future:
These EIPs are easier to implement than EIP-4444, so they may be useful as a short-term stopgap until EIP-4444 is ready for production.
The goal of this article is to develop a data-driven understanding of 1) how history growth works and 2) what can be done to solve it. Much of the data in this article has traditionally been difficult to access, and so we hope that making it available will offer some novel insight into the history growth problem.
History growth has not received enough attention as a bottleneck on Ethereum scaling. Even without gas limit increases, Ethereum’s current conventions for preserving history will force many nodes to upgrade their hardware within a few years. Luckily, this is not a difficult problem to solve. There is already a clear solution in EIP-4444. We believe the implementation of this EIP should be expedited in order to make room for future gas limit increases.
If you are excited about research in Ethereum scaling, reach out to storm@paradigm.xyz and georgios@paradigm.xyz. We’d love to hear how you are thinking about the problem and potentially collaborate. The data and code used for this article can be found on GitHub here.
Thank you to Thomas Thiery, Tim Beiko, Toni Wahrstaetter, Oliver Nordbjerg, and Roman Krasiuk for review and feedback. Thank you to Achal Srinivasan for the Figure 1 and Figure 7 graphics.
Copyright © 2024 Paradigm Operations LP All rights reserved. “Paradigm” is a trademark, and the triangular mobius symbol is a registered trademark of Paradigm Operations LP