Cutting blockchain tail with NeoGo
The problem of the ever-growing blockchain state is well-known. Even though some people want to always have a complete archive available on their nodes, in many cases it’s not needed, and applications can work just fine using only some subset of recent data. But what and how can be done to provide that depends both on the protocol and implementation. Luckily, NeoGo has everything needed, and in this article we’ll explain what your options are and what to expect from them.
State components and options
The state stored in the DB can be roughly divided into several main categories:
- contract storage
- block headers
- block transactions
- MPT data
- transfer data
- auxiliary data (depends on implementation, but negligible in volume)
Contract storage is the essence of the system. While it grows over time, this growth reflects the use of the system and can’t be avoided. Its growth at the same time is quite modest and mostly associated with new accounts/contracts because shuffling the existing assets between old participants only changes the records in the DB but does not create them.
Block headers are always stored in the DB, that’s where the block chain comes from, so they’re crucial. In Neo N3 a header is roughly 700 bytes, so for some 1–2M of blocks it’s not a lot (700–1400 MB), but of course, the number inevitably grows over the years of blockchain life.
What’s more interesting is transactions that are a part of a complete block. Neo N3 has two important protocol parameters related to them: MaxValidUntilBlockIncrement and MaxTraceableBlocks. MaxValidUntilBlockIncrement is tightly related to transaction’s ValidUntilBlock field that makes transaction incorrect after the height specified there; it specifies the maximum ValidUntilBlock value difference from the current blockchain height for a transaction to be considered correct. This is important for transaction validity checks, any new incoming transaction technically should be checked for duplicates only against the set of transactions from the previous MaxValidUntilBlockIncrement number of blocks, older ones just can’t have this transaction. On current Neo public networks (mainnet/testnet) this parameter is set to 5760, which is one day of 15-seconds blocks.
But this doesn’t mean any older transactions can be removed. When a transaction is run, it can access some data from the older blocks/transactions and the depth at which these requests can look is controlled by the MaxTraceableBlocks network setting. Neo Legacy never had that, it wasn’t possible with the UTXO model, but for N3 that’s one of the core settings. On current public networks it’s set to 2102400, which is one full year of blockchain life with 15-seconds blocks.
The key N3 protocol feature coming from the combination of these two parameters is that once a transaction becomes older than MaxValidUntilBlockIncrement and MaxTraceableBlocks relative to the current height, it can be safely deleted from the storage. Given the difference in defaults and the expected usage scenarios, usually it’s the MaxTraceableBlocks setting that is the most important regarding the ability to drop old data.
Another and in fact very important source of data is Merkle-Patricia Trie that in some ways duplicates contract storage because it stores the same key-value set but in a radically different way. It allows computing a single hash value known as “state root” corresponding to a block. C# node implementation doesn’t have it by default, it’s implemented by the StateService plugin that needs to be installed, but NeoGo team believes that this data is so important (allowing to detect any state inconsistencies between nodes quickly) that MPT is always calculated and stored there without an option to turn it off (yes, it works even when we’re setting new performance records).
Given that any block does some state changes, MPT also changes with every block. Each key-value pair change affects all MPT nodes from the leaf changed to the top, for the current tries it’s roughly 10 nodes. This means that many new nodes get created with every block and if you don’t do anything, the DB grows very quickly. In fact, when this functionality was introduced into Neo Legacy (0.76.0 version on NeoGo timeline) the DB size exploded more than 2-fold, and very soon everyone realized that storing all of MPT data is quite expensive.
Removing any of this data is not easy, any trie for any block has some nodes of its own and also references a big set of nodes from previous block’s MPT, while some nodes can be reused even in the same MPT. Back then, the NeoGo team did some explorations around garbage collection schemes, but they performed very poorly, rendering the system hardly usable. The main problem was that we needed to do a full MPT traversal during the GC cycle and it can’t be done quickly. C# developers at the same time tried reference counting; and this approach turned out to be more viable, resulting in the implementation of KeepOnlyLatest option (NeoGo introduced it in version 0.78.2). This option makes the node store only one MPT (instead of the whole history of MPTs), in many ways like the node stores only the latest contract storage data. The DB size dropped back almost to where it had been before state tracking was implemented (but Legacy had UTXO, so many transactions never changed contract storage).
The same option is available in the current NeoGo versions, and its logic hasn’t changed, KeepOnlyLatest allows to drop all of old MPT data but the latest one. At the same time, additional logic was implemented in NeoGo 0.98.2, more on that below.
Tracking asset transfers is also one of the key things expected from most nodes. C# node provides that with TokensTracker plugin, which tends to be used a lot, while NeoGo has that built-in and always enabled. The log of transfers grows as transactions get accepted and processed, so we can’t ignore it too.
The evolution of RemoveUntraceableBlocks
The main option (not counting MPT-specific KeepOnlyLatest) to deal with old data was introduced in version 0.92.0 of NeoGo. It was named RemoveUntraceableBlocks, and it does exactly what you might expect of it — removes block data that is unreachable (as of MaxTraceableBlocks setting) from the current transactions.
As you might already have noted, this only allowed removing one part of the old DB data, but we also had the KeepOnlyLatest option to deal with MPT data, so the only thing left in this case was transfer data. While it was kept forever, it at the same time wasn’t very big in volume, at least not requiring a lot of attention for the chains we had back then, so this combination was sufficient in many ways.
Things have changed over time, the chains grew, but more importantly NeoGo has also introduced another extension in version 0.97.3 that allows synchronizing contract storage state over the P2P network (P2PStateExchangeExtensions option). This extension relies on MPT data and another NeoGo extension that puts state root hash into the block header (just to remind everyone — none of these can be used on current public networks). The problem was that this functionality requires keeping at least three MPTs at different heights, so it’s inherently incompatible with the KeepOnlyLatest setting. But, of course, we wanted to drop old MPT data from the storage while still using these extensions.
That spurred another round of experimentation, but the problem turned out to be tougher than originally expected. It was trivial to keep all MPT data, it was very easy to keep just one MPT, but keeping a set of MPTs was a real challenge. They couldn’t be reference-counted (that’d mean traversing the whole MPT for every block to increment all counters), and our previous garbage collection attempts failed to reach adequate performance.
It required us several months of trying various schemes and optimizations to finally find the way to do this. All required DB changes and logic were released in NeoGo version 0.98.2. The base of this approach is still garbage collection, but the changes made to the DB allowed us to avoid MPT traversals even for GC cycle. We’ve also implemented garbage collection for transfer data, so starting from version 0.98.2 NeoGo can now drop all obsolete data in any configuration.
Setting RemoveUntraceableBlocks now activates all of the logic responsible for dropping old data. There is also an associated configuration parameter, GarbageCollectionPeriod that defaults to 10000 and specifies how often (in blocks) garbage collection is performed.
What to expect from these options on public networks? That’s what we wanted to find out, so a test was set up using NeoGo 0.98.3 on two machines using block dumps from mainnet (1508926 blocks) and testnet (1599728 blocks). The test was performing offline synchronization from these dumps using the `db restore` command.
The two machines used are (both Linux-based):
- Core i7–8565U with 16 GB of memory
- Ryzen 9 5950X with 64 GB of memory
During our long quest for an optimal GC solution, we’ve also noticed that different databases react differently to variations in loads and schemes. Thus, we’ve performed the test on both databases NeoGo supports: LevelDB and BoltDB.
And we tested the following node configurations:
- default one (keeps everything)
- KeepOnlyLatest enabled (only one MPT stored, everything else untouched)
- RemoveUntraceableBlocks enabled (and default 2M+ MaxTraceableBlocks, for both networks this means that nothing is actually removed, but the DB is prepared for removal at 2M+ heights)
- MaxTraceableBlocks set to 100K blocks (and RemoveUntraceableBlocks enabled)
- MaxTraceableBlocks set to 10K blocks (and RemoveUntraceableBlocks enabled)
Warning! Changing MaxTraceableBlocks can make your node incompatible with the network, it’s a network-wide protocol parameter, not something to be changed by a node. Here we’re doing it just for the test; but even in this case it wasn’t possible for us to test 10K MaxTraceableBlocks setting for the testnet, some transaction there looks back 22K blocks, so it fails and then leads to a state that is incompatible with the subsequent transactions.
- KeepOnlyLatest (KOL)
- RemoveUntraceable (RUB)
- MaxTraceableBlocks (MTB)
LevelDB/Ryzen 9 5950X
BoltDB/Ryzen 9 5950X
LevelDB/Ryzen 9 5950X
BoltDB/Ryzen 9 5950X
First of all, we can clearly see that all options work as expected with respect to the database size, removing all MPTs but one or keeping less and less of a tail makes the DB much more compact. The difference will only grow with time because the chain dumps we used don’t have even one full year of blockchain life.
At the same time, there is a clear performance overhead for garbage collection or reference counting which in many cases can be compensated by the effectiveness of the smaller DB. Enabling RemoveUntraceableBlocks with the network settings we have now clearly hits the performance in all cases, but if it was allowed to cut the tail more aggressively it could not only save disk space, but also save time.
Regarding the DB difference, LevelDB is still a good choice for archive nodes that want to keep everything, BoltDB tends to use more disk space and is slower when it comes to write-intensive loads. However, it performs significantly better in any tail-cutting scenario and we expect space-preserving options to be used a lot in production. Also it’s better for read-intensive loads in general and while what we’re doing here is just processing the chain, any real node will also serve a lot of other requests related to P2P/RPC activity and we expect these to be handled faster with BoltDB (like it works better in our TPS benchmarks).
As of today, RemoveUntraceableBlocks does not change much for public networks. This will definitely change as these chains grow, but private ones with more aggressive MaxTraceableBlocks settings already can win a lot by using it. And KeepOnlyLatest is still there and still is very effective (especially if used with BoltDB). So, all the options are there to slim down your node and they will be more and more relevant in future.