Up in the mountains: reaching 50K TPS with Neo
Careful readers might have noticed that NeoSPCC has spent some time improving Neo performance over the last two years. It started with the creation of a benchmarking tool and continued through the N3 preview phase, up to preview4, and then on to the final N3 version. While the performance we have now can be considered sufficient, we are still gradually introducing improvements into the NeoGo codebase. The recent 0.98.0 release contains a bunch of such changes, so let’s take a look at what can be achieved with it.
We’re using neo-bench at commit b0d9cb2c83fda5fef741e19df3497109754b8323 and stock NeoGo 0.98.0, running on the same machine as in the recent tests (Ryzen 9 5950X, 64 GB RAM, SSD). This time, though, we’ve decided to try more combinations of various parameters (more on that later).
First, we’ll repeat the same single-node test that was done for 0.97.2: 50K mempool, 10 worker threads and 1M transactions.
34800 TPS using LevelDB is already a 15% improvement over the previous result (if we’re to compare apples to apples), but BoltDB pushes even further, with an average of 38700 TPS. Back in the 0.97.2 test, we didn’t even show BoltDB results because there was nothing interesting to see: they were about the same as LevelDB’s. But recent changes to the node tend to favor this type of DB, so it’s nice to have something to choose from.
As you can see, we still have a sawtooth-like pattern in this mode, and we’ve known exactly what causes it since the preview4 days. Even though the averages keep improving, the mempool size still limits what can be achieved: the node can’t put more transactions into a block than it has in the mempool, and it can’t accept new transactions until the current ones are completely persisted. Technically, the maximum number of transactions we can fit into a block is 65,535, so given this limitation a natural choice for the mempool size would be 131,070, i.e. two full blocks.
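The sizing rule above is simple arithmetic. Here is a minimal Go sketch of it, assuming only the 65,535-transaction cap stated in the text; the constant and function names are ours for illustration and are not part of the NeoGo API.

```go
package main

import "fmt"

// maxTxPerBlock is the per-block transaction cap quoted in the text.
const maxTxPerBlock = 65535

// txForNextBlock returns how many transactions the next block can include:
// it is bounded both by the protocol cap and by the mempool contents.
func txForNextBlock(mempoolCount int) int {
	if mempoolCount < maxTxPerBlock {
		return mempoolCount
	}
	return maxTxPerBlock
}

// suggestedMempoolSize returns a mempool capacity that holds fullBlocks full blocks.
func suggestedMempoolSize(fullBlocks int) int {
	return fullBlocks * maxTxPerBlock
}

func main() {
	fmt.Println(txForNextBlock(50000))   // 50000: the mempool is the bottleneck
	fmt.Println(suggestedMempoolSize(2)) // 131070
}
```

With the 50K mempool from the first test, a block can never hold more than 50,000 transactions, well below the protocol cap.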
To ensure transactions reach the node at the maximum possible rate, we’ve also increased the number of threads pushing them via RPC from 10 to 30. We’ve used this setting before; depending on the machine and node version, it can either squeeze out a bit more performance or make no noticeable difference at all.
At the same time, we’ve noticed that even with a 50K mempool, the single-node test now finishes in less than 30 seconds. So we’ve decided to up the ante and push three times as many transactions into the node: 3,000,000 of them.
Below are the results with all of these changes (which mostly amount to proper node configuration for the test).
Proper node configuration changes the behavior radically, making it possible to reach 42400 TPS with LevelDB and 52100 TPS with BoltDB. The sawtooth pattern is gone, and most of the time the system stays close to 60K TPS, but there are still occasional dropouts that bring the average down.
These dropouts are caused by increased block time, which in turn comes from disk synchronization taking longer than expected. NeoGo flushes changes to disk every second; most of the time this takes a few milliseconds, but depending on the DB and disk state, some writes can take a second or even more, and that’s where the node slows down (recovering quickly afterwards). This is a clear area for future improvement, but an average of more than 50K TPS over 3M transactions still looks like a good result.
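The flushing behavior described above follows a common write-back pattern: changes accumulate in memory and are persisted in batches on a timer. The Go sketch below illustrates that pattern only. `Store`, `WriteCache` and the in-memory `mapStore` backend are hypothetical, reads are omitted, and it does not reproduce NeoGo’s actual storage code. In this sketch a slow `PutBatch` makes `Flush` slow while `pending` keeps growing; how a real node reacts to that (in NeoGo’s case, longer block times) depends on its design.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Store is a hypothetical stand-in for an on-disk key-value store.
type Store interface {
	PutBatch(batch map[string][]byte) error
}

// mapStore is an in-memory Store used for demonstration only.
type mapStore struct{ data map[string][]byte }

func (m *mapStore) PutBatch(batch map[string][]byte) error {
	for k, v := range batch {
		m.data[k] = v
	}
	return nil
}

// WriteCache buffers writes in memory and persists them in batches.
type WriteCache struct {
	mu      sync.Mutex
	pending map[string][]byte
	backend Store
}

func NewWriteCache(s Store) *WriteCache {
	return &WriteCache{pending: make(map[string][]byte), backend: s}
}

// Put records a change in memory only; it never touches the backend.
func (c *WriteCache) Put(k string, v []byte) {
	c.mu.Lock()
	c.pending[k] = v
	c.mu.Unlock()
}

// Flush swaps out the pending batch and writes it to the backend. On error
// the batch is dropped here; a real implementation would retry or stop.
func (c *WriteCache) Flush() error {
	c.mu.Lock()
	batch := c.pending
	c.pending = make(map[string][]byte)
	c.mu.Unlock()
	if len(batch) == 0 {
		return nil
	}
	return c.backend.PutBatch(batch)
}

// Run flushes every interval until stop is closed, then flushes once more.
func (c *WriteCache) Run(interval time.Duration, stop <-chan struct{}) {
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-t.C:
			_ = c.Flush()
		case <-stop:
			_ = c.Flush()
			return
		}
	}
}

func main() {
	store := &mapStore{data: make(map[string][]byte)}
	cache := NewWriteCache(store)
	stop, done := make(chan struct{}), make(chan struct{})
	go func() { cache.Run(time.Second, stop); close(done) }()
	cache.Put("block:1", []byte("a"))
	cache.Put("block:2", []byte("b"))
	close(stop) // Run's final flush persists everything still pending
	<-done
	fmt.Println(len(store.data)) // 2
}
```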
One more thing should be noted about memory usage. Overall, it’s about the same as in 0.97.2, but BoltDB shows much higher memory usage because it maps the whole DB file into memory. So while it can work with less physical RAM than the file size, for optimal performance it’s better to have enough RAM to fit the whole file.
For the four-node network, our previous test had clearly hit the limit imposed by its settings, so just increasing the mempool wouldn’t be enough. Hence, we’ve also reduced the block time in this setup from five seconds to just two. While that seems very aggressive (four nodes need to agree on a rather big block within this timeframe), it actually works fine.
We can reach 12500 TPS here using LevelDB and 15400 TPS with BoltDB. It’s much more interesting to see how multiple nodes cope with this load, and the behavior is more complex: blocks tend to vary in both size and time. Yet the system deals with it and processes all 3M transactions.
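To put the LevelDB and BoltDB numbers side by side, here is a small Go helper computing BoltDB’s relative gain for each of the three tests reported in this post; the labels are ours, and the numbers are the averages quoted above.

```go
package main

import "fmt"

// gainPercent returns how much faster b is than a, in percent.
func gainPercent(a, b float64) float64 {
	return (b/a - 1) * 100
}

func main() {
	results := []struct {
		name            string
		leveldb, boltdb float64
	}{
		{"single node, 1M tx", 34800, 38700},
		{"single node, 3M tx", 42400, 52100},
		{"four nodes, 3M tx", 12500, 15400},
	}
	for _, r := range results {
		fmt.Printf("%-20s BoltDB +%.1f%%\n", r.name, gainPercent(r.leveldb, r.boltdb))
	}
}
```

This puts BoltDB roughly 11% ahead in the first test and about 23% ahead in both 3M-transaction tests.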
We should emphasize that while the system we’re running these tests on is quite capable, it’s probably not top-notch hardware by 2021 (soon to be 2022) standards. It has no fancy GPU, it’s not overclocked, and even its amount of RAM is more than necessary: the machine just happens to have it, but all of these tests could easily fit into 16 GB. So properly configured N3 networks can deliver a lot of raw performance on commodity hardware: 50K TPS is possible for a single node and 15K TPS for four nodes.
A question we’re often asked is “where is the limit?” Frankly, we don’t know yet. We’ve done most of the obvious optimizations, and every additional percent costs more and more in development and testing effort, but the Neo N3 protocol itself only stabilized relatively recently, so there are still some known areas for improvement. In any event, Neo N3 networks are out there, accepting more and more transactions every day, and we know that Neo will handle them.