10K TPS on Neo and beyond

Neo SPCC
Sep 17, 2020 · 5 min read

The figure of 10K TPS has long been known to be theoretically possible for Neo, but claiming to know kung fu is not the same as showing it. That's why, after the Neo 3 preview3-compatible release of NeoGo 0.91.0 in August and the associated performance measurements, we asked ourselves: can we do better than that, and how far away is that 10K TPS? NeoGo had never been truly optimized for TPS before (we had a lot of other things to do), so we started looking into ways to improve it, and now we have something to show.

Testing setup

We’re constantly improving neo-bench which is the tool we use for benchmarking and there have been some changes made since the preview3 version of it. Apart from the new functionality provided (such as building C# node from the source code), there were fixes for the RATE mode and some other corrections. It also now saves a bit more data and can draw more graphics for you.

Meanwhile, one change stands out, even though it's just a one-liner. We configure the network for performance testing, which means that regular policy limits are too low for us, but for quite some time we were fine with the limit of 40K transactions per block. Then, after some optimizations, we saw the node fill one block after another with exactly 40K transactions, and this policy became a bottleneck. Of course, we tried to increase that value, but it turned out that the current Neo 3 protocol simply doesn't allow values above 64K. And even if we were to break the protocol here, we could only go twice as high before hitting another limit, so it's not that easy. Besides, an increase in transactions per block is not the same as an increase in transactions per second, because these huge blocks incur more processing overhead. Thus, for now we're just setting it to 64K.
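To put that cap in perspective, here is a back-of-the-envelope Go sketch of the TPS ceiling implied by a per-block transaction cap at various block times (the block times are illustrative picks, not our test configuration):

```go
package main

import "fmt"

func main() {
	const maxTxPerBlock = 65535 // the 64K protocol ceiling discussed above
	for _, blockTime := range []float64{1, 5, 15} { // seconds; illustrative values
		fmt.Printf("block time %2.0fs -> TPS ceiling %6.0f\n",
			blockTime, maxTxPerBlock/blockTime)
	}
}
```

The cap alone doesn't tell the whole story, though; as noted above, bigger blocks are also slower to process, so the real ceiling sits below this simple division.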

Moreover, we no longer experiment with the VerifyTransactions configuration option. After all the changes made to the node, it doesn't affect performance enough to matter (even though it still does what it's intended to do). At the same time, we've started experimenting with different DB backends (we support four of them). Even though the main focus has always been on LevelDB, which is the default, BadgerDB showed some interesting results, so we added it into the mix.
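For illustration, here is a rough sketch of what switching between two of these backends can look like in Go, using the goleveldb and Badger libraries directly. This is a simplified stand-in, not NeoGo's actual storage layer:

```go
package main

import (
	"fmt"

	badger "github.com/dgraph-io/badger/v2"
	"github.com/syndtr/goleveldb/leveldb"
)

// openStore opens a key-value backend by name and returns a close
// function. A real node wraps these behind a common storage interface.
func openStore(backend, path string) (func() error, error) {
	switch backend {
	case "leveldb":
		db, err := leveldb.OpenFile(path, nil)
		if err != nil {
			return nil, err
		}
		return db.Close, nil
	case "badgerdb":
		db, err := badger.Open(badger.DefaultOptions(path))
		if err != nil {
			return nil, err
		}
		return db.Close, nil
	default:
		return nil, fmt.Errorf("unsupported backend %q", backend)
	}
}

func main() {
	closeFn, err := openStore("badgerdb", "./chain")
	if err != nil {
		panic(err)
	}
	defer closeFn()
}
```

The point of hiding backends behind one interface is that everything above the key-value layer stays untouched when the backend changes, which is what makes this kind of A/B benchmarking cheap.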

Other than that, it's the same tool with the same single-node and four-node setups running on a commodity laptop with an i7-8565U CPU and 16 GB of RAM. For this post we've used commit efb3856252fcc924de9c7d6b4d75b78478f6816c of neo-bench and tested version 1dc93216eeca06880f10e435dd56a32840530e7e of NeoGo (which has some post-preview3 functional changes).

Single-node performance

We’ve concentrated on the 30 worker threads mode because the 10 worker threads mode tends to underutilize the node a bit and the ‘100’ mode usually doesn’t add much. Below is what we have for a single node with 1M transactions pushed into it:

Average TPS values for the test are 10,374 for LevelDB and 11,981 for BadgerDB. But if the curve shape looks a little strange to you, you're not alone, and here's why it happens:

The node starts packing transactions into blocks, and by the third block it hits the 64K limit we talked about earlier. It can't add more transactions to a block, but neo-bench worker threads keep pushing transactions into the mempool (which has a 500K capacity in this mode), and it absorbs them until the workers run out of transactions (they only have 1M of them) around block 11–12. That frees up some resources, and the node starts packing the same 64K transactions into blocks faster, which leads to an increase in the per-block TPS value. Look at the CPU utilization: there is an obvious drop when all remaining transactions settle in the mempool:

So what we have is a very capable and scalable RPC and mempool combination that validates and stores transactions faster than the consensus and persisting logic can create and process blocks. Still, that's 1M transactions packed into blocks with a very nice resulting average TPS, and we now know exactly where to look in the endless quest for even better performance.
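To make that dynamic concrete, here is a minimal Go sketch of the worker/mempool interplay described above: a fixed pool of sender goroutines pushes pre-signed transactions into a capacity-bounded pool until they run out. All names here are hypothetical stand-ins, not neo-bench or NeoGo code:

```go
package main

import (
	"fmt"
	"sync"
)

// pool is a deliberately simplified mempool: the only behavior modeled
// here is the capacity bound; the real pool also verifies transactions
// and orders them by fee.
type pool struct {
	mu       sync.Mutex
	capacity int
	txs      map[[32]byte][]byte // tx hash -> serialized transaction
}

func (p *pool) add(hash [32]byte, tx []byte) bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	if len(p.txs) >= p.capacity {
		return false // pool is full; a sender would have to retry
	}
	p.txs[hash] = tx
	return true
}

// runWorkers drains a set of pre-signed transactions with a fixed
// number of concurrent senders, mimicking the 30 worker threads mode.
func runWorkers(workers int, txs map[[32]byte][]byte, p *pool) {
	queue := make(chan [32]byte, workers)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for h := range queue {
				p.add(h, txs[h]) // a real sender submits via RPC instead
			}
		}()
	}
	for h := range txs {
		queue <- h
	}
	close(queue)
	wg.Wait()
}

func main() {
	p := &pool{capacity: 500000, txs: make(map[[32]byte][]byte)}
	txs := make(map[[32]byte][]byte, 3)
	for i := byte(0); i < 3; i++ {
		txs[[32]byte{i}] = []byte{i} // stand-in transactions
	}
	runWorkers(30, txs, p)
	fmt.Println("pooled transactions:", len(p.txs))
}
```

As long as the pool admits transactions faster than block creation consumes them, the senders never stall, which is exactly what the CPU utilization graph shows.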

Four-node performance

But a single node is a single node, so let's also look at the four-node setup, which has improved too:

We have an average of 1,759 TPS for LevelDB and 2,000 for BadgerDB, but we have to admit that these values are less stable, and there is a simple reason for that. We run the tests on a single machine with just four physical (plus four HT) cores, and in this mode we run five nodes there (four consensus nodes plus one RPC node) that accept RPC requests, distribute transactions via P2P, approve blocks in consensus, and process them. All of these nodes try to get as much CPU power and disk I/O as they can, but the machine is not that powerful. Yet the improvements are still clear, with 2K TPS reached using BadgerDB.

RATE mode stability

Remember that shaky TPS graph for the RATE mode with preview3-compatible NeoGo that jumped from 18 Tx/s to 38 Tx/s and back with every block? Well, it's not a problem any more. Now, with 50 RPS, it looks like this for a single node:

And like this for four nodes (not very different, really):

Thus, this problem has been solved as well; the behavior is much more stable across different fixed rates.
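For reference, the pacing behind a fixed-rate sender is straightforward; here is a minimal sketch (sendAtRate and the send callback are hypothetical stand-ins, not neo-bench code):

```go
package main

import "time"

// sendAtRate spaces transactions evenly: at 50 RPS the ticker fires
// every 20ms and one transaction goes out per tick.
func sendAtRate(rps int, txs [][]byte, send func([]byte) error) {
	ticker := time.NewTicker(time.Second / time.Duration(rps))
	defer ticker.Stop()
	for _, tx := range txs {
		<-ticker.C
		_ = send(tx) // a real benchmark records errors and latency here
	}
}

func main() {
	noop := func([]byte) error { return nil } // stand-in for an RPC call
	sendAtRate(50, [][]byte{{0x01}, {0x02}}, noop)
}
```

With the sender holding the rate perfectly steady, any remaining jitter in the TPS graph comes from the node itself, which is what makes this mode useful for spotting instability.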

Conclusion

  • 10K TPS on Neo is a reality for a single node
  • 2K TPS is a reality for a four-node setup (and probably more if a better machine is used to run these tests)
  • there is still some potential for improvement, but now it mostly depends on block processing optimization rather than on transaction validation/pooling

Nevertheless, the most important thing is probably that all of these improvements would have been impossible without an open benchmarking platform that provides reproducible results and can work with any node implementation. Because that is exactly what makes the difference between thinking you are and knowing you are.
