Time flies fast, and a number of important things have happened since our previous post on benchmarking. The Neo project went through the v3.0.0-preview2 and v3.0.0-preview3 releases of the Neo node, followed by the compatible NeoGo 0.90.0 and 0.91.0 releases. With the NeoGo 0.90.0 release, we started making neo-bench compatible with the Neo 3 versions of both the C# and Golang nodes and began testing preview2-compatible nodes. Preview3 and 0.91.0, however, came out so quickly that we didn’t manage to publish those results. Now we have preview3 results to share, which are much more relevant, so dive in.
Testing setup and methods
We’re using the same single-node and “4 nodes + RPC” setups as before, with the same single-threaded and multithreaded modes for pushing transactions into the network. However, Neo 3 required some adjustments to both the initial setup and the transactions sent.
As Neo 3 now provides a substantial amount of GAS in the genesis block, we no longer need to push thousands of blocks into the chain before deploying and calling contracts. To make the chain more convenient to work with, we add one block that moves all NEO and almost all GAS from the default multisignature account to a simple one. Then, as we’re stress-testing the node, we also tune the default network limits, such as the maximum number of transactions per block and the maximum block size; otherwise the TPS value would be severely limited.
Test transactions have also changed considerably. For Neo 2 they were dummy “PUSH1” invocations that did nothing useful; now we create transactions doing real NEP5 transfers of NEO (though each transfer goes to the same account that sends it).
As before, we poll nodes via RPC to get new blocks for subsequent analysis and TPS calculations while sampling CPU and memory usage at the same time.
To compare the two implementations, we used the C# node v3.0.0-preview3 with the RPC Server plugin installed and the NeoGo node 0.91.0 (which is v3.0.0-preview3 compatible). The same node versions were used for both consensus and RPC.
Nodes are set up with default configurations except for the mempool size and the time per block. For the single-node setup we use a larger mempool of 500,000 transactions with 1 second per block, and for four nodes we use the default mempool of 50,000 transactions but with 5 seconds per block. This lets consensus nodes produce more blocks of smaller size without overflowing the mempool along the way.
We’ve also added one special network configuration compared to the previous test: four C# consensus nodes with one NeoGo RPC node. We added it because we’d noticed that the C# node’s RPC subsystem might be a bottleneck in the four-node network setup.
At the same time, we know that NeoGo is a bit overly cautious in some aspects. For example, by default it verifies in-block transactions as part of the overall block check. This obviously adds some overhead, while the C# node never does it. So we measure one more experimental setup here: NeoGo with the “VerifyTransactions: false” setting in its configuration (“no VT”). Please note that this does not affect the regular verification of transactions accepted by the node via RPC or P2P and put into the mempool; the setting only changes the way blocks are verified.
To better compare with the previous Neo 2 stress tests, we run the same set of test cases:
- Single-node consensus: 10 workers (Case 2, C# node and Golang node with/without transaction validation)
- Single-node consensus: 30 workers (Case 2, C# node and Golang node with/without transaction validation)
- Single-node consensus: 100 workers (Case 2, Golang node with/without transaction validation)
- Single-node consensus: 25 Ops/s (Case 2, C# node and Golang node with/without transaction validation)
- Single-node consensus: 1000 Ops/s (Case 2, Golang node with/without transaction validation)
- Four-node consensus: 10 workers (Case 1, C# nodes with C#/Golang RPC node and Golang nodes with/without transaction validation)
- Four-node consensus: 30 workers (Case 1, C# nodes with C#/Golang RPC node and Golang nodes with/without transaction validation)
- Four-node consensus: 100 workers (Case 1, C# nodes with Golang RPC node and Golang nodes with/without transaction validation)
Results for single-node consensus test: 10 workers
In this case, the Golang node with the standard configuration shows an average of 2974.564 TPS, which is almost 1.6 times higher than the C# node’s average of 1887.514 TPS. Compared to the similar Neo 2 test, the TPS of both the Golang node and the C# node has doubled, which is quite remarkable and shows that the Neo protocol improvements are real. At the same time, the NeoGo node with in-block transaction validation disabled more than doubles even these numbers, reaching an average of 6962.407 TPS. Still, the C# node’s blocks are more stable, oscillating slightly around the average value, while NeoGo blocks vary more in size and effective TPS.
Looking at CPU and memory utilization, we can see that the C# node only uses around 25–30% of the CPU, so there is significant potential for improvement. At the same time, it does much better at memory management, and the results for the Go node with and without in-block transaction validation actually give some clues about what could be improved there.
Results for single-node consensus test: 30 workers
Increasing the number of workers doesn’t squeeze any additional juice out of the nodes: NeoGo shows slightly lower results (2764.385 TPS for the standard configuration and 6396.179 TPS with in-block transaction validation disabled), and the C# node’s performance also drops a little, to 1779.307 TPS.
CPU and memory usage patterns echo the previous test results.
Results for single-node consensus test: 100 workers
By default, the C# node’s RPC subsystem still can’t handle more than 38 worker threads simultaneously pushing requests to it, so only NeoGo takes part in this test. The results are almost the same as in the previous test, with a very slight improvement (6474.284 Tx/s with in-block transaction validation disabled and 2816.046 Tx/s with it enabled). Still, even the latter value is almost 1.5 times higher than what we had for the Neo 2 version of the node.
Results for single-node consensus test: 25 Ops/s
Obviously, after the workers mode pushing thousands of transactions per second, this 25 transactions per second test is nothing for any of the nodes; they easily handle it, staying close to the 25 TPS value. But again we can see that the C# Neo node is more stable: it shows very small spikes in the blocks produced, while NeoGo jumps from 18 Tx/s to 38 Tx/s and back with almost every block.
Even though CPU utilization is very close, NeoGo still uses a little more of it, and given that the load is fixed and very low, that’s not a compliment. Memory utilization is even worse, but that could at least partly be attributed to the difference between the Go and .NET runtimes: while the C# node starts at around 40 MB and then grows a little, NeoGo starts at around 60 MB and may then almost double this value depending on the configuration.
Results for single-node consensus test: 1000 Ops/s
Attempts to run tests with more than 50 Ops/s for the C# Neo node failed: the node still returned RPC errors in this mode and didn’t work properly. The Golang node can handle 1000 Tx/s pushed by one client, irrespective of the in-block transaction validation setting. That may come as no surprise, though, given that it successfully did that even back in the Neo 2 days.
Results for four-node consensus test: 10 workers
Things get more interesting here, as four nodes have to communicate to produce a block, and transactions need to reach all consensus nodes to be included in it. Here NeoGo gives a more than 3 times higher result in default configurations (864.146 tx/s vs 267.765 tx/s), and this time disabling in-block transaction validation doesn’t affect the result much: it’s just 878.3 tx/s. At the same time, the C# node’s results are easy to improve by using a more efficient RPC node (the NeoGo one); this mixed setup shows an average of 373.294 TPS.
But just browse through our previous article and compare all of this with the Neo 2 results: it’s about 2 times better for NeoGo and almost 6 times better for the C# node. Keeping in mind that Neo 3 is not even finished yet, it’ll definitely improve even further from these new numbers.
C# nodes connected to the Golang RPC node show high CPU usage and growing memory consumption, which is quite an interesting pattern. It may be related to the RPC node’s block processing, because the pure C# node setup is relatively stable in its resource utilization.
Results for four-node consensus test: 30 workers
Surprisingly, adding worker threads now actually degrades the results for the NeoGo node, which shows ≈717/812 tx/s with in-block transaction validation enabled/disabled. At the same time, it helps the C# node, which reaches 341.965 tx/s in the C# RPC node setup. With the NeoGo RPC node, it’s even about 50 tx/s higher than that.
As in the previous runs, the Golang node shows higher memory consumption.
Results for four-node consensus test: 100 workers
C# nodes connected to the C# RPC node were unable to sustain the load of 100 workers, but the configuration with four C# nodes connected to the Go RPC node is interesting to look at. It handles the load easily, although it doesn’t improve much compared to the previous results.
Conclusions
- Neo 3 is much better than Neo 2 in terms of raw performance as a protocol, which is confirmed by two independent implementations
- the maximum number of RPC connections can actually be configured for the C# node, and neo-bench should take that into account (we discovered this only recently)
- the C# node’s RPC subsystem can still be improved: using a NeoGo RPC node instead of the C# one gives better results
- C# node performance can definitely be improved, at least in single-node mode where it doesn’t use all of the available CPU power (and maybe the neo-project/neo#1507 changes are exactly what’s needed here)
- NeoGo certainly needs to get its memory consumption under control; it’s not leaking anything as far as we know, but it could still use a little less of this precious resource
- NeoGo should also improve its consensus process to produce blocks of roughly the same size under load more reliably, instead of jumping ten-fold
We hope these results and our updated benchmarking tool will help the whole Neo community improve the Neo 3 protocol as well as its implementations.