It’s been a while since we made our preview4 benchmark post. Since then we’ve gone through a number of RC releases and finally reached the mainnet launch earlier this month. We have been busy with protocol improvements and stabilization, while performance has been less of a concern. Now that all of that is done, it’s time to see what the current status of Neo nodes is and what we can do to make them faster.
Test setup
We do benchmarks using neo-bench, and it hasn’t changed much since the last post except for one small thing: we had to optimize the benchmark itself to reduce its own CPU consumption. That’s because we got to the point where this tool started affecting the results (processing huge blocks is hard not just for nodes). Not in a big way, but still noticeably, so fixing it was important. Apart from that, it only received some adjustments for the protocol changes.
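The general trick for keeping a load generator cheap is to move the CPU-heavy work (building and serializing transactions) out of the measured window, so that during the run the tool mostly pushes already-prepared bytes. Below is a minimal illustrative sketch of that idea in Go; the function names, the JSON payload and the RPC URL are made up for the example and are not neo-bench code.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"time"
)

// prepare builds and serializes all request payloads up front, before the
// benchmark clock starts, so no CPU time is spent on encoding during the run.
func prepare(n int) [][]byte {
	payloads := make([][]byte, n)
	for i := range payloads {
		// A stand-in for "create and sign a transaction, then serialize it".
		payloads[i] = []byte(fmt.Sprintf(`{"jsonrpc":"2.0","id":%d,"method":"sendrawtransaction","params":["..."]}`, i))
	}
	return payloads
}

// run submits the prepared payloads and measures only the submission itself.
func run(url string, payloads [][]byte) time.Duration {
	start := time.Now()
	for _, p := range payloads {
		resp, err := http.Post(url, "application/json", bytes.NewReader(p))
		if err != nil {
			continue // a real tool would count and report errors
		}
		resp.Body.Close()
	}
	return time.Since(start)
}

func main() {
	payloads := prepare(1000) // encoding happens here, outside the measurement
	fmt.Println("submission took:", run("http://localhost:10332", payloads))
}
```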
Hardware-wise, testing was performed on a Ryzen 9 5950X machine with 64 GB of RAM and an SSD. On the software side we used the official Neo C# node 3.0.2 and NeoGo 0.97.2, both orchestrated by neo-bench at revision 09fe7c2bd587a12a44410e12674837be0c07523e. Both nodes used LevelDB as a storage backend.
Single node
Just a friendly reminder: our regular single-node test runs with a one-second block interval and a mempool capacity of 50,000 transactions. So, theoretically, the limit for this setup is 50K TPS, and Neo nodes are getting closer to it.
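The theoretical limit is just the mempool capacity divided by the block interval. A quick back-of-the-envelope sketch in Go, using the benchmark settings quoted in this post (the helper function is purely for illustration); the same arithmetic gives the 10K figure for the four-node setup discussed later:

```go
package main

import (
	"fmt"
	"time"
)

// theoreticalTPS returns the upper bound on transactions per second for a
// chain that can pack at most mempoolCap transactions into a block produced
// every blockInterval.
func theoreticalTPS(mempoolCap int, blockInterval time.Duration) float64 {
	return float64(mempoolCap) / blockInterval.Seconds()
}

func main() {
	// Single-node run: 50,000-transaction mempool, one-second blocks.
	fmt.Printf("single node limit: %.0f TPS\n", theoreticalTPS(50000, time.Second)) // 50000 TPS
	// Four-node run: same mempool capacity, five-second blocks.
	fmt.Printf("four node limit:   %.0f TPS\n", theoreticalTPS(50000, 5*time.Second)) // 10000 TPS
}
```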
Both nodes have substantially improved their single-node performance, with the C# implementation providing an average of 7900 TPS and NeoGo 30300 TPS; that’s 48% and 55% more than what was observed in the preview4 benchmark, respectively. There is a noticeable difference in how this was achieved, though.
In the preview4 release, the C# node experienced occasional drops in per-block TPS due to fewer transactions being packed into a single block and the inter-block interval spiking to 2 seconds or more. That is no longer the case: the node packs around 8000 transactions into every block throughout the whole test and delivers these blocks on time, deviating from the perfect one second by a mere 50–70 ms.
The NeoGo node, however, still shows the tick-tock pattern on its graph for the same reasons as in preview4, although the line has moved up somewhat and its amplitude has decreased. More importantly, it no longer shows the block-time spikes that could be seen with preview4; it is now even closer to the target than the C# node, adding just 20–40 ms of overhead. The line is somewhat shorter than the C# one simply because the benchmark runs out of transactions sooner (1,000,000 of them fit into 32 blocks).
Resource utilization simply reflects the other plots: both nodes tend to produce smoother lines there, with NeoGo taking more cores for its job on average. The memory usage profile is somewhat more interesting, though, if we compare it with the preview4 one for NeoGo: even though its TPS metrics have improved, its memory utilization has actually dropped by around 30% at the same time.
Four nodes
This network runs with a five-second interval between blocks, so its theoretical limit is 10K TPS. And we’re getting really close.
NeoGo now reaches 8800 TPS on average, while the C# node delivers around 1000 TPS. The C# node still experiences some problems with blocks containing 50K transactions (the whole mempool), so the pattern is very similar to the one seen in preview4, and the 10% difference in numbers is actually smaller than the variation between successive benchmark runs for this node.
But things have changed radically for the NeoGo node: once it warms up, it packs a little less than 50K transactions into every block until the end of the transaction stream, and it does so with a typical block interval of 5100–5150 ms. The combination of these factors drives per-block TPS up into the 9600–9700 range for most of the test duration, with the average being somewhat lower only because of the initial and final blocks.
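To see where the 9600–9700 range comes from, per-block TPS is simply the number of transactions in a block divided by the time elapsed since the previous block. A small sketch in Go; the sample values are taken from the description above rather than from an actual run log:

```go
package main

import (
	"fmt"
	"time"
)

// perBlockTPS computes the effective TPS of a single block: the transactions
// it carries divided by the time elapsed since the previous block.
func perBlockTPS(txCount int, sincePrevBlock time.Duration) float64 {
	return float64(txCount) / sincePrevBlock.Seconds()
}

func main() {
	// Roughly what the NeoGo node does after warm-up in the four-node run:
	// a bit under 50K transactions per block, 5100–5150 ms between blocks.
	fmt.Printf("%.0f TPS\n", perBlockTPS(49500, 5100*time.Millisecond)) // ~9706 TPS
	fmt.Printf("%.0f TPS\n", perBlockTPS(49500, 5150*time.Millisecond)) // ~9612 TPS
}
```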
CPU utilization patterns are pretty much the same as they were in preview4, but memory consumption has decreased substantially for both nodes.
Conclusion
As we can see, a single Neo N3 node can now provide about 30K (or 3W if you prefer) TPS in the NeoGo implementation, and 8.8K is no longer a big problem in a networked scenario. While these raw numbers are measured in a somewhat sterile environment, they are still important for understanding where the limits are and what can be expected of a real network. The Neo N3 protocol has a lot of potential in it, and we’re ready to deliver this potential to our users.