Under pressure: time-constrained dBFT

Neo SPCC
May 29, 2024


TL;DR

Have you ever wondered how fast the dBFT is? In our latest comprehensive study using the enhanced neo-bench tool, we pushed the NeoGo network to its operational limits to identify the minimum possible time that dBFT spends on block acceptance under various load conditions. The results are inspiring: a distributed cluster with 7 consensus nodes can achieve a 16K TPS rate with a median of ~140 milliseconds per block! Interesting, right?

We conducted a study where dBFT was put into extremely time-constrained conditions and stress-tested for liveness and throughput. Key takeaways of this study include:

  • Minimal Reachable Time Per Block Established: Through systematic testing, we pinpointed the lowest median time per block of ~140 ms for clustered environments (reaching ~16K TPS) and ~29 ms for local settings (reaching ~4K TPS). That is effectively the minimum possible block time that the system can function with.
  • Minimal Optimal Time Per Block Established: The minimum block time setting showed variable intervals in block production. Therefore, we identified the optimal minimum block times: 300 ms for clustered environments, which supported up to ~17.5K TPS, and 100 ms for local environments, which achieved up to 2.8K TPS. These settings robustly maintained system efficiency without compromising reliability or throughput.
  • Robust Under Stress: Even under high loads and extremely low block times, the network maintained significant stability and reliability. It efficiently handled blocks, achieving block times close to the target across both the resource-constrained local setup and the real over-the-internet cluster setup with inevitable and variable communication delays.

Overview

In our continuous effort to push the boundaries of the Neo N3 network, we have embarked on a focused exploration to identify the operational limits under varying block time constraints, utilizing our enhanced benchmarking tool, neo-bench. This detailed investigation is a key part of our strategy to enhance the network’s efficiency and robustness and pinpoint the thresholds where the network’s performance might degrade under pressure. Our primary goal was to investigate the dBFT behaviour and network liveness in time-constrained conditions to squeeze as much from the network as possible with frequently accepted blocks (minimizing acceptance delays for transactions).

The test was conducted using two distinct environments with NeoGo v0.105.1 and neo-bench, which was recently enhanced with new flags (including `msPerBlock`), to analyze the impact of block time variations, specifically:

1. Local Environment Testing:

  • Setup: Four consensus nodes and a single RPC node. All are hosted within Docker containers on a single machine (MacBook Air 2020 M1, 8 GB RAM, 512 GB SSD).
  • Purpose: This phase was designed to gain a preliminary understanding of the network’s behaviour in a resource-constrained environment with no real network communication latencies between the nodes. We aimed to establish a baseline for how the network handles transaction processing and resource utilization as we manipulated the MillisecondsPerBlock protocol setting in an extremely hardware-constrained environment.
  • Methodology: Testing began in neo-bench’s “worker” mode, which continuously pushes as many transactions to the network as the nodes can accept, to establish performance baselines at various block times. These tests were followed by “rate” mode runs that apply a constant, pre-set load to the network to see how changes in block time affect its ability to handle different load rates.

2. Cluster Environment Testing:

  • Setup: An expanded configuration involving eight physical machines (AMD Ryzen 5 3600, 64 GB RAM, 500 GB NVMe SSD) — seven consensus nodes and one RPC node combined with the neo-bench loader instance. Several data centers in two different countries were used.
  • Purpose: To emulate a more realistic, decentralized network scenario with network communication latencies. This setup was crucial for evaluating the dBFT liveness and network’s capability under distributed conditions that closely mirror actual usage and potential stress factors in a time-constrained block scenario.
  • Methodology: Similar to the local tests, we used worker mode to find the upper limits of performance and rate mode to assess how well the network managed higher transaction rates at minimal block intervals (a sketch contrasting the two modes follows this list).
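
The difference between the two load modes can be illustrated with a minimal Go sketch. This is not neo-bench’s actual code; `sendTx` is a hypothetical placeholder for submitting one pre-signed transaction via the RPC node:

```go
package main

import (
	"context"
	"sync"
	"time"
)

// sendTx stands in for submitting one pre-signed NEP-17 transfer through
// the RPC node; a real loader would issue an RPC request here.
func sendTx(ctx context.Context) error {
	return nil // placeholder
}

// workerMode keeps n workers busy: each goroutine pushes transactions
// back-to-back, so the network itself becomes the limiting factor.
func workerMode(ctx context.Context, n int) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for ctx.Err() == nil {
				_ = sendTx(ctx)
			}
		}()
	}
	wg.Wait()
}

// rateMode submits transactions at a fixed requests-per-second target,
// regardless of how quickly the network drains them.
func rateMode(ctx context.Context, rps int) {
	ticker := time.NewTicker(time.Second / time.Duration(rps))
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			go func() { _ = sendTx(ctx) }()
		}
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	workerMode(ctx, 10) // "worker" mode: 10 workers, maximum pressure
	// rateMode(ctx, 4000) // "rate" mode: constant 4K requests per second
}
```

In worker mode the achieved TPS reflects the network’s own ceiling, while in rate mode the loader holds the pressure constant so that block time behaviour can be compared across RPS levels.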

The primary goals of these exhaustive tests were to:

  • Identify the Breakpoint: Determine the highest load the network can sustain before performance becomes unacceptable with standard block time. This involved pushing the network to its operational limits to find where it starts to fail.
  • Minimize Block Time: Subsequently, decrease the MSPerBlock network setting under constant-rate pressure to find the minimum possible time per block that the network can provide.
  • Balance Performance and Latency: After identifying the network’s limits, the next step was to find an optimal balance between performance (transaction throughput) and latency (block time). This balance is crucial for ensuring that the network can handle real-world applications efficiently without compromising on speed or reliability.

These experiments are critical for the ongoing development of the metadata handling subsystem for NeoFS networks, but they’re also relevant for other Neo networks.

Benchmark Setup

Configuration Details

  • Node Setup: All tests were conducted using NeoGo v0.105.1 nodes, in line with our focus on these nodes due to the outstanding performance metrics discussed in our latest benchmark post.
  • Environment Consistency: For each test run we started from scratch to ensure the consistency and accuracy of our results. This involved clearing the environment and resetting the blockchain and network to their initial state.
  • Transaction Generation: Since our primary goal was to investigate network behaviour in a time-constrained environment, we didn’t need sophisticated transactions for tests. Thus, as usual, a batch of simple NEP-17 transfers was generated to simulate network activity.

We systematically reduced the MSPerBlock protocol setting from the default 5000 ms to as low as 25 ms. This range allowed us to explore the network’s responsiveness and stability across a spectrum of block time intervals.

Execution Methodology

The benchmarks were executed in phases to gauge performance across different settings methodically:

  • Phase I: Focused on higher MSPerBlock settings to establish baseline performance data on local and external setups under high loader pressure.
  • Phase II: Gradually moved to lower MSPerBlock settings to identify the thresholds where TPS values stopped growing despite the increasing RPS load, or where performance started to degrade under constant loader pressure.

Using the benchmark results, we analyzed transaction throughput and system stability and identified the network’s operational limits in terms of minimum time per block and acceptable RPS load. This analysis was crucial in understanding the maximum load the network and dBFT can handle without significant performance degradation in a time-constrained environment.
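
All per-block statistics quoted below (median and percentile block times, shares of delayed blocks) are derived from the intervals between consecutive block timestamps. The following is a simplified Go sketch of that computation, not our actual analysis pipeline:

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// blockIntervals converts a sequence of block timestamps into the actual
// per-block acceptance delays.
func blockIntervals(timestamps []time.Time) []time.Duration {
	intervals := make([]time.Duration, 0, len(timestamps)-1)
	for i := 1; i < len(timestamps); i++ {
		intervals = append(intervals, timestamps[i].Sub(timestamps[i-1]))
	}
	return intervals
}

// percentile returns the p-th percentile (0 < p <= 100) of the intervals
// using the nearest-rank method.
func percentile(intervals []time.Duration, p float64) time.Duration {
	sorted := append([]time.Duration(nil), intervals...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	rank := int(float64(len(sorted))*p/100+0.5) - 1
	if rank < 0 {
		rank = 0
	}
	if rank >= len(sorted) {
		rank = len(sorted) - 1
	}
	return sorted[rank]
}

func main() {
	// Toy data: a 300 ms target with one view-change spike.
	start := time.Now()
	ts := []time.Time{start}
	for _, ms := range []int{310, 305, 820, 312, 308} {
		ts = append(ts, ts[len(ts)-1].Add(time.Duration(ms)*time.Millisecond))
	}
	iv := blockIntervals(ts)
	fmt.Println("median:", percentile(iv, 50), "p95:", percentile(iv, 95))
}
```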

Results and Analysis

Local Network Setup

Phase I

In the initial phase of local setup testing, we utilized worker mode with a constant number of workers (10) to evaluate the maximum possible TPS at moderate MSPerBlock values (ranging from 1 s to 5 s).

Local setup, Phase I: Average TPS / Configured MSPerBlock

The CNs can easily handle an average of 3684 TPS at most (for 2-second blocks) and 3079 TPS at least (for 1-second blocks). It should be mentioned that these values are much lower than the 15K TPS obtained in our previous benchmarks since this time we use a different, resource-constrained hardware setup. This setup is still sufficient for our needs since our main goal is to get comparable benchmark results.

Note that the TPS spike happens at 2-second blocks, i.e. the network with a smaller MSPerBlock setting performs slightly better than the 5-second one. This behaviour is acceptable in a resource-constrained setup, with some variation possible across different benchmarks. At the same time, in the resource-constrained environment the 1-second block scenario shows lower throughput, which may be a consequence of the growing hardware resource overhead of block acceptance.

Local setup, Phase I: MSPerBlock / Block Number

The figure above shows the actual time spent on block acceptance in each of the five experiments. Note that the scatter plot has a logarithmic scale on the MSPerBlock axis since several spikes rise up to 20 seconds per block.

In general, none of the series contains significant latencies or spikes in the actual block time interval; all series are smooth and close to the configured MSPerBlock value. For a 5s target, the network’s median block time is close to 5.44s (~9% higher than the configured block time), with less variability, as indicated by the tighter 80th (~5.62s, ~12% higher than the desired value) and 90th (~5.86s, ~17% higher than the desired value) percentiles. Conversely, at a 1s target, block times are less consistent, with a median of 1.16s (~16% higher than the configured block time) and 1.61s at the 75th percentile (~61% higher than the desired value). However, there are several spikes at the end of every experiment, likely denoting ChangeViews performed by consensus nodes due to the lack of memory pool synchronization and processing delays.

Below are some summarized statistics on the worker mode benchmarks for the local setup:

Local setup, Phase I: summarized statistics

The highest reachable TPS of 3684, rounded up to the nearest thousand, was taken as the breaking point for the subsequent rate mode testing scenarios. It should be noted that due to the imprecise nature of the load, the target neo-bench RPS level does not translate directly into TPS; therefore, we push slightly more transactions than the network can comfortably handle. Still, this value (a 4K target RPS) is close to the maximum TPS defined in worker mode.

Phase II

Starting from a load of 1000 requests per second for 5-second blocks and varying the RPS value up to the practically defined breakpoint of 4000 RPS with a step of 1000 RPS, we conducted a set of experiments for network setups with different MSPerBlock settings.

Below is a set of figures for setups with MSPerBlock configured to be 5s, 1s, 600ms, 300ms and 25ms under increasing RPS load. For every benchmarked MSPerBlock target value two plots are presented:

  • A full-scale plot containing measurements for every single block in the test. This plot shows the overall deviation in the block acceptance interval and includes all spikes. It also displays the mean and standard deviation of the real block acceptance delays.
  • A zoomed plot containing the same series with a unified scale, covering the measurements for blocks excluding spikes and outliers, i.e. the most valuable part of the first plot. It also includes some statistics (median and the 80th, 90th, and 95th percentile scores) calculated over the whole set of measurements.

5-second blocks

Local setup, Phase II: MSPerBlock / Block Number for 1K, 2K, 3K, 4K RPS for 5-seconds blocks
Local setup, Phase II: MSPerBlock / Block Number for 1K, 2K, 3K, 4K RPS for 5-seconds blocks (zoomed)

The first analyzed setup is a network with 5-second blocks, which showed robustness against 1–3K RPS load and a maximum TPS of 3034 under the 4K rate load. Under the minimal tested load of 1K RPS the network effectively adheres to the configured block interval, demonstrating robust block timing very close to the target: 95% of blocks are accepted within less than 5.03 s (0.6% slower than the configured value), and it’s clear that no ChangeView happens at all.

However, it’s noticeable that increasing the RPS load to 2K–4K results in larger deviations from the desired block time by the end of each benchmark. Thus, under the 4K RPS target load, only a few blocks are accepted within the same 5.03 s, whereas the mean block acceptance time has increased up to 5.06 s (1.2% slower than the configured value), which is still very close to the desired result.

For those who are interested in some more precise numbers, we offer the following table of statistics:

Local setup, Phase II: summarized statistics for 5-seconds blocks

1-second blocks

Local setup, Phase II: MSPerBlock / Block Number for 1K, 2K, 3K, 4K RPS for 1-second blocks
Local setup, Phase II: MSPerBlock / Block Number for 1K, 2K, 3K, 4K RPS for 1-second blocks (zoomed)

Further analysis of 1-second blocks shows the best attempt reaching 2544 TPS under the highest load of 4K RPS. The overall trend remains the same: only a 1.5% (15 ms) higher block delay for 95% of all blocks compared to the target 1s value under the 1K RPS load, and increasing deviations from the desired block time as the RPS load grows, up to 9% slower than the configured value.

However, from the unscaled plot it can be noticed that “spikes” appear in the middle of every benchmark, i.e. blocks that were accepted with a delay more than two or even three times higher than the target value. These “spikes” are evidence of view changes performed by the consensus nodes, confirmed to be caused by the lack of synchronization between the CNs’ memory pools. The number of “spikes” directly depends on the load level, starting from 0.34% of “delayed” blocks under the 1K RPS load and ending with ~1% of “delayed” blocks under the load of 4K RPS.
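
For reference, the share of such “delayed” blocks can be counted directly from the measured intervals: every block whose acceptance time exceeds a chosen multiple of the configured MSPerBlock (twice the target is the threshold used later in this article) is attributed to a view change. A small illustrative helper, not taken from our tooling:

```go
package main

import (
	"fmt"
	"time"
)

// delayedShare returns the fraction of blocks whose acceptance time exceeds
// the given multiple of the target block time; such blocks most likely went
// through at least one ChangeView before being accepted.
func delayedShare(intervals []time.Duration, target time.Duration, factor float64) float64 {
	if len(intervals) == 0 {
		return 0
	}
	threshold := time.Duration(float64(target) * factor)
	delayed := 0
	for _, iv := range intervals {
		if iv > threshold {
			delayed++
		}
	}
	return float64(delayed) / float64(len(intervals))
}

func main() {
	target := time.Second
	// Toy data: most blocks close to the 1 s target, two view-change spikes.
	intervals := []time.Duration{
		1015 * time.Millisecond, 1020 * time.Millisecond, 2300 * time.Millisecond,
		1010 * time.Millisecond, 3100 * time.Millisecond, 1018 * time.Millisecond,
	}
	fmt.Printf("delayed blocks: %.1f%%\n", 100*delayedShare(intervals, target, 2))
}
```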

Local setup, Phase II: summarized statistics for 1-second blocks

600-ms blocks

Local setup, Phase II: MSPerBlock / Block Number for 1K, 2K, 3K, 4K RPS for 600-ms blocks
Local setup, Phase II: MSPerBlock / Block Number for 1K, 2K, 3K, 4K RPS for 600-ms blocks (zoomed)

Here comes the most intriguing part of our study: we’ve reduced the minimum time per block configuration value to just 600 milliseconds. As you can see from the plots above, the network remains stable and efficiently manages the increasing RPS load levels, demonstrating its robust capability under intensified conditions. Similar to the 1-second and 5-second experiments, the median block acceptance time ranges from 607 ms in the best case (1.1% slower than the desired block time) to 622 ms in the worst case (3.7% slower). These numbers are comparable with the delays obtained from the more stable experiments with larger block times. Note that this time the “worst” experiment in terms of both median and percentile block acceptance delay was the one with a 3K target RPS load. This can be explained by the fact that under a high level of load benchmark results become more unpredictable in terms of the number of ChangeViews happening, and this particular experiment was just an “unlucky” one with more than 4% of delayed blocks.

Also notice that the number of “delayed” blocks accepted after dBFT view changes increases significantly compared to the previous experiments: from ~1% under the 1K RPS load (with a resulting 998 average TPS) to ~4% under the worst case of a 3K target RPS load (with a resulting 1597 TPS). The best TPS value of 2463 is shown under the load of 4K RPS with around 2% of “delayed” blocks.

Local setup, Phase II: summarized statistics for 600-ms blocks

300-ms blocks

Local setup, Phase II: MSPerBlock / Block Number for 1K, 2K, 3K, 4K RPS for 300-ms blocks
Local setup, Phase II: MSPerBlock / Block Number for 1K, 2K, 3K, 4K RPS for 300-ms blocks (zoomed)

Pushing the boundaries further, we decreased the target MSPerBlock protocol configuration value down to 300 ms. Starting from a 2K RPS load, 95% of blocks are delayed by no more than ~56 milliseconds relative to the target block time (~19% slower than the desired value). The number of blocks accepted on the second attempt varies from an acceptable 0.95% under a 1K RPS load to a maximum of 4.3% under a 3K RPS load level, which is also not a large value.

Also note that the block “spikes” caused by view changes form several well-shaped lines at the levels of 800, 1000 and 2000 milliseconds.

Local setup, Phase II: summarized statistics for 300-ms blocks

25-ms blocks

Local setup, Phase II: MSPerBlock / Block Number for 1K, 2K, 3K, 4K RPS for 25-ms blocks
Local setup, Phase II: MSPerBlock / Block Number for 1K, 2K, 3K, 4K RPS for 25-ms blocks (zoomed)

The pattern seen with 300-ms blocks is preserved for shorter block intervals, down to 25-ms blocks: the lower the MSPerBlock setting, the higher the deviation from the target block time the CNs get. That is, for a 100 ms target block acceptance interval, in the best case 95% of all blocks are accepted within less than 109 ms, which is only 9% higher than the desired value, whereas the same percentile score for 25-ms blocks equals 34 ms, which is 36% slower than the target value. At the same time, the number of spikes, i.e. ChangeViews accepted, does not grow (4.67% in the worst-case scenario for 100-ms blocks vs 4.2% in the worst case for 25-ms blocks). However, the number of RPC errors increases significantly compared to the previous benchmarks.

The charts for the 25 ms MSPerBlock setting conclude the local benchmarks part since this value was the last one the local network could sustain under the load. It’s noticeable that the median block acceptance time is 16% higher than the desired value, with only 75% of all blocks being accepted within less than 30 ms. However, even in this extremely time-constrained scenario dBFT sustains the load and shows TPS values very close to the load rate: 999 TPS under the 1K RPS load, 1998 TPS under 2K RPS, 3000 TPS under 3K RPS and, finally, 3993 TPS under 4K RPS.

Local setup, Phase II: summarized statistics for 25-ms blocks

Local Benchmark Summary

Below is a summarized set of 3D scatter plots describing the dependence of block acceptance delays on the network load level. Each 3D plot corresponds to a fixed MSPerBlock protocol configuration setting. Each series in the plot represents a single TPS level reached under a load rate ranging from 1K to 4K requests per second. The plots can be read as follows:

  • Proximity to the target time per block: The closer each series is to a linear path near the target value across the milliseconds per block axis, the more consistent the performance, indicating stable block acceptance times. A smooth, horizontal line indicates optimal performance, characterized by minimal deviations in block time, representing the ideal case.
  • Proximity to the target RPS level: Series at varying heights represent average TPS levels the network is handling. The closer the TPS series aligns with the target load RPS level, the greater the network’s robustness to that load level.

This visual representation helps us quickly grasp the relationship between milliseconds per block and the network load level and assess the network’s efficiency under different load conditions, providing a clear picture of the aggregated benchmark results. The key takeaways for the 4-consensus-node local network are:

  • The network can sustain the load up to 4K RPS with the minimum tested block acceptance interval of 25 ms. Under these extremely time-constrained conditions, dBFT maintains acceptable performance, producing blocks with a median interval of 30 ms. It keeps the number of ChangeViews below 5% and achieves an average TPS of 3993, precisely meeting the expected level.
  • The “ideal” target MSPerBlock value the local network can operate with consistently is 100 ms. This setup produces blocks without significant delay spikes (< 4.7%), with a time per block extremely close to the desired value (< 7% above the target delay in the median and ~11% above it at the 80th percentile) and a TPS value extremely close to the desired one (3111 average TPS at max).

These results are evidence of outstanding dBFT performance, load robustness and untapped algorithm potential that is currently unused (remember, the target MSPerBlock protocol value of N3 Mainnet is 15 seconds). Moreover, the local benchmark results gave us hope that the external cluster results would be nearly as good as the local ones.

Local setup, Phase II: Average TPS / MSPerBlock / Block Number for 1K, 2K, 3K, 4K target RPS for 5-seconds blocks
Local setup, Phase II: Average TPS / MSPerBlock / Block Number for 1K, 2K, 3K, 4K target RPS for 1-second blocks
Local setup, Phase II: Average TPS / MSPerBlock / Block Number for 1K, 2K, 3K, 4K target RPS for 600-ms blocks
Local setup, Phase II: Average TPS / MSPerBlock / Block Number for 1K, 2K, 3K, 4K target RPS for 300-ms blocks
Local setup, Phase II: Average TPS / MSPerBlock / Block Number for 1K, 2K, 3K, 4K target RPS for 25-ms blocks

Real Network Setup

With these inspiring results from the local network setup in hand, our next goal is to test how well dBFT performs in a real distributed network scenario with a higher number of consensus nodes. Compared to the extremely resource-constrained local network scenario, the consensus nodes of the external cluster are not restricted in hardware resources and, especially, in RAM. On the other hand, this cluster has realistic network communication delays, so one of our goals is to investigate the impact of these real-life communication delays on the block acceptance interval.

We’ve measured the ping response time between the nodes of our cluster, and it turns out that some of them are closer to each other than others. The cluster machines can be categorized into three groups based on ping response times: the first group, being the most remote, has an average delay of ~25.2 ms; the second group has ~0.435 ms; and the third group, the closest machines, has a ~0.022 ms delay. Given this, and the fact that consensus nodes need to pass through three dBFT rounds to accept a block and send at least two additional protocol messages to retrieve unknown transactions (CMDGetData, CMDData), we can estimate the approximate communication delays between the nodes.
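
Under a simplifying assumption that each of these message exchanges costs roughly one ping interval of the slowest participating group, the pure communication overhead per block can be estimated as follows (a rough back-of-envelope sketch, not a precise model of the dBFT message flow):

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Measured average ping response times for the three machine groups.
	groups := map[string]time.Duration{
		"remote":  25200 * time.Microsecond, // ~25.2 ms
		"middle":  435 * time.Microsecond,   // ~0.435 ms
		"closest": 22 * time.Microsecond,    // ~0.022 ms
	}
	// Rough model: three dBFT message exchanges per block plus two extra
	// messages for transaction retrieval (CMDGetData/CMDData), each costing
	// about one ping interval. Processing time and batching are ignored.
	const exchanges = 3 + 2
	for name, ping := range groups {
		fmt.Printf("%-8s ~%v of pure communication per block\n", name, time.Duration(exchanges)*ping)
	}
}
```

For the most remote group this gives roughly 126 ms of unavoidable communication per block, a floor that no MSPerBlock setting can go below; it is roughly consistent with the ~140 ms minimum block time observed later in the 60-ms experiment.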

Phase I

Like the local benchmarks, the external cluster benchmarks begin with worker mode to establish the maximum achievable TPS for the cluster in a typical, time-unconstrained environment. We utilized a range of workers, from 10 to 300, and determined the best average achievable TPS: 9291 for standard 5-second blocks, 14747 for 3-second blocks, and 18319 for 1-second blocks.

Real network setup, Phase I: Average TPS / Configured MSPerBlock

According to the figure above, TPS increases with a lower MSPerBlock setting since the consensus nodes can drain the partially filled memory pool more often than with a higher block time (mempool capacity remains the same across all setups). It may seem that TPS grows without limit as the block time decreases, but that’s not exactly true (you’ll see it in the rate benchmarks below): there is a point beyond which operational and communication delays affect network performance so that TPS can’t grow further.
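
A transaction has to pass through the memory pool before it gets into a block, so with a fixed mempool capacity the sustainable TPS is bounded by roughly one pool worth of transactions per block interval (the same reasoning gives the 10K TPS theoretical maximum mentioned below for 5-second blocks). A quick sketch of this bound for the standard 50K mempool, ignoring consensus and communication overhead:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	const mempoolCapacity = 50000 // standard mempool size used in the benchmarks
	// Theoretical TPS ceiling: at most one mempool worth of transactions can
	// be drained per block, so TPS <= capacity / block time. Consensus and
	// communication overhead keep the real numbers below this bound.
	for _, blockTime := range []time.Duration{
		5 * time.Second, 3 * time.Second, time.Second, 600 * time.Millisecond,
	} {
		ceiling := float64(mempoolCapacity) / blockTime.Seconds()
		fmt.Printf("%8v blocks: <= %.0f TPS\n", blockTime, ceiling)
	}
}
```

For 1-second blocks the bound (50K TPS) is already far above the ~18K TPS actually measured, which is exactly where the operational and communication delays mentioned above take over.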

Also note that this graph is smoother than the same graph for the local network setup, which is a direct consequence of more predictable (and reproducible) network delays and a sufficient amount of hardware resources. Referring back to our previous article, where the network with the same standard (50K) mempool configuration reached 15K TPS in the 4-node scenario, it turns out that those synthetic results are not so far from reality with physical network communication delays.

Real network setup, Phase I: MSPerBlock / Block Number

The actual block acceptance delays are not as close to the target MSPerBlock value as in the local setup, which is completely understandable due to the additional communication delays. The median acceptance time grows from 7.6% above the target value for 5-second blocks up to 23% above it for 1-second blocks. Also note that the nodes do not fall back to view changes in the 2s–5s setups, unlike the 1-second setup where spikes happen at the end of the benchmark. This behaviour is caused by the lack of synchronization between the nodes, causing peer disconnections and, consequently, view changes.

Real network setup, Phase I: summarized statistics

Based on the average TPS values we’ve defined 20K RPS as the performance ceiling for further experiments with MSPerBlock configuration in the rate mode.

Phase II

We started from the comfortably handled load of 4K requests per second and conducted experiments over a range of RPS values with a step of 4000 RPS, up to the breakpoint of 20K RPS defined in the previous step. Below is a set of figures for setups with MSPerBlock configured to be 5s, 1s, 600ms, 300ms, 100ms and 60ms under increasing RPS load. As with the local setup, two plots are presented for every target MSPerBlock value:

  • A full-scale plot containing measurements for every single block in the benchmark. This plot shows the overall deviation in the block acceptance interval and includes all the spikes. It also displays the mean and standard deviation of the block acceptance delays.
  • A zoomed plot containing the same set of charts with a unified scale, covering the most valuable part of the first plot, i.e. measurements for blocks excluding spikes and outliers. It also includes some statistics (median and the 80th, 90th, and 95th percentile scores) calculated over the whole set of measurements.

5-second blocks

Real network setup, Phase II: MSPerBlock / Block Number for 4K, 8K, 12K, 16K, 20K RPS for 5-seconds blocks
Real network setup, Phase II: MSPerBlock / Block Number for 4K, 8K, 12K, 16K, 20K RPS for 5-seconds blocks (zoomed)

Starting with the highest target MSPerBlock setting of 5 seconds, the network displays a relatively stable deviation of the block acceptance delay from the target value. The actual median value under the load level of 4K RPS is just 1.9% higher than the desired value, whereas 95% of all blocks are accepted in less than 5194 ms, which is only 3.9% higher than the target value. Under the larger loads, dBFT accepts blocks with a ~5.39 s median delay, which is 7.7% higher than the target value, whereas only 75% of all blocks are accepted within less than 5.4 s (a delay 8% higher than expected). Although these numbers are relatively good for a real network, note how they differ from the local setup results. Due to the network communication delays, we got ~90–400 ms median excess delays (a multiple of the ping intervals presented above), an order of magnitude higher than the 20–60 ms median excess delays of the local setup.

When it comes to TPS, in the worst load case under the 20K target RPS load, the network can successfully sustain only ~9.3K RPS and produces 9282 TPS on average, which is the maximum value for this setup and the selected MSPerBlock configuration. It should be noted that the theoretical maximum of this setup is 10K TPS since we use a standard mempool size of 50K with 5s blocks, so the resulting values are pretty close to the theoretical maximum. Also note that no view changes happen, which is the desired behaviour.

Real network setup, Phase II: summarized statistics for 5-seconds blocks

1-second blocks

Real network setup, Phase II: MSPerBlock / Block Number for 4K, 8K, 12K, 16K, 20K RPS for 1-second blocks
Real network setup, Phase II: MSPerBlock / Block Number for 4K, 8K, 12K, 16K, 20K RPS for 1-second blocks (zoomed)

Decreasing the target MSPerBlock value down to 1s gives an interesting side effect: it’s visible that block acceptance delays are split between two levels that are ~40–50 ms apart. These levels are closely related to the network delays and correspond to the consensus nodes grouped by their ping response time. For example, with a 4K RPS load, the block acceptance time averages 1090 ms if the dBFT round’s primary node belongs to the ‘closest’ group of consensus nodes (CNs) with a ping time of 0.02–0.4 ms. However, it extends to about 1130 ms if the primary node is in the ‘most remote’ group with a ping time of approximately 26 ms.

The 1s-block setup shows robust proximity to the desired block time, with a median of 1094 ms (9.4% higher than the desired value) and 95% of blocks falling within a 1164 ms delay (16% higher than the desired value) under the 4K target RPS load. The median block time exceeds the desired value by 11.3%, 13.6%, 15.9%, and 19.5% for network loads of 8K, 12K, 16K, and 20K RPS, respectively.

The highest TPS achieved is approximately 15.4K under the 16K target load level, significantly below the maximum expected value. Additionally, the percentage of blocks delayed more than twice the MSPerBlock value increases from 2.6% at a 16K RPS load to 8.6% at a 20K RPS load.

Also, under the high load (16K and 20K RPS) it’s noticeable that at the end of the benchmark there are a couple of spikes exceeding the target MSPerBlock value more than six times. The reason for these spikes is ChangeViews caused by peer disconnections due to the lack of synchronization (too stale or too new consensus messages are treated as a reason for peer disconnection). This can be improved in newer NeoGo versions.

Real network setup, Phase II: summarized statistics for 1-second blocks

600-ms blocks

Real network setup, Phase II: MSPerBlock / Block Number for 4K, 8K, 12K, 16K, 20K RPS for 600-ms blocks
Real network setup, Phase II: MSPerBlock / Block Number for 4K, 8K, 12K, 16K, 20K RPS for 600-ms blocks (zoomed)

And now we have reached the most intriguing part of our study: we’ve reduced the target MSPerBlock value to just 600 ms. The overall trend remains consistent: the network handles the increased pressure well, though block acceptance delays fluctuate more under higher loads. The maximum average TPS has increased up to 16764 compared to the previous benchmark. Also note that the difference between the highest and lowest median block acceptance time across different load levels becomes almost insignificant; there’s a clear tendency towards block time unification. Thus, under the load of 4K–12K RPS, the median block time slightly grows from 688 ms to 713 ms, exceeding the desired value by 14.5%–18.8%. However, under the high load of 20K RPS, the standard deviation grows up to 681 ms, and it’s clear that there are more fluctuations in the actual block time values. In the worst scenario, only 80% of all blocks are accepted within less than 760 ms (27% slower than desired).

The two block acceptance delay levels become even more distinct, with the same ~40–50 ms distance between them, which is another confirmation of the theory that block time depends on the network communication delays.

The number of ChangeViews, which indicates additional consensus rounds needed during block processing, increases from 4.3% at a load of 16K RPS to 7.4% at a load of 20K RPS. These values are still acceptable for the real network, allowing us to continue with further experiments.

Real network setup, Phase II: summarized statistics for 600-ms blocks

300-ms blocks

Real network setup, Phase II: MSPerBlock / Block Number for 4K, 8K, 12K, 16K, 20K RPS for 300-ms blocks
Real network setup, Phase II: MSPerBlock / Block Number for 4K, 8K, 12K, 16K, 20K RPS for 300-ms blocks (zoomed)

Going far beyond the limit, we’ve further decreased the target MSPerBlock protocol configuration value down to 300 ms. The median block acceptance delay varies from 380 ms (~26.7% higher than desired) at min to 419 ms (39.7% higher than desired) at max. Starting from a 4K RPS load, 95% of blocks are delayed by no more than ~121 milliseconds relative to the target value (~40.3% slower than the desired value). The number of blocks accepted on the second attempt varies from 8.95% under the 16K RPS load to a maximum of 17.6% under the 20K RPS load level, which is quite a large value. The maximum average TPS value reached is 17516 TPS under the load of 20K RPS.

This setup is deemed the last practical one, as further reducing the MSPerBlock value leads to block acceptance delays that approach or exceed the configured block time itself, rendering it impractical. However, we’ve included two more benchmark results with lower MSPerBlock values in this article since they reveal a couple of noticeable patterns.

Real network setup, Phase II: summarized statistics for 300-ms blocks

100-ms blocks

Real network setup, Phase II: MSPerBlock / Block Number for 4K, 8K, 12K, 16K, 20K RPS for 100-ms blocks
Real network setup, Phase II: MSPerBlock / Block Number for 4K, 8K, 12K, 16K, 20K RPS for 100-ms blocks (zoomed)

The 100ms-block setup is included in this post to demonstrate that dBFT can effectively handle small block intervals in a real network environment, adhering to the configured settings even under extremely time-restricted conditions that are comparable to the network communication delays. The median block time under 4K–16K load (181–189 ms) is still less than twice the target value, which means that almost all blocks were accepted without a view change, i.e. on the CNs’ first attempt to agree. These values are not inspiring from a practical point of view (no one needs an 80+% delay over the target value in a real system), but at the same time this experiment is an outstanding challenge for the dBFT algorithm, one that no one has ever tried before.

This experiment also shows a third level of block acceptance delay under the 4K RPS load. This level corresponds to blocks accepted after a ChangeView happened.

Real network setup, Phase II: summarized statistics for 100-ms blocks

60-ms blocks

Real network setup, Phase II: MSPerBlock / Block Number for 4K, 8K, 12K, 16K, 20K RPS for 60-ms blocks
Real network setup, Phase II: MSPerBlock / Block Number for 4K, 8K, 12K, 16K, 20K RPS for 60-ms blocks (zoomed)

The last experiment in this series of benchmarks is aimed at proving that even dBFT has its performance limit. Once the MSPerBlock value approaches the order of the network communication delays, dBFT is not capable of accepting blocks without changing its view and increasing its timer for every block. Even though with a target of 60 ms per block the consensus nodes delay blocks (the minimum block acceptance time is around 140 ms, which is more than twice the target value), dBFT is still capable of block processing. It still shows robustness against the load pressure and reaches TPS values close to the target RPS load level. The maximum average TPS value reached in this setup was 15974, which is comparable with the previous results. Additionally, although each block requires at least one view change to be accepted, the overall block acceptance delay remains relatively consistent and adheres strictly to the limits dictated by network delays, which is particularly evident at lower RPS loads.

Real network setup, Phase II: summarized statistics for 60-ms blocks

Real Network Benchmark Summary

Below is a summarized set of 3D scatter plots describing the dependence of time per block on the network load level. Each 3D plot corresponds to a fixed MSPerBlock protocol configuration setting. Each surface in the plot represents a single resulting average TPS value reached under a load rate ranging from 4K to 20K requests per second with a step of 4K RPS. The plots can be read as follows:

  • Proximity to the target time per block: The closer each series is to a straight line at the target value along the milliseconds-per-block axis, the more consistent the performance, indicating stable block acceptance times. A smooth, horizontal line corresponds to optimal performance without significant block time deviations, which is the ideal case.
  • Proximity to the target RPS level: Series at varying heights represent actual TPS levels the network is reaching. The closer the TPS series is to the target load RPS level, the more robust the network is to the load level.

The most significant results to be emphasized for the 7-consensus-node external network setup are:

  • The network can carry the load of 20K target RPS, showing the best average TPS of 17516. This value is notable since it is achieved on a real network and is comparable with the synthetic results obtained during our previous benchmarks.
  • The “ideal” target MSPerBlock value at which the network can operate consistently without a high number of ChangeViews is 300 ms. This setup produces blocks with a not-so-high number of huge delay spikes (0–18% depending on the load rate). The resulting time per block is not as close to the desired value, but still acceptable (~26.7% slower than the target delay in the median for the best case and ~39.7% slower in the median under the full load). This setup shows the best average TPS within the whole set of benchmarks.
  • Under the extremely time-constrained condition of a 60ms minimum target block interval, dBFT demonstrates robust performance. Despite accepting ChangeViews for all blocks, the network maintains a median block acceptance interval of approximately 139ms, indicating stable and acceptable behavior. The maximum average TPS achieved in this setup is 15974, approaching the optimal performance limits.

These results confirm that the dBFT protocol is extremely robust against the time-constrained network configuration in real distributed conditions with significant network communication delays.

Real network setup, Phase II: Average TPS / MSPerBlock / Block Number for 1K-20K target RPS for 5-seconds blocks
Real network setup, Phase II: Average TPS / MSPerBlock / Block Number for 1K-20K target RPS for 1-second blocks
Real network setup, Phase II: Average TPS / MSPerBlock / Block Number for 1K-20K target RPS for 600-ms blocks
Real network setup, Phase II: Average TPS / MSPerBlock / Block Number for 1K-20K target RPS for 300-ms blocks
Real network setup, Phase II: Average TPS / MSPerBlock / Block Number for 1K-20K target RPS for 100-ms blocks
Real network setup, Phase II: Average TPS / MSPerBlock / Block Number for 1K-20K target RPS for 60-ms blocks

Conclusion

Through rigorous benchmark testing using the enhanced neo-bench tool, the NeoGo network demonstrated remarkable performance and stability, revealing critical insights into its capabilities and limits under extremely time-constrained conditions. We established that a real distributed network of 7 consensus nodes could sustain a throughput of up to ~17500 transactions per second (TPS) in a clustered environment, effectively outperforming the synthetic results from our previous post.

Key findings from the study underscore the network’s efficiency in handling transactions, even with significantly reduced MSPerBlock settings, demonstrating robustness under high load conditions and extremely low block time scenarios. Notably, the network maintained transaction processing efficiency with a minimum median block time of approximately 140 milliseconds in clustered settings and even faster in local configurations, demonstrating the system’s adaptability and throughput potential.

Finally, the most significant discovery from these experiments is that as the load on the network increases, there is a smooth and predictable degradation in performance. Despite the growing block acceptance delays and more frequent ChangeViews under high loads, the network still functions without sudden failures or unexpected spikes in block acceptance delays. This demonstrates the network’s high reliability and resilience.

The network can adequately respond to the load — it may experience some delays, but resulting performance parameters still fall within acceptable limits and do not lead to critical failures. The network is predictable and that’s exactly what we want from it.
