Working with distributed message log

Distributed Log Usage

Communication is an important component of any distributed system. During the development and design of our storage system, we were researching how nodes of a distributed system can log certain events. External entity requires a consistent log of all these events. This allows to evaluate the entire system state.

Figure 1. Different approaches for obtaining a distributed log

From Raft to Reliable Broadcast

Raft log

Maintaining a consistent event log is an important part of consensus algorithms. In the early stages of the storage system design, we chose Raft as the consensus algorithm. It is much simpler than Paxos and it has a consistent event log in the core.

Figure 2. Replicated state machine architecture in the Raft protocol [1]
Figure 3. Multiple log groups in the nodes

Multi Raft

There are many implementations of the Raft protocol. Some of them introduce new features to the original one. One of these is Multi Raft groups [2] support. Groups independently handle the work of the entire protocol: they choose a leader, send heartbeats, etc. Multi Raft supports heartbeat coalescing which means that the heartbeat will be sent only once between 2 nodes even if they share multiple groups. We took cockroachdb’s implementation of the Multi Raft protocol as a reference and tried to evaluate the capabilities of this solution for our storage system.

Reliable Broadcast

In the search of a lightweight solution, we turned our attention to the Reliable Broadcast protocol. In particular, it is used in the Honey Badger BFT protocol implementation, which we use for consensus tasks. Despite the fact that the protocol describes only the message delivery method, we can build a consistent log on the top of it. Messages are accompanied by the Lamport timestamps [3] in our solution, so we can restore their order.

Figure 4. Reliable broadcast messages

Multi Reliable Broadcast

The implementation of Reliable Broadcast that we are using is presented in the form of a finite state machine, which produces three types of messages (Fig. 4). One finite state machine handles a single broadcast group. In order to maintain several broadcast groups, we expanded implementation with the support of multiple groups in one broadcast instance. The proposed protocol has been called Multi Reliable Broadcast.

References

[1] D.Ongaro, J.Ousterhout In Search of an Understandable Consensus Algorithm

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store