Working with distributed message log

Authors:
Alexey Vanin, alexey@nspcc.ru
Sergei Liubich, sergei@nspcc.ru

This article describes the ways of constructing distributed message log in a decentralized system. We are still under heavy development, so more details and numeral metrics will be provided later.

Distributed Log Usage

This external entity can send multiple requests to each node in the system separately in order to place responses into an ordered log of events.

Another approach is that nodes are engaged independently with each other, therefore, they maintain a consistent log of events. The entity needs to send a request only to one node in the system.

Figure 1. Different approaches for obtaining a distributed log

The first approach is simple to implement. However, the effectiveness of this approach decreases with an increasing number of polled nodes. High frequency of consistent log queries also affects the performance. It is not scalable enough, therefore, we decided to look closer at the second approach to obtain a consistent event log.

From Raft to Reliable Broadcast

Raft log

Figure 2. Replicated state machine architecture in the Raft protocol [1]

However, during the development process, the initial issue began to change. With the system growth, issues and tasks are distributed over the nodes. Therefore, nodes of the system are divided into subsets (groups), which are responsible for their own small range of tasks to be performed. Now external entities require not the state of the whole system, but only the state of certain subsets. Each node may consist of one or more subsets.

Figure 3. Multiple log groups in the nodes

Multi Raft

During the testing, it was found out that this approach works remarkably with a small number of simultaneously working groups (from 1000 to 10 000). However, the design of our solution involves a large number of groups (up to 1 000 000).

Reliable Broadcast

Figure 4. Reliable broadcast messages

Multi Reliable Broadcast

The final implementation is still in progress. First tests have shown that Multi Reliable Broadcast can handle up to 1 000 000 concurrent groups in one instance. Multi Reliable Broadcast has better control on CPU (one thread may handle messages from several groups) and memory resources: for 10 000 groups, Multi Reliable Broadcast allocated 130 MiB while our implementation of Multi Raft allocated around 600 MiB.

References

[2] S.Turukin What is a Multiraft?

[3] L.Lamport Time, Clocks, and the Ordering of Events in a Distributed System

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store