Intro to design systems

I have worked in the design area for 10 years now and my experience has taught me that good design is born out of need. My focus for quite some time has been software design and in particular design…

Smartphone

独家优惠奖金 100% 高达 1 BTC + 180 免费旋转




Want to learn how Viewstamped Replication works? Read this summary.

It presents an updated explanation of Viewstamped Replication, a replication technique that handles failures in which nodes crash. It describes how client requests are handled, how the group reorganizes when a replica fails, and how a failed replica is able to rejoin the group.

The Viewstamped Replication protocol, referred to as VR, is used for replicated services that run on many nodes known as replicas. VR uses state machine replication: it maintains state and makes it accessible to the clients consuming that service.

Some features of VR:

We can provide a simple proof for the above statement: in a system with f crashed nodes, we need at least the majority of f+1 nodes that can mutually agree to keep the system functioning.

A group of f+1 replicas is often known as a quorum. The protocol needs the quorum intersection property to be true to work correctly. This property states that:

VR architecture

The architecture of VR is as follows:

If all the replicas should end in the same state, it is important that the above condition is met.

VR deals with the replicas as follows:

Primary: Decides the order in which the operations will be executed

Secondary: Carries out the operations in the same order as selected by the primary

What if the primary fails?

We consider the following three scenarios of the VR protocol:

The state maintained by each replica is presented in the figure above. Some points to note:

The client side proxy also maintains some state:

The normal operation of VR can be broken down into the following steps:

Usually the PREPARE message is used to inform the backup replicas of the committed operations. It can also do so by sending a COMMIT message.

To execute a request, a backup has to make sure that the operation is present in its log and that all the previous operations have been executed. Then it executes the said operation, increments its commit-number, and updates the client’s entry in the client-table. But it doesn’t send a reply to the client, as the primary has already done that.

There is no leader election in this protocol. The primary is selected in a round robin fashion. Each member has a unique IP address. The next primary is the backup replica with the smallest IP that is functioning. Each number in the group is already aware of who is expected to be the next primary.

Every executed operation at the replicas must survive the view change in the order specified when it was executed. The up-call is carried out at the primary only after it receives f PREPAREOK messages. Thus the operation has been recorded in the logs of at least f+1 replicas (the old primary and f replicas).

To make the view change operation more efficient, the paper describes the following approach:

The replica should not “forget” anything it has already done. One way to ensure this is to persist the state on disk — but this will slow down the whole system. This isn’t necessary in VR because the state is persisted at other replicas. It can be obtained by using a recovery protocol provided that the replicas are failure independent.

The recovery protocol is as follows:

Reconfiguration deals with epochs. The epoch represents the group of replicas processing client requests. If the threshold for failures, f, is adjusted, the system can either add or remove replicas and transition to a new epoch. It keeps track of epochs through the epoch-number.

Another status, namely transitioning, is used to signify that a system is moving between epochs.

The VR sub protocols need to be modified to deal with epochs. A replica doesn’t accept messages from an older epoch compared to what it knows, such as those with an older epoch-number. It informs the sender about the new epoch.

During a view-change, the primary cannot accept client requests when the system is transitioning between epochs. It does this by checking if the topmost request in its log is a RECONFIGURATION request. A recovering replica in an older epoch is informed of the epoch if it is part of the new epoch or if it shuts down.

The issue that comes to mind is that the client requests can’t be served while the system is moving to a new epoch.

This can be dealt with by “warming up” the nodes before reconfiguration happens. The nodes can be brought up-to-date using state transfer while the old group continues to reply to client requests. This reduces the delay caused during reconfiguration.

This paper has presented an improved version of Viewstamped Replication, a protocol used to build replicated systems that are able to tolerate crash failures. The protocol does not require any disk writes as client requests are processed or even during view changes, yet it allows nodes to recover from failures and rejoin the group.

The paper also presents a protocol to allow for reconfigurations that change the members of the replica group, and even the failure threshold. A reconfiguration technique is necessary for the protocol to be deployed in practice since the systems of interest are typically long lived.

If you enjoyed this essay, please hit the clap button so more people see it. Thank you!

Add a comment

Related posts:

What the Heck is Zero Trust and Why Should You Care?

Cybersecurity threats are changing at a rapid pace, but security frameworks have not changed in a long time. By “frameworks” we mean how we look at a particular problem and, correspondingly, how we…

Vibration

heaven is the red rocks against the blue sky with the music in my hair. “Vibration” is published by Ali Wyles in little love letters.