Wise people learn when they can; fools learn when they must - Arthur Wellesley

Wednesday, 8 February 2017

CLUSTER BASICS


HA… HA… HA… :) … It's not Ha..Ha..Ha :(

In the sysadmin world, the word "HA" is as common as our daily routine activities.

What is this HA…?

HIGH AVAILABILITY………..


It is great to hear that our applications/services will never go down. The term "High Availability" is made up of two words: High and Availability.
First, "Availability" came into the picture; then the word "High" was added to complete what "Availability" alone could not deliver.

Let’s understand the “Availability” first.

The general meaning of "available" is to be present always, meaning the thing will be there for us at all times: 100% attendance, without fail.

HOW……??

·         Dual NICs with link aggregation for NIC availability.
·         Two HDDs mirrored with SVM for disk availability.
·         Dual PSUs for power supply availability.

From the above examples, it seems that availability is implemented in terms of "HARDWARE".

That's great, but what about the OS…

Hmmmmm…………

What if the OS crashes/hangs/panics… :(

Here comes "HIGH" as the savior. Now our "High Availability" is complete.

Availability can be measured relative to "100% operational" or "never failing." Have you ever heard someone refer to "five 9s"? This is a metric that describes system availability by the number of "9s." One 9 is 90% uptime, two 9s are 99%, three 9s are 99.9%, and so on. Four-9 uptime means that your system has less than one hour of downtime per year. Five 9s (which represents about 5.26 minutes of annual downtime) is currently considered the entry point for high-availability solutions.
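To make that arithmetic concrete, here is a minimal Python sketch (the function name and the 365.25-day year are my own assumptions, just for illustration) that computes the annual downtime allowed by N nines:

```python
# Annual downtime allowed for N nines of availability.
MINUTES_PER_YEAR = 365.25 * 24 * 60  # ~525,960 minutes

def downtime_minutes(nines: int) -> float:
    """Maximum annual downtime (in minutes) for the given number of nines."""
    availability = 1 - 10 ** (-nines)      # e.g. 5 nines -> 0.99999
    return MINUTES_PER_YEAR * (1 - availability)

for n in range(1, 6):
    print(f"{n} nine(s): {downtime_minutes(n):10.2f} minutes/year")
# 5 nines works out to roughly 5.26 minutes of downtime per year.
```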

So we can define "HA" as a group of systems (called a CLUSTER) that survives any case of failure, whether "HARDWARE" or "SOFTWARE".

The cluster can be either "MANUAL" or "AUTOMATIC".

A manual cluster needs human intervention when anything goes wrong on one system, whereas an automatic cluster starts the services automatically on the secondary system in case of any failure on the primary.


A cluster is a set of NODEs that communicate with each other and work toward a common goal.

Advantages of clustering

·         High performance
·         Large capacity
·         High availability
·         Incremental growth

CLUSTER TERMINOLOGY……

FAULT:

A fault is anything that may or may not hamper the normal behavior of the system; generally, faults are not conveyed or visible to end users. For example: one power supply has failed, or there are some soft/hard errors on an HDD, but the system is still running. These are faults.

FAILURE:

A failure can be caused by either hardware or software. It stops the system's operation and requires immediate attention for rectification. The system becomes non-operational due to a failure, and this is visible to all. For example: both power supplies have failed, or hardware/software errors on a disk have increased and resulted in disk corruption.

NODE:

A node is a single server within a cluster; we can have up to 16 nodes within a single cluster. All nodes within a cluster can talk to each other with the help of the INTERCONNECT; when a node joins or leaves the cluster, all other nodes are made aware. It is good practice to build the cluster from nodes of a similar build (same CPU, memory, etc.), but it is not mandatory.

CLUSTER INTERCONNECT:

The interconnect should be a private network to which all cluster nodes are connected; the nodes communicate across this network, sharing information about the cluster. The interconnect should have redundancy built in, so that it can survive network outages.

SWITCHOVER & FAILOVER:

These can be thought of as the "manual" and "automatic" ways of moving to the secondary. In both cases it is certain that the running services are about to leave their current host server.

SWITCHOVER:

We know what we are doing: it means I want the services to run from the secondary node. A switchover is planned.

FAILOVER:

The system/HA software knows what it is doing; this happens when any failure is detected by the HA software. Completely unplanned.

FAILBACK:

The process by which a failed server automatically resumes its former operations once it is back online.

There are two common clustering technologies:

1.  High-availability (HA) clustering: Always available, with no single point of failure. If one node fails, services are automatically migrated to another node. Sometimes also referred to as a "Failover Cluster".
2.  Load-balance clustering: It is a kind of teamwork, the nodes sharing each other's load, which improves the overall response of the application. Load balancing provides better and higher performance of the service (see the sketch after this list).
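To give a feel for the load-balancing idea, here is a minimal Python sketch (the node names are hypothetical, and I am assuming simple round-robin scheduling rather than any particular product's algorithm) that spreads incoming requests across cluster nodes:

```python
from itertools import cycle

# Minimal round-robin dispatch: each request goes to the next node in turn,
# so no single node carries the whole load.
nodes = ["node-a", "node-b", "node-c"]   # hypothetical node names
next_node = cycle(nodes)

def dispatch(request_id: int) -> str:
    """Pick the node that will serve this request."""
    node = next(next_node)
    print(f"request {request_id} -> {node}")
    return node

for i in range(6):
    dispatch(i)   # requests land on a, b, c, a, b, c
```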


Cluster configuration models:

If the cluster has only two nodes, there is no issue, but if the number of participating nodes is higher, they can be categorized into one of the following models.

Active-Active: All participant nodes in an active-active cluster actively run the same kind of service simultaneously. The main purpose of an active-active cluster is to achieve load balancing, if a load-balancing solution is implemented; otherwise, it reduces the failover time.

Active-Passive / Asymmetric: As the name suggests, one node is active and one stands by for it; once the active node goes down, the application switches to the standby, which then becomes the active node.

N+1: Two or more nodes are required for this configuration. "N" is the number of active nodes in the cluster and "1" is the standby/hot-standby for all "N" nodes. If any of the "N" nodes fails, the "1" replaces it and becomes active.

N+M: Two or more nodes are required for this configuration; it is an enhancement of "N+1". Here "N" is the number of active nodes and "M" is the number of hot-standby nodes. It is generally used where the cluster manages many services and multiple hot-standbys are needed to meet the failover requirements.

N-to-1: Boss is Boss. It means: I (the standby) will take your place if you (the primary/active) go down, but I will not claim your place permanently; I will wait for your recovery, let you take back your position, and then return to my original place, i.e., standby.

N-to-N: A combination of active/active and N+M clusters, N-to-N clusters redistribute the services, instances, or connections from the failed node among the remaining active nodes, thus eliminating (as with active/active) the need for a standby node, but introducing a need for extra capacity on all active nodes (see the sketch after this list).
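To make the N-to-N idea concrete, here is a small Python sketch (node and service names are invented for illustration) that redistributes a failed node's services across the surviving active nodes, always picking the least-loaded survivor:

```python
# N-to-N style redistribution: when a node fails, its services are spread
# across the remaining active nodes instead of going to a single standby.
cluster = {
    "node-a": ["db", "web"],
    "node-b": ["mail"],
    "node-c": ["dns"],
}

def fail_node(state: dict, failed: str) -> None:
    """Move every service off the failed node onto the least-loaded survivor."""
    orphans = state.pop(failed)
    for service in orphans:
        target = min(state, key=lambda n: len(state[n]))  # least-loaded node
        state[target].append(service)
        print(f"{service}: {failed} -> {target}")

fail_node(cluster, "node-a")
print(cluster)   # node-a's services are now split between node-b and node-c
```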

So what do we get from all of this…??

Service availability without downtime. But how can this be achieved…??
By clustering alone…??

NOOOO…

This can be achieved only if all cluster nodes can access the same data.

This leads to the requirement for shared storage devices.

The shared storage is accessible to all nodes of the cluster, but only the active node owns it at any given time. Once the active node goes down, ownership is claimed automatically by a passive/standby node. Since the data is the same, the service sees no difference.

In a cluster environment, almost everything needed to ensure service availability is referred to as a "RESOURCE".

Hence a "resource" is a hardware or software entity managed by the cluster application. Simply put, a resource is a service made highly available by a cluster.

* hardware or software entities: file systems, network interface cards (NICs), IP addresses, and applications.

Failover & Failback are based on these resources.

So, Does these resources are failover & failback…??

No…!!

The failover & failback performed on “RESOURCE GROUPS”.

Cluster resources are held together within a cluster resource group, or cluster group. Cluster groups are the units of failover within the cluster: when a cluster resource fails and cannot be restarted automatically, the entire cluster group is taken offline and failed over to another available cluster node (a small sketch of this rule follows).
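Here is a minimal Python sketch of that rule (class and resource names are my own, purely for illustration): one unrestartable resource takes the whole group to another node:

```python
# The resource group is the unit of failover: if any resource in the group
# fails and cannot be restarted in place, the whole group moves.
class ResourceGroup:
    def __init__(self, name, resources, node):
        self.name = name
        self.resources = resources   # e.g. ["ip", "filesystem", "apache"]
        self.node = node             # node currently hosting the group

    def on_resource_failure(self, resource, restarted, standby_node):
        if restarted:
            print(f"{resource} restarted in place on {self.node}")
            return
        # Restart failed: take the ENTIRE group offline and fail it over.
        print(f"{resource} cannot restart; moving group {self.name} "
              f"from {self.node} to {standby_node}")
        self.node = standby_node

group = ResourceGroup("web-rg", ["ip", "filesystem", "apache"], "node-a")
group.on_resource_failure("apache", restarted=False, standby_node="node-b")
```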

Good… But how does another node in the cluster come to know about any kind of failure…??

HEARTBEAT:

The heartbeat network is a private network shared only by the cluster nodes and not accessible from outside the cluster. It is used by the cluster nodes to monitor each node's status and to communicate with each other.

A heartbeat provides cluster members with information on the exact status of any cluster member at any given time. This means that any node of the cluster knows the exact number of nodes/participants in the cluster it has joined, and also knows which cluster members are active or online, in maintenance mode, or offline.

Generally the heartbeat is set up on a completely different subnet, so that the system can distinguish between a physical failure and a network failure.
If a single network fails, this can cause a false positive. That is why it is recommended to have a minimum of two cluster networks (a minimal heartbeat sketch follows).
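Here is a minimal sketch of the heartbeat idea in Python (the port, interval, and timeout values are arbitrary assumptions; real cluster stacks use their own wire protocols): one side periodically sends "I am alive" datagrams, the other declares the peer dead if nothing arrives within the timeout:

```python
import socket
import time

HB_PORT, HB_INTERVAL, HB_TIMEOUT = 9999, 1.0, 3.0   # assumed values

def send_heartbeats(node_name: str, peer_ip: str) -> None:
    """Periodically tell the peer that this node is alive."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    while True:
        sock.sendto(node_name.encode(), (peer_ip, HB_PORT))
        time.sleep(HB_INTERVAL)

def monitor_heartbeats() -> None:
    """Declare the peer dead if no heartbeat arrives within HB_TIMEOUT."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", HB_PORT))
    sock.settimeout(HB_TIMEOUT)
    while True:
        try:
            data, addr = sock.recvfrom(64)
            print(f"heartbeat from {data.decode()} at {addr[0]}")
        except socket.timeout:
            print("no heartbeat within timeout: peer presumed DEAD; "
                  "check quorum before starting failover!")
            break
```

Run `monitor_heartbeats()` on one node and `send_heartbeats("node-a", "<peer-ip>")` on the other; stop the sender and watch the monitor time out.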

Well, next is

QUORUM:
Before proceeding to quorum in cluster terms, we should first know the simple definition of quorum.

How we can define Quorum…?

Search the internet; a few examples are given here…

“www.dictionary.com”
The number of members of a group or organization required to be present to transact business legally, usually a majority

“www.vocabulary.com”
A quorum is not necessarily a majority of members of a group, but the minimum needed in order to conduct business. For example, if two members of a group are absent, there can still be a quorum, meaning the meeting can go on without them.

A gathering of the minimal number of members of an organization to conduct business.

“www.thefreedictionary.com”
A minimum number of members in an assembly, society, board of directors, etc, required to be present before any valid business can be transacted.

“www.businessdictionary.com”
Fixed minimum number of eligible members or stockholders (shareholders) who must be present (physically or by proxy) at a meeting before any official business may be transacted or a decision taken therein becomes legally binding. Usually the articles of association or bylaws of a firm specify this number, otherwise the number prescribed in corporate legislation (such as company law) is followed.
I think these are enough to understand the meaning of Quorum.

The simplest example that comes to my mind is our society meeting. Whenever a meeting is organized about celebrating a festival and throwing a dinner from the society fund, all members of the society are present with their valuable feedback… BUT… BUT… whenever a meeting is called to raise funds for some construction or maintenance purpose, merely 15 to 20% of the members turn up, and then the society chairman and secretary start crying about fulfilling the "QUORUM" needed to agree upon and pass the resolution.

So now we can define quorum,

A minimum number of members in an assembly, society, board of directors, etc, required to be present before any valid business can be transacted.

A cluster means we always have at least one operational node. How is this goal achieved…?

Quorum is the minimum number of cluster member votes required to perform a cluster operation. When a node fails in a cluster, a configuration change is required, because the number of nodes participating in the cluster has changed. The quorum tells the cluster which node is currently active and which node or nodes are in standby.

Meaning what...??

The resource groups are managed by the cluster nodes, and when a node fails, these resource groups should be migrated to another node, right...??

So who will decide this...??

Here comes the quorum: a vote is taken among the live nodes about which node will take over the responsibility. This vote must be agreed among all live nodes.

Each member carries one vote, and a majority of the cluster member votes is required to reach quorum (a small sketch of this rule follows).
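In code, the majority rule is essentially one line. A generic Python sketch (not any vendor's actual implementation):

```python
def has_quorum(live_votes: int, total_votes: int) -> bool:
    """A partition may run services only if it holds MORE than half the votes."""
    return live_votes > total_votes / 2

# 3-node cluster, one node lost: 2 of 3 votes -> still quorate.
print(has_quorum(2, 3))   # True
# 4-node cluster split 2/2: neither side has a majority -> no quorum.
print(has_quorum(2, 4))   # False (this is why even splits are dangerous)
```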

Confused...??

OK... Let's make it simpler.

Let's assume a two-node cluster that does not know what quorum is. A problem occurs in the network and both nodes become isolated.
Now what…?
Though both are alive, in the eyes of the cluster application there is a major problem.
Both nodes are working fine, so which node will hold the service…??
Node "A"… Node "B"… or both…??
Both nodes cannot hold the service simultaneously, and nodes "A" & "B" are unaware that the other is alive, so what will they do…??
Each will declare itself the master and take ownership of the service.

What will happen then...??

Hence the cluster will end up in a "SPLIT-BRAIN" state.
* we will learn more about this later.

How to avoid it...?

Quorum... right!!!

That is why we need quorum.

The other thing that the quorum does is to intervene when communications fail between nodes. Normally, each node within a cluster can communicate with every other node in the cluster over a dedicated network connection. If this network connection were to fail though, the cluster would be split into two pieces, each containing one or more functional nodes that cannot communicate with the nodes that exist on the other side of the communications failure.

When this type of communications failure occurs, the cluster is said to have been partitioned. The problem is that both partitions have the same goal: to keep the application running. The application cannot run on multiple servers simultaneously, though, so there must be a way of determining which partition gets to run it. This is where the quorum comes in. The partition that "owns" the quorum is allowed to continue running the application; the other partition is removed from the cluster.

Let's go back to our two-node cluster example, and this time assume there is a quorum. Each node has one vote, two votes in total. Quorum needs more than half of the votes to operate.
What happens if one node goes down…??
Only one node remains, with its single vote, and that is definitely not more than half. So what happens…??
There will be no "Rise of the Fallen". In this case an external vote is required. But who will give that vote…??
A "quorum device". And what is this quorum device?

A quorum device is a shared storage device or quorum server that is shared by two or more nodes and that contributes votes used to establish quorum. The cluster can operate only when a quorum of votes is available. The quorum device is used when a cluster becomes partitioned into separate sets of nodes, to establish which set of nodes constitutes the new cluster (a sketch of the vote math follows).
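Sticking with the two-node example, the quorum-device arithmetic looks like this (a sketch reusing the majority rule from above; the vote counts are the usual convention, but check your cluster product's documentation):

```python
def has_quorum(live_votes: int, total_votes: int) -> bool:
    return live_votes > total_votes / 2

NODE_VOTES, DEVICE_VOTES = 2, 1          # two nodes + one quorum device
TOTAL = NODE_VOTES + DEVICE_VOTES        # 3 votes in the whole cluster

# Node B dies; node A claims the quorum device: 1 (itself) + 1 (device) = 2.
print(has_quorum(1 + 1, TOTAL))   # True  -> node A keeps running services
# Without a quorum device, the lone survivor has 1 of 2 votes: no majority.
print(has_quorum(1, 2))           # False -> the cluster would have to stop
```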

Now we know what the "PARTITIONED" or "PARTITIONED INTO SUB-CLUSTERS" state is.

When a cluster gets stuck in the "PARTITIONED" state, "SPLIT-BRAIN" happens.

SPLIT-BRAIN:

Split brain occurs when the cluster interconnect between nodes is lost and the cluster becomes partitioned into sub-clusters, or two sides. There is no communication between them, so each side/partition believes the other is dead and tries to take ownership of the resources.

How to avoid such condition…??

FENCING:

Fencing is the process of isolating a node of a computer cluster, or protecting shared resources, when a node appears to be malfunctioning. As the number of nodes in a cluster increases, so does the possibility that one of them may fail at some point.
Fencing is the component of the cluster that cuts off access to a resource (hard disk, etc.) from a node if that node loses contact with the rest of the nodes in the cluster.

There are two kinds of fencing: Resource level and Node level.

Using resource-level fencing, the cluster can make sure that a node cannot access one or more resources. A typical example is a SAN, where a fencing operation changes the rules on a SAN switch to deny access from the node.

Resource-level fencing may also be achieved using a normal resource on which the resource we want to protect depends: such a resource would simply refuse to start on the node being fenced, and therefore resources which depend on it will not be runnable on that node either.

Node-level fencing makes sure that a node does not run any resources at all. This is usually done in a very simple, yet brutal way: the node is simply reset using a power switch. This may ultimately be necessary because the node may not be responsive at all (a hedged sketch follows).
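Here is a sketch of node-level fencing logic in Python; the `PowerSwitch` class and its `power_cycle()` method are entirely hypothetical stand-ins for whatever STONITH-style device (PDU, IPMI/iLO, etc.) a real cluster stack would drive:

```python
class PowerSwitch:
    """Hypothetical network power switch; a real cluster would talk to an
    actual fencing device (PDU, IPMI/iLO, etc.) here."""
    def power_cycle(self, outlet: str) -> None:
        print(f"power-cycling outlet {outlet}")

def fence_node(node: str, switch: PowerSwitch, outlets: dict) -> None:
    """Node-level fencing: reset the suspect node via its power outlet so it
    cannot keep writing to shared storage while we take over its resources."""
    switch.power_cycle(outlets[node])
    print(f"{node} fenced; now safe to fail its resource groups over")

fence_node("node-b", PowerSwitch(), {"node-a": "1", "node-b": "2"})
```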

FENCING RACE / FENCING WAR:

Considering a two-node cluster: when the connection between the two nodes is broken, both nodes follow the same procedure: "Because I'm still alive, the other node must have failed, either partially or completely. I must fence it to make sure it cannot later spontaneously recover and corrupt the disks I'm writing to." Both nodes will attempt to fence each other; if the fencing is done by an external power switch, the switch should accept only one connection at a time, and therefore only one node can succeed in fencing the other. This is called a "fencing race".


