Wise people learn when they can; fools learn when they must - Arthur Wellesley

Saturday 11 February 2017

VERITAS CLUSTER-2 (BASICS)




VCS COMPONENTS:

The main VCS components (HAD, hashadow, GAB, LLT, the agents, and the I/O fencing module) are introduced in the sections below.

VCS COMMUNICATION:

VCS supports clusters of up to 64 nodes. Let’s say there is a requirement to prepare a 5-node cluster. No problem, we can prepare it: there are some prerequisites that need to be followed and some configuration, that’s it.

As we all know, there is a public IP that applications and end users use to connect. But consider the basic purpose of a cluster: keeping the service available. This is achieved by migrating service groups (SGs) to a working node when the current master fails.

Well… but how will the cluster nodes know about the failure of the master…??
There must be a mechanism by which all the nodes within a cluster talk to each other continuously, so that the status of every node is reflected at all times on all cluster nodes.

This is made possible by connecting the nodes via a private channel, hidden from the outside world and used strictly for private communication between nodes.

This communication is called,
“CLUSTER-INTERCONNECT” / “PRIVATE-INTERCONNECT” / “CLUSTER-HEARTBEAT” / “PRIVATE-HEARTBEAT”

Heartbeats are a communication mechanism for nodes to exchange information concerning hardware and software status, keep track of cluster membership, and keep this information synchronized across all cluster nodes.

Two NICs should be dedicated on every node in the cluster to avoid a single point of failure: the cables from the first NIC of all nodes are brought together (on one hub/switch), and the cables from the second NIC of all nodes are brought together (on another).

Private Network is strictly for internal communication, no application traffic over it.
Public Network is for application/end-user communication.

One more question arises about communication between nodes:
How will the nodes identify which node they are talking to…??

Again, for this purpose there must be unique identification. Each node in a cluster has a unique NODE-ID, and all nodes of the cluster share a common CLUSTER-ID, which identifies them together as one unit and distinguishes this cluster from any other on the same network.
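To make this concrete, here is a minimal sketch of where these identifiers typically live on disk (the node names and the cluster ID 101 are invented for illustration). LLT maps node IDs to node names in /etc/llthosts, and the shared cluster ID comes from the set-cluster directive in /etc/llttab:

    # /etc/llthosts : unique node ID to node name mapping (identical on every node)
    0 node1
    1 node2

    # excerpt from /etc/llttab : cluster ID shared by all nodes of this cluster
    set-cluster 101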

As we know, the “HEARTBEAT” decides the status of nodes within a cluster by sending and receiving signals on the cluster interconnects. From this heartbeat status, VCS determines which nodes are active members of the cluster and which nodes are leaving or joining the cluster.
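The heartbeat/link status can be checked with the lltstat command (part of the LLT package described below). As a rough sketch, on a healthy two-node cluster the output of “lltstat -nvv” looks something like this (node names and MAC addresses are invented; the exact layout varies by version):

    # lltstat -nvv
    LLT node information:
    Node        State    Link   Status   Address
    * 0 node1   OPEN
                         eth1   UP       00:0C:29:AA:BB:01
                         eth2   UP       00:0C:29:AA:BB:02
      1 node2   OPEN
                         eth1   UP       00:0C:29:CC:DD:01
                         eth2   UP       00:0C:29:CC:DD:02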

VCS DAEMONS:

·         High availability daemon (HAD)
·         hashadow daemon
·         HostMonitor daemon
·         Group Membership Services/Atomic Broadcast (GAB)
·         Low Latency Transport (LLT)
·         I/O fencing module


CLUSTER/HEARTBEAT COMMUNICATION is maintained by:-

LOW LATENCY TRANSPORT (LLT) PROTOCOL
GROUP MEMBERSHIP SERVICES/ATOMIC BROADCAST (GAB) MECHANISM


LOW LATENCY TRANSPORT (LLT):

VCS uses private network communications between cluster nodes for cluster maintenance. VCS recommends two independent networks between all cluster nodes. These networks provide the required redundancy in the communication path and enable VCS to discriminate between a network failure and a system failure.
LLT is a layer-2 protocol responsible for sending HEARTBEAT messages across the links every half second. This is how all nodes in the cluster know about each other.

LLT runs directly on top of the Data Link Provider Interface (DLPI) layer over Ethernet; it does not use TCP/IP.
For the private networks it is not required to plumb the interfaces or assign IP addresses to them, so they remain hidden from “ifconfig” (an IP-based transport is only needed if a router sits between the nodes).
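Putting this together, a typical /etc/llttab on the first node might look like the sketch below. The node name, cluster ID, and interface names (eth1/eth2) are assumptions; use whatever NICs actually carry the private links on your platform:

    set-node node1
    set-cluster 101
    link eth1 eth1 - ether - -
    link eth2 eth2 - ether - -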

Traffic-Distribution:-
LLT distributes (load balances) internode communication across all available private network links. This distribution means that all cluster communications are evenly distributed across all private network links (maximum eight) for performance and fault resilience. If a link fails, traffic is redirected to the remaining links.

Heartbeat:-
LLT is responsible for sending and receiving heartbeat traffic over network links. This heartbeat is used by the Group Membership Services of GAB to determine cluster membership.

GROUP MEMBERSHIP SERVICES/ATOMIC BROADCAST (GAB):

The Group Membership Services/Atomic Broadcast protocol (GAB) is responsible for cluster membership and cluster communications.
GAB is a broadcast protocol that uses LLT as its transport mechanism.

Cluster Communications:--
GAB ensures reliable cluster communications, providing guaranteed delivery of point-to-point and broadcast messages to all nodes.
The system administrator configures the GAB driver by creating a configuration file (/etc/gabtab).
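For example, on a two-node cluster /etc/gabtab usually contains just one line like the following: “-c” configures the GAB driver and “-n2” tells GAB to seed the cluster only once two nodes are up:

    /sbin/gabconfig -c -n2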

In a two-node cluster, the two GAB instances communicate with each other. GAB relies on LLT to run the heartbeat between the nodes; whenever communication is required between the GAB instances, the message is handed to LLT, and LLT transfers/conveys it between them over the Ethernet links.

Cluster Membership:--
GAB maintains cluster formation (also referred to as SEEDING the cluster) by receiving input on the status of the heartbeat from each node via LLT. When a system no longer receives heartbeats from a peer, GAB marks the peer as DOWN and excludes it from the cluster.
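Current membership can be checked with “gabconfig -a”. A rough sketch of the output on a healthy two-node cluster (the generation numbers are arbitrary): port a is GAB membership itself, port h is HAD, and “membership 01” means nodes 0 and 1 are both members:

    # gabconfig -a
    GAB Port Memberships
    ===============================================================
    Port a gen   a36e0003 membership 01
    Port h gen   fd570002 membership 01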

ATOMIC BROADCAST:
Cluster configuration and status information is distributed dynamically to all systems within the cluster using GAB’s Atomic Broadcast feature. Atomic Broadcast ensures that all active systems receive all messages, for every resource and service group in the cluster. Atomic means that either all systems receive the update or, if any system fails to receive it, the change is rolled back on all systems.

I/O FENCING:

Responsible for preventing SPLIT-BRAIN. Consider a two-node cluster working fine; after some time the heartbeat stops. Because the heartbeat has stopped, each node thinks the other is down, and both will try to take ownership of the same application. If there is no mechanism to prevent this situation, the application will come up on both systems and start accessing the storage simultaneously. This will surely result in DB/data corruption. This situation is called “SPLIT-BRAIN”.

To prevent this situation, I/O FENCING is implemented: the fencing driver prevents multiple systems from accessing the same shared storage.

With I/O fencing implemented, the nodes race for the coordinator disks to decide a winner; the losing system is forced to PANIC & REBOOT, while the winning system is now the only member of the cluster and fences off the shared data disks so that only the active member can access the shared storage.
Hence the active member takes ownership and brings the SGs online on itself.
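When disk-based (SCSI-3) fencing is configured, the mode typically lives in /etc/vxfenmode and the coordinator disk group name in /etc/vxfendg, while “vxfenadm -d” displays the running fencing state. A minimal sketch, assuming the DMP disk policy and an invented coordinator disk group name:

    # /etc/vxfenmode
    vxfen_mode=scsi3
    scsi3_disk_policy=dmp

    # /etc/vxfendg (coordinator disk group name, one line)
    vxfencoorddg

    # vxfenadm -d      (shows the fencing mode and current cluster state)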


Communication between the various components of VCS is managed by the high-availability daemon, also known as "had." "Had" exchanges information between the user space components (e.g., resource agents and CLI tools) and the kernel space components (LLT and GAB). Working alongside "had" is a process called "hashadow", whose job it is to monitor the "had" process and restart it if necessary.

HIGH AVAILABILITY DAEMON (HAD):

HAD is the VCS engine which manages agents and tracks all configuration and state changes.

·         Building and running cluster config from config files.
·         Distributing the info when new node joins the cluster.
·         Taking corrective action when something fails.
·         HAD is the primary VCS process running on each cluster node.
·         Maintains resource config and state info.
·         Manages agents and service groups.
·         HAD is monitored by “hashadow” daemon.

HAD tracks all changes in the cluster configuration and resource status by communicating with GAB, and it manages all application services (with the help of agents) whether the cluster has two nodes or many.
Agents manage resources, and HAD acts as the manager of these agents; hence resources are managed by HAD with the help of agents. On each active cluster node, HAD updates all other nodes with changes to the configuration or status.

HAD maintains the cluster state information. HAD uses the main.cf file to build the cluster information in memory and is also responsible for updating the configuration in memory.

HAD operates as a Replicated State Machine (RSM). This means HAD running on each node has a completely synchronized view of the resource status on each node. The RSM is maintained through the use of a purpose-built communications package consisting of the protocols Low Latency Transport (LLT) and Group Membership Services/Atomic Broadcast (GAB).

What if HAD fails…?

To ensure that HAD is highly available, another daemon, “hashadow”, is deployed to monitor HAD. If HAD fails, hashadow attempts to restart it, and vice versa: if the hashadow daemon dies, HAD will restart it.
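Both daemons should always be visible on a running cluster node; a quick sanity check might look like this (PID and timestamp columns trimmed):

    # ps -ef | egrep 'had|hashadow' | grep -v grep
    root   ...   /opt/VRTSvcs/bin/had
    root   ...   /opt/VRTSvcs/bin/hashadow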

How does HAD maintain the cluster configuration…??

HAD maintains the cluster configuration/state in the memory of each node; configuration changes are broadcast to all nodes by HAD, and the configuration is saved on disk in the file “main.cf”.

Cluster state maintenance/monitoring means tracking the status of “resources” and “service groups” in the cluster. When there is any change on any node, the HAD of that node sends a message to HAD on each node to ensure that every node has an identical view of the cluster.

Atomic means that either all nodes receive the update or all nodes roll back to the previous state.

The cluster configuration in memory is built from the “main.cf” file on disk. HAD is responsible for reading the file and loading the cluster configuration into memory; no running HAD means no configuration in memory.
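For orientation, a heavily trimmed main.cf skeleton might look like the sketch below; the cluster, node, and group names and the attribute values are all invented for illustration:

    include "types.cf"

    cluster demo_clus (
            )

    system node1 (
            )

    system node2 (
            )

    group app_sg (
            SystemList = { node1 = 0, node2 = 1 }
            AutoStartList = { node1 }
            )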

HOST MONITOR DAEMON:

VCS also starts the HostMonitor daemon when the VCS engine comes up. The VCS engine creates a VCS resource, VCShm, of type HostMonitor, and a VCShmg service group; it does not add these objects to the main.cf file. Do not modify or delete these components of VCS. VCS uses the HostMonitor daemon to monitor CPU and swap utilization, and it reports to the engine log if utilization crosses the threshold limits defined for these resources.
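If you want to see these objects, they can be inspected (but, as noted above, not changed) with the usual status commands, for example:

    # hares -state VCShm
    # hagrp -state VCShmg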
