VERITAS
CLUSTER-2
VCS COMPONENTS:
VCS COMMUNICATION:
VCS supports clusters of up to 64 nodes. Let’s say there is a requirement to prepare a 5-node cluster. No problem, we can prepare it. There are some prerequisites that need to be followed and some configuration, and that’s it.
As we all know, there is a public IP which the application and end users use to connect and maintain the system. But consider the basic need of a cluster: high availability. This is achieved by migrating SGs to a working node in case of failure of the current master.
Well… but how will the cluster nodes know about the failure of the master…?
There should be a mechanism by which all the nodes within a cluster talk to each other continuously, so that their status is reflected at all times on all cluster nodes.
This is made possible by connecting the nodes via a private channel, hidden from the outside world and used strictly for private communication between the nodes.
This communication is called the “CLUSTER-INTERCONNECT” / “PRIVATE-INTERCONNECT” / “CLUSTER-HEARTBEAT” / “PRIVATE-HEARTBEAT”.
Heartbeats are a communication mechanism for nodes to exchange
information concerning hardware and software status, keep track of cluster
membership, and keep this information synchronized across all cluster nodes.
Two NICs should be connected on every node in the cluster to avoid a single point of failure. The cables from the first NIC of all nodes are brought together, and the cables from the second NIC of all nodes are brought together.
The private network is strictly for internal communication; no application traffic runs over it.
The public network is for application/end-user communication.
One more thing comes up during communication between nodes: how will the nodes identify which node they are talking to…?
Again, for this purpose there should be a unique identification scheme. All nodes within a cluster share a unique CLUSTER-ID, which identifies them together as one unit, and each node carries its own node ID within that cluster.
As we know, the “HEARTBEAT” determines the status of nodes within a cluster by sending and receiving signals on the cluster interconnects. From this heartbeat status, VCS determines which nodes are active members of the cluster and which nodes are leaving or joining the cluster.
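As a rough sketch, the cluster ID, node IDs, and the two private links are normally defined in LLT’s configuration files (LLT itself is covered below). The hostnames, cluster ID, and interface names here are made up, and the exact link syntax varies by platform and VCS version:

# cat /etc/llttab          (on node1)
set-node node1
set-cluster 100
link nic1 eth1 - ether - -
link nic2 eth2 - ether - -

# cat /etc/llthosts        (identical on all nodes)
0 node1
1 node2

Every node of the same cluster must carry the same set-cluster value, while each node gets its own entry (and node ID) in /etc/llthosts.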
VCS DAEMONS:
· High availability daemon (HAD)
· HostMonitor daemon
· Group Membership Services/Atomic Broadcast (GAB)
· Low Latency Transport (LLT)
· hashadow daemon
· I/O fencing module
CLUSTER/HEARTBEAT COMMUNICATION is maintained by:
· LOW LATENCY TRANSPORT (LLT) PROTOCOL
· GROUP MEMBERSHIP SERVICES/ATOMIC BROADCAST (GAB) MECHANISM
LOW LATENCY TRANSPORT (LLT):
VCS uses private network communications
between cluster nodes for cluster maintenance. VCS recommends two independent
networks between all cluster nodes. These networks provide the required
redundancy in the communication path and enable VCS to discriminate between a
network failure and a system failure.
LLT is a layer-2 protocol responsible for sending HEARTBEAT messages across the links every half second. This is how all nodes in the cluster know about each other.
LLT runs directly on top of the Data Link Provider Interface (DLPI) layer over Ethernet; it does not use TCP/IP.
For the private networks it is not required to plumb the interfaces or assign IP addresses to them; they remain hidden from “ifconfig” unless a router is introduced.
Traffic-Distribution:-
LLT distributes (load balances) internode
communication across all available private network links. This distribution
means that all cluster communications are evenly distributed across all private
network links (maximum eight) for performance and fault resilience. If a link
fails, traffic is redirected to the remaining links.
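A quick way to see what LLT knows about the nodes and the state of each private link is the lltstat command; the output below is only an illustrative sketch (node names and MAC addresses are invented), and the real formatting differs across versions:

# lltstat -nvv | more
LLT node information:
  Node        State    Link   Status   Address
* 0 node1     OPEN     nic1   UP       00:0C:29:AA:BB:01
                       nic2   UP       00:0C:29:AA:BB:02
  1 node2     OPEN     nic1   UP       00:0C:29:CC:DD:01
                       nic2   UP       00:0C:29:CC:DD:02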
Heartbeat:-
LLT is responsible for sending and receiving
heartbeat traffic over network links. This heartbeat is used by the Group
Membership Services of GAB to determine cluster membership.
GROUP MEMBERSHIP SERVICE/ATOMIC BROADCAST-(GAB):
The Group Membership Services/Atomic
Broadcast protocol (GAB) is responsible for cluster membership and cluster
communications.
GAB is a broadcast protocol that uses LLT as its transport mechanism.
Cluster Communications:--
GAB ensures reliable cluster communications. GAB provides guaranteed delivery of point-to-point and broadcast messages to all nodes.
The system administrator configures the GAB driver by creating a configuration file (/etc/gabtab).
In a two-node cluster, there are two GAB instances communicating with each other.
GAB tells LLT to start the heartbeat between the nodes. When the nodes communicate, it is really the two GAB instances that are communicating: whenever a message needs to pass between the GAB instances, it is handed to LLT, and LLT transfers/conveys the message between them over the Ethernet links.
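As a hedged illustration, /etc/gabtab usually contains just the gabconfig seeding command with the expected node count, and gabconfig -a shows the resulting port memberships (port a is GAB membership, port h is HAD); the generation numbers below are invented:

# cat /etc/gabtab
/sbin/gabconfig -c -n2

# gabconfig -a
GAB Port Memberships
===============================================================
Port a gen   a36e0003 membership 01
Port h gen   fd570002 membership 01

Here “membership 01” means that both node 0 and node 1 are members on that port.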
Cluster Membership:--
GAB maintains cluster formation (also referred to as SEEDING the cluster) by receiving input on the status of the heartbeat from each node via LLT. When a system no longer receives heartbeats from a peer, GAB marks the peer as DOWN and excludes the peer from the cluster.
ATOMIC BROADCAST:
Cluster configuration and status information is distributed dynamically to all systems within the cluster using GAB's Atomic Broadcast feature. Atomic Broadcast ensures all active systems receive all messages, for every resource and service group in the cluster. Atomic means that either every system receives the update or, if one fails, the change is rolled back on all systems.
I/O FENCING:
I/O fencing is responsible for preventing split-brain. Consider a two-node cluster working fine; after some time the heartbeat stops. With the heartbeat stopped, each node thinks the other is down, and both try to take ownership of the same application. If there is no mechanism to prevent this, the application comes up on both systems and both start accessing the storage simultaneously. This will surely result in DB/data corruption. This situation is called “SPLIT-BRAIN”.
To prevent this situation, I/O FENCING is implemented: the fencing driver implements I/O fencing to prevent multiple systems from accessing the same shared storage.
With I/O fencing implemented, the losing system is forced to PANIC & REBOOT; the winning system is now the only member of the cluster, and it fences off the shared data disks so that only the active member can access the shared storage.
Hence the active member takes ownership and brings the SGs online on itself.
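Where disk-based (SCSI-3) fencing is configured, its mode and membership can be checked as below; this is only a sketch, and the exact files, values, and output layout depend on the VCS version and the chosen fencing mode:

# cat /etc/vxfenmode
vxfen_mode=scsi3
scsi3_disk_policy=dmp

# vxfenadm -d
I/O Fencing Cluster Information:
================================
Fencing Protocol Version: 201
Fencing Mode: SCSI3
Fencing SCSI3 Disk Policy: dmp
Cluster Members:
  * 0 (node1)
    1 (node2)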
Communication between the various components
of VCS is managed by the high-availability daemon, also known as
"had." "Had" exchanges information between the user space
components (e.g., resource agents and CLI tools) and the kernel space
components (LLT and GAB). Working alongside "had" is a process called
"hashadow", whose job it is to monitor the "had" process
and restart it if necessary.
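On a healthy node both daemons are normally visible in the process table; a simple, illustrative check (paths shown are the usual VCS install location, PIDs and timestamps invented):

# ps -ef | egrep 'had|hashadow' | grep -v grep
root   2345     1  0 10:02 ?  00:00:12 /opt/VRTSvcs/bin/had
root   2347     1  0 10:02 ?  00:00:00 /opt/VRTSvcs/bin/hashadow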
HIGH AVAILABILITY DAEMON (HAD):
HAD is the VCS engine, which manages agents and tracks all configuration and state changes.
· Building and running the cluster config from the config files.
· Distributing the info when a new node joins the cluster.
· Taking corrective action when something fails.
· HAD is the primary VCS process running on each cluster node.
· Maintains resource config and state info.
· Manages agents and service groups.
· HAD is monitored by the “hashadow” daemon.
HAD tracks all changes in the cluster config and resource status by communicating with GAB. HAD manages all application services (with the help of agents), whether the cluster has two nodes or many.
Agents manage resources, and HAD acts as the manager of these agents. Hence resources are managed by HAD with the help of agents. On each active cluster node, HAD updates all other nodes with changes to the config or status.
HAD maintains the cluster state information.
HAD uses the main.cf file to build the cluster information in memory and is
also responsible for updating the configuration in memory.
HAD operates as a Replicated State Machine (RSM). This means HAD running on each node has a
completely synchronized view of the resource status on each node. The RSM is
maintained through the use of a purpose-built communications package consisting
of the protocols Low Latency Transport (LLT) and Group Membership
Services/Atomic Broadcast (GAB).
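The synchronized view that HAD maintains can be inspected with hastatus; the summary below is only an illustrative sketch, with a made-up service group name (appsg) and simplified column layout:

# hastatus -sum
-- SYSTEM STATE
-- System       State          Frozen
A  node1        RUNNING        0
A  node2        RUNNING        0

-- GROUP STATE
-- Group    System   Probed   AutoDisabled   State
B  appsg    node1    Y        N              ONLINE
B  appsg    node2    Y        N              OFFLINE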
What if HAD fails…?
To ensure that HAD is highly available, another daemon, “hashadow”, is deployed to monitor HAD. If HAD fails, hashadow attempts to restart it, and vice versa: if the hashadow daemon dies, HAD will restart it.
How does HAD maintain the Cluster Configuration…?
HAD maintains the cluster config/state in the memory of each node; config changes are broadcast to all nodes by HAD, and the config is saved on disk in the file “main.cf”.
Cluster state maintenance/monitoring means tracking the status of “resources” and “service groups” in the cluster. When there is any change on any node, the HAD on that node sends a message to the HAD on every other node to ensure that each node has an identical view of the cluster.
Atomic means either all nodes receive the update or all nodes roll back to the previous state.
The cluster config in memory is built from the “main.cf” file on disk. HAD is responsible for reading the file and loading the cluster config into memory; no running HAD means no config in memory.
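A stripped-down main.cf might look roughly like the sketch below; the cluster, system, group, and resource names, device paths, and IP address are all invented for illustration, and a real file would carry further cluster attributes (such as UserNames):

include "types.cf"

cluster demo_clus (
        )

system node1 (
        )

system node2 (
        )

group appsg (
        SystemList = { node1 = 0, node2 = 1 }
        AutoStartList = { node1 }
        )

        IP app_ip (
                Device = eth0
                Address = "192.168.10.50"
                NetMask = "255.255.255.0"
                )

        Mount app_mnt (
                MountPoint = "/app"
                BlockDevice = "/dev/vx/dsk/appdg/appvol"
                FSType = vxfs
                FsckOpt = "-y"
                )

        app_ip requires app_mnt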
HOST MONITOR DAEMON:
VCS also starts the HostMonitor daemon when the VCS engine comes up. The VCS engine creates a VCS resource, VCShm, of type HostMonitor, and a VCShmg service group. The VCS engine does not add these objects to the main.cf file. Do not modify or delete these components of VCS. VCS uses the HostMonitor daemon to monitor the utilization of CPU and swap, and reports to the engine log if these resources cross the threshold limits defined for them.
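The HostMonitor objects that the engine creates can be viewed (but should not be modified) with the usual state commands; the output below is illustrative only:

# hagrp -state VCShmg
#Group    Attribute    System    Value
VCShmg    State        node1     |ONLINE|
VCShmg    State        node2     |ONLINE|

# hares -state VCShm
#Resource  Attribute   System    Value
VCShm      State       node1     ONLINE
VCShm      State       node2     ONLINE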