Wise people learn when they can; fools learn when they must - Arthur Wellesley

Sunday, 19 March 2017

VCS ON RHEL6–CONFIGURE & OPERATION-P3



BASIC VCS OPERATION:

How to check the cluster status?
How to check the VCS logs?
How to check resources and attributes?
How to check service groups and their resources?
How to check resources & attributes?
How do I know how many service groups are in the cluster?
How do I know how many resources a service group has?
How to know the details about a particular resource…?
CRITICAL & NON-CRITICAL RESOURCES:
How to know the state (online/offline) of resources…?
How to know the virtual IP address configured with a SG…?
How to know whether a SG is frozen or not…?
How to know whether AutoStart is set for a SG or not…?


BASIC VCS OPERATION:

How to check the cluster status?
[root@pr01 ~]# hastatus
[root@pr01 ~]# hastatus -sum

How to check the VCS logs?
[root@pr01 ~]# tail -f /var/VRTSvcs/log/engine_A.log
[root@pr01 ~]# hamsg -list
[root@pr01 ~]# hamsg Apache_A

How to check resources and attributes?
[root@pr01 ~]# hares -list
[root@pr01 ~]# hares -list -clus cluster1 |grep -i pr01

How to check service groups and their resources?
[root@pr01 ~]# hagrp -list
[root@pr01 ~]# hagrp -resources NFS_APP1

The hastatus command shows the live status of all cluster objects.

[root@pr01 ~]# hastatus
attempting to connect....
attempting to connect....connected


group           resource             system               message
--------------- -------------------- -------------------- --------------------
                                     pr01                 RUNNING
                                     dr01                 RUNNING
NFS_APP1                             pr01                 ONLINE
NFS_APP1                             dr01                 OFFLINE
-------------------------------------------------------------------------
Web-App                              pr01                 STOPPING PARTIAL
Web-App                              dr01                 OFFLINE
                NFS_DG1              pr01                 ONLINE
                NFS_DG1              dr01                 OFFLINE
                NFS_IP1              pr01                 ONLINE
-------------------------------------------------------------------------
                NFS_IP1              dr01                 OFFLINE
                NFS_MOUNT1           pr01                 ONLINE
                NFS_MOUNT1           dr01                 OFFLINE
                NFS_SERVICE1         pr01                 ONLINE
                NFS_SERVICE1         dr01                 OFFLINE
-------------------------------------------------------------------------
                NFS_RESTART1         pr01                 ONLINE
                NFS_RESTART1         dr01                 OFFLINE
                NFS_NIC1             pr01                 ONLINE
                NFS_NIC1             dr01                 ONLINE
                NFS_SHARE1           pr01                 ONLINE
-------------------------------------------------------------------------
                NFS_SHARE1           dr01                 OFFLINE
                NFS_VOLUME1          pr01                 ONLINE
                NFS_VOLUME1          dr01                 OFFLINE
                Web_Res              pr01                 OFFLINE
                Web_Res              dr01                 OFFLINE
-------------------------------------------------------------------------
                DG_Res               pr01                 ONLINE
                DG_Res               dr01                 OFFLINE
                Service_IP           pr01                 ONLINE
                Service_IP           dr01                 OFFLINE
                Mount_Res            pr01                 ONLINE
-------------------------------------------------------------------------
                Mount_Res            dr01                 OFFLINE
                Nic_Res              pr01                 ONLINE
                Nic_Res              dr01                 ONLINE
                Volume_Res           pr01                 ONLINE
                Volume_Res           dr01                 OFFLINE
^C

This output runs continuously and is hard to read. A summary report is more useful:

[root@pr01 ~]# hastatus -sum

-- SYSTEM STATE
-- System               State                Frozen

A  dr01                 RUNNING              0
A  pr01                 RUNNING              0

-- GROUP STATE
-- Group           System               Probed     AutoDisabled    State

B  NFS_APP1        dr01                 Y          N               OFFLINE
B  NFS_APP1        pr01                 Y          N               ONLINE
B  Web-App         dr01                 Y          N               OFFLINE
B  Web-App         pr01                 Y          N               STOPPING|PARTIAL

-- RESOURCES ONLINING
-- Group           Type            Resource             System               IState

F  Web-App         Apache          Web_Res              pr01                 W_ONLINE_REVERSE_PROPAGATE


We can see that there is a problem with the service group “Web-App”: its resource “Web_Res” is stuck in a waiting state. Check the State and IState columns.

But what exactly is the problem? The engine log is the first place to look:

[root@pr01 ~]# tail -f /var/VRTSvcs/log/engine_A.log
VCS_LOG_SCRIPT_NAME=monitor
VCS_LOG_CATEGORY=10061
VCS_LOG_RESOURCE_NAME=Web_Res
VCSONE_LOG_RESOURCE_NAME=Web_Res
VCSONE_LOG_CATEGORY=10061]
2017/03/08 18:56:38 VCS WARNING V-16-10031-7017 (dr01) NFS:NFS_SERVICE1:monitor:nfsd filesystem not mounted, returning offline
2017/03/08 18:57:03 VCS INFO V-16-1-50135 User admin fired command: hagrp -switch Web-App  dr01  localclus  from ::ffff:192.168.234.1
2017/03/08 18:57:03 VCS NOTICE V-16-1-10208 Initiating switch of group Web-App from system pr01 to system dr01
2017/03/08 19:01:38 VCS WARNING V-16-10031-7017 (dr01) NFS:NFS_SERVICE1:monitor:nfsd filesystem not mounted, returning offline
2017/03/08 19:06:38 VCS WARNING V-16-10031-7017 (dr01) NFS:NFS_SERVICE1:monitor:nfsd filesystem not mounted, returning offline

[root@pr01 ~]# hamsg -list
#Log data files
Apache_A
NIC_A
HostMonitor_A
DiskGroup_A
CmdSlave-log32661.log_A
Volume_A
NFSRestart_A
CmdSlave-log32638.log_A
CmdSlave-log32634.log_A
CmdSlave-log32649.log_A
NFS_A
IP_A
Share_A
CmdSlave-log32657.log_A
CmdSlave-log32645.log_A
CmdSlave-log32653.log_A
engine_A
CmdSlave-log32641.log_A
Mount_A
CmdServer-log_A
hashadow-err_A
CmdSlave-log32665.log_A

[root@pr01 ~]# hamsg Volume_A
Sat 04 Mar 2017 10:38:41 AM IST VCS ERROR V-16-2-13064 Agent is calling clean for resource(Volume_Res) because the resource is up even after offline completed.
Sat 04 Mar 2017 10:38:42 AM IST VCS ERROR V-16-2-13069 Resource(Volume_Res) - clean failed.
Sat 04 Mar 2017 10:39:42 AM IST VCS ERROR V-16-2-13077 Agent is unable to offline resource(Volume_Res). Administrative intervention may be required.
Sat 04 Mar 2017 11:11:19 AM IST VCS ERROR V-16-2-13078 Resource(Volume_Res) - clean completed successfully after 33 failed attempts.
Sun 05 Mar 2017 07:37:46 PM IST VCS ERROR V-16-2-13064 Agent is calling clean for resource(NFS_VOL) because the resource is up even after offline completed.
Sun 05 Mar 2017 07:37:49 PM IST VCS ERROR V-16-2-13068 Resource(NFS_VOL) - clean completed successfully.

From this output we can see what is happening and act accordingly, as sketched below.
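For example, once the root cause is fixed, a typical recovery is to clear the fault on the resource and flush the stuck operation. A minimal sketch using this setup's names (standard hares/hagrp options, but confirm the underlying fault is actually resolved first):

[root@pr01 ~]# hares -clear Web_Res -sys pr01     # clear the FAULTED flag after fixing the root cause
[root@pr01 ~]# hagrp -flush Web-App -sys pr01     # abort a stuck online/offline (e.g. STOPPING|PARTIAL)
[root@pr01 ~]# hastatus -sum                      # confirm the group state is clean again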

Good, there are two things to look at: first the “service group”, and second the “resources & attributes” that belong to it.

How to check resources & attributes?

[root@pr01 ~]# hares -list
DG_Res                  dr01
DG_Res                  pr01
Mount_Res               dr01
Mount_Res               pr01
NFS_DG1                 dr01
NFS_DG1                 pr01
NFS_IP1                 dr01
NFS_IP1                 pr01
NFS_MOUNT1              dr01
NFS_MOUNT1              pr01
NFS_NIC1                dr01
NFS_NIC1                pr01
NFS_RESTART1            dr01
NFS_RESTART1            pr01
NFS_SERVICE1            dr01
NFS_SERVICE1            pr01
NFS_SHARE1              dr01
NFS_SHARE1              pr01
NFS_VOLUME1             dr01
NFS_VOLUME1             pr01
Nic_Res                 dr01
Nic_Res                 pr01
Service_IP              dr01
Service_IP              pr01
Volume_Res              dr01
Volume_Res              pr01
Web_Res                 dr01
Web_Res                 pr01

Each resource appears twice, once for each system in the cluster.

And what if I have many clusters…?

[root@pr01 ~]# hares -list -clus cluster1
DG_Res                  cluster1:dr01
DG_Res                  cluster1:pr01
Mount_Res               cluster1:dr01
Mount_Res               cluster1:pr01
NFS_DG1                 cluster1:dr01
NFS_DG1                 cluster1:pr01
NFS_IP1                 cluster1:dr01
NFS_IP1                 cluster1:pr01
NFS_MOUNT1              cluster1:dr01
NFS_MOUNT1              cluster1:pr01
NFS_NIC1                cluster1:dr01
NFS_NIC1                cluster1:pr01
NFS_RESTART1            cluster1:dr01
NFS_RESTART1            cluster1:pr01
NFS_SERVICE1            cluster1:dr01
NFS_SERVICE1            cluster1:pr01
NFS_SHARE1              cluster1:dr01
NFS_SHARE1              cluster1:pr01
NFS_VOLUME1             cluster1:dr01
NFS_VOLUME1             cluster1:pr01
Nic_Res                 cluster1:dr01
Nic_Res                 cluster1:pr01
Service_IP              cluster1:dr01
Service_IP              cluster1:pr01
Volume_Res              cluster1:dr01
Volume_Res              cluster1:pr01
Web_Res                 cluster1:dr01
Web_Res                 cluster1:pr01

Filter it down to a single system:

[root@pr01 ~]# hares -list -clus cluster1 |grep -i pr01
DG_Res                  cluster1:pr01
Mount_Res               cluster1:pr01
NFS_DG1                 cluster1:pr01
NFS_IP1                 cluster1:pr01
NFS_MOUNT1              cluster1:pr01
NFS_NIC1                cluster1:pr01
NFS_RESTART1            cluster1:pr01
NFS_SERVICE1            cluster1:pr01
NFS_SHARE1              cluster1:pr01
NFS_VOLUME1             cluster1:pr01
Nic_Res                 cluster1:pr01
Service_IP              cluster1:pr01
Volume_Res              cluster1:pr01
Web_Res                 cluster1:pr01

Now it looks better, but what about service groups? How do I know how many service groups are in the cluster…?

[root@pr01 ~]# hagrp -list
NFS_APP1                dr01
NFS_APP1                pr01
Web-App                 dr01
Web-App                 pr01

Great, but how do I know how many resources a SG has…?

[root@pr01 ~]# hagrp -resources NFS_APP1
NFS_DG1
NFS_IP1
NFS_MOUNT1
NFS_SERVICE1
NFS_RESTART1
NFS_NIC1
NFS_SHARE1
NFS_VOLUME1

[root@pr01 ~]# hagrp -resources Web-App
Web_Res
DG_Res
Service_IP
Mount_Res
Nic_Res
Volume_Res

Super. Now, how do I see the details of a particular resource…?

[root@pr01 ~]# hares -display Service_IP
[root@pr01 ~]# hares -display Web_Res
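
hares -display with just a resource name dumps every attribute, which is long. The -attribute option (a standard hares flag) narrows the output to the attributes you care about; a quick sketch using the IP agent's Address and Device attributes:

[root@pr01 ~]# hares -display Service_IP -attribute Address Device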

[root@pr01 ~]# hares -dep NFS_IP1
#Group       Parent       Child
NFS_APP1     NFS_IP1      NFS_NIC1
NFS_APP1     NFS_RESTART1 NFS_IP1
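
In this output the parent depends on the child: NFS_IP1 requires NFS_NIC1, and NFS_RESTART1 requires NFS_IP1. Run hares -dep with no resource name to list every resource dependency in the cluster, or hagrp -dep for dependencies between service groups (if a group has no group-level dependencies, the command simply reports none):

[root@pr01 ~]# hares -dep
[root@pr01 ~]# hagrp -dep NFS_APP1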

CRITICAL & NON-CRITICAL RESOURCES:

The Critical attribute for a resource defines whether a service group fails over when the resource faults. If a resource is configured as non-critical (by setting the Critical attribute to 0) and no resources depending on the failed resource are critical, the service group will not fail over. VCS takes the failed resource offline and updates the group status to ONLINE|PARTIAL. The attribute also determines whether a service group tries to come online on another node if, during the group's online process, a resource fails to come online.

If a resource is configured as critical and it faults, VCS fails over the group the resource belongs to. If the resource is non-critical and faults (and no critical resources depend on it), its service group will not fail over. A typical example of a non-critical resource is a backup IP: you want it in the service group so you can back up the application wherever it resides, but if the backup IP fails, you don't want to cause an outage to your app while VCS fails the group over.

[root@pr01 ~]# hares -list Critical=1
[root@pr01 ~]# hares -list Critical=0

If Critical is 0, a fault of the resource will not cause a group failover.
Default: 1
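
To make a resource non-critical, open the configuration in read-write mode and modify its Critical attribute. A minimal sketch using the Service_IP resource from this setup (standard haconf/hares commands):

[root@pr01 ~]# haconf -makerw
[root@pr01 ~]# hares -modify Service_IP Critical 0
[root@pr01 ~]# hares -value Service_IP Critical
0
[root@pr01 ~]# haconf -dump -makero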

How to know the state (online/offline) of resources…?

[root@pr01 ~]# hares -state |grep pr01
DG_Res       State                 pr01       ONLINE
Mount_Res    State                 pr01       ONLINE
NFS_DG1      State                 pr01       OFFLINE
NFS_IP1      State                 pr01       OFFLINE
NFS_MOUNT1   State                 pr01       OFFLINE
NFS_NIC1     State                 pr01       ONLINE
NFS_RESTART1 State                 pr01       OFFLINE
NFS_SERVICE1 State                 pr01       ONLINE
NFS_SHARE1   State                 pr01       OFFLINE
NFS_VOLUME1  State                 pr01       OFFLINE
Nic_Res      State                 pr01       ONLINE
Service_IP   State                 pr01       ONLINE
Volume_Res   State                 pr01       ONLINE
Web_Res      State                 pr01       OFFLINE
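
The same quick check works at the group level: hagrp -state lists the State of every service group per system (output format may differ slightly between VCS versions):

[root@pr01 ~]# hagrp -state |grep pr01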

How to know the virtual ip address configured with SG…?

[root@pr01 ~]# hares -value Service_IP Address
192.168.234.200
[root@pr01 ~]# hares -value NFS_IP1 Address
192.168.234.190
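
The same trick reads any other attribute of the resource; for example the interface and netmask of a virtual IP (Device and NetMask are standard IP-agent attributes):

[root@pr01 ~]# hares -value Service_IP Device
[root@pr01 ~]# hares -value Service_IP NetMask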

How to know whether the SG is frozen or not…?

Freeze a service group to prevent it from failing over to another system. Freezing stops all online and offline operations on the service group. If you used the "-persistent" flag with "hagrp -freeze", the group remains frozen across reboots: when the node reboots and VCS starts, it reads the persistent frozen flag, takes no action on the group, and the group therefore stays offline.

Unfreeze a frozen service group to perform online or offline operations on the service group.

If you freeze a service group, it cannot be switched to another node until it is unfrozen. If any of the resources in the service group fault, VCS takes no action (even a fault that would normally trigger a failover is ignored while the group is frozen).
If you freeze a node/system, that only prevents service groups from being brought online on that node. Service groups already running on the system keep running, and if a critical resource faults, failover to another system is still triggered.
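
A whole system can be frozen the same way with hasys; a minimal sketch (a persistent system freeze, like a persistent group freeze, needs the configuration in read-write mode):

[root@pr01 ~]# haconf -makerw
[root@pr01 ~]# hasys -freeze -persistent dr01
[root@pr01 ~]# hasys -display dr01 -attribute Frozen
[root@pr01 ~]# hasys -unfreeze -persistent dr01
[root@pr01 ~]# haconf -dump -makero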

[root@pr01 ~]# hagrp -list Frozen=1
[root@pr01 ~]# hagrp -list Frozen=0
NFS_APP1                dr01
NFS_APP1                pr01
Web-App                 dr01
Web-App                 pr01

There is no frozen SG at the moment. Let's freeze one:

[root@pr01 /]# haconf -makerw
[root@pr01 ~]# hagrp -freeze NFS_APP1 -persistent

[root@pr01 ~]# hastatus -sum

[root@pr01 ~]# hagrp -list Frozen=0
Web-App                 dr01
Web-App                 pr01
[root@pr01 ~]# hagrp -list Frozen=1
NFS_APP1                dr01
NFS_APP1                pr01
[root@pr01 ~]# hagrp -unfreeze NFS_APP1 -persistent
[root@pr01 ~]# hagrp -list Frozen=1
[root@pr01 ~]# hagrp -list Frozen=0
NFS_APP1                dr01
NFS_APP1                pr01
Web-App                 dr01
Web-App                 pr01
[root@pr01 /]# haconf -dump -makero

How to know whether AutoStart is set for an SG or not…?

AutoStart designates whether a service group is automatically started when VCS is started.
Default: 1 (enabled)

A related attribute, AutoRestart, restarts a service group after a faulted persistent resource comes back online. It can take the following values:
0 - AutoRestart is disabled.
1 - AutoRestart is enabled.
2 - When a faulted persistent resource recovers from a fault, the VCS engine clears the faults on all non-persistent faulted resources on the system. It then restarts the service group.
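
Both are group attributes, so either can be read directly with hagrp -value, just like hares -value for resources:

[root@pr01 ~]# hagrp -value Web-App AutoStart
[root@pr01 ~]# hagrp -value Web-App AutoRestart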

[root@pr01 ~]# hagrp -display Web-App |grep -i autostart
Web-App      AutoStart             global     1
Web-App      AutoStartIfPartial    global     1
Web-App      AutoStartList         global     pr01
Web-App      AutoStartPolicy       global     Order

Method to change “AutoStart”:

# haconf -makerw
# hagrp -modify Web-App AutoStart 0
# haconf -dump -makero
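
And verify the change (a quick sanity check; hagrp -value should now print 0):

# hagrp -value Web-App AutoStart
0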


