VCS ON RHEL6 - CONFIGURE & OPERATION - P3
BASIC VCS OPERATION:
How to check the cluster status?
How to check the VCS logs?
How to check resources & attributes?
How to check a service group and its contained resources?
How do I know how many service groups are in the cluster?
How do I know how many resources a service group has?
How to know the details of a particular resource?
CRITICAL & NON-CRITICAL RESOURCES:
How to know the state (online/offline) of resources?
How to know the virtual IP address configured with a SG?
How to know whether a SG is frozen or not?
How to know whether AutoStart is set for a SG or not?
BASIC VCS OPERATION:
How to check the cluster status?
[root@pr01 ~]# hastatus
[root@pr01 ~]# hastatus -sum
How to check the VCS logs?
[root@pr01 ~]# tail -f /var/VRTSvcs/log/engine_A.log
[root@pr01 ~]# hamsg -list
[root@pr01 ~]# hamsg Apache_A
How to check resources & attributes?
[root@pr01 ~]# hares -list
[root@pr01 ~]# hares -list -clus cluster1 |grep -i pr01
How to check a service group and its contained resources?
[root@pr01 ~]# hagrp -list
[root@pr01 ~]# hagrp -resources NFS_APP1
The hastatus command shows the status of all cluster objects.
[root@pr01 ~]# hastatus
attempting to connect....
attempting to connect....connected
group resource system message
---------------
-------------------- -------------------- --------------------
pr01 RUNNING
dr01 RUNNING
NFS_APP1 pr01 ONLINE
NFS_APP1 dr01 OFFLINE
-------------------------------------------------------------------------
Web-App pr01 STOPPING PARTIAL
Web-App dr01 OFFLINE
NFS_DG1 pr01 ONLINE
NFS_DG1 dr01 OFFLINE
NFS_IP1 pr01 ONLINE
-------------------------------------------------------------------------
NFS_IP1 dr01 OFFLINE
NFS_MOUNT1 pr01 ONLINE
NFS_MOUNT1 dr01 OFFLINE
NFS_SERVICE1 pr01 ONLINE
NFS_SERVICE1 dr01 OFFLINE
-------------------------------------------------------------------------
NFS_RESTART1 pr01 ONLINE
NFS_RESTART1 dr01 OFFLINE
NFS_NIC1 pr01 ONLINE
NFS_NIC1 dr01 ONLINE
NFS_SHARE1 pr01 ONLINE
-------------------------------------------------------------------------
NFS_SHARE1 dr01 OFFLINE
NFS_VOLUME1 pr01 ONLINE
NFS_VOLUME1 dr01 OFFLINE
Web_Res pr01 OFFLINE
Web_Res dr01 OFFLINE
-------------------------------------------------------------------------
DG_Res pr01 ONLINE
DG_Res dr01 OFFLINE
Service_IP pr01 ONLINE
Service_IP dr01 OFFLINE
Mount_Res pr01 ONLINE
-------------------------------------------------------------------------
Mount_Res dr01 OFFLINE
Nic_Res pr01 ONLINE
Nic_Res dr01 ONLINE
Volume_Res pr01 ONLINE
Volume_Res dr01 OFFLINE
^C
This output is hard to read and keeps scrolling until interrupted (note the ^C above). It is better to get a summary report.
[root@pr01 ~]# hastatus -sum
--
SYSTEM STATE
--
System State Frozen
A dr01 RUNNING 0
A pr01 RUNNING 0
--
GROUP STATE
--
Group System Probed AutoDisabled State
B NFS_APP1 dr01 Y N OFFLINE
B NFS_APP1 pr01 Y N ONLINE
B Web-App dr01 Y N OFFLINE
B Web-App pr01 Y N STOPPING|PARTIAL
--
RESOURCES ONLINING
--
Group Type Resource System IState
F Web-App Apache Web_Res pr01 W_ONLINE_REVERSE_PROPAGATE
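The GROUP STATE section of `hastatus -sum` lends itself to scripting. A minimal sketch, run here against captured sample output so it works anywhere (on a cluster node you would pipe the live `hastatus -sum` output in instead):

```shell
#!/bin/sh
# Captured "B" (group state) lines stand in for a live `hastatus -sum` call.
hastatus_sum='B  NFS_APP1        dr01                 Y          N               OFFLINE
B  NFS_APP1        pr01                 Y          N               ONLINE
B  Web-App         dr01                 Y          N               OFFLINE
B  Web-App         pr01                 Y          N               STOPPING|PARTIAL'

# "B" lines carry group state; anything beyond plain ONLINE/OFFLINE needs a look.
printf '%s\n' "$hastatus_sum" |
awk '$1 == "B" && $6 != "ONLINE" && $6 != "OFFLINE" {
    printf "CHECK: group %s on %s is %s\n", $2, $3, $6
}'
# prints: CHECK: group Web-App on pr01 is STOPPING|PARTIAL
```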
We can see that there is a problem with the service group "Web-App": the resource "Web_Res" is stuck mid-transition. Check the State and IState columns. But what exactly is the problem?
[root@pr01 ~]# tail -f /var/VRTSvcs/log/engine_A.log
VCS_LOG_SCRIPT_NAME=monitor
VCS_LOG_CATEGORY=10061
VCS_LOG_RESOURCE_NAME=Web_Res
VCSONE_LOG_RESOURCE_NAME=Web_Res
VCSONE_LOG_CATEGORY=10061]
2017/03/08 18:56:38 VCS WARNING V-16-10031-7017 (dr01) NFS:NFS_SERVICE1:monitor:nfsd filesystem not mounted, returning offline
2017/03/08 18:57:03 VCS INFO V-16-1-50135 User admin fired command: hagrp -switch Web-App dr01 localclus from ::ffff:192.168.234.1
2017/03/08 18:57:03 VCS NOTICE V-16-1-10208 Initiating switch of group Web-App from system pr01 to system dr01
2017/03/08 19:01:38 VCS WARNING V-16-10031-7017 (dr01) NFS:NFS_SERVICE1:monitor:nfsd filesystem not mounted, returning offline
2017/03/08 19:06:38 VCS WARNING V-16-10031-7017 (dr01) NFS:NFS_SERVICE1:monitor:nfsd filesystem not mounted, returning offline
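Rather than watching `tail -f`, you can pull just the WARNING and ERROR lines out of the engine log. A sketch against a captured snippet (on a live node, read `/var/VRTSvcs/log/engine_A.log` instead of the variable):

```shell
#!/bin/sh
# Captured engine_A.log lines stand in for the real file.
log='2017/03/08 18:56:38 VCS WARNING V-16-10031-7017 (dr01) NFS:NFS_SERVICE1:monitor:nfsd filesystem not mounted, returning offline
2017/03/08 18:57:03 VCS INFO V-16-1-50135 User admin fired command: hagrp -switch Web-App dr01 localclus
2017/03/08 18:57:03 VCS NOTICE V-16-1-10208 Initiating switch of group Web-App from system pr01 to system dr01'

# Keep only warning/error severities; INFO and NOTICE are routine chatter.
printf '%s\n' "$log" | grep -E 'VCS (WARNING|ERROR)'
```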
[root@pr01 ~]# hamsg -list
#Log data files
Apache_A
NIC_A
HostMonitor_A
DiskGroup_A
CmdSlave-log32661.log_A
Volume_A
NFSRestart_A
CmdSlave-log32638.log_A
CmdSlave-log32634.log_A
CmdSlave-log32649.log_A
NFS_A
IP_A
Share_A
CmdSlave-log32657.log_A
CmdSlave-log32645.log_A
CmdSlave-log32653.log_A
engine_A
CmdSlave-log32641.log_A
Mount_A
CmdServer-log_A
hashadow-err_A
CmdSlave-log32665.log_A
[root@pr01 ~]# hamsg Volume_A
Sat 04 Mar 2017 10:38:41 AM IST VCS ERROR V-16-2-13064 Agent is calling clean for resource(Volume_Res) because the resource is up even after offline completed.
Sat 04 Mar 2017 10:38:42 AM IST VCS ERROR V-16-2-13069 Resource(Volume_Res) - clean failed.
Sat 04 Mar 2017 10:39:42 AM IST VCS ERROR V-16-2-13077 Agent is unable to offline resource(Volume_Res). Administrative intervention may be required.
Sat 04 Mar 2017 11:11:19 AM IST VCS ERROR V-16-2-13078 Resource(Volume_Res) - clean completed successfully after 33 failed attempts.
Sun 05 Mar 2017 07:37:46 PM IST VCS ERROR V-16-2-13064 Agent is calling clean for resource(NFS_VOL) because the resource is up even after offline completed.
Sun 05 Mar 2017 07:37:49 PM IST VCS ERROR V-16-2-13068 Resource(NFS_VOL) - clean completed successfully.
From this output we can see what is happening and act accordingly. So there are two kinds of objects here: the "service group" itself, and the "resources & attributes" belonging to that service group.
How to check resources & attributes?
[root@pr01 ~]# hares -list
DG_Res dr01
DG_Res pr01
Mount_Res dr01
Mount_Res pr01
NFS_DG1 dr01
NFS_DG1 pr01
NFS_IP1 dr01
NFS_IP1 pr01
NFS_MOUNT1 dr01
NFS_MOUNT1 pr01
NFS_NIC1 dr01
NFS_NIC1 pr01
NFS_RESTART1 dr01
NFS_RESTART1 pr01
NFS_SERVICE1 dr01
NFS_SERVICE1 pr01
NFS_SHARE1 dr01
NFS_SHARE1 pr01
NFS_VOLUME1 dr01
NFS_VOLUME1 pr01
Nic_Res dr01
Nic_Res pr01
Service_IP dr01
Service_IP pr01
Volume_Res dr01
Volume_Res pr01
Web_Res dr01
Web_Res pr01
Each resource appears twice, once per system. And what if I have many clusters?
[root@pr01 ~]# hares -list -clus cluster1
DG_Res cluster1:dr01
DG_Res cluster1:pr01
Mount_Res cluster1:dr01
Mount_Res cluster1:pr01
NFS_DG1 cluster1:dr01
NFS_DG1 cluster1:pr01
NFS_IP1 cluster1:dr01
NFS_IP1 cluster1:pr01
NFS_MOUNT1 cluster1:dr01
NFS_MOUNT1 cluster1:pr01
NFS_NIC1 cluster1:dr01
NFS_NIC1 cluster1:pr01
NFS_RESTART1 cluster1:dr01
NFS_RESTART1 cluster1:pr01
NFS_SERVICE1 cluster1:dr01
NFS_SERVICE1 cluster1:pr01
NFS_SHARE1 cluster1:dr01
NFS_SHARE1 cluster1:pr01
NFS_VOLUME1 cluster1:dr01
NFS_VOLUME1 cluster1:pr01
Nic_Res cluster1:dr01
Nic_Res cluster1:pr01
Service_IP cluster1:dr01
Service_IP cluster1:pr01
Volume_Res cluster1:dr01
Volume_Res cluster1:pr01
Web_Res cluster1:dr01
Web_Res cluster1:pr01
Filter it to make it readable.
[root@pr01 ~]# hares -list -clus cluster1 |grep -i pr01
DG_Res         cluster1:pr01
Mount_Res      cluster1:pr01
NFS_DG1        cluster1:pr01
NFS_IP1        cluster1:pr01
NFS_MOUNT1     cluster1:pr01
NFS_NIC1       cluster1:pr01
NFS_RESTART1   cluster1:pr01
NFS_SERVICE1   cluster1:pr01
NFS_SHARE1     cluster1:pr01
NFS_VOLUME1    cluster1:pr01
Nic_Res        cluster1:pr01
Service_IP     cluster1:pr01
Volume_Res     cluster1:pr01
Web_Res        cluster1:pr01
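Filtering by node name works, but you can also get a de-duplicated resource list without knowing the node names at all: print only the first column and sort unique. A sketch against captured `hares -list` output (shortened to three resources here):

```shell
#!/bin/sh
# Two lines per resource (one per system); keep each name once.
hares_list='DG_Res dr01
DG_Res pr01
Mount_Res dr01
Mount_Res pr01
NFS_DG1 dr01
NFS_DG1 pr01'

# On a cluster node:  hares -list | awk '{print $1}' | sort -u
printf '%s\n' "$hares_list" | awk '{print $1}' | sort -u
# prints: DG_Res, Mount_Res, NFS_DG1 (one per line)
```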
Now it looks good, but what about service groups? How do I know how many service groups are in the cluster?
[root@pr01 ~]# hagrp -list
NFS_APP1 dr01
NFS_APP1 pr01
Web-App dr01
Web-App pr01
Great, but how do I know how many resources a SG has?
[root@pr01 ~]# hagrp -resources NFS_APP1
NFS_DG1
NFS_IP1
NFS_MOUNT1
NFS_SERVICE1
NFS_RESTART1
NFS_NIC1
NFS_SHARE1
NFS_VOLUME1
[root@pr01 ~]# hagrp -resources Web-App
Web_Res
DG_Res
Service_IP
Mount_Res
Nic_Res
Volume_Res
Super. Now, how do I know the details of a particular resource?
[root@pr01 ~]# hares -display Service_IP
[root@pr01 ~]# hares -display Web_Res
[root@pr01 ~]# hares -dep NFS_IP1
#Group       Parent        Child
NFS_APP1     NFS_IP1       NFS_NIC1
NFS_APP1     NFS_RESTART1  NFS_IP1
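In `hares -dep` output the Parent depends on the Child (the child must be online before the parent can start). A small sketch that reformats the captured output above to make the direction explicit:

```shell
#!/bin/sh
# Captured `hares -dep NFS_IP1` output: Group, Parent, Child columns.
dep='#Group Parent Child
NFS_APP1 NFS_IP1 NFS_NIC1
NFS_APP1 NFS_RESTART1 NFS_IP1'

# Skip the "#" header; state each dependency in words.
printf '%s\n' "$dep" |
awk '!/^#/ {printf "%s depends on %s (group %s)\n", $2, $3, $1}'
```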
CRITICAL & NON-CRITICAL RESOURCES:
The Critical attribute for a resource defines whether a service group
fails over when the resource faults. If a resource is configured as
non-critical (by setting the Critical attribute to 0) and no resources
depending on the failed resource are critical, the service group will not fail
over. VCS takes the failed resource offline and updates the group status to
ONLINE|PARTIAL. The attribute also determines whether a service group tries to
come online on another node if, during the group's online process, a resource
fails to come online.
If a resource is configured as critical and it faults, VCS fails over the group the resource belongs to. If the resource is non-critical and faults (and no critical resources depend on it), its service group will not fail over. A typical example of a non-critical resource is a backup IP: you want it in the service group so you can back up the application where it resides, but if the backup IP fails, you don't want to cause an outage to your application while VCS fails the group over.
[root@pr01 ~]# hares -list Critical=1
[root@pr01 ~]# hares -list Critical=0
If Critical is 0, a fault on the resource will not cause a group failover.
Default: 1
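To change the attribute, open the configuration, modify the resource, and close the configuration again. A dry-run sketch that only prints the commands it would run (Backup_IP is a hypothetical resource name used for illustration; drop the echo wrapper on a real cluster node):

```shell
#!/bin/sh
# Dry run: run() echoes each VCS command instead of executing it.
# Backup_IP is a hypothetical resource name, not from this cluster.
run() { echo "$@"; }

run haconf -makerw                      # make the configuration writable
run hares -modify Backup_IP Critical 0  # mark the resource non-critical
run haconf -dump -makero                # save and make read-only again
```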
How to know the state (online/offline) of resources?
[root@pr01 ~]# hares -state |grep pr01
DG_Res State pr01 ONLINE
Mount_Res State pr01 ONLINE
NFS_DG1 State pr01 OFFLINE
NFS_IP1 State pr01 OFFLINE
NFS_MOUNT1 State pr01 OFFLINE
NFS_NIC1 State pr01 ONLINE
NFS_RESTART1 State pr01 OFFLINE
NFS_SERVICE1 State pr01 ONLINE
NFS_SHARE1 State pr01 OFFLINE
NFS_VOLUME1 State pr01 OFFLINE
Nic_Res State pr01 ONLINE
Service_IP State pr01 ONLINE
Volume_Res State pr01 ONLINE
Web_Res State pr01 OFFLINE
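The same `hares -state` output can be tallied into a quick per-node summary. A sketch against captured output (shortened to four resources; on a node, pipe `hares -state` in directly):

```shell
#!/bin/sh
# Captured `hares -state` lines: resource, "State", system, value.
hares_state='DG_Res State pr01 ONLINE
Mount_Res State pr01 ONLINE
NFS_DG1 State pr01 OFFLINE
Nic_Res State pr01 ONLINE'

# Count how many resources are in each state on each node.
printf '%s\n' "$hares_state" |
awk '{count[$3" "$4]++} END {for (k in count) print k, count[k]}' | sort
# prints: pr01 OFFLINE 1 / pr01 ONLINE 3
```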
How to know the virtual IP address configured with a SG?
[root@pr01 ~]# hares -value Service_IP Address
192.168.234.200
[root@pr01 ~]# hares -value NFS_IP1 Address
192.168.234.190
How to know whether a SG is frozen or not?
Freeze a service group to prevent it from failing over to another system. Freezing stops all online and offline operations on the service group. If you used the -persistent flag with the "hagrp -freeze" command, the group remains frozen even across a node reboot: as soon as VCS starts, it reads the persistent frozen flag, takes no action on the group, and hence the group stays offline.
Unfreeze a frozen service group to perform online or offline operations on it again.
If you freeze a service group, it cannot be switched to another node until it is unfrozen. If any of the resources in the service group fault, VCS will not take any action (even if that fault would normally trigger a failover when the group was unfrozen).
If you freeze a node/system, this just prevents service groups from being brought online on that node. Existing service groups continue to run on the system; however, if a critical resource faults, this will still trigger failover to another system.
[root@pr01 ~]# hagrp -list Frozen=1
[root@pr01 ~]# hagrp -list Frozen=0
NFS_APP1 dr01
NFS_APP1 pr01
Web-App dr01
Web-App pr01
There is no frozen SG. Let's freeze one:
[root@pr01 /]# haconf -makerw
[root@pr01 ~]# hagrp -freeze NFS_APP1 -persistent
[root@pr01 ~]# hastatus -sum
[root@pr01 ~]# hagrp -list Frozen=0
Web-App dr01
Web-App pr01
[root@pr01 ~]# hagrp -list Frozen=1
NFS_APP1 dr01
NFS_APP1 pr01
[root@pr01 ~]# hagrp -unfreeze NFS_APP1 -persistent
[root@pr01 ~]# hagrp -list Frozen=1
[root@pr01 ~]# hagrp -list Frozen=0
NFS_APP1 dr01
NFS_APP1 pr01
Web-App dr01
Web-App pr01
[root@pr01 /]# haconf -dump -makero
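The freeze/unfreeze steps above are a natural fit for a maintenance wrapper. A dry-run sketch that only echoes the commands it would run (remove the echo wrapper on a real cluster node, and put your maintenance work where the placeholder comment is):

```shell
#!/bin/sh
# Dry-run maintenance wrapper: freeze a group, do work, unfreeze it.
# run() echoes each VCS command instead of executing it.
group=NFS_APP1
run() { echo "$@"; }

run haconf -makerw
run hagrp -freeze "$group" -persistent    # survives node reboots
# ... perform maintenance here ...
run hagrp -unfreeze "$group" -persistent
run haconf -dump -makero
```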
How to know whether AutoStart is set for a SG or not?
AutoStart designates whether a service group is automatically started when VCS is started.
Default: 1 (enabled)
A related attribute, AutoRestart, restarts a service group after a faulted persistent resource becomes online again. It can take the following values:
0 - AutoRestart is disabled.
1 - AutoRestart is enabled.
2 - When a faulted persistent resource recovers from a fault, the VCS engine clears the faults on all non-persistent faulted resources on the system. It then restarts the service group.
[root@pr01 ~]# hagrp -display Web-App |grep -i autostart
Web-App  AutoStart            global  1
Web-App  AutoStartIfPartial   global  1
Web-App  AutoStartList        global  pr01
Web-App  AutoStartPolicy      global  Order
Method to change "AutoStart" (it is a group attribute, so use hagrp, not hares):
# haconf -makerw
# hagrp -modify Web-App AutoStart 0
# haconf -dump -makero