Wise people learn when they can; fools learn when they must - Arthur Wellesley

Sunday, 19 March 2017

VCS ON RHEL6–CONFIGURE & OPERATION-P3



BASIC VCS OPERATION:

How to check the cluster status?
How to check the VCS logs?
How to check resources and attributes?
How to check service groups and their resources?
How to check resources & attributes?
How do I know how many service groups are in the cluster?
How do I know how many resources a service group has?
How to know the details about a particular resource…?
CRITICAL & NON-CRITICAL RESOURCES:
How to know the state (online/offline) of resources…?
How to know the virtual IP address configured with a SG…?
How to know whether a SG is frozen or not…?
How to know whether AutoStart is set for a SG or not…?


BASIC VCS OPERATION:

How to check the cluster status?
[root@pr01 ~]# hastatus
[root@pr01 ~]# hastatus -sum

How to check the VCS logs?
[root@pr01 ~]# tail -f /var/VRTSvcs/log/engine_A.log
[root@pr01 ~]# hamsg -list
[root@pr01 ~]# hamsg Apache_A

How to check resources and attributes?
[root@pr01 ~]# hares -list
[root@pr01 ~]# hares -list -clus cluster1 |grep -i pr01

How to check service groups and their resources?
[root@pr01 ~]# hagrp -list
[root@pr01 ~]# hagrp -resources NFS_APP1

The hastatus command shows the live status of all cluster objects.

[root@pr01 ~]# hastatus
attempting to connect....
attempting to connect....connected


group           resource             system               message
--------------- -------------------- -------------------- --------------------
                                     pr01                 RUNNING
                                     dr01                 RUNNING
NFS_APP1                             pr01                 ONLINE
NFS_APP1                             dr01                 OFFLINE
-------------------------------------------------------------------------
Web-App                              pr01                 STOPPING PARTIAL
Web-App                              dr01                 OFFLINE
                NFS_DG1              pr01                 ONLINE
                NFS_DG1              dr01                 OFFLINE
                NFS_IP1              pr01                 ONLINE
-------------------------------------------------------------------------
                NFS_IP1              dr01                 OFFLINE
                NFS_MOUNT1           pr01                 ONLINE
                NFS_MOUNT1           dr01                 OFFLINE
                NFS_SERVICE1         pr01                 ONLINE
                NFS_SERVICE1         dr01                 OFFLINE
-------------------------------------------------------------------------
                NFS_RESTART1         pr01                 ONLINE
                NFS_RESTART1         dr01                 OFFLINE
                NFS_NIC1             pr01                 ONLINE
                NFS_NIC1             dr01                 ONLINE
                NFS_SHARE1           pr01                 ONLINE
-------------------------------------------------------------------------
                NFS_SHARE1           dr01                 OFFLINE
                NFS_VOLUME1          pr01                 ONLINE
                NFS_VOLUME1          dr01                 OFFLINE
                Web_Res              pr01                 OFFLINE
                Web_Res              dr01                 OFFLINE
-------------------------------------------------------------------------
                DG_Res               pr01                 ONLINE
                DG_Res               dr01                 OFFLINE
                Service_IP           pr01                 ONLINE
                Service_IP           dr01                 OFFLINE
                Mount_Res            pr01                 ONLINE
-------------------------------------------------------------------------
                Mount_Res            dr01                 OFFLINE
                Nic_Res              pr01                 ONLINE
                Nic_Res              dr01                 ONLINE
                Volume_Res           pr01                 ONLINE
                Volume_Res           dr01                 OFFLINE
^C

This output runs continuously and is hard to read. A summary report is more useful:

[root@pr01 ~]# hastatus -sum

-- SYSTEM STATE
-- System               State                Frozen

A  dr01                 RUNNING              0
A  pr01                 RUNNING              0

-- GROUP STATE
-- Group           System               Probed     AutoDisabled    State

B  NFS_APP1        dr01                 Y          N               OFFLINE
B  NFS_APP1        pr01                 Y          N               ONLINE
B  Web-App         dr01                 Y          N               OFFLINE
B  Web-App         pr01                 Y          N               STOPPING|PARTIAL

-- RESOURCES ONLINING
-- Group           Type            Resource             System               IState

F  Web-App         Apache          Web_Res              pr01                 W_ONLINE_REVERSE_PROPAGATE


We can see that there is a problem with the service group “Web-App”: its resource “Web_Res” is stuck in a waiting state. Check the State and IState columns.

But what exactly is the problem? The engine log is the first place to look:

[root@pr01 ~]# tail -f /var/VRTSvcs/log/engine_A.log
VCS_LOG_SCRIPT_NAME=monitor
VCS_LOG_CATEGORY=10061
VCS_LOG_RESOURCE_NAME=Web_Res
VCSONE_LOG_RESOURCE_NAME=Web_Res
VCSONE_LOG_CATEGORY=10061]
2017/03/08 18:56:38 VCS WARNING V-16-10031-7017 (dr01) NFS:NFS_SERVICE1:monitor:nfsd filesystem not mounted, returning offline
2017/03/08 18:57:03 VCS INFO V-16-1-50135 User admin fired command: hagrp -switch Web-App  dr01  localclus  from ::ffff:192.168.234.1
2017/03/08 18:57:03 VCS NOTICE V-16-1-10208 Initiating switch of group Web-App from system pr01 to system dr01
2017/03/08 19:01:38 VCS WARNING V-16-10031-7017 (dr01) NFS:NFS_SERVICE1:monitor:nfsd filesystem not mounted, returning offline
2017/03/08 19:06:38 VCS WARNING V-16-10031-7017 (dr01) NFS:NFS_SERVICE1:monitor:nfsd filesystem not mounted, returning offline

[root@pr01 ~]# hamsg -list
#Log data files
Apache_A
NIC_A
HostMonitor_A
DiskGroup_A
CmdSlave-log32661.log_A
Volume_A
NFSRestart_A
CmdSlave-log32638.log_A
CmdSlave-log32634.log_A
CmdSlave-log32649.log_A
NFS_A
IP_A
Share_A
CmdSlave-log32657.log_A
CmdSlave-log32645.log_A
CmdSlave-log32653.log_A
engine_A
CmdSlave-log32641.log_A
Mount_A
CmdServer-log_A
hashadow-err_A
CmdSlave-log32665.log_A

[root@pr01 ~]# hamsg Volume_A
Sat 04 Mar 2017 10:38:41 AM IST VCS ERROR V-16-2-13064 Agent is calling clean for resource(Volume_Res) because the resource is up even after offline completed.
Sat 04 Mar 2017 10:38:42 AM IST VCS ERROR V-16-2-13069 Resource(Volume_Res) - clean failed.
Sat 04 Mar 2017 10:39:42 AM IST VCS ERROR V-16-2-13077 Agent is unable to offline resource(Volume_Res). Administrative intervention may be required.
Sat 04 Mar 2017 11:11:19 AM IST VCS ERROR V-16-2-13078 Resource(Volume_Res) - clean completed successfully after 33 failed attempts.
Sun 05 Mar 2017 07:37:46 PM IST VCS ERROR V-16-2-13064 Agent is calling clean for resource(NFS_VOL) because the resource is up even after offline completed.
Sun 05 Mar 2017 07:37:49 PM IST VCS ERROR V-16-2-13068 Resource(NFS_VOL) - clean completed successfully.

From this output we can see what is happening and act accordingly, as sketched below.
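For example, once the root cause is fixed, a typical recovery is to clear the fault on the resource and flush the stuck operation. A minimal sketch using this setup's names (standard hares/hagrp options, but confirm the underlying fault is actually resolved first):

[root@pr01 ~]# hares -clear Web_Res -sys pr01     # clear the FAULTED flag after fixing the root cause
[root@pr01 ~]# hagrp -flush Web-App -sys pr01     # abort a stuck online/offline (e.g. STOPPING|PARTIAL)
[root@pr01 ~]# hastatus -sum                      # confirm the group state is clean again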

Good, there are two things to look at: first the “service group”, and second the “resources & attributes” that belong to it.

How to check resources & attributes?

[root@pr01 ~]# hares -list
DG_Res                  dr01
DG_Res                  pr01
Mount_Res               dr01
Mount_Res               pr01
NFS_DG1                 dr01
NFS_DG1                 pr01
NFS_IP1                 dr01
NFS_IP1                 pr01
NFS_MOUNT1              dr01
NFS_MOUNT1              pr01
NFS_NIC1                dr01
NFS_NIC1                pr01
NFS_RESTART1            dr01
NFS_RESTART1            pr01
NFS_SERVICE1            dr01
NFS_SERVICE1            pr01
NFS_SHARE1              dr01
NFS_SHARE1              pr01
NFS_VOLUME1             dr01
NFS_VOLUME1             pr01
Nic_Res                 dr01
Nic_Res                 pr01
Service_IP              dr01
Service_IP              pr01
Volume_Res              dr01
Volume_Res              pr01
Web_Res                 dr01
Web_Res                 pr01

Each resource appears twice, once for each system in the cluster.

And what if I have many clusters…?

[root@pr01 ~]# hares -list -clus cluster1
DG_Res                  cluster1:dr01
DG_Res                  cluster1:pr01
Mount_Res               cluster1:dr01
Mount_Res               cluster1:pr01
NFS_DG1                 cluster1:dr01
NFS_DG1                 cluster1:pr01
NFS_IP1                 cluster1:dr01
NFS_IP1                 cluster1:pr01
NFS_MOUNT1              cluster1:dr01
NFS_MOUNT1              cluster1:pr01
NFS_NIC1                cluster1:dr01
NFS_NIC1                cluster1:pr01
NFS_RESTART1            cluster1:dr01
NFS_RESTART1            cluster1:pr01
NFS_SERVICE1            cluster1:dr01
NFS_SERVICE1            cluster1:pr01
NFS_SHARE1              cluster1:dr01
NFS_SHARE1              cluster1:pr01
NFS_VOLUME1             cluster1:dr01
NFS_VOLUME1             cluster1:pr01
Nic_Res                 cluster1:dr01
Nic_Res                 cluster1:pr01
Service_IP              cluster1:dr01
Service_IP              cluster1:pr01
Volume_Res              cluster1:dr01
Volume_Res              cluster1:pr01
Web_Res                 cluster1:dr01
Web_Res                 cluster1:pr01

Filter it down to a single system:

[root@pr01 ~]# hares -list -clus cluster1 |grep -i pr01
DG_Res                  cluster1:pr01
Mount_Res               cluster1:pr01
NFS_DG1                 cluster1:pr01
NFS_IP1                 cluster1:pr01
NFS_MOUNT1              cluster1:pr01
NFS_NIC1                cluster1:pr01
NFS_RESTART1            cluster1:pr01
NFS_SERVICE1            cluster1:pr01
NFS_SHARE1              cluster1:pr01
NFS_VOLUME1             cluster1:pr01
Nic_Res                 cluster1:pr01
Service_IP              cluster1:pr01
Volume_Res              cluster1:pr01
Web_Res                 cluster1:pr01

Now it looks better, but what about service groups? How do I know how many service groups are in the cluster…?

[root@pr01 ~]# hagrp -list
NFS_APP1                dr01
NFS_APP1                pr01
Web-App                 dr01
Web-App                 pr01

Great, but how do I know how many resources a SG has…?

[root@pr01 ~]# hagrp -resources NFS_APP1
NFS_DG1
NFS_IP1
NFS_MOUNT1
NFS_SERVICE1
NFS_RESTART1
NFS_NIC1
NFS_SHARE1
NFS_VOLUME1

[root@pr01 ~]# hagrp -resources Web-App
Web_Res
DG_Res
Service_IP
Mount_Res
Nic_Res
Volume_Res

Super. Now, how do I see the details of a particular resource…?

[root@pr01 ~]# hares -display Service_IP
[root@pr01 ~]# hares -display Web_Res
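
hares -display with just a resource name dumps every attribute, which is long. The -attribute option (a standard hares flag) narrows the output to the attributes you care about; a quick sketch using the IP agent's Address and Device attributes:

[root@pr01 ~]# hares -display Service_IP -attribute Address Device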

[root@pr01 ~]# hares -dep NFS_IP1
#Group       Parent       Child
NFS_APP1     NFS_IP1      NFS_NIC1
NFS_APP1     NFS_RESTART1 NFS_IP1
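
In this output the parent depends on the child: NFS_IP1 requires NFS_NIC1, and NFS_RESTART1 requires NFS_IP1. Run hares -dep with no resource name to list every resource dependency in the cluster, or hagrp -dep for dependencies between service groups (if a group has no group-level dependencies, the command simply reports none):

[root@pr01 ~]# hares -dep
[root@pr01 ~]# hagrp -dep NFS_APP1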

CRITICAL & NON-CRITICAL RESOURCES:

The Critical attribute for a resource defines whether a service group fails over when the resource faults. If a resource is configured as non-critical (by setting the Critical attribute to 0) and no resources depending on the failed resource are critical, the service group will not fail over. VCS takes the failed resource offline and updates the group status to ONLINE|PARTIAL. The attribute also determines whether a service group tries to come online on another node if, during the group's online process, a resource fails to come online.

If a resource is configured as critical and it faults, VCS fails over the group the resource belongs to. If the resource is non-critical and faults (and no critical resources depend on it), its service group will not fail over. A typical example of a non-critical resource is a backup IP: you want it in the service group so you can back up the application wherever it resides, but if the backup IP fails, you don't want to cause an outage to your app while VCS fails the group over.

[root@pr01 ~]# hares -list Critical=1
[root@pr01 ~]# hares -list Critical=0

If Critical is 0, a fault of the resource will not cause a group failover.
Default: 1
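
To make a resource non-critical, open the configuration in read-write mode and modify its Critical attribute. A minimal sketch using the Service_IP resource from this setup (standard haconf/hares commands):

[root@pr01 ~]# haconf -makerw
[root@pr01 ~]# hares -modify Service_IP Critical 0
[root@pr01 ~]# hares -value Service_IP Critical
0
[root@pr01 ~]# haconf -dump -makero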

How to know the state (online/offline) of resources…?

[root@pr01 ~]# hares -state |grep pr01
DG_Res       State                 pr01       ONLINE
Mount_Res    State                 pr01       ONLINE
NFS_DG1      State                 pr01       OFFLINE
NFS_IP1      State                 pr01       OFFLINE
NFS_MOUNT1   State                 pr01       OFFLINE
NFS_NIC1     State                 pr01       ONLINE
NFS_RESTART1 State                 pr01       OFFLINE
NFS_SERVICE1 State                 pr01       ONLINE
NFS_SHARE1   State                 pr01       OFFLINE
NFS_VOLUME1  State                 pr01       OFFLINE
Nic_Res      State                 pr01       ONLINE
Service_IP   State                 pr01       ONLINE
Volume_Res   State                 pr01       ONLINE
Web_Res      State                 pr01       OFFLINE
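
The same quick check works at the group level: hagrp -state lists the State of every service group per system (output format may differ slightly between VCS versions):

[root@pr01 ~]# hagrp -state |grep pr01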

How to know the virtual ip address configured with SG…?

[root@pr01 ~]# hares -value Service_IP Address
192.168.234.200
[root@pr01 ~]# hares -value NFS_IP1 Address
192.168.234.190
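
The same trick reads any other attribute of the resource; for example the interface and netmask of a virtual IP (Device and NetMask are standard IP-agent attributes):

[root@pr01 ~]# hares -value Service_IP Device
[root@pr01 ~]# hares -value Service_IP NetMask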

How to know whether the SG is frozen or not…?

Freeze a service group to prevent it from failing over to another system. Freezing stops all online and offline operations on the service group. If you used the "-persistent" flag with "hagrp -freeze", the group remains frozen across reboots: when the node reboots and VCS starts, it reads the persistent frozen flag, takes no action on the group, and the group therefore stays offline.

Unfreeze a frozen service group to perform online or offline operations on the service group.

If you freeze a service group, it cannot be switched to another node until it is unfrozen. If any of the resources in the service group fault, VCS takes no action (even a fault that would normally trigger a failover is ignored while the group is frozen).
If you freeze a node/system, that only prevents service groups from being brought online on that node. Service groups already running on the system keep running, and if a critical resource faults, failover to another system is still triggered.
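
A whole system can be frozen the same way with hasys; a minimal sketch (a persistent system freeze, like a persistent group freeze, needs the configuration in read-write mode):

[root@pr01 ~]# haconf -makerw
[root@pr01 ~]# hasys -freeze -persistent dr01
[root@pr01 ~]# hasys -display dr01 -attribute Frozen
[root@pr01 ~]# hasys -unfreeze -persistent dr01
[root@pr01 ~]# haconf -dump -makero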

[root@pr01 ~]# hagrp -list Frozen=1
[root@pr01 ~]# hagrp -list Frozen=0
NFS_APP1                dr01
NFS_APP1                pr01
Web-App                 dr01
Web-App                 pr01

There is no frozen SG at the moment. Let's freeze one:

[root@pr01 /]# haconf -makerw
[root@pr01 ~]# hagrp -freeze NFS_APP1 -persistent

[root@pr01 ~]# hastatus -sum

[root@pr01 ~]# hagrp -list Frozen=0
Web-App                 dr01
Web-App                 pr01
[root@pr01 ~]# hagrp -list Frozen=1
NFS_APP1                dr01
NFS_APP1                pr01
[root@pr01 ~]# hagrp -unfreeze NFS_APP1 -persistent
[root@pr01 ~]# hagrp -list Frozen=1
[root@pr01 ~]# hagrp -list Frozen=0
NFS_APP1                dr01
NFS_APP1                pr01
Web-App                 dr01
Web-App                 pr01
[root@pr01 /]# haconf -dump -makero

How to know whether AutoStart is set for an SG or not…?

AutoStart designates whether a service group is automatically started when VCS is started.
Default: 1 (enabled)

A related attribute, AutoRestart, restarts a service group after a faulted persistent resource comes back online. It can take the following values:
0 - AutoRestart is disabled.
1 - AutoRestart is enabled.
2 - When a faulted persistent resource recovers from a fault, the VCS engine clears the faults on all non-persistent faulted resources on the system. It then restarts the service group.
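
Both are group attributes, so either can be read directly with hagrp -value, just like hares -value for resources:

[root@pr01 ~]# hagrp -value Web-App AutoStart
[root@pr01 ~]# hagrp -value Web-App AutoRestart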

[root@pr01 ~]# hagrp -display Web-App |grep -i autostart
Web-App      AutoStart             global     1
Web-App      AutoStartIfPartial    global     1
Web-App      AutoStartList         global     pr01
Web-App      AutoStartPolicy       global     Order

Method to change “AutoStart”:

# haconf -makerw
# hagrp -modify Web-App AutoStart 0
# haconf -dump -makero
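
And verify the change (a quick sanity check; hagrp -value should now print 0):

# hagrp -value Web-App AutoStart
0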


