Monday, December 04, 2006

The cluster is hanging!!!!

The cluster is hanging, the system is hanging, everything seems to be hanging, and I'm not sure what's going on. What should and shouldn't I do with VCS?


In emergency situations, it's not a good idea to blindly run commands if you don't know what state your services are in. Doing so can cause Concurrency Violations and split-brain conditions, which can lead to further confusion or data corruption.

Here are some safe commands to gather data and orient yourself before calling support:

hastatus -sum                                       # summary of system, group, and resource states
hares -probe {resource name} -sys {machine name}    # ask the agent to re-check one resource's state
/sbin/gabconfig -a                                  # GAB port memberships (is the cluster communicating?)
ps -ef                                              # what processes are actually running
ifconfig -a                                         # network interface state
vxdg list                                           # imported Volume Manager disk groups
vxdisk -o alldgs list                               # all disks and the disk groups they belong to
df -kl                                              # locally mounted filesystems
uptime                                              # load averages
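The argument-free commands above can be wrapped in a small script that collects everything into one timestamped log before you call support. This is only a sketch: the log path and filename are my own choices, hares -probe is left out because it needs real resource and system names, and any command not present on the host is simply skipped rather than run blindly.

```shell
# Sketch: gather read-only VCS/VxVM diagnostics into one timestamped log.
LOG="${TMPDIR:-/tmp}/vcs-diag.$(date +%Y%m%d-%H%M%S).log"

for cmd in "hastatus -sum" "/sbin/gabconfig -a" "ps -ef" "ifconfig -a" \
           "vxdg list" "vxdisk -o alldgs list" "df -kl" "uptime"; do
  bin=${cmd%% *}                      # first word is the binary to look for
  if command -v "$bin" >/dev/null 2>&1; then
    { echo "### $cmd"; $cmd; } >>"$LOG" 2>&1
  else
    echo "### $cmd (command not found, skipped)" >>"$LOG"
  fi
done

echo "Diagnostics written to $LOG"
```

Having it all in one file makes it much easier to hand a coherent snapshot to support instead of re-running commands while they wait on the phone.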
Sometimes it may be wise to freeze a Service Group or force stop VCS:

haconf -makerw   (a persistent freeze requires a writable configuration)
hagrp -freeze {Service Group} -persistent
haconf -dump -makero
hastop -local -force or hastop -all -force
Force stopping VCS is common practice when "things get stuck": it stops the VCS engine but leaves your applications running (if they are still up). Also, when a Service Group is frozen, a force stop is the only way to shut down VCS. The following operations are usually not very helpful when things are hanging:
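As a sketch, the freeze-then-force-stop sequence might look like the following. The group name mygrp is a placeholder for your own Service Group, the haconf -makerw step is included because a persistent freeze needs the configuration open for writing, and the commands only run if the VCS binaries are actually present on the host.

```shell
# Sketch: persistently freeze a group, save the config, then force-stop VCS.
GROUP="mygrp"   # hypothetical Service Group name; substitute your own

if command -v hagrp >/dev/null 2>&1; then
  haconf -makerw                       # open the cluster configuration for writing
  hagrp -freeze "$GROUP" -persistent   # persistent freeze survives VCS restarts
  haconf -dump -makero                 # write the config to disk and close it again
  hastop -local -force                 # stop VCS on this node; applications stay up
  STATUS="applied"
else
  STATUS="dry-run"
  echo "VCS binaries not found; treat this as a dry-run sketch only."
fi
```

Freezing first means that even after VCS restarts, it won't try to online or offline the group behind your back while you investigate.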

hagrp -offline {Service Group} -sys {machine name}
hagrp -online {Service Group} -sys {machine name}
hagrp -switch {Service Group} -to {machine name}
hastop -local
hastop -all
Why? Because these commands assume your systems are behaving normally. They tell VCS to online or offline services in an orderly manner, but if your system or cluster is already hung, running them probably won't do any good. They may simply hang themselves, get queued up behind other stuck operations, and add additional load to your system.

Also, if you are unfamiliar with the cluster, running "hastop -all" could shut down or hang *everything* on all nodes, causing additional unnecessary downtime.

In an emergency situation where you are unfamiliar with the cluster, it's probably best to gather information and call Support instead of trying to make VCS do things haphazardly.
