Wednesday, June 11, 2008

Monitoring an HACMP Cluster

This chapter describes tools you can use to monitor an HACMP cluster.
You can use either ASCII SMIT or WebSMIT to configure and manage the cluster and view
interactive cluster status. Starting with HACMP 5.4, you can also use WebSMIT to navigate,
configure, and view the status and graphical displays of the running cluster. For more
information about WebSMIT, see Chapter 2: Administering a Cluster Using WebSMIT.
Note: The default locations of log files are used in this chapter. If you
redirected any logs, check the appropriate location.
The main topics in this chapter include:
• Periodically Monitoring an HACMP Cluster
• Monitoring a Cluster with HAView
• Monitoring Clusters with Tivoli Distributed Monitoring
• Monitoring Clusters with clstat
• Monitoring Applications
• Displaying an Application-Centric Cluster View
• Using Resource Groups Information Commands
• Using HACMP Topology Information Commands
• Monitoring Cluster Services
• HACMP Log Files.
Periodically Monitoring an HACMP Cluster
By design, HACMP provides recovery for various failures that occur within a cluster. For
example, HACMP can compensate for a network interface failure by swapping in a standby
interface. As a result, it is possible that a component in the cluster has failed and that you are
unaware of the fact. The danger here is that, while HACMP can survive one or possibly several
failures, each failure that escapes your notice threatens a cluster’s ability to provide a highly
available environment, as the redundancy of cluster components is diminished.
To avoid this situation, you should customize your system by adding event notification to the
scripts designated to handle the various cluster events. You can specify a command that sends
you mail indicating that an event is about to happen (or that an event has just occurred), along
with information about the success or failure of the event. The mail notification system
enhances the standard event notification methods.
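As a minimal sketch of such a notification command, the helper below builds a subject line from an event name and its exit status and mails it. The function names, the argument order, the subject wording, and the NOTIFY_TO variable are all hypothetical; how HACMP invokes a notify command, and with which arguments, is covered in the Planning Guide and is not assumed here.

```shell
#!/bin/sh
# Hypothetical notification helper. Names, arguments, and message format
# are illustrative assumptions, not part of HACMP.

# Build a one-line subject from an event name, exit status, and node name.
format_event_subject() {
    event_name=$1
    exit_status=$2
    node_name=$3
    if [ "$exit_status" -eq 0 ]; then
        result="succeeded"
    else
        result="FAILED (exit $exit_status)"
    fi
    printf 'HACMP event %s %s on %s\n' "$event_name" "$result" "$node_name"
}

# Mail the subject line to NOTIFY_TO (assumed variable; defaults to root).
notify_event() {
    subject=$(format_event_subject "$1" "$2" "${3:-$(hostname)}")
    echo "$subject" | mail -s "$subject" "${NOTIFY_TO:-root}"
}

# Example call (commented out so that sourcing this file sends no mail):
# notify_event node_down 0
```

Keeping the formatting separate from the mail call makes it possible to check the message text without sending anything.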
In addition, HACMP offers application monitoring capability that you can configure and
customize in order to monitor the health of specific applications and processes.
Use the AIX 5L Error Notification facility to add an additional layer of high availability to an
HACMP environment. You can add notification for failures of resources for which HACMP
does not provide recovery by default. The combination of HACMP and the high availability
features built into the AIX 5L system keeps single points of failure to a minimum; the Error
Notification facility can further enhance the availability of your particular environment. See the
chapter on Configuring AIX 5L for HACMP in the Installation Guide for suggestions on
customizing error notification.
See Chapter 7: Planning for Cluster Events in the Planning Guide for detailed information on
predefined events and on customizing event handling. Also, be sure to consult your worksheets,
to document any changes you make to your system, and to periodically inspect the key cluster
components to make sure they are in full working order.
Automatic Cluster Configuration Monitoring
Verification automatically runs on one user-selectable HACMP cluster node once every 24
hours. By default, the first node in alphabetical order runs the verification at midnight. If
verification finds errors, it warns about recent configuration issues that might cause problems
at some point in the future. HACMP stores the results of the automatic monitoring on every
available cluster node in the /var/hacmp/log/clutils.log file.
If cluster verification detects some configuration errors, you are notified about the potential
problems:
• The exit status of verification is published across the cluster along with the information
about cluster verification process completion.
• Broadcast messages are sent across the cluster and displayed on stdout. These messages
inform you about detected configuration errors.
• A cluster_notify event runs on the cluster and is logged in hacmp.out (if cluster services
is running).
More detailed information is available on the node that completes cluster verification in the
/var/hacmp/clverify/clverify.log file. If a failure occurs during processing, error messages and
warnings clearly indicate the node and reasons for the verification failure.
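One way to review the results of the nightly verification is to count matching lines in the log files named above. The sketch below is only illustrative: the function name is hypothetical, and the search string ERROR is an assumption, since the message format of these logs is not described in this chapter.

```shell
#!/bin/sh
# Hypothetical helper: count the lines in a log file that contain a pattern.
# The pattern to search for (for example "ERROR") is an assumption about
# the log format, which this chapter does not specify.
count_log_matches() {
    pattern=$1
    log_file=$2
    if [ ! -r "$log_file" ]; then
        echo 0
        return 1
    fi
    grep -c -- "$pattern" "$log_file"
}

# Example calls (commented out; the paths are the default log locations):
# count_log_matches ERROR /var/hacmp/log/clutils.log
# count_log_matches ERROR /var/hacmp/clverify/clverify.log
```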
Tools for Monitoring an HACMP Cluster
HACMP supplies tools for monitoring a cluster. These are described in subsequent sections:
• The HAView utility extends Tivoli NetView services so you can monitor HACMP clusters
and cluster components from a single node. Using HAView, you can also view the full
cluster event history in the /usr/es/sbin/cluster/history/cluster.mmddyyyy file. The event
history (and other cluster status and configuration information) is accessible through Tivoli
NetView’s menu bar. For more information, see Monitoring a Cluster with HAView.
• Cluster Monitoring with Tivoli allows you to monitor clusters and cluster components
and perform cluster administration tasks through your Tivoli Framework console. For more
information, see Monitoring Clusters with Tivoli Distributed Monitoring.
• clstat (the /usr/es/sbin/cluster/clstat utility) reports the status of key cluster
components—the cluster itself, the nodes in the cluster, the network interfaces connected
to the nodes, the service labels, and the resource groups on each node.
• WebSMIT displays cluster information using a slightly different layout and
organization. Cluster components are displayed along with their status. Expanding the item
reveals additional information about it, including the network, interfaces and active
resource groups.
For more information, see Monitoring Clusters with clstat.
• Application Monitoring allows you to monitor specific applications and processes and
define action to take upon detection of process death or other application failures.
Application monitors can watch for the successful startup of the application, check that the
application runs successfully after the stabilization interval has passed, or monitor both the
startup and the long-running process. For more information, see Monitoring Applications.
• SMIT and WebSMIT give you information on the cluster.
You have the ability to see the cluster from an application-centric point of view.
• The HACMP Resource Group and Application Management panel in SMIT has an
option to Show Current Resource Group and Application State. The SMIT panel
Show All Resources by Node or Resource Group has an option linking you to the
Show Current Resource Group and Application State panel.
• Using the WebSMIT version lets you expand and collapse areas of the information.
Colors reflect the state of individual items (for example, green indicates online).
For more information, see Displaying an Application-Centric Cluster View.
The System Management (C-SPOC) > Manage HACMP Services > Show Cluster
Services SMIT panel shows the status of the HACMP daemons.
• The Application Availability Analysis tool measures uptime statistics for applications
with application servers defined to HACMP. For more information, see Measuring
Application Availability.
• The clRGinfo and cltopinfo commands display useful information on resource group
configuration and status and topology configuration, respectively. For more information,
see Using Resource Groups Information Commands.
• Log files allow you to track cluster events and history: The /usr/es/adm/cluster.log file
tracks cluster events; the /tmp/hacmp.out file records the output generated by
configuration scripts as they execute; the /usr/es/sbin/cluster/history/cluster.mmddyyyy
log file logs the daily cluster history; the /tmp/cspoc.log file logs the status of C-SPOC
commands executed on cluster nodes. You should also check the RSCT log files. For more
information, see HACMP Log Files.
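The daily history file name carries a date stamp in mmddyyyy form, so a quick check of the default log locations has to build that name first. The following is a minimal sketch; the function names are hypothetical, and the paths are the defaults listed above, which will differ if you redirected any logs.

```shell
#!/bin/sh
# Hypothetical helpers for checking the default log locations.

# Print the daily history log path for a date stamp in mmddyyyy form.
# With no argument, today's date is used.
history_log_path() {
    stamp=${1:-$(date +%m%d%Y)}
    printf '/usr/es/sbin/cluster/history/cluster.%s\n' "$stamp"
}

# Report whether each named log file exists and is non-empty.
log_status() {
    for f in "$@"; do
        if [ -s "$f" ]; then
            printf '%s: present\n' "$f"
        else
            printf '%s: missing or empty\n' "$f"
        fi
    done
}

# Example call (commented out; uses the default locations):
# log_status /usr/es/adm/cluster.log /tmp/hacmp.out /tmp/cspoc.log \
#     "$(history_log_path)"
```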
In addition to these cluster monitoring tools, you can use the following:
• The Event Emulator provides an emulation of cluster events. For more information, see
the section on Emulating Events in the Concepts Guide.
• The Custom Remote Notification utility allows you to define a notification method
through the SMIT interface to issue a customized page in response to a cluster event. In
HACMP 5.3 and up, you can also send text messaging notification to any address including
a cell phone. For information and instructions on setting up pager notification, see the
section on Configuring a Custom Remote Notification Method in the Planning Guide.
Monitoring a Cluster with HAView
HAView is a cluster monitoring utility that allows you to monitor HACMP clusters using
NetView for UNIX. Using Tivoli NetView, you can monitor clusters and cluster components
across a network from a single management station.
HAView creates and registers Tivoli NetView objects that represent clusters and cluster
components. It also creates submaps that present information about the state of all nodes,
networks, network interfaces, and resource groups associated with a particular cluster. This
cluster status and configuration information is accessible through Tivoli NetView’s menu bar.
HAView monitors cluster status using the Simple Network Management Protocol (SNMP). It
combines periodic polling and event notification through traps to retrieve cluster topology and
state changes from the HACMP management agent, the Cluster Manager.
You can view cluster event history using the HACMP Event Browser and node event history
using the Cluster Event Log. Both browsers can be accessed from the Tivoli NetView menu bar.
The /usr/es/sbin/cluster/history/cluster.mmddyyyy file contains more specific event history.
This information is helpful for diagnosing and troubleshooting fallover situations. For more
information about this log file, see Chapter 2: Using Cluster Log Files in the
Troubleshooting Guide.
HAView Installation Requirements
HAView has a client/server architecture. You must install both an HAView server image and
an HAView client image, on the same machine or on separate server and client machines. For
information about installation requirements, see the Installation Guide.
HAView File Modification Considerations
Certain files need to be modified in order for HAView to monitor your cluster properly. When
configuring HAView, you should check and edit the following files:
• haview_start
• clhosts
• snmpd.conf or snmpdv3.conf
haview_start File
You must edit the haview_start file so that it includes the name of the node that has the
HAView server executable installed. This is how the HAView client knows where the HAView
server is located. Regardless of whether the HAView server and client are on the same node or
different nodes, you are required to specify the HAView server node in the haview_start file.
The haview_start file is loaded when the HAView client is installed and is stored in
/usr/haview. Initially, the haview_start file contains only the following line:
"${HAVIEW_CLIENT:-/usr/haview/haview_client}" $SERVER
You must add the following line to the file, supplying the name of the HAView server node
after the :- characters:
SERVER="${SERVER:-}"
For example, if the HAView server is installed on mynode, the edited haview_start file appears
as follows:
SERVER="${SERVER:-mynode}"
"${HAVIEW_CLIENT:-/usr/haview/haview_client}" $SERVER
where mynode is the node that contains the HAView server executable.
Note: If you have configured a persistent node IP label on a node on a
network in your cluster, it maintains a persistent “node address” on the
node – this address can be used in the haview_start file.
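The SERVER line relies on the standard shell expansion ${VAR:-default}, which yields the value of VAR when it is set and non-empty and the default otherwise. The short sketch below demonstrates that behavior in isolation; resolve_server is a hypothetical function, and mynode and othernode are placeholder node names.

```shell
#!/bin/sh
# Demonstrates the ${VAR:-default} expansion used in haview_start.
# resolve_server is a hypothetical name; mynode and othernode are placeholders.

resolve_server() {
    # Keep the caller's SERVER if it is set and non-empty; otherwise use mynode.
    SERVER="${SERVER:-mynode}"
    printf '%s\n' "$SERVER"
}

unset SERVER
resolve_server        # prints mynode, the default

SERVER=othernode
resolve_server        # prints othernode, the override
```

In other words, the name written into haview_start acts as a default, and a SERVER value already present in the environment takes precedence over it.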
clhosts File
HAView monitors a cluster’s state within a network topology based on cluster-specific
information in the /usr/es/sbin/cluster/etc/clhosts file. The clhosts file must be present on the
Tivoli NetView management node. Make sure this file contains the IP address or IP label of the
service and/or base interfaces of the nodes in each cluster that HAView is to monitor.
Make sure that the hostname and the service label of your Tivoli NetView nodes are exactly the
same. (If they are not the same, add an alias in the /etc/hosts file to resolve the name difference.)
WARNING: If an invalid IP address exists in the clhosts file, HAView will
fail to monitor the cluster. Make sure the IP addresses are valid,
and there are no extraneous characters in the clhosts file.
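Because a single bad entry stops HAView from monitoring the cluster, it can help to check the file before starting Tivoli NetView. The following is a minimal sketch, not an HACMP utility: the function names are hypothetical, it assumes one IPv4 address or IP label per line, and it assumes that blank lines and lines starting with # can be ignored.

```shell
#!/bin/sh
# Hypothetical clhosts checker. Assumes one IPv4 address or IP label per
# line, and that blank lines and lines starting with # are ignored.

# Succeed only for a dotted quad with four octets in the range 0-255.
is_valid_ipv4() {
    rest=$1
    count=0
    case $rest in ''|*[!0-9.]*) return 1 ;; esac
    while :; do
        octet=${rest%%.*}
        case $octet in ''|????*) return 1 ;; esac
        [ "$octet" -le 255 ] || return 1
        count=$((count + 1))
        case $rest in
            *.*) rest=${rest#*.} ;;
            *) break ;;
        esac
    done
    [ "$count" -eq 4 ]
}

# Print each suspicious line; return non-zero if any was found.
check_clhosts() {
    file=$1
    status=0
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            ''|'#'*) continue ;;
        esac
        case $line in
            [0-9]*)
                is_valid_ipv4 "$line" || { echo "invalid address: $line"; status=1; } ;;
            *[!A-Za-z0-9._-]*)
                echo "unexpected characters: $line"; status=1 ;;
        esac
    done < "$file"
    return $status
}

# Example call (commented out):
# check_clhosts /usr/es/sbin/cluster/etc/clhosts
```

Trailing spaces count as extraneous characters in this sketch, so an address followed by whitespace is reported as invalid.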
snmpd.conf File
The Tivoli NetView management node must also be configured in the list of trap destinations
in the snmpd.conf files on the cluster nodes of all clusters you want it to monitor. This makes
it possible for HAView to utilize traps in order to reflect cluster state changes in the submap in
a timely manner. Also, HAView can discover clusters not specified in the clhosts file on the
nodes in another cluster.
Note: The default version of the snmpd.conf file for AIX 5L v.5.2 and
AIX 5L v.5.3 is snmpdv3.conf.
The format for configuring trap destinations is as follows:
trap <community name> <IP address of the Tivoli NetView management node> 1.2.3 fe
For example, enter:
trap public 140.186.131.121 1.2.3 fe
Note the following:
• You can specify the name of the management node instead of the IP address.
• You can include multiple trap lines in the snmpd.conf file.
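To confirm that a cluster node already lists the management node as a trap destination, you can search the file for a matching trap line. The sketch below assumes the field layout of the example above, with the destination as the third field; has_trap_destination is a hypothetical name, and the file to check depends on which SNMP version the node runs.

```shell
#!/bin/sh
# Hypothetical check: does the configuration file contain a trap line whose
# third field is the given destination? The field layout is assumed from the
# example "trap public 140.186.131.121 1.2.3 fe".
has_trap_destination() {
    conf_file=$1
    destination=$2
    awk -v dest="$destination" '
        $1 == "trap" && $3 == dest { found = 1 }
        END { exit found ? 0 : 1 }
    ' "$conf_file"
}

# Example call (commented out):
# has_trap_destination /etc/snmpd.conf 140.186.131.121 && echo "configured"
```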
Note: HACMP now supports an SNMP Community Name other than
“public.” If the default SNMP Community Name has been changed in
/etc/snmpd.conf to something other than “public,”
HACMP will function correctly. The SNMP Community Name used
by HACMP will be the first name found that is not “private” or
“system” using the lssrc -ls snmpd command.
Clinfo will also get the SNMP Community Name in the same manner.
Clinfo will still support the -c option for specifying SNMP
Community Name but its use is not required. The use of the -c option
is considered a security risk because doing a ps command could find
the SNMP Community Name. If it is important to keep the SNMP
Community Name protected, change permissions on
/tmp/hacmp.out, /etc/snmpd.conf, /smit.log and
/usr/tmp/snmpd.log to not be world-readable.
See the AIX documentation for full information on the snmpd.conf file. Version 3 has
some differences from Version 1.
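The permission change described in the note above can be sketched as a small helper that removes read access for other users from each file that exists. The function name is hypothetical, and the sketch makes no claim about how other tools that read these files are affected.

```shell
#!/bin/sh
# Hypothetical helper: remove world-read permission from each existing file.
restrict_world_read() {
    for f in "$@"; do
        if [ -e "$f" ]; then
            chmod o-r "$f" && echo "restricted: $f"
        else
            echo "skipped (not found): $f"
        fi
    done
}

# Example call (commented out; run as root on a cluster node):
# restrict_world_read /tmp/hacmp.out /etc/snmpd.conf /smit.log /usr/tmp/snmpd.log
```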
Tivoli NetView Hostname Requirements for HAView
The following hostname requirements apply to using HAView in a Tivoli NetView
environment. If you change the hostname of a network interface, the Tivoli NetView database
daemons and the default map are affected.
Hostname Effect on the Tivoli NetView Daemon
The hostname required to start Tivoli NetView daemons must be associated with a valid
interface name or else Tivoli NetView fails to start.
Hostname Effect on the Tivoli NetView Default Map
If you change the hostname of the Tivoli NetView client, the new hostname does not match the
original hostname referenced in the Tivoli NetView default map database and Tivoli NetView
will not open the default map. Using the Tivoli NetView mapadmin command, you need to
update the default map (or an invalid map) to match the new hostname.
See the Tivoli NetView Administrator’s Guide for more information about updating or deleting
an invalid Tivoli NetView map.
Starting HAView
Once you have installed the HAView client and server, HAView is started and stopped when
you start or stop Tivoli NetView. However, before starting Tivoli NetView/HAView, check the
management node as follows:
• Make sure both client and server components of HAView are installed. See the installation
or migration chapters in the Installation Guide for more information.
• Make sure access control has been granted to remote nodes by running the xhost command
with the plus sign (+) or with specified nodes:
xhost + (to grant access to all computers)
or, to grant access to specific nodes only:
xhost <nodes to be given access>
• Make sure the DISPLAY variable has been set to the monitoring node and to a label that
can be resolved by and contacted from remote nodes:
export DISPLAY=<monitoring node>:0.0
These actions allow you to access HACMP SMIT panels using the HAView Cluster
Administration option.
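Before starting Tivoli NetView, the DISPLAY setting can be checked with a short test. The sketch below only inspects the form of the value (a host part, a colon, and a display number); check_display is a hypothetical name, nodeA is a placeholder, and the sketch does not verify that remote nodes can resolve or reach the host.

```shell
#!/bin/sh
# Hypothetical check of the DISPLAY value's form. It does not test whether
# the host part can be resolved or contacted from remote nodes.
check_display() {
    # Use the first argument if given, otherwise the current DISPLAY.
    value=${1-${DISPLAY-}}
    case $value in
        '')  echo "DISPLAY is not set"; return 1 ;;
        :*)  echo "DISPLAY has no host part: $value"; return 1 ;;
        *:*) echo "DISPLAY host is ${value%%:*}"; return 0 ;;
        *)   echo "DISPLAY has no display number: $value"; return 1 ;;
    esac
}

# Example call (commented out): check the current environment.
# check_display || echo "fix DISPLAY before starting Tivoli NetView"
```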
After ensuring these conditions are set, type the following to start Tivoli NetView:
/usr/OV/bin/nv6000
Refer to the Tivoli NetView User’s Guide for Beginners for further instructions about starting
Tivoli NetView.
When Tivoli NetView starts, HAView creates objects and symbols to represent a cluster and its
components. Through submaps, you can view detailed information about these components.
HAView places the Clusters symbol (shown below) on the Tivoli NetView map after Tivoli
NetView starts. The Clusters symbol is added to the NetView Root map and is placed alongside
the Tivoli NetView Collections symbol and other symbols:
[Figure: HAView Clusters Symbol]
Viewing Clusters and Components
To see which clusters HAView is currently monitoring, double-click the Clusters symbol. The
Clusters submap appears. You may see one or more symbols that represent specific clusters.
Each symbol is identified by a label indicating the cluster’s name. Double-click a cluster
symbol to display symbols for nodes, networks, and resource groups within that cluster.
Note that the cluster status symbol may remain unknown until the next polling cycle, even
though the status of its cluster components is known. See Customizing HAView Polling
Intervals for more information about the default intervals and how to change them using SMIT.
You can view component details at any time using the shortcut ctrl-o. See Obtaining
Component Details in HAView for information and instructions.
Read-Write and Read-Only NetView Maps
Normally, you have one master monitoring station for Tivoli NetView/HAView. This station
is supplied with new information as cluster events occur, and its map is updated so it always
reflects the current cluster status.
In normal cluster monitoring operations, you will probably not need to open multiple Tivoli
NetView stations on the same node. If you do, and you want the additional stations to be
updated with current cluster status information, you must be sure they use separate maps with
different map names. For more information on multiple maps and changing map permissions,
see the Tivoli NetView Administrator’s Guide.