Wednesday, June 11, 2008

Administering an HACMP Cluster

Maintaining an HACMP Cluster

The following maintenance tasks for an HACMP system are described in detail in subsequent
chapters:
• Starting and Stopping Cluster Services
• Maintaining Shared Logical Volume Manager Components
• Managing the Cluster Topology
• Managing Cluster Resources
• Managing Cluster Resource Groups
• Managing Users and Groups in a Cluster
• Managing Cluster Security and Inter-Node Communications
• Understanding the /usr/es/sbin/cluster/etc/rhosts File
• Saving and Restoring HACMP Cluster Configurations
• Additional HACMP Maintenance Tasks
Starting and Stopping Cluster Services
Various methods for starting and stopping cluster services are available. Chapter 9: Starting
and Stopping Cluster Services describes how to start and stop HACMP on server and client
nodes.
Maintaining Shared Logical Volume Manager Components
Any changes to logical volume components must be synchronized across all nodes in the
cluster. Chapter 11: Managing Shared LVM Components, and Chapter 12: Managing
Shared LVM Components in a Concurrent Access Environment describe how to maintain
cluster LVM components. Using C-SPOC (the Cluster Single Point of Control) to configure the
cluster components on one node and then synchronize the cluster saves you time and effort.
Managing the Cluster Topology
Any changes to cluster topology require updating the cluster across all nodes. Chapter 13:
Managing the Cluster Topology describes how to modify cluster topology after the initial
configuration. You can make most changes on one node and then synchronize the cluster.
This chapter also includes information about the HACMP Communication Interface
Management SMIT menu, which lets you configure communication interfaces and devices in
AIX 5L without leaving HACMP SMIT.
Managing Cluster Resources
Any changes to cluster resources require updating the cluster across all nodes. You can make
most changes on one node and then synchronize the cluster. Chapter 14: Managing the Cluster
Resources describes how to modify cluster resources after the initial configuration.
Managing Cluster Resource Groups
Chapter 15: Managing Resource Groups in a Cluster describes how to modify cluster
resource groups after the initial configuration. You can add or delete resources and change the
runtime policies of resource groups.
You can dynamically migrate resource groups to other nodes and take them online or offline,
using the Resource Group Management utility (clRGmove) from the command line or through
SMIT.
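As a minimal sketch of the command-line route, the helper below only assembles an argument list for clRGmove; it does not run anything. The install path and the flags used here (-g for the group, -n for the target node, -m to move, -u to bring online, -d to bring offline) are assumptions not stated in this guide, so check them against the clRGmove man page on your system. The function name and the group and node names are hypothetical.

```python
from typing import List

# Assumed install location of the HACMP utilities; adjust for your system.
CLRGMOVE = "/usr/es/sbin/cluster/utilities/clRGmove"

# Assumed action flags: -m move, -u bring online, -d bring offline.
ACTIONS = {"move": "-m", "online": "-u", "offline": "-d"}


def clrgmove_argv(group: str, node: str, action: str) -> List[str]:
    """Build (but do not run) a clRGmove command line for one resource group."""
    if action not in ACTIONS:
        raise ValueError(f"action must be one of {sorted(ACTIONS)}")
    return [CLRGMOVE, "-g", group, "-n", node, ACTIONS[action]]
```

On a cluster node, the returned list could be passed to subprocess.run(argv, check=True). Keeping command construction separate from execution makes it easy to review the command before it touches a live cluster.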
Managing Users and Groups in a Cluster
HACMP lets you manage user accounts for a cluster from a Single Point of Control (C-SPOC).
Use C-SPOC to create, change, or remove users and groups from all cluster nodes by executing
a C-SPOC command on any single cluster node.
For information, see Chapter 16: Managing Users and Groups.
Managing Cluster Security and Inter-Node Communications
You can protect access to your HACMP cluster by setting up security for cluster
communications between nodes. HACMP provides security for connections between nodes,
with higher levels of security for inter-node communications provided through Kerberos (on SP
nodes only) or through virtual private networks (VPN). In addition, you can configure
authentication and encryption of the messages sent between nodes.
For information, see Chapter 17: Managing Cluster Security.
Understanding the /usr/es/sbin/cluster/etc/rhosts File
This section explains how and when HACMP uses the /usr/es/sbin/cluster/etc/rhosts file,
which HACMP uses for inter-node communications. It also describes how this file relates to the
~/.rhosts file.
The /usr/es/sbin/cluster/etc/rhosts file
A Cluster Communications daemon (clcomd) runs on each HACMP node to transparently
manage inter-node communications for HACMP. In other words, HACMP manages
connections for you automatically:
• If the /usr/es/sbin/cluster/etc/rhosts file is empty (its initial state upon
installation), then clcomd accepts the first connection from another node and adds that
node's IP addresses to the /usr/es/sbin/cluster/etc/rhosts file. The first connection
usually is performed for verification and synchronization purposes, and this way, for all
subsequent connections, HACMP already has entries for node connection addresses in its
Configuration Database.
• clcomd validates the addresses of the incoming connections to ensure that they are received
from a node in the cluster. The rules for validation are based on the presence and contents
of the /usr/es/sbin/cluster/etc/rhosts file.
• In addition, HACMP includes in the /usr/es/sbin/cluster/etc/rhosts file the addresses for
all network interface cards from the communicating nodes.
• If the /usr/es/sbin/cluster/etc/rhosts file is not empty, then clcomd compares the incoming
address with the addresses/labels found in the HACMP Configuration Database (ODM)
and then in the /usr/es/sbin/cluster/etc/rhosts file and allows only listed connections. In
other words, after installation, HACMP accepts connections from another HACMP node
and adds the incoming address(es) to the local file, thus allowing you to configure the
cluster without ever editing the file directly.
• If the /usr/es/sbin/cluster/etc/rhosts file is not present, clcomd rejects all connections.
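The rules above can be summarized as a small decision function. This is a toy model for illustration only, not clcomd's actual implementation: the function name, the in-memory representation of the file (None when absent, an empty list when empty), and the set standing in for the Configuration Database are all hypothetical.

```python
from typing import List, Optional, Set


def clcomd_accepts(
    incoming: str,
    rhosts_entries: Optional[List[str]],
    odm_addresses: Set[str],
) -> bool:
    """Toy model of the validation rules listed above; not clcomd's real code.

    rhosts_entries is None when the rhosts file is absent, [] when it is
    empty, and a list of addresses/labels otherwise.
    """
    if rhosts_entries is None:
        # File not present: every connection is rejected.
        return False
    if not rhosts_entries:
        # File empty (initial state): accept the connection and record it.
        rhosts_entries.append(incoming)
        return True
    # File not empty: check the Configuration Database first, then the file.
    return incoming in odm_addresses or incoming in rhosts_entries
```

In this model, the first call with an empty list accepts the caller and records its address, so a later call from an unlisted address is refused unless that address is in the Configuration Database set.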
Typically, you do not manually add entries to the /usr/es/sbin/cluster/etc/rhosts file unless you
have specific security needs or concerns.
If you are especially concerned about network security (for instance, you are configuring a
cluster on an unsecured network), then prior to configuring the cluster, you may wish to
manually add all the IP addresses/labels for the nodes to the empty
/usr/es/sbin/cluster/etc/rhosts file. For instructions, see Manually
Configuring /usr/es/sbin/cluster/etc/rhosts file on Individual Nodes in Chapter 17:
Managing Cluster Security.
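If you do pre-populate the file, a short script can keep the entries consistent across nodes. The sketch below assumes the file holds one IP address or label per line; that format is an assumption here, so confirm it in Chapter 17 before relying on it. The function name and the sample addresses are hypothetical, and the path is passed in rather than hard-coded so that the function can be tried against a scratch file first.

```python
from pathlib import Path
from typing import Iterable


def write_rhosts(path: Path, addresses: Iterable[str]) -> None:
    """Write each address or label on its own line (assumed file format)."""
    # Drop surrounding whitespace and skip blank entries.
    lines = [a.strip() for a in addresses if a.strip()]
    path.write_text("".join(f"{line}\n" for line in lines))
```

For example, write_rhosts(Path("/tmp/rhosts.test"), ["192.0.2.11", "192.0.2.12"]) produces a two-line file that you can inspect before copying it into place on each node.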
After you synchronize the cluster, you can empty the /usr/es/sbin/cluster/etc/rhosts file (but
do not remove it), because the information in the HACMP Configuration Database is
sufficient for all future connections.
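The distinction between emptying and removing matters because clcomd rejects all connections when the file is absent. A minimal sketch of truncating a file in place follows; the function name is hypothetical, and the path is a parameter so that you can test against a copy first.

```python
from pathlib import Path


def empty_file_in_place(path: Path) -> None:
    """Truncate an existing file to zero bytes without removing it."""
    if not path.is_file():
        # Refuse to act on a missing file rather than silently creating one.
        raise FileNotFoundError(f"{path} does not exist")
    # Opening for writing truncates the contents; the file itself remains.
    with path.open("w"):
        pass
```

Truncating rather than deleting and recreating the file also leaves its ownership and permissions untouched.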
If the configuration of AIX 5L adapters changes after the cluster has been synchronized,
HACMP may issue an error. See the section Troubleshooting the Cluster Communications
Daemon, or the Troubleshooting Guide, for information on refreshing the clcomd utility and
updating /usr/es/sbin/cluster/etc/rhosts.
The ~/.rhosts File
~/.rhosts is only needed during the migration from pre-5.1 versions of HACMP. Once
migration is completed, we recommend removing ~/.rhosts, if no other applications need rsh
for inter-node communication.
Saving and Restoring HACMP Cluster Configurations
After you configure the topology and resources of a cluster, you can save the cluster
configuration by taking a cluster snapshot. You can later restore the saved configuration,
if needed, by applying the cluster snapshot. A cluster snapshot
can also be applied to an active cluster to dynamically reconfigure the cluster. Chapter 18:
Saving and Restoring Cluster Configurations describes how to use the Cluster Snapshot
utility.
Additional HACMP Maintenance Tasks
Additional tasks that you can perform to maintain an HACMP system include changing the log
file attributes for a node and performance tuning. For information on these tasks, see the section
on Troubleshooting an HACMP Cluster in this chapter.
Monitoring the Cluster
By design, failures of components in the cluster are handled automatically, but you need to be
aware of all such events. Chapter 10: Monitoring an HACMP Cluster describes various tools
you can use to check the status of an HACMP cluster, the nodes, networks, and resource groups
within that cluster, and the daemons that run on the nodes.
The HACMP software includes the Cluster Information Program (Clinfo), based on SNMP.
The HACMP for AIX software also provides the HACMP for AIX 5L Management Information
Base (MIB), which is associated with and maintained by HACMP. Clinfo retrieves cluster
status information from this MIB.
The Cluster Manager gathers information about cluster state changes of nodes and
interfaces. The Cluster Information Program (Clinfo) gets this information from the Cluster
Manager and allows clients communicating with Clinfo to be aware of a cluster’s state changes.
This cluster state information is stored in the HACMP MIB.
Clinfo runs on cluster server nodes and on HACMP client machines. It makes information
about the state of an HACMP cluster and its components available to clients and applications
via an application programming interface (API). Clinfo and its associated APIs enable you to
write applications that recognize and respond to changes within a cluster.
The Clinfo program, the HACMP MIB, and the APIs are described in the Programming Client
Applications Guide.
Troubleshooting an HACMP Cluster
Although the combination of HACMP and the high availability features built into the AIX 5L
system keeps single points of failure to a minimum, there are still failures that, although
detected, can cause other problems.
For suggestions on customizing error notification for various problems not handled by the
HACMP events, see the Planning Guide.