
Monitoring and Troubleshooting a Cluster

This chapter presents general information for monitoring and troubleshooting an HACMP for Linux configuration.
This chapter contains the following sections:
•Problem Determination Tools
•Viewing Cluster Information (clstat) in WebSMIT
•Useful Commands
•Logging Messages
•Solving Common Problems with Networks and Applications.
Problem Determination Tools
The WebSMIT Problem Determination Tools menu provides a set of tools for troubleshooting and recovering from problems that may arise in a cluster environment.
The Problem Determination Tools panel in WebSMIT includes:
•View Current State. WebSMIT displays cluster information using a slightly different layout and organization. Cluster components are displayed along with their status. Expanding an item reveals additional information about it, including networks, interfaces, and active resource groups.
•HACMP Log Viewing and Management. Contains utilities that display or manage logs maintained by HACMP. These include the hacmp.out log file, which keeps a record of all local cluster events as performed by the HACMP event scripts. These event scripts automate many common system administration tasks and, in the event of a failure, manage HACMP and system resources to provide recovery.
•Recover From HACMP Script Failure. Contains a command that recovers HACMP from a script failure. This is useful if the Cluster Manager is stuck in reconfiguration because of a failed event script. Use this option only after you have manually fixed the error condition.
•Restore HACMP Configuration Database from Active Configuration.
Viewing Cluster Information (clstat) in WebSMIT
With HACMP 5.4.1, you can use WebSMIT to:
•Display detailed cluster information
•Navigate and view the status of the running cluster
•Configure and manage the cluster
•View graphical displays of sites, networks, nodes and resource group dependencies.
Useful Commands
You have these additional utilities:
•To view the resource group location and status, use the clRGinfo command.
•To view the service IP label information, run the ifconfig command on the node that currently owns the resource group.
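For example, here is a hedged sketch of checking resource group placement and the service IP label from the command line (the output layout and addresses are illustrative, not verbatim):
ppstest1:~ # /usr/es/sbin/cluster/utilities/clRGinfo
------------------------------------------------------------
Group Name     State          Node
------------------------------------------------------------
rg1            ONLINE         ppstest1
rg2            ONLINE         ppstest2
ppstest1:~ # ifconfig -a
The service IP address should appear among the inet addresses listed on the node that currently owns the resource group.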
For a list of commands supported in HACMP for Linux, see Appendix A: Command Reference and the clinfo Utility.
Logging Messages
HACMP for Linux uses the standard logging facilities for HACMP. For information about logging in HACMP, see the HACMP for AIX Troubleshooting Guide.
To troubleshoot the HACMP operations in your cluster, use the event summaries in the hacmp.out file and syslog.
The system logs messages into the following files:
•/tmp/clstrmgr.debug
•/tmp/cspoc.log
•/tmp/clappmond
•/tmp/hacmp.out
•/usr/es/adm/cluster.log
•/var/hacmp/clcomd/clcomd.log
•/var/hacmp/clcomd/clcomddiag.log
•/var/hacmp/log/clutils.log
•/usr/es/sbin/cluster/wsm/logs/wsm_smit.*
•/websmit/logs/wsm_smit.*
•/usr/es/sbin/cluster/snapshots/*
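For example, to follow cluster event processing as it happens, you can watch the hacmp.out file; a minimal sketch (the EVENT START marker reflects the usual hacmp.out event summary format):
ppstest1:~ # tail -f /tmp/hacmp.out
ppstest1:~ # grep "EVENT START" /tmp/hacmp.out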
Collecting Cluster Log Files for Problem Reporting
To collect the system files and log files into an archive file and view them:
1.In WebSMIT, go to the Collect Cluster log files for Problem Reporting menu.
2.Type or select values in entry fields.
3.Use an appropriate Linux tool to extract or view the archive file. The archive file contains the log and system files.
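For example, assuming the utility produced a compressed tar archive (the file name and location below are hypothetical), you could list and extract it as follows:
ppstest1:~ # tar -tzf /tmp/hacmp_collected_logs.tar.gz
ppstest1:~ # mkdir /tmp/hacmp_logs
ppstest1:~ # tar -xzf /tmp/hacmp_collected_logs.tar.gz -C /tmp/hacmp_logs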

Resetting Cluster Tunables

You can reset a list of tunable values that were altered during cluster maintenance to their default settings, also known as the installation-time cluster settings. The installation-time cluster settings are the values that appear in the cluster immediately after installing HACMP from scratch.
Note: Resetting the tunable values does not change any other aspects of the configuration, whereas installing HACMP from scratch removes all user-configured information, including nodes, networks, and resources.
To reset the cluster tunable values:
1.Stop the cluster services.
2.Log in to a URL where WebSMIT is installed. The browser window displays the top-level WebSMIT screen.
3.In WebSMIT, select Extended Configuration > Extended Topology Configuration > Configure an HACMP Cluster > Reset Cluster Tunables and press Continue.
Use this option to reset all the tunables (customizations) made to the cluster. For a list of the tunable values that will change, see the section Listing Tunable Values. Using this option returns all tunable values to their default values but does not change the cluster configuration. HACMP takes a cluster snapshot before resetting. You can choose to have HACMP synchronize the cluster when this operation completes.
4.Select the options as follows and press Continue:
Synchronize Cluster Configuration
If you set this option to yes, HACMP synchronizes the cluster after resetting the cluster tunables.
5.HACMP asks: “Are you sure?”
6.Press Continue.
HACMP resets all the tunable values to their original settings and removes those that should be removed (such as the nodes’ knowledge about customized pre- and post-event scripts).
Resetting HACMP Tunable Values using the Command Line
We recommend that you use the WebSMIT interface to reset the cluster tunable values. The clsnapshot -t command also resets the cluster tunables; this command is intended for use by IBM support. See its man page for more information.
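A minimal sketch of that command-line path, assuming the standard HACMP utilities directory (stop cluster services on all nodes first):
ppstest1:~ # /usr/es/sbin/cluster/utilities/clsnapshot -t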
Listing Tunable Values
You can change and reset the following list of tunable values:
•User-supplied information.
•Network module tuning parameters, such as failure detection rate, grace period, and heartbeat rate. HACMP resets these parameters to their installation-time default values.
•Cluster event customizations, such as all changes to cluster events. Note that resetting changes to cluster events does not remove any files or scripts that the customizations use; it only removes HACMP's knowledge of pre- and post-event scripts.
•Cluster event rule changes made to the event rules database are reset to the installation-time default values.
•HACMP command customizations made to the default set of HACMP commands are reset to the installation-time defaults.
•Automatically generated and discovered information.
Generally users cannot see this information. HACMP rediscovers or regenerates this information when the cluster services are restarted or during the next cluster synchronization.
HACMP resets the following:
•Local node names stored in the cluster definition database
•Netmasks for all cluster networks
•Netmasks, interface names and aliases for disk heartbeating (if configured) for all cluster interfaces
•SP switch information generated during the latest node_up event (this information is regenerated at the next node_up event)
•Instance numbers and default log sizes for the RSCT subsystem.
Understanding How HACMP Resets Cluster Tunables
HACMP resets tunable values to their default values under the following conditions:
•Before resetting HACMP tunable values, HACMP takes a cluster snapshot. After the values have been reset to defaults, if you want to go back to your customized cluster settings, you can restore them with the cluster snapshot. HACMP saves snapshots of the last ten configurations in the default cluster snapshot directory, /usr/es/sbin/cluster/snapshots, with the name active.x.odm, where x is a digit between 0 and 9, with 0 being the most recent.
•Stop cluster services on all nodes before resetting tunable values. HACMP prevents you from resetting tunable values in a running cluster.
In some cases, HACMP cannot differentiate between user-configured information and discovered information, and does not reset such values. For example, you may enter a service label and HACMP automatically discovers the IP address that corresponds to that label. In this case, HACMP does not reset the service label or the IP address. The cluster verification utility detects if these values do not match.
The clsnapshot.log file in the snapshot directory contains log messages for this utility. If any of the following scenarios are run, then HACMP cannot revert to the previous configuration:
•cl_convert is run automatically
•cl_convert is run manually
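To confirm which automatically saved configurations are available for a restore, you can list the snapshot directory; a hedged example (file names illustrative):
ppstest1:~ # ls /usr/es/sbin/cluster/snapshots
active.0.odm  active.1.odm  active.2.odm  clsnapshot.log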

System Management (C-SPOC) Tasks

Use the System Management (C-SPOC) panel in WebSMIT to configure, from one node, the resources that are shared among nodes. The System Management utility of HACMP lets you administer many aspects of the cluster and its components from one Cluster Single Point of Control (C-SPOC). By automating repetitive tasks, C-SPOC eliminates a potential source of errors and speeds up the cluster maintenance process.
In WebSMIT, you access C-SPOC using the System Management (C-SPOC) menu.
In this panel, you can do the following tasks from one node:
•Manage HACMP services: start and stop the cluster services, the Cluster Manager (clstrmgr) and Cluster Information daemon (clinfo).
•HACMP Communication Interface Management. Manage the communication interfaces of existing cluster nodes using C-SPOC.
•HACMP Resource Group and Application Management. Provides menus to manage cluster resource groups and analyze cluster applications.
•HACMP Log Viewing and Management. Manage, view, and collect HACMP log files and event summaries.
Starting HACMP Cluster Services
To start HACMP cluster services:
1.Log in to a URL where WebSMIT is installed. The browser window displays the top-level WebSMIT screen.
2.In WebSMIT, select System Management (C-SPOC) > Manage HACMP Services > Start HACMP Services and press Continue.
For detailed instructions, see the HACMP on AIX Administration Guide.
Stopping HACMP Cluster Services
To stop HACMP cluster services:
1.Log in to a URL where WebSMIT is installed. The browser window displays the top-level WebSMIT screen.
2.In WebSMIT, select System Management (C-SPOC) > Manage HACMP Services > Stop HACMP Services and press Continue.
For detailed instructions, see the HACMP on AIX Administration Guide.
Managing Resource Groups and Applications
To manage resource groups and applications:
1.Log in to a URL where WebSMIT is installed. The browser window displays the top-level WebSMIT screen.
2.In WebSMIT, select System Management (C-SPOC) > HACMP Resource Group and Application Management and press Continue.
Viewing and Managing Logs
To view and manage logs:
1.Log in to a URL where WebSMIT is installed. The browser window displays the top-level WebSMIT screen.
2.In WebSMIT, select System Management (C-SPOC) > HACMP Log Viewing and Management and press Continue.
For detailed instructions, see the HACMP on AIX Administration Guide.

Viewing the Cluster Status

HACMP provides a cluster status utility, /usr/es/sbin/cluster/clstat. It reports the status of key cluster components: the cluster itself, the nodes in the cluster, the network interfaces connected to the nodes, and the resource groups on each node.
clstat is available in WebSMIT on the left side of the top-level menu. It displays an expandable list of cluster components along with their status. The cluster status display window shows information and status (up or down; online, offline, or error) for cluster nodes, networks, interfaces, application servers, and resource groups. For resource groups, it also shows the node on which each group is currently hosted.
Here is an example of the clstat output in WebSMIT, shown in the left-hand panel of the window:

Figure 2. clstat Output
Here is an example of the ASCII-based output from the clstat command, used on a Linux cluster with four nodes (ppstest1 through ppstest4):
ppstest2:~ # /usr/es/sbin/cluster/clstat

clstat - HACMP Cluster Status Monitor
-------------------------------------

Cluster: test1234 (1148058900)
Wed May 17 16:45:41 2006
State: UP               Nodes: 4
SubState: STABLE

Node: ppstest1          State: UP
   Interface: tr0 (6)           Address: 9.57.28.3
                                State: UP
   Resource Group: rg1          State: On line

Node: ppstest2          State: UP
   Interface: tr0 (6)           Address: 9.57.28.4
                                State: UP
   Resource Group: rg2          State: On line

Node: ppstest3          State: UP
   Interface: tr0 (6)           Address: 9.57.28.5
                                State: UP

Node: ppstest4          State: UP
   Interface: tr0 (6)           Address: 9.57.28.6
                                State: UP
   Resource Group: rg3          State: On line
   Resource Group: rg4          State: On line

Configuring HACMP Application Servers

To configure an application server on any cluster node:
1.Log in to a URL where WebSMIT is installed. The browser window displays the top-level WebSMIT screen.
2.In WebSMIT, select Extended Configuration > Extended Resource Configuration > HACMP Extended Resources Configuration > Configure HACMP Applications > Configure HACMP Application Servers > Add an Application Server and press Continue.
WebSMIT displays the Add an Application Server panel.
3.Enter field values as follows:
Server Name
Enter an ASCII text string that identifies the server. You will use this name to refer to the application server when you define resources during node configuration. The server name can include alphabetic and numeric characters and underscores. Use no more than 64 characters.
Start Script
Enter the pathname of the script (followed by arguments) called by the cluster event scripts to start the application server. (Maximum 256 characters.) This script must be in the same location on each cluster node that might start the server. The contents of the script, however, may differ.
Stop Script
Enter the pathname of the script called by the cluster event scripts to stop the server. (Maximum 256 characters.) This script must be in the same location on each cluster node that may start the server. The contents of the script, however, may differ.
4.Press Continue to add this information to the HACMP Configuration Database on the local node.
5.Add the application start, stop and notification scripts to every node in the cluster.
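The start and stop scripts themselves are yours to write; HACMP only invokes them. Here is a minimal sketch of a start script, assuming a hypothetical application daemon at /usr/local/app/bin/appd (all names and paths are illustrative):
#!/bin/sh
# /usr/local/app_start - hypothetical HACMP application server start script.
# Start the application daemon in the background.
/usr/local/app/bin/appd &
sleep 2
# Report success to HACMP only if the daemon is actually running.
if ps -C appd >/dev/null 2>&1; then
    exit 0
else
    exit 1
fi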
Verifying Application Servers
Make sure that the application start, stop and notification scripts exist and are executable on every node in the cluster. Use the cllsserv command.
For example:
ppstest2:~ # /usr/es/sbin/cluster/utilities/cllsserv
app_test2_primary /usr/local/app_start /usr/local/app_stop
ppstest2:~ # ls -l /usr/local/app_start
-rwxr--r-- 1 root root 169 May 10 22:54 /usr/local/app_start
Configuring Application Monitors
Once you have configured application servers, HACMP for Linux lets you add application monitors that check the health of the running application process, or check for the successful start of the application.
To configure application monitors:
1.Log in to a URL where WebSMIT is installed. The browser window displays the top-level WebSMIT screen.
2.In WebSMIT, select Extended Configuration > Extended Resource Configuration > HACMP Extended Resources Configuration > Configure HACMP Applications > Configure HACMP Application Monitoring and press Continue. A selector screen appears for Configure Process Application Monitoring and Configure Custom Application Monitoring.
3.Select the type of monitoring you want and press Continue.
4.Select the application server to which you want to add a monitor.
5.Fill in the field values and press Continue.
For additional reference information on application monitoring, its types, modes, and other information, see the HACMP for AIX Administration Guide.
Including Resources into Resource Groups
Once you configure resources to HACMP, you include them in resource groups so that HACMP can manage them as a single set. For example, if an application depends on a service IP label, you can add both resources to the same resource group.
HACMP manages the resources in a resource group by bringing the resource groups online and offline on their home node(s), or moving them to other nodes, if necessary for recovery.
Note:For detailed instructions on resource groups, see the HACMP for AIX Administration Guide. This guide contains descriptions of procedures in HACMP SMIT, and the options are identical to those used in WebSMIT in HACMP for Linux.
Resource Group Management: Overview
In the Extended Configuration > Extended Resource Configuration > HACMP Extended Resource Group Configuration WebSMIT screen, you can:
•Add a resource group.
•Change/Show a resource group. The system displays all previously defined resource groups. After selecting a particular resource group, you can view and change the group name, node relationship, and participating nodes (nodelist). You can also change the group’s startup, fallover and fallback policies.
•Remove a resource group.
•Change/Show resources for a resource group. Add resources, such as a service IP label for the application, or an application server, to a resource group. HACMP always activates and brings offline these resources on a particular node as a single set. If you want HACMP to activate one set of resources on one node and another set of resources on another node, create separate resource groups for each set.
•Show all resources by node for a resource group.
HACMP for Linux does not allow you to change resources dynamically, that is, while HACMP cluster services are running on the nodes. To change previously added resources, stop the cluster services first.
Adding Resources to a Resource Group
To include resources into a resource group:
1.Log in to a URL where WebSMIT is installed. The browser window displays the top-level WebSMIT screen.
2.In WebSMIT, select Extended Configuration > Extended Resource Configuration > HACMP Extended Resources Configuration > Change/Show All Resources and Attributes for a Resource Group and press Continue.
3.Fill in the field values and press Continue. HACMP adds the resources.
For additional information on adding or changing resources in resource groups, and for information on other resource group management tasks, see the Administration Guide.
Synchronizing the HACMP Cluster Configuration
We recommend that you do all the configuration from one node and synchronize the cluster to propagate this information to other nodes.
Use this WebSMIT option to commit and distribute your changes automatically to all of the specified nodes.
To synchronize an HACMP cluster configuration:
1.Log in to a URL where WebSMIT is installed. The browser window displays the top-level WebSMIT screen.
2.In WebSMIT, select Extended Configuration > Extended Verification and Synchronization and press Continue.
If you configured the cluster correctly, HACMP synchronizes the configuration. HACMP issues errors if the configuration is not valid.
Displaying the HACMP Cluster Configuration
You can ask HACMP to show you the status of different configured components. The WebSMIT options for displaying different cluster entities are grouped together with the options for adding them to the cluster.
Here are some examples of the options you have:
•Show HACMP Topology by node, by network name, or by communication interface
•Change/Show Persistent IP Labels
•Show Cluster Applications and change/show application monitors per application
•Change/Show Service IP Labels
•Show all Resources by Node or Resource Groups
•View cluster logs (In WebSMIT, it is under System Management > Log Viewing and Management)
•Show Cluster Services (whether running or not).

Configuring Service IP Labels

To add service IP labels/addresses as resources to the resource group in your cluster:
1.Log in to a URL where WebSMIT is installed. The browser window displays the top-level WebSMIT screen.
2.In WebSMIT, select Extended Configuration > Extended Resource Configuration > HACMP Extended Resources Configuration > Configure HACMP Service IP Labels/Addresses > Add a Service IP Label/Address and press Continue.
3.Fill in field values as follows:
IP Label/Address
Enter, or select from the picklist, the IP label/address to be kept highly available.
Network Name
Enter the symbolic name of the HACMP network on which this Service IP label/address will be configured.
4.Press Continue after filling in all required fields.
5.Repeat the previous steps until you have configured all service IP labels/addresses for each network, as needed.

WebSMIT Tasks Overview

The main WebSMIT menu in HACMP for Linux contains the following menu items and tabs:
•Extended Configuration to configure your cluster.
•System Management (C-SPOC). C-SPOC (Cluster Single Point of Control) is an HACMP function that lets you run HACMP cluster-wide configuration commands from one node in the cluster. In HACMP for Linux, you can use System Management (C-SPOC) to start and stop the cluster services and to move, bring online and bring offline resource groups.
•Problem Determination Tools. You can customize cluster verification, view current cluster state, view logs, recover from a cluster event failure, configure error notification methods and perform other troubleshooting tasks.
•HACMP Documentation. This is the top-level tab that contains a page with links to all online and printable versions of HACMP documentation, including this guide.
Here is the top-level HACMP for Linux WebSMIT menu:



Tasks for Configuring a Basic Cluster

You configure an HACMP for Linux cluster using the Extended Configuration path in WebSMIT.
Note:In general, the sections in this guide provide a high-level overview of user interface options. See the HACMP for AIX Administration Guide for detailed procedures, field help, and recommendations for configuring each and every HACMP component.
To configure a basic cluster:
1.On one cluster node, configure a cluster name and add cluster nodes. See:
•Defining a Cluster Name
•Adding Nodes and Establishing Communication Paths
2.On each cluster node, configure all supporting networks and interfaces: serial networks for heartbeating and IP-based cluster networks for cluster communication.
Also configure to HACMP the communication devices and boot network interfaces that you must have previously defined to the operating system, and configure persistent IP labels for cluster administration purposes. See:
•Configuring Serial Networks for Heartbeating
•Adding IP-Based Networks
•Configuring Communication Interfaces/Devices to HACMP
•Adding Persistent IP Labels for Cluster Administration Purposes
3.On one cluster node, configure cluster resources that will be associated with the application: service IP labels, application servers and application monitors. See:
•Configuring Resources to Make Highly Available
•Configuring Service IP Labels
•Configuring Application Servers
•Configuring Application Monitors
4.On one cluster node, include resources into resource groups. See Including Resources into Resource Groups.
5.Synchronize the cluster configuration. See Synchronizing the HACMP Cluster Configuration.
6.View the HACMP cluster configuration. See Displaying the HACMP Cluster Configuration.
7.Start the HACMP for Linux cluster services on the cluster nodes. When you do so, HACMP will activate the resource group with the application, and will start monitoring it for high availability. See Starting HACMP Cluster Services.
Defining a Cluster Name
Before starting to configure a cluster:
•Make sure that you added all necessary entries to the /etc/hosts file on each machine that will serve as a cluster node. See Planning IP Networks and Network Interfaces.
•Make sure that WebSMIT is installed and can be started on one of the nodes. See Installing and Configuring WebSMIT.
•Log in to WebSMIT. See Starting WebSMIT.
The only step necessary to configure a cluster is to assign the cluster name. When you assign a name to your cluster in WebSMIT, HACMP associates this name with the HACMP-assigned cluster ID.
To assign a cluster name and configure a cluster:
1.Log in to a URL where WebSMIT is installed. The browser window displays the top-level WebSMIT screen.
2.In WebSMIT, select Extended Configuration > Extended Topology Configuration > Configure an HACMP Cluster > Add/Change/Show an HACMP Cluster and press Continue.
3.Enter field values as follows:
Cluster Name
Enter an ASCII text string that identifies the cluster. The cluster name can include alphanumeric characters and underscores, but cannot have a leading numeric. Use no more than 32 characters. Do not use reserved names. For a list of reserved names see List of Reserved Words.
4.Press Continue. If you are changing an existing cluster name, restart HACMP for changes to take effect.

Understanding Cluster Network Requirements and Heartbeating

To avoid a single point of failure, the cluster should have more than one network configured. Often the cluster has both IP and non-IP based networks, which allows HACMP to use different heartbeat paths. Use the Add a Network to the HACMP Cluster WebSMIT panel to configure HACMP IP and point-to-point networks.

You can use any or all of these methods for heartbeat paths:
•Point-to-point networks
•IP-based networks, including heartbeating using IP aliases.
Launching the WebSMIT Interface
Use WebSMIT to:
•Navigate the running cluster.
•View and customize graphical displays of networks, nodes and resource group dependencies.
•View the status of any connected node (with HACMP cluster services running on the nodes).
Starting WebSMIT
For instructions on integrating WebSMIT with your Apache server, and for launching WebSMIT, see the /usr/es/sbin/cluster/wsm/README readme file. It contains sample post-install scripts with variables. Each variable is commented with an explanation of its purpose along with the possible values. You can modify the values of the variables to influence the script behavior.
To start WebSMIT:
1.Using a web browser, navigate to the secure URL of your cluster node, for instance, enter a URL similar to the following (the host and domain names are placeholders):
https://<node_name>.<domain>.com:42267
42267 is the port number used by HACMP for Linux. The domain entry is optional; it is only necessary if you are logging in to a server that is not part of your local network. The system asks you to log in.
2.Log in to the system and press Continue. WebSMIT starts.

HACMP - Installation Process Overview

Install the HACMP for Linux software on each cluster node (server). Perform the installation process as the root user.
Installing HACMP for Linux RPMs
Before you install, ensure that you have installed all the prerequisites for the installation. See Software Prerequisites for Installation.
To install HACMP for Linux:
1.Insert the HACMP for Linux CD ROM and install the hacmp.license.rpm RPM:
rpm -ivh hacmp.license.rpm
This RPM provides a utility that lets you accept the License Agreement for HACMP for Linux v.5.4.1, and complete the installation.
Note:You can install the HACMP for Linux documentation without accepting the License Agreement.
2.Run the HACMP installation script /usr/es/sbin/cluster/install_hacmp.
This script has two options:
-y
Lets you automatically accept the License Agreement. By specifying this flag you agree to the terms and conditions of the License Agreement and will not be prompted.
-d
Lets you specify an alternate path to the RPMs for installation, if you are not installing directly from the CD-ROM.
The /usr/es/sbin/cluster/install_hacmp installation script launches the License Agreement Program (LAP) and the License Agreement acceptance dialog appears.
3.Read and accept the License Agreement. The software places a key on your system to identify that you accepted the license agreement.
You can also accept the license without installing the rest of the filesets by running the /usr/es/sbin/cluster/hacmp_license script. You can then use the RPM tool to install the remaining RPMs.
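For example, a hedged sketch of that alternate flow (RPM file names abbreviated with wildcards):
ppstest1:~ # /usr/es/sbin/cluster/hacmp_license
ppstest1:~ # rpm -ivh hacmp.client*.rpm hacmp.server*.rpm hacmp.doc*.rpm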
The /usr/es/sbin/cluster/install_hacmp installation script checks for the following prerequisites for the HACMP for Linux software:
•rsct.basic-x.x.x.x
•rsct.core-x.x.x.x
•rsct.core.utils-x.x.x.x
•An appropriate version of ksh93 (such as ksh-20050202-1.ppc.rpm)
•Perl 5 (an RSCT prerequisite; perl-5.8.3 is installed with RHEL)
•src-1.3.0.1 (an RSCT prerequisite)
4.Check the required RSCT levels in the HACMP for Linux v.5.4.1 Release Notes, or in the section Software Prerequisites for Installation.

You can install HACMP for Linux when prerequisites are already installed, or together with the prerequisites.
The /usr/es/sbin/cluster/install_hacmp installation script runs the rpm command to install the HACMP for Linux RPMs:
rpm -ivh hacmp.*
5.Verify the installed cluster software: confirm that the RPMs have the correct version numbers and other specific information. Note that the RPMs cannot be installed if the prerequisites are missing.
6.Configure WebSMIT. See /usr/es/sbin/cluster/wsm/README for information, as well as the section Installing and Configuring WebSMIT in this chapter.
7.Read the HACMP for Linux v. 5.4.1 Release Notes (/usr/es/sbin/cluster/release_notes.linux) for information that does not appear in the product documentation.
Note:You can manually install all RPMs without using the install_hacmp script.
Installing and Configuring WebSMIT
WebSMIT is a Web-based user interface that provides consolidated access to all functions of HACMP configuration and management, interactive cluster status, and the HACMP documentation.
WebSMIT:
•Is supported on Mozilla-based browsers (Mozilla 1.7.3 for AIX and Firefox 1.5.0.2)
•Is supported on Internet Explorer, versions 5.0, 5.5, and 6.0
•Requires that JavaScript is enabled in your browser
•Requires network access between the browser and the cluster node that serves as a Web server. To run WebSMIT on a node, you must ensure HTTP(S)/SSL connectivity to that node; it is not handled automatically by WebSMIT or HACMP.
To launch the WebSMIT interface:
1.Configure and run a Web server process, such as Apache server, on the cluster node(s) to be administered.
2.See the /usr/es/sbin/cluster/wsm/README file for information on basic Web server configuration, the default security mechanisms in place when installing HACMP, and the configuration files available for customization.
You can run WebSMIT on a single node, but note that WebSMIT will then be unavailable if that node fails. To provide better availability, you can set up WebSMIT to run on multiple nodes. Since WebSMIT retrieves and updates information from the HACMP cluster, that information should be available from all nodes in the cluster.
Typically, you set up WebSMIT to be accessible from the cluster’s internal network that is not reachable from the Internet.
Since the WebSMIT interface runs in a Web browser, you can access it from any platform. For information on WebSMIT security, see Security Considerations.

For more information about installing WebSMIT, see the section Installing and Configuring WebSMIT in the HACMP for AIX Installation Guide.
Integration of WebSMIT with the Apache Server on Different Linux Distributions
The WebSMIT readme file /usr/es/sbin/cluster/wsm/README contains different template files and instructions to enable you to handle variations in packaging, when integrating WebSMIT with the Apache server on different Linux distributions.
Verifying the Installed Cluster Software
After the HACMP for Linux software is installed on all nodes, verify the configuration. Use the verification functions of the RPM utility: your goal is to ensure that the cluster software is the same on all nodes.
Verify that the information returned by the rpm command is accurate:
rpm -qi hacmp.server
rpm -qi hacmp.client
rpm -qi hacmp.license
rpm -qi hacmp.doc.html
rpm -qi hacmp.doc.pdf
Each command should return information about each RPM. In particular, the Name, Version, Vendor, Summary and Description fields should contain appropriate information about each package.
HACMP modifies several system files during the installation process (such as /etc/inittab, /etc/services, and others). To view the details of the installation process, see the log file /var/hacmp/log/hacmp.install.log.
Example of the Installation Using RPM
Here is an example of the installation using rpm:
# rpm -iv hacmp*
Preparing packages for installation...
Cluster services are not active on this node.
hacmp.client-5.4.1.0-06128
Cluster services are not active on this node.
hacmp.server-5.4.1.0-06128
May 8 2006 22:26:18 Starting execution of /usr/es/sbin/cluster/etc/rc.init
with parameters:
May 8 2006 22:26:18 Completed execution of /usr/es/sbin/cluster/etc/rc.init
with parameters: .
Exit status = 0
Installation of HACMP for Linux is complete.
After installation, use the rpm command to view the information about the installed product:
ppstest3:~ # rpm -qa | grep hacmp
hacmp.server-5.4.1.0-06128
hacmp.client-5.4.1.0-06128
ppstest3:~ # rpm -qi hacmp.server-5.4.1.0-06128
Name : hacmp.server Relocations: (not relocatable)
Version : 5.4.1.0 Vendor: IBM Corp.

Release : 06128 Build Date: Mon May 8 21:21:09 2006
Install date: Tue May 9 13:03:20 2006 Build Host: bldlnx18.ppd.pok.ibm.com
Group : System Environment/Base Source RPM: hacmp.server-5.4.1.0-06128.nosrc.rpm
Size : 48627953 License: IBM Corp.
Signature : (none)
Packager : IBM Corp.
URL : http://www.ibm.com/systems/p/ha/
Summary : High Availability Cluster Multi-Processing - server part
Description :
hacmp.server provides the server side functions for HACMP.
Service information for this package can be found at
http://techsupport.services.ibm.com/server/cluster
Product ID 5765-G71
Distribution: (none)
Entries Added to System Directories after Installation
After you install HACMP for Linux, the installation process adds the following line to the /etc/inittab file:
harc:2345:once:/usr/es/sbin/cluster/etc/rc.init >/dev/console 2>&1
SRC definitions are also added; you can verify each one by running lssrc -s <subsystem_name>:
Subsystem Group
clcomdES clcomdES
clstrmgrES cluster
topsvcs topsvcs
grpsvcs grpsvcs
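For example, a hedged check that the Cluster Manager subsystem was defined correctly (the PID and status shown are illustrative):
ppstest1:~ # lssrc -s clstrmgrES
Subsystem         Group            PID     Status
clstrmgrES        cluster          8042    active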
Addressing Problems during Installation
If you experience problems during the installation, refer to the RPM documentation for information on a cleanup process after an unsuccessful installation and other issues.
To view the details of the installation process, see the following log file:
/var/hacmp/log/hacmp.install.log

Contents of the Installation Media

The HACMP for Linux installation media provides the following .rpm files:
hacmp.server-5.4.1.0.ppc.rpm
High Availability Cluster Multi-Processing—server part. hacmp.server provides the server-side functions for HACMP.
hacmp.client-5.4.1.0.ppc.rpm
High Availability Cluster Multi-Processing—client part. hacmp.client provides the client-side functions for HACMP.
hacmp.license-5.4.1.0.ppc.rpm
HACMP for Linux License Package. hacmp.license provides the software License Agreement functions for the HACMP for Linux software.
hacmp.doc.html-5.4.1.0.ppc.rpm
HACMP for Linux HTML documentation—U.S. English
hacmp.doc.pdf-5.4.1.0.ppc.rpm
HACMP for Linux PDF documentation—U.S. English

Planning the HACMP Configuration

Plan to have the following components in an HACMP cluster:
•An application
•Up to eight nodes
•Resource groups
•Networks.
Planning Applications
Once you put an application under HACMP's control, HACMP starts it on the node(s) and, if you define application monitors, periodically polls the application's status. In case of component failures, HACMP moves the application to other nodes; the process is transparent to the application's end users.
Plan to have the following for your application:
•Customized application start and stop scripts and their locations. The scripts should contain all pre- and post-processing you want HACMP to do so that it starts and stops the applications on the nodes cleanly and according to your requirements. You define these scripts as the application server in WebSMIT.
•Customized scripts you may want to use in HACMP for monitoring the application’s successful startup, and for periodically checking the application’s running process. You define these scripts to HACMP as application monitors in WebSMIT.
•If you have a complex production environment with tiered applications that require dependencies between their startup, or a staged production environment where some applications should start only if their “supporting” applications are already running, HACMP supports these configurations by letting you configure multiple types of dependencies between resource groups in WebSMIT.
To configure a working cluster that will support such dependent applications, first plan the dependencies for all the services that you want to make highly available. For examples of such planning, see the HACMP for AIX Planning Guide and Administration Guide (sections on multi-tiered applications and resource group dependencies).

In HACMP 5.4.1, you can use WebSMIT to take an application out of HACMP’s control temporarily without disrupting it, and then restart HACMP on the nodes that currently run the application.
Planning HACMP Nodes
HACMP for Linux lets you configure up to eight HACMP nodes.
For each critical application, be mindful of the resources required by the application, including its processing and data storage requirements. For example, when you plan the size of your cluster, include enough nodes to handle the processing requirements of your application after a node fails.
Keep in mind the following considerations when determining the number of cluster nodes and planning the nodes:
•An HACMP cluster can be made up of any combination of supported workstations, LPARs, and other machines. See Hardware for Cluster Nodes. Ensure that cluster nodes do not share components that could become a single point of failure (for example, a power supply). Similarly, do not place all nodes on a single rack.
•Create small clusters that consist of nodes that perform similar functions or share resources. Smaller, simple clusters are easier to design, implement, and maintain.
•For performance reasons, it may be desirable to use multiple nodes to support the same application. To provide mutual takeover services, the application must be designed in a manner that allows multiple instances of the application to run on the same node.
For example, if an application requires that the dynamic data reside in a directory called /data, chances are that the application cannot support multiple instances on the same processor. For such an application (running in a non-concurrent environment), try to partition the data so that multiple instances of the application can run—each accessing a unique database.
Furthermore, if the application supports configuration files that enable the administrator to specify that the dynamic data for instance1 of the application reside in the data1 directory, instance2 resides in the data2 directory, and so on, then multiple instances of the application are probably supported.
•In certain configurations, including additional nodes in the cluster design can increase the level of availability provided by the cluster; it also gives you more flexibility in planning node fallover and reintegration.
The most reliable cluster node configuration is to have at least one standby node.
•Choose cluster nodes that have enough I/O slots to support redundant network interface cards and disk adapters.
Also ensure that you have enough cluster nodes in your cluster. Although this adds to the cost of the cluster, we highly recommend supporting redundant hardware (such as enough I/O slots for network interface cards and disk adapters). This will increase the availability of your application.
•Use nodes with similar processing speed.
•Use nodes with sufficient CPU cycles and I/O bandwidth to allow the production application to run at peak load. Remember, nodes should also have enough capacity to allow HACMP itself to operate. For this, benchmark or model your production application and list the parameters of the heaviest expected loads. Then choose nodes for the HACMP cluster that will not exceed 85% busy when running your production application.
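For example, a hedged way to check whether a candidate node stays under the 85%-busy guideline during a load test (sar is part of the sysstat package on most Linux distributions):
ppstest1:~ # sar -u 60 10
This samples CPU utilization every 60 seconds, ten times; the %idle column should stay above 15 percent at peak application load.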
Planning for Resource Groups in an HACMP Cluster
To make your applications highly available in an HACMP cluster, plan and configure resource groups. Resource groups must include resources related to the application, such as its start and stop script (application server) and the service IP label for the application.
Plan the following for resource groups in HACMP for Linux:
•The nodelist for the resource groups must contain all or some nodes from the cluster. These are the nodes on which you “allow” HACMP to host your application. The first node in the nodelist is the default node, or the home node for the resource group that contains the application. You define the nodelist in WebSMIT.
•You can use any set of resource group policies for a resource group startup, fallover and fallback. In WebSMIT, HACMP lets you combine only valid sets of these policies and prevents you from configuring non-working scenarios.
•HACMP for Linux supports only non-concurrent resource groups.
•HACMP for Linux does not support the Fallover Using Dynamic Node Priority fallover policy.
•HACMP for Linux does not support cluster sites.
•If your applications are dependent on other applications, you may need to plan for dependencies between resource groups. HACMP lets you have node-collocated resource groups, resource groups that always must reside on different nodes, and also child resource groups that do not start before their parent resource groups are active (parent/child dependencies). Make a diagram of your dependent applications to better plan dependencies that you want to configure for resource groups, and then define them in WebSMIT.
•HACMP processes the resource groups in parallel by default.
•HACMP for Linux does not allow dynamic changes to the cluster resources or resource groups (also known as dynamic reconfiguration or DARE). This means that you must stop the cluster services, before changing the resource groups or their resources.
For complete planning information, see the guidelines in Chapter 6: Planning Resource Groups in the HACMP Cluster in the HACMP for AIX Planning Guide.

Resource Group Policies: Overview
HACMP allows you to configure only valid combinations of startup, fallover, and fallback behaviors for resource groups. The following table summarizes the basic startup, fallover, and fallback behaviors you can configure for resource groups in HACMP for Linux v. 5.4.1:
Startup Behavior: Online only on home node (first node in the nodelist)
Fallover Behavior: Fallover to next priority node in the list
Fallback Behavior: Never fall back, or Fall back to higher priority node in the list

Startup Behavior: Online on first available node
Fallover Behavior: Either Fallover to next priority node in the list, or Bring offline (on error node only)
Fallback Behavior: Never fall back, or Fall back to higher priority node in the list
Planning IP Networks and Network Interfaces
Plan to configure the following networks and IP interfaces:
•A heartbeating IP-based network. An HACMP cluster requires at least one network that will be used for the cluster heartbeating traffic.
•A heartbeating serial network, such as RS232.
•An IP-based network that lets you connect from the application’s client machine to the nodes. The nodes serve as the application’s servers and run HACMP. To configure this network, plan to configure a client machine with a network adapter and a NIC compatible with at least one of the networks configured on the cluster nodes.
•Two HACMP cluster networks. These are TCP/IP-based networks used by HACMP for inter-node communication. HACMP utilities use them to synchronize information between the nodes and propagate cluster changes across the cluster nodes.
For each HACMP cluster network, on each cluster node plan to configure two IP labels that will be available at boot time, will be configured on different subnets, and will be used for IPAT via IP aliasing. See Planning IP Labels for IPAT via IP Aliasing.
•On the cluster node that will serve as a Web server, set up a network connection to access WebSMIT. Typically, you set up WebSMIT to be accessible from the cluster’s internal network that is not reachable from the Internet. To securely run WebSMIT on a node, you must ensure HTTP(S)/SSL connectivity to that node; it is not handled automatically by WebSMIT or HACMP. See Security Considerations.
Planning IP Labels for IPAT via IP Aliasing
IP address takeover via IP aliasing is the default method of taking over the IP address and is supported in HACMP for Linux. IPAT via IP aliasing allows one node to acquire the IP label and the IP address of another node in the cluster, using IP aliases.

To enable IP address takeover via IP aliases in the HACMP for Linux network configuration, configure NICs for the two HACMP cluster networks so that they meet the following requirements:
•Plan to configure more than one boot-time IP label on the service network interface card on each cluster node.
•Subnet requirements:
•Multiple boot-time addresses configured on a node should be defined on different subnets.
•Service IP addresses must be configured on a different subnet from all non-service addresses (such as boot) defined for that network on the cluster node.
•Multiple service labels can coexist as aliases on a given interface.
•The netmask for all IP labels in an HACMP network must be the same.
•Manually add the IP labels described in this section into the /etc/hosts file on each node. This must be done before you proceed to configure an HACMP cluster in WebSMIT.
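For illustration, here is a hedged sketch of /etc/hosts entries that satisfy these subnet rules on one node (all names and addresses are hypothetical):
192.168.10.1    node1_boot1    # boot-time label, subnet 192.168.10.0/24
192.168.20.1    node1_boot2    # boot-time label, subnet 192.168.20.0/24
192.168.30.10   app_svc        # service label, on its own subnet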
HACMP non-service labels are defined on the nodes as the boot-time addresses, assigned by the operating system after a system boot and before the HACMP software is started. When you start the HACMP software on a node, the node’s service IP label is added as an alias onto one of the NICs that has a non-service label.
When using IPAT via IP Aliases, the node’s NIC must meet the following conditions:
•The NIC has both the boot-time and service IP addresses configured, where the service IP label is an alias placed on the interface.
•The boot-time address is never removed from a NIC; an alias is simply added to the NIC in addition to the boot-time address.
•If the node fails, a takeover node acquires the failed node’s service address as an alias on one of its non-service interfaces on the same HACMP network. During a node fallover event, the service IP label that is moved is placed as an alias on the target node’s NIC in addition to any other service labels that may already be configured on that NIC.
When using IPAT via IP Aliases, service IP labels are acquired using all available non-service interfaces. If there are multiple interfaces available to host the service IP label, the interface is chosen according to the number of IP labels currently on that interface. If multiple service IP labels are acquired and there are multiple interfaces available, the service IP labels are distributed across all the available interfaces.
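Conceptually, after HACMP aliases a service address onto an interface, the interface carries both addresses. A hedged sketch of what this might look like on Linux (interface name and addresses are hypothetical):
ppstest1:~ # ip -4 addr show dev eth0
2: eth0: <BROADCAST,MULTICAST,UP> mtu 1500
    inet 192.168.10.1/24 scope global eth0
    inet 192.168.30.10/24 scope global secondary eth0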
Once you install HACMP for Linux, proceed to configure WebSMIT for access to the cluster configuration user interface.

Software Prerequisites for Installation
When you install HACMP for Linux, make sure that the following software is installed on the cluster nodes:
•Red Hat™ Enterprise Linux (RHEL) 4 or SUSE™ LINUX Enterprise Server (SLES) 9 (both with latest updates).
Read the WebSMIT readme file /usr/es/sbin/cluster/wsm/README for information on specific Apache V1 and V2 requirements, and for information on specific issues related to the RHEL or SUSE Linux distributions.
•RSCT 2.4.5.2. For the latest information about RSCT levels and the latest available APARs for RSCT, check the HACMP for Linux v. 5.4.1 Release Notes.
•Apache WebServer V1 and V2 (provided with the Linux distribution).
•ksh93, or a ksh93-compliant version of ksh. Ensure that the ksh version you have installed is ksh93 compliant. The ksh93 environment is a prerequisite on the RHEL distribution, and HACMP for Linux checks for it prior to the installation.
You can download ksh93 from the Web. The fileset name is similar to the following: ksh-20050202-1.ppc.rpm.
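A quick hedged check of the installed ksh package (the exact package name and version vary by distribution):
ppstest1:~ # rpm -q ksh
ksh-20050202-1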

Cluster Software

The HACMP for Linux cluster software can be described in these two categories:
•Software that you need so that you can install and run the cluster. In particular, HACMP for Linux requires the RSCT (IBM Reliable Scalable Cluster Technology) subsystem to be installed on the nodes. For complete information on what software you need to install, see the installation section.
•The application that you plan to make highly available with the use of HACMP. It can be a database or another service.

Planning and Installing HACMP for Linux

This chapter describes how to plan and install HACMP for Linux. It contains the following sections:
•Cluster Hardware
•Cluster Software
•Planning the HACMP Configuration
•Installing HACMP for Linux
•Contents of the Installation Media
•Installation Process Overview
•Security Considerations
•Where You Go from Here.
Cluster Hardware
This section lists examples of IBM hardware that you can use for cluster nodes, cluster networks and cluster storage disks. For complete information, see the IBM Portal on Linux website:
http://www.ibm.com/linux/
Hardware for Cluster Nodes
HACMP for Linux lets you configure up to eight HACMP nodes. You can use:
•Selected models of IBM System p™ servers
For more information, see: http://www.ibm.com/systems/p/linux/
Also, for descriptions of IBM hardware that you can use as HACMP cluster nodes in AIX, see the HACMP for AIX Planning Guide.
Hardware for Cluster Networks
HACMP for Linux supports the following interconnection networks for clusters:
•Selected models of 10/100 Mbps Ethernet
•Selected models of Gigabit Ethernet
•Token Ring.
An Ethernet or a Token Ring network can be used as an HACMP cluster IP-based network.

Hardware for Cluster Storage
HACMP for Linux does not provide high availability for storage resources in your cluster configuration. However, you can use NFS or IBM TotalStorage disk subsystems as the storage options in your cluster.
No Automatic NFS and Volume Management
Although you can have disks and file systems configured in the same environment in which your HACMP for Linux cluster is configured, HACMP for Linux does not support NFS file systems as cluster resources. You cannot include file systems associated with the application in resource groups.
This means that the file systems are not kept highly available by HACMP for Linux. In particular, during fallovers, when applications are moved to other nodes, HACMP for Linux does not automatically unmount the associated file systems on one node and mount them on the takeover node. Similarly, HACMP for Linux does not automatically perform any volume management or volume group operations for volume groups that a particular application needs to access.
However, if you want to manage storage in the cluster, you can still use NFS or GPFS to control it. To ensure that your NFS file systems work within the cluster, you must manage NFS manually, that is, completely outside of your HACMP for Linux cluster.
For example, for a two-node cluster, you can have an NFS server configured somewhere at your site and make it export the file system to your cluster nodes. Both nodes will need to mount the file system at boot time. This way, the file system is also mounted on the other cluster node, the one to which the resource group may potentially fall over in case of failures. Your application and service IP label will be running on one node. On fallover, the application and service IP label will move to the takeover node, where the NFS file system has also been mounted since boot time. Thus, your application has access to the file system regardless of which node is currently hosting the application. However, the NFS file system service provided to your application is not kept highly available by HACMP.
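For example, a hedged sketch of the /etc/fstab entry each cluster node might carry so that the export is mounted at boot time (server, export path, and mount point are hypothetical):
nfsserver:/export/appdata   /data   nfs   defaults   0 0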
As an alternative, here is a cluster configuration that provides high availability for your NFS file system in the HACMP for Linux cluster. You can configure an NFS server on a separate two-node cluster, with both nodes running HACMP for AIX; specifically, the nodes should run HACMP's NFS component (it is part of HACMP for AIX). You can then export the file system from this highly available NFS server to the nodes of your separate HACMP for Linux cluster.

Preventing Cluster Partitioning

To prevent cluster partitioning, configure a serial network for heartbeating between the nodes, in addition to the IP-based cluster network. If the IP-based cluster network connection between the nodes fails, the heartbeating network prevents data divergence and cluster partitioning.

Network Interface Failure

The HACMP software handles failures of network interfaces on which a service IP label is configured. Types of such failures are:
•Out of two network interfaces configured on the same HACMP node and network, the network interface with a service IP label fails, but an additional “backup” network interface card remains available. In this case, the Cluster Manager removes the service IP label from the failed network interface, and recovers it, via IP aliasing, on the “backup” network interface. Such a network interface failure is transparent to you except for a small delay while the system reconfigures the network interface on the node.
•Out of two network interfaces configured on a node, an additional or a “backup” network interface fails, but the network interface with a service IP label configured on it remains available. In this case, the Cluster Manager detects a (backup) network interface failure, logs the event, and sends a message to the system console. The application continues to be highly available. If you want additional processing, you can customize the processing for this event.
•If the service IP label that is part of a resource group cannot be recovered on a local node, HACMP moves the resource group with the associated IP label to another node, using IP aliasing as the mechanism to recover the associated service IP label.

How HACMP Handles Network Failures on the Local Node

A local network failure occurs when all interfaces of a specific cluster network on a node fail. For example, if you have nodes A and B, and networks net1 and net2, and all interfaces of network net1 on node A fail, then a network_down event runs for net1 with node A as the event node. You can see this in the /tmp/hacmp.out file.
In this case, the Cluster Manager takes selective recovery actions for resource groups containing a service IP label connected to that network. The Cluster Manager attempts to recover only the resource groups affected by the local network failure event.

Network Failure

A network failure occurs when none of the cluster nodes can access each other using any of the network interface cards configured for the HACMP network.
To protect against network failures, we recommend that you have the nodes in the cluster connected by multiple networks. If one network fails, HACMP uses a network that is still available for cluster traffic and for monitoring the status of the nodes (heartbeating).
You can also specify additional actions to process a network failure—for example, re-routing through an alternate network.

Node and Network Failure Scenarios

This section describes how HACMP for Linux handles failures and ensures that the application keeps running.
The following scenarios are considered:
•Node Failure
•Network Failure
•Network Interface Failure
•Preventing Cluster Partitioning.

Node Failure

If the application is configured to normally run on Node1 and Node1 fails, the resource group with the application falls over, or moves, to Node2.


At a high level, HACMP on Node2 detects that Node1, the default owner of the resource group, has failed, and moves the resource group to Node2. This operation is called a resource group takeover. The application is kept highly available, and the end users continue to access it.
If Node1 later rejoins the cluster, HACMP performs a resource group fallback based on the resource group policy. The resource group moves back to Node1 (for example, if that is the selected fallback policy for the resource group).

HACMP: Sample Configuration with a Diagram

The following configuration includes:
•Node1 and Node2 running Linux
•A serial network
•An IP-based network.

Cluster Terminology

The list below includes basic terms used in the HACMP environment.
Note:In general, terminology for HACMP is based on industry conventions for high availability. However, the meaning of some of the terms in HACMP may differ from the generic terms.
An application is a service, such as a database, or a collection of system services and their dependent resources, such as a service IP label and application’s start and stop scripts, that you want to keep highly available with the use of HACMP.
An application server is a collection of application start and stop scripts that you provide to HACMP by entering the pathnames of the scripts in the WebSMIT user interface. An application server becomes a resource associated with an application; you include it in a resource group for HACMP to keep it highly available. HACMP ensures that the application can start and stop successfully no matter on which cluster node it is being started.
A cluster node is a physical machine, typically an AIX or Linux server, on which you install HACMP. A cluster node also hosts an application and serves that application's clients. HACMP's role is to ensure continuous access to the application, no matter which node in the cluster currently hosts it.
A home node is the node that hosts the application under normal conditions, based on the default configuration of the application's resource group.
A takeover node is a backup cluster node to which HACMP may move the application. You can move the application to this node manually, for instance, to free the home node for planned maintenance, or HACMP moves it there automatically when a cluster component fails.
In HACMP for Linux v.5.4.1, a cluster configuration includes up to eight nodes, so you can have more than one potential takeover node for a particular application. Using the WebSMIT interface, you define the list of nodes on which you want HACMP to host your application. This list is called the resource group's nodelist.
A cluster IP network is used for cluster communications between the nodes and for sending heartbeat information. All IP labels configured on the same HACMP network share the same netmask but may be required to reside on different subnets.
An IP label is the name that you provide to HACMP for an IP address configured on a network interface card (NIC). Network configuration for HACMP requires planning for several types of IP labels (a sample /etc/hosts layout follows this list):
•Base (or boot) IP labels on each node—the ones through which an initial cluster connectivity is established.
•Service IP labels for each application—the ones through which a connection for a highly available application is established.
•Backup IP labels (optional).
•Persistent IP labels on each node. These are node-bound IP labels that are useful to have in the cluster for administrative purposes.
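A hypothetical /etc/hosts excerpt for a two-node cluster shows how these label types might be laid out. All names and addresses below are examples only; actual subnet requirements depend on your network planning:

    # Base (boot) IP labels, one per interface per node
    192.168.10.1    node1_boot1
    192.168.11.1    node1_boot2
    192.168.10.2    node2_boot1
    192.168.11.2    node2_boot2

    # Service IP label for the highly available application
    192.168.20.10   app_svc

    # Persistent (node-bound) IP labels for administration
    192.168.20.1    node1_pers
    192.168.20.2    node2_pers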

Note that to ensure high availability and access to the application, HACMP “recovers” the service IP address associated with the application on another node in the cluster if a network interface fails. HACMP uses IP aliases for HACMP networks. For information, see Planning IP Networks and Network Interfaces.
An IP alias is an alias placed on an IP label; it coexists on an interface along with the base IP label. IP aliases can be configured on networks that support gratuitous ARP cache updates.
IP Address Takeover (IPAT) is a process whereby a service IP label on one node is taken over by a backup node in the cluster. HACMP uses IPAT to provide high availability of the service IP labels that belong to resource groups and provide access to applications: it recovers the IP label either on the same node or on a backup node. By default, HACMP for Linux supports the mode of IPAT known as IPAT via IP Aliasing. (The other method, IPAT via IP Replacement, is not supported.)
IP Address Takeover via IP Aliasing is the default method of IPAT used in HACMP, applied when HACMP must automatically recover a service IP label on another node. To configure IPAT via IP Aliasing, you define the service IP labels and their aliases to the system. When HACMP performs IPAT during automatic cluster events, it places the service IP label recovered from the “failed” node as an alias on an interface of the takeover node. As a result, access to the application continues.
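At the operating-system level, placing an alias is equivalent to adding a second address to an interface that already has one. A sketch with standard Linux tools, using example addresses (HACMP performs the equivalent step for you during recovery, so these commands are for illustration only):

    # The base (boot) address is already configured on eth0
    ip addr show eth0

    # Recover the service address as an alias on the same interface
    ip addr add 192.168.20.10/24 dev eth0

    # Remove the alias again, for example during fallback
    ip addr del 192.168.20.10/24 dev eth0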
Cluster resources can include an application server and a service IP label. All or some of these resources can be associated with an application you plan to keep highly available. You include cluster resources in resource groups.
A resource group is a collection of cluster resources.
Resource group startup is the activation of a resource group and its associated resources on a specified cluster node. You choose a resource group startup policy from a predefined list in WebSMIT.
Resource group fallover occurs when HACMP moves a resource group from one node to another; in other words, the resource group and its associated application fall over to another node. You choose a resource group fallover policy from a predefined list in WebSMIT.
Takeover is an automatic action during which HACMP takes over resources from one node and moves them to another node. Takeover occurs when a resource group falls over to another node. A backup node is referred to as a takeover node.
Resource group fallback occurs when HACMP returns a resource group from a takeover node back to the home node. You choose a resource group fallback policy from a predefined list in WebSMIT.
Cluster Startup is the starting of HACMP cluster services on the node(s).
Cluster Shutdown is the stopping of HACMP cluster services on the node(s).
Pre- and post-events are customized scripts that you (or other system administrators) provide, make known to HACMP, and have run before or after a particular cluster event. For more information on pre- and post-event scripts, see the chapter on Planning Cluster Events in the HACMP for AIX Planning Guide.
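As a simple illustration, a hypothetical post-event script that records each run to a local log. The path and log file are placeholders; HACMP passes event details as positional arguments, which are simply appended to the log here:

    #!/bin/sh
    # /usr/local/cluster/post_event_log.sh -- hypothetical post-event
    # script; logs the event arguments that HACMP passes in
    echo "$(date): post-event ran with arguments: $*" >> /var/log/hacmp_events.log
    exit 0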
High Availability Cluster Multi-Processing (HACMP™) on Linux is the IBM tool for building Linux-based computing platforms that include more than one server and provide high availability of applications and services.
HACMP for AIX and HACMP for Linux use a common software model and present a common user interface (WebSMIT). This chapter provides an overview of HACMP on Linux and contains the following sections:
•Overview
•Cluster Terminology
•Sample Configuration with a Diagram
•Node and Network Failure Scenarios
•Where You Go from Here.

Overview

HACMP for Linux enables your business application and its dependent resources to continue running either on its current hosting server (node) or, if the hosting node fails, on a backup node, thus providing high availability and recovery for the application.
HACMP detects component failures and automatically transfers your application to another node with little or no interruption to the application’s end users.
HACMP for Linux takes advantage of the following software components to reduce application downtime and speed recovery:
•Linux operating system (RHEL or SUSE ES versions)
•TCP/IP subsystem
•High Availability Cluster Multi-Processing (HACMP™) on Linux cluster management subsystem (the Cluster Manager daemon).

HACMP for Linux provides:

•High availability for system processes, services, and applications that run under HACMP's control. HACMP ensures continuing service and access to applications during planned or unplanned hardware or software outages (or both), in a cluster of up to eight nodes. Nodes may have access to data stored on shared disks over an IP-based network (although shared disks cannot be part of the HACMP for Linux cluster and are not kept highly available by HACMP).
•Protection and recovery of applications when components fail. HACMP protects your applications against node and network failures by providing automatic recovery of applications.
If a node fails, HACMP recovers applications on a surviving node. If a network or a network interface card (adapter) fails, HACMP uses an alternate network, an additional network interface, or an IP label alias to recover the communication links and continue providing access to the data.
•WebSMIT, a web-based user interface for configuring an HACMP cluster. In WebSMIT, you can configure a basic cluster with the most widely used default settings, or configure a customized cluster with access to customizable tools and functions. WebSMIT lets you view your existing cluster configuration in different ways (node-centric or application-centric view) and provides cluster status tools.
•Easy customization of how applications are managed by HACMP. You can configure HACMP to handle applications in the way you want:
•Application startup. You select from a set of options for how you want HACMP to start applications on the node(s).
•Application recovery actions that HACMP takes. If a failure occurs in an application resource monitored by HACMP, you select whether you want HACMP to recover the application on another cluster node or stop it.
•HACMP's follow-up after recovery. You select how you want HACMP to react once you have restored a failed cluster component. For instance, you decide on which node HACMP should restart an application that was automatically stopped (or moved to another node) after a detected resource failure.
•Built-in configuration, system maintenance and troubleshooting functions. HACMP has functions to help you with your daily system management tasks, such as cluster administration, automatic cluster monitoring of the application’s health, or notification upon component failures.
•Tools for creating similar clusters from an existing “sample” cluster. You can save your existing HACMP cluster configuration in a cluster snapshot file and later use it to recreate an identical cluster in a few steps.