Friday, July 11, 2008

Quorum

The quorum is one of the mechanisms that the LVM uses to ensure that a volume group is ready to use and contains the most up-to-date data.
A quorum is a vote of the number of Volume Group Descriptor Areas and Volume Group Status Areas (VGDA/VGSA) that are active. A quorum ensures data integrity of the VGDA/VGSA areas in the event of a disk failure. Each physical disk in a volume group has at least one VGDA/VGSA. When a volume group is created onto a single disk, it initially has two VGDA/VGSA areas residing on the disk. If a volume group consists of two disks, one disk still has two VGDA/VGSA areas, but the other disk has one VGDA/VGSA. When the volume group is made up of three or more disks, then each disk is allocated just one VGDA/VGSA.
A quorum is lost when enough disks and their VGDA/VGSA areas are unreachable so that a 51% majority of VGDA/VGSA areas no longer exists. In a two-disk volume group, if the disk with only one VGDA/VGSA is lost, a quorum still exists because two of the three VGDA/VGSA areas still are reachable. If the disk with two VGDA/VGSA areas is lost, this statement is no longer true. The more disks that make up a volume group, the lower the chances of quorum being lost when one disk fails.
When a quorum is lost, the volume group varies itself off so that the disks are no longer accessible by the LVM. This prevents further disk I/O to that volume group so that data is not lost or assumed to be written when physical problems occur. Additionally, as a result of the vary-off, the user is notified in the error log that a hardware error has occurred and service must be performed.
There are cases when it is desirable to continue operating the volume group even though a quorum is lost. In these cases, quorum checking can be turned off for the volume group. This type of volume group is referred to as a nonquorum volume group. The most common case for a nonquorum volume group occurs when the logical volumes have been mirrored. When a disk is lost, the data is not lost if a copy of the logical volume resides on a disk that is not disabled and can be accessed. However, there can be instances in nonquorum volume groups, mirrored or nonmirrored, when the data (including copies) resides on the disk or disks that have become unavailable. In those instances, the data might not be accessible even though the volume group continues to be varied on.
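If you decide to run without quorum checking, the change is made with the chvg command. A minimal sketch, assuming a user volume group named datavg (for rootvg, my understanding is that the change only takes effect after the system is rebooted):

# chvg -Qn datavg     (turn quorum checking off for datavg)
# chvg -Qy datavg     (turn quorum checking back on later, if required)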

Volume groups

A volume group is a collection of 1 to 32 physical volumes of varying sizes and types.
A big volume group can have from 1 to 128 physical volumes. A scalable volume group can have up to 1024 physical volumes. A physical volume can belong to only one volume group per system; there can be up to 255 active volume groups.
When a physical volume is assigned to a volume group, the physical blocks of storage media on it are organized into physical partitions of a size you specify when you create the volume group.
When you install the system, one volume group (the root volume group, called rootvg) is automatically created that contains the base set of logical volumes required to start the system, as well as any other logical volumes you specify to the installation script. The rootvg includes paging space, the journal log, boot data, and dump storage, each in its own separate logical volume. The rootvg has attributes that differ from user-defined volume groups. For example, the rootvg cannot be imported or exported. When performing a command or procedure on the rootvg, you must be familiar with its unique characteristics.
You create a volume group with the mkvg command. You add a physical volume to a volume group with the extendvg command, make use of the changed size of a physical volume with the chvg command, and remove a physical volume from a volume group with the reducevg command. Some of the other commands that you use on volume groups include: list (lsvg), remove (exportvg), install (importvg), reorganize (reorgvg), synchronize (syncvg), make available for use (varyonvg), and make unavailable for use (varyoffvg).
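As a quick illustration of these commands, the following sketch creates a volume group on one disk, adds a second disk, and lists the result; the names datavg, hdisk2, and hdisk3 are only examples:

# mkvg -y datavg -s 64 hdisk2     (create datavg with 64 MB physical partitions)
# extendvg datavg hdisk3          (add a second physical volume)
# lsvg -p datavg                  (list the physical volumes in the group)
# varyoffvg datavg                (make the volume group unavailable)
# varyonvg datavg                 (make it available again)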
Small systems might require only one volume group to contain all the physical volumes attached to the system. You might want to create separate volume groups, however, for security reasons, because each volume group can have its own security permissions. Separate volume groups also make maintenance easier because groups other than the one being serviced can remain active. Because the rootvg must always be online, it contains only the minimum number of physical volumes necessary for system operation.
You can move data from one physical volume to other physical volumes in the same volume group with the migratepv command. This command allows you to free a physical volume so it can be removed from the volume group. For example, you could move data from a physical volume that is to be replaced.
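A sketch of that replacement scenario, reusing the hypothetical datavg from the previous example:

# migratepv hdisk2 hdisk3     (move all partitions from hdisk2 to hdisk3)
# reducevg datavg hdisk2      (remove the emptied disk from the volume group)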
A volume group that is created with smaller physical and logical volume limits can be converted to a format which can hold more physical volumes and more logical volumes. This operation requires that there be enough free partitions on every physical volume in the volume group for the volume group descriptor area (VGDA) expansion. The number of free partitions required depends on the size of the current VGDA and the physical partition size. Because the VGDA resides on the edge of the disk and it requires contiguous space, the free partitions are required on the edge of the disk. If those partitions are allocated for a user's use, they are migrated to other free partitions on the same disk. The rest of the physical partitions are renumbered to reflect the loss of the partitions for VGDA usage. This renumbering changes the mappings of the logical to physical partitions in all the physical volumes of this volume group. If you have saved the mappings of the logical volumes for a potential recovery operation, generate the maps again after the completion of the conversion operation. Also, if the backup of the volume group is taken with map option and you plan to restore using those maps, the restore operation might fail because the partition number might no longer exist (due to reduction). It is recommended that backup is taken before the conversion and right after the conversion if the map option is used. Because the VGDA space has been increased substantially, every VGDA update operation (creating a logical volume, changing a logical volume, adding a physical volume, and so on) might take considerably longer to run.
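The conversion itself is done with the chvg command. A minimal sketch, assuming a volume group named datavg with enough free partitions for the VGDA expansion; as I understand the options, -B converts to the big volume group format and -G converts to the scalable format (the scalable conversion may require the volume group to be varied off first):

# chvg -B datavg     (convert to the big volume group format)
# chvg -G datavg     (convert to the scalable volume group format)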

Viewing BOS installation logs using SMIT

You can use the SMIT fast path to view some logs in the /var/adm/ras directory.
To view some logs in the /var/adm/ras directory, you can use the following SMIT fast path:
smit alog_show
The resulting list contains all logs that are viewable with the alog command. Select from the list by pressing the F4 key.

Viewing BOS installation logs

Information saved in BOS installation log files might help you determine the cause of installation problems.
To view BOS installation log files, type cd /var/adm/ras and view the files in this directory. One example is devinst.log, a text file that can be viewed with any text editor or paged through with a command such as pg or more.

Viewing BOS installation logs with the alog command

You can use the alog command to view some logs in the /var/adm/ras directory.
To view some logs in the /var/adm/ras directory, type:
alog -o -f bosinstlog
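If you are not sure which logs are defined on your system, the alog command can list them first; the log type below is only a placeholder:

# alog -L                (list the log types known to alog)
# alog -o -t <logtype>   (display one of the listed log types)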

Troubleshooting a full /usr file system

Use this procedure for troubleshooting a full /usr file system.
To release space in a full /usr file system, complete one or more of the following tasks:
• Type installp -c all to commit all updates and release space in the /usr file system.
• If the system is not a Network Installation Management (NIM) system serving a Shared Product Object Tree (SPOT), enter /usr/lib/instl/inurid -r to remove client information for root file system installations. For information about NIM and SPOTs, see Using the SPOT (Shared Product Object Tree) resource in the NIM Resources section.
• Remove software that you do not need (see the sketch after this list).
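A sketch of the review-and-remove approach mentioned in the last item, with fileset_name standing in for whatever software you decide you no longer need:

# lslpp -l | more              (review the installed filesets)
# installp -ug fileset_name    (remove the fileset and its dependents)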

Mirroring the root volume group in AIX

The following scenario explains how to mirror the root volume group (rootvg).
Note:
1. Mirroring the root volume group requires advanced system administration experience. If it is not done correctly, your system can become unbootable.
2. Mirrored dump devices are supported in AIX 4.3.3 or later.
In the following scenario, the rootvg is contained on hdisk01, and the mirror is being made to a disk called hdisk11:
1. Check that hdisk11 is supported by AIX® as a boot device:
bootinfo -B hdisk11
If this command returns a value of 1, the selected disk is bootable by AIX. Any other value indicates that hdisk11 is not a candidate for rootvg mirroring.
2. Extend rootvg to include hdisk11, using the following command:
extendvg rootvg hdisk11
If you receive the following error messages:
0516-050 Not enough descriptor space left in this volume group. Either try
adding a smaller PV or use another volume group.
or a message similar to:
0516-1162 extendvg: Warning, The Physical Partition size of 16 requires the
creation of 1084 partitions for hdisk11. The limitation for volume group
rootvg is 1016 physical partitions per physical volume. Use chvg command with
the -t option to attempt to change the maximum physical partitions per Physical
Volume for this volume group.
You have the following options:
• Mirror the rootvg to an empty disk that already belongs to the rootvg.
• Use a smaller disk.
• Change the maximum number of partitions supported by the rootvg, using the following procedure:
a. Check the message for the number of physical partitions needed for the destination disk and the maximum number currently supported by rootvg.
b. Use the chvg -t command to multiply the maximum number of partitions currently allowed in rootvg (in the above example, 1016) by a factor that makes the result larger than the number of physical partitions needed for the destination disk (in the above example, 1084). For example:
chvg -t 2 rootvg
c. Reissue the extendvg command from the beginning of step 2.
3. Mirror the rootvg, using the exact mapping option, as shown in the following command:
mirrorvg -m rootvg hdisk11
This command will turn off quorum when the volume group is rootvg. If you do not use the exact mapping option, you must verify that the new copy of the boot logical volume, hd5, is made of contiguous partitions.
4. Initialize all boot records and devices, using the following command:
bosboot -a
5. Initialize the boot list with the following command:
bootlist -m normal hdisk01 hdisk11
Note:
a. Even though the bootlist command identifies hdisk11 as an alternate boot disk, it cannot guarantee the system will use hdisk11 as the boot device if hdisk01 fails. In such a case, you might have to boot from the product media, select maintenance, and reissue the bootlist command without naming the failed disk.
b. If your hardware model does not support the bootlist command, you can still mirror the rootvg, but you must actively select the alternate boot disk when the original disk is unavailable.
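After the mirroring completes, it is worth verifying that every logical volume really has two copies and that no partitions are stale. A short sketch using the disk names from this scenario:

# lsvg rootvg            (check ACTIVE PVs, STALE PPs, and the QUORUM setting)
# lsvg -l rootvg         (each mirrored logical volume should show twice as many PPs as LPs)
# bootlist -m normal -o  (confirm that hdisk01 and hdisk11 are both in the boot list)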

Installing when booting a system backup fails

If a backup tape fails to boot, you can still install by using a mksysb image stored on the tape.
Boot the machine from the product media (Volume 1 if there is more than one volume), then install the backup from Maintenance mode. For instructions on booting, refer to Installing the Base Operating System. Follow the instructions to the point when the Welcome to the Base Operating System Installation and Maintenance screen displays.
If your system fails to boot from a mksysb tape, you may have encountered a problem that can be identified and resolved with these instructions. Affected systems include all CHRP architecture systems, which started with the model F50. Access the firmware command line prompt, which usually appears as an option in the SMS menus. At the firmware command line prompt, type the following two commands:
setenv real-base 1000000
reset-all
The system will then reboot, and you will be able to boot from tape, assuming that you have an otherwise valid boot image on your tape media.

Installing a system backup on the source machine

You can use Web-based System Manager or the command line to restore an operating system onto the same machine from which you created the backup.
For either interface, the following conditions must be met before beginning the procedure:
• All hardware must already be installed, including external devices, such as tape and CD/DVD-ROM drives.
• Obtain your system backup image from one of the following sources:
CD or DVD: BOS CDs or DVDs, created in one of the following ways:
• Using the Web-based System Manager Backup and Restore application. Select System backup to writable CD.
• Using the SMIT Back Up This System to CD menu.
• From the command line, using the mkcd or mkdvd command.

Tape: BOS tapes, created in one of the following ways:
• Using the Web-based System Manager Backup and Restore application. Select Back up the system.
• Using the SMIT Back Up the System to Tape/File menu.
• From the command line, using the mksysb -i Target command.
Note: If devices were removed from or replaced on the system after the backup was created, their information will be restored when you install a backup. The system shows these devices in a defined state because the ODM from the system at the time of backup is restored instead of rebuilt.
Network: The path to your backup image file. For information about installing a backup across a network, refer to Using a mksysb image to install the base operating system on a NIM Client.

Note: Before you begin, select the tape or CD/DVD-ROM drive as the primary boot device. For additional information, refer to the section in your hardware documentation that discusses system management services.
Due to enhancements in the mksysb command, you can control how devices are recovered when you install a system backup on the source machine. This behavior is determined by the RECOVER_DEVICES variable in the bosinst.data file. This variable can be set to default, yes, or no. The following list shows the resulting behaviors for each value:
default: ODM is restored
yes: ODM is restored
no: No recovery of devices
Note: You can override the default value of RECOVER_DEVICES by selecting yes or no in the Backup Restore menu or by editing the value of the attribute in the bosinst.data file.
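For reference, the relevant lines of the control_flow stanza in a bosinst.data file might look like the following. This is only a sketch of the fields discussed in this section (CONSOLE, PROMPT, and RECOVER_DEVICES), not a complete stanza:

control_flow:
    CONSOLE = Default
    PROMPT = yes
    RECOVER_DEVICES = default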
To use Web-based System Manager:
1. Start the Web-based System Manager by typing wsm on the command line as root user.
2. Expand Software in the Navigation Area, select Overview and Tasks, then select Reinstall Operating System.
3. Choose the installation device:
• Network
If you choose this option, your machine must either be a configured NIM client, or have access to a NIM environment. If your machine is not a NIM client, the Reinstall Base Operating System wizard leads you through the process. For more information on setting up a NIM environment, see Using installation images to install the base operating system on a NIM client.
• Tape or CD/DVD-ROM
4. Choose Install a system backup image (mksysb) as the installation type.
5. Follow the wizard prompts to complete the procedure.
To use the command line:
1. You can use the bootlist command to display or change the primary boot device.
To display the primary boot device:
bootlist -m normal -o
To change the primary boot device:
bootlist -m normal rmt0
bootlist -m normal cd0
2. Power off your machine by following these steps:
a. Log in as the root user.
b. Enter the following command:
shutdown -F
c. If your system does not automatically power off, place the power switch in the Off (0) position.
Attention: Do not turn on the system unit until Step 6.
3. Turn on all attached external devices. These include:
• Terminals
• CD or DVD drives
• Tape drives
• Monitors
• External disk drives
Turning on the external devices first is necessary so that the system unit can identify them during the startup (boot) process.
4. Insert the installation media into the tape or CD or DVD drive.
You might find that on certain tape drive units, the tape drive door does not open while the system is turned off. If you have this problem, use the following procedure:
a. Turn on the system unit.
b. Insert the boot installation tape (insert Volume 1 if you received more than one volume).
c. Turn off the system unit and wait for 30 seconds.
5. If you are not using an ASCII terminal, skip to Step 6. If you are using an ASCII terminal, use the following criteria to set the communications, keyboard, and display options.
Note: If your terminal is an IBM® 3151, 3161, or 3164, press the Ctrl+Setup keys to display the Setup Menu and follow the on-screen instructions to set these options. If you are using some other ASCII terminal, refer to the appropriate documents for information about how to set these options. Some terminals have different option names and settings than those listed here.
Communication Options:
  Line Speed (baud rate): 9600
  Word Length (bits per character): 8
  Parity: no (none)
  Number of Stop Bits: 1
  Interface: RS-232C (or RS-422A)
  Line Control: IPRTS
Keyboard and Display Options:
  Screen: normal
  Row and Column: 24x80
  Scroll: jump
  Auto LF (line feed): off
  Line Wrap: on
  Forcing Insert: line (or both)
  Tab: field
  Operating Mode: echo
  Turnaround Character: CR
  Enter: return
  Return: new line
  New Line: CR
  Send: page
  Insert Character: space
6. Turn the system unit power switch from Off (0) to On (|). The system begins booting from the backup media. If your system is booting from tape, it is normal for the tape to move back and forth. If your system has an LED display, the three-digit LED should display c31.
Note: You can boot from production media (tape or CD) if your backup media fails to boot. The initial Welcome screen includes an option to enter a maintenance mode in which you can continue the installation from your backup media. Refer to Troubleshooting an installation from a system backup for more information.
If you have more than one console, each terminal and directly attached display device (or console) might display a screen that directs you to press a key to identify your system console. A different key is specified for each terminal displaying this screen. If this screen is displayed, then press the specified key only on the device to be used as the system console. (The system console is the keyboard and display device used for installation and system administration.) Press a key on only one console.
Note: If the bosinst.data file lists a valid display device for the CONSOLE variable, you do not manually choose a system console. Read Customizing your installation for more information about the bosinst.data file.
7. The type of installation that begins is determined by the settings of the PROMPT field in the control_flow stanza of the bosinst.data file. Use the following criteria to determine the type of installation you will be using:
PROMPT = no Nonprompted Installation. This installation method is used if the backup image is configured to install automatically, without having to respond to the installation program. Go to step 8.
PROMPT = yes Prompted Installation. This installation method is used if you need to use menu prompts to install the backup image. Also, use this installation method if a nonprompted installation halts and the Welcome to Base Operating System Installation and Maintenance screen displays. Go to step 9.
8. A successful nonprompted installation requires no further instructions because the installation is automatic.
Note: If the backup image holds source system-configuration information that is incompatible with the target system, the nonprompted installation stops and a prompted installation begins.
The Installing Base Operating System screen displays before the installation starts. The nonprompted installation pauses for approximately five seconds before beginning. After this time, the nonprompted installation continues to completion.
However, if you decide to interrupt the automatic installation and start a prompted session, type 000 (three zeros) at the terminal and follow the remaining steps in this procedure.
9. The Welcome to the Base Operating System Installation and Maintenance screen displays.
Note: You can view Help information at each screen of this installation process by typing 88.
Choose the Change/Show Installation Settings and Install option.
10. The System Backup Installation and Settings screen displays. This screen shows current settings for the system. An ellipsis follows the disk listed in the first line if there is more than one disk selected.
11. Either accept the settings or change them. For more information on using map files, see Creating system backups.
To accept the settings and begin the installation, skip to step 16.
To change the settings, continue with step 12.
12. Type 1 in the System Backup Installation and Settings screen to specify disks where you want to install the backup image. The Change Disk(s) Where You Want to Install screen displays. This screen lists all available disks on which you can install the system backup image. Three greater-than signs (>>>) mark each selected disk.
Type the number and press Enter for each disk you choose. Type the number of a selected disk to deselect it. You can select more than one disk.
Note: You can also specify a supplemental disk by typing 66 and pressing the Enter key for the Disks not known to Base Operating System Installation option. This option opens a new menu that prompts for a device support media for the supplemental disk. BOS installation configures the system for the disk and then returns to the Change Disk(s) Where You Want to Install screen.
13. After you have finished selecting disks, press the Enter key.
The screen that displays after you press the Enter key is dependent on the availability of map files for all of the selected disks. The criteria for this is as follows:
• If one or more selected disks have no maps, BOS installation returns directly to the System Backup Installation and Settings screen. Skip to step 15.
• If all selected disks have maps, the Change Use Maps Status screen displays, where you choose whether to use maps for installation. Continue with step 14.
To preserve the placement of logical volumes during a future restoration of the backup, you can create map files before backing up a system. Map files, stored in the /tmp/vgdata/rootvg directory, match the physical partitions on a drive to its logical partitions. Create map files either with the SMIT Backup the System menu, using Web-based System Manager, or using the -m option when you run the mksysb command.
For more information about map files, see Using Map Files for Precise Allocation in Operating system and device management.
14. Type either 1 or 2 in the Change Use Maps Status screen to specify whether the installation program is to use maps.
When you complete this choice, BOS installation returns to the System Backup Installation and Settings screen.
15. Decide whether BOS installation is to shrink file systems on the disks where you install the system. When you choose this option, the logical volumes and file systems within a volume group are re-created to the minimum size required to contain the data. This reduces wasted free space in a file system.
File systems on your backup image might be larger than required for the installed files. Press the 2 key to toggle the Shrink File Systems option between Yes and No in the System Backup Installation and Settings screen. The default setting is No.
Note: Shrinking the file system disables the use of maps.
16. Type 0 to accept the settings in the System Backup Installation and Settings screen.
The Installing Base Operating System screen displays the rate of completion and duration.
If you specified a supplemental disk in step 12, an untitled screen temporarily replaces the Installing Base Operating System screen. When this screen displays, it prompts you to place the device-support media in the drive and press the Enter key. BOS installation reconfigures the supplemental disk, then returns to the Installing Base Operating System screen.
The system reboots automatically when the installation completes.

Creating and installing a software bundle

Using this scenario, you can create a user-defined software bundle and install its contents.
Things to consider
The information in this how-to scenario was tested using specific versions of AIX®. The results you obtain might vary significantly depending on your version and level of AIX.
A user-defined software bundle is a text file ending in .bnd that is located in the /usr/sys/inst.data/user_bundles path. By creating the software bundle file in the /usr/sys/inst.data/user_bundles path, SMIT (System Management Interface Tool) can locate the file and display it in the bundle selection screen.
In this scenario, you will do the following:
• Create a user-defined software bundle that contains the Web-based System Manager Security application, which is located on the Expansion Pack
• Install the software bundle
• Verify the installation of the software bundle was successful
Step 1. Creating a user-defined software bundle
1. Create a text file with the extension .bnd in the /usr/sys/inst.data/user_bundles path by running the following:
# vi /usr/sys/inst.data/user_bundles/MyBundle.bnd
2. Add the software products, packages, or filesets to the bundle file with one entry per line. Add a format-type prefix to each entry. For this example, we are dealing with AIX installp packages, so the format-type prefix is I:. Type the following in the MyBundle.bnd file: I:sysmgt.websm.security.
For more information on installation format types, see Software product packaging.
3. Save the software bundle file and exit the text editor.
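To double-check the bundle before going to SMIT, you can simply display the file. For the bundle created in this scenario, the output should be the single installp entry:

# cat /usr/sys/inst.data/user_bundles/MyBundle.bnd
I:sysmgt.websm.security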
Step 2. Installing the software bundle
1. Type the following at the command line: # smitty easy_install
2. Enter the name of the installation device or directory.
3. From the selection screen, select the name of the user-defined software bundle you created, MyBundle. For example:
Install Software Bundle

Type or select a value for the entry field.
Press Enter AFTER making all desired changes.

  +--------------------------------------------------+
  |              Select a Fileset Bundle             |
  |                                                  |
  | Move cursor to desired item and press Enter.     |
  |                                                  |
  |   App-Dev                                        |
  |   CDE                                            |
  |   GNOME                                          |
  |   KDE                                            |
  |   Media-Defined                                  |
  |   MyBundle                                       |
  |   ...                                            |
  |   ...                                            |
  |                                                  |
  | F1=Help        F2=Refresh      F3=Cancel         |
  | F8=Image       F10=Exit        Enter=Do          |
  | /=Find         n=Find Next                       |
  +--------------------------------------------------+
4. Change the values provided in the Install Software Bundle screen as appropriate to your situation. You can change the PREVIEW only? option to yes to preview the installation of your software bundle before you install it. You might also need to accept new license agreements if the software in your bundle has an electronic license. For example:
Install Software Bundle

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                         [Entry Fields]
* INPUT device / directory for software               /cdrom
* BUNDLE                                               MyBundle          +
* SOFTWARE to install                                  [all]             +
  PREVIEW only? (install operation will NOT occur)     no/yes            +
  COMMIT software updates?                             yes               +
  SAVE replaced files?                                 no                +
  AUTOMATICALLY install requisite software?            yes               +
  EXTEND file systems if space needed?                 yes               +
  VERIFY install and check file sizes?                 no                +
  Include corresponding LANGUAGE filesets?             yes               +
  DETAILED output?                                     no                +
  Process multiple volumes?                            yes               +
  ACCEPT new license agreements?                       no/yes            +
  Preview new LICENSE agreements?                      no                +

F1=Help          F2=Refresh       F3=Cancel        F4=List
Esc+5=Reset      F6=Command       F7=Edit          F8=Image
F9=Shell         F10=Exit         Enter=Do
5. Press Enter to continue.
6. Press Enter a second time to confirm your decision and begin the installation of your software bundle.
Step 3. Verify the installation of the software bundle
• Check the installation summary at the end of the installation output by scrolling to the end of the output. The output indicates whether the installation of your user-defined software bundle was successful. You might see output similar to the following:
+-----------------------------------------------------------------------------+
                                  Summaries:
+-----------------------------------------------------------------------------+

Installation Summary
--------------------
Name                         Level           Part        Event       Result
-------------------------------------------------------------------------------
sysmgt.websm.security        5.3.0.0         USR         APPLY       SUCCESS
sysmgt.websm.security        5.3.0.0         ROOT        APPLY       SUCCESS
• You can also verify the installation at a later time by completing one of the following:
• Run the following command:
lslpp -Lb MyBundle
The output indicates whether the installation of your user-defined software bundle was successful. You might see output similar to the following:
Fileset                      Level      State  Type  Description
----------------------------------------------------------------------------
sysmgt.websm.security        5.1.0.0    C      F     WebSM Security Components

State codes:
A -- Applied.
B -- Broken.
C -- Committed.
E -- EFIX Locked.
O -- Obsolete. (partially migrated to newer version)
? -- Inconsistent State...Run lppchk -v.

Type codes:
F -- Installp Fileset
P -- Product
C -- Component
T -- Feature
R -- RPM Package
• Complete the following steps in SMIT:
1. Type the following at a command line: smitty list_installed
2. Select List Installed Software by Bundle.
3. With your cursor at the BUNDLE name field, press F4 and select your bundle from the list.
4. Press Enter. Output similar to that in the preceding option is shown.

Creating a system backup to tape

Using this scenario, you can create and verify a bootable system backup, also known as a root volume group backup or mksysb image.
Things to consider
The information in this how-to scenario was tested using specific versions of AIX®. The results you obtain might vary significantly depending on your version and level of AIX.
Step 1. Prepare for system backup creation
Before creating system backups, complete the following prerequisites:
• Be sure you are logged in as root user.
• If you plan to use a backup image for installing other differently configured target systems, you must create the image before configuring the source system, or set the RECOVER_DEVICES variable to no in the bosinst.data file. For more information about the bosinst.data file, refer to The bosinst.data file in Installation and migration.
• Consider altering passwords and network addresses if you use a backup to make master copies of a source system. Copying passwords from the source to a target system can create security problems. Also, if network addresses are copied to a target system, duplicate addresses can disrupt network communications.
• Mount all file systems you want to back up. The mksysb command backs up only mounted JFS and JFS2 file systems in the rootvg. To mount file systems, use the mount command.
Note: The mksysb command does not back up file systems mounted across an NFS network.
• Unmount any local directories that are mounted over another local directory.
Note: This backup procedure backs up files twice if a local directory is mounted over another local directory in the same file system. For example, if you mount /tmp over /usr/tmp, the files in the /tmp directory are then backed up twice. This duplication might exceed the number of files that a file system can hold, which can cause a future installation of the backup image to fail.
• Use the /etc/exclude.rootvg file to list files you do not want backed up.
• Make at least 12 MB of free disk space available in the /tmp directory. The mksysb command requires this working space for the duration of the backup.
Use the df command, which reports in units of 512-byte blocks, to determine the free space in the /tmp directory. Use the chfs command to change the size of the file system, if necessary.
For example, the following command adds 12 MB of disk space to the /tmp directory of a system with 4 MB partitions:
# chfs -a size=+24000 /tmp
• All hardware must already be installed, including external devices, such as tape and media drives.
• The bos.sysmgt.sysbr fileset must be installed. The bos.sysmgt.sysbr fileset is automatically installed in AIX 5.3. To determine if the bos.sysmgt.sysbr fileset is installed on your system, type:
# lslpp -l bos.sysmgt.sysbr
If the lslpp command does not list the bos.sysmgt.sysbr fileset, install it before continuing with the backup procedure. Type the following:
# installp -agqXd /dev/cd0 bos.sysmgt.sysbr
Step 2. Create a system backup to tape
1. Enter the smit mksysb fast path.
2. Select the tape device in the Backup DEVICE or File field.
3. If you want to create map files, select yes in the Create Map Files? field.
For more information, see Using map files for precise allocation in Operating system and device management.
Note: If you plan to reinstall the backup to target systems other than the source system, or if the disk configuration of the source system might change before reinstalling the backup, do not create map files.
4. To exclude certain files from the backup, select yes in the Exclude Files field.
5. Select yes in the List files as they are backed up field.
6. Select yes in the Disable software packing of backup? field, if you are running any other programs during the backup.
7. Use the default values for the rest of the menu options.
8. Press Enter to confirm and begin the system backup process.
9. The COMMAND STATUS screen displays, showing status messages while the system makes the backup image. When the backup process finishes, the COMMAND: field changes to OK.
10. To exit SMIT when the backup completes, press F10 (or Esc+0).
11. Remove the tape and label it. Write-protect the backup tape.
12. Record any backed-up root and user passwords. Remember that these passwords become active if you use the backup to either restore this system or install another system.
You have successfully created the backup of your rootvg. Because the system backup contains a boot image, you can use this tape to start your system if for some reason you cannot boot from hard disks.
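If you prefer the command line to SMIT, the same bootable backup can be created directly with the mksysb command. A minimal sketch, assuming the tape drive is /dev/rmt0 (the -i flag regenerates the /image.data file, as the SMIT menu does by default):

# mksysb -i /dev/rmt0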

AIX servers

It is very important to keep at least two versions of the rootvg backup of an AIX server.
This backup is required when there is a total system crash, root file system corruption,
or total site loss.
A backup of rootvg is made by creating a mksysb tape or image. mksysb backs up only
the mounted file systems in rootvg and creates a bootable image on tape. The bootable tape can then be used to boot and recover the AIX server.

Procedure to create mksysb image

Log in as root.

It is recommended to close all user applications and cluster processes before starting the backup.

# smitty mksysb

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

[TOP] [Entry Fields]
WARNING: Execution of the mksysb command will
result in the loss of all material
previously stored on the selected
output medium. This command backs
up only rootvg volume group.

* Backup DEVICE or FILE [] +/
Create MAP files? no +
EXCLUDE files? no +
List files as they are backed up? no +
Generate new /image.data file? yes +
EXPAND /tmp if needed? no +
Disable software packing of backup? no +
Number of BLOCKS to write in a single output [] #
(Leave blank to use a system default)
[BOTTOM]

F1=Help F2=Refresh F3=Cancel F4=List
Esc+5=Reset Esc+6=Command Esc+7=Edit Esc+8=Image
Esc+9=Shell Esc+0=Exit Enter=Do


In the Backup DEVICE or FILE field, specify the device name for the backup, for example /dev/rmt0 (tape drive), or a file name if you want to save the image to disk.

In the EXPAND /tmp if needed field, select yes.

This will create a bootable mksysb image.
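If you want to confirm that the tape really holds a restorable image, you can list the table of contents of the data portion with the restore command. This is only a sketch, based on my assumption of the usual mksysb tape layout (the rootvg data is the fourth image on the tape) and the rmt0 drive name used above:

# tctl -f /dev/rmt0 rewind
# restore -s4 -Tvf /dev/rmt0.1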

SAN Switch

It is very important to backup the configuration of SAN switches.

Procedure to backup switch configuration:

Prerequisites: You should know
a. the IP address of the FTP server
b. the IP address of the SAN switches
c. a valid username and password for the FTP server
d. the directory on the FTP server where you want to save the configuration.

In this example

IP address of FTP server : 10.0.0.40
IP address of SAN switch: 10.0.0.48
Name of SAN switch : SANSW1

Telnet to the SAN switch SANSW1 from any Windows client or AIX server:

telnet 10.0.0.48

login: admin
password: wipro123 (verify whether the administrator has changed the password)

Here I am using an AIX host as the FTP server. Once the configuration has been uploaded, you can
move it to any server or client where you want to keep the configuration safe. If
you have a Windows FTP server, you can use that to upload the configuration instead.

Be sure of the directory on the FTP server in which you want to save the configuration.

From switch:

SANSW1:admin> configupload
Server Name or IP Address [host]: 10.0.0.40 (This is the IP address of the AIX FTP server)
User Name [None]: root
File Name [config.txt]: /sansw1config.txt
Password: xxxxxx (root password)
upload complete

This will save the configuration in the root directory of the FTP server.

Follow the same procedure to save the configuration of any other switch, such as SANSW2.


Procedure to restore switch configuration:

This is required if the configuration has been accidentally deleted or changed, or if the switch has
gone bad and you are replacing it with a new switch.

If the switch is a new replacement switch, first assign an IP address and subnet mask to the switch by connecting a console cable and using a terminal emulator such as HyperTerminal. The IP addresses of both switches are listed in the installation document.

During the initial setup, the switch prompts you to change the passwords of the admin, root, and factory users. Change them and note them down.

After the IP address is assigned, telnet to the switch from any Windows or UNIX server:

telnet 10.0.0.48

username: admin
password : (the password that you gave the admin user during the initial setup of the switch)


From the switch:

SW1:admin> switchdisable

This disables the switch. The switch must be disabled before a
configuration can be downloaded.

SW0:admin> configdownload
Server Name or IP Address [host]: 10.0.0.40
User Name [None]: root
File Name [config.txt]: /yesansw1config.txt
Password: xxxxxx (root password)
Committing configuration...done.
download complete

After the download is complete, enable the switch:

SW0:admin> switchenable
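Before checking the zoning, you can confirm that the switch is enabled and its ports are back online. A small sketch, assuming the standard Fabric OS switchshow command is available on this switch:

SW0:admin> switchshow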

You can change the admin password if you do not remember the old password that
was on the previous switch when its configuration was last uploaded.

If you do remember it, you can log out and log back in to check whether the configuration is the same
as on the previous switch.

SANSW1:admin> zoneshow

The output should show all the zone configuration of the previous (faulty) switch.

HMC

Backing up and restoring the HMC (hardware management console)
Follow the procedures below to back up the critical console data of the HMC.
Importance of backing up the HMC:
If the HMC stops responding or its hard disk crashes, the HMC can be brought back to its original state by reinstalling the HMC code through the recovery CD and then restoring the critical data that was backed up to either DVD or a remote server.
We will follow the steps to back up the HMC critical data to DVD. One DVD media is shipped with the HMC; we will use this DVD media to take the backup. The other process, taking the backup to a remote server, is documented for reference and can also be used.
Backing up critical HMC data
Using the HMC, you can back up all important data, such as the following:
• User-preference files
• User information
• HMC platform-configuration files
• HMC log files
The Backup function saves the HMC data stored on the HMC hard disk to DVD, a remote system mounted to the HMC file system (such as NFS), or a remote site through FTP. Back up the HMC after you have made changes to the HMC or to the information associated with logical partitions.
To back up the HMC, you must be a member of one of the following roles:
• super administrator
• operator
• service representative
To back up the HMC, do the following:
1. In the Navigation area, click the Licensed Internal Code Maintenance icon.
2. In the Contents area, click the HMC Code Update icon.
3. Select Back up Critical Console Data.
4. Select an archive option. You can back up to DVD on the HMC, back up to a remote system mounted to the HMC file system (such as NFS), or a remote site through FTP.
5. Follow the instructions on the panel to back up the data.
Scheduling and reviewing scheduled HMC backups
You can schedule a backup to DVD to occur once, or you can set up a repeating schedule. You must provide the time and date that you want the operation to occur. If the operation is scheduled to repeat, you must select how often you want this backup to run (hourly, daily, weekly, or monthly).
Note: Only the most-recent backup image is stored at any time on the DVD.
To schedule a backup operation, do the following:
1. In the Navigation area, expand the HMC Management folder.
2. In the Navigation area, click the HMC Configuration icon.
3. In the Contents area, click Schedule Operations.
4. From the list, select the HMC you want to back up and click OK.
5. Select Options > New.
6. In the Add a Scheduled Operation window, select Backup Critical Console Data and click OK.
7. In the appropriate fields, enter the time and date that you want this backup to occur.
8. If you want this scheduled operation to repeat, click the Repeat tab and select the intervals at which you want the backup to repeat and press Enter.
9. When you have set the backup time and date, click Save. When the Action Completed window opens, click OK. A description of the operation displays in the Scheduled Operations window.
Restoring critical HMC data
The HMC backup data should be restored only in conjunction with a reinstallation of the HMC. The reinstallation procedure is described below.
Note: For this operation, you must have the backup DVD-RAM media or access to the remote server where the archive was created.
To restore the HMC data, you must be a member of one of the following roles:
• super administrator
• operator
• service representative
Select the data-restoration procedure based on the data archiving method used:
• Restoring from DVD

Restore data that was archived to DVD.
• Restoring from a remote server

Restore data that was archived to a remote FTP or NFS server.
Reinstalling the HMC machine code
If the HMC is not responding, you can use the recovery CD to reinstall the HMC interface onto the HMC PC. After you reinstall the HMC machine code, you can restore the backup data that you created to recover your critical console information.
To reinstall the HMC machine code, you must be a member of one of the following roles:
• super administrator
• operator
• service representative
To reinstall the HMC machine code, do the following:
1. Shut down and power off the HMC.
2. Power on the HMC console and insert the HMC recovery CD.
3. The HMC powers on from the media and displays the recovery panel.
4. Press F8 to select the 1 - Install/Recover option.
5. When the following message is displayed, press F1:
PRESS F1 TO CONTINUE WITH THE RESTORE /PRELOAD PROCESS. PRESS ESC TO EXIT THE PROCESS.
6. After the installation of the first CD completes, you are prompted to insert the second installation CD into the DVD drive. Press any key to continue. The HMC reboots.
7. After the installation of the second CD completes, select 1 - Install additional software from CD media from the menu displayed to install the information center from the third CD.
8. After the information center installation, select 1 - Restore Critical Console Data from the menu displayed to restore data from a DVD. To restore from a remote server, select 2 - Finish the Installation.
Restoring from DVD
If the critical console data has been archived on a DVD-RAM, do the following:
1. Select 1 - Restore Critical Console Data from the menu displayed. This menu is displayed at the end of the HMC reinstallation.
2. Insert the DVD-RAM containing the archived console data. On first boot of the newly installed HMC, the data automatically restores.
Restoring from a remote server
If the critical console data has been archived remotely, do the following:
1. Manually reconfigure network settings to enable access to the remote server after the HMC is newly installed.
2. In the Navigation area, click the Licensed Internal Code Maintenance icon.
3. In the Contents area, click the HMC Code Update icon.
4. Select Restore Remote Console Data.
5. Select the type of remote restore.
6. Follow the directions on the panel to restore the critical console data. The data automatically restores from the remote server when the system is rebooted.

Logging in to the HMC
After power-on, the HMC login window prompts for the user ID and the password. The HMC is supplied with a predefined user ID hscroot and the default password abc123. Both the user ID and password are case sensitive and must be typed exactly as shown. After the successful login, the HMC graphical user interface opens.
Shutting down, rebooting, and logging off the HMC
This task allows you to shut down, reboot, and log off the HMC interface.
If an operating system is open and running on a partition, and you decide to shut down, reboot, or log off the HMC interface, the operating system continues to run without interruption.
To log off the HMC interface, do the following:
1. In the main menu, click Console > Exit. At this point, you can select to save the state of the console for the next session by selecting the check box next to the option.
Note:
When you exit from the HMC session locally, you can select to shut down, reboot, or log off your session. The following is a description of each option:
Shutdown Console
Powers off the HMC
Reboot Console
Shuts down the HMC and then reboots it to the login prompt
Logout
Returns the user to the login prompt without shutting down the HMC
2. Click Exit Now.

Alternate Disk Installation

On systems where downtime is critical, AIX offers an additional way to install a system. Alternate Disk Installation allows installing a system while it is still up and running, thus decreasing the installation time or downtime.

Alternate Disk Installation is used for:
• Installation of a mksysb image on another disk, for example a mksysb from a machine with similar hardware that has already been upgraded.
• Cloning an existing rootvg on another disk. Optionally, the cloning offers the possibility of installing updates and new file sets to the cloned rootvg.

The command used for Alternate Disk Installation is alt_disk_install.

The command creates an altinst_rootvg volume group on the destination disk and prepares the same logical volumes as in the rootvg, except that the names are prefixed with alt_ (for example, alt_hd1). Similarly, the file systems are renamed to /alt_inst/filesystemname, and the original data (mksysb or rootvg) is copied.

After this first phase, a second phase begins where an optional configuration action can be performed.

The third phase unmounts the /alt_inst/ file systems and renames the file systems and logical volumes by removing the alt_ prefix. When this is done, the altinst_rootvg is varied off, and the bootlist is altered to boot from the new disk.

After the system is rebooted, the original rootvg is renamed to old_rootvg.

The Alternate Disk Installation requires the file sets bos.alt_disk_install.boot_images and bos.alt_disk_install.rte to be installed.
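You can check that both file sets are present before starting, for example:

# lslpp -l bos.alt_disk_install.rte bos.alt_disk_install.boot_images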

The following example shows the use of the alt_disk_install command, cloning the running rootvg on hdisk0 to the unused hdisk1:

# lspv
hdisk0 0001fd4f703db420 rootvg active
hdisk1 0001fd4f72df8da7 None
# bootinfo -b (device from which the system booted last time)
hdisk0

# bootlist -m normal -o (boot list for normal mode)
hdisk0


# alt_disk_install -C hdisk1
+-----------------------------------------------------------------------------+
ATTENTION: calling new module /usr/sbin/alt_disk_copy. Please see the alt_disk_copy man page and documentation for more details.
Executing command: {/usr/sbin/alt_disk_copy -d "hdisk1"}
+-----------------------------------------------------------------------------+
Calling mkszfile to create new /image.data file.
Checking disk sizes.
Creating cloned rootvg volume group and associated logical volumes.
Creating logical volume alt_hd5
Creating logical volume alt_hd6
Creating logical volume alt_hd8
Creating logical volume alt_hd4
Creating logical volume alt_hd2
Creating logical volume alt_hd9var
Creating logical volume alt_hd3
Creating logical volume alt_hd1
Creating logical volume alt_hd10opt
Creating /alt_inst/ file system.
Creating /alt_inst/home file system.
Creating /alt_inst/opt file system.
Creating /alt_inst/tmp file system.
Creating /alt_inst/usr file system.
Creating /alt_inst/var file system.
Generating a list of files
for backup and restore into the alternate file system...
Backing-up the rootvg files and restoring them to the
alternate file system...
Modifying ODM on cloned disk.
Building boot image on cloned disk.
forced unmount of /alt_inst/var
forced unmount of /alt_inst/usr
forced unmount of /alt_inst/tmp
forced unmount of /alt_inst/opt
forced unmount of /alt_inst/home
forced unmount of /alt_inst
forced unmount of /alt_inst
Changing logical volume names in volume group descriptor area.
Fixing LV control blocks...
Fixing file system superblocks...
Bootlist is set to the boot disk: hdisk1

# lspv (Change in lspv after alt_disk_install command)
hdisk0 0001fd4f703db420 rootvg active
hdisk1 0001fd4f72df8da7 altinst_rootvg

# bootlist -m normal -o (new boot list after the alt_disk_install command)
hdisk1

The next time the system boots, it will boot from hdisk1, and the original rootvg will be renamed old_rootvg.

# lspv
hdisk0 0001fd4f703db420 old_rootvg
hdisk1 0001fd4f72df8da7 rootvg active

Now you can do various tests on hdisk1 (the new rootvg). For example, upgrade the maintenance level or other software and check whether the application and system still run correctly.
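If the tests show a problem and you want the next reboot to come up from the original disk (now old_rootvg) again, you can simply point the boot list back at it, using the disk names from this example:

# bootlist -m normal hdisk0
# bootlist -m normal -o     (verify the change)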


If you want to build another server (Server2) with similar hardware and the same rootvg as an existing server (Server1), follow these steps on Server1:
• exportvg altinst_rootvg
• rmdev -Rdl hdisk1
• Remove hdisk1 from Server1 and install it in Server2 as hdisk0.
• Boot Server2 from hdisk0.
• Change the important parameters such as the IP address, hostname, and so on.

Boot process - AIX

During the boot process, the system tests the hardware, loads and runs the
operating system, and configures devices. To boot the operating system, the
following resources are required:
• A boot image that can be loaded after the machine is turned on or reset.
• Access to the root and /usr file systems.
There are three types of system boots:
• Hard Disk Boot
A machine is started for normal operations with the key in the normal position.
On PCI-based systems with no key locking, this is the default startup mode.
• Diskless Network Boot
A diskless or dataless workstation is started remotely over a network. A
machine is started for normal operations with the key in the normal position.
One or more remote file servers provide the files and programs that diskless
or dataless workstations need to boot.
• Service Boot
A machine is started from a hard disk, network, tape, or CD-ROM with the key
set in the service position. This condition is also called maintenance mode. In
maintenance mode, a system administrator can perform tasks, such as
installing new or updated software and running diagnostic checks.
During a hard disk boot, the boot image is found on a local disk created when the
operating system was installed. During the boot process, the system configures
all devices found in the machine and initializes other basic software required for
the system to operate (such as the Logical Volume Manager). At the end of this
process, the file systems are mounted and ready for use.
The same general requirements apply to diskless network clients. They also
require a boot image and access to the operating system file tree. Diskless
network clients have no local file systems and get all their information by way of
remote access.
The system finds all necessary information for the boot process on its disk drive.
When the system is started by turning on the power switch (a cold boot) or
restarted with the reboot or shutdown commands (a warm boot), a number of
events must occur before the system is ready for use. These events can be
divided into the following phases:
1. Read Only Storage (ROS) Kernel Init Phase
During this phase, problems with the motherboard are checked, and the ROS
initial program load searches for the bootlist. Once the bootlist is found, the
boot image is read into memory and system initialization starts.
2. Base Device Configuration Phase
All devices are configured in this phase, with the help of the cfgmgr command.
3. System Boot Phase
In this phase of the boot process, all the logical volumes are varied on, paging
is started, and the /etc/inittab file is processed.
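If the boot does not behave as expected, the messages produced during these phases are kept in the boot log, which can be reviewed afterwards with the alog command, for example:

# alog -o -t boot | more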

Disk striping

In computers that use multiple hard disk systems, disk striping is the process of dividing a body of data into blocks and spreading the data blocks across several partitions on several hard disks. Each stripe is the size of the smallest partition. For example, if three partitions are selected with one partition equaling 150 MB, another 100 MB, and the third 50 MB, each stripe will be 50 MB in size. It is wise to create the partitions equal in size to prevent wasting disk space. Each stripe created is part of the stripe set. Disk striping is used with redundant array of independent disks (RAID). RAID is a storage system that uses multiple disks to store and distribute data. Up to 32 hard disks can be used with disk striping.
There are two types of disk striping: single user and multi-user. Single user disk striping allows multiple hard disks to simultaneously service multiple I/O requests from a single workstation. Multi-user disk striping allows multiple I/O requests from several workstations to be sent to multiple hard disks. This means that while one hard disk is servicing a request from a workstation, another hard disk is handling a separate request from a different workstation.
Disk striping is used with or without parity. When disk striping is used with parity, an additional stripe that contains the parity information is stored on its own partition and hard disk. If a hard disk fails, a fault tolerance driver makes the lost partition invisible allowing reading and writing operations to continue which provides time to create a new stripe set. Once a hard disk fails, the stripe set is no longer fault tolerant, which means that if one or more hard disks fail after the first one, the stripe set is lost. Disk striping without parity provides no fault tolerance. The disk striping process is used in conjunction with software that lets the user know when a disk has failed. This software also allows the user to define the size of the stripes, the color assigned to the stripe set for recognition and diagnosing, and whether parity was used or not.
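On AIX, striping is requested when the logical volume is created. A minimal sketch, assuming a volume group named datavg that contains three disks and using a 64 KB strip size (all names and sizes here are only examples):

# mklv -y striped_lv -S 64K datavg 12 hdisk1 hdisk2 hdisk3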

RAID

RAID (redundant array of independent disks; originally redundant array of inexpensive disks) is a way of storing the same data in different places (thus, redundantly) on multiple hard disks. By placing data on multiple disks, I/O (input/output) operations can overlap in a balanced way, improving performance. Since multiple disks increases the mean time between failures (MTBF), storing data redundantly also increases fault tolerance.
A RAID appears to the operating system to be a single logical hard disk. RAID employs the technique of disk striping, which involves partitioning each drive's storage space into units ranging from a sector (512 bytes) up to several megabytes. The stripes of all the disks are interleaved and addressed in order.
In a single-user system where large records, such as medical or other scientific images, are stored, the stripes are typically set up to be small (perhaps 512 bytes) so that a single record spans all disks and can be accessed quickly by reading all disks at the same time.
In a multi-user system, better performance requires establishing a stripe wide enough to hold the typical or maximum size record. This allows overlapped disk I/O across drives.
There are at least nine types of RAID plus a non-redundant array (RAID-0):
• RAID-0: This technique has striping but no redundancy of data. It offers the best performance but no fault-tolerance.
• RAID-1: This type is also known as disk mirroring and consists of at least two drives that duplicate the storage of data. There is no striping. Read performance is improved since either disk can be read at the same time. Write performance is the same as for single disk storage. RAID-1 provides the best performance and the best fault-tolerance in a multi-user system.
• RAID-2: This type uses striping across disks with some disks storing error checking and correcting (ECC) information. It has no advantage over RAID-3.
• RAID-3: This type uses striping and dedicates one drive to storing parity information. The embedded error checking (ECC) information is used to detect errors. Data recovery is accomplished by calculating the exclusive OR (XOR) of the information recorded on the other drives (a worked example follows this list). Since an I/O operation addresses all drives at the same time, RAID-3 cannot overlap I/O. For this reason, RAID-3 is best for single-user systems with long record applications.
• RAID-4: This type uses large stripes, which means you can read records from any single drive. This allows you to take advantage of overlapped I/O for read operations. Since all write operations have to update the parity drive, no I/O overlapping is possible. RAID-4 offers no advantage over RAID-5.
• RAID-5: This type includes a rotating parity array, thus addressing the write limitation in RAID-4. All read and write operations can therefore be overlapped. RAID-5 stores parity information but not redundant data (but parity information can be used to reconstruct data). RAID-5 requires at least three and usually five disks for the array. It's best for multi-user systems in which performance is not critical or which do few write operations.
• RAID-6: This type is similar to RAID-5 but includes a second parity scheme that is distributed across different drives and thus offers extremely high fault- and drive-failure tolerance.
• RAID-7: This type includes a real-time embedded operating system as a controller, caching via a high-speed bus, and other characteristics of a stand-alone computer. One vendor offers this system.
• RAID-10: Combining RAID-0 and RAID-1 is often referred to as RAID-10, which offers higher performance than RAID-1 but at much higher cost. There are two subtypes: In RAID-0+1, data is organized as stripes across multiple disks, and then the striped disk sets are mirrored. In RAID-1+0, the data is mirrored and the mirrors are striped.
• RAID-50 (or RAID-5+0): This type consists of a series of RAID-5 groups that are striped in RAID-0 fashion to improve RAID-5 performance without reducing data protection.
• RAID-53 (or RAID-5+3): This type uses striping (in RAID-0 style) for RAID-3's virtual disk blocks. This offers higher performance than RAID-3 but at much higher cost.
• RAID-S (also known as Parity RAID): This is an alternate, proprietary method for striped parity RAID from EMC Symmetrix that is no longer in use on current equipment. It appears to be similar to RAID-5, with some performance enhancements as well as the enhancements that come from having a high-speed disk cache on the disk array.
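The parity used by RAID levels 3 through 6 is a simple XOR across the data stripes. As a worked illustration only (plain shell arithmetic, not tied to any particular RAID implementation), three data bytes produce one parity byte, and any one lost byte can be rebuilt from the parity byte and the surviving data:
# d1=165 d2=60 d3=15
# parity=$(( d1 ^ d2 ^ d3 ))
# echo $parity
150
# echo $(( parity ^ d1 ^ d3 ))
60
The last command recovers the value of d2 (60) using only the parity byte and the two surviving data bytes, which is what a RAID controller does, block by block, when a member disk fails.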

Wednesday, June 11, 2008

File system verification and recovery

The fsck command checks and interactively repairs inconsistent file systems.
You should run this command before mounting any file system. You must be able
to read the device file on which the file system resides (for example, the /dev/hd0
device).
Normally, the file system is consistent, and the fsck command merely reports on
the number of files, used blocks, and free blocks in the file system. If the file
system is inconsistent, the fsck command displays information about the
inconsistencies found and prompts you for permission to repair them. If the file
system cannot be repaired, restore it from backup.
Mounting an inconsistent file system may result in a system crash. If you do not
specify a file system with the FileSystem parameter, the fsck command will
check all the file systems with the attribute check=TRUE in /etc/filesystems.
Note: By default, the /, /usr, /var, and /tmp file systems have the check
attribute set to false (check=false) in their /etc/filesystems stanzas. The
attribute is set to false for the following reasons:
• The boot process explicitly runs the fsck command on the /, /usr, /var, and
/tmp file systems.
• The /, /usr, /var, and /tmp file systems are mounted when the /etc/rc file is
run. The fsck command will not modify a mounted file system, and fsck
results on mounted file systems are unpredictable.
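As an illustration (the file system and device names are examples only, reused from later in this chapter), a typical sequence is to unmount the file system, check it, and optionally run a non-interactive check of every file system flagged check=TRUE:
# umount /u/testfs
# fsck /dev/lv02
# fsck -p
The -p flag repairs minor problems without prompting; interactive prompting, as described above, is the default.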
Fixing a bad superblock
If you receive one of the following errors from the fsck or mount commands, the
problem may be a corrupted superblock, as shown in the following example:
fsck: Not an AIX3 file system
fsck: Not an AIXV3 file system
fsck: Not an AIX4 file system
fsck: Not an AIXV4 file system
fsck: Not a recognized file system type
mount: invalid argument
The problem can be resolved by restoring the backup of the superblock over the
primary superblock using the following command (care should be taken to check
with the latest product documentation before running this command):
# dd count=1 bs=4k skip=31 seek=1 if=/dev/lv00 of=/dev/lv00
The following is an example of when the superblock is corrupted and copying the
backup helps solve the problem:
# mount /u/testfs
mount: 0506-324 Cannot mount /dev/lv02 on /u/testfs: A system call received a
parameter that is not valid.
# fsck /dev/lv02
Not a recognized filesystem type. (TERMINATED)
# dd count=1 bs=4k skip=31 seek=1 if=/dev/lv02 of=/dev/lv02
1+0 records in.
1+0 records out.
# fsck /dev/lv02
** Checking /dev/lv02 (/u/tes)
** Phase 0 - Check Log
log redo processing for /dev/lv02
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Inode Map
** Phase 6 - Check Block Map
8 files 2136 blocks 63400 free
Once the restoration process is complete, check the integrity of the file system by
issuing the fsck command:
# fsck /dev/lv00
In many cases, restoration of the backup of the superblock to the primary
superblock will recover the file system. If this does not resolve the problem,
recreate the file system and restore the data from a backup.
7.4.4 Sparse file allocation
Some applications, particularly databases, maintain data in sparse files. Files
that do not have disk blocks allocated for each logical block are called sparse
files. If the file offsets are greater than 4 MB, then a large disk block of 128 KB is
allocated. Applications using sparse files larger than 4 MB may require more disk
blocks in a file system enabled for large files than in a regular file system.
In the case of sparse files, the output of the ls command does not show the
actual disk allocation; it reports the file's logical size, that is, the offset of the
last byte written to the file, as shown in the following example:
# ls -l /tmp/sparsefile
-rw-r--r-- 1 root system 100000000 Jul 16 20:57 /tmp/sparsefile
The du command can be used to see the actual allocation, since it reports the
blocks actually allocated and in use by the file. Use du -rs to report the number
of allocated blocks on disk.
# du -rs /tmp/sparsefile
256 /tmp/sparsefile
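A sparse file such as the one shown above can be created for testing with the dd command by seeking far past the only byte that is actually written; the file name and offset here are illustrative:
# dd if=/dev/zero of=/tmp/sparsefile bs=1 count=1 seek=99999999
1+0 records in.
1+0 records out.
Running ls -l and du -rs against the resulting file shows the same kind of mismatch as in the example above: a large reported size, but only a few blocks actually allocated.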
When backing up sparse files, note that the tar command does not preserve the
sparse nature of any file that is sparsely allocated: any file that was originally
sparse before the restoration will have all space allocated within the file system
for the size of the file. Using the dd command in combination with your own
backup script can work around this, and the new AIX 5L options for the backup
and restore commands are also useful for sparse files.
7.4.5 Unmount problems
A file system cannot be unmounted if any references are still active within that file
system. The following error message will be displayed:
Device busy
or
A device is already mounted or cannot be unmounted
The following situations can leave open references to a mounted file system.
• Files are open within a file system. These files must be closed before the file
system can be unmounted. The fuser command is often the best way to
determine what is still active in the file system. The fuser command will return
the process IDs for all processes that have open references within a specified
file system, as shown in the following example:
# umount /home
umount: 0506-349 Cannot unmount /dev/hd1: The requested resource is busy.
# fuser -x -c /home
/home: 11630
# ps -fp 11630
UID PID PPID C STIME TTY TIME CMD
guest 11630 14992 0 16:44:51 pts/1 0:00 -sh
# kill -1 11630
# umount /home
The process having an open reference can be killed by using the kill
command (sending a SIGHUP), and the unmount can be accomplished. A
stronger signal may be required, such as SIGKILL.
• If the file system is still busy and still cannot be unmounted, this could be due
to a kernel extension that is loaded but exists within the source file system.
The fuser command will not show these kinds of references, since a user
process is not involved. However, the genkex command will report on all
loaded kernel extensions (see the example after this list).
• File systems are still mounted within the file system. Unmount these file
systems before the file system can be unmounted. If any file system is
mounted within a file system, this leaves open references in the source file
system at the mount point of the other file system. Use the mount command to
get a list of mounted file systems, and unmount all those that are mounted
within the file system to be unmounted.
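Assuming /home is the file system that will not unmount, the last two causes can be checked quickly as follows; the grep patterns are illustrative, and the genkex output format varies by AIX level:
# mount | grep /home
# genkex | grep /home
If mount shows other file systems mounted below /home, unmount them first; if genkex shows a kernel extension loaded from /home, the file system cannot be unmounted until that extension is unloaded.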
7.4.6 Removing file systems
When removing a JFS, the file system must be unmounted before it can be
removed. The command for removing file systems is rmfs.
In the case of a JFS, the rmfs command removes both the logical volume on
which the file system resides and the associated stanza in the /etc/filesystems
file. If the file system is not a JFS, the command removes only the associated
stanza in the /etc/filesystems file, as shown in the following example:
# lsvg -l testvg
testvg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
loglv00 jfslog 1 1 1 open/syncd N/A
lv02 jfs 2 2 1 open/syncd /u/testfs
# rmfs /u/testfs
rmfs: 0506-921 /u/testfs is currently mounted.
# umount /u/testfs
# rmfs /u/testfs
rmlv: Logical volume lv02 is removed.
# lsvg -l testvg
testvg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
loglv00 jfslog 1 1 1 closed/syncd N/A
This example shows how the file system testfs is removed. The first attempt fails
because the file system is still mounted. The associated logical volume lv02 is
also removed. The jfslog remains defined on the volume group.
7.4.7 Different output from du and df commands
Sometimes du and df commands are used to get a free block value. df is used to
report the total block count, and then the value returned by du -s
/filesystem_name is subtracted from that total to calculate the free block value.
However, this method of calculation yields a value that is greater than the free
block value reported by df. In AIX Version 4.1 and later, both df and du default to
512-byte units. Sample output from the du and df commands is shown below:
# du -s /tmp
152 /tmp
# df /tmp
Filesystem 512-blocks Free %Used Iused %Iused Mounted on
/dev/hd3 24576 23320 6% 33 1% /tmp
Here, (total from df) - (used from du) gives a (false) free block count: 24576 - 152 =
24424.
24424 is greater than 23320. The reason for this discrepancy involves the
implementation of du and df. du -s traverses the file tree, adding up the number
of blocks allocated to each directory, symlink, and file as reported by the stat()
system call. This is how du arrives at its total value. df looks at the file system
disk block allocation maps to arrive at its total and free values.
7.4.8 Enhanced journaled file system
The enhanced journaled file system (JFS2) contains several architectural
differences over the standard JFS, including:
• Variable number of i-nodes for enhanced journaled file system
JFS2 allocates i-nodes as needed. Therefore, the number of i-nodes
available is limited by the size of the file system itself.
• Specifying file system block size
File system block size is specified during the file system's creation with the
crfs and mkfs commands or by using SMIT (see the example following this
list). The decision of file system block size should be based on the projected
size of files contained by the file system.
• Identifying file system block size
The file system block size value can be identified with the lsfs command or
the System Management Interface Tool (SMIT). For application programs,
the statfs subroutine can be used to identify the file system block size.
• Compatibility and migration
The enhanced journaled file system (JFS2) is a new file system and is not
compatible with AIX Version 4.
• Device driver limitations
A device driver must provide disk block addressability that is the same or
smaller than the file system block size.
• Performance costs
Although file systems that use block sizes smaller than 4096 bytes as their
allocation unit might require substantially less disk space than those using the
default allocation unit of 4096 bytes, the use of smaller block sizes can incur
performance degradation.
• Increased allocation activity
Because disk space is allocated in smaller units for a file system with a block
size other than 4096 bytes, allocation activity can occur more often when files
or directories are repeatedly extended in size. For example, a write operation
that extends the size of a zero-length file by 512 bytes results in the allocation
of one block to the file, assuming a block size of 512 bytes. If the file size is
extended further by another write of 512 bytes, an additional block must be
allocated to the file. Applying this example to a file system with 4096-byte
blocks, disk space allocation occurs only once, as part of the first write
operation. No additional allocation activity is performed as part of the second
write operation since the initial 4096-byte block allocation is large enough to
hold the data added by the second write operation.
• Increased block allocation map size
More virtual memory and file system disk space might be required to hold
block allocation maps for file systems with a block size smaller than 4096
bytes. Blocks serve as the basic unit of disk space allocation, and the
allocation state of each block within a file system is recorded in the file system
block allocation map.
• Understanding enhanced journaled file system size limitations
The maximum size for an enhanced journaled file system is architecturally
limited to 4 Petabytes. I-nodes are dynamically allocated by JFS2, so you do
not need to consider how many i-nodes you may need when creating a JFS2
file system. You need to consider the size of the file system log.
• Enhanced journaled file system log size issues
In most instances, multiple journaled file systems use a common log
configured to be 4 MB in size. When file systems exceed 2 GB or when the
total amount of file system space using a single log exceeds 2 GB, the default
log size might not be sufficient. In either case, scale log sizes upward as the
file system size increases. The JFS log is limited to a maximum size of 256
MB.
• JFS2 file space allocation
File space allocation is the method by which data is apportioned physical
storage space in the operating system. The kernel allocates disk space to a
file or directory in the form of logical blocks. A logical block refers to the
division of a file or directory contents into 512, 1024, 2048, or 4096 byte units.
When a JFS2 file system is created the logical block size is specified to be
one of 512, 1024, 2048, or 4096 bytes. Logical blocks are not tangible
entities; however, the data in a logical block consumes physical storage
space on the disk. Each file or directory consists of zero or more logical
blocks.
• Full and partial logical blocks
A file or directory may contain full or partial logical blocks. A full logical block
contains 512, 1024, 2048, or 4096 bytes of data, depending on the file system
block size specified when the JFS2 file system was created. Partial logical
blocks occur when the last logical block of a file or directory contains less than
the file system block size of data.
For example, a JFS2 file system with a logical block size of 4096 with a file of
8192 bytes is two logical blocks. The first 4096 bytes reside in the first logical
block and the following 4096 bytes reside in the second logical block.
Likewise, a file of 4608 bytes consists of two logical blocks. However, the last
logical block is a partial logical block containing the last 512 bytes of the file's
data. Only the last logical block of a file can be a partial logical block.
• JFS2 file space allocation
The default block size is 4096 bytes. You can specify smaller block sizes with
the mkfs command during a file system's creation. Allowable block sizes are
512, 1024, 2048, and 4096 bytes. Only one block size can be used in a
file system.
The kernel allocates disk space so that only the last file system block of data
receives a partial block allocation. As the partial block grows beyond the limits
of its current allocation, additional blocks are allocated.
Block reallocation also occurs if data is added to logical blocks that represent
file holes. A file hole is an "empty" logical block located prior to the last logical
block that stores data. (File holes do not occur within directories.) These
empty logical blocks are not allocated blocks. However, as data is added to
file holes, allocation occurs. Each logical block that was not previously
allocated disk space is allocated a file system block of space.
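To tie the block size points above together, a JFS2 file system with a 512-byte block size could be created and then verified roughly as follows; the volume group, size (in 512-byte blocks), and mount point are assumptions, and agblksize is the attribute that selects the block size at creation time:
# crfs -v jfs2 -g testvg -a size=32768 -a agblksize=512 -m /u/small
# lsfs -q /u/small
The lsfs -q output includes the file system block size, which cannot be changed after the file system has been created.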

Increasing the file system size

In many instances, the size of a file system needs to be increased because the
demand for storage has increased. In AIX, this is a common procedure and can
be done with the chfs command, as in the following example:
# chfs -a size=+300000 /u/testfs
Filesystem size changed to 458752
This example shows how the file system testfs is extended with 300000 512-byte
blocks. When the file system is extended, the logical volume holding the JFS is
also extended, with the number of logical partitions that is needed to fulfill the
space request. If the system does not have enough free space, the volume group
can either be extended with an additional physical volume, or the size specified
for the chfs command must be lowered so that it matches the number of free
LPs.
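Before extending, it is worth comparing the requested growth with the free partitions left in the volume group; a quick check along these lines (volume group and file system names as used above) might be:
# lsvg testvg | grep "FREE PPs"
# df -k /u/testfs
# chfs -a size=+300000 /u/testfs
If the free PPs multiplied by the PP size cannot cover the requested increase, chfs will fail until the volume group is extended or a smaller size is requested.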

Extending the number of max physical partitions

When adding a new disk to a volume group, you may encounter an error because
the new disk requires more physical partitions than the volume group's current
maximum number of PPs per physical volume allows. This typically occurs when
the new disk has a much higher capacity than the existing disks in the volume
group.
This situation is typical on older installations, due to the rapid growth of storage
technology. To overcome this, a change of the volume group LVM metadata is
required.
The chvg command is used for this operation using the -t flag and applying a
factor value, as shown in the following example:
# lsvg testvg
VOLUME GROUP: testvg VG IDENTIFIER: 000bc6fd5a177ed0
VG STATE: active PP SIZE: 16 megabyte(s)
VG PERMISSION: read/write TOTAL PPs: 542 (8672 megabytes)
MAX LVs: 256 FREE PPs: 42 (672 megabytes)
LVs: 1 USED PPs: 500 (8000 megabytes)
OPEN LVs: 0 QUORUM: 2
TOTAL PVs: 1 VG DESCRIPTORS: 2
STALE PVs: 0 STALE PPs: 0
ACTIVE PVs: 1 AUTO ON: yes
MAX PPs per PV: 1016 MAX PVs: 32
# chvg -t 2 testvg
0516-1193 chvg: WARNING, once this operation is completed, volume group testvg
cannot be imported into AIX 430 or lower versions. Continue (y/n) ?
y
0516-1164 chvg: Volume group testvg changed. With given characteristics testvg
can include upto 16 physical volumes with 2032 physical partitions
each.
This example shows that the volume group testvg with a current 9.1 GB disk has
a maximum number of 1016 PPs per physical volume. Adding a larger 18.2 GB
disk would not be possible; the maximum size of the disk is limited to 17 GB
unless the maximum number of PPs is increased. Using the chvg command to
increase the maximum number of PPs by a factor of 2 to 2032 PPs allows the
volume group to be extended with physical volumes of up to approximately 34
GB.
7.3 Disk replacement
AIX, like all operating systems, can be problematic when you have to change a
disk. AIX provides the ability to prepare the system for the change using the
LVM. You can then perform the disk replacement and then use the LVM to
restore the system back to how it was before the disk was changed. This process
manipulates not only the data on the disk itself, but is also a way of keeping the
Object Data Manager (ODM) intact.
The ODM within AIX is a database that holds device configuration details and
AIX configuration details. The function of the ODM is to store the information
between reboots, and also provide rapid access to system data, eliminating the
need for AIX commands to interrogate components for configuration information.
Since this database holds so much vital information regarding the configuration
of a machine, any changes made to the machine, such as the changing of a
defective disk, need to be done in such a way as to preserve the integrity of the
database.
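For example, the ODM entries that a disk replacement must keep consistent can be inspected with the odmget command; hdisk4 here is simply the disk used later in this scenario:
# odmget -q "name=hdisk4" CuDv
# odmget -q "name=hdisk4 and attribute=pvid" CuAt
CuDv holds the customized device entry for the disk, and CuAt holds its attributes, including the physical volume identifier that the LVM records in the VGDA.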
7.3.1 Replacing a disk
The following scenario shows a system that has a hardware error on a physical
volume. However, since the system uses a mirrored environment, which has
multiple copies of the logical volume, it is possible to replace the disk while the
system is active. The disk hardware in this scenario consists of hot-swappable
SCSI disks, which permit the replacement of a disk in a production environment.
One important factor is detecting the disk error. Normally, mail is sent to the
system administrator (root account) from the Automatic Error Log Analysis
(diagela). Figure 7-1 shows the information in such a diagnostics mail.
Figure 7-1 Disk problem mail from Automatic Error Log Analysis (diagela)
Automatic Error Log Analysis (diagela) provides the capability to do error log
analysis whenever a permanent hardware error is logged. Whenever a
permanent hardware resource error is logged, the diagela program is invoked.
Automatic Error Log Analysis is enabled by default on all platforms.
The diagela message shows that hdisk4 has a problem. Another way of
locating a problem is to check the state of the logical volume using the lsvg
command, as in the following example:
# lsvg -l mirrorvg
mirrorvg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
lvdb01 jfs 500 1000 2 open/syncd /u/db01
lvdb02 jfs 500 1000 2 open/stale /u/db02
loglv00 jfslog 1 1 1 open/syncd N/A
The logical volume lvdb02 in the volume group mirrorvg is marked with the status
stale, indicating that the copies in this LV are not synchronized. Look at the error
log using the error-reporting errpt command, as in the following example:
# errpt
EAA3D429 0713121400 U S LVDD PHYSICAL PARTITION MARKED STALE
F7DDA124 0713121400 U H LVDD PHYSICAL VOLUME DECLARED MISSING
41BF2110 0713121400 U H LVDD MIRROR WRITE CACHE WRITE FAILED
35BFC499 0713121400 P H hdisk4 DISK OPERATION ERROR
This error information displays the reason why the LV lvdb02 is marked stale:
hdisk4 had a DISK OPERATION ERROR, and the LVDD could not write the
mirror write cache.
Based on the information in the example, hdisk4 needs to be replaced. Before
taking any action on the physical disks of the mirrored LV, it is recommended
that you make a file system backup in case anything goes wrong. Since the
other disk of the mirrored LV is still functional, all the data should be present. If
the LV contains a database, then the respective database tools should be used
to back up the data.
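A simple file system backup at this point could be an i-node based backup to tape; the tape device and backup level here are assumptions:
# backup -0 -u -f /dev/rmt0 /u/db02
For a consistent image, the file system should be quiet (or unmounted) while the backup runs; for databases, the vendor's own backup tools are usually preferable, as noted above.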
Removing a bad disk
If the system is a high-availability (24x7) system, you might decide to keep the
system running while performing the disk replacement, provided that the
hardware supports an online disk exchange with hot-swappable disks. However,
the procedure should be agreed upon by the system administrator or customer
before continuing. Use the following steps to remove a disk:
1. To remove the physical partition copy of the mirrored logical volume from the
erroneous disk, use the rmlvcopy command as follows:
# rmlvcopy lvdb02 1 hdisk4
The logical volume lvdb02 is now left with only one copy, as shown in the
following:
# lslv -l lvdb02
lvdb02:/u/db02
PV COPIES IN BAND DISTRIBUTION
hdisk3 500:000:000 21% 109:108:108:108:067
2. Reduce the volume group by removing the disk you want to replace from its
volume group:
# reducevg -f mirrorvg hdisk4
# lsvg -l mirrorvg
mirrorvg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
lvdb01 jfs 500 1000 2 open/syncd /u/db01
lvdb02 jfs 500 500 1 open/syncd /u/db02
loglv00 jfslog 1 1 1 open/syncd N/A
3. Remove the disk as a device from the system and from the ODM database
with the rmdev command:
# rmdev -d -l hdisk4
hdisk4 deleted
This command is valid for any SCSI disk. If your system is using SSA, then an
additional step is required. Since SSA disks also define the device pdisk, the
corresponding pdisk device must be deleted as well. Use the SSA menus in
SMIT to display the mapping between hdisk and pdisk. These menus can
also be used to delete the pdisk device.
4. The disk can now be safely removed from your system.
Adding a new disk
Continuing the scenario from the previous section, this section describes how to
add a new disk into a running environment. After hdisk4 has been removed, the
system is now left with the following disks:
# lsdev -Cc disk
hdisk0 Available 30-58-00-8,0 16 Bit SCSI Disk Drive
hdisk1 Available 30-58-00-9,0 16 Bit SCSI Disk Drive
hdisk2 Available 10-60-00-8,0 16 Bit SCSI Disk Drive
hdisk3 Available 10-60-00-9,0 16 Bit SCSI Disk Drive
Use the following steps to add a new disk:
1. Plug in the new disk and run the configuration manager cfgmgr command.
The cfgmgr command configures devices controlled by the Configuration
Rules object class, which is part of the device configuration database. The
cfgmgr command will see the newly inserted SCSI disk and create the
corresponding device. Although the command requires no option, the -v flag
specifies verbose output, which helps in troubleshooting, as shown in the
following:
# cfgmgr -v
cfgmgr is running in phase 2
----------------
Time: 0 LEDS: 0x538
Invoking top level program -- "/etc/methods/cfgprobe -c
/etc/drivers/coreprobe.ext"
Time: 0 LEDS: 0x539
Return code = 0
*** no stdout ****
*** no stderr ****
----------------
Time: 0 LEDS: 0x538
Invoking top level program -- "/etc/methods/defsys"
Time: 0 LEDS: 0x539
Return code = 0
***** stdout *****
sys0
.....
.....
The result is a new hdisk4 added to the system:
# lsdev -Cc disk
hdisk0 Available 30-58-00-8,0 16 Bit SCSI Disk Drive
hdisk1 Available 30-58-00-9,0 16 Bit SCSI Disk Drive
hdisk2 Available 10-60-00-8,0 16 Bit SCSI Disk Drive
hdisk3 Available 10-60-00-9,0 16 Bit SCSI Disk Drive
hdisk4 Available 10-60-00-12,0 16 Bit SCSI Disk Drive
2. The new hdisk must now be assigned to the volume group mirrorvg by using
the LVM extendvg command:
# extendvg mirrorvg hdisk4
3. To re-establish the mirror copy of the LV, use the mklvcopy command.
# mklvcopy lvdb02 2 hdisk4
The number of copies of the LV is now two, but the LV state is still marked as
stale because the LV copies are not synchronized with each other:
# lsvg -l mirrorvg
mirrorvg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
lvdb01 jfs 500 1000 2 open/syncd /u/db01
lvdb02 jfs 500 1000 2 open/stale /u/db02
loglv00 jfslog 1 1 1 open/syncd N/A
4. To get a fully synchronized set of copies of the LV lvdb02, use the syncvg
command:
# syncvg -p hdisk4
The syncvg command can be used with logical volumes, physical volumes, or
volume groups. The synchronization process can be quite time consuming,
depending on the hardware characteristics and the amount of data.
After the synchronization is finished, verify the logical volume state using
either the lsvg or lslv command:
# lsvg -l mirrorvg
mirrorvg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
lvdb01 jfs 500 1000 2 open/syncd /u/db01
lvdb02 jfs 500 1000 2 open/syncd /u/db02
loglv00 jfslog 1 1 1 open/syncd N/A
The system is now back to normal.
7.3.2 Recovering an incorrectly removed disk
If a disk was incorrectly removed from the system, and the system has been
rebooted, the synclvodm command will need to be run to rebuild the logical
volume control block, as shown in the following examples.
In the examples, a disk has been incorrectly removed from the system and the
logical volume control block needs to be rebuilt.
The disks in the system before the physical volume was removed are shown in
the following command output:
# lsdev -Cc disk
hdisk0 Available 30-58-00-8,0 16 Bit SCSI Disk Drive
hdisk1 Available 30-58-00-9,0 16 Bit SCSI Disk Drive
hdisk2 Available 10-60-00-8,0 16 Bit SCSI Disk Drive
hdisk3 Available 10-60-00-9,0 16 Bit SCSI Disk Drive
The allocation of the physical volumes before the disk was removed is shown
as follows:
# lspv
hdisk0 000bc6fdc3dc07a7 rootvg
hdisk1 000bc6fdbff75ee2 volg01
hdisk2 000bc6fdbff92812 volg01
hdisk3 000bc6fdbff972f4 volg01
The logical volumes on the volume group:
# lsvg -l volg01
volg01:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
logvol01 jfs 1000 1000 2 open/syncd /userfs01
loglv00 jfslog 1 1 1 open/syncd N/A
The logical volume distribution on the physical volumes is shown using the lslv
command:
# lslv -l logvol01
logvol01:/userfs01
PV COPIES IN BAND DISTRIBUTION
hdisk1 542:000:000 19% 109:108:108:108:109
hdisk3 458:000:000 23% 109:108:108:108:025
The system after a reboot has the following physical volumes:
# lspv
hdisk0 000bc6fdc3dc07a7 rootvg
hdisk1 000bc6fdbff75ee2 volg01
hdisk3 000bc6fdbff972f4 volg01
When trying to mount the file system on the logical volume, the error may look
similar to the following example:
# mount /userfs01
mount: 0506-324 Cannot mount /dev/logvol01 on /userfs01: There is an input or
output error.
To synchronize the logical volume, the following command should be run:
# synclvodm -v volg01
synclvodm: Physical volume data updated.
synclvodm: Logical volume logvol01 updated.
synclvodm: Warning, lv control block of loglv00 has been over written.
0516-622 synclvodm: Warning, cannot write lv control block data.
synclvodm: Logical volume loglv00 updated.
The system can now be repaired. If the file system data was spread across all
the disks, including the failed disk, it may need to be restored from the last
backup.
7.4 The AIX JFS
Similar to the LVM, most JFS problems can be traced to problems with the
underlying physical disk.
As with volume groups, various JFS features have been added at different levels
of AIX, which preclude those file systems being mounted if the volume group was
imported on an earlier version of AIX. Such features include large file enabled file
systems, file systems with non-default allocation group size, and JFS2.
7.4.1 Creating a JFS
In a journaled file system (JFS), files are stored in blocks of contiguous bytes.
The default block size, also referred to as fragmentation size in AIX, is 4096
bytes (4 KB). The JFS i-node contains an information structure for the file with an
array of eight pointers to data blocks. A file that is less than 32 KB is referenced
directly from the i-node.
A larger file uses a 4-KB block, referred to as an indirect block, for the addressing
of up to 1024 data blocks. Using an indirect block, a file size of 1024 x 4 KB = 4
MB is possible.
For files larger than 4 MB, a second block, the double indirect block, is used. The
double indirect block points to 512 indirect blocks, providing the possible
addressing of 512 x 1024 x 4 KB = 2 GB files. Figure 7-2 illustrates the
addressing using double indirection.
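These limits follow directly from the block arithmetic, which can be checked quickly in the shell:
# echo $((8 * 4))
32
# echo $((1024 * 4))
4096
# echo $((512 * 1024 * 4))
2097152
That is, 32 KB is reachable through the eight direct pointers, 4096 KB (4 MB) through a single indirect block, and 2097152 KB (2 GB) through the double indirect block.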