Friday 22 June 2012

Kdump Configuration


Configure kdump to analyze Linux kernel crashes and kernel panics.

Linux kernel crashes and panics, and the reasons behind them, can be analyzed with the help of the kdump utility.
Kdump is a crash dumping mechanism. It boots a second kernel (the capture kernel) from a small amount of memory that the first kernel reserves at boot time. The only purpose of this second kernel is to capture the core dump of the crashed kernel.

The following steps configure kdump from the command prompt.

Log in as root, edit the /boot/grub/grub.conf file, and add the crashkernel=<size>M parameter to the list of kernel options, where <size> is the amount of memory to reserve in megabytes. After editing, the grub.conf file looks like this:

# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE: You have a /boot partition. This means that
# all kernel and initrd paths are relative to /boot/, eg.
# root (hd0,0)
# kernel /vmlinuz-version ro root=/dev/sda3
# initrd /initrd-version.img
#boot=/dev/sda
default=0
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title Red Hat Enterprise Linux Server (2.6.18-194.8.1.el5)
root (hd0,0)
kernel /vmlinuz-2.6.18-194.8.1.el5 ro root=/dev/sda3 crashkernel=128M
initrd /initrd-2.6.18-194.8.1.el5.img
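
The reserved memory only takes effect after a reboot, so it can be useful to confirm that the running kernel actually received the option. The following is a minimal sketch; the helper name has_crashkernel is made up for this illustration, and it simply searches a file for the crashkernel= string.

```shell
# Hypothetical helper: succeeds if the given file contains a crashkernel= option.
has_crashkernel() {
    grep -q 'crashkernel=' "$1" 2>/dev/null
}

# /proc/cmdline holds the command line of the running kernel.
if has_crashkernel /proc/cmdline; then
    echo "crashkernel is set for the running kernel"
else
    echo "crashkernel is NOT set for the running kernel"
fi
```

The same helper can be pointed at /boot/grub/grub.conf before rebooting to check that the edit was saved.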

Now configure the target location in the /etc/kdump.conf file. By default, the dump file (the vmcore file) is stored in the /var/crash/ directory of the local system.
It is also possible to save the file to a remote location using NFS or SCP, but those methods are not covered here.

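Edit the /etc/kdump.conf file:
To change the local directory in which the core dump is saved, remove the hash sign (“#”) from the beginning of the #path /var/crash line and replace the value with the desired directory path. To write the file to a different partition, also give the file system type and the device, as in the ext3 line of this example: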

ext3 /dev/sda4
path /usr/local/cores

To write the dump directly to a device, remove the hash sign (“#”) from the beginning of the #raw /dev/sdc5 line and replace the value with the desired device name. For example:
raw /dev/sdb1

It is also possible to process the dump with a core collector. To reduce the size of the vmcore file, we can use the makedumpfile utility.
To enable the core collector, search for the core_collector directive in the /etc/kdump.conf file and uncomment it if it is commented out.

core_collector makedumpfile -c

To leave unwanted pages out of the dump file, we can pass the -d option to makedumpfile.

core_collector makedumpfile -d <value> -c
where <value> is the sum of the values of the page types we want to omit.

Option  Page type to omit
1       Zero pages
2       Cache pages
4       Cache private
8       User pages
16      Free pages
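
The value passed to -d is plain addition over the Option column. As a small illustration, the sketch below sums the chosen values in the shell; the function name dump_level is made up for this example and is not part of kdump or makedumpfile.

```shell
# Hypothetical helper: add up the Option values of the page types to omit.
dump_level() {
    total=0
    for v in "$@"; do
        total=$((total + v))
    done
    echo "$total"
}

dump_level 1 16          # rows 1 and 16 of the table -> prints 17
dump_level 1 2 4 8 16    # every row of the table     -> prints 31
```

With the first result, the directive would read core_collector makedumpfile -d 17 -c.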

Now enable the kdump service so that it starts at boot time.

# chkconfig kdump on
Start the kdump service:

# service kdump start
No kdump initial ramdisk found. [WARNING]
Rebuilding /boot/initrd-2.6.18-194.8.1.el5kdump.img
Starting kdump: [ OK ]

Test the kdump configuration.
To test the configuration, reboot the system with kdump enabled, and make sure that the service is running:

# service kdump status
Kdump is operational
Then type the following commands at a shell prompt:
# echo 1 > /proc/sys/kernel/sysrq
# echo c > /proc/sysrq-trigger

The above commands force the Linux kernel to crash, and the YYYY-MM-DD-HH:MM/vmcore file is copied to the location selected in the configuration.
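
Once the system is back up, the new dump can be located by looking for the most recent vmcore below the dump directory. This is only a sketch: the helper name latest_vmcore is hypothetical, it assumes one timestamped subdirectory per crash as described above, and /var/crash should be replaced with whatever path was configured.

```shell
# Hypothetical helper: print the most recently modified vmcore below a directory.
# Prints nothing if no vmcore file is found.
latest_vmcore() {
    ls -t "$1"/*/vmcore 2>/dev/null | head -n 1
}

latest_vmcore /var/crash
```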

Analyze the core dump:
To analyze the crashed kernel we need two packages: crash and kernel-debuginfo.

Now all we need to do is start the crash utility.

crash /var/crash/timestamp/vmcore /usr/lib/debug/lib/modules/kernel/vmlinux

At the crash prompt we can run several commands, such as:
crash> log ## Display the kernel message buffer
crash> bt ## Backtrace
crash> [ps | vm | files] ## Refer to the man page for more options.

Sunday 17 June 2012

Repairing File Systems with fsck in AIX V5 (LED 517 or 518)

Question
Repairing File Systems with fsck in AIX V5 or V6 (LED 517 or 518)

Answer
This document covers the use of the fsck (file system check) command in Maintenance mode to repair inconsistencies in file systems. The procedure described is useful when file system corruption in the primary root file systems is suspected or, in many cases, to correct an IPL hang at LED value 517, 518, or LED value 555.

This document applies to AIX version 5.x, 6.x, and VIOS LPAR. 

Recovery procedure 

Boot your system into a limited function maintenance shell (Service, or Maintenance mode) from AIX bootable media to perform file system checks on your root file systems.

With bootable media of the same version and level as the system, boot the system. If this is a VIOS LPAR, use the correct VIOS media. The bootable media can be any ONE of the following: 

Bootable CD-ROM
NON_AUTOINSTALL mksysb
Bootable Install Tape

Follow the screen prompts to the following menu: 

Welcome to Base Operating System 
Installation and Maintenance

Choose Start Maintenance Mode for System Recovery (Option 3).
The next screen displays the Maintenance menu.


Choose Access a Root Volume Group (Option 1).
The next screen displays a warning that indicates you will not be able to return to the Base OS menu without rebooting.


Choose 0 to continue.
The next screen displays information about all volume groups on the system.


Select the root volume group by number.
Choose Access this volume group and start a shell before mounting file systems (Option 2).

If you get errors from the preceding option, do not continue with the rest of this procedure. Correct the problem causing the error. If you need assistance correcting the problem causing the error, contact one of the following:
local branch office
your point of sale
your AIX support center

If no errors occur, proceed with the following steps.

Run the following commands to check and repair file systems.
NOTE: The -y option gives fsck permission to repair file system corruption when necessary. This flag can be used to avoid having to manually answer multiple confirmation prompts; however, use of this flag can cause permanent, unnecessary data loss in some situations.

fsck /dev/hd4 
fsck /dev/hd2 
fsck /dev/hd3 
fsck /dev/hd9var 
fsck /dev/hd1
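
The five commands above follow one pattern, so they can also be written as a loop. The sketch below is only an illustration: the function name check_root_lvs is made up, and the optional first argument (an assumption added for this example) swaps fsck for another command so the loop can be tried as a dry run.

```shell
# Hypothetical helper: run a command (fsck by default) on each root logical volume.
check_root_lvs() {
    cmd=${1:-fsck}
    for lv in hd4 hd2 hd3 hd9var hd1; do
        "$cmd" "/dev/$lv" || echo "problem reported for /dev/$lv" >&2
    done
}

# Dry run: print the device names instead of checking them.
check_root_lvs echo
```

Calling check_root_lvs with no argument runs fsck on each device in the same order as the list above.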

To format the default jfslog for the rootvg Journaled File System (JFS) file systems, run the following command:

 /usr/sbin/logform /dev/hd8 

Answer yes when asked if you want to destroy the log.


If your system is hanging at LED 517 or 518 during a Normal mode boot, it is possible the /etc/filesystems file is corrupt or missing. To temporarily replace the disk-based /etc/filesystems file, run the following commands: 

mount /dev/hd4 /mnt 
mv /mnt/etc/filesystems /mnt/etc/filesystems.[MMDDYY] 
cp /etc/filesystems /mnt/etc/filesystems 
umount /mnt


MMDDYY represents the current two-digit representation of the Month, Day and Year, respectively.
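
If the date command is available in the maintenance shell (an assumption; otherwise type the suffix by hand), the MMDDYY suffix can be generated instead of typed. A minimal sketch:

```shell
# Build the MMDDYY suffix from the current date, for example 061712.
stamp=$(date +%m%d%y)

# Show the backup file name that the mv command above would use.
echo "/mnt/etc/filesystems.$stamp"
```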


Type exit to exit from the shell. The file systems should automatically mount after you type exit. If you receive error messages, reboot into a limited function maintenance shell again to attempt to address the failure causes.


If you have user-created file systems in the rootvg volume group, run fsck on them now. Enter:

 fsck /dev/[LVname]


LVname is the name of your user-defined logical volume.


If you used the preceding procedure to temporarily replace the /etc/filesystems file, and you have user-created file systems in the rootvg volume group, you must also run the following command: 

imfs -l /dev/[LVname]



If you used the preceding procedure to temporarily replace the /etc/filesystems file, also run the following command: 

imfs [VGname]


The preceding commands can be repeated for each user-defined volume group on the system.


If your system was hanging at LED 517 or 518 and you are unable to activate non-rootvg volume groups in Service mode, you can manually edit the /etc/filesystems file and add the appropriate entries.

The file /etc/filesystems.MMDDYY saved in the preceding steps may be used as a reference if it is readable. However, the imfs method is preferred since it uses information stored in the logical volume control block to re-populate the /etc/filesystems file.


If your system has a mode select key, turn it to the Normal position.


Reboot the system into Normal mode using the following command:

sync;sync;sync;reboot 