Monday, 20 June 2011

Error logging in AIX



The error logging process begins when an operating system module detects an error. The error-detecting segment of code then sends the error information either to a kernel service (errsave, or errlast for a pending system crash) or to the errlog subroutine, which logs an application error. From there the information is written to the /dev/error special file.


This process then adds a time stamp to the collected data. The errdemon daemon constantly checks the /dev/error file for new entries, and when new data is written, the daemon conducts a series of operations.


Before an entry is written to the error log, the errdemon daemon compares the label sent by the kernel or application code to the contents of the error record template repository. If the label matches an item in the repository, the daemon collects additional data from other parts of the system.


To create an entry in the error log, the errdemon daemon retrieves the appropriate template from the repository, the resource name of the unit that detected the error, and detailed data. Also, if the error signifies a hardware-related problem and hardware Vital Product Data (VPD) exists, the daemon retrieves the VPD from the Object Data Manager (ODM). When you access the error log, either through SMIT or with the errpt command, the error log is formatted according to the error template in the error template repository and presented in either a summary or detailed report.


The system administrator can look at the error log to determine what caused a failure, or to periodically check the health of the system when it is running.


The software components that allow the AIX kernel and commands to log errors to the error log are contained in the fileset bos.rte.serv_aid. This fileset is automatically installed as part of the AIX installation process.


The commands that allow you to view and manipulate the error log, such as the errpt and errclear commands, are contained in the fileset called bos.sysmgt.serv_aid.


Error log file processing:-


The error log is used by system administrators. For each error it records the error ID, time stamp, error type, error class, and resource name.

Error templates

The error template contains numbers that correspond to error messages in the codepoint catalogue. These are sometimes referred to as codepoint messages. These error messages are used to communicate possible causes and to recommend actions for an error. They are also used to explain the detailed data that may accompany the error.

The template also indicates whether the error is reportable, loggable, or alertable.


Each template in the template file contains unique information that corresponds to a unique error. The contents of the error template are used to calculate the error ID of the error. The error numbers correspond to error messages indicating causes and recommendations for action.




The templates in the errtmplt file can be viewed by invoking the errpt command with the -t flag.


Error messages:-



Error messages are numbered and placed in a separate file, called the codepoint catalogue. The codepoint catalogue can be viewed by using the errmsg command with the -w flag.


Error log management:-



You can generate an error report from entries in an error log. The errpt command allows flags for selecting errors that match specific criteria.
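
As one illustration of selecting by criteria, the -s flag of errpt takes a start time in mmddhhmmyy form. The following is a minimal shell sketch, not something shipped with AIX; the function name today_start is made up for this example.

```shell
# Print the start of the current day (00:00) in the mmddhhmmyy form
# that errpt -s expects. The function name is hypothetical.
today_start() {
    date +%m%d0000%y
}

# Typical use on an AIX host:
#   errpt -s "$(today_start)"              # summary of today's entries
#   errpt -s "$(today_start)" -N hdisk0    # restricted to one resource
```

Wrapping the date format in a function keeps the ten-digit layout in one place, so the same value can be combined with other selection flags.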

Viewing the error log:-


There are two main ways of viewing the error log:


1) You can use the System Management Interface Tool (SMIT) with a fast path to run the errpt command. To use the SMIT fast path, enter:
    # smit errpt
2) You can also view the error log from the command line using the errpt command. With no flags, errpt displays a summary report:
    # errpt
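
The summary report prints one line per entry under the columns IDENTIFIER, TIMESTAMP, T (type), C (class), RESOURCE_NAME and DESCRIPTION. A small sketch, assuming that column layout, tallies the entries per class. The name count_by_class is a hypothetical helper, and it relies only on awk and sort.

```shell
# Count errpt summary lines per error class (4th column: H, S, O or U).
# Reads summary-report text on stdin; the header line is skipped.
# The function name is hypothetical, not an AIX command.
count_by_class() {
    awk '$1 != "IDENTIFIER" && NF >= 5 { n[$4]++ }
         END { for (c in n) print c, n[c] }' | sort
}

# Typical use on an AIX host:
#   errpt | count_by_class
```

Reading from stdin rather than calling errpt inside the function means the same helper also works on a saved report.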
    

In addition to the summary report, the errpt command can be used with various flags to generate a customized report detailing the error log entries you are interested in:

  To display information about errors in the error log file in detailed format, enter the following command:
    # errpt -a

  Starting with AIX 5L Version 5.1, the errpt command also supports an intermediate output format through the -A flag:
    # errpt -A -j identifier
    Where identifier is the eight-digit hexadecimal unique error identifier.

  To display a detailed report of all errors logged for a particular error identifier, enter the following command:
    # errpt -a -j identifier
    Where identifier is the eight-digit hexadecimal unique error identifier.
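
Because the identifier must be exactly eight hexadecimal digits, a script can check it before calling errpt. The following is a minimal sketch under that assumption; is_error_id and report_for_id are hypothetical helper names, not AIX commands.

```shell
# Succeed only if $1 looks like an errpt identifier:
# exactly eight hexadecimal digits (for example FE2DEE00).
is_error_id() {
    case "$1" in
        *[!0-9A-Fa-f]*) return 1 ;;   # contains a non-hex character
    esac
    [ "${#1}" -eq 8 ]
}

# Detailed report for one identifier, refusing malformed input.
# errpt is only invoked when the check passes.
report_for_id() {
    if is_error_id "$1"; then
        errpt -a -j "$1"
    else
        echo "not an 8-digit hex identifier: $1" >&2
        return 2
    fi
}
```

For example, report_for_id FE2DEE00 runs errpt -a -j FE2DEE00, while report_for_id FE2D prints a message to standard error and returns 2.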


  To clear all entries from the error log, enter the following command:
   # errclear 0

 To stop error logging, enter the following command:
  # /usr/lib/errstop

  To start error logging, enter the following command:
  # /usr/lib/errdemon

  To list the current error log file, the log and buffer sizes, and the duplicate-handling settings, enter the following command:
  # /usr/lib/errdemon -l

If you want to change the buffer size and error log file size, you can use the errdemon command. For further detail, refer to the manual page for errdemon.
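
The -s flag of errdemon takes the log file size in bytes, so a small conversion helper avoids arithmetic slips. This is only a sketch; mb_to_bytes is a made-up name, and it assumes 1 MB means 1024 * 1024 bytes.

```shell
# Convert a size in megabytes to bytes (1 MB = 1024 * 1024 bytes here).
# The function name is hypothetical.
mb_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

# Usage on an AIX host (as root):
#   /usr/lib/errdemon -s "$(mb_to_bytes 2)"    # same as -s 2097152
#   /usr/lib/errdemon -l                       # confirm the new setting
```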



Note: If you accidentally remove the errlog file, run the /usr/lib/errstop and /usr/lib/errdemon commands in sequence to recover it. errdemon creates the errlog file if the file does not exist.


Software service aid configuration information is stored in the /etc/objrepos/SWservAt ODM database. This ODM class is used to store information about the location and size of various log files used by the system.


By default, AIX runs daily cron jobs that delete software and operator message error log entries older than 30 days at 11 AM, and hardware error log entries older than 90 days at 12 noon.
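
The same cleanup can be run by hand. The sketch below assumes the -d flag of errclear, which restricts deletion to the listed error classes (H for hardware, S for software, O for operator messages). The function names are invented for this example, and the default day counts simply mirror the retention periods described above.

```shell
# Manual equivalents of the default cleanup, using errclear's -d (class) flag.
# Classes: H = hardware, S = software, O = operator messages.
# Function names are hypothetical; pass a day count to override the default.
purge_hardware() {
    errclear -d H "${1:-90}"      # drop hardware entries older than N days
}

purge_software_operator() {
    errclear -d S,O "${1:-30}"    # drop software/operator entries older than N days
}
```

On a live system it may be worth checking the root crontab (crontab -l) first to see which errclear entries are actually scheduled, since administrators can change them.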



Display the contents of the boot log
alog -o -t boot

Display the contents of the console log
alog -o -t console

List all log types that alog knows
alog -L

Display the contents of the system error log
errpt (Add -a or -A for varying levels of verbosity)

Clear all errors up until x days ago.
errclear x

List info on error ID FE2DEE00 (IDENTIFIER column in errpt output)
errpt -aDj FE2DEE00

Put a "tail" on the error log (display new entries as they are logged)
errpt -c

List all errors that happened today
errpt -s `date +%m%d0000%y`

To list all errors on hdisk0
errpt -N hdisk0

To list details about the error log
/usr/lib/errdemon -l

To change the size of the error log to 2 MB
/usr/lib/errdemon -s 2097152

syslog.conf line to send all messages to a log file
*.debug /var/log/messages