Wednesday, 31 August 2011

Linux disk cache


Experiments and fun with the Linux disk cache

Hopefully you are now convinced that Linux didn't just eat your RAM. Here are some interesting things you can do to learn how the disk cache works.

Effects of disk cache on application memory allocation

Since I've already promised that disk cache doesn't prevent applications from getting the memory they want, let's start with that. Here is a C app (munch.c) that gobbles up as much memory as it can, or to a specified limit:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

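int main(int argc, char** argv) {
    int max = -1;   /* limit in MB; -1 means no limit */
    int mb = 0;     /* MB allocated so far */
    char* buffer;

    if(argc > 1)
        max = atoi(argv[1]);

    /* Check the limit first so no block is allocated beyond it. */
    while(mb != max && (buffer=malloc(1024*1024)) != NULL) {
        /* Touch every byte so the kernel has to back the block with real memory. */
        memset(buffer, 0, 1024*1024);
        mb++;
        printf("Allocated %d MB\n", mb);
    }

    return 0;
}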
Running out of memory isn't fun, but the OOM killer should end just this process and hopefully the rest will be unperturbed. We'll definitely want to disable swap for this, or the app will gobble up that as well.
$ sudo swapoff -a

$ free -m
             total       used       free     shared    buffers     cached
Mem:          1504       1490         14          0         24        809
-/+ buffers/cache:        656        848
Swap:            0          0          0

$ gcc munch.c -o munch

$ ./munch
Allocated 1 MB
Allocated 2 MB
(...)
Allocated 877 MB
Allocated 878 MB
Allocated 879 MB
Killed

$ free -m
             total       used       free     shared    buffers     cached
Mem:          1504        650        854          0          1         67
-/+ buffers/cache:        581        923
Swap:            0          0          0

$
Even though it said 14MB "free", that didn't stop the application from grabbing 879MB. Afterwards, the cache is pretty empty, but it will gradually fill up again as files are read and written. Give it a try.
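
The arithmetic behind the "-/+ buffers/cache" row can be sketched in a few lines of Python. This is only a rough sketch, not how free itself is implemented: it assumes the /proc/meminfo field names MemTotal, MemFree, Buffers and Cached, the helper names parse_meminfo and app_view are made up for this example, and the sample text is invented to roughly match the first "free -m" output above.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of values in kB."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[name.strip()] = int(fields[0])
    return info


def app_view(info):
    """Return (used, available) in MB as applications see it, ignoring cache."""
    reclaimable = info["MemFree"] + info["Buffers"] + info["Cached"]
    used = info["MemTotal"] - reclaimable
    return used // 1024, reclaimable // 1024


# Illustrative sample only, not a real measurement.
SAMPLE = """MemTotal:  1540096 kB
MemFree:     14336 kB
Buffers:     24576 kB
Cached:     828416 kB
"""

used_mb, available_mb = app_view(parse_meminfo(SAMPLE))
print("used by applications: %d MB, available: %d MB" % (used_mb, available_mb))
```

On a live system the same functions could be fed the contents of /proc/meminfo instead of SAMPLE.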

Effects of disk cache on swapping

I also said that disk cache won't cause applications to use swap. Let's try that as well, with the same 'munch' app as in the last experiment. This time we'll run it with swap on, and limit it to a few hundred megabytes:
$ free -m
             total       used       free     shared    buffers     cached
Mem:          1504       1490         14          0         10        874
-/+ buffers/cache:        605        899
Swap:         2047          6       2041

$ ./munch 400
Allocated 1 MB
Allocated 2 MB
(...)
Allocated 399 MB
Allocated 400 MB

$ free -m
             total       used       free     shared    buffers     cached
Mem:          1504       1090        414          0          5        485
-/+ buffers/cache:        598        906
Swap:         2047          6       2041

munch ate 400MB of RAM, which was taken from the disk cache without resorting to swap. Likewise, we can fill the disk cache again and it will not start eating swap either. If you run 'watch free -m' in one terminal, and 'find . -type f -exec cat {} + > /dev/null' in another, you can see that "cached" will rise while "free" falls. After a while, it tapers off but swap is never touched. [1]

Clearing the disk cache

For experimentation, it's very convenient to be able to drop the disk cache. For this, we can use the special file /proc/sys/vm/drop_caches. By writing 3 to it, we can clear most of the disk cache:
$ free -m
             total       used       free     shared    buffers     cached
Mem:          1504       1471         33          0         36        801
-/+ buffers/cache:        633        871
Swap:         2047          6       2041

$ echo 3 | sudo tee /proc/sys/vm/drop_caches 
3

$ free -m
             total       used       free     shared    buffers     cached
Mem:          1504        763        741          0          0        134
-/+ buffers/cache:        629        875
Swap:         2047          6       2041

Notice how "buffers" and "cached" went down, "free" went up, and the free figure on the "-/+ buffers/cache" row stayed almost the same (871MB before, 875MB after).

Effects of disk cache on load times

Let's make two test programs, one in Python and one in Java. Python and Java both come with pretty big runtimes, which have to be loaded in order to run the application. This is a perfect scenario for disk cache to work its magic.
$ cat hello.py
print("Hello World! Love, Python")

$ cat Hello.java
class Hello { 
    public static void main(String[] args) throws Exception {
        System.out.println("Hello World! Regards, Java");
    }
}

$ javac Hello.java

$ python hello.py
Hello World! Love, Python

$ java Hello
Hello World! Regards, Java

$ 
Our hello world apps work. Now let's drop the disk cache, and see how long it takes to run them.
$ echo 3 | sudo tee /proc/sys/vm/drop_caches
3

$ time python hello.py
Hello World! Love, Python

real	0m1.026s
user	0m0.020s
sys	    0m0.020s

$ time java Hello
Hello World! Regards, Java

real	0m2.174s
user	0m0.100s
sys	    0m0.056s

$ 
Wow. 1 second for Python, and 2 seconds for Java? That's a lot just to say hello. However, now all the files required to run them will be in the disk cache so they can be fetched straight from memory. Let's try again:
$ time python hello.py
Hello World! Love, Python

real    0m0.022s
user    0m0.016s
sys     0m0.008s

$ time java Hello
Hello World! Regards, Java

real    0m0.139s
user    0m0.060s
sys     0m0.028s

$ 
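Yay! Python now runs in just 22 milliseconds, while Java uses 139ms. That's roughly 98% less time for Python and 94% less for Java! Any application that has to load its files from disk benefits in the same way, although the size of the gain depends on how much it reads.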
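
For the curious, here is the arithmetic on the timings above as a small Python sketch. The function names percent_saved and speedup are just labels chosen for this example; the numbers are the "real" times from the two runs.

```python
def percent_saved(cold_seconds, warm_seconds):
    """Percentage of wall-clock time saved by the warm (cached) run."""
    return 100.0 * (cold_seconds - warm_seconds) / cold_seconds


def speedup(cold_seconds, warm_seconds):
    """How many times faster the warm run is than the cold run."""
    return cold_seconds / warm_seconds


# (cold, warm) "real" times in seconds, taken from the transcripts above.
runs = {"python": (1.026, 0.022), "java": (2.174, 0.139)}

for name, (cold, warm) in runs.items():
    print("%s: %.1f%% less time, %.1fx faster"
          % (name, percent_saved(cold, warm), speedup(cold, warm)))
```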

Effects of disk cache on file reading

Let's make a big file and see how disk cache affects how fast we can read it. I'm making a 200MB file, but if you have less free RAM, you can adjust it.
$ echo 3 | sudo tee /proc/sys/vm/drop_caches
3

$ free -m
             total       used       free     shared    buffers     cached
Mem:          1504        546        958          0          0         85
-/+ buffers/cache:        461       1043
Swap:         2047          6       2041

$ dd if=/dev/zero of=bigfile bs=1M count=200
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 6.66191 s, 31.5 MB/s

$ ls -lh bigfile
-rw-r--r-- 1 vidar vidar 200M 2009-04-25 12:30 bigfile

$ free -m
             total       used       free     shared    buffers     cached
Mem:          1504        753        750          0          0        285
-/+ buffers/cache:        468       1036
Swap:         2047          6       2041

$ 

Since the file was just written, it will go in the disk cache. The 200MB file caused a 200MB bump in "cached". Let's read it, clear the cache, and read it again to see how fast it is:
$ time cat bigfile > /dev/null

real    0m0.139s
user    0m0.008s
sys     0m0.128s

$ echo 3 | sudo tee /proc/sys/vm/drop_caches
3

$ time cat bigfile > /dev/null

real    0m8.688s
user    0m0.020s
sys     0m0.336s

$ 
The cached read was more than sixty times faster than the read from disk!
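
If you would rather measure this from a script than with 'time cat', something along these lines should do. It is a minimal sketch: timed_read is a hypothetical helper name, and it simply reads the file in 1MB chunks and reports how long that took.

```python
import time


def timed_read(path, chunk_size=1024 * 1024):
    """Read a file to the end and return (bytes_read, elapsed_seconds)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total, time.perf_counter() - start
```

Calling timed_read("bigfile") right after dropping the cache, and then once more, should show the same kind of gap as the two cat runs above.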

Conclusions

The Linux disk cache is very unobtrusive. It uses spare memory to greatly increase disk access speeds, without taking any memory away from applications. A fully used store of RAM on Linux is efficient hardware use, not a warning sign.


1. This is somewhat oversimplified. While newly allocated memory will always be taken from the disk cache instead of swap, Linux can be configured to preemptively swap out other unused applications in the background to free up memory for cache. This is tunable through the 'swappiness' setting, accessible through /proc/sys/vm/swappiness.
A server might want to swap out unused apps to speed up disk access of running ones (making the system faster), while a desktop system might want to keep apps in memory to prevent lag when the user finally uses them (making the system more responsive). This is the subject of much debate.
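
To check what a machine is currently set to, the value can simply be read from that file. Here is a small Python sketch; read_vm_tunable is a hypothetical helper, and the base parameter exists only so the function can be pointed at another directory for testing.

```python
import os


def read_vm_tunable(name, base="/proc/sys/vm"):
    """Return the current value of a VM tunable as a stripped string."""
    with open(os.path.join(base, name)) as handle:
        return handle.read().strip()
```

On a Linux box, read_vm_tunable("swappiness") returns the current setting as text.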

Clear filesystem memory cache


How to clear or drop the cache buffer pages from Linux memory

Introduction

The cache in Linux memory is where the kernel stores information it may need later. As memory is incredibly faster than disk, it is great that the Linux kernel takes care of that.
You can also manipulate how the cache behaves, but there is usually no need to do so: the Linux operating system is very efficient at managing your computer's memory, and will automatically free RAM and drop the cache if some application needs memory. Let's see how to force Linux to drop the cache from memory.


Writing to /proc/sys/vm/drop_caches will cause the kernel to drop clean caches, dentries and inodes from memory, causing that memory to become free.



Since kernel 2.6.16 you can trigger this by writing a value to that file. Each write is a one-off request rather than a persistent switch. The values usually listed are:
0 -> Frees nothing and leaves the kernel in full control of the cache memory (some newer kernels reject this value)
1 -> Will free the page cache
2 -> Will free dentries and inodes
3 -> Will free dentries and inodes as well as page cache

So, just write one of those values to the file /proc/sys/vm/drop_caches with echo, as root:

* sync; echo 0 > /proc/sys/vm/drop_caches
* sync; echo 1 > /proc/sys/vm/drop_caches
* sync; echo 2 > /proc/sys/vm/drop_caches
* sync; echo 3 > /proc/sys/vm/drop_caches

It is better to use sysctl instead of echoing:

/sbin/sysctl vm.drop_caches=3
sync
sh -c "echo 1 > /proc/sys/vm/drop_caches"
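
The same sequence can be wrapped in a script. The following Python sketch is one way to do it, under a few assumptions: drop_caches and VALID_MODES are names invented here, the helper deliberately accepts only 1, 2 and 3 (the values that actually free something), and it needs root when pointed at the real /proc file.

```python
import os

# Values that actually free something, per the list above.
VALID_MODES = {1: "page cache", 2: "dentries and inodes", 3: "both"}


def drop_caches(mode=3, path="/proc/sys/vm/drop_caches"):
    """Flush dirty data, then ask the kernel to drop clean caches."""
    if mode not in VALID_MODES:
        raise ValueError("mode must be 1, 2 or 3")
    os.sync()  # dirty objects are not freeable, so write them out first
    with open(path, "w") as handle:
        handle.write("%d\n" % mode)
```

Run as root, drop_caches(3) does the same as 'sync; echo 3 > /proc/sys/vm/drop_caches'.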

This file contains the documentation for the sysctl files in /proc/sys/vm and is valid for Linux kernel version 2.6.29.

The files in this directory can be used to tune the operation of the virtual memory (VM) subsystem of the Linux kernel and the writeout of dirty data to disk.

Default values and initialization routines for most of these files can be found in mm/swap.c.

Currently, these files are in /proc/sys/vm:

- block_dump
- compact_memory
- dirty_background_bytes
- dirty_background_ratio
- dirty_bytes
- dirty_expire_centisecs
- dirty_ratio
- dirty_writeback_centisecs
- drop_caches
- extfrag_threshold
- hugepages_treat_as_movable
- hugetlb_shm_group
- laptop_mode
- legacy_va_layout
- lowmem_reserve_ratio
- max_map_count
- memory_failure_early_kill
- memory_failure_recovery
- min_free_kbytes
- min_slab_ratio
- min_unmapped_ratio
- mmap_min_addr
- nr_hugepages
- nr_overcommit_hugepages
- nr_pdflush_threads
- nr_trim_pages (only if CONFIG_MMU=n)
- numa_zonelist_order
- oom_dump_tasks
- oom_kill_allocating_task
- overcommit_memory
- overcommit_ratio
- page-cluster
- panic_on_oom
- percpu_pagelist_fraction
- stat_interval
- swappiness
- vfs_cache_pressure
- zone_reclaim_mode

==============================================================

block_dump

block_dump enables block I/O debugging when set to a nonzero value.

==============================================================

compact_memory

Available only when CONFIG_COMPACTION is set. When 1 is written to the file, all zones are compacted such that free memory is available in contiguous blocks where possible. This can be important for example in the allocation of huge pages although processes will also directly compact memory as required.

==============================================================

dirty_background_bytes

Contains the amount of dirty memory at which the pdflush background writeback daemon will start writeback.

Note: dirty_background_bytes is the counterpart of dirty_background_ratio. Only one of them may be specified at a time. When one sysctl is written it is immediately taken into account to evaluate the dirty memory limits and the other appears as 0 when read.

==============================================================

dirty_background_ratio

Contains, as a percentage of total system memory, the number of pages at which the pdflush background writeback daemon will start writing out dirty data.

==============================================================

dirty_bytes

Contains the amount of dirty memory at which a process generating disk writes will itself start writeback.

Note: dirty_bytes is the counterpart of dirty_ratio. Only one of them may be specified at a time. When one sysctl is written it is immediately taken into account to evaluate the dirty memory limits and the other appears as 0 when read.

Note: the minimum value allowed for dirty_bytes is two pages (in bytes); any value lower than this limit will be ignored and the old configuration will be retained.

==============================================================

dirty_expire_centisecs

This tunable is used to define when dirty data is old enough to be eligible for writeout by the pdflush daemons. It is expressed in 100'ths of a second. Data which has been dirty in-memory for longer than this interval will be written out next time a pdflush daemon wakes up.

==============================================================

dirty_ratio

Contains, as a percentage of total system memory, the number of pages at which a process which is generating disk writes will itself start writing out dirty data.

==============================================================

dirty_writeback_centisecs

The pdflush writeback daemons will periodically wake up and write `old' data out to disk. This tunable expresses the interval between those wakeups, in 100'ths of a second.

Setting this to zero disables periodic writeback altogether.

==============================================================

drop_caches

Writing to this will cause the kernel to drop clean caches, dentries and inodes from memory, causing that memory to become free.

To free pagecache:
    echo 1 > /proc/sys/vm/drop_caches
To free dentries and inodes:
    echo 2 > /proc/sys/vm/drop_caches
To free pagecache, dentries and inodes:
    echo 3 > /proc/sys/vm/drop_caches

As this is a non-destructive operation and dirty objects are not freeable, the user should run `sync' first.

==============================================================

extfrag_threshold

This parameter affects whether the kernel will compact memory or direct reclaim to satisfy a high-order allocation. /proc/extfrag_index shows what the fragmentation index for each order is in each zone in the system. Values tending towards 0 imply allocations would fail due to lack of memory, values towards 1000 imply failures are due to fragmentation and -1 implies that the allocation will succeed as long as watermarks are met.

The kernel will not compact memory in a zone if the fragmentation index is <= extfrag_threshold. The default value is 500.

==============================================================

hugepages_treat_as_movable

This parameter is only useful when kernelcore= is specified at boot time to create ZONE_MOVABLE for pages that may be reclaimed or migrated. Huge pages are not movable so are not normally allocated from ZONE_MOVABLE. A non-zero value written to hugepages_treat_as_movable allows huge pages to be allocated from ZONE_MOVABLE.

Once enabled, the ZONE_MOVABLE is treated as an area of memory the huge pages pool can easily grow or shrink within. Assuming that applications are not running that mlock() a lot of memory, it is likely the huge pages pool can grow to the size of ZONE_MOVABLE by repeatedly entering the desired value into nr_hugepages and triggering page reclaim.

==============================================================

hugetlb_shm_group

hugetlb_shm_group contains group id that is allowed to create SysV shared memory segment using hugetlb page.

==============================================================

laptop_mode

laptop_mode is a knob that controls "laptop mode".

==============================================================

legacy_va_layout

If non-zero, this sysctl disables the new 32-bit mmap layout - the kernel will use the legacy (2.4) layout for all processes.

==============================================================

lowmem_reserve_ratio

For some specialised workloads on highmem machines it is dangerous for the kernel to allow process memory to be allocated from the "lowmem" zone. This is because that memory could then be pinned via the mlock() system call, or by unavailability of swapspace.

And on large highmem machines this lack of reclaimable lowmem memory can be fatal.

So the Linux page allocator has a mechanism which prevents allocations which _could_ use highmem from using too much lowmem. This means that a certain amount of lowmem is defended from the possibility of being captured into pinned user memory.

(The same argument applies to the old 16 megabyte ISA DMA region. This mechanism will also defend that region from allocations which could use highmem or lowmem).

The `lowmem_reserve_ratio' tunable determines how aggressive the kernel is in defending these lower zones.

If you have a machine which uses highmem or ISA DMA and your applications are using mlock(), or if you are running with no swap then you probably should change the lowmem_reserve_ratio setting.

The lowmem_reserve_ratio is an array. You can see them by reading this file.
-
% cat /proc/sys/vm/lowmem_reserve_ratio
256     256     32
-
Note: # of this elements is one fewer than number of zones. Because the highest zone's value is not necessary for following calculation.

But, these values are not used directly. The kernel calculates # of protection pages for each zones from them. These are shown as array of protection pages in /proc/zoneinfo like followings. (This is an example of x86-64 box). Each zone has an array of protection pages like this.

-
Node 0, zone      DMA
  pages free     1355
        min      3
        low      3
        high     4
        :
        :
    numa_other   0
        protection: (0, 2004, 2004, 2004)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  pagesets
    cpu: 0 pcp: 0
        :
-
These protections are added to score to judge whether this zone should be used for page allocation or should be reclaimed.

In this example, if normal pages (index=2) are required to this DMA zone and watermark[WMARK_HIGH] is used for watermark, the kernel judges this zone should not be used because pages_free(1355) is smaller than watermark + protection[2] (4 + 2004 = 2008). If this protection value is 0, this zone would be used for normal page requirement. If requirement is DMA zone(index=0), protection[0] (=0) is used.

zone[i]'s protection[j] is calculated by following expression.

(i < j):
  zone[i]->protection[j]
  = (total sums of present_pages from zone[i+1] to zone[j] on the node)
    / lowmem_reserve_ratio[i];
(i = j):
   (should not be protected. = 0;
(i > j):
   (not necessary, but looks 0)

The default values of lowmem_reserve_ratio[i] are
    256 (if zone[i] means DMA or DMA32 zone)
    32  (others).
As above expression, they are reciprocal number of ratio. 256 means 1/256. # of protection pages becomes about "0.39%" of total present pages of higher zones on the node.

If you would like to protect more pages, smaller values are effective. The minimum value is 1 (1/1 -> 100%).

==============================================================

max_map_count:

This file contains the maximum number of memory map areas a process may have. Memory map areas are used as a side-effect of calling malloc, directly by mmap and mprotect, and also when loading shared libraries.

While most applications need less than a thousand maps, certain programs, particularly malloc debuggers, may consume lots of them, e.g., up to one or two maps per allocation.

The default value is 65536.

==============================================================

memory_failure_early_kill:

Control how to kill processes when uncorrected memory error (typically a 2bit error in a memory module) is detected in the background by hardware that cannot be handled by the kernel. In some cases (like the page still having a valid copy on disk) the kernel will handle the failure transparently without affecting any applications. But if there is no other uptodate copy of the data it will kill to prevent any data corruptions from propagating.

1: Kill all processes that have the corrupted and not reloadable page mapped as soon as the corruption is detected. Note this is not supported for a few types of pages, like kernel internally allocated data or the swap cache, but works for the majority of user pages.

0: Only unmap the corrupted page from all processes and only kill a process who tries to access it.

The kill is done using a catchable SIGBUS with BUS_MCEERR_AO, so processes can handle this if they want to.

This is only active on architectures/platforms with advanced machine check handling and depends on the hardware capabilities.

Applications can override this setting individually with the PR_MCE_KILL prctl

==============================================================

memory_failure_recovery

Enable memory failure recovery (when supported by the platform)

1: Attempt recovery.

0: Always panic on a memory failure.

==============================================================

min_free_kbytes:

This is used to force the Linux VM to keep a minimum number of kilobytes free. The VM uses this number to compute a watermark[WMARK_MIN] value for each lowmem zone in the system. Each lowmem zone gets a number of reserved free pages based proportionally on its size.

Some minimal amount of memory is needed to satisfy PF_MEMALLOC allocations; if you set this to lower than 1024KB, your system will become subtly broken, and prone to deadlock under high loads.

Setting this too high will OOM your machine instantly.

==============================================================

min_slab_ratio:

This is available only on NUMA kernels.

A percentage of the total pages in each zone. On Zone reclaim (fallback from the local zone occurs) slabs will be reclaimed if more than this percentage of pages in a zone are reclaimable slab pages. This insures that the slab growth stays under control even in NUMA systems that rarely perform global reclaim.

The default is 5 percent.

Note that slab reclaim is triggered in a per zone / node fashion. The process of reclaiming slab memory is currently not node specific and may not be fast.

==============================================================

min_unmapped_ratio:

This is available only on NUMA kernels.

This is a percentage of the total pages in each zone. Zone reclaim will only occur if more than this percentage of pages are in a state that zone_reclaim_mode allows to be reclaimed.

If zone_reclaim_mode has the value 4 OR'd, then the percentage is compared against all file-backed unmapped pages including swapcache pages and tmpfs files. Otherwise, only unmapped pages backed by normal files but not tmpfs files and similar are considered.

The default is 1 percent.

==============================================================

mmap_min_addr

This file indicates the amount of address space which a user process will be restricted from mmapping. Since kernel null dereference bugs could accidentally operate based on the information in the first couple of pages of memory userspace processes should not be allowed to write to them. By default this value is set to 0 and no protections will be enforced by the security module. Setting this value to something like 64k will allow the vast majority of applications to work correctly and provide defense in depth against future potential kernel bugs.

==============================================================

nr_hugepages

Change the minimum size of the hugepage pool.

See Documentation/vm/hugetlbpage.txt

==============================================================

nr_overcommit_hugepages

Change the maximum size of the hugepage pool. The maximum is nr_hugepages + nr_overcommit_hugepages.

==============================================================

nr_pdflush_threads

The current number of pdflush threads. This value is read-only. The value changes according to the number of dirty pages in the system.

When necessary, additional pdflush threads are created, one per second, up to nr_pdflush_threads_max.

==============================================================

nr_trim_pages

This is available only on NOMMU kernels.

This value adjusts the excess page trimming behaviour of power-of-2 aligned NOMMU mmap allocations.

A value of 0 disables trimming of allocations entirely, while a value of 1 trims excess pages aggressively. Any value >= 1 acts as the watermark where trimming of allocations is initiated.

The default value is 1.

See Documentation/nommu-mmap.txt for more information.

==============================================================

numa_zonelist_order

This sysctl is only for NUMA.
'where the memory is allocated from' is controlled by zonelists.
(This documentation ignores ZONE_HIGHMEM/ZONE_DMA32 for simple explanation. you may be able to read ZONE_DMA as ZONE_DMA32...)

In non-NUMA case, a zonelist for GFP_KERNEL is ordered as following.
ZONE_NORMAL -> ZONE_DMA
This means that a memory allocation request for GFP_KERNEL will get memory from ZONE_DMA only when ZONE_NORMAL is not available.

In NUMA case, you can think of following 2 types of order.
Assume 2 node NUMA and below is zonelist of Node(0)'s GFP_KERNEL

(A) Node(0) ZONE_NORMAL -> Node(0) ZONE_DMA -> Node(1) ZONE_NORMAL
(B) Node(0) ZONE_NORMAL -> Node(1) ZONE_NORMAL -> Node(0) ZONE_DMA.

Type(A) offers the best locality for processes on Node(0), but ZONE_DMA will be used before ZONE_NORMAL exhaustion. This increases possibility of out-of-memory(OOM) of ZONE_DMA because ZONE_DMA is tend to be small.

Type(B) cannot offer the best locality but is more robust against OOM of the DMA zone.

Type(A) is called as "Node" order. Type (B) is "Zone" order.

"Node order" orders the zonelists by node, then by zone within each node. Specify "[Nn]ode" for node order

"Zone Order" orders the zonelists by zone type, then by node within each zone. Specify "[Zz]one" for zone order.

Specify "[Dd]efault" to request automatic configuration. Autoconfiguration will select "node" order in following case.
(1) if the DMA zone does not exist or
(2) if the DMA zone comprises greater than 50% of the available memory or
(3) if any node's DMA zone comprises greater than 60% of its local memory and the amount of local memory is big enough.

Otherwise, "zone" order will be selected. Default order is recommended unless this is causing problems for your system/application.

==============================================================

oom_dump_tasks

Enables a system-wide task dump (excluding kernel threads) to be produced when the kernel performs an OOM-killing and includes such information as pid, uid, tgid, vm size, rss, cpu, oom_adj score, and name. This is helpful to determine why the OOM killer was invoked and to identify the rogue task that caused it.

If this is set to zero, this information is suppressed. On very large systems with thousands of tasks it may not be feasible to dump the memory state information for each one. Such systems should not be forced to incur a performance penalty in OOM conditions when the information may not be desired.

If this is set to non-zero, this information is shown whenever the OOM killer actually kills a memory-hogging task.

The default value is 1 (enabled).

==============================================================

oom_kill_allocating_task

This enables or disables killing the OOM-triggering task in out-of-memory situations.

If this is set to zero, the OOM killer will scan through the entire tasklist and select a task based on heuristics to kill. This normally selects a rogue memory-hogging task that frees up a large amount of memory when killed.

If this is set to non-zero, the OOM killer simply kills the task that triggered the out-of-memory condition. This avoids the expensive tasklist scan.

If panic_on_oom is selected, it takes precedence over whatever value is used in oom_kill_allocating_task.

The default value is 0.

==============================================================

overcommit_memory:

This value contains a flag that enables memory overcommitment.

When this flag is 0, the kernel attempts to estimate the amount of free memory left when userspace requests more memory.

When this flag is 1, the kernel pretends there is always enough memory until it actually runs out.

When this flag is 2, the kernel uses a "never overcommit" policy that attempts to prevent any overcommit of memory.

This feature can be very useful because there are a lot of programs that malloc() huge amounts of memory "just-in-case" and don't use much of it.

The default value is 0.

See Documentation/vm/overcommit-accounting and security/commoncap.c::cap_vm_enough_memory() for more information.

==============================================================

overcommit_ratio:

When overcommit_memory is set to 2, the committed address space is not permitted to exceed swap plus this percentage of physical RAM. See above.

==============================================================

page-cluster

page-cluster controls the number of pages which are written to swap in a single attempt. The swap I/O size.

It is a logarithmic value - setting it to zero means "1 page", setting it to 1 means "2 pages", setting it to 2 means "4 pages", etc.

The default value is three (eight pages at a time). There may be some small benefits in tuning this to a different value if your workload is swap-intensive.

==============================================================

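
The arithmetic in the overcommit_ratio and page-cluster entries above can be written out as a short Python sketch. The function names are invented for this illustration, the sizes are example figures rather than values read from a real system, and the ratio of 50 is simply an example input.

```python
def commit_limit_kb(ram_kb, swap_kb, overcommit_ratio):
    """Swap plus overcommit_ratio percent of physical RAM, in kB."""
    return swap_kb + ram_kb * overcommit_ratio // 100


def swap_cluster_pages(page_cluster):
    """page-cluster is a base-2 logarithm: 0 -> 1 page, 1 -> 2 pages, ..."""
    return 1 << page_cluster


# Example figures only: 1504 MB of RAM and 2047 MB of swap, expressed in kB.
print(commit_limit_kb(1504 * 1024, 2047 * 1024, 50), "kB may be committed")
print(swap_cluster_pages(3), "pages per swap attempt at page-cluster=3")
```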
panic_on_oom

This enables or disables panic on out-of-memory feature.

If this is set to 0, the kernel will kill some rogue process, called oom_killer. Usually, oom_killer can kill rogue processes and system will survive.

If this is set to 1, the kernel panics when out-of-memory happens. However, if a process limits using nodes by mempolicy/cpusets, and those nodes become memory exhaustion status, one process may be killed by oom-killer. No panic occurs in this case. Because other nodes' memory may be free. This means system total status may be not fatal yet.

If this is set to 2, the kernel panics compulsorily even on the above-mentioned. Even oom happens under memory cgroup, the whole system panics.

The default value is 0.
1 and 2 are for failover of clustering. Please select either according to your policy of failover.
panic_on_oom=2+kdump gives you very strong tool to investigate why oom happens. You can get snapshot.

==============================================================

percpu_pagelist_fraction

This is the fraction of pages at most (high mark pcp->high) in each zone that are allocated for each per cpu page list. The min value for this is 8. It means that we don't allow more than 1/8th of pages in each zone to be allocated in any single per_cpu_pagelist. This entry only changes the value of hot per cpu pagelists. User can specify a number like 100 to allocate 1/100th of each zone to each per cpu page list.

The batch value of each per cpu pagelist is also updated as a result. It is set to pcp->high/4. The upper limit of batch is (PAGE_SHIFT * 8)

The initial value is zero. Kernel does not use this value at boot time to set the high water marks for each per cpu page list.

==============================================================

stat_interval

The time interval between which vm statistics are updated. The default is 1 second.

==============================================================

swappiness

This control is used to define how aggressively the kernel will swap memory pages. Higher values will increase aggressiveness, lower values decrease the amount of swap.

The default value is 60.

==============================================================

vfs_cache_pressure
------------------

Controls the tendency of the kernel to reclaim the memory which is used for caching of directory and inode objects.

At the default value of vfs_cache_pressure=100 the kernel will attempt to reclaim dentries and inodes at a "fair" rate with respect to pagecache and swapcache reclaim. Decreasing vfs_cache_pressure causes the kernel to prefer to retain dentry and inode caches. When vfs_cache_pressure=0, the kernel will never reclaim dentries and inodes due to memory pressure and this can easily lead to out-of-memory conditions. Increasing vfs_cache_pressure beyond 100 causes the kernel to prefer to reclaim dentries and inodes.

==============================================================

zone_reclaim_mode:

Zone_reclaim_mode allows someone to set more or less aggressive approaches to reclaim memory when a zone runs out of memory. If it is set to zero then no zone reclaim occurs. Allocations will be satisfied from other zones / nodes in the system.

This value is a bitwise OR of:

1 = Zone reclaim on
2 = Zone reclaim writes dirty pages out
4 = Zone reclaim swaps pages

zone_reclaim_mode is set during bootup to 1 if it is determined that pages from remote zones will cause a measurable performance reduction. The page allocator will then reclaim easily reusable pages (those page cache pages that are currently not used) before allocating off node pages.

It may be beneficial to switch off zone reclaim if the system is used for a file server and all of memory should be used for caching files from disk. In that case the caching effect is more important than data locality.

Allowing zone reclaim to write out pages stops processes that are writing large amounts of data from dirtying pages on other nodes. Zone reclaim will write out dirty pages if a zone fills up and so effectively throttle the process. This may decrease the performance of a single process since it cannot use all of system memory to buffer the outgoing writes anymore but it preserves the memory on other nodes so that the performance of other processes running on other nodes will not be affected.

Allowing regular swap effectively restricts allocations to the local node unless explicitly overridden by memory policies or cpuset configurations.
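
Since zone_reclaim_mode is a bitmask, it can help to see a value broken into its parts. The following is only a small sketch: the function and dictionary names are made up for illustration, and the bit meanings are copied from the list in the text.

```python
# Bit meanings as listed in the zone_reclaim_mode description.
ZONE_RECLAIM_FLAGS = {
    1: "zone reclaim on",
    2: "zone reclaim writes dirty pages out",
    4: "zone reclaim swaps pages",
}


def decode_zone_reclaim_mode(value):
    """Return the behaviours enabled by a zone_reclaim_mode value."""
    return [name for bit, name in sorted(ZONE_RECLAIM_FLAGS.items()) if value & bit]


for mode in (0, 1, 3, 7):
    print(mode, decode_zone_reclaim_mode(mode))
```

A value of 3, for example, combines bits 1 and 2: zone reclaim is on and it is allowed to write out dirty pages, but not to swap.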





Monday, 29 August 2011

Configuring Multi-Path I/O for AIX client logical partitions


Scenario: Configuring Multi-Path I/O for AIX client logical partitions

Multi-Path I/O (MPIO) helps provide increased availability of virtual SCSI resources by providing redundant paths to the resource. This topic describes how to set up Multi-Path I/O for AIX® client logical partitions.
In order to provide MPIO to AIX client logical partitions, you must have two Virtual I/O Server logical partitions configured on your system. This procedure assumes that the disks are already allocated to both the Virtual I/O Server logical partitions involved in this configuration.
To configure MPIO, follow these steps. In this scenario, hdisk5 in the first Virtual I/O Server logical partition, and hdisk7 in the second Virtual I/O Server logical partition, are used in the configuration.
The following figure shows the configuration that will be completed during this scenario.
An illustration of an MPIO configuration with two Virtual I/O Server logical partitions.
Using the preceding figure as a guide, follow these steps:
  1. Using the HMC, create SCSI server adapters on the two Virtual I/O Server logical partitions.
  2. Using the HMC, create two virtual client SCSI adapters on the client logical partitions, each mapping to one of the Virtual I/O Server logical partitions.
  3. On either of the Virtual I/O Server logical partitions, determine which disks are available by typing lsdev -type disk. Your results look similar to the following:
    name            status     description
    
    hdisk3          Available  MPIO Other FC SCSI Disk Drive
    hdisk4          Available  MPIO Other FC SCSI Disk Drive
    hdisk5          Available  MPIO Other FC SCSI Disk Drive
    Select the disk that you want to use in the MPIO configuration. In this scenario, we selected hdisk5.
  4. Determine the ID of the disk that you have selected. For instructions, see Identifying exportable disks. In this scenario, the disk does not have an IEEE volume attribute identifier or a unique identifier (UDID), so we determine the physical identifier (PVID) by running the lspv hdisk5 command. Your results look similar to the following:
    hdisk5          00c3e35ca560f919                    None
    The second value is the PVID. In this scenario, the PVID is 00c3e35ca560f919. Note this value.
  5. List the attributes of the disk using the lsdev command. In this scenario, we typed lsdev -dev hdisk5 -attr. Your results look similar to the following:
    ..
    lun_id          0x5463000000000000               Logical Unit Number ID           False
    ..
    ..
    pvid            00c3e35ca560f9190000000000000000 Physical volume identifier       False
    ..
    reserve_policy  single_path                      Reserve Policy                   True
    Note the values for lun_id and reserve_policy. If the reserve_policy attribute is set to anything other than no_reserve, then you must change it. Set the reserve_policy to no_reserve by typing chdev -dev hdiskx -attr reserve_policy=no_reserve.
  6. On the second Virtual I/O Server logical partition, list the physical volumes by typing lspv. In the output, locate the disk that has the same PVID as the disk identified previously. In this scenario, the PVID for hdisk7 matched:
    hdisk7          00c3e35ca560f919                    None
    Tip: Although the PVID values should be identical, the disk numbers on the two Virtual I/O Server logical partitions might vary.
  7. Determine if the reserve_policy attribute is set to no_reserve using the lsdev command. In this scenario, we typed lsdev -dev hdisk7 -attr. You see results similar to the following:
    ..
    lun_id          0x5463000000000000               Logical Unit Number ID           False
    ..
    pvid            00c3e35ca560f9190000000000000000 Physical volume identifier       False
    ..
    reserve_policy  single_path                      Reserve Policy                   
    If the reserve_policy attribute is set to anything other than no_reserve, you must change it. Set the reserve_policy to no_reserve by typing chdev -dev hdiskx -attr reserve_policy=no_reserve.
  8. On both Virtual I/O Server logical partitions, use the mkvdev command to create the virtual devices. In each case, use the appropriate hdisk value. In this scenario, we typed the following commands:
    • On the first Virtual I/O Server logical partition, we typed mkvdev -vdev hdisk5 -vadapter vhost5 -dev vhdisk5
    • On the second Virtual I/O Server logical partition, we typed mkvdev -vdev hdisk7 -vadapter vhost7 -dev vhdisk7
    The same LUN is now exported to the client logical partition from both Virtual I/O Server logical partitions.
  9. AIX can now be installed on the client logical partition. 
  10. After you have installed AIX on the client logical partition, check for MPIO by running the following command:
    lspath
    You see results similar to the following:
    Enabled hdisk0 vscsi0
    Enabled hdisk0 vscsi1
    If one of the Virtual I/O Server logical partitions fails, the results of the lspath command look similar to the following:
    Failed  hdisk0 vscsi0
    Enabled hdisk0 vscsi1
    Unless a health check is enabled, the state continues to show Failed even after the disk has recovered. To have the state updated automatically, type chdev -l hdiskx -a hcheck_interval=60 -P. The client logical partition must be rebooted for this change to take effect.
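
If you want to check the lspath result from a script rather than by eye, something like the following sketch could work. It assumes the three-column layout shown in step 10 (status, device, parent adapter); the function name is hypothetical, and the sample text is simply the failed-path output from above.

```python
def non_enabled_paths(lspath_output):
    """Return (device, parent) pairs whose path status is not 'Enabled'."""
    problems = []
    for line in lspath_output.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        status, device, parent = fields
        if status != "Enabled":
            problems.append((device, parent))
    return problems


# Sample copied from the failed-path example in step 10.
SAMPLE = """\
Failed  hdisk0 vscsi0
Enabled hdisk0 vscsi1
"""

print(non_enabled_paths(SAMPLE))
```

An empty list means every path the command reported was Enabled.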

rootvg: Creating a mksysb backup to tape

Creating a mksysb backup to tape


Question
Mksysb related questions / how to create and restore.
Answer
This document discusses the ‘mksysb’ command when run against a tape drive (rmt device).

What is a mksysb and why create one ?
Mksysb tape structure
Files important to the mksysb
Important information concerning mksysb flags
Creating a mksysb to a tape drive in AIX V5
Creating a mksysb to a tape drive in AIX V6
Verification of a mksysb
Restoring a mksysb
Restore menus
Restoring individual files or directories from a mksysb tape
FAQ


*Note : For all examples the tape drive will be referred to as /dev/rmt0. This may not be the case in your environment. Simply substitute the correct tape drive number as needed. Furthermore, this document does not cover restoring mksysb images to systems other than the one it was taken from (cloning).

What is a mksysb and why create one ?
A mksysb is a bootable backup of your root volume group. The mksysb process will back up all mounted JFS and JFS2 filesystem data. The file-system image is in backup-file format. The tape format includes a boot image, system/rootvg informational files, an empty table of contents, followed by the system backup (root volume group) image. The root volume group image is in backup-file format, starting with the data files and then any optional map files.

When a bootable backup of a root volume group is created, the boot image reflects the currently running kernel. If the current kernel is the 64-bit kernel, the backup's boot image is also 64-bit, and it only boots 64-bit systems. If the current kernel is a 32-bit kernel, the backup's boot image is 32-bit, and it can boot both 32-bit and 64-bit systems.

In general the mksysb backup is the standard backup utility used to recover a system from an unusable state - whether that be a result of data corruption, a disk failure, or any other situation that leaves you in an unbootable state. You should create a mksysb backup on a schedule in line with how often your rootvg data changes, and always before any sort of system software upgrade.

A mksysb tape can also be used to boot a system into maintenance mode for work on the rootvg in cases where the system cannot boot into normal mode.

Mksysb tape structure

When creating a mksysb to tape, 4 images are created in total.

+---------------------------------------------------------+
| Bosboot   | Mkinsttape   | Dummy TOC   | rootvg         |
| Image     | Image        | Image       | data           |
|-----------+--------------+-------------+----------------|
|<----------- Block size 512 ----------->| Blksz defined  |
|                                        | by the device  |
+---------------------------------------------------------+

Image #1:
The bosboot image contains a copy of the system's kernel and specific device drivers, allowing the user to boot from this tape.
blocksize: 512
format: raw image
files: kernel, device drivers

Image #2:
The mkinsttape image contains files to be loaded into the RAM file system when you are booting in maintenance.
blocksize: 512
format: backbyname
files: ./image.data, ./tapeblksz, ./bosinst.data and other commands required to initiate the restore.

Image #3:
The dummy image contains a single file containing the words "dummy toc". This image is used to make the mksysb tape contain the same number of images as a BOS Install tape. This is merely a reference to pre-AIX V4 days when AIX was installed from tape.

Image #4:
The rootvg image contains data from the rootvg volume group (mounted JFS/JFS2 file systems only).
blocksize: determined by tape drive configuration on creation
format: backbyname (backup/restore)
files: rootvg, mounted JFS/JFS2 filesystems

WARNING: If the device blocksize is set to 0, mksysb will use a hardcoded value of 512 for the fourth image. This can cause the create and restore to take 5-10 times longer than expected. You should set your tape drive’s block size to the recommended value for optimal performance.

Files important to the mksysb

There are a few files that the mksysb uses in order to successfully rebuild your rootvg environment. These files are located on the 2nd image of your mksysb tape. Three of the files you may find yourself working with are described below.

bosinst.data : This file can be used to pre-set the BOS menu options. Selections such as which disk to install to, kernel settings, and whether or not to recover TCP related information can all be set here.
This file is mainly used for non-prompted installations. Any option selected during a prompted install will override the corresponding setting in this file. 

image.data : This file is responsible for holding information used to rebuild the rootvg structure before the data is restored. This information includes the sizes, names, maps, and mount points of logical volumes and file systems in the root volume group. It is extremely important that this file is up to date and correct, otherwise the restore can fail. It is common to edit this file when it is necessary to break mirroring during a restore.

tapeblksz : This is a small text file that indicates the block size the tape drive was changed to in order to write the 4th image of the mksysb. This information would be useful if you wanted to restore individual files/directories from your mksysb image.

Important information concerning mksysb flags

It is very important that you understand the use and intent of a few of the flags used by the mksysb command. Improper use, lack of use, or use of certain flags in certain situations could cause your mksysb to be difficult to restore. In some cases it may cause your mksysb to be unrestorable.

-i : Calls the ‘mkszfile’ command, which updates the image.data file with current filesystem sizes and characteristics. This flag should always be used unless there is a very specific reason you do not wish to have this information updated. Failure to have an accurate image.data file can cause your mksysb restore to fail with “out of space” errors. 

-e : Allows you to exclude data by editing the /etc/exclude.rootvg file.

A few tips on excluding data from your mksysb are listed below :

There should be one entry per line of the file. It can be either a single file or directory name.
The correct format of each entry should be ^./<path>
Never use wildcards.
Do not leave extra spaces or blank lines in the file. 

While the /etc/exclude.rootvg file excludes data, bear in mind that it does not exclude the fact that a filesystem exists. For example if you have a 50Gig filesystem “/data” and add an entry in your /etc/exclude.rootvg file :
^./data

This will exclude all files in /data but it will still recreate the /data filesystem as a 50Gig filesystem (except it will now be empty).
The only way to truly exclude a filesystem from your mksysb would be to unmount the filesystem before initiating your mksysb.
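
The tips for /etc/exclude.rootvg are easy to get wrong by hand, so here is a rough checker as a sketch. It encodes only one reading of the tips above (each entry is ^./ followed by a path, with no wildcards, stray spaces or blank lines); the function name and the regular expression are my own assumptions, not anything mksysb itself uses.

```python
import re

# Assumed reading of the tips: "^./" followed by a path, with no
# whitespace and none of the wildcard characters * ? [ ].
ENTRY_RE = re.compile(r"^\^\./[^\s*?\[\]]+$")


def check_exclude_lines(lines):
    """Return (line_number, problem) pairs for entries that break the tips."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if line == "":
            problems.append((number, "blank line"))
        elif line != line.strip():
            problems.append((number, "leading or trailing space"))
        elif not ENTRY_RE.match(line):
            problems.append((number, "not of the form ^./<path>, or contains a wildcard"))
    return problems


print(check_exclude_lines(["^./data", "", "^./tmp/*", " ^./home"]))
```

An empty result only means the file follows the formatting tips; it says nothing about whether the excluded paths exist.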

-p : Using this flag disables the software compression algorithms.

When creating a mksysb during any level of system activity it is recommended to use the “-p” flag. Failure to do so can cause “unpacking / file out of phase” errors during your mksysb restore.

These errors are fatal (unrecoverable) errors. No warning is given during the creation of the mksysb that notifies you of the possibility of having these errors during the restore.

You may want to make the “-p” flag compulsory when running your mksysb command so you do not run into this situation.

-X : This flag will cause the system to automatically expand the /tmp filesystem if necessary. The /tmp filesystem will require approximately 32 MB of free space.

For more information about these and other mksysb command flags, please refer to the mksysb man page.

Creating a mksysb to a tape drive in AIX V5

1. Using SMITTY :
# smitty mksysb

Backup DEVICE or FILE.........................[/dev/rmt0]
Create MAP files?.............................no
EXCLUDE files?................................no (-e)
List files as they are backed up?.............no
Verify readability if tape device?............no
Generate new /image.data file?................yes (-i)
EXPAND /tmp if needed?........................no (-X)
Disable software packing of backup?...........no (-p)
Backup extended attributes?...................yes
Number of BLOCKS to write in a single output..[]

The only required selection here would be the tape drive to use for the backup. Default flags are listed above. Change flags as necessary for your environment / situation.
*Please refer to the section above entitled “Important information concerning mksysb flags”.

2. From command line :
# mksysb -i /dev/rmt0

This command reflects the options listed in the above “smitty mksysb” output. This does not take into account any customization flags. Please review the section above entitled “Important information concerning mksysb flags” to be best informed concerning the flags that you should use.

Creating a mksysb to a tape drive in AIX V6

1. Using SMITTY :
# smitty mksysb

Backup DEVICE or FILE.........................[/dev/rmt0]
Create MAP files?.............................no
EXCLUDE files?................................no (-e)
List files as they are backed up?.............no
Verify readability if tape device?............no
Generate new /image.data file?................yes (-i)
EXPAND /tmp if needed?........................no (-X)
Disable software packing of backup?...........no (-p)
Backup extended attributes?...................yes
Number of BLOCKS to write in a single output..[]
Location of existing mksysb image.............[]
File system to use for temporary work space...[]
Backup encrypted files?.......................yes
Back up DMAPI filesystem files?...............yes

The only required selection here would be the tape drive to use for the backup. Default flags are listed above. Change flags as necessary for your environment / situation.
*Please refer to the section above entitled “Important information concerning mksysb flags”.

There are a few extra options with V6 mksysb using SMIT. The most notable is the option “Location of existing mksysb image”. You can now use an existing mksysb taken to file and copy that to tape. An attempt will be made to make the tape a bootable tape. You should use a system at the same or higher technology level as the mksysb image if you choose to do this. The command line flag would be “-F”.

This does require a minimum of 100 MB free in /tmp. See the manpage for further information. This flag was introduced as a command line option in AIX V5 (5300-05).

2. From command line :
# mksysb -i /dev/rmt0

This command reflects the options listed in the above “smitty mksysb” output. This does not take into account any customization flags. Please review the section above entitled “Important information concerning mksysb flags” to be best informed concerning the flags that you should use.

Verification of a mksysb

There is no true verification of the “restorability” of a mksysb other than actually restoring it. Taking precautions such as understanding the flags used for the creation of the mksysb, checking your error report for any tape drive related errors before running the mksysb, regular cleaning of the tape drive, and verifying the readability of the mksysb after creation are all good checks. If your system is in good health your mksysb should be in good health. Similarly, if you attempt to create a mksysb of a system logging hundreds of disk errors, or a system with known filesystem corruption, your mksysb will likely retain that corruption.

To verify the readability of your backup run the following command :
# listvgbackup -Vf /dev/rmt0

Any errors that occur while reading the headers of any of the files will be displayed, otherwise only the initial backup header information will be displayed. Keep in mind that this check tests the readability of the file only, not the writeability.

Restoring a mksysb

To restore a mksysb image you simply need to boot from the tape and verify your selections in the BOS menus. Next, we’ll cover two booting scenarios: one in which your system is currently up and operational, and one in which your system is down.

1. If your system is currently running and you need to restore your mksysb, simply change the bootlist to reflect the tape drive and reboot the system.
# bootlist -m normal rmt0 
# shutdown -Fr 
 
2. If your system is in a down state you should boot to the SMS menus and set your bootlist to reflect the tape drive. The SMS menu options are listed below. Your menu options may be different (depending on your level of firmware), however it should be clear enough by following this document to figure out what options should be chosen if yours differ.

SMS - SYSTEM MANAGEMENT SERVICES -
    1. Select Language
    2. Change Password Options
    3. View Error Log
    4. Setup Remote IPL (RIPL (Remote Initial Program Load))
    5. Change SCSI Settings
    6. Select Console
--> 7. Select Boot Options

The next menu should come up :
--> 1. Select Install or Boot Device
    2. Configure Boot Device Order
    3. Multiboot Startup

The next menu will have the following :
Select Device Type :
    1. Diskette
    2. Tape
    3. CD/DVD
    4. IDE
    5. Hard Drive
    6. Network
--> 7. List all Devices

The system will scan itself to determine which devices are available to boot from. All of your available boot devices will be displayed here. This menu can be a little tricky. If you have a device pre-selected it will have a 1 next to it under the “Current Position” column. Use the “Select Device Number” listing to choose the device you want to boot from to change that.

The next screen will offer you three choices :
    1. Information
--> 2. Normal Mode Boot
    3. Service Mode Boot

Restore menus

I. From the Installation and Maintenance Menu, select (2):
1) Start Installation Now with Default Settings
2) Change/Show Installation Settings and Install
3) Start Maintenance Mode for System Recovery

II. From the System Backup Installation and Settings, you’ll see the default options that are taken from your “bosinst.data” file. If these are correct select (0) and skip down to step 6 below.
If you need to change any options such as the disks you would like to install to, select (1):

Setting:                                   Current Choice(s):
1. Disk(s) where you want to install...    hdisk0
   Use Maps............................    No
2. Shrink File Systems.................    No
0. Install with the settings listed above.

To shrink the file systems to reclaim allocated free space, select option 2 so the setting is set to Yes. For the file systems to be restored with the same allocated space as the original system, make sure option 2 is set to No.

III. Change Disk(s) Where You Want to Install.
Type one or more numbers for the disks to be used for installation and press Enter.
The current choice is indicated by >>>. To deselect a choice, type the corresponding number and press Enter. At least one bootable disk must be selected.
Choose the location by its SCSI ID.

       Name     Location Code   Size (MB)   VG Status   Bootable
>>> 1. hdisk0   00-01-00-0,0    70008       rootvg      yes
>>> 2. hdisk1   00-01-00-1,0    70008       rootvg      yes
    0. Continue with the choices indicated above

After the desired disks have been chosen, select (0) to continue.

IV. System Backup Installation and Settings, select (0) to continue.

Setting:                                        Current Choice(s):
1. Disk(s) where you want to install.........   hdisk0...
2. Use Maps..................................   No
3. Shrink File Systems.......................   No
0. Install with the settings listed above.

Restoring individual files or directories from a mksysb tape

You may at some point need to restore a file, several files, or directories from your mksysb. You’ll need to first find the block size the rootvg data was written at (4th image). Files will be restored relative to your current location on the system when the restore command is executed. If you would like the files to return to their original location run the restore command (step 3) from /, otherwise cd down to the path you wish the file(s) to be restored.

1. Display the contents of the ./tapeblksz file on the mksysb to determine the correct block size the tape drive should be set to for the restore command.

# cd /tmp
# tctl -f /dev/rmt0 rewind
# chdev -l rmt0 -a block_size=512
# restore -s2 -xqvdf /dev/rmt0.1 ./tapeblksz
# cat ./tapeblksz

The output that is given is the blocksize to which the tape drive was set when the mksysb was made.

2. Next, set the blocksize of the tape drive accordingly by running the following command :
# chdev -l rmt0 -a block_size=<number in the ./tapeblksz file>

3. Restore the files or directories by running the following commands :

# cd / (if the file is to be restored to its original place)
# tctl -f /dev/rmt0 rewind
# restore -s4 -xqdvf /dev/rmt0.1 ./<pathname>

You can specify multiple <pathname> entries for multiple file(s)/directory structures to restore. Simply separate each entry with a space. Remember to always use a “./” before each pathname.

**As an alternative you can also use the 'restorevgfiles' command. In the interest of keeping this document "relatively" short, no further examples will be given. Please see the manpage for use of this command.
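
As a sketch of steps 2 and 3, the following helper only assembles the command lines as text so they can be reviewed before anyone types them; it does not run anything. The function name and the default drive name rmt0 are assumptions for illustration, while the commands and flags themselves are the ones shown in the steps above.

```python
def restore_commands(block_size, paths, drive="rmt0"):
    """Build the command lines from steps 2 and 3 for the given paths."""
    # Every pathname must be written relative to "./", as the text notes.
    relative = [p if p.startswith("./") else "./" + p.lstrip("/") for p in paths]
    return [
        f"chdev -l {drive} -a block_size={block_size}",
        f"tctl -f /dev/{drive} rewind",
        f"restore -s4 -xqdvf /dev/{drive}.1 " + " ".join(relative),
    ]


for command in restore_commands(1024, ["/etc/passwd", "./etc/group"]):
    print("#", command)
```

The block size passed in is whatever step 1 reported from ./tapeblksz; 1024 here is just an example value.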

FAQ

This section is included to provide answers to common questions asked concerning mksysb. This section is not intended to diagnose any problem or perform any problem determination. These questions/answers are intended to hopefully prevent the need to call up and open a problem ticket for a short duration / short answer question. If you have any questions that you feel might be helpful, please submit feedback on this document and it may be added.

1. The rootvg on my mksysb tape has all JFS filesystems, and I’d like to change them to JFS2 filesystems. How can I do this ?
The only supported method of changing rootvg system filesystems from JFS to JFS2 would be to run a “New and Complete Overwrite” installation.

2. Does the mksysb command backup nfs mountpoints ? 
No, nfs mountpoints are not followed.

3. Will my non-root volume groups automatically mount after the restore completes ?
That volume group setting is held in the VGDA on the disks that make up the volume group. There is a new option that will allow this to be set in the BOS menus, so this should no longer be an issue.

4. The document mentions I can restore files from my mksysb.
Are there any restrictions to what I should/should not restore ?
Absolutely. You do not want to restore any files that are critical to the system running.
Examples of files you do not want to restore: most library files, ODM files, applications, the kernel...
Examples of files safe to restore : /etc/group, /etc/passwd, cron related files, /home, any data filesystems you created....

5. How long will my mksysb take to restore ?
That is dependent on many factors - the amount of data that needs to be restored being the major player in the restore time. A ballpark rule of thumb would be 1.5 - 2x the time it took to create the mksysb. You also have to consider reboot time.

6. The restore appears to be hung at 83%, what do I do ?
First you want to make sure this is a “true” hang. This point in the restore can take anywhere from 10 minutes to even upwards of 60 minutes depending on the size of the rootvg. Make sure you’ve given it ample time to bypass this portion of the restore before becoming concerned.

7. I have a mksysb tape but I don’t know anything about it. Are there any commands that I can run to get information about the rootvg it contains ?
There are some very helpful ‘lsmksysb’ commands that can provide all sorts of information. Some of the things you can find out :
- the ‘lslpp -L’ output to see what filesets are installed on that rootvg
- ‘lsvg -l rootvg’ output
- volume group information and oslevel
- total backup size and size of volume group if shrunk to minimum

rootvg: Replace a failing PV from a non-root VG

Replace a failing PV from a non-root VG

Steps required to replace a failing PV from a non-root VG
  1. Back up all filesystems found on the failing PV.
  2. lspv -l <PV> (lspv -l hdisk2) To determine which filesystem(s) are found on the PV in question.
  3. Find out how the LVs are laid out on the PV so you will know how to re-create them when the time comes.
  4. lslv -m <LV> (lslv -m oracle) You will use the output of the lslv command to serve as a template for creating a map file for this LV later on.
  5. Unmount all filesystems on that PV. umount <FILESYSTEM_NAME> to unmount each filesystem from the PV in question.
  6. Remove all LVs found on that PV. rmlv <LV> (rmlv oracle)
  7. Remove the questionable PV from the volume group. reducevg <VG> <PV>
  8. Remove the PV entry from the ODM database. rmdev -l <PV_NAME> -d (rmdev -l hdisk2 -d)
  9. Shut down the system: shutdown -F
  10. Remove the bad PV and install the new PV.
  11. Add the new PV to the VG in question. extendvg <VG> <PV> (e.g., extendvg datavg hdisk2)
  12. Re-create the LVs removed from the OLD PV on the NEW PV.
    mklv -y<LV> -m<MAP_FILE> <VG> <PP_NUM> <PV> (mklv -y oracle -m oracle.map oraclevg 200 hdisk2)
    The map file is assembled from the output generated by the lslv -m command in step 4 above. Do this for each LV that existed on the removed PV.
  13. Re-create the filesystems on the new PV. mkfs /dev/<LV> (mkfs /dev/oracle) Do this for each file system that existed on the removed PV.
  14. Perform a filesystem check before mounting it. fsck -f /dev/<LV> (fsck -f /dev/oracle)
  15. Mount all filesystems on that PV. mount <FILESYSTEM_NAME> (mount /oracle)
  16. Now restore the data you backed up.
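
Turning the lslv -m output from step 4 into a map file for step 12 is tedious by hand, so here is a rough sketch of how it might be scripted. Two things are assumptions on my part and should be checked against the lslv and mklv man pages on your system: the column layout of the sample lslv -m output below (LP number, then PP number and PV name per copy), and the map file line format PVname:PPnum. The function name is hypothetical and only the first copy of each partition is read.

```python
# Assumed shape of "lslv -m oracle" output; verify on your own system.
SAMPLE_LSLV = """\
oracle:/oracle
LP    PP1  PV1               PP2  PV2               PP3  PV3
0001  0110 hdisk2
0002  0111 hdisk2
0003  0112 hdisk2
"""


def map_lines_from_lslv(lslv_output, new_pv=None):
    """Build map file lines (assumed format PVname:PPnum) from lslv -m output."""
    lines = []
    for row in lslv_output.splitlines():
        fields = row.split()
        # Data rows start with a numeric logical partition number.
        if len(fields) < 3 or not fields[0].isdigit():
            continue
        pp, pv = int(fields[1]), fields[2]
        # Optionally point the map at a differently named replacement disk.
        lines.append(f"{new_pv or pv}:{pp}")
    return lines


print("\n".join(map_lines_from_lslv(SAMPLE_LSLV)))
```

If the replacement disk comes up under a different hdisk name, pass it as new_pv so the map refers to the new PV.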

Friday, 26 August 2011

rootvg: Test 000-104: AIX 6.1 Administration

Test 000-104: AIX 6.1 Administration




What you see in "Red" are the objectives of the exam defined by IBM:

http://www-03.ibm.com/certify/tests/obj104.shtml

I have tried to add notes to each item to some extent. This is not a replacement for IBM documents or
courses, but can be used as a wrap-up for the exam or as a reference for some admin tasks. The
document was not intended for public use in the first place, which is why you will find typos,
formatting or other problems in it. Hope these notes help you pass the exam with a better score :)

Note: References are mostly IBM redbooks, man pages and other freely-available IBM web resources.


Backup and Recovery (5%)


a. Recover from a lost root password
1. Boot the LPAR from AIX media, mksysb tape or NIM server. The boot resource should
have the same version and TL as the system you want to recover. For example, an AIX
6.1 with TL6 cannot be recovered by AIX 6.1 TL2 media or NIM resource.
2. Choose Start Maintenance Mode for System Recovery.
3. Select Access a Root Volume Group. A message displays explaining that you will not be
able to return to the Installation menus without rebooting if you change the root
volume group at this point.
4. Type 0 and press Enter.
5. Type the number of the appropriate volume group from the list and press Enter.
6. Select Access this Volume Group and start a shell by typing 1 and press Enter.
7. At the # (number sign) prompt, type the passwd command to reset the root password. For example:
# passwd
Changing password for "root"
root's New password:
Enter the new password again:
8. To write everything from the buffer to the hard disk and reboot the system, type the
following:
# sync;sync;sync;reboot

b. Backup AIX OS and data using AIX commands (mksysb, mkcd, tar, backup, etc)
mksysb:
Backup to tape (Note: Not all tape drives are bootable!):
# mksysb -iXV /dev/rmt0

Backup to filesystem (the filesystem path can be local or NFS-mounted):
# mksysb -iX /backups/mksysb31Mar2011.mksysb

Backup a client from the NIM server (Note: /mksysbs in the following command should be NFS
exported to testlpar):
# nim -o define -t mksysb -a server=master -a location=/mksysbs/testlpar31Mar2011.mksysb \
    -a source=testlpar -a mk_image=yes -a mksysb_flags=XeA testlpar_31Mar2011_mksysb

Check the NIM resource in NIM server:
# lsnim -t mksysb

testlpar_31Mar2011_mksysb resources mksysb
Note: mksysb only backs up files and directories in rootvg that are mounted.


There are other methods to clone an AIX system:
o Alternate Disk Install
o Tivoli Sysback
o Taking mirror disks of rootvg to another system!
o And probably more...
- A mksysb image can be extracted from tape to be used in a NIM server:
o First you should find the block size the tape was set to when the mksysb was created:

# chdev -l rmt0 -a block_size=512
# tctl -f /dev/rmt0 rewind
# restore -s2 -xqvf /dev/rmt0.1 ./tapeblksz
# cat tapeblksz
1024 NONE
It means the mksysb backup has been made using block size of 1024.
# chdev -l rmt0 -a block_size=1024
# tctl -f /dev/rmt0 rewind
# dd if=/dev/rmt0.1 of=/mksysbs/mksysb1 bs=1024 fskip=3
- It is possible to show information about a mksysb image:
# lsmksysb -lf /tmp/mksysbfile <-- this will show information about filesystems and OS
level of the image.
(Actually lsmksysb is a soft link to listvgbackup. It means you could use "listvgbackup -lf
/tmp/mksysbfile" instead of the above command as well)
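
To tie the tape extraction steps together, here is a small sketch that reads the block size out of the tapeblksz contents and builds the dd command line from it. It assumes the first field of the file is the block size, as in the "1024 NONE" example above; both function names are made up, and the command is only assembled as text, not executed.

```python
def parse_tapeblksz(text):
    """Return the block size from tapeblksz contents such as '1024 NONE'."""
    # Assumption: the block size is the first whitespace-separated field.
    return int(text.split()[0])


def extract_command(block_size, target, drive="rmt0"):
    """Build the dd command line used above to copy the 4th tape image."""
    return f"dd if=/dev/{drive}.1 of={target} bs={block_size} fskip=3"


size = parse_tapeblksz("1024 NONE\n")
print(extract_command(size, "/mksysbs/mksysb1"))
```

Remember that the tape drive itself still has to be set to the same block size with chdev and rewound before the dd is run.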
savevg and restvg:
- The volume group should be varied on and its filesystems should be mounted.
- This will back up testvg into a file called vgbackup1:
o # savevg -if /backups/vgbackup1 testvg
- In order to exclude files, edit /etc/exclude.testvg.
- If you destroy the volume group, it can be restored by restvg:
o # restvg -f /backups/vgbackup1 hdisk1
mkszfile and mkvgdata:
When you use the "-i" switch with mksysb and savevg, they call mkszfile and mkvgdata respectively.
This creates /image.data for rootvg, /tmp/vgdata/testvg/testvg.data for a user-created
volume group like testvg, and /tmp/wpardata/wpar1/image.data for a workload partition called
wpar1. If you need to change the characteristics of the restored volume group, edit the above
files and then run mksysb or savevg without the "-i" switch.
Note: /usr/bin/mkszfile is a shell script that has two aliases: mkvgdata and mkwpardata. The
script behaves differently depending on the name it was invoked by:
...
NAME=`/usr/bin/basename $0`
...
if [ $NAME = "mkszfile" ]
then
set -- `${getopt} XfmN $*` # mkszfile options
...
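The same "one program, several names" technique can be sketched outside the shell. The following is only an illustration of dispatching on the invoked name, as `basename $0` does in the excerpt above; the function names are hypothetical and nothing here reproduces the real mkszfile logic.

```python
import os

# The three names come from the note above; everything else is illustrative.
KNOWN_NAMES = {"mkszfile", "mkvgdata", "mkwpardata"}

def invoked_as(argv0):
    """Return the name the program was called by, like `basename $0`."""
    return os.path.basename(argv0)

def select_mode(argv0):
    """Pick a behaviour based on the invocation name (hypothetical helper)."""
    name = invoked_as(argv0)
    if name not in KNOWN_NAMES:
        raise ValueError("unknown invocation name: " + name)
    return name
```

Calling `select_mode("/usr/bin/mkvgdata")` returns `"mkvgdata"`, so a single script reached through links with different names can branch on the result.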
savewpar
savewpar cannot be used to create bootable tapes.
The command switches are very similar to savevg.
Example:
# savewpar -ief /backups/wpar1backup wpar1
Note:
How to exclude files from a volume group or WPAR backup:
- Create a file called /etc/exclude.rootvg, /etc/exclude.testvg or /etc/exclude.wpar1
- Put the patterns you would like to exclude in it:
^./home <== excludes the /home filesystem
testfs <== excludes any file or directory whose path matches the pattern "testfs" (as grep would find it)
- # mksysb -eX /mksysbs/newbackup
- # savevg -ief /backups/vgbackup1 testvg
- # savewpar -ief /backups/wpar1backup wpar1
Another way to exclude filesystems from a backup is to remove the filesystem and its associated
logical volume information from image.data (for rootvg or a workload partition) or from
testvg.data for a user-created volume group named testvg.
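To get a feel for how the two exclude patterns above select paths, here is a small sketch that applies them as regular expressions to a few made-up paths. It assumes the backup commands see paths relative to the root (starting with "./") and uses Python's `re` module as a stand-in for grep; the sample paths are hypothetical.

```python
import re

def excluded(path, patterns):
    """True if any pattern matches somewhere in the path (grep-like search)."""
    return any(re.search(pattern, path) for pattern in patterns)

# Patterns taken from the note above.
patterns = ["^./home", "testfs"]

# Hypothetical paths, written relative to the root as "./...".
paths = [
    "./home/salehi/notes.txt",
    "./data/testfs/a.dat",
    "./etc/hosts",
]

# Paths that would still be backed up.
kept = [p for p in paths if not excluded(p, patterns)]
```

With these inputs only `./etc/hosts` remains in `kept`. Note that the unanchored pattern `testfs` matches anywhere in a path, so it can exclude more than intended.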
mkcd / mkdvd
- Create multi-volume CDs from a mksysb, savevg, or savewpar backup image.
- Can generate a new backup or alternatively use an existing mksysb, savevg or savewpar image.
- Generate CD or DVD images
o Images can be burnt now
o Images can be saved for later use
- # mkdvd -d /dev/cd0 <== bootable rootvg backup
- # mkdvd -d /dev/cd0 -W wpar1
- # mkdvd -S -I /backups/ -C /backup -W wpar1 <== stops before burning and keeps the images in
/backups.
- # mkdvd -SI /backups -C /backups -v testvg
- There are many command switches; smit can be more convenient.
Note:
mkdvd is an alias to mkcd
tar
# tar -cvf /dev/rmt0 /data <== backs up the /data tree to the rmt0 tape
# tar -tvf /dev/rmt0 <== lists the table of contents
# tar -xvf /dev/rmt0 <== extracts (restores) /data
Note:
- When you use a relative path, be careful when you restore the backup. You should change to
the same directory before restoring it.
- tar can back up to a file:
o # tar -cvf /backups/newbackup.tar /data
- You can use tar without the dash character "-":
o # tar tvf /dev/rmt1
- You can backup many files and create a very big tar file, but each file cannot be bigger than
8GB. To dodge this problem you can use GNU tar. I have tested it with files of 80GB, and it
did not complain.
backup
Backup files by name:
- Use the "-i" flag.
- # find /home/Salehi | backup -ivqf /dev/rmt0
Backup filesystems by i-node:
- The filesystem needs to be unmounted.
- "backup -2" means level 2. If you use -u, it performs an incremental backup. "u" means
update /etc/dumpdates
- # backup -1 -u -f /dev/rmt0 /data
c. Restore AIX OS and data using AIX commands, including listing backup media contents (restvg,
restore, tar, etc)
To restore a mksysb tape, boot from it. If the tape is not bootable, boot from the AIX DVD
and then, in the SMS menus, restore the mksysb by selecting the tape drive:
Normal Mode Boot -> Yes -> Start Maintenance Mode for System Recovery -> Install from a
System Backup
restvg
# restvg -f /backups/vgbackup1 hdisk1
restore
- To show the contents of a backup:
o # restore -Tvqf /backups/mydata.bak
- To extract the mine directory and all its contents:
o # restore -xvqf /backups/mydata.bak /data/mine/
restwpar
# restwpar -f /backups/wpar1.bak -n wpar2 -d /newbasedir
System Initialization and Boot (7%)
a. Describe and modify the /etc/inittab and rc files
b. Describe the different run levels and boot modes
a,b,c and h are not true runlevels:
- they are processed only by telinit (not by init)
- A process started by these runlevels is not killed when the init command changes runlevels.
c. Use commands to manage the boot list and create boot logical volumes (incl. changing the
boot list)
d. Describe the boot process (BIST, POST, mounts, cfgmgr)
AIX boot process:
1. POST and hardware checking
2. System ROS locates and loads the bootstrap code. It is operating system independent.
3. Software ROS (bootstrap) creates RAMFS, locates the BLV and turns control to it.
4. RAM filesystem includes a reduced version of ODM (such as PdDv), rc.boot ...
5. Base devices are configured and the "init" process will be started from RAMFS.
6. There is still no rootvg! But disks have been configured and are ready.
Now rc.boot will be called three times:
7. Phase1:
a. init process is already running. So it forks rc.boot 1
b. ODM is copied to RAMFS from BLV
c. "cfgmgr -f" configures the necessary items to have rootvg disks.
8. Phase 2:
a. Rootvg is varied on.
b. fsck -f /dev/hd4 (root filesystem)
c. hd4 is mounted on /mnt in RAMFS
d. /usr is checked and mounted
e. /var is checked and mounted
f. If the system has been dumped before, the "copycore" command copies the dump from
/dev/hd6 (default) to /var/adm/ras.
g. /var is unmounted.
h. The primary paging space hd6 is activated.
i. All /dev files are copied from RAMFS to disk
j. All customized ODM files from the RAM file system are copied to disk. Both ODM
versions from hd4 and hd5 are now synchronized.
k. Root filesystems are mounted.
9. Phase 3:
a. Rc.boot 3 (from disk)
b. /tmp is mounted
c. Syncvg rootvg
d. Cfgmgr -p2 for the rest of devices for normal boot. For service mode -p3 is invoked.
e. Cfgcon configures the console and boot messages are sent to the console
f. ODM of BLV and / are synched.
g. Syncd and errdemon are started.
h. Init turns the control to the next line of inittab
e. Interrupt the boot process and use SMS
f. Describe booting from different media (disk, network, tape, cd)
g. Perform system or partition startups, shutdowns and reboots
bootlist: Displays and alters the list of boot devices available to the system
bootlist has some modes:
normal: When the system is booted in normal mode
service: When the system is booted in service mode
prevboot: "Some hardware platforms may attempt to boot from the previous boot
device before looking for a boot device in one of the other lists."
To show the normal bootlist:
# bootlist -m normal -o
To set the normal mode bootlist:
# bootlist -m normal cd0 hdisk0
To clear (invalidate) the service mode bootlist:
# bootlist -m service -i
When a partition is activated, you can choose the boot mode:
Normal: Uses the "normal mode" bootlist stored in NVRAM
SMS: Boot process stops at System Management Services menus.
DIAG_STORED: Uses the "service mode" bootlist and eventually shows diag menus.
DIAG_DEFAULT: Like DIAG_STORED, it is used for diag, but uses the default boot list (not what you
have set using bootlist -m service)
OPEN_FIRMWARE: System boots to Open Firmware (used by service personnel)
Useful shutdown switches:
# shutdown -l (creates /etc/shutdown.log for diagnostics. "-l" stands for "log").
# shutdown -Fr (fast reboot)
System and Device Configuration (9%)
a. Add or remove devices (printers, tape, adapters, using cfgmgr, etc)
Add a device:
- Physically attach the device to the system. (The device may be hot-pluggable or not)
- If the system is powered-off, power it on. It will run cfgmgr by default. Otherwise, run
cfgmgr which will introduce the device into AIX ODM.
o If the device driver of the attached device does not exist in the system, install it
explicitly or have cfgmgr to install it:
# cfgmgr -i /dev/cd0
Remove a device:
# rmdev -l rmt0 (Note: this command only unconfigures the device; it does not remove it)
# rmdev -dl rmt0 (removes the device from ODM)
# rmdev -Rdl fcs0 (removes fcs0 and all its children recursively)
# rmdev -p fcs0 (just removes the children, not fcs0 itself)
b. Determine / change device attributes, including WWN, MAC addresses, etc. (lsdev, chdev,
lscfg, lsattr)
chdev:
Changing the attributes of a device if it is busy:
# chdev -l ent0 -a ... -P (-P updates only the ODM; the change takes effect at the next reboot)
Determining WWPN of FC adapter:
# fcstat fcs1 | grep -i "world wide port name"
World Wide Port Name: 0x10000000C97A34BF
Or:
# lscfg -vl fcs1 | grep -i "network address"
Network Address.............10000000C97A34BF
Determining WWNN of FC adapter:
# fcstat fcs1 | grep -i "world wide node name"
World Wide Node Name: 0x20000000C97A34BF
Or:
# lscfg -vl fcs1 | grep -i z8
Device Specific.(Z8)........20000000C97A34BF
Determining Ethernet adapter MAC address:
# entstat -d ent0 | grep -i "hardware address"
Hardware Address: 00:14:5e:53:9d:40
Or:
# lscfg -vl ent0 | grep -i "network address"
Network Address.............00145E539D40
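The two commands above print the same MAC address in different notations. The sketch below converts the compact lscfg form into the colon-separated form shown by entstat; the function names are hypothetical, and the parsing assumes the value follows the last dot on the line, as in the sample output.

```python
def value_from_lscfg_line(line):
    """Return the value after the dotted filler in an lscfg -v output line."""
    return line.rsplit(".", 1)[1].strip()

def format_mac(network_address):
    """Convert 12 hex digits (e.g. 00145E539D40) to aa:bb:cc:dd:ee:ff form."""
    raw = network_address.strip().lower()
    if len(raw) != 12 or any(c not in "0123456789abcdef" for c in raw):
        raise ValueError("expected 12 hexadecimal digits: " + network_address)
    return ":".join(raw[i:i + 2] for i in range(0, 12, 2))

line = "Network Address.............00145E539D40"
mac = format_mac(value_from_lscfg_line(line))
```

For the sample line, `mac` is `00:14:5e:53:9d:40`, which matches the entstat output above.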
c. List, define and change paging space
List paging space:
# lsps -a <== shows detailed output
# lsps -s <== shows a summary
# mkps -s 1 -n -a testvg hdisk1 <== defines a paging space with one PP, active now and at restart
# chps -s 1 paging00 <== adds one PP to the paging space
# chps -d 1 paging00 <== removes one PP from the paging space
# swapon /dev/paging00 <== activates the paging space now
# swapoff /dev/paging00
# rmps paging00 <== removes the paging space
d. Configure and manage print subsystem (print queues, default printer, print job management)
e. Configure system environment (timezone, /etc/environment, etc.)
f. Add / remove disks (including data migration tasks, using cfgmgr)
Network Administration (9%)
a. Configure the network (TCP/IP daemons, /etc/hosts, hostname, ifconfig, route,
/etc/resolv.conf, /etc/netsvc.conf, /etc/ntpd.conf)
/etc/hosts:
You can add, change or delete entries in this file with the hostent command. (Manual editing is
still possible.)
This adds a record to /etc/hosts with a primary hostname of "salehi" and an alias named "mypc":
# hostent -a 10.0.62.14 -h "salehi mypc"
To show the record associated with salehi:
# hostent -s salehi
10.0.62.14 salehi mypc
Reserved host names:
timeserver
If you set timeserver in /etc/hosts, you can run setclock to get its time and apply it to the
current system.
printserver
Identifies the default host to receive print requests.
hostname:
- The "hostname" command can show or temporarily set the hostname of a system:
o # hostname newhostname (the next reboot rolls it back; it is not permanent)
- To set the hostname permanently:
o # chdev -l inet0 -a hostname=newhostname
o This will not change /etc/hosts
- Another way:
o # smit mkhostname
o This will not change /etc/hosts
- Another way:
o # mktcpip -h newhostname -a 10.0.84.79 -m 255.255.255.0 -i en0
o This will change /etc/hosts. (Actually adds the new host name as an alias of previous
value in /etc/hosts.)
Conclusion:
When you change hostname, always check /etc/hosts.
ifconfig:
To list all interfaces that are "up" with details:
# ifconfig -au
To add IP to en0:
# ifconfig en0 10.1.2.3 netmask 255.255.255.0 up
To bring a network interface down:
# ifconfig en0 down
Note:
Changes made by ifconfig are lost at the next restart.
route:
To list the routing table:
# netstat -nr
To find the default gateway:
# netstat -nr | grep default | awk '{print $2}'
To establish a default gateway:
# route add 0 192.168.1.1
Add route to a destination (like 11.25.12.1) via a gateway (like 10.10.10.1):
# route add 11.25.12.1 10.10.10.1
To reach a network (like 50.1.3.0) via a gateway (like 172.16.16.1) through en0:
# route add -net 50.1.3.0 172.16.16.1 -if en0
Or:
# chdev -l inet0 -a route=net,-hopcount,0,,-if,en0,,,,-static,50.1.3.0,172.16.16.1
To delete the above route:
# route delete -net 50.1.3.0
# chdev -l inet0 -a delroute=net,-hopcount,0,,,50.1.3.0,172.16.16.1
Note:
The effect of the route command is not permanent. Sometimes it is desirable to set routing via a
script when needed (as in an HACMP environment). If you need to make it permanent, use "chdev
-l inet0 ..." instead.
resolv.conf:
AIX can use several methods to map host names to IP addresses:
- /etc/hosts
- DNS
- NIS
- LDAP
If /etc/resolv.conf does not exist:
the network is considered "flat" and /etc/hosts is used for name resolution.
If /etc/resolv.conf exists:
the network is a "domain network" and the resolver routines are used.
File format:
A "domain" entry tells the resolver routines which default domain name to append to names
that do not end with a . (period). There can be only one domain entry. This entry is of the form:
domain my.domain.com
"search" is another entry of this file that is mutually exclusive with "domain". With "search" you
can specify several domains to search when resolving a name. The first domain in
the search list is the default domain.
The "nameserver" entry specifies the remote domain name server.
- The address is dotted decimal
- You can specify more than one name server:
nameserver 192.9.21.1
nameserver 192.9.21.2
Note:
- If both "domain" and "search" entries exist, the one that appears last is used.
- If there is no default domain in /etc/resolv.conf, you should set it in the hostname.
- If you use LDAP, /etc/resolv.ldap should be configured.
- Name resolution order is specified in irs.conf and netsvc.conf and NSORDER environment
variable. NSORDER overrides the settings of netsvc.conf and netsvc.conf overrides irs.conf.
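The "last of domain/search wins" rule and the accumulation of nameserver entries can be sketched with a small parser. This is a simplified model of the file format described above, not the resolver's actual code; the function name and the sample domain names are assumptions, and only lines starting with "#" or ";" are treated as comments.

```python
def parse_resolv_conf(text):
    """Return (default_domains, nameservers) from resolv.conf-style text."""
    default_domains = []  # set by whichever of "domain"/"search" appears last
    nameservers = []      # every nameserver entry is kept, in file order
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#;":
            continue
        fields = line.split()
        keyword, values = fields[0], fields[1:]
        if keyword == "domain" and values:
            default_domains = values[:1]
        elif keyword == "search" and values:
            default_domains = values
        elif keyword == "nameserver" and values:
            nameservers.append(values[0])
    return default_domains, nameservers

sample = """domain my.domain.com
search a.example.com b.example.com
nameserver 192.9.21.1
nameserver 192.9.21.2
"""
domains, servers = parse_resolv_conf(sample)
```

Because the "search" line comes after the "domain" line in the sample, `domains` holds the two search domains, and the first of them acts as the default domain.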
netsvc.conf:
It is used to specify the ordering of name resolution.
Syntax:
hosts = value [, value]
alias = value [, value]
Sample:
# checks /etc/hosts and then DNS for name resolution:
hosts = local, bind
# checks /etc/aliases and then NIS to resolve aliases for sendmail:
alias = files, nis
/etc/aliases:
/etc/aliases is a link to /etc/mail/aliases
Contains the required aliases for the sendmail command.
moi: salehi
NSORDER:
If NSORDER environment variable is set, it overrides the settings of netsvc.conf and irs.conf
Example:
# export NSORDER=bind,nis,local
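The precedence described in these notes (NSORDER over netsvc.conf, netsvc.conf over irs.conf) boils down to "the first source that is set wins". The sketch below assumes each source has already been reduced to a comma-separated string of methods, which is a simplification (irs.conf in particular has its own syntax); the function name is hypothetical.

```python
def effective_order(nsorder_env=None, netsvc_hosts=None, irs_hosts=None):
    """Return the resolution order from the highest-priority source that is set."""
    for source in (nsorder_env, netsvc_hosts, irs_hosts):
        if source:
            return [m.strip() for m in source.split(",") if m.strip()]
    return []  # nothing configured; the system default would apply

# NSORDER is set, so the netsvc.conf value is ignored.
with_env = effective_order("bind,nis,local", "local, bind")
# NSORDER is unset, so netsvc.conf decides.
without_env = effective_order(None, "local, bind")
```

Here `with_env` is `["bind", "nis", "local"]` and `without_env` is `["local", "bind"]`.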
ntp.conf:
# startsrc -s xntpd
# lssrc -ls xntpd | grep peer
Sys peer: no peer, system is insane <== "insane" means xntpd is not synchronized (usually the NTP configuration is wrong)
In ntp.conf:
- Add this:
server 127.127.1.0
- and comment this:
#broadcastclient
# stopsrc -s xntpd
# startsrc -s xntpd -a "-x" (-x can be very important)
Wait for one or two minutes and then:
# lssrc -ls xntpd | grep peer
Sys peer: 127.127.1.0
flags: (configured)(refclock)(sys peer)
On ntp client side:
# ntpdate -d node1
If the offset is more than 1000 seconds, set the date and time manually and then try the above
command again.
Note:
You can set the client to automatically sync the time with your server.
- Add a server entry in /etc/ntp.conf, this time with the address of your time server.
- Uncomment broadcastclient
- # stopsrc -s xntpd
- # startsrc -s xntpd -a "-x" (-x can be very important)
In order to start xntpd in system startup, change /etc/rc.tcpip. This can be done both in client
and server.
b. Configure network security (/etc/hosts.equiv, .rhosts, etc.)
First /etc/hosts.equiv and then $HOME/.rhosts will be checked to see whether the remote
r-command request is from a trusted host or not.
Sample:
toaster # all users from toaster are allowed
machine1 bob # only bob from machine1
+ lester # user lester from all machines
tron -joel # user joel from host tron is not allowed.
tron # all other users from tron are allowed.
Note:
- For root user, only /.rhosts is checked.
- If /etc/hosts.equiv or $HOME/.rhosts is writable by group or others, the user is prompted
for a password.
- The deny, or - (minus sign), statements must precede the accept, or + (plus sign),
statements in the lists.
- Generally it is not secure to use this kind of password-less communication. You can use SSH
key pairs instead.
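Why deny entries must come first becomes clearer with a toy model in which the first matching line decides. The sketch below is a deliberately simplified reading of the sample entries above: it assumes first-match-wins, treats a missing user field as "any user", and ignores netgroups and the other checks the real r-commands perform. All function names are hypothetical.

```python
def parse_entry(line):
    """Split a hosts.equiv-style line into (host, user); user may be None."""
    fields = line.split("#", 1)[0].split()
    if not fields:
        return None
    return fields[0], (fields[1] if len(fields) > 1 else None)

def is_trusted(entries_text, remote_host, remote_user):
    """Return True if the first matching entry accepts the remote user."""
    for line in entries_text.splitlines():
        entry = parse_entry(line)
        if entry is None:
            continue
        host, user = entry
        deny = False
        if host.startswith("-"):
            deny, host = True, host[1:]
        if user is not None and user.startswith("-"):
            deny, user = True, user[1:]
        host_ok = host == "+" or host == remote_host
        user_ok = user is None or user == "+" or user == remote_user
        if host_ok and user_ok:
            return not deny  # first match decides
    return False

sample = """toaster
machine1 bob
+ lester
tron -joel
tron
"""
```

With this sample, joel from tron is rejected by the "tron -joel" line before the plain "tron" line is reached. If the two lines were swapped, the accept line would match first, which is the reason for the ordering rule.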
c. Verify network availability and debug network problems (ping, ifconfig, netstat, tcpdump,
iptrace)
tcpdump:
It prints the headers of packets on a network interface.
Example:
# tcpdump -i en0
To print all packets arriving at or departing from Salehi:
# tcpdump host salehi
iptrace:
It provides interface-level packet tracing for IP protocols. It generates a log file that can be very
big.
iptrace can be started by issuing the "iptrace" command itself or through SRC. If not started by SRC, the
process should be stopped with "kill -15" (-15 is SIGTERM, the software termination signal).
Example:
# startsrc -s iptrace -a "/tmp/nettrace"
# stopsrc -s iptrace
# iptrace -i en0 -p telnet -s airmail /tmp/telnet.trace
# kill -15 234343
d. Understand and configure Etherchannel and teaming
e. Configure NFS (/etc/exports, biod, nfsd, showmount, etc.)
/etc/exports:
If this file is present, at system startup /etc/rc.nfs brings up nfsd and mountd.
The entries of this file are like this:
Directory options
Example:
/soft # exports to the world
/usr2 -access=hermes:zip:tutorial # exports only to these systems
/usr/tps -root=hermes:zip # root access only to these systems
Important daemons and commands:
- nfsd:
o Services client requests for file system operations.
o Each daemon handles one request at a time. You can tune the max threads by chnfs
or chssys.
- mountd:
o It is an RPC daemon that answers client requests to mount a filesystem.
- chnfs:
o # chnfs -n 10 -I (sets the number of nfsd daemons).
- exportfs:
o Exports and unexports directories to NFS clients.
o # exportfs -a (exports all in the /etc/exports)
o # exportfs /dir1 (exports only /dir1 which is in the /etc/exports)
o # exportfs -i /dir2 (exports only /dir2, which is not in /etc/exports)
o # exportfs -u /dir2 (unexports /dir2)
Note:
You cannot export either a parent directory or a subdirectory of an exported directory within
the same file system.
biod:
It handles client requests for files. It is an old daemon and might be removed in future AIX
releases.
showmount:
# showmount -a (shows all clients that have mounted something on this server)
# showmount -e nfssrv1 (show which filesystems are exported from nfssrv1)
/etc/xtab:
Contains entries for currently mounted NFS directories. exportfs -u removes entries from this
file.
f. Configure and use CIFS (very basic)
Install the bos.cifs_fs package in AIX and then run "smit cifs_fs". That's it! This enables AIX to mount
Windows shared directories.
These ports should be open: 137, 138, 139 and 445
Security and User Management (7%)
a. Add, delete, change user and group accounts
# mkuser -a mehdi <== mehdi will be admin
# mkuser -R LDAP Nava <== Nava will be authenticated by LDAP
# chuser shell=/usr/bin/bash mehdi <== changes the user's shell
How to reset the failed login count:
# chsec -f /etc/security/lastlog -a "unsuccessful_login_count=0" -s mehdi
b. Describe and modify user and group management related files, profiles, and set or change the
shell environment (/etc/security/user, /etc/security/limits, /etc/security/passwd,
/etc/profile/, .profile)
c. Demonstrate in-depth knowledge of the login process (is getty running, order of the
environment being set, etc.)
Login process:
1- When getty (a long-running process) detects a connection, it prompts for a
username and runs the login program to authenticate the user. So, getty is the first step,
started from inittab:
cons:0123456789:respawn:/usr/sbin/getty /dev/console
2- getty prints a herald message from /etc/security/login.cfg to get the user name from
input.
3- getty calls login process to check whether password is needed to login or not. If
password is needed, another prompt will ask for it.
Note: If the second field of /etc/passwd is null, the user can login without password:
testuser::208:1::/home/testuser:/usr/bin/ksh
This method works only with telnet; ssh always asks for a password.
4- The login process performs the validation:
a. If login fails, a record is added to /etc/security/failedlogin
b. If login is successful, the environment is set from these files in order:
- /etc/environment
- /etc/security/environ
- /etc/security/limits
- /etc/security/user
- /etc/profile
- $HOME/.profile (or .dtprofile for CDE)
d. Set permissions (in more depth than operator)
e. Configure RBAC (role-based access control)
The majority of the Enhanced RBAC commands are included in the bos.rte.security fileset.
Authorizations are assigned to roles, which may then be assigned to users.
KST stands for Kernel Security Tables
o lskst
The Enhanced RBAC security database can be stored in LDAP
o System-defined authorizations cannot be stored in LDAP and will remain local to
each client system.
If the enhanced_RBAC attribute of sys0 is true, Enhanced RBAC is in use. You can change it to false to go back to
Legacy RBAC.
Predefined roles:
o ISSO (Information System Security Officer)
- The most powerful role
o SA (System Administrator)
- Cannot change passwords
o SO (System Operator)
To list the roles:
- # lsrole ALL | awk '{print $1}'
AccountAdmin
BackupRestore
DomainAdmin
FSAdmin
SecPolicy
SysBoot
SysConfig
isso
sa
so
Add role to a user: (for example add shutdown and reboot privilege to user salehi)
- # lssecattr -c /usr/sbin/reboot | awk '{print $2}'
accessauths=aix.system.boot.reboot
- # lssecattr -c /usr/sbin/shutdown | awk '{print $2}'
accessauths=aix.system.boot.shutdown
- There might be an existing role that contains above authorizations:
# lsrole ALL | grep "aix.system.boot.reboot" | awk '{print $1}'
SysBoot
- Assign the role:
# lsuser -a roles salehi
salehi roles=
# chuser roles=SysBoot salehi
# lsuser -a roles salehi
salehi roles=SysBoot
The user itself can list the roles:
# su - salehi -c "rolelist"
SysBoot System Boot Administration
Activate the role:
- If the user does not activate a role, it is still an ordinary user without any role.
- # swrole SysBoot (switches to SysBoot role)
- # swrole ALL (switches to all user roles)
- # rolelist -e (lists effective roles)
SysBoot System Boot Administration
Role authentication:
By default, the user must provide a password to activate a role, because auth_mode=INVOKER.
# lsrole -a auth_mode SysBoot
SysBoot auth_mode=INVOKER
You can change it:
# chrole auth_mode=NONE SysBoot
# lsrole -a auth_mode SysBoot
SysBoot auth_mode=NONE
Create a user-defined role:
The goal is to assign a role to a user to enable him to change cron settings:
# lsauth ALL | grep cron | cut -f1 -d' '
aix.system.config.cron
Only "sa" (system administrator) has this authorization:
# lsrole ALL | grep aix.system.config.cron | cut -f1 -d' '
sa
So we need to define a role:
# mkrole authorizations="aix.system.config.cron" cronRole
Assign the role to the user:
# chuser roles=cronRole salehi
Read the RBAC security database files and load the information from the database files into the
Kernel Security Tables (KST):
# setkst
Now salehi can change root's crontab:
# su - salehi
# swrole ALL
# crontab -e root
Another example:
Grant write access to /etc/hosts to operator2 (you need to create a new authorization for it):
root:/> mkauth newauth
root:/> setsecattr -f writeauths=newauth /etc/hosts
root:/> mkrole authorizations=newauth etchostsRole
root:/> chuser roles=etchostsRole operator2
root:/> setkst
root:/> su - operator2
operator2:/home/operator2> swrole ALL
operator2:/home/operator2> vi /etc/hosts
Install and Maintain AIX (11%)
a. Determine correct installation source (CD/DVD, NIM, cloning, alternate disk install, etc)
Minimum memory supported by AIX 6.1 is 265 MB.
b. Determine correct installation type (preservation, migration, new/complete overwrite)
"New and complete overwrite" destroys everything on the specified disks.
"Migration" changes the AIX version and/or release (like from 5.3 to 6.1)
"Preservation" keeps user data in rootvg intact, but removes /usr, /, /var and /tmp
c. Install, check and remove updates, TLs and fixes. Describe lpp statuses and tasks (commit,
apply, or reject using lslpp), and debug install errors using lppchk
# installp -r <package_name> <== rejects applied software
# installp -c all <== commits all
# installp -C <== cleans up after a failed or interrupted software install
# installp -acgYd /dev/cd0 cluster.* (apply, commit, install requisites, accept license, path of
source media)
d. Describe various options to acquire updates and fixes (SUMA, FLRT)
List the SUMA global configuration settings:
# suma -c
Change SUMA global configuration settings:
# suma -c -a HTTP_PROXY=http://user:pass@proxysrv:8080
Download critical fixes now:
# suma -x -a Action='download' -a RqType='Critical'
To see the difference between the available fixes and what you have in /soft/AIX/6.1/AIX61TL6:
# suma -x -a Action='Preview' -a DLTarget='/TL' -a FilterDir='/soft/AIX/6.1/AIX61TL6'
FLRT stands for Fix Level Recommendation Tool, a useful IBM web page.
e. Install additional IBM and Open Source licensed program products (rpm, rte, bff, etc.)
f. Install and configure a basic NIM environment (what it is and what must be configured)
nimconfig: (configures the nim master. requires bos.sysmgt.nim.master)
To define a NIM master only:
# nimconfig -a netname=NIMnet0 -a pif_name=en0
niminit: (configures the nim client)
# niminit -a name=testlpar -a master=nimsrv1 -a pif_name=en0 -a netboot_kernel=mp
nim: (performs operations on NIM resources)
# nim -o allocate -a spot=spot1 -a lpp_source=lppAIX61 nimclient1
# nim -Fo reset nimclient1
# nim -Fo deallocate -a subclass=all testlpar
Lots of operations are possible, like: define, change, create, restvg, ...
nimclient: (performs NIM operations in NIM client side)
# nimclient -l (shows the resources)
# nimclient -Fo reset (resets the NIM client)
g. Obtain and validate system and device firmware, including considerations for 'deferred' and
'concurrent' maintenance.
Concurrent update:
Firmware that can be applied and activated on running systems.
Deferred update:
Firmware can be concurrently applied but contains some fixes that can't be activated until the
next IPL because the fixes affect the IPL path.
Disruptive upgrade/update:
A platform IPL is required to activate. None of the content contained in the release/service pack
will be activated until the next IPL.
Activated Level of firmware:
The level running in memory. Normally when you apply the firmware, it is saved in NRAM, but
in next IPL it will be loaded to memory.
Accepted Level of firmware:
The level saved on p-side of flash.
Logical Volume, File and Filesystem management (7%)
a. Enlarge and reduce file systems
b. Describe and differentiate between physical volumes and LVMs, logical volumes, physical and
logical partitions, and physical disk and physical partition size.
c. Manage Volume Groups including mirroring (mkvg, varyonvg, varyoffvg, extendvg, exportvg,
importvg, lsvg)
Volume group quorum:
# chvg -Qn testvg <== turns off quorum
If quorum is set to "y", the volume group is automatically varied off when it loses its quorum
of VGDAs.
If a volume group loses its quorum of disks, it can be varied on only by force (varyonvg -f)
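The quorum rule can be written as a one-line check. The sketch below assumes the usual LVM rule that quorum means more than half of the VGDAs are accessible, and that a two-disk volume group keeps two VGDAs on the first disk and one on the second; both points are stated here as assumptions rather than taken from the notes above, and the function names are hypothetical.

```python
def has_quorum(active_vgdas, total_vgdas):
    """Quorum holds when more than half of the VGDAs are accessible."""
    return 2 * active_vgdas > total_vgdas

def varied_off_automatically(quorum_enabled, active_vgdas, total_vgdas):
    """With quorum checking on, losing quorum varies the volume group off."""
    return quorum_enabled and not has_quorum(active_vgdas, total_vgdas)

# Assumed two-disk layout: 2 VGDAs on the first disk, 1 on the second (3 total).
lost_first_disk = has_quorum(1, 3)   # only the second disk's VGDA remains
lost_second_disk = has_quorum(2, 3)  # the first disk's two VGDAs remain
```

Under these assumptions, losing the first disk breaks quorum while losing the second does not. With quorum turned off (chvg -Qn), `varied_off_automatically` is always false.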
d. Describe and manage different types of Logical Volumes, including mirroring.
e. Describe and manage different types of filesystems and different logging methods (mkfs, chfs,
fsck, mount, snapshot, etc.)
# umount -f <== forces the unmount, even if the path is busy or, for remote filesystems, if the remote
server is not present.
# fsck -p <== Does not display messages about minor problems but fixes them automatically.
mounting an ISO image:
Method1 (for older AIX versions):
Create a logical volume, dd the ISO image to the LV, then mount the LV:
# mklv -y dvd_lv testvg 5G
# dd if=isofile of=/dev/dvd_lv bs=1m
# mount -v cdrfs -o ro /dev/dvd_lv /mnt
How to unmount:
The "umount" command is used to unmount the image.
Method2 (recommended):
Using loopback device in AIX 6.1 TL4+ and VIOS:
# mkdev -c loopback -s node -t loopback # this creates loop0 once forever.
# lsdev -Cc loopback
loop0 Available Loopback Device
# loopmount -i /soft/TSM/TSMserver.iso -l loop0 -o "-V cdrfs -o ro" -m /mnt
How to unmount:
If you unmount the image using the "umount" command, the loop0 device will not be
unconfigured. You can use loopumount instead:
# loopumount -l loop0 -m /mnt
mounting a USB flash:
snapshot:
Split-mirror backup:
# chfs -a splitcopy=/backup -a copy=3 /testfs
Now you can backup /backup. When you remove /backup, /testfs will be resynced automatically
which might take a very long time with unwanted I/O load.
Question: Is there any limitation for the number of snapshots of a filesystem? something like 15
or 16?
Yes: The maximum number of external snapshots per file system is 15, while the maximum
number of internal snapshots per file system is 64.
There is another method, which uses the "snapshot" command and a copy-on-write algorithm:
changes go to the snapshot storage. From AIX 6.1 onwards, you can use internal snapshots,
which means the space used to store the snapshot is inside the filesystem itself.
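The copy-on-write idea can be illustrated with a toy model. The sketch below shows one common way to implement it: before a block is overwritten, its old contents are preserved in the snapshot's own storage, so the snapshot only consumes space for blocks that changed. Whether JFS2 works exactly like this internally is an assumption; the class and method names are hypothetical.

```python
class ToyFilesystem:
    """A dict of block id -> data, with copy-on-write snapshots."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)
        self.snapshots = []

    def snapshot(self):
        """Create an empty snapshot store; nothing is copied yet."""
        store = {}
        self.snapshots.append(store)
        return store

    def write(self, block_id, data):
        """Preserve the old block in every snapshot that lacks it, then write."""
        for store in self.snapshots:
            if block_id not in store:
                store[block_id] = self.blocks.get(block_id)
        self.blocks[block_id] = data

    def read_snapshot(self, store, block_id):
        """Read a block as it was at snapshot time."""
        if block_id in store:
            return store[block_id]
        return self.blocks.get(block_id)  # unchanged since the snapshot

fs = ToyFilesystem({0: "a", 1: "b"})
snap = fs.snapshot()
fs.write(0, "A")
```

After the write, the live filesystem sees "A" in block 0 while the snapshot still reads "a", and the snapshot store holds only that one preserved block.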
Create external snapshot:
# mklv -y newsnaplv -t jfs2 datavg 4
# snapshot -o snapfrom=/mksysbs newsnaplv <== newsnaplv is the snapshot device
or
# snapshot -o snapfrom=/mksysbs -o size=128MB <== create the snapshot LV automatically
Verify:
# snapshot -q /mksysbs
Snapshots for /mksysbs
Current Location 512-blocks Free Time
/dev/newsnaplv 2097152 2096384 Mon May 16 12:37:13 2011
* /dev/fslv06 524288 523520 Mon May 16 12:38:37 2011 <==
* means current snapshot
You can mount a snapshot:
# mount -o snapshot /dev/fslv06 /mnt
- /mnt will contain the contents of /mksysbs as they were when you created the snapshot (remember
the copy-on-write method).
- It is mounted as read-only by default.
How to rollback: <== this will remove the snapshot
You have changed something in /mksysbs filesystem and want to rollback:
# umount /mksysbs
# rollback -v /mksysbs /dev/fslv06
Delete the snapshot:
# snapshot -d /dev/fslv06
Note:
Internal snapshots can be enabled only at filesystem creation time:
# crfs -v jfs2 -m /testfs -g rootvg -A yes -a isnapshot=yes -a size=1G
Copy some files to /testfs.
# snapshot -o snapfrom=/testfs -n monsnap
# rollback -v -n monsnap /testfs
Shrinking filesystem and defragfs with a snapshot is not supported.
In order to backup the snapshot of a filesystem, use "backsnap" command.
f. Configure and manage symbolic and hard links
Hard link: Two file names that refer to the same i-node
- Source and target should be in the same filesystem
- ln: cannot hard link directory (only files)
- # ln source target
- If you remove source or target, the other one still refers to the i-node and works fine. I-node
will be removed if all references (links) are deleted.
Soft/symbolic link:
- points to the name of source file/directory, not the i-node
- can be used across filesystems
- # ln -s source target
- If source is removed, target will become a dangling reference (= a pointer that points to
something that does not exist).
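The difference between the two kinds of link can be observed directly. The following sketch uses Python's standard `os.link` and `os.symlink` calls in a temporary directory on a POSIX system; the file names are arbitrary.

```python
import os
import tempfile

def link_demo():
    """Create a hard link and a symlink, remove the source, and inspect both."""
    with tempfile.TemporaryDirectory() as d:
        source = os.path.join(d, "source")
        hard = os.path.join(d, "hard")
        soft = os.path.join(d, "soft")
        with open(source, "w") as f:
            f.write("data")
        os.link(source, hard)      # like: ln source hard
        os.symlink(source, soft)   # like: ln -s source soft
        same_inode = os.stat(source).st_ino == os.stat(hard).st_ino
        os.remove(source)
        with open(hard) as f:      # the i-node survives through the hard link
            hard_content = f.read()
        # The symlink entry still exists, but its target is gone (dangling).
        soft_dangling = os.path.islink(soft) and not os.path.exists(soft)
        return same_inode, hard_content, soft_dangling

result = link_demo()
```

On a POSIX filesystem the hard link shares the source's i-node and still returns the data after the source is removed, while the symbolic link is left dangling.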
g. Demonstrate understanding of multipath I/O
Multipath I/O or MPIO means establishing more than one path between the two ends of an I/O
stream like between AIX and a disk subsystem. The purpose of MPIO is to provide more
resilience and/or better I/O throughput.
- AIX native MPIO supports only failover (and no load balancing) for all MPIO-capable disk
subsystems.
- Each disk vendor should provide a special device driver to provide more advanced
algorithms like round-robin, extended round-robin. Examples are IBM SDDPCM (Subsystem
Device Driver Path Control Module), Hitachi HDLM (Dynamic Link Manager), EMC
PowerPath and so forth.
- AIX native MPIO commands:
# lspath
# mkpath
# chpath
# rmpath
Problem Determination and Resolution (15%)
a. Use logs to identify problems (errlog, alog, syslog, etc.)
b. Use the diag utility
c. Use traces, truss, snap and kdb
trace:
The trace daemon records selected system events.
Trace has different data collection modes:
- Alternate (default):
o All trace events are captured in the trace log file.
o If the log file reaches the max size, file is overwritten from beginning.
- Circular:
o Circular logging occurs within the trace "buffer". The log file is generated only when trace is
stopped.
o Useful when the user knows when the problem occurs. If they stop the trace right
after they encounter the problem, the buffer contains useful information that will be
saved in the log file.
o # trace -l
- Single buffer:
o Trace stops when the in-memory trace buffer fills up.
o The contents of the buffer are captured in the trace log file.
o # trace -f
- Buffer Allocation:
o By default, buffers are allocated from the kernel heap.
o If the requested size does not fit in the kernel heap, the buffers are allocated in separate segments
from pinned memory.
o # trace -b or -B
The default trace log file is /var/adm/ras/trcfile. This is a binary file that should be viewed with
trcrpt.
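The practical difference between the circular and single buffer modes is which events survive when the buffer is too small. A minimal sketch in Python (a model of the idea only, not how the trace daemon is implemented; the function name and the mode strings are made up):

```python
from collections import deque

def capture(events, size, mode):
    """Return the events left in a trace buffer that holds `size` entries."""
    if mode == "circular":
        # Oldest entries are overwritten, so only the newest `size` survive.
        return list(deque(events, maxlen=size))
    if mode == "single":
        # Tracing stops when the buffer fills, so only the first `size` survive.
        return list(events[:size])
    raise ValueError("unknown mode: " + mode)

events = list(range(10))
print(capture(events, 4, "circular"))
print(capture(events, 4, "single"))
```

This is why circular mode suits a problem you can stop the trace right after, while single buffer mode suits a problem that happens right after the trace starts.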
Running trace in interactive mode:
# trace
> ! anycommand
> q
Running trace in background:
# trace -a -o /tmp/my_trace_log; anycmd; trcstop
trcrpt:
Formats a report from the trace log, using the format defined in /etc/trcfmt.
# trcrpt -o /tmp/newfile
truss:
truss command is useful for tracing system calls in one or more processes:
A simple example:
# truss -ea hostname
execve("/usr/bin/hostname", 0x2FF22C90, 0x20012ED8) argc: 1
argv: hostname
envp: AUTHSTATE=compat TERM=xterm SHELL=/usr/bin/bash
SSH_CLIENT=10.0.62.14 1781 22 SSH_TTY=/dev/pts/0
LOCPATH=/usr/lib/nls/loc USER=root ODMDIR=/etc/objrepos
MAIL=/usr/spool/mail/root
PATH=/usr/bin:/etc:/usr/sbin:/usr/ucb:/usr/bin/X11:/sbin:/usr/java5/jre/bin:/usr/java5/bin:
LOGIN=root PWD=/home/salehi LANG=C TZ=CST6CDT
PS1=\[\]\u\[\]@\[\]\h\[\]:$PWD\[\]>
SHLVL=1 HOME=/ LC__FASTMSG=true MAILMSG=[YOU HAVE NEW MAIL]
LOGNAME=root SSH_CONNECTION=10.0.62.14 1781 10.0.84.79 22
DISPLAY=salehi:0 _=/usr/bin/truss OLDPWD=/ AIXTHREAD_SCOPE=S
NLSPATH=/usr/lib/nls/msg/%L/%N:/usr/lib/nls/msg/%L/%N.cat
gethostname(0x2FF22AE4, 256) = 0
kioctl(1, 22528, 0x00000000, 0x00000000) = 0
testlpar
kwrite(1, " t e s t l p a r\n", 9) = 9
kfcntl(1, F_GETFL, 0x2FF22FFC) = 67110914
kfcntl(2, F_GETFL, 0x2FF22FFC) = 67110914
_exit(0)
As you see, "-e" is useful for finding out which environment variables are passed to a
command or program.
snap:
snap command gathers extensive system configuration information.
To gather HACMP information:
# snap -e
To gather all system configuration except HACMP and create a compressed pax output:
# snap -ca
The output pax file will be stored in /tmp/ibmsupt.
snap can be used to restore from dump device:
???
kdb:
kdb is an interactive utility for examining a system dump, a live dump or a running
kernel.
d. Describe and use ODM
e. Configure and use system dump devices
sysdumpdev -l and so forth...
f. Recover from a full file system
g. Troubleshoot common boot LED codes and access a system that will not boot
LEDs: 0c0...0c9 and 0cc are all related to dump
LED Description
201 Invalid boot image
223-229 Invalid boot list
551-555-557 Corrupted filesystem or JFS log
552-554-556 Corrupted superblock or ODM
553 Invalid /etc/inittab
C40 Configuration files are being restored
C41 Could not determine the boot device
C42 Extracting data files from diskette
C43 Cannot access the install tape
C44 Initializing configuration database for target disks
C45 Cannot configure the console
C46 Normal installation processing
C47 Could not create PVID on disk
C48 Prompting for user input
C49 Could not create or form the JFS log
C50 Creating root volume group
C51 No paging devices were found
C52 Changing from RAM environment to disk environment
C53 /tmp is too small for preservation installation
C54 Installing BOS or other packages
C55 Could not remove an LV in preservation installation
C56 Running user-defined customization
C57 Failure to restore BOS
C58 Displaying message to turn the key
C59 Could not copy info from RAM to disk
C61 Failure to create boot image
C62 Loading debug files
C63 Loading data files
C64 Failed to load data files
h. Troubleshoot installation hangs and failures
i. Debug shell script common interpreter problems (ksh, etc)
j. Recover a logical volume
k. Find and correct corrupted filesystems, superblocks, etc.
Process and Performance Management and Tuning (9%)
a. Use the system resource manager
b. Understand and use Workload Manager (WLM) at a basic level
# wlmassign --> Manually assigns processes to a Workload Management class
# mkclass --> Creates a Workload Management class
# lsclass
# chclass
# rmclass
# lswlmconf
# wlmstat
# wlmcntrl --> Starts or stops the Workload Manager.
# confsetcntrl
c. Use cron and at at a detailed level
The format of crontab file:
minute hour day_of_month month weekday command
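To make the field order concrete, here is a minimal sketch in Python that splits one crontab entry into its six parts. The function name and the sample entry (a made-up backup script run at 02:00 every Sunday) are assumptions for the illustration; it does not validate ranges, lists or steps inside the fields.

```python
FIELDS = ("minute", "hour", "day_of_month", "month", "weekday")

def parse_crontab_line(line):
    """Split one crontab entry into its five time fields and the command."""
    parts = line.strip().split(None, 5)
    if len(parts) < 6:
        raise ValueError("expected 5 time fields followed by a command")
    entry = dict(zip(FIELDS, parts[:5]))
    entry["command"] = parts[5]   # the rest of the line, spaces included
    return entry

entry = parse_crontab_line("0 2 * * 0 /usr/local/bin/backup.sh -v")
for name, value in entry.items():
    print(name, "=", value)
```

Only the first five whitespace-separated fields are time fields. Everything after them belongs to the command, including its arguments.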
d. Use tuning tools and parameters (ioo, vmo, no, /etc/tunables, etc)
e. Use performance monitoring tools (topas, netstat, vmstat, lvmstat, iostat, svmon, nmon)
f. Monitor and change process execution (ps, nice, kill)
Planning and Documentation (11%)
a. Understand Workload Partitions (WPARs) and when to use them
The WPAR product consists of two parts:
- The part that is included in AIX 6.1
- Workload Partition Manager.
- WPAR Manager enables "Live Application Mobility" (even automatic mobility)
- Each WPAR uses /usr and /opt as read-only.
WPAR types:
- System partition
It is a miniature copy of AIX.
Create --> (defined state) --> run (active state) --> stop --> (defined state) --> remove
- Application partition
The idea is that we put a WPAR around an application. When the application starts, the WPAR is
created, and when it stops, the WPAR is removed.
Basic commands:
# mkwpar -n wpar1
# lswpar
# startwpar wpar1
# stopwpar wpar1
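The system WPAR life cycle described above (create, run, stop, remove) can be written down as a small state table. A minimal sketch in Python, modelling only the states named in these notes; the action names and the "none" state are made up for the illustration, and a real WPAR can be in other states that are not covered here.

```python
# (current state, action) -> next state, following the life cycle above
TRANSITIONS = {
    ("none", "create"): "defined",
    ("defined", "start"): "active",
    ("active", "stop"): "defined",
    ("defined", "remove"): "none",
}

def run(actions, state="none"):
    """Apply a sequence of actions and return the resulting state."""
    for action in actions:
        key = (state, action)
        if key not in TRANSITIONS:
            raise ValueError("cannot %s a WPAR in state %r" % (action, state))
        state = TRANSITIONS[key]
    return state

print(run(["create", "start"]))
print(run(["create", "start", "stop", "remove"]))
```

In this model a WPAR has to be stopped (back in the defined state) before it can be removed.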
Application mobility:
chkptwpar <-- checkpoints (or freezes) the partition to a statefile
restartwpar <-- resumes a WPAR, possibly on a different machine.
When you create a WPAR, in order to mark it as a mobile workload partition you need to specify
an NFS server. This NFS server will hold the state of the WPAR during mobility.
You cannot move a WPAR to a different hardware version (like POWER5 to POWER6).
b. Plan HMC configuration (networking, redundancy, users, security, etc.)
c. Describe the use and function of VIO
d. Partition planning (micropartitioning, memory planning, HEA/IVE, processor allocation, etc)
e. Document a system (sysplan, etc)
f. Find appropriate resources (info center, key center, etc.)
g. Determine system redundancy requirements (avoiding single points of failure)
h. Describe applicability and use of Capacity on Demand
Permanent:
- It is a purchase agreement
- You cannot turn it off
- One processor or one GB of memory
Trial CoD:
- 30 contiguous days
On/Off CoD:
- Temporary additional processor or memory
- Activity is reported monthly to IBM
- Charged based on the number of days, even for one minute of use!
- Monthly charge
Utility CoD:
- Similar to On/Off, but the charge is based on minutes rather than days.
- For POWER6+
Capacity Backup:
- Reserves capacity for a backup server
- Works up to 90 days
HMC and Partition Management (6%)
a. Apply HMC and Server fixes
b. Define, add, remove resources from an LPAR (DLPAR and partition profiles, etc.)
c. Backup and restore the HMC
d. Use the HMC and ASMI interface
e. Understand and use IVM (options, functions, etc.)
f. Configure and use electronic service agent
ESA is free software on AIX 5.3 TL6+ and, if configured properly, sends error information to IBM to
aid in problem resolution.
The ESA client is freely available on all IBM systems plus DS8000.
# smit esa_main
Starting electronic service agent:
# startsrc -s IBM.ESAGENT
Miscellaneous:
multibos:
- Manipulates multiple versions of BOS in rootvg. It means you can have more than one operating
system on the rootvg disks. Except for /, /usr, /var and /opt, all other filesystems and logical volumes
are shared between BOS instances.
- It is like alternate disk install, but does not require additional disks.
- Choosing between BOS instances is possible when you set the boot list.
- Setup:
# multibos -R <== Removes all standby BOS objects
# multibos -sXp <== To perform a standby BOS setup operation preview
# multibos -sX <== To perform a standby BOS setup operation
# multibos -sXp -M /soft/mksysb1 <== To perform a standby BOS setup operation preview from
an existing mksysb
# bootlist -m normal -o
hdisk0 blv=bos_hd5 pathid=1
hdisk0 blv=hd5 pathid=1
To make sure you are booting from the right instance, compare the boot device when AIX is
starting in SMS with what bootlist shows:
# bootlist -m normal -ov
'ibm,max-boot-devices' = 0x5
NVRAM variable: (boot-device=/vdevice/v-scsi@30000002/disk@8100000000000000:4
/vdevice/v-scsi@30000002/disk@8100000000000000:2)
Path name: (/vdevice/v-scsi@30000002/disk@8100000000000000:4)
match_specific_info: ut=disk/vscsi/vdisk
hdisk0 blv=bos_hd5 pathid=1
Path name: (/vdevice/v-scsi@30000002/disk@8100000000000000:2)
match_specific_info: ut=disk/vscsi/vdisk
hdisk0 blv=hd5 pathid=1
# alog -of /etc/multibos/logs/op.alog <== to view the log
# lsvg rootvg -l | grep bos_
bos_hd5 boot 1 1 1 closed/syncd N/A
bos_hd4 jfs2 10 10 1 closed/syncd /bos_inst
bos_hd2 jfs2 70 70 1 closed/syncd /bos_inst/usr
bos_hd9var jfs2 12 12 1 closed/syncd /bos_inst/var
bos_hd10opt jfs2 13 13 1 closed/syncd /bos_inst/opt
# multibos -S <== initiates an interactive session to the standby BOS
# multibos -Xac -l /TL <== applies a TL on standby BOS
How to change back the bootlist:
# bootlist -m normal -o hdisk0 blv=hd5
Encrypted filesystem:
EFS helps to protect data on a filesystem by assigning each user a unique encryption key. When a user
requests access to a file, the kernel checks the credentials. The cryptographic information is kept in the
extended attributes of the file. This adds granularity and flexibility to traditional access
permissions.
- How to enable EFS:
# efsenable -av
This creates the /var/efs directory (which keeps the keystores) and alters /etc/security/user and
group.
- Create two EFS-enabled filesystems:
# crfs -v jfs2 -g rootvg -m /sales -a size=100M -a efs=yes
# crfs -v jfs2 -g rootvg -m /finance -a size=100M -a efs=yes
- Create users to access each filesystem:
# mkuser salesman; passwd salesman
# mkuser financeman; passwd financeman
- passwd in the previous step creates a separate directory (here called keystore) for each
user in /var/efs/users:
# ls /var/efs/users/
total 0
-rw------- 1 root system 0 Apr 26 05:52 .lock
drwx------ 2 root system 256 Apr 26 06:08 finance
drwx------ 2 root system 256 Apr 26 05:52 root
drwx------ 2 root system 256 Apr 26 06:08 sales
- demonstration:
# mount /finance
# su - finance
# mkdir -p /finance/yearlyreport
# chmod -R 777 /finance/yearlyreport <== note the full permissions
# efsmgr -E /finance/yearlyreport <== enables EFS for the directory
# efsmgr -L /finance/yearlyreport <== list
EFS inheritance is set with algorithm: AES_128_CBC
Log back in:
# su - finance
# touch /finance/yearlyreport/anewfile
touch: /finance/yearlyreport/anewfile cannot create
But you can load the keystore and run a command:
# efskeymgr -o <thecommand>
# efskeymgr -o bash <== this will open a bash session
Now you can touch the file.
# ls -U <== for security information
drwxrwxrwxe 2 finance staff 256 Apr 26 08:29 yearlyreport
Some HMC tips:
- HMC web access port is 443
- Each POWER system has three users by default in ASM: admin, general and HMC. The HMC user
is the one the hardware management console uses to authenticate when it discovers
the machine.
Trusted Execution:
Trusted Execution is a security feature of AIX 6.1. To some extent it is similar to TCB, but:
- TCB must be enabled at installation time.
- TCB checks integrity at time intervals using cron.
- TE checks the integrity of commands when they are invoked.
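The idea of checking at invocation time, instead of at intervals, can be sketched like this. It is a minimal illustration in Python and not how AIX implements TE: the function names are made up, SHA-256 is an arbitrary choice, and the dictionary only stands in for whatever database of trusted signatures the system keeps.

```python
import hashlib
import os
import tempfile

def sha256_of(path):
    """Return the SHA-256 hex digest of a file's contents."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def may_execute(path, trusted_db):
    """Allow a command only if its current hash matches the recorded one."""
    return trusted_db.get(path) == sha256_of(path)

def demo():
    with tempfile.TemporaryDirectory() as d:
        cmd = os.path.join(d, "mycmd")
        with open(cmd, "w") as f:
            f.write("#!/bin/sh\necho ok\n")
        trusted_db = {cmd: sha256_of(cmd)}   # recorded while the file is known good
        before = may_execute(cmd, trusted_db)
        with open(cmd, "a") as f:
            f.write("echo tampered\n")       # simulate tampering
        after = may_execute(cmd, trusted_db)
        return before, after

print(demo())
```

Because the check runs on every invocation, a modified command is refused the next time it is called. A periodic check would only notice at the next scheduled run.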
SEA on HEA:
Is SEA possible on HEA in promiscuous mode?
Answer: Yes
sugroup:
http://www.ibm.com/developerworks/aix/library/au-sugroup/index.html
/etc/objrepos/errnotify:
http://publib.boulder.ibm.com/infocenter/pseries/v5r3/index.jsp?topic=/com.ibm.aix.baseadmn/doc/baseadmndita/HT_baseadmn_missingpv.htm
and
http://www.blacksheepnetworks.com/security/resources/aix-error-notification.html
Disabling JFS2 logging:
# mount -o log=NULL /testfs
Add more ...
Hope this helps,
Mehdi