Checklist for DLPAR setup (RSCT)
Use the hsc version command to find out the HMC's version number. The AIX level on each LPAR should also be 5.1F, 5.20, or later.
Table 1. Requirements
# | Verify | Command, result, and action |
1 | HMC level | Click Help, then About, on the HMC main window. The level should be Release 3, Version 2.4 or higher. I encourage you to get the latest software from HMC Corrective Service. |
2 | AIX filesets required for DLPAR (on each LPAR/SMP) | The AIX level should be 5.2 or later. These filesets should be installed on the partitions:
> lslpp -l rsct.core*
> lslpp -l csm.client
Any missing fileset can be installed from the AIX installation CD. |
Table 2. Verify whether the required software is functional
# | Verify | Result and action |
1 | HMC daemons are running | > su - root
> lssrc -a
If any of the daemons show as inoperative, start them with
> startsrc -s <subsystem name>
For example, startsrc -s ctrmc. If a daemon can't be started, contact IBM service personnel. |
2 | AIX daemons are running (on each LPAR/SMP with AIX 5.20) | > su - root
> lssrc -a | grep rsct
If AIX is at 51G, 52B, or above, ctcas is also a lazy-started resource manager (RM), meaning it can stay inoperative until it is used, so there is no need to start it on LPARs at these levels or later. If any of the daemons show as inoperative, use
> startsrc -s <subsystem name>
For example, startsrc -s IBM.CSMAgentRM. |
- ctrmc: the Resource Monitoring and Control (RMC) subsystem.
- ctcas: for security verification. It is a lazy-started resource manager and does not have to run in order for DLPAR to work.
- IBM.DMSRM: for tracking the status of partitions.
- IBM.LparCmdRM: for DLPAR operations on the HMC.
- IBM.CSMAgentRM: for handshaking between the LPAR and the HMC.
- IBM.DRM: for executing the DLPAR command on the LPAR.
- IBM.HostRM: for obtaining OS information.
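The daemon check above can be scripted. The following is only a minimal sketch: the function name inactive_subsystems is hypothetical (it is not part of AIX or RSCT), and it assumes that lssrc -a prints one subsystem per line, with the subsystem name in the first column and the status (active or inoperative) in the last column.

```shell
#!/bin/sh
# inactive_subsystems: hypothetical helper, not part of AIX or RSCT.
# Reads "lssrc -a" style output on stdin and prints every subsystem named
# in the arguments whose status is not "active" (or that is not listed).
# Assumption: the subsystem name is the first field, the status the last.
inactive_subsystems() {
    awk -v wanted="$*" '
        BEGIN { n = split(wanted, names, " ") }
        { status[$1] = $NF }            # remember the status of each subsystem
        END {
            for (i = 1; i <= n; i++)
                if (status[names[i]] != "active")
                    print names[i]
        }
    '
}
```

On an LPAR this could be run as lssrc -a | inactive_subsystems ctrmc IBM.CSMAgentRM. Lazy-started resource managers such as ctcas are deliberately left out of that argument list, because they may legitimately be inoperative.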
Table 3. Verify your RMC/DLPAR network/hostname setup
# | Verify | Result and actions |
1 | HMC: List partitions authenticated by RMC | > /opt/csm/bin/lsnodes -a Status
partition01 1
partition02 0
partition03 1
Here 1 means the LPAR is activated and authenticated for DLPAR; 0 means otherwise. If the LPAR is activated and still shows 0, you could have either a network or a hostname setup problem. If you have just rebooted the HMC, wait a few minutes. If nothing changes after that, check your hostname/network setup in Setting up HMC/partitions hostname and network. |
2 | HMC: List partitions recognized by DLPAR | > lspartition -dlpar
<#0> Partition:<001, partition01.company.com, 9.3.206.300>
Active:<1>, OS:<AIX, 5.2>, DCaps:<0xf>, CmdCaps:<0x1, 0x0>
<#1> Partition:<002, partition02.company.com, 9.3.206.300>
Active:<0>, OS:<AIX, 5.2>, DCaps:<0xf>, CmdCaps:<0x1, 0x0>
<#2> Partition:<003, partition03.company.com, 9.3.206.300>
Active:<0>, OS:<, 5.1F>, DCaps:<0x0>, CmdCaps:<0x0, 0x0>
If all active AIX 5.2 partitions are listed as Active<1>, ..., DCaps:<0xf>, your system has been set up properly for DLPAR, and you can skip the rest of the checklist. (In this example, LPAR 002 is being shut down, and LPAR 003 is not activated because it is at AIX 5.1.) If some active partitions are missing, or some partitions are reported as Active<0>, your system probably still has a network/hostname setup problem. See Setting up HMC/partitions hostname and network. (If your LPAR is Active<1> but the GUI is still not DLPAR capable, do a rebuild to get around this problem. See the Appendixes in this article for more information.) If you still can't get partitions recognized by DLPAR after going through the checklist, contact IBM service personnel. |
3 | AIX: Ensure the /var directory is not 100% full (on each LPAR/SMP) | > df
If /var is 100% full, use smitty to expand it. If there is no more space available, go through its subdirectories and remove unnecessary files (trace.*, core, and so on). After expanding the /var directory, run the following commands to fix possibly corrupted files:
> rmrsrc -s "Hostname!='t' " IBM.ManagementServer
> /usr/sbin/rsct/bin/rmcctrl -z
> rm /var/ct/cfg/ct_has.thl
> rm /var/ct/cfg/ctrmc.acls
> /usr/sbin/rsct/bin/rmcctrl -A |
4 | AIX: Verify whether you have a network problem (from each LPAR/SMP) | > ping <hmc_hostname>
If ping fails, check your hostname/network setup. See Setting up HMC/partitions hostname and network. |
5 | AIX: Verify LPAR-to-HMC authentication (from each LPAR/SMP) | > CT_CONTACT=<HMC name> lsrsrc IBM.ManagedNode
You should get a list of resource classes on the HMC. If there is any error, you probably have a network/hostname problem. See Setting up HMC/partitions hostname and network. |
6 | HMC: Verify network setup by using telnet to reach each LPAR from the HMC | > telnet <hostname>
Press Ctrl-C or type exit to end the session. If you can't telnet, you have a network problem. See Setting up HMC/partitions hostname and network. |
7 | HMC: Verify HMC-to-LPAR authentication | > CT_CONTACT=<lpar_hostname> lsrsrc IBM.ManagementServer
If nothing is displayed or if there are any errors, you probably have a hostname problem. See Setting up HMC/partitions hostname and network. |
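Step 1 of Table 3 lends itself to a quick filter. This is a hedged sketch, not an HMC tool: unauthenticated_nodes is a hypothetical name, and the code assumes that lsnodes -a Status prints one partition per line with the hostname first (optionally followed by a colon) and the Status value last.

```shell
#!/bin/sh
# unauthenticated_nodes: hypothetical helper, not shipped with CSM or the HMC.
# Reads "lsnodes -a Status" style output on stdin and prints the name of
# every partition whose Status is not 1 (not authenticated for DLPAR).
# Assumption: hostname is the first field, Status is the last field.
unauthenticated_nodes() {
    awk 'NF >= 2 {
        name = $1
        sub(/:$/, "", name)             # tolerate a "hostname:" style prefix
        if ($NF != "1") print name
    }'
}
```

On the HMC this could be run as /opt/csm/bin/lsnodes -a Status | unauthenticated_nodes. Any partition it prints that is known to be activated is a candidate for the network and hostname checks in the remaining steps.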
Setting up HMC/partitions hostname and network
First, find out the IP address and hostname format of the HMC and its LPAR(s). Run the command
hostname
on the HMC and on each AIX system, then run host with the name that hostname returned to verify it. For example:
> hostname
Partition.company.com
> host Partition.company.com
Partition.company.com has address 9.3.14.199
If DNS is off or if the HMC and partitions are on different subnets
The /etc/hosts files on the HMC and on the LPAR(s) need to contain the correct entries for the hostnames of the HMC and all partitions (names are case sensitive).
Make sure that on the LPAR(s) (LPARs only), the file /etc/netsvc.conf exists with one line:
hosts=local,bind
Refresh RMC by either rebooting or running
> /usr/sbin/rsct/bin/rmcctrl -z
> /usr/sbin/rsct/bin/rmcctrl -A
Customers should add the hostnames of all LPARs to the /etc/hosts file on the HMC. The HMC hostname must be added to the /etc/hosts file on each LPAR. Because a site without DNS has no domain name, only short hostnames, the DNS enabled box will not be enabled.
To access Linux on the HMC
To access the xterm on the HMC (Command Line Entry) you will need a PE passcode, which can be obtained from IBM support. To access the Linux command line:
- Log on to the HMC as the hscpe user (a user created by the customer)
- Select Problem Determination (In "Service Applications" folder at Release 2.0 and above)
- Select Microcode Maintenance
- Enter serial number of the HMC and PE password obtained from support
- Select Launch xterm shell
If you have ssh set up, from the upper left corner click on Console then select Open Terminal Session, and enter your HMC hostname.
Use LAN Surveillance to check your network problem
Since HMC version 3.1, the LAN Surveillance feature has been added into Service Focal Point (SFP) to alert users if an LPAR is having a network/hostname setup/RMC authentication problem by reporting a SURVALNC Serviceable Event to the HMC. (Users can have e-mail set up for notification of this type of problem.) You can use "List Serviceable Events" to check for these errors; if there are none, you should not have problems with DLPAR/SFP. If there are errors, please go through the checklist to diagnose and correct the problem.
A quick way to verify if the system Network/Hostname is set up properly
From the HMC console, select "Server Management" then expand it to the LPAR level. Left click on an AIX 520 LPAR to get the pop-up menu, then select one of the items under "Dynamic Logical Partition" (for example, Memory). If you get the error message
HMCERRV3DLPAR016: The selected logical partition is not enabled for dynamic logical partitioning operations
then there's a good chance the system has a network/hostname setup problem. Go through the checklist to diagnose and correct the problem. It's best to perform this procedure right after the HMC has been rebooted.
Check serial cable connection between HMC and CEC
On HMC:
> query_cecs (returns cecname)
> get_cec_mode -m cecname (verifies the connection to the service processor)
Checking for HMC version from ssh
Use the hsc version command.
Is there a relation between DLPAR/LparCmdRM and SFP/ServiceRM?
No, there is no relation between DLPAR and SFP. They are two independent daemons serving two different components. However, they use the same RMC framework and are thus subject to the same authentication process, as well as the same network/hostname setup.
DLPAR is only supported on AIX 5.2; AIX 5.1x partitions will not be initialized for DLPAR. It is correct to assume that if SFP works, DLPAR would work, but if DLPAR works, SFP might not be fully functional.
Authentication and authorization process between HMC and partitions
- On HMC: DMSRM pushes down the secret key and HMC hostname to NVRAM when it detects a new CEC. This process is repeated every 5 minutes. Each time an HMC is rebooted or DMSRM is restarted, a new key is used.
- On AIX: CSMAgentRM, through RTAS, reads the key and HMC hostname from NVRAM. It then authenticates the HMC. This process is repeated every 5 minutes on the LPAR to detect new HMC(s) and key changes. An HMC with a new key is treated as a new HMC and goes through the authentication and authorization processes again.
- On AIX: After authenticating the HMC, CSMAgentRM will contact the DMSRM on HMC to create a ManagedNode resource in order to identify itself as an LPAR of this HMC. (At the creation time, the ManagedNode's Status attribute will be set to 127.) CSMAgentRM then creates a compatible ManagementServer resource on AIX.
- On AIX: After the ManagedNode and ManagementServer resources have been created on the HMC and AIX respectively, CSMAgentRM grants the HMC permission to access the necessary resource classes on the LPAR. After granting the HMC permission, CSMAgentRM changes the Status of its ManagedNode resource on the HMC to 1. Without proper permission on AIX, the HMC would be able to establish a session with the LPAR but would not be able to query for OS information or DLPAR capabilities, or to execute DLPAR commands afterward.
- On HMC: After the ManagedNode Status has changed to 1, LparCmdRM queries for OS information and DLPAR capabilities, notifies CIMOM about the DLPAR capabilities of the LPAR, then waits for a DLPAR command from users. If the partitions support DLPAR capabilities, lspartition -dlpar will list them with Active:<1> and DCaps:<0xf>.
The lspartition -dlpar command was intended as a development tool. Its output has the following meaning:
<#0> Partition:<002, lpar.company.com, 9.8.206.215>
Active:<1>, OS:<AIX, 5.2>, DCaps:<0xf>, CmdCaps:<0x1, 0x0>
- Partition: <LParID, lpar_hostname, lpar IPaddress>
- Active: <#>. 0 means there is no session to the LPAR; 1 means otherwise.
- OS: <OSType, OSLevel>. Should be <AIX, 5.2> if the partition is Active<1>. If the partition is Active<1> and the OS information is empty, IBM.HostRM could have a problem on AIX. (I have not seen this happen yet!)
- DCaps: <#>. Value 0x0 means the LPAR does not support DLPAR operations. Value 0xf means all DLPAR operations are supported. Usually, this value goes together with Active<1> above. The session must be established before the information can be queried from the LPAR.
- CmdCaps: <0x1, 0x0>. No significant meaning for Release 3, Version 1.x.x. In Release 3, Version 2.2.x or above, 0x1 means that a remote shutdown of the AIX LPAR can be done from the HMC.
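The field meanings above can be turned into a small filter that lists the partitions that are not yet DLPAR ready. This is a sketch under stated assumptions, not an HMC utility: dlpar_not_ready is a hypothetical name, and the code assumes that the Active and DCaps fields of a partition appear together on one line, either on the Partition line itself or on the line that follows it.

```shell
#!/bin/sh
# dlpar_not_ready: hypothetical helper, not shipped with the HMC.
# Reads "lspartition -dlpar" style output on stdin and prints the
# "<LParID, hostname, IP>" contents of every partition that is not
# reported with both Active:<1> and DCaps:<0xf>.
dlpar_not_ready() {
    awk '
        /Partition:</ {
            # Keep only the text between "Partition:<" and the next ">".
            part = $0
            sub(/.*Partition:</, "", part)
            sub(/>.*/, "", part)
        }
        /Active:</ {
            ready = ($0 ~ /Active:<1>/ && $0 ~ /DCaps:<0xf>/)
            if (!ready) print part
        }
    '
}
```

On the HMC this could be run as lspartition -dlpar | dlpar_not_ready. An empty result would suggest that every listed partition has an RMC session and reports full DLPAR capabilities.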
Do IBM.DRM and IBM.HostRM need to be active on AIX?
No. IBM.DRM and IBM.HostRM are lazy-started resource managers and can be in an inoperative state if they are not used. They are started as soon as the first IBM.DRM request is made from the HMC. If lssrc -a on an LPAR shows IBM.DRM as inoperative, it is likely that the HMC has never made a connection with the LPAR since it was rebooted or upgraded to a new level of AIX. In this case, on the HMC, the command lspartition -dlpar would show the LPAR as Active<0>. To diagnose and fix this type of problem, see "Table 3. Verify your RMC/DLPAR network/hostname setup" in Checklist for DLPAR setup.
Does IBM.CSMAgentRM need to be active on HMC?
No. IBM.CSMAgentRM is not required for DLPAR/SFP. The CSMAgentRM .cdef file is shipped on HMC to support Distributed RMC, and therefore it will be listed as "inoperative."
Note: This section applies only to earlier versions of HMC and is here for reference only.
Most DLPAR problems we've encountered in the test labs have been caused by improper network and hostname setup. This section tries to reduce these network setup and configuration problems. For older releases of AIX (pre 5.1F and 2.0) or HMC (Release 3, Version 1.x or earlier), the hostname format, long or short, also requires some setup.
First, find out the hostname format that the HMC and its LPAR(s) are using: short or long name. Your setup depends largely on the format of the hostname. To determine the hostname format, run the command
hostname
on the HMC and on each AIX system, then run host with the name that hostname returned to verify it. For example:
> hostname
Partition.company.com
> host Partition.company.com
Partition.company.com has address 9.3.14.199
If DNS is On
This section mostly applies if the LPAR AIX level is at 5.1F/5.20 or earlier, or the HMC is at Release 3, Version 1.x or earlier.
If the HMC and the partition(s) both use a long name
- No hostname entry is needed in /etc/hosts on either AIX or the HMC.
- If /etc/hosts has the hostname entry, the long name must come before the short name for the HMC and all partitions (host names are case sensitive). For example,
10.10.10.11 mymachine.mycompany.com mymachine
- After you update the /etc/hosts file, refresh RMC by either rebooting or running
> /usr/sbin/rsct/bin/rmcctrl -z
> /usr/sbin/rsct/bin/rmcctrl -A
If the hostname command returns the short name, put the short name before the long name in /etc/hosts on both the HMC and the partitions. If the hostname command returns the long name, put the long name before the short name in the /etc/hosts file on the HMC and all partitions (names are case sensitive). For example, if hostname returns the short name:
10.10.10.11 mymachine mymachine.mycompany.com
Make sure that on the LPAR(s) (LPARs only), the file /etc/netsvc.conf exists with one line:
hosts=local,bind
Refresh RMC by either rebooting or running
> /usr/sbin/rsct/bin/rmcctrl -z
> /usr/sbin/rsct/bin/rmcctrl -A
If DNS is Off
This section mostly applies if the LPAR AIX level is at 5.1F/5.20 or earlier, or the HMC is at Release 3, Version 1.x or earlier.
- The /etc/hosts files on the HMC and the LPAR(s) need to be modified to contain the correct entries for the hostnames of the HMC and all partitions. If the hostname command returns the short name, put the short name before the long name in /etc/hosts for the HMC and all partitions. If the hostname command returns the long name, put the long name before the short name. Names are case sensitive.
- Make sure that on the LPAR(s) (LPARs only), the file /etc/netsvc.conf exists with one line: hosts=local,bind.
- Refresh RMC by either rebooting or running
> /usr/sbin/rsct/bin/rmcctrl -z
> /usr/sbin/rsct/bin/rmcctrl -A
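The name-ordering rule for /etc/hosts is easy to get wrong, so it may help to check it mechanically. The following is a minimal sketch, not an IBM-supplied tool: check_hosts_order and its exit codes are hypothetical, and the code assumes the usual hosts file layout of an address followed by one or more names, with # starting a comment.

```shell
#!/bin/sh
# check_hosts_order: hypothetical helper, not part of AIX or the HMC.
# Usage: check_hosts_order NAME [HOSTS_FILE]
# Exit status: 0 if NAME is the first name on one of its lines,
#              1 if NAME is listed but never as the first name,
#              2 if NAME is not listed at all.
# The comparison is case sensitive, matching the rule in the text.
check_hosts_order() {
    name=$1
    file=${2:-/etc/hosts}
    awk -v name="$name" '
        { sub(/#.*/, "") }              # strip comments before looking at fields
        NF >= 2 {
            for (i = 2; i <= NF; i++)
                if ($i == name) {
                    found = 1
                    if (i == 2) first = 1
                }
        }
        END {
            if (!found) exit 2
            exit (first ? 0 : 1)
        }
    ' "$file"
}
```

A typical call would be check_hosts_order "$(hostname)", which checks that whatever format hostname returns, short or long, is the name listed first in /etc/hosts on that system.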