
Wednesday 15 August 2012

Checklist for DLPAR setup (RSCT)


It is strongly recommended that the Hardware Management Console (HMC) be upgraded to Release 3, Level 2.2.x or above to minimize setup time for dynamic logical partitioning (DLPAR) operations. Run hsc version to find out the HMC's version number. AIX on each LPAR should also be upgraded to 5.1F, 5.20, or later.

Table 1. Requirements

Verify | Command, result, and action
1. HMC level
Click Help, then About, on the HMC main window. This should be Release 3, version 2.4 or higher. I encourage you to get the latest software from HMC Corrective Service.
2. AIX filesets required for DLPAR (on each LPAR/SMP)
The AIX level should be 5.2 or later. These filesets should be installed on the partitions:
> lslpp -l rsct.core*
> lslpp -l csm.client
Any missing fileset can be installed from the AIX installation CD.

Use the tables in this section as a checklist for your DLPAR setup.


Table 2. Verify whether the required software is functional

Verify | Result and action
1. HMC daemons are running
> su - root
> lssrc -a
Subsystem         Group            PID     Status
ctrmc             rsct             822     active
IBM.DMSRM         rsct_rm          906     active
IBM.LparCmdRM     rsct_rm          901     active
If these daemons are active, go to step 2. If any of the daemons show as inoperative, start it manually:
> startsrc -s <subsystem name>
For example, startsrc -s ctrmc.
If a daemon can't be started, contact IBM service personnel.
2. AIX daemons are running (on each LPAR/SMP with AIX 5.20)
> su - root
> lssrc -a | grep rsct
Subsystem         Group            PID     Status
ctrmc             rsct             21044   active
IBM.CSMAgentRM    rsct_rm          21045   active
IBM.ServiceRM     rsct_rm          11836   active
IBM.DRM           rsct_rm          20011   active
IBM.HostRM        rsct_rm          20012   active
IBM.DRM and IBM.HostRM are "lazy started" resource managers; they are only started when they are used. If IBM.DRM and IBM.HostRM are inoperative, there's a good chance you have a network/hostname setup problem between the HMC and the LPAR. To correct this, see Table 3. Verify your RMC/DLPAR network/hostname setup, below.
If AIX is at 51G, 52B, or above, ctcas is also a lazy-started RM: it can stay inoperative until it is used, so it's not necessary to start it if your LPAR is at these levels or later.
If any of the daemons show as inoperative, start it:
> startsrc -s <subsystem name>
For example, startsrc -s IBM.CSMAgentRM.

ctrmc
Is a Resource Monitoring and Control (RMC) subsystem.
ctcas
Is for security verification. It is a lazy started resource manager and does not have to run in order for DLPAR to work.
IBM.DMSRM
Is for tracking status of partitions.
IBM.LparCmdRM
Is for DLPAR operation on HMC.
IBM.CSMAgentRM
Is for handshaking between the LPAR and HMC.
IBM.DRM
Is for executing the DLPAR command on the LPAR.
IBM.HostRM
Is for obtaining OS information.
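If you check many partitions, the daemon checks in Table 2 can be scripted. The following is a minimal Python sketch, not an IBM tool: it parses captured lssrc -a output, assuming the usual Subsystem/Group/PID/Status columns with an empty PID on inoperative rows, and reports which required subsystems need startsrc and which lazy-started resource managers are still idle. The names HMC_REQUIRED, AIX_REQUIRED, parse_lssrc, and check_daemons are illustrative.

```python
# Sketch (not an IBM tool): check captured "lssrc -a" output against Table 2.
# Assumes whitespace-separated Subsystem/Group/PID/Status columns, where
# inoperative rows have an empty PID column.
HMC_REQUIRED = ("ctrmc", "IBM.DMSRM", "IBM.LparCmdRM")
AIX_REQUIRED = ("ctrmc", "IBM.CSMAgentRM", "IBM.ServiceRM")
# Lazy started: may stay inoperative until first used
# (ctcas only on AIX 51G, 52B, or later).
LAZY_STARTED = ("IBM.DRM", "IBM.HostRM", "ctcas")


def parse_lssrc(text):
    """Map subsystem name -> status (first and last column of each row)."""
    statuses = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[0] == "Subsystem":
            continue  # blank line or header row
        statuses[fields[0]] = fields[-1]
    return statuses


def check_daemons(text, required):
    """Return (to_start, idle_lazy).

    to_start: required subsystems that are missing or not active.
    idle_lazy: lazy-started RMs listed in the output but not active.
    """
    statuses = parse_lssrc(text)
    to_start = [name for name in required if statuses.get(name) != "active"]
    idle_lazy = [name for name in LAZY_STARTED
                 if name in statuses and statuses[name] != "active"]
    return to_start, idle_lazy


hmc_output = """\
Subsystem         Group            PID     Status
ctrmc             rsct             822     active
IBM.DMSRM         rsct_rm          906     active
IBM.LparCmdRM     rsct_rm          901     active
"""

# Output of "lssrc -a | grep rsct", so there is no header row.
aix_output = """\
ctrmc             rsct             21044   active
IBM.CSMAgentRM    rsct_rm          21045   active
IBM.ServiceRM     rsct_rm                  inoperative
IBM.DRM           rsct_rm                  inoperative
IBM.HostRM        rsct_rm                  inoperative
"""

to_start, idle_lazy = check_daemons(aix_output, AIX_REQUIRED)
for name in to_start:
    print(f"startsrc -s {name}")
if idle_lazy:
    print("Not started yet:", ", ".join(idle_lazy),
          "(if the HMC should already be connected, see Table 3)")
```

Idle lazy-started RMs are only a hint: they stay inoperative until the HMC first uses them, so they point to a network/hostname problem only when the HMC should already have connected.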

Table 3. Verify your RMC/DLPAR network/hostname setup

Verify | Result and action
1. HMC: List partitions authenticated by RMC
> /opt/csm/bin/lsnodes -a Status
partition01 1
partition02 0
partition03 1
Here, 1 means the LPAR is activated and authenticated for DLPAR; 0 means otherwise.
If the LPAR is activated and still shows 0, you could have either a network or a hostname setup problem. If you have just rebooted the HMC, wait a few minutes. If nothing changes after that, check your hostname/network setup in Setting up HMC/partitions hostname and network.
2. HMC: List partitions recognized by DLPAR
> lspartition -dlpar
<#0> Partition:<001, partition01.company.com, 9.3.206.300> Active:<1>, OS:<AIX, 5.2>, DCaps:<0xf>, CmdCaps:<0x1, 0x0>
<#1> Partition:<002, partition02.company.com, 9.3.206.300> Active:<0>, OS:<AIX, 5.2>, DCaps:<0xf>, CmdCaps:<0x1, 0x0>
<#2> Partition:<003, partition03.company.com, 9.3.206.300> Active:<0>, OS:<, 5.1F>, DCaps:<0x0>, CmdCaps:<0x0, 0x0>
If all active AIX 5.2 partitions are listed with Active:<1> and DCaps:<0xf>, your system has been set up properly for DLPAR, and you can skip the rest of the checklist. (In this example, LPAR 002 is being shut down, and LPAR 003 is not activated because it is at AIX 5.1.)
If you're missing some active partitions, or some partitions are reported as Active:<0>, your system probably still has a network/hostname setup problem. See Setting up HMC/partitions hostname and network. (If your LPAR is Active:<1> but the GUI is still not DLPAR capable, do a rebuild to get around this problem. See the Appendixes in this article for more information.)
If you still can't get partitions recognized by DLPAR after working through the checklist, contact IBM service personnel.
3. AIX: Ensure the /var directory is not 100% full (on each LPAR/SMP)
> df
If /var is 100% full, use smitty to expand it. If there is no more space available, go through its subdirectories and remove unnecessary files (trace.*, core, and so on). After expanding the /var directory, run the following commands to fix possibly corrupted files:
> rmrsrc -s "Hostname!='t' " IBM.ManagementServer
> /usr/sbin/rsct/bin/rmcctrl -z
> rm /var/ct/cfg/ct_has.thl
> rm /var/ct/cfg/ctrmc.acls
> /usr/sbin/rsct/bin/rmcctrl -A
4. AIX: Verify that there is no network problem (from each LPAR/SMP)
> ping <hmc_hostname>
If ping fails, check your hostname/network setup. See Setting up HMC/partitions hostname and network.
5. AIX: Verify LPAR-to-HMC authentication (from each LPAR/SMP)
> CT_CONTACT=<HMC name> lsrsrc IBM.ManagedNode
You should get a list of resource classes on the HMC. If there is any error, you probably have a network/hostname problem. See Setting up HMC/partitions hostname and network.
6. HMC: Verify the network setup by telnetting into each LPAR from the HMC
> telnet <hostname>
Press Ctrl+C or type exit to end the session. If you can't telnet, you have a network problem. See Setting up HMC/partitions hostname and network.
7. HMC: Verify HMC-to-LPAR authentication
> CT_CONTACT=<lpar_hostname> lsrsrc IBM.ManagementServer
If nothing is displayed or if there are any errors, you probably have a hostname problem. See Setting up HMC/partitions hostname and network.
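Checks 1, 2, 5, and 7 of Table 3 can be partly automated if you capture the command output and review it on a workstation. The Python sketch below is an illustration under assumptions, not an IBM utility: it parses lsnodes -a Status and lspartition -dlpar output in the formats shown in the table, flags partitions that are not both Active:<1> and DCaps:<0xf>, and wraps the CT_CONTACT form of lsrsrc. The function names and the tolerated colon form of the lsnodes output are assumptions.

```python
import os
import re
import subprocess


def parse_lsnodes_status(text):
    """Map partition name -> Status from `lsnodes -a Status` output.

    Accepts "name 1" lines as shown above; a trailing colon after the name
    is also tolerated (an assumption about other output variants).
    """
    status = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[1].isdigit():
            status[fields[0].rstrip(":")] = int(fields[1])
    return status


# One `lspartition -dlpar` entry. An entry may wrap onto a second line,
# so \s* is used between fields (it also matches newlines).
ENTRY = re.compile(
    r"<#\d+>\s*Partition:<(?P<id>[^,]*),\s*(?P<host>[^,]*),\s*(?P<ip>[^>]*)>\s*"
    r"Active:<(?P<active>\d+)>,\s*"
    r"OS:<(?P<os_type>[^,]*),\s*(?P<os_level>[^>]*)>,\s*"
    r"DCaps:<(?P<dcaps>[^>]*)>,\s*CmdCaps:<(?P<cmdcaps>[^>]*)>"
)


def parse_lspartition(text):
    return [m.groupdict() for m in ENTRY.finditer(text)]


def not_dlpar_ready(text):
    """Entries that are not both Active:<1> and DCaps:<0xf>."""
    return [p for p in parse_lspartition(text)
            if p["active"] != "1" or int(p["dcaps"], 16) != 0xF]


def rmc_query(contact, resource_class):
    """Run `CT_CONTACT=<contact> lsrsrc <resource_class>` (checks 5 and 7).

    Assumes lsrsrc is on PATH; subprocess raises FileNotFoundError otherwise.
    """
    env = dict(os.environ, CT_CONTACT=contact)
    return subprocess.run(["lsrsrc", resource_class], env=env,
                          capture_output=True, text=True)


lsnodes_output = "partition01 1\npartition02 0\npartition03 1\n"
lspartition_output = """\
<#0> Partition:<001, partition01.company.com, 9.3.206.300> Active:<1>, OS:<AIX, 5.2>, DCaps:<0xf>, CmdCaps:<0x1, 0x0>
<#1> Partition:<002, partition02.company.com, 9.3.206.300> Active:<0>, OS:<AIX, 5.2>, DCaps:<0xf>, CmdCaps:<0x1, 0x0>
<#2> Partition:<003, partition03.company.com, 9.3.206.300> Active:<0>, OS:<, 5.1F>, DCaps:<0x0>, CmdCaps:<0x0, 0x0>
"""

for name, status in parse_lsnodes_status(lsnodes_output).items():
    if status != 1:
        print(f"{name}: not authenticated by RMC (Status {status})")
for p in not_dlpar_ready(lspartition_output):
    print(f"{p['id']} {p['host']}: Active:<{p['active']}>, DCaps:<{p['dcaps']}>")
```

On a system where lsrsrc is available, a call such as rmc_query("hmc01", "IBM.ManagedNode") from an LPAR (hmc01 is a hypothetical hostname) returns a subprocess.CompletedProcess; a non-zero returncode or text in stderr points to the same network/hostname problems as the manual command. Note that the shell form requires no spaces around the = in CT_CONTACT=<name>.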

Setting up HMC/partitions hostname and network

Most DLPAR problems we've encountered in the test labs have been caused by improper network setup. This section tries to reduce these network setup and configuration problems.
For older releases of AIX (pre 5.1F and 2.0) or HMC (Release 3, version 1.x or earlier), the hostname format, long or short, also requires some setup. Refer to the Appendixes for further setup instructions if your HMC or partitions are at these levels.
First, find out the IP address and hostname format of the HMC and its LPAR(s): run the hostname command on the HMC and on each AIX system, then run host <output_of_hostname> to verify it. For example:
> hostname
Partition.company.com
> host Partition.company.com
Partition.company.com has address 9.3.14.199
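The same check can be sketched in Python, which may be handy when collecting results from several systems. This is a minimal illustration, not part of the original procedure: socket.gethostname() plays the role of hostname, and socket.gethostbyname() stands in for host. One caveat: gethostbyname() goes through the system resolver (on AIX, in the order set by /etc/netsvc.conf), whereas host queries DNS directly. The helper names are hypothetical.

```python
import socket


def hostname_format(name):
    """'long' if the name contains a domain part, else 'short'."""
    return "long" if "." in name else "short"


def describe_host(name=None):
    """Rough equivalent of running `hostname`, then `host <name>`."""
    if name is None:
        name = socket.gethostname()
    try:
        address = socket.gethostbyname(name)
    except socket.gaierror:
        address = None  # name does not resolve: check /etc/hosts or DNS
    return name, hostname_format(name), address


if __name__ == "__main__":
    name, fmt, address = describe_host()
    print(f"{name} ({fmt} name) -> {address or 'does not resolve'}")
```

For the example above, hostname_format("Partition.company.com") returns "long".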

If DNS is off or if the HMC and partitions are on different subnets
The /etc/hosts files on the HMC and LPAR(s) need to be modified to contain the correct entries for the hostnames of the HMC and all partitions (names are case sensitive).
Make sure that on the LPAR(s) only (not the HMC), the file /etc/netsvc.conf exists with one line: hosts=local,bind.
Refresh RMC, either by rebooting or by running:
  > /usr/sbin/rsct/bin/rmcctrl -z
  > /usr/sbin/rsct/bin/rmcctrl -A

Customers should add all the LPARs' hostnames to the /etc/hosts file on the HMC. The HMC hostname must be added to each LPAR's /etc/hosts file. Because the customer does not have DNS data, there is no domain name, only a short hostname, so the DNS enabled box will not be enabled.
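To confirm the /etc/netsvc.conf requirement on many LPARs, a small check like the following can help. It is a sketch under assumptions: it treats # as a comment marker, tolerates spaces around = and commas, and expects the hosts order to be exactly local, then bind, as stated above. The function names are hypothetical.

```python
def netsvc_hosts_order(text):
    """Return the hosts= resolver order from netsvc.conf content, or None."""
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        key, sep, value = line.partition("=")
        if sep and key.strip() == "hosts":
            return [part.strip() for part in value.split(",") if part.strip()]
    return None


def netsvc_ok(path="/etc/netsvc.conf"):
    """True if the file exists and contains hosts=local,bind."""
    try:
        with open(path) as f:
            order = netsvc_hosts_order(f.read())
    except FileNotFoundError:
        return False  # the file must exist on each LPAR
    return order == ["local", "bind"]


if __name__ == "__main__":
    print("netsvc.conf OK" if netsvc_ok() else "fix /etc/netsvc.conf, then refresh RMC")
```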

This appendix offers tips on accessing Linux on the HMC, along with other troubleshooting aids.
To access Linux on the HMC
To access the xterm on the HMC (Command Line Entry) you will need a PE passcode, which can be obtained from IBM support. To access the Linux command line:
  1. Log on to the HMC as the hscpe user (a user created by the customer)
  2. Select Problem Determination (In "Service Applications" folder at Release 2.0 and above)
  3. Select Microcode Maintenance
  4. Enter serial number of the HMC and PE password obtained from support
  5. Select Launch xterm shell
Alternative way to open an xterm on the HMC
If you have ssh set up, from the upper left corner click on Console then select Open Terminal Session, and enter your HMC hostname.
Use LAN Surveillance to check for network problems
Since HMC version 3.1, the LAN Surveillance feature has been added into Service Focal Point (SFP) to alert users if an LPAR is having a network/hostname setup/RMC authentication problem by reporting a SURVALNC Serviceable Event to the HMC. (Users can have e-mail set up for notification of this type of problem.) You can use "List Serviceable Events" to check for these errors; if there are none, you should not have problems with DLPAR/SFP. If there are errors, please go through the checklist to diagnose and correct the problem.
A quick way to verify that the system network/hostname is set up properly
From the HMC console, select "Server Management," then expand it to the LPAR level. Left-click an AIX 5.20 LPAR to get the pop-up menu, then select one of the items under "Dynamic Logical Partition" (for example, Memory). If you get the error message
HMCERRV3DLPAR016: The selected logical partition is not enabled for 
dynamic logical partitioning operations

then there's a good chance the system is having a network/hostname setup problem. Please go through the checklist to diagnose and correct the problem. It's best to perform this procedure right after the HMC gets rebooted.
Check serial cable connection between HMC and CEC
On HMC:
> query_cecs  -  returns cecname
> get_cec_mode -m cecname  -  verifies connection to service processor

Checking for HMC version from ssh
Use the hsc version command.

This appendix provides answers to some common questions about RMC/DLPAR from the field.
Is there a relation between DLPAR/LparCmdRM and SFP/ServiceRM?
No, there is no relation between DLPAR and SFP. They are two independent daemons serving two different components. However, they use the same RMC framework and are thus subject to the same authentication process, as well as the same network/hostname setup.
DLPAR is only supported on AIX 5.2; AIX 5.1x partitions will not be initialized for DLPAR. It is correct to assume that if SFP works, DLPAR will work, but if DLPAR works, SFP might not be fully functional.
Authentication and authorization process between HMC and partitions
  1. On HMC: DMSRM pushes down the secret key and HMC hostname to NVRAM when it detects a new CEC. This process is repeated every 5 minutes. Each time an HMC is rebooted or DMSRM is restarted, a new key is used.
  2. On AIX: CSMAgentRM, through RTAS, reads the key and HMC hostname from NVRAM. It then authenticates the HMC. This process is repeated every 5 minutes on the LPAR to detect new HMC(s) and key changes. An HMC with a new key is treated as a new HMC and goes through the authentication and authorization processes again.
  3. On AIX: After authenticating the HMC, CSMAgentRM will contact the DMSRM on HMC to create a ManagedNode resource in order to identify itself as an LPAR of this HMC. (At the creation time, the ManagedNode's Status attribute will be set to 127.) CSMAgentRM then creates a compatible ManagementServer resource on AIX.
  4. On AIX: After the ManagedNode and ManagementServer resources have been created on the HMC and AIX respectively, CSMAgentRM grants the HMC permission to access the necessary resource classes on the LPAR. After granting the HMC permission, CSMAgentRM changes the Status of its ManagedNode on the HMC to 1. Without proper permission on AIX, the HMC would be able to establish a session with the LPAR but would not be able to query for OS information or DLPAR capabilities, or to execute DLPAR commands afterward.
  5. On HMC: After the ManagedNode Status changes to 1, LparCmdRM queries for OS information and DLPAR capabilities, notifies CIMOM about the DLPAR capabilities of the LPAR, then waits for a DLPAR command from users. If the partitions support DLPAR capabilities, lspartition -dlpar will list them with Active:<1> and DCaps:<0xf>.
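The polling and re-authentication rules in steps 2 to 4 can be summarized as a small state model. The Python below is a hypothetical illustration of the described behavior, not CSMAgentRM's actual implementation: the class and method names are invented, and only the Status values (127 at creation, 1 after permissions are granted) and the "new key means new HMC" rule come from the steps above.

```python
# Hypothetical model of steps 2-4 above; the class and method names are
# illustrative and do not correspond to CSMAgentRM internals.
STATUS_CREATED = 127   # ManagedNode Status right after creation (step 3)
STATUS_AUTHORIZED = 1  # Status after HMC permissions are granted (step 4)


class LparAgentModel:
    def __init__(self):
        self.known_keys = {}   # HMC hostname -> key last read from NVRAM
        self.node_status = {}  # HMC hostname -> ManagedNode Status on that HMC

    def poll_nvram(self, hmc_hostname, key):
        """One 5-minute poll (step 2).

        Returns True if the HMC is treated as new (unknown, or its key
        changed) and must go through authentication again.
        """
        if self.known_keys.get(hmc_hostname) == key:
            return False  # same HMC, same key: nothing to do
        self.known_keys[hmc_hostname] = key
        # Step 3: after authenticating, the ManagedNode is created with 127.
        self.node_status[hmc_hostname] = STATUS_CREATED
        return True

    def grant_permissions(self, hmc_hostname):
        """Step 4: grant the HMC access, then set the ManagedNode Status to 1."""
        self.node_status[hmc_hostname] = STATUS_AUTHORIZED

    def dlpar_queries_allowed(self, hmc_hostname):
        """Step 5 only proceeds once the Status is 1."""
        return self.node_status.get(hmc_hostname) == STATUS_AUTHORIZED


agent = LparAgentModel()
agent.poll_nvram("hmc01", "key-1")  # new HMC: authenticate, Status 127
agent.grant_permissions("hmc01")    # permissions granted, Status 1
agent.poll_nvram("hmc01", "key-1")  # same key: nothing happens
agent.poll_nvram("hmc01", "key-2")  # HMC rebooted, new key: back to 127
print(agent.node_status["hmc01"])
```

The model makes one consequence visible: after an HMC reboot or a DMSRM restart, DLPAR queries wait until the next poll re-runs authentication, which is one reason to wait a few minutes after rebooting the HMC.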
What does the output of lspartition -dlpar mean?
Intended as a development tool, the output of lspartition -dlpar has the following meaning:
<#0> Partition:<002, lpar.company.com, 9.8.206.215>
        Active:<1>, OS:<AIX, 5.2>, DCaps:<0xf>, CmdCaps:<0x1, 0x0>

Partition
<LParID, lpar_hostname, lpar IPaddress>
Active
<#>: - 0 means no session to lpar; 1 means otherwise
OS
<OSType, OSLevel>: Should be <AIX, 5.2> if it's Active<1>. If Active<1> and the OS information is empty, IBM.HostRM could have a problem on AIX. (I have not seen this happen yet!)
DCaps
<#>: - Value 0x0 means the LPAR does not support DLPAR operation. Value 0xf means all DLPAR operations are supported. Usually, this value goes together with the Active<1> above. The session must be established first before the information can be queried from the LPAR.
CmdCaps
<0x1, 0x0>: - No significant meaning for Release 3, version 1.x.x. In Release 3, version 2.2.x or above, the 0x1 means remote shutdown of the AIX LPAR can be done from the HMC.
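The field descriptions above translate into a few simple rules. The following Python function is a sketch that applies them to the raw strings between < and > in one entry; the function name and message wording are illustrative, and DCaps values other than 0x0 and 0xf are not described here, so the sketch only reports them.

```python
def interpret_entry(active, os_type, os_level, dcaps, cmdcaps):
    """Apply the field descriptions above to one lspartition -dlpar entry.

    Arguments are the raw strings between < and >, for example
    interpret_entry("1", "AIX", "5.2", "0xf", "0x1, 0x0").
    """
    if active == "0":
        # No session: OS and capability information cannot be queried yet.
        return ["no session to the LPAR"]
    notes = []
    if not os_type and not os_level:
        notes.append("OS information empty: IBM.HostRM may have a problem on AIX")
    caps = int(dcaps, 16)  # int() accepts the 0x prefix with base 16
    if caps == 0xF:
        notes.append("all DLPAR operations supported")
    elif caps == 0x0:
        notes.append("DLPAR operations not supported")
    else:
        notes.append(f"DCaps {dcaps}: value not described here")
    if int(cmdcaps.split(",")[0], 16) == 0x1:
        # Meaningful only for Release 3, version 2.2.x or above.
        notes.append("remote shutdown from the HMC possible")
    return notes


print(interpret_entry("1", "AIX", "5.2", "0xf", "0x1, 0x0"))
```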
Do IBM.DRM and IBM.HostRM need to be "Active" on partition for DLPAR to work?
No. IBM.DRM and IBM.HostRM are lazy-started resource managers and can be in the inoperative state if they're not used. They are started as soon as the first IBM.DRM request is made from the HMC. If lssrc -a on an LPAR shows IBM.DRM as inoperative, it is likely that the HMC has never made a connection with the LPAR since it was rebooted or upgraded to a new level of AIX. In this case, on the HMC, the command lspartition -dlpar would show the LPAR as Active<0>.
To diagnose and fix this type of problem, see "Table 3. Verify Your RMC/DLPAR Network/Hostname setup" in Checklist for DLPAR setup.
Does IBM.CSMAgentRM need to be active on HMC?
No. IBM.CSMAgentRM is not required for DLPAR/SFP. The CSMAgentRM .cdef file is shipped on HMC to support Distributed RMC, and therefore it will be listed as "inoperative."

Note: This section applies only to earlier versions of HMC and is here for reference only.
Most DLPAR problems we've encountered in the test labs have been caused by improper network and hostname setup. This section tries to reduce these network setup and configuration problems. For older releases of AIX (pre 5.1F and 2.0) or HMC (Release 3, version 1.x or earlier), the hostname format, long or short, also requires some setup.
First, find out which hostname format the HMC and its LPAR(s) are using: short or long name. Your setup depends largely on the format of the hostname. To determine it, run the hostname command on the HMC and on each AIX system, then run host <output_of_hostname> to verify it. For example:
> hostname
Partition.company.com
> host Partition.company.com
Partition.company.com has address 9.3.14.199

If DNS is On
This section is mostly applicable if the LPAR AIX level is at 5.1F/5.20 or earlier, or the HMC is at Release 3, version 1.x or earlier.
If the HMC and the partition(s) both use a long name
  • No hostname entry is needed in /etc/hosts on either AIX or the HMC.
  • If /etc/hosts has the hostname entry, the long name must come before the short name for the HMC and all partitions (host names are case sensitive). For example,
    10.10.10.11    mymachine.mycompany.com      mymachine
  • After you update the /etc/hosts file, refresh RMC either by rebooting or by running:
      > /usr/sbin/rsct/bin/rmcctrl -z
      > /usr/sbin/rsct/bin/rmcctrl -A
Other hostname formats
If the hostname command returns the short name, put the short name before the long name in /etc/hosts on both the HMC and the partitions. If the hostname command returns the long name, put the long name before the short name in the /etc/hosts file on the HMC and all partitions (names are case sensitive). For example, when hostname returns the short name mymachine:
10.10.10.11    mymachine      mymachine.mycompany.com
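The ordering rule above can be expressed as a small helper that builds an /etc/hosts line from the hostname output and checks an existing line. This is a sketch: hosts_entry and first_name_matches are hypothetical names, the domain argument is only used when hostname returns a short name, and the comparison is case sensitive, as the text requires.

```python
def hosts_entry(ip, hostname_output, domain):
    """Build an /etc/hosts line whose first name has the same form
    (short or long) as the output of the `hostname` command."""
    short = hostname_output.split(".", 1)[0]
    if "." in hostname_output:
        names = [hostname_output, short]            # long name first
    else:
        names = [short, f"{short}.{domain}"]        # short name first
    return f"{ip}    {names[0]}      {names[1]}"


def first_name_matches(hosts_line, hostname_output):
    """True if the first name after the IP equals the hostname output
    exactly (case sensitive); text after # is ignored."""
    fields = hosts_line.split("#", 1)[0].split()
    return len(fields) >= 2 and fields[1] == hostname_output


print(hosts_entry("10.10.10.11", "mymachine", "mycompany.com"))
```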

Make sure that on the LPAR(s) only, the file /etc/netsvc.conf exists with one line: hosts=local,bind
Refresh RMC either by rebooting or by running:
 > /usr/sbin/rsct/bin/rmcctrl -z
 > /usr/sbin/rsct/bin/rmcctrl -A

If DNS is Off
This section is mostly applicable if the LPAR AIX level is at 5.1F/5.20 or earlier, or the HMC is at Release 3, version 1.x or earlier.
  • The HMC and LPAR(s) /etc/hosts file need to be modified to contain the correct entries for the HMC and all partitions' hostnames. If the hostname command returns the short name, put the short name before the long name in /etc/hosts for HMC and all partitions (names are case sensitive). If the hostname command returns the long name, put the long name before the short name in /etc/hosts for HMC and all partitions (names are case sensitive).
  • Make sure that on the LPAR(s) only (not the HMC), the file /etc/netsvc.conf exists with one line: hosts=local,bind.
  • Refresh RMC by either rebooting or
      > /usr/sbin/rsct/bin/rmcctrl -z
      > /usr/sbin/rsct/bin/rmcctrl -A
Customers should add all the LPARs' hostnames to the /etc/hosts file on the HMC. The HMC hostname must be added to each LPAR's /etc/hosts file. Because the customer does not have DNS data, there is no domain name, only a short hostname, so the DNS enabled box will not be enabled.
