IBM POWER9 Systems LC Server Firmware

Applies to:  AC922 (8335-GTH) and AC922 (8335-GTX)

This document provides information about the installation of Licensed Machine or Licensed Internal Code, which is sometimes referred to generically as microcode or firmware.

 

Contents

1.0 Systems Affected

1.1 Minimum ipmitool Code Level

1.2 Fix level Information on IBM OpenPOWER Components and Operating systems

1.3 Minimum xCAT level 2.13.4 for use in firmware updates

1.4 Required NVIDIA CUDA driver level for the Tesla V100 GPU

1.5 Enabling Trusted Boots in RHEL7.5-ALT

1.6 Provide Additional Cooling for PCIe4 Adapters By Setting Thermal Modes

2.0 Important Information

2.1 Warnings on known problems at this release level

2.1.1 Possible GPU fault on an initial IPL of system

2.1.2 Possible Error when creating user accounts with IPMI

2.1.3 LDAP Bind Failure on First Configuration Attempt From BMC GUI

2.1.4 Possible system crash when using "perf record" with Trace_IMC event and "-p" option on the Linux host

2.2 Upgrade Option for 8335-GTG to 8335-GTH

2.2.1 Processor feature conversions

2.2.2 Files to support the GTG to GTH upgrade

3.0 Firmware Information

3.1 Firmware Information and Description

4.0 Operating System Information

4.1 Linux Operating System

4.2 How to Determine the Level of a Linux Operating System

4.3 How to Determine if the opal-prd (Processor Recovery Diagnostics) package is installed

5.0 How to Determine The Currently Installed Firmware Level

6.0 Downloading the Firmware Package

7.0 Installing the Firmware

7.1 IBM Power Systems Firmware maintenance

7.2 OpenBMC System Firmware Update using openbmctool

8.0 System Management and Virtualization

8.1 BMC Service Processor IPMI

8.1.1 Security Risks Associated with Using IPMI

8.2 OpenPOWER Abstraction Layer (OPAL)

8.3 Intelligent Platform Management Interface (IPMI)

8.4 Petitboot bootloader

9.0 Quick Start Guide for Installing Linux on the LC 8335 server

10.0 Change History

 

1.0 Systems Affected

This package provides firmware for the Power System AC922 (8335-GTH) and AC922 (8335-GTX) servers only.

These systems, if updating from OP920, will have P9 DD2.2 processors.  If updating from OP930, the processors will be either P9 DD2.2 or P9 DD2.3.  New systems manufactured at the OP940 level can have P9 DD2.2 and/or P9 DD2.3 processors.

 

OP940 supports DD2.2 and DD2.3 processor versions.


The firmware level in this package is PNOR OP9-v2.4-4.31 with BMC op940.00 (OP940.00).

1.1 Minimum ipmitool Code Level

This section specifies the "Minimum ipmitool Code Level" required by the System Firmware for managing the system.  OpenPOWER requires ipmitool level v1.8.15 or later to execute correctly on the OP910 and later firmware.  It must be capable of establishing an IPMI v2 session with the IPMI support on the BMC.

 Verify your ipmitool level on your Linux workstation using the following command:

 

bash-4.1$ ipmitool -V

ipmitool version 1.8.15
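The version check above can also be scripted. The following is a minimal Python sketch, assuming the "ipmitool -V" output has the form shown above; the function names are illustrative and are not part of ipmitool.

```python
import re

# Minimum ipmitool level stated in this document.
MIN_IPMITOOL = (1, 8, 15)


def parse_ipmitool_version(output):
    """Extract (major, minor, patch) from the text printed by `ipmitool -V`."""
    match = re.search(r"ipmitool version (\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError("unrecognized ipmitool -V output: %r" % output)
    return tuple(int(part) for part in match.groups())


def meets_minimum(output, minimum=MIN_IPMITOOL):
    """Return True when the reported version is at or above the minimum."""
    return parse_ipmitool_version(output) >= minimum


# Sample output copied from the example above.
print(meets_minimum("ipmitool version 1.8.15"))
```

Comparing the version as a tuple of integers avoids the pitfalls of plain string comparison, where "1.8.9" would sort after "1.8.15".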

 

If you need to update or add ipmitool on your Linux workstation, you can compile ipmitool (current level 1.8.15) for Linux from SourceForge as follows:

 

1.1.1  Download the ipmitool tarball from http://sourceforge.net/projects/ipmitool/ to your Linux system

1.1.2  Extract tarball on Linux system

1.1.3  cd to top-level directory

1.1.4 ./configure

1.1.5  make

1.1.6  ipmitool will be under src/ipmitool        

 

You may also get the ipmitool package directly from your workstation Linux packages.

 

1.2 Fix level Information on IBM OpenPOWER Components and Operating systems

For specific fix level information on key components of IBM Power Systems LC and Linux operating systems,  please refer to the documentation in the IBM Knowledge Center for the AC922 (8335-GTH) and AC922(8335-GTX):

 

https://www.ibm.com/support/knowledgecenter/en/POWER9/p9hdx/8335_gth_landing.htm

https://www.ibm.com/support/knowledgecenter/en/POWER9/p9hdx/8335_gtx_landing.htm

 

1.3 Minimum xCAT level 2.13.4 for use in firmware updates

 

If using xCAT on the host OS to do firmware updates, the minimum xCAT level that should be used is 2.13.4 because it has stability improvements for the firmware update process.  See the xCAT 2.13.4 release notes below for more information.

 

https://github.com/xcat2/xcat-core/wiki/XCAT_2.13.4_Release_Notes

1.4 Required NVIDIA CUDA driver level for the Tesla V100 GPU

The NVIDIA CUDA driver on the Linux OS must be at minimum level 396.26 to be compatible with OP920.00 and later firmware levels.  Without a driver at this level, a GPU that has faulted and gone through a GPU reset can cause a Terminate Immediate (TI) for the system.  The recommended level for the NVIDIA CUDA driver is 396.44 or later, which provides ATS performance improvements.

 

The Power AC922 server delivers four Tesla V100 with NVLink GPUs supported in two processor sockets.

 

The Tesla CUDA driver can be obtained from the following NVIDIA download link: “https://www.nvidia.com/content/DriverDownload-March2009/confirmation.php?url=/tesla/396.44/nvidia-driver-local-repo-rhel7-396.44-1.0-1.ppc64le.rpm&lang=us&type=Tesla”

 

To search for the driver manually, go to the NVIDIA link "http://www.nvidia.com/Download/index.aspx?lang=en-us" and enter the following information:

 

Manually find drivers for my NVIDIA products.

Product Type:  Tesla

Product Series: V-Series

Product: Tesla V100

Operating System:  Linux POWER LE RHEL 7

CUDA Toolkit:  9.2

Language: English(US)

 

Search results:

Version: 396.44

Release Date: 2018.8.6

Operating System: Linux POWER LE RHEL 7

CUDA Toolkit: 9.2

Language: English (US)

File Size: 47.28 MB

 

1.5 Enabling Trusted Boots in RHEL7.5-ALT

On Red Hat Enterprise Linux (RHEL) for PPC, RHEL-Alt 7.5, the Trusted Platform Module (TPM) device driver is not loaded automatically at boot time.  Without this driver, the TPM device will not be accessible.

 

This affects any user-space application needing to access the TPM, as well as kernel security functions, such as the Integrity Measurement Architecture subsystem (IMA) in the Linux kernel.  Without the TPM driver loaded, IMA will be unable to record trusted measurements to the TPM.

 

To load the driver manually, as root:

# modprobe tpm_i2c_nuvoton

 

To load the driver automatically at boot time:

# echo "tpm_i2c_nuvoton" > /etc/modules-load.d/tpm.conf

 

The TPM device driver will be integrated as a built-in kernel module in a future release of RHEL-Alt 7.  Once this is done, it will be loaded automatically and this procedure will no longer be necessary.

1.6 Provide Additional Cooling for PCIe4 Adapters By Setting Thermal Modes

PCIe4 adapters with feature codes that include  #EC62, #EC64, #EC6E, and #EC6G for both air and water cooled systems may require additional cooling over what is given by the system default thermal mode settings.  To handle the cooling needs of these adapters, the openbmctool can be used to set a  recommended thermal mode to provide the cooling.

Please refer to IBM  Knowledge Center for selection of the appropriate thermal mode for the PCIe4  adapter, system models, and cable types:

https://www.ibm.com/support/knowledgecenter/en/POWER9/p9ei3/p9ei3_thermal_mode.htm.  

The Knowledge Center article provides guidance on which thermal mode to select and how to use the openbmctool to select it.  Be aware that the thermal modes needed to supply additional cooling are not the system default and have to be selected manually.  Once a new thermal mode is selected, with the system powered off or on, it persists until it is changed by the user or a factory reset of the system is done.  The minimum openbmctool version needed to select the thermal mode is v1.14, and the recommended version is v1.17.  There is no functional difference between these versions for thermal modes, but the error messages were updated in v1.17 to reflect that systems at the OP940 release level and later do not have to be powered on to set the thermal mode, as was required in earlier releases.  The thermal modes that can be selected include "DEFAULT", "CUSTOM", "HEAVY_IO", and "MAX_BASE_FAN_FLOOR".  The system comes pre-selected with "DEFAULT", which may not provide enough cooling for some adapters and configurations.

Note:  If your system has one of the affected PCIe4 adapters and you are using optical cables for that adapter, a thermal mode other than "DEFAULT" must be selected with openbmctool to get the proper cooling for the system.

 

2.0 Important Information


Downgrading firmware from any given release level to an earlier release level is not recommended.

If you feel that it is necessary to downgrade the firmware on your system to an earlier release level, please contact your next level of support.

 

Notes:  

1) Upgrades to the OP940 level are supported from the OP920 and OP930 release levels.

2) Downgrades from OP940 to earlier release levels are not supported for the systems manufactured at the OP940 level.  This may result in a system hang or other failures.

Concurrent Firmware Updates not available for LC servers.

Concurrent system firmware update is not supported on LC servers.

 

2.1 Warnings on known problems at this release level

2.1.1 Possible GPU fault on an initial IPL of system

Problem Description:

During one of the first IPLs of the system, in rare cases, the system may either kernel panic or crash due to an HMI and reboot.  The BMC console signatures are as follows:

[  170.281625] Faulting instruction address: 0xc00000000822a0c0

[  170.281698] Thread overran stack, or stack corrupted

[  170.281767] Oops: Kernel access of bad area, sig: 11 [#443]

[  170.281834] LE SMP NR_CPUS=2048 NUMA PowerNV

[  170.281890] Modules linked in:

[  170.281892] Unable to handle kernel paging request for data at address 0xfffffffffffffff8

 

OR

 

[285402.253760069,7] HMI: Received HMI interrupt: HMER = 0x8040000000000000

...

[285402.267399301,3] HMI:  _________________________

[285402.267438236,3] HMI: < It's Driver Debug time! >

[285402.267469320,3] HMI:  -------------------------

[285402.267510142,3] HMI:        \   ,__,

[285402.267542350,3] HMI:         \  (oo)____

[285402.267578707,3] HMI:            (__)    )\

[285402.267612119,3] HMI:               ||--|| *

[285402.267796791,3] HMI: NPU2: [Loc: UOPWR.787081A-Node0-Proc1] P:8 FIR#0 FIR 0x0000100008000000 mask 0x009a48180f03ffff

[285249[285402.267899526,3] HMI: NPU2: [Loc: UOPWR.787081A-Node0-Proc1] P:8 ACTION0 0x7f60b04500ae0000, ACTION1 0xff65b04700fe0000

.644496]  Hypervisor Maintenance interrupt [Recovered]

[285249.644928]  Error detail: Malfunction Alert

[285249.644988]         HMER: 8040000000000000

[285249.645032]         Unknown Malfunc[285402.268002011,3] HMI:

 

In this console output, if the value of any of the following registers has its first nibble set to 0x2, then it is the error of concern.

 

[2019-06-06T13:46:28-04:00] [4135415.111242179,3] HMI: NPU2: [Loc: UOPWR.789938A-Node0-Proc0] P:0 0x05011017=0x0000000000000000

[2019-06-06T13:46:28-04:00] [4135415.111244973,3] HMI: NPU2: [Loc: UOPWR.789938A-Node0-Proc0] P:0 0x05011047=0x0000000000800000

[2019-06-06T13:46:28-04:00] [4135415.111247753,3] HMI: NPU2: [Loc: UOPWR.789938A-Node0-Proc0] P:0 0x05011077=0x0000000000000000

[2019-06-06T13:46:28-04:00] [4135415.111250499,3] HMI: NPU2: [Loc: UOPWR.789938A-Node0-Proc0] P:0 0x050110a7=0x0000000000000000

[2019-06-06T13:46:28-04:00] [4135415.111253198,3] HMI: NPU2: [Loc: UOPWR.789938A-Node0-Proc0] P:0 0x05011217=0x0000000000000000

[2019-06-06T13:46:28-04:00] [4135415.111255899,3] HMI: NPU2: [Loc: UOPWR.789938A-Node0-Proc0] P:0 0x05011247=0x0000000000800000

[2019-06-06T13:46:28-04:00] [4135415.111258642,3] HMI: NPU2: [Loc: UOPWR.789938A-Node0-Proc0] P:0 0x05011277=0x2000010000000000

[2019-06-06T13:46:28-04:00] [4135415.111261405,3] HMI: NPU2: [Loc: UOPWR.789938A-Node0-Proc0] P:0 0x050112a7=0x2000010000000000

[2019-06-06T13:46:28-04:00] [4135415.111264176,3] HMI: NPU2: [Loc: UOPWR.789938A-Node0-Proc0] P:0 0x05011417=0x2000020000000000

[2019-06-06T13:46:28-04:00] [4135415.111266962,3] HMI: NPU2: [Loc: UOPWR.789938A-Node0-Proc0] P:0 0x05011447=0x2000020000800000

[2019-06-06T13:46:28-04:00] [4135415.111269712,3] HMI: NPU2: [Loc: UOPWR.789938A-Node0-Proc0] P:0 0x05011477=0x0000000000000000

[2019-06-06T13:46:28-04:00] [4135415.111272427,3] HMI: NPU2: [Loc: UOPWR.789938A-Node0-Proc0] P:0 0x050114a7=0x0000000000000000

In the case above, registers 0x05011277, 0x050112a7, 0x05011417, 0x05011447 are examples of this.
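The first-nibble check can be automated when a console log is long. The following Python sketch assumes the register dump lines have the "0x<address>=0x<16-digit value>" form shown above; the function name is illustrative only.

```python
import re

# Matches "0x05011277=0x2000010000000000": an 8-digit register address
# followed by a 16-digit (64-bit) register value.
REGISTER_DUMP = re.compile(r"\b(0x[0-9a-fA-F]{8})=0x([0-9a-fA-F]{16})\b")


def registers_of_concern(console_text):
    """Return addresses of registers whose value has first nibble 0x2."""
    flagged = []
    for address, value_hex in REGISTER_DUMP.findall(console_text):
        # The first (most significant) nibble is the top 4 bits of 64.
        if int(value_hex, 16) >> 60 == 0x2:
            flagged.append(address)
    return flagged


# Two lines copied from the console output above.
sample = (
    "[4135415.111255899,3] HMI: NPU2: [Loc: UOPWR.789938A-Node0-Proc0] "
    "P:0 0x05011247=0x0000000000800000\n"
    "[4135415.111258642,3] HMI: NPU2: [Loc: UOPWR.789938A-Node0-Proc0] "
    "P:0 0x05011277=0x2000010000000000\n"
)
print(registers_of_concern(sample))
```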

 

NOTE: This will not result in any hardware being deconfigured, and the system should simply reboot back into a working state.  In the SEL logs, events "FQPSPAA0007G - A system checkstop occurred" and "FQPSPAA0001M - An unknown problem occurred" can be seen.

 

Customer Action

If the system has rebooted, no action is needed for recovery.  Otherwise, reboot the system.  This failure is rare and is not expected to recur after a reboot of the system.

2.1.2 Possible Error when creating user accounts with IPMI

 

Set the access privilege when using IPMI to create user accounts to avoid disrupting user account viewing.

 

When using IPMI to create user accounts, there are three steps needed to create a valid user account that does not interfere with the display of the user account records.

The first step is to create the user account.  The second step is to assign the privilege level for the user.  The third step is to enable the user.  Failure to assign the privilege level will disrupt the viewing of the user account records through the BMC GUI or through Redfish.  The problem with the user account viewing is resolved once the privilege has been assigned for the account in IPMI using the ipmitool channel command.

 

Here is an example of setting a user name and assigning privilege, and then enabling the user.  In this example, the userid "4" is assigned name "username" and password "password" and given administrator access on channel 1:

 

# ipmitool user set name 4 username

# ipmitool user set password 4 password

# ipmitool channel setaccess 1 4 link=on ipmi=on  privilege=4

# ipmitool user enable 4

 

If the ipmitool channel command is not used to set the access privilege, this will cause issues when displaying the user accounts.
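When accounts are created from a script, generating all of the commands together makes it harder to skip the privilege step. The following Python sketch builds the same four ipmitool commands shown in the example above; the function name and its parameters are illustrative, and the defaults (channel 1, privilege 4) are taken from that example.

```python
def ipmi_user_setup_commands(user_id, name, password, channel=1, privilege=4):
    """Return the ipmitool argument lists for creating a usable account.

    The order follows the steps above: set name and password, assign the
    channel privilege, then enable the user.
    """
    uid = str(user_id)
    return [
        ["ipmitool", "user", "set", "name", uid, name],
        ["ipmitool", "user", "set", "password", uid, password],
        ["ipmitool", "channel", "setaccess", str(channel), uid,
         "link=on", "ipmi=on", "privilege=%d" % privilege],
        ["ipmitool", "user", "enable", uid],
    ]


# Print the commands for the example account (user ID 4).
for argv in ipmi_user_setup_commands(4, "username", "password"):
    print(" ".join(argv))
```

Each argument list can be passed to subprocess.run() on a system where ipmitool is installed.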

 

2.1.3 LDAP Bind Failure on First Configuration Attempt From BMC GUI

 

This problem applies to systems that have had a factory reset or  to systems that are newly shipped from manufacturing when using LDAP for the first time from the BMC GUI.

The GUI does not set the LDAP Bind Password.  The Bind Password is needed for LDAP authentication with the LDAP server.

 

The work-around to this problem is to use openbmctool to configure LDAP.

 

Here is an example of using the openbmctool command for LDAP configuration creation:

 

openbmctool  -U root  -P "0penBmc"  -H <BMC IP address>  ldap enable -a "<LDAP server URI>" -B "<bind DN of the LDAP server>" -b "<base DN of the LDAP server>" -p "<bind password of the LDAP server>" -S sub -t OpenLDAP

 

Attempting login...

{

  "data": null,

  "message": "200 OK",

  "status": "ok"

}

User root has been logged out

 

2.1.4 Possible system crash when using "perf record" with Trace_IMC event and "-p" option on the Linux host

 

When the root user or a user with "CAP_SYS_ADMIN" privileges executes the "perf" command with the trace_imc performance monitoring unit event to monitor applications or KVM threads, the execution may result in a checkstop (system crash) on an IBM POWER9 DD2.2 (or later) system with firmware OP930 (or later) and Red Hat Enterprise Linux 8.1.

 

The Linux host system can crash if the guest is rebooted while the host is monitoring the guest with 'perf kvm record' on the qemu process using the "-p" option.

Example of failure sequence:

1)  On the Linux host, do "perf kvm record -e trace_cycles -p <Guest qemu pid>"

2)  Run the guest for a duration of time while tracing it from the host

3)  Reboot the guest and the host system may crash

 

This problem can be circumvented by using the  "-i" or "--no-inherit" parameter on the "perf kvm record" command so that child processes do not inherit the performance counters:

 

perf kvm record -e trace_cycles -p <Guest qemu pid> -i

 

2.2 Upgrade Option for 8335-GTG to 8335-GTH

 

The 8335-GTG model is only supported for the OP910 release.  However, the 8335-GTG may be upgraded to an 8335-GTH model by an SSR.  The upgrade involves replacing the hardware processor features in the 8335-GTG and then updating to the alternative PNOR and BMC images, which can be found in Fix Central as part of the initial OP920.00 delivery.  At the successful conclusion of the upgrade steps, the system model will be 8335-GTH with the OP920.00 release firmware.  You may then update to this latest level of firmware.

2.2.1 Processor feature conversions

The existing processors being replaced during a model or feature conversion become the property of IBM and must be returned.   Feature conversions are always implemented on a "quantity of one for quantity of one" basis. Multiple existing features may not be converted to a single new feature. Single existing features may not be converted to multiple new features.

 

Feature conversions for 8335-GTG to 8335-GTH for processor features:

From FC | To FC | Return Parts?

EP0K - 16-core 2.60 GHz (3.09 GHz Turbo) POWER9 Processor | EP0P - 16-core 2.7 GHz (3.3 GHz Turbo) POWER9 Processor | Yes

EP0M - 20-core 2.0 GHz (2.87 GHz Turbo) POWER9 Processor | EP0R - 20-core 2.4 GHz (3.0 GHz Turbo) POWER9 Processor | Yes

 

2.2.2 Files to support the GTG to GTH upgrade

               (Note that these files exist only under the OP920.00 delivery in Fix Central)

3.0 Firmware Information

Use the following examples as a reference to determine whether your installation will be concurrent or disruptive.

For the LC server systems, the installation of system firmware is always disruptive.

 

3.1 Firmware Information and Description

The BMC and PNOR image tar files are used to update the primary side of the PNOR and the primary side of the BMC only, leaving the golden sides unchanged.

 

Filename | Size (bytes) | Checksum (MD5)

obmc-witherspoon-ibm-op940.00.ubi.mtd.tar | 22753280 | 989f9489a04fc7f41f22c21e1a79fdd8

witherspoon-IBM-OP9-v2.4-4.31_prod.pnor.squashfs.tar | 27852800 | eecbe6928917f299099a1e97b65aa8bd

 

 

 

 

Note: The checksum can be found by running the Linux/Unix/AIX md5sum command against the firmware image file (all 32 characters of the checksum are listed), for example: md5sum <filename>
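Where md5sum is not available, the same comparison can be made with Python's standard hashlib module. The following is a minimal sketch; the expected values are the checksums listed for the two image files above, and the function names are illustrative.

```python
import hashlib

# Checksums listed above for the OP940.00 image files.
EXPECTED_MD5 = {
    "obmc-witherspoon-ibm-op940.00.ubi.mtd.tar":
        "989f9489a04fc7f41f22c21e1a79fdd8",
    "witherspoon-IBM-OP9-v2.4-4.31_prod.pnor.squashfs.tar":
        "eecbe6928917f299099a1e97b65aa8bd",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path, expected):
    """Compare a file's MD5 digest with an expected 32-character value."""
    return md5_of_file(path) == expected.strip().lower()
```

For example, checksum_matches(name, EXPECTED_MD5[name]) returns True when a downloaded image in the current directory matches its listed checksum.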

 

After a successful update to this firmware level, the PNOR components and BMC should be at the following levels.  

 

To display the PNOR level, use the following BMC command:  "cat /var/lib/phosphor-software-manager/pnor/ro/VERSION | grep -A 12 IBM"

To display the BMC level, use the following BMC command:  "cat /etc/os-release"

 

Note:  FRU information for the PNOR level does not show the updated levels via the fru command until the system has been booted once at the updated level.

 

PNOR firmware level:         driver content

    IBM-witherspoon-OP9-v2.4-4.31-prod

    op-build-v2.3-rc2-409-g6ad3be1

    buildroot-2019.05.2-11-g8e3337d

    skiboot-v6.5.1

    hostboot-a8024a3

    occ-3472e6c

    linux-5.3.7-openpower1-pbf4fa9d

    petitboot-v1.10.4

    machine-xml-c622cb5

    hostboot-binaries-hw080119a.940

    capp-ucode-p9-dd2-v4

    sbe-53c1726

    hcode-hw102619a.op940

           

BMC firmware level :                     driver content        

 

BMC Primary side version:

 

ID="openbmc-phosphor"

NAME="Phosphor OpenBMC (Phosphor OpenBMC Project Reference Distro)"

VERSION="op940.00-16"

VERSION_ID="op940.00-16-0-gd1c59b7"

PRETTY_NAME="Phosphor OpenBMC (Phosphor OpenBMC Project Reference Distro) op940.00-16"

BUILD_ID="op940.00-16"

OPENBMC_TARGET_MACHINE="witherspoon"

 

 

OP940
For Impact, Severity, and other firmware definitions, please refer to the 'Glossary of firmware terms' at the following URL:
http://www14.software.ibm.com/webapp/set2/sas/f/power5cm/home.html#termdefs

 

 

PNOR OP9-v2.4-4.31 with BMC op940.00 / OP940.00

 

11/22/2019

Impact:  New    Severity:  New

 

New features and functions:

Support for the Bittware 250-SoC 2x100Gb PCIe x16 OpenCAPI FPGA Adapter with Feature Code #EC4W.

 

Support was added for using DD2.2 and DD2.3 version P9 processors in the same system.

 

Support was added for Redfish APIs on the BMC.  OpenBMC-based systems can be managed by using the DMTF Redfish APIs.

   Redfish is a REST API used for platform management and is standardized by the Distributed Management Task Force, Inc. (http://www.dmtf.org/standards/redfish).  For more information, see the IBM Knowledge Center at the following link:

https://www.ibm.com/support/knowledgecenter/POWER9/p9eih/p9eih_openbmc_redfish.htm

Support was added for SSL certificate upload and generation.  For more information, see the IBM Knowledge Center at the following link:

https://www.ibm.com/support/knowledgecenter/POWER9/p9eih/p9eih_openbmc_ssl.htm

 

Support was added for IPMI version 2.0 on the BMC. IPMI network access can be disabled.  The command to disable IPMI network access is: "ipmitool lan set 1 access off".  You can use in-band IPMI if you want to re-enable IPMI network access: "ipmitool lan set 1 access on".

 

Support was added for Lightweight Directory Access Protocol (LDAP) on the BMC.  For more information on using LDAP,  see the  IBM Knowledge Center at the following link:

https://www.ibm.com/support/knowledgecenter/POWER9/p9eih/p9eih_openbmc_ldap.htm

 

Support was added for creating multiple local user accounts on the BMC.  For more information on managing user accounts to add or remove new users, modify user settings, manage user account policy settings, and view privilege role descriptions,  see the IBM Knowledge Center at the following link:

https://www.ibm.com/support/knowledgecenter/POWER9/p9eih/p9eih_openbmc_usermanage.htm

 

Support was added for KVM (Keyboard, Video and Mouse) consoles on the BMC.  For more information on how to use the remote KVM console, see the IBM Knowledge Center at the following link:

https://www.ibm.com/support/knowledgecenter/POWER9/p9eih/p9eih_openbmc_kvm.htm

 

Support was added for virtual media devices on the BMC.  For more information on how to use a virtual media device to start a session, see the IBM Knowledge Center at the following link:

https://www.ibm.com/support/knowledgecenter/POWER9/p9eih/p9eih_openbmc_virtualmedia.htm

 

Support was added for remote logging in the BMC GUI.

 

Support was added for boot options in the BMC GUI.

 

Support was added for SNMP traps and alerts on the BMC.  For more information on SNMP settings on the BMC,  see the  IBM Knowledge Center at the following link:

https://www.ibm.com/support/knowledgecenter/POWER9/p9eih/p9eih_openbmc_snmpsettings.htm

 

New thermal modes added to support  PCIe4 adapters with feature codes #EC62, #EC64, #EC6E, and #EC6G  for both air and water cooled systems.  Please refer to IBM  Knowledge Center for selection of the appropriate thermal mode for the PCIe4  adapter, system models, and cable types:

https://www.ibm.com/support/knowledgecenter/en/POWER9/p9ei3/p9ei3_thermal_mode.htm.  

This article provides guidance on which thermal mode to select and how to use the openbmctool to select it.  Be aware that the thermal modes needed to supply additional cooling are not the system default and have to be selected manually.  Once a new thermal mode is selected, with the system powered off or on, it persists until it is changed by the user or a factory reset of the system is done.  The minimum openbmctool version needed to select the thermal mode is v1.14, and the recommended version is v1.17.  There is no functional difference between these versions for thermal modes, but the error messages were updated in v1.17 to reflect that systems at the OP940 release level and later do not have to be powered on to set the thermal mode, as was required in earlier releases.  The thermal modes that can be selected include "DEFAULT", "CUSTOM", "HEAVY_IO", and "MAX_BASE_FAN_FLOOR".  The system comes pre-selected with "DEFAULT", which may not provide enough cooling for some adapters and configurations.

Note:  If your system has one of the affected PCIe4 adapters and you are using optical cables for that adapter, a thermal mode other than "DEFAULT" must be selected with openbmctool to get the proper cooling for the system.

 

 

 

 

4.0 Operating System Information

OS levels supported by the LC 8335 servers:

 

-  Red Hat Enterprise Linux 8 for POWER, or later, with all available maintenance updates

-  Red Hat Enterprise Linux 7 for POWER9, version 7.6, or later, with all available maintenance updates

 

-  NVIDIA Tesla CUDA recommended driver level 396.44 or later, or minimum driver level 396.26 from the CUDA 9.2 toolkit

 

Additional OS level supported by the AC922(8335-GTH) server:

 - Ubuntu Server 18.04.1, with all available maintenance updates.

 

IBM Power LC 8335 servers support Linux, which provides a UNIX-like implementation across many computer architectures.  Linux supports almost all of the Power System I/O, and the configurator verifies support on order.  For more information about the software that is available on IBM Power Systems, see the Linux on IBM Power Systems website:

        http://www.ibm.com/systems/power/software/linux/index.html

 

4.1 Linux Operating System

The Linux operating system is an open source, cross-platform OS. It is supported on every Power Systems server IBM sells. Linux on Power Systems is the only Linux infrastructure that offers both scale-out and scale-up choices.  

A supported version of Linux on the Power LC 8335 is Red Hat Enterprise Linux 7.5 for IBM Power LE (POWER9)  (RHEL 7.5-ALT LE).

For additional questions about the availability of this release and supported Power servers, consult the Red Hat Hardware Catalog at

https://access.redhat.com/products/red-hat-enterprise-linux/#addl-arch.

 

For the AC922 (8335-GTH) that is configured without GPUs, there is the option of using Linux Ubuntu 18.04 or later as the OS.

 

 

For more information about Linux on Power, see the Linux on Power developer center at https://developer.ibm.com/linuxonpower/

 

For information about the features and external devices that are supported by Linux, see this website:

http://www.ibm.com/systems/power/software/linux/index.html

 

4.2 How to Determine the Level of a Linux Operating System

 

Use one of the following commands at the Linux command prompt to determine the current Linux level:

 

 

The output string from the command will provide the Linux version level.

 

4.3 How to Determine if the opal-prd (Processor Recovery Diagnostics) package is installed

The opal-prd package on the Linux system collects the OPAL Processor Recovery Diagnostics messages to log file /var/log/syslog.  It is recommended that this package be installed if it is not already present as it will help with maintaining the system processors by alerting the users to processor maintenance when needed.

 

On Red Hat Linux, run the command "rpm -qa | grep -i opal-prd".  If the rpm for opal-prd is found and displayed in the command output, the package is installed on your system.  This package provides a daemon to load and run the OpenPOWER firmware's Processor Recovery Diagnostics binary, which is responsible for run-time maintenance of Power hardware.  If the package is not installed on your system, the following command can be run on Red Hat to install it:

        sudo yum install opal-prd

 

5.0 How to Determine The Currently Installed Firmware Level

 

To display the PNOR level, use the following BMC command:  "cat /var/lib/phosphor-software-manager/pnor/ro/VERSION"

To display the BMC level, use the following BMC command:  "cat /etc/os-release"

 

Note: The "cat" commands are run after logging in to the BMC with ssh as root.  The default password is 0penBmc (where 0 is the zero character).
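The BMC level can also be read programmatically from the "cat /etc/os-release" output. The following Python sketch assumes the KEY="value" layout shown for the BMC firmware level in section 3.1; the function names and the sample text are illustrative.

```python
def parse_os_release(text):
    """Parse KEY="value" lines from /etc/os-release into a dictionary."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values are usually double-quoted; drop the surrounding quotes.
        fields[key.strip()] = value.strip().strip('"')
    return fields


def bmc_release(fields):
    """Return the release part of VERSION, e.g. 'op940.00' from 'op940.00-16'."""
    return fields.get("VERSION", "").split("-", 1)[0]


# Sample lines taken from the BMC level listed in section 3.1.
SAMPLE = '''ID="openbmc-phosphor"
VERSION="op940.00-16"
VERSION_ID="op940.00-16-0-gd1c59b7"
BUILD_ID="op940.00-16"
OPENBMC_TARGET_MACHINE="witherspoon"
'''
print(bmc_release(parse_os_release(SAMPLE)))
```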

 

6.0 Downloading the Firmware Package

Follow the instructions on Fix Central. You must read and agree to the license agreement to obtain the firmware packages.

 

7.0 Installing the Firmware

7.1  IBM Power Systems Firmware maintenance

The updating and upgrading of system firmware depends on several factors, such as the current firmware that is installed and which operating system is running on the system.

These scenarios and the associated installation instructions are comprehensively outlined in the firmware section of Fix Central, found at the following website:

http://www.ibm.com/support/fixcentral/

 

Any hardware failures should be resolved before proceeding with the firmware updates to help ensure the system will not be running degraded after the updates.

 

7.2 OpenBMC System Firmware Update using openbmctool

The process of updating firmware on the OpenBMC managed servers is documented below.

The sequence of events that must happen is the following:

 

• Power off the Host

• Update and Activate BMC

• Update and Activate PNOR

• Reboot the BMC (applies new BMC image)

• Power on the Host (applies new PNOR image)

 

The OpenBMC firmware updates (BMC and PNOR)  for the LC 8335 servers can be managed via the command line with the openbmctool.

 

The openbmctool is obtained using the IBM Support Portal.

 

  1. Go to the IBM Support Portal.

  2. In the search field, enter your machine type and model.  Then click the correct product support entry for your system.

  3. From the Downloads list, click the openbmctool for your machine type and model.

  4. Follow the instructions to install and run the openbmctool.  You will need to provide the file locations of the BMC firmware image tar and PNOR firmware image tar that must be downloaded from Fix Central for the update level needed.

 

 

Information on the openbmctool and the firmware update process can be found in the IBM Knowledge Center:  

https://www.ibm.com/support/knowledgecenter/POWER9/p9ei8/p9ei8_update_firmware_openbmctool.htm .

8.0 System Management and Virtualization

The service processor, or baseboard management controller (BMC), provides a hypervisor and operating system-independent layer that uses the robust error detection and self-healing functions that are built into the POWER processor and memory buffer modules.  OpenPOWER Abstraction Layer (OPAL) is the system firmware in the stack of POWER processor-based Linux-only servers.

 

8.1  BMC Service Processor IPMI

The service processor, or baseboard management controller (BMC), is the primary control for autonomous sensor monitoring and event logging features on the LC server.

The BMC supports the Intelligent Platform Management Interface (IPMI) for system monitoring and management.  The BMC monitors the operation of the firmware during the boot process and also monitors the OPAL hypervisor for termination.

 

8.1.1  Security Risks Associated with Using IPMI

Various risks that are associated with the Intelligent Platform Management Interface (IPMI) have been identified and documented in the information technology (IT) security community.

Possible risks include the following three common vulnerabilities and exposures (CVEs):

1) CVE-2013-4037:

The Remote Authenticated Key-Exchange Protocol (RAKP), which is specified by the IPMI standard for authentication, has flaws. Although the system does not allow the use of null passwords, a hacker might reverse engineer the RAKP transactions to determine a password. The authentication process for IPMI requires the management controller to send a hash of the requested password of the user to the client before the client authenticates. This process is a key part of the IPMI specification. The password hash can be broken by using an offline brute force or dictionary attack.

2) CVE-2013-4031:

IBM Power Systems and OpenPOWER Systems are preconfigured with one IPMI user account, which has the same default login name and password on all affected systems. If a malicious user gains access to the IPMI interface by using this preconfigured account, the user can power the host server off or on, restart it, and create or change user accounts, possibly preventing legitimate users from accessing the system. On OpenPOWER Systems, the default IPMI user name is root.  Additionally, if a user fails to change the default user name and password on each of the systems that is deployed, the user has the same login information for each of those systems.

3) CVE-2013-4786:

The IPMI 2.0 specification supports RMCP+ Authenticated Key-Exchange Protocol (RAKP) authentication, which allows remote attackers to obtain password hashes and conduct offline password guessing attacks by obtaining the hash-based message authentication code (HMAC) from a RAKP message 2 response from a BMC.

If you are not managing a server by using IPMI, you can configure the system to disallow IPMI network access for the user accounts. This task can be accomplished by using the IPMItool utility or a similar utility for managing and configuring the IPMI management controllers.  Use the following IPMItool command to disable the network access for an IPMI user, where #user_slot# is the ID of the user slot:

ipmitool channel setaccess 1 #user_slot# privilege=15
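
The command above can be wrapped in a small shell helper, shown here as a minimal sketch rather than a supported procedure. It assumes that ipmitool is installed and that the LAN channel is channel 1, as in the command above. The function name disable_ipmi_lan_access and the DRY_RUN switch are hypothetical names introduced only for this illustration.

```shell
#!/bin/sh
# Sketch: remove IPMI network access for one user slot on LAN channel 1.
# Assumptions: ipmitool is installed and the channel number is 1, as in the
# command above. DRY_RUN and the function name are illustrative only.

CHANNEL=1

disable_ipmi_lan_access() {
    user_slot=$1
    # Privilege level 15 means "no access" for this user on the channel.
    cmd="ipmitool channel setaccess $CHANNEL $user_slot privilege=15"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command instead of running it.
        echo "$cmd"
    else
        $cmd
    fi
}

# Usage example (run on a system that can reach the BMC):
#   ipmitool user list 1            # find the user slot in the ID column
#   disable_ipmi_lan_access 2       # block slot 2 on the network channel
#   ipmitool channel getaccess 1 2  # confirm the new privilege limit
```

Setting DRY_RUN=1 first prints the command so that the slot number can be reviewed before any access is changed.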

For more information on the IPMI security vulnerabilities and configuration options and best practices to minimize the risks of this interface, go to the IBM Knowledge Center at the following URL:

https://www.ibm.com/support/knowledgecenter/POWER9/p9eih/p9eih_openbmc_security.htm

8.2 OpenPOWER Abstraction Layer (OPAL)

The OpenPOWER Abstraction Layer (OPAL) provides hardware abstraction and runtime services to the running host operating system.

For the 8335 servers, only the OPAL bare-metal installs can be used.

 

Find out more about OPAL skiboot here:

https://github.com/open-power/skiboot

8.3 Intelligent Platform Management Interface (IPMI)

The Intelligent Platform Management Interface (IPMI) is an open standard for monitoring, logging, recovery, inventory, and control of hardware that is implemented independent of the main CPU, BIOS, and OS.  The LC 8335 servers provide one 10M/100M baseT IPMI port.

The ipmitool is a utility for managing and configuring devices that support IPMI. It provides a simple command-line interface to the service processor. You can install the ipmitool from the Linux distribution packages in your workstation, sourceforge.net, or another server (preferably on the same network as the installed server).

For installing ipmitool from sourceforge, please see section 1.1 "Minimum ipmitool Code Level".

 

Several good references are available for ipmitool commands:

 

The man page

The built-in command-line help, which provides a list of IPMItool commands:
# ipmitool help

You can also get help for many specific IPMItool commands by adding the word help after the command:  
# ipmitool channel help

For a list of common ipmitool commands and help on each, you may use the following link:
www.ibm.com/support/knowledgecenter/linuxonibm/liabp/liabpcommonipmi.htm

 

To connect to your host system with IPMI, you need to know the IP address of the server and have a valid password. To power on the server with the ipmitool, follow these steps:

1. Open a terminal program.

2. Power on your server with the ipmitool:

ipmitool -I lanplus -H bmc_ip_address -P ipmi_password power on

3. Activate your IPMI console:

ipmitool -I lanplus -H bmc_ip_address -P ipmi_password sol activate
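
The steps above can be sketched as a short shell script that powers on the server and waits until the BMC reports that chassis power is on. This is a sketch under stated assumptions: BMC_IP and IPMI_PASSWORD are placeholder variables set by the caller, and the function names bmc and power_on_and_wait, the DRY_RUN switch, and the 60-second limit are hypothetical choices made for this illustration.

```shell
#!/bin/sh
# Sketch: power on the host through the BMC and wait until it reports "on".
# Assumptions: BMC_IP and IPMI_PASSWORD are placeholders set by the caller;
# the function names and the DRY_RUN switch are illustrative only.

bmc() {
    # Wrap the connection options that are used in the steps above.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command (password masked) instead of running it.
        echo "ipmitool -I lanplus -H $BMC_IP -P <password> $*"
    else
        ipmitool -I lanplus -H "$BMC_IP" -P "$IPMI_PASSWORD" "$@"
    fi
}

power_on_and_wait() {
    bmc power on || return 1
    if [ "${DRY_RUN:-0}" = "1" ]; then
        return 0
    fi
    # Poll the chassis power state every 5 seconds, up to about 60 seconds.
    tries=0
    while [ "$tries" -lt 12 ]; do
        if bmc power status | grep -q "is on"; then
            return 0
        fi
        sleep 5
        tries=$((tries + 1))
    done
    return 1
}

# Usage example:
#   BMC_IP=bmc_ip_address IPMI_PASSWORD=ipmi_password
#   power_on_and_wait && bmc sol activate
```

To leave the IPMI console session that sol activate opens, type ~. at the start of a line.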

 

8.4 Petitboot bootloader

Petitboot is a kexec-based bootloader used by IBM POWER9 systems for bare-metal installs on the 8335 servers.

After the POWER9 system powers on, the petitboot bootloader scans local boot devices and network interfaces and returns a list of the boot options that are available to the system. If you are using a static IP or if you did not provide boot arguments in your network boot server, you must provide the details to petitboot. You can configure petitboot to find your boot options with the following instructions:

https://www.ibm.com/support/knowledgecenter/linuxonibm/liabp/liabppetitbootadvanced.htm

 

You can edit petitboot configuration options, such as the amount of time before petitboot automatically boots, with these instructions:

https://www.ibm.com/support/knowledgecenter/linuxonibm/liabp/liabppetitbootconfig.htm

 

After you select to boot the ISO media for the Linux distribution of your choice, the installer wizard for that Linux distribution walks you through the steps to set up disk options, your root password, time zones, and so on.

You can read more about the petitboot bootloader program here:

https://www.kernel.org/pub/linux/kernel/people/geoff/petitboot/petitboot.html

 

9.0 Quick Start Guide for Installing Linux on the LC 8335 server

This guide helps you install Linux on a Power Systems server.

Overview

Use the information found in http://www.ibm.com/support/knowledgecenter/linuxonibm/liabw/liabwkickoff.htm to install Linux on a non-virtualized (bare metal) IBM Power LC server.

 

 

10.0 Change History

Date          Description

11/22/2019    New for AC922 LC servers for the OP940.00 release