Power10 System Firmware

Applies to:   9080-HEX

This document provides information about the installation of Licensed Machine or Licensed Internal Code, which is sometimes referred to generically as microcode or firmware.



1.0 Systems Affected

This package provides firmware for IBM Power System E1080 (9080-HEX) server only.

The firmware level in this package is: MH1010_140_094 / FW1010.34

1.1 Minimum HMC Code Level

This section describes the "Minimum HMC Code Level" required by the System Firmware to complete the firmware installation process. The HMC level must be equal to or higher than the "Minimum HMC Code Level" before the system firmware update is started. If the HMC managing the server targeted for the System Firmware update is running a code level lower than the "Minimum HMC Code Level", the firmware update will not proceed.

The Minimum HMC Code levels for this firmware for HMC x86,  ppc64 or ppc64le are listed below.

x86 - This term refers to the legacy HMC that runs on x86/Intel/AMD hardware, and to the Virtual HMC that can run on the Intel hypervisors (KVM, XEN, VMware ESXi).
ppc64 or ppc64le - This term describes the Linux code that is compiled to run on Power-based servers or LPARs (Logical Partitions).

The Minimum HMC level supports the following HMC models:
HMC models: 7063-CR1 and 7063-CR2
x86 - KVM, XEN, VMware ESXi (6.0/6.5)
ppc64le - vHMC on PowerVM (POWER8, POWER9, and POWER10 systems)

For information concerning HMC releases and the latest PTFs,  go to the following URL to access Fix Central:
https://www.ibm.com/support/fixcentral/

For specific fix level information on key components of IBM Power Systems running the AIX, IBM i and Linux operating systems, we suggest using the Fix Level Recommendation Tool (FLRT):
https://esupport.ibm.com/customercare/flrt/home


NOTES:

- You must be logged in as hscroot in order for the firmware installation to complete correctly.
- Systems Director Management Console (SDMC) does not support this System Firmware level.

2.0 Important Information

NovaLink levels earlier than "NovaLink 1.0.0.16 Feb 2020 release" with partitions running certain SR-IOV capable adapters are NOT supported at this firmware release.

NovaLink levels earlier than "NovaLink 1.0.0.16 Feb 2020 release" do not support IO adapter FCs EC2R/EC2S, EC2T/EC2U, EC66/EC67 with FW1010 and later. 

Live Partition Mobility (LPM) support restrictions for FW1010.00:

Live Partition Mobility (LPM) support restrictions for FW1010.00 have been removed for FW1010.10 and later releases.

Note: Follow the IBM documentation article below, which contains the LPM support matrix for POWER10, for guidance on migrating between firmware levels:
https://www.ibm.com/docs/en/power10?topic=mobility-firmware-support-matrix-partition

Firmware Update Failure on Power10:

A system firmware update might fail with errors HSCF0180E and E302F854 logged on the Hardware Management Console (HMC) and System Reference Code (SRC) B181303F logged on the flexible service processor (FSP).

See the link for further information: https://www.ibm.com/support/pages/node/6527300

2.1 IPv6 Support and Limitations

IPv6 (Internet Protocol version 6) is supported in the System Management Services (SMS) in this level of system firmware. There are several limitations that should be considered.

When configuring a network interface card (NIC) for remote IPL, only the most recently configured protocol (IPv4 or IPv6) is retained. For example, if the network interface card was previously configured with IPv4 information and is now being configured with IPv6 information, the IPv4 configuration information is discarded.

A single network interface card may only be chosen once for the boot device list. In other words, the interface cannot be configured for the IPv6 protocol and for the IPv4 protocol at the same time.

2.2 Concurrent Firmware Updates

Concurrent system firmware update is supported on HMC Managed Systems only.

Ensure that there are no RMC connection issues for any system partitions prior to applying the firmware update.  If there is an RMC connection failure to a partition during the firmware update, the RMC connection will need to be restored and additional recovery actions for that partition will be required to complete partition firmware updates.

2.3 Memory Considerations for Firmware Upgrades

Firmware Release Level upgrades and Service Pack updates may consume additional system memory.
Server firmware requires memory to support the logical partitions on the server. The amount of memory required by the server firmware varies according to several factors.
Factors influencing server firmware memory requirements include the following:
Generally, you can estimate the amount of memory required by server firmware to be approximately 8% of the system installed memory. The actual amount required will generally be less than 8%. However, there are some server models that require an absolute minimum amount of memory for server firmware, regardless of the previously mentioned considerations.
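
As a rough illustration of the 8% guideline above, the following Python sketch computes the rule-of-thumb upper estimate of memory used by server firmware. The function name and the example memory size are hypothetical and are not part of any IBM tool. The actual amount depends on the system configuration and will generally be lower, and some server models have an absolute minimum that this sketch does not model.

```python
def estimate_firmware_memory_gb(installed_gb: float, fraction: float = 0.08) -> float:
    """Return the rule-of-thumb upper estimate (default 8%) of memory used by server firmware."""
    if installed_gb < 0:
        raise ValueError("installed memory must be non-negative")
    return installed_gb * fraction


if __name__ == "__main__":
    installed = 4096  # hypothetical system with 4096 GB of installed memory
    estimate = estimate_firmware_memory_gb(installed)
    print(f"Plan for up to about {estimate:.1f} GB of firmware memory")
```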

Additional information can be found at:
https://www.ibm.com/docs/en/power10/9080-HEX?topic=resources-memory

2.4 SBE Updates

Power10 servers contain SBEs (Self Boot Engines), which are used to boot the system.  An SBE is internal to each of the Power10 chips and is used to "self boot" the chip.  The SBE image is persistent and is only reloaded if there is a system firmware update that contains an SBE change.  If there is an SBE change and the system firmware update is concurrent, then the SBE update is delayed to the next IPL of the CEC, which adds 3-5 minutes per processor chip in the system to the IPL.  If there is an SBE change and the system firmware update is disruptive, then the SBE update adds 3-5 minutes per processor chip in the system to the IPL.  During the SBE update process, the HMC or op-panel will display service processor code C1C3C213 for each of the SBEs being updated.  This is a normal progress code, and the system boot should not be terminated by the user. The additional time is estimated at 12-20 minutes per drawer, or up to 48-80 minutes for a maximum configuration.
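
The time estimate above can be sketched as a small Python calculation when planning a maintenance window. The function name is hypothetical. The 3-5 minute range per processor chip comes from the text above, and the per-drawer and maximum figures quoted there are consistent with 4 and 16 processor chips respectively.

```python
from typing import Tuple

MIN_MINUTES_PER_CHIP = 3  # lower bound per processor chip, from the text above
MAX_MINUTES_PER_CHIP = 5  # upper bound per processor chip, from the text above


def sbe_update_extra_minutes(processor_chips: int) -> Tuple[int, int]:
    """Return the (minimum, maximum) extra IPL minutes for an SBE update."""
    if processor_chips < 0:
        raise ValueError("processor chip count must be non-negative")
    return (MIN_MINUTES_PER_CHIP * processor_chips,
            MAX_MINUTES_PER_CHIP * processor_chips)


if __name__ == "__main__":
    for chips in (4, 16):
        low, high = sbe_update_extra_minutes(chips)
        print(f"{chips} chips: {low}-{high} additional minutes")
```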

The SBE image is only updated with this service pack if the starting firmware level is less than FW1010.30.


3.0 Firmware Information

Use the following examples as a reference to determine whether your installation will be concurrent or disruptive.

For systems that are not managed by an HMC, the installation of system firmware is always disruptive.

Note: The concurrent levels of system firmware may, on occasion, contain fixes that are known as Deferred and/or Partition-Deferred. Deferred fixes can be installed concurrently, but will not be activated until the next IPL. Partition-Deferred fixes can be installed concurrently, but will not be activated until a partition reactivate is performed. Deferred and/or Partition-Deferred fixes, if any, will be identified in the "Firmware Update Descriptions" table of this document. For these types of fixes (Deferred and/or Partition-Deferred) within a service pack, only the fixes in the service pack which cannot be concurrently activated are deferred.

Note: The file names and service pack levels used in the following examples are for clarification only, and are not necessarily levels that have been, or will be released.

System firmware file naming convention:

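01MHxxx_yyy_zzz

Where:
xxx is the release level
yyy is the service pack level
zzz is the last disruptive service pack level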

NOTE: Values of service pack and last disruptive service pack level (yyy and zzz) are only unique within a release level (xxx). For example, 01MH1010_040_040 and 01MH1010_040_045 are different service packs.

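An installation is disruptive if:

The release levels (xxx) are different.

            Example: Currently installed release is 01MH1010_040_040, new release is 01MH1030_050_050.

The service pack level (yyy) and the last disruptive service pack level (zzz) are the same.

            Example: MH1010_040_040 is disruptive, no matter what level of MH1010 is currently installed on the system.

The service pack level (yyy) currently installed on the system is lower than the last disruptive service pack level (zzz) of the service pack to be installed.

            Example: Currently installed service pack is MH1010_040_040 and new service pack is MH1010_050_045.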

An installation is concurrent if:

The release level (xxx) is the same, and
The service pack level (yyy) currently installed on the system is the same or higher than the last disruptive service pack level (zzz) of the service pack to be installed.

Example: Currently installed service pack is MH1010_040_040, new service pack is MH1010_041_040.
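
The rules and examples above can be sketched in Python as follows. This is a minimal illustration, not an IBM utility: the function and class names are hypothetical, and the accepted name pattern (an optional "01" prefix and an optional ".rpm" suffix) is an assumption based on the file names shown in this document. The rule that a level with equal yyy and zzz values is always disruptive is read from the second disruptive example.

```python
import re
from typing import NamedTuple


class FirmwareLevel(NamedTuple):
    release: str          # xxx, for example "1010"
    service_pack: int     # yyy
    last_disruptive: int  # zzz


# Assumed pattern: optional "01" prefix, "MH", then xxx_yyy_zzz, optional ".rpm".
_LEVEL_PATTERN = re.compile(r"^(?:01)?MH(\d+)_(\d+)_(\d+)(?:\.rpm)?$")


def parse_level(name: str) -> FirmwareLevel:
    """Split a name such as 01MH1010_040_040 into its three fields."""
    match = _LEVEL_PATTERN.match(name.strip())
    if match is None:
        raise ValueError(f"unrecognized firmware level name: {name!r}")
    release, yyy, zzz = match.groups()
    return FirmwareLevel(release, int(yyy), int(zzz))


def classify_update(installed: str, new: str) -> str:
    """Return "disruptive" or "concurrent" for an HMC-managed system."""
    cur = parse_level(installed)
    nxt = parse_level(new)
    if cur.release != nxt.release:
        return "disruptive"  # release levels differ
    if nxt.service_pack == nxt.last_disruptive:
        return "disruptive"  # the new level is itself a disruptive service pack
    if cur.service_pack < nxt.last_disruptive:
        return "disruptive"  # installed level predates the last disruptive pack
    return "concurrent"


if __name__ == "__main__":
    print(classify_update("MH1010_040_040", "MH1010_041_040"))
```

Systems that are not managed by an HMC are outside this sketch, because their installations are always disruptive.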

3.1 Firmware Information and Description

 
Filename                Size         Checksum    md5sum
01MH1010_140_094.rpm    145166084    13096       d838199c7d55782dbf4c8f5bd81a092d

Note: The Checksum can be found by running the AIX sum command against the rpm file (only the first 5 digits are listed).
For example: sum 01MH1010_140_094.rpm
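
Where the AIX sum command is not available, the size and md5sum values listed above can be checked with a short script. The following Python sketch is one possible approach and uses only the standard library. The function names are hypothetical, and the script does not reproduce the AIX sum checksum.

```python
import hashlib
from pathlib import Path

# Values copied from the table in this section.
EXPECTED_SIZE = 145166084
EXPECTED_MD5 = "d838199c7d55782dbf4c8f5bd81a092d"


def md5_of_file(path, chunk_size: int = 1024 * 1024) -> str:
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_package(path, expected_size: int, expected_md5: str) -> bool:
    """Return True only if both the file size and the md5 digest match."""
    file_path = Path(path)
    if file_path.stat().st_size != expected_size:
        return False
    return md5_of_file(file_path) == expected_md5.lower()


if __name__ == "__main__":
    package = Path("01MH1010_140_094.rpm")  # assumed to be in the current directory
    if package.exists():
        ok = verify_package(package, EXPECTED_SIZE, EXPECTED_MD5)
        print("package verified" if ok else "package does NOT match")
```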

MH1010
For Impact, Severity, and other firmware definitions, refer to the 'Glossary of firmware terms' at the following URL:
https://www.ibm.com/support/pages/node/6555136

The complete Firmware Fix History for this Release Level can be reviewed at the following URL:
https://public.dhe.ibm.com/software/server/firmware/MH-Firmware-Hist.html

MH1010_140_094 / FW1010.34

08/26/22
Impact: Availability    Severity:  HIPER

System firmware changes that affect all systems
  • HIPER/Pervasive: A problem was fixed for an issue attempting to recover from a processor core error. The failed recovery escalates to either a system checkstop or a processor core hang. The system checkstop is reported with SRC B113E504 or B181E540. The processor core hang has been observed as a partition hang, and SRC B200F007 is reported when the partition fails to shut down. The issue may also result in a partition crash or HMC Incomplete. With this fix, the processor core recovery will work correctly with no effect on the system.

MH1010_135_094 / FW1010.32

07/14/22
Impact: Availability    Severity:  HIPER

System firmware changes that affect all systems
  • HIPER/Pervasive:  A problem was fixed for a system hang during the concurrent code update of FW1010.31, and another problem was fixed for a potential impact to performance following any concurrent code update.  If the server has booted with FW1010.31, then there is no need to install FW1010.32. If the server has applied FW1010.31 concurrently and not booted on this level, then IBM recommends applying FW1010.32 or performing a system reboot on FW1010.31 to avoid the potential performance impact. If the server is running a level prior to FW1010.31, then IBM strongly recommends installing FW1010.32 to address these and other HIPER issues fixed in FW1010.31.

MH1010_132_094 / FW1010.31

07/01/22
Impact: Data    Severity:  HIPER

New Features and Functions
  • HIPER/Pervasive: For systems with Power Linux partitions, support was added for a new Linux secure boot key.  The support for the new secure boot key for Linux partitions may cause secure boot for Linux to fail if the Linux OS for SUSE or RHEL distributions does not have a secure boot key update.  
    The affected Linux distributions are listed below. Each needs the Linux fix level that includes "Key for secure boot signing grub2 builds ppc64le".
    1) SLES 15 SP4 - The GA for this Linux level includes the secure boot fix.
    2) RHEL 8.5 - This Linux level has no fix.  The user must update to RHEL 8.6 or RHEL 9.0.
    3) RHEL 8.6
    4) RHEL 9.0
    The update to a Linux level that supports the new secure boot key also addresses security issues in Linux GRUB2, which are the reason the change in secure boot key is needed. They are documented in the following six CVEs:
    1) CVE-2021-3695
    2) CVE-2022-28733
    3) CVE-2022-28734
    4) CVE-2022-28735
    5) CVE-2022-28736
    6) CVE-2022-28737
    Please note that when this firmware level of FW1010.31 is applied, any Linux OS not updated to a secure boot fix level will fail to secure boot.  Any Linux OS partition updated to a fix level for secure boot requires a minimum firmware level of FW1010.30 or later to be able to do a secure boot.  If FW1010.30 or later is not installed but the Linux fix levels for secure boot are loaded for the Linux partition, the secure boot failure that occurs will have BA540010 logged.  If secure boot verification is enabled, but not enforced (log only mode), then the fixed Linux partition will boot, but a BA540020 informational error will be logged.
  • Support was added for new memory refresh settings to enhance reliability for new systems shipped from manufacturing.  Existing systems will pick up the enhancement on the IPL following the application of this firmware level.  There is no change in system performance due to this enhancement.
  • Support was added for a new Advanced System Management Interface (ASMI) System Configuration panel for Prefetch settings to enable or disable an alternate configuration of the processor core/nest to favor more aggressive prefetching behavior for the cache.  "Aggressive Prefetch" is disabled by default and a change to enable it must be done at service processor standby.  The default behavior of the system ("Aggressive Prefetch" disabled) will not change in any way with this new feature.  The customer will need to power off and enable "Aggressive Prefetch" in ASMI to get the new behavior.  Only change the "Aggressive Prefetch" value if instructed by support or if recommended by a solution vendor as it might cause degraded system performance.
System firmware changes that affect all systems
  • HIPER/Pervasive:  A problem was fixed for an issue that may cause undetected corruption of the Translation Look Aside Buffer (TLB).  This could result in undetected data corruption or a system crash.
  • HIPER/Pervasive:  A problem was fixed for an issue where a register file soft error could result in undetected data corruption or a system crash. If a soft error is detected a log will be generated.
  • HIPER/Pervasive: A problem was fixed for a checkstop with SRC B113E504 logged that could occur for a recoverable core event anytime after a concurrent code update has been performed. If this service pack is not installed, then a system IPL is required to eliminate the exposure.
  • HIPER/Pervasive:  A problem was fixed for a recoverable processor core error which fails to recover and causes a system checkstop with SRC B113E504 or B181E540 logged. With the fix, the core recovery is successful with no impact to the running workload.
  • HIPER/Non-Pervasive:  A problem was fixed for possible undetected data corruption, or a hardware checkstop.  In IBM internal testing, it was found that the execution of the new Power10 STXVP instruction may cause undetected data corruption, or a hardware detected error reported with reference code B111E540 in certain instances.
    The following applications on AIX 7.3 and/or Linux are currently known to be exposed:
    OpenBLAS 0.3.12  
    ESSL 7.1  
    Eigen 3.4  
    Applications compiled with Open XL v17.1.0, GCC V10/V11, or CLANG/LLVM 12, 13, 14
    Any other applications exploiting the Power10 STXVP instruction.
  • A security problem was fixed for a flaw in OpenSSL certificate parsing that could result in an infinite loop in the hypervisor, causing a hang in a Live Partition Mobility (LPM) target partition.   The trigger for this failure is an LPM migration of a partition with a corrupted physical trusted platform module (pTPM) certificate. This is expected to be a rare problem.  The Common Vulnerability and Exposure number for this problem is CVE-2022-0778.
  • A problem was fixed for a potential performance impact for systems that have Lateral Cast Out Control set to disabled.  This problem can occur when a processor is deconfigured.  Performing a re-IPL of the system will recover from this problem.
  • A problem was fixed for a change made to disable Service Location Protocol (SLP) by default for a newly shipped system so that the SLP is disabled by a reset to manufacturing defaults on all systems and to also disable SLP on all systems when this fix is applied by the firmware update. The SLP configuration change has been made to reduce memory usage on the service processor by disabling a service that is not needed for normal system operations.  In the case where SLP does need to be enabled, the SLP setting can be changed using ASMI with the options "ASMI -> System Configuration -> Security -> External Services Management" to enable or disable the service.  Without this fix, resetting to manufacturing defaults from ASMI does not change the SLP setting that is currently active.
  • A problem was fixed for a missing warning in the ASMI Power On/Off menu that a power off while system dump is in progress will cause a truncated dump. The warning is displayed correctly in the ASMI Immediate Power Off menu.  This fix also adds a warning that a power off should not be performed when a firmware update is in progress.
  • A problem was fixed for a rare service processor core dump for NetsCommonMsgServer with SRC B1818611 logged that can occur when doing an AC power-on of the system.  This error does not have a system impact beyond the logging of the error as an auto-recovery happens.
  • A problem was fixed for a partition reboot recovery for an adapter in SR-IOV shared mode that rebooted with an SR-IOV port missing.  Prior to the reboot, this adapter had SR-IOV ports that failed and were removed after multiple adapter faults.  This problem should only occur rarely as it requires a sequence of multiple faults on an SR-IOV adapter in a short time interval to force the SR-IOV Virtual Function (VF) into the errant unrecoverable state.  The missing SR-IOV port can be recovered for the partition by doing a remove and add of the failed adapter with DLPAR, or the system can be re-IPLed.
  • A problem was fixed for an apparent hang in a partition shutdown where the HMC is stuck in a status of "shutting down" for the partition.  This infrequent error is caused by a timing window during the system or partition power down where the HMC checks too soon and does not see the partition in the "Powered Off" state. However, the power off of the partition does complete even though the HMC does not acknowledge it.  This error can be recovered by rebuilding the HMC representation of the managed system by following the below steps:
    1) In the navigation area on the HMC, select Systems Management > Servers.
    2) In the contents pane, select the required managed system.
    3) Select Tasks > Operations > Rebuild.
    4) Select Yes to refresh the internal representation of the managed system.
  • A problem was fixed that could potentially impact the performance of a dedicated processor partition after DLPAR is used to dynamically remove a dedicated processor from the partition.  This can affect all dedicated processor partitions but would more likely affect idle partitions or partitions set to share processors while active.  Performing a re-IPL of the partition will recover from this problem.
  • A problem was fixed for a PowerVM hypervisor task failure when using the "chhwres" command on the HMC to change an SR-IOV adapter firmware level to the alternate level with the "alternate_config" parameter.  This problem can occur if NVRAM was in use by the adapter prior to the attempt to change the adapter firmware level. A re-IPL of the system is needed to recover from this error.  Below is an example of an HMC command that can fail, along with the error message from the HMC:
    chhwres -m d135a -r sriov --rsubtype adapter -o s -a "alternate_config=1,adapter_id=4"
        HSCL129A The operation to switch the adapter in slot 4 to dedicated mode failed with the following errors:
        HSCL1400 An error has occurred during the operation to the managed system. Try the task again.
  • A problem was fixed for a concurrent core initialization operation failure during a concurrent firmware update.  This problem can occur if a core has been deconfigured due to exceeding a recoverable error threshold.  Performing a re-IPL of the system will recover from this problem.
  • A problem was fixed for removing an unneeded callout for the PCIe adapter cassette extender card from eleven platform event logs with SRCs matching the B7006xxx pattern. This fix will prevent unnecessary hardware replacement.  The PCIe adapter cassette has CCIN 6B91 and PN 02WF424.  The following SRCs have been corrected to remove the unneeded callout:  B7006977, B7006A2A, B7006A2B, B7006A75, B7006A88, B7006A93, B7006A98, B7006A9D, B7006AA1, B7006AA9, and B7006AB1.
    Note:  the PCIe adapter cassette is never the first callout as it always follows the cable card in the callout list. 
  • A problem was fixed for a penalty throttle for invalid AIX Key Entitlement date and PEP 2.0 activation attempts that blocks further activation attempts until there is a re-IPL of the system.  This occurs if an activation code for these specific resources is improperly entered after five previous failed attempts.  With the fix, the penalty throttle is cleared after one hour has expired, and then additional activations for the affected resources can be entered again.  As a workaround, a re-IPL of the system clears the number of failed activation attempts, allowing new activations to be entered.
  • A problem was fixed for a hypervisor task failure with SRC B7000602 logged when running debug macro "sbdumptrace -sbmgr -detail 2" to capture diagnostic data.  The secure boot trace buffer is not aligned on a 16-byte boundary in memory which triggers the failure.  With the fix, the hypervisor buffer dump utility is changed to recognize 8-byte aligned end of buffer boundaries.
  • A problem was fixed for a hang in the IPL of the system when it is trying to power on.  The problem is very infrequent and is caused by a slow response from the IIC bus when the IIC bus is busy with multiple requests.  To recover from the problem, reset the service processor and try the IPL again.
  • A problem was fixed for a failed correctable error recovery for a DIMM that causes a flood of SRC BC81E580 error logs and also can prevent dynamic memory deallocation from occurring for a hard memory error.  This is a very rare problem caused by an unexpected number of correctable error symbols for the DIMM in the per-symbol counter registers.
  • A problem was fixed for certain LPC clock failures not guarding the appropriate hardware.  This problem could lead to repeated failures on subsequent reboots for a hard failure.  It would also not prevent future service processor failovers, leading to more errors and long failure scenarios.  This error is seen when there is an LPC clock failure on the redundant path for the backup service processor during an IPL.
  • A problem was fixed for deconfigured ECO cores reducing the Workload Optimized Frequency (WOF) more than it should,  thereby causing system performance to be reduced.
  • A problem was fixed for the isolation, callouts, and guard for core errors that cause a system checkstop.  When a core causes a system checkstop, the isolation of the core is invalid and there is no callout or guard of the failing core.
  • A problem was fixed for an IPL failure with an RC_STOP_TRANSITION_PENDING hardware procedure error on a warm (memory-preserving) re-IPL of the system if there were certain processor cores deconfigured at runtime.  For this problem to occur, a core must have been deconfigured at runtime prior to the re-IPL of the system.  A workaround to this problem is to power off the system and then do a power on IPL.
  • A problem was fixed for a checkstop that can occur on a warm (memory-preserving) re-IPL of the system if there were any processor cores deconfigured at runtime.  For this problem to occur, a core must have been deconfigured at runtime prior to the re-IPL of the system.  A workaround to this problem is to power off the system and then do a power on IPL.
  • A problem was fixed for a hypervisor hang that can occur during concurrent firmware update resulting in an Incomplete managed system state on the HMC.  The issue can occur when the Processor Sharing option for dedicated processor partitions is set to "Never Allow" or the system contains unlicensed processors.  Exposure to this issue can be reduced by configuring the Processor Sharing option for dedicated processor partitions to "Allow Always".
  • A problem was fixed for possible Serial Presence Detect (SPD) EEPROM corruption on a memory DIMM during certain power off scenarios, causing loss of a DIMM with SRC BC8A1D07, BC201D48, or B155A437 logged.  This problem can occur for certain uncontrolled power off scenarios such as pulling the AC power cord when the system is powered on, or other loss of AC power when the system is running.  If this problem happens, the failing memory DIMM must be replaced.
System firmware changes that affect certain systems
  • For a system that does not have an HMC attached, a problem was fixed for a system dump 2GB or greater in size failing to off-load to the OS with an SRC BA280000 logged in the OS and an SRC BA28003B logged on the service processor.  This problem does not affect systems with an attached HMC since in that case system dumps are off-loaded to the HMC, not the OS, where there is no 2GB boundary error for the dump size.

4.0 How to Determine The Currently Installed Firmware Level

You can view the server's current firmware level on the Advanced System Management Interface (ASMI) Welcome pane. It appears in the top right corner. Example: MH1010_117.


5.0 Downloading the Firmware Package

Follow the instructions on Fix Central. You must read and agree to the license agreement to obtain the firmware packages.

Note: If your HMC is not internet-connected, you will need to download the new firmware level to a USB flash memory device or FTP server.


6.0 Installing the Firmware

The method used to install new firmware will depend on the release level of firmware which is currently installed on your server. The release level can be determined by the prefix of the new firmware's filename.

Example: MHxxx_yyy_zzz

Where xxx = release level

Instructions for installing firmware updates and upgrades can be found at https://www.ibm.com/docs/en/power10/9080-HEX?topic=support-getting-fixes

IBM i Systems:

For information concerning IBM i Systems, go to the following URL to access Fix Central: 
https://www.ibm.com/support/fixcentral/

Choose "Select product", under Product Group specify "System i", under Product specify "IBM i", then Continue and specify the desired firmware PTF accordingly.

HMC and NovaLink Co-Managed Systems (Disruptive firmware updates only):

A co-managed system is managed by HMC and NovaLink, with one of the interfaces in the co-management master mode.
Instructions for installing firmware updates and upgrades on systems co-managed by an HMC and NovaLink are the same as above for HMC managed systems, since the firmware update must be done by the HMC in the co-management master mode.  Before the firmware update is attempted, make sure that the HMC is set in the master mode using the steps at the following IBM KnowledgeCenter link for NovaLink co-managed systems:
https://www.ibm.com/docs/en/power10/9080-HEX?topic=environment-powervm-novalink

Then the firmware updates can proceed with the same steps as for HMC managed systems, except that the system must be powered off because only a disruptive update is allowed.  If a concurrent update is attempted, the following error will occur: "HSCF0180E Operation failed for <system name> (<system mtms>).  The operation failed.  E302F861 is the error code:"
https://www.ibm.com/docs/en/power10/9080-HEX?topic=support-getting-fixes

7.0 Firmware History

The complete Firmware Fix History (including HIPER descriptions)  for this Release level can be reviewed at the following url:
https://public.dhe.ibm.com/software/server/firmware/MH-Firmware-Hist.html