Power10 System Firmware

Applies to:   9043-MRX

This document provides information about the installation of Licensed Machine or Licensed Internal Code, which is sometimes referred to generically as microcode or firmware.


Contents

1.0 Systems Affected
1.1 Minimum HMC Code Level
2.0 Important Information
2.2 Concurrent Firmware Updates
2.3 Memory Considerations for Firmware Upgrades
2.4 SBE Updates
3.0 Firmware Information
3.1 Firmware Information and Description
4.0 How to Determine The Currently Installed Firmware Level
5.0 Downloading the Firmware Package
6.0 Installing the Firmware
7.0 Firmware History

1.0 Systems Affected

This package provides firmware for the IBM Power System E1050 (9043-MRX) server only.

The firmware level in this package is: MM1020_089_079 / FW1020.20

1.1 Minimum HMC Code Level

This section describes the "Minimum HMC Code Level" required by the system firmware to complete the firmware installation process. When installing the system firmware, the HMC level must be equal to or higher than the "Minimum HMC Code Level" before starting the system firmware update. If the HMC managing the server targeted for the system firmware update is running a code level lower than the "Minimum HMC Code Level", the firmware update will not proceed.

The Minimum HMC Code levels for this firmware for HMC x86,  ppc64 or ppc64le are listed below.

NOTE: The HMC must be at a prerequisite level of HMC 1020.02 (September Monthly PTF) or 1021 (HMC 1020 SP1) before installing FW1020.10 or later service packs. This level updates the HMC so that it correctly shows any deferred fixes in the service pack being installed.
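
You can verify the HMC code level from the HMC command line before starting the update (illustrative; the output format varies by HMC release):

    lshmc -V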

x86 - This term refers to the legacy HMC that runs on x86/Intel/AMD hardware, and to the virtual HMC that runs on Intel hypervisors (KVM, Xen, VMware ESXi).
ppc64 or ppc64le - This term describes the Linux code that is compiled to run on Power-based servers or LPARs (logical partitions).
The Minimum HMC level supports the following HMC models:
HMC models: 7063-CR1 and 7063-CR2
x86 - KVM, Xen, VMware ESXi (6.0/6.5)
ppc64le - vHMC on PowerVM (POWER8, POWER9, and POWER10 systems)

For information concerning HMC releases and the latest PTFs,  go to the following URL to access Fix Central:
https://www.ibm.com/support/fixcentral/

For specific fix level information on key components of IBM Power Systems running the AIX, IBM i and Linux operating systems, we suggest using the Fix Level Recommendation Tool (FLRT):
https://esupport.ibm.com/customercare/flrt/home


NOTES:

- You must be logged in as hscroot in order for the firmware installation to complete correctly.
- Systems Director Management Console (SDMC) does not support this system firmware level.

2.0 Important Information

NovaLink levels earlier than the "NovaLink 1.0.0.16 Feb 2020 release" are NOT supported at this firmware release when partitions use certain SR-IOV capable adapters.

NovaLink levels earlier than "NovaLink 1.0.0.16 Feb 2020 release" do not support IO adapter FCs EC2R/EC2S, EC2T/EC2U, EC66/EC67 with FW1010 and later. 

2.2 Concurrent Firmware Updates

Concurrent system firmware update is supported on HMC-managed systems only.

Ensure that there are no RMC connection issues for any system partitions prior to applying the firmware update. If there is an RMC connection failure to a partition during the firmware update, the RMC connection will need to be restored, and additional recovery actions for that partition will be required to complete the partition firmware updates.
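
One way to check the RMC state of all partitions from the HMC command line is shown below (an illustrative sketch; <managed_system> is a placeholder for the managed system name). Partitions should generally report an rmc_state of "active" before the update is started:

    lssyscfg -r lpar -m <managed_system> -F name,rmc_state,rmc_ipaddr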

2.3 Memory Considerations for Firmware Upgrades

Firmware release level upgrades and service pack updates may consume additional system memory.

Server firmware requires memory to support the logical partitions on the server. The amount of memory required by the server firmware varies according to several factors. Factors influencing server firmware memory requirements include the following:

- Number of logical partitions
- Partition environments of the logical partitions
- Number of physical and virtual I/O devices used by the logical partitions
- Maximum memory values given to the logical partitions

Generally, you can estimate the amount of memory required by server firmware to be approximately 8% of the system installed memory. The actual amount required will generally be less than 8%. However, some server models require an absolute minimum amount of memory for server firmware, regardless of the previously mentioned considerations.
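
As a rough worked example of the 8% guideline: on a server with 1024 GB of installed memory, plan for up to approximately 0.08 x 1024 = 82 GB of memory reserved for server firmware; the actual reservation will typically be smaller.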

Additional information can be found at:
https://www.ibm.com/docs/en/power10/9043-MRX?topic=resources-memory

2.4 SBE Updates

Power10 servers contain Self-Boot Engines (SBEs), which are used to boot the system. The SBE is internal to each Power10 chip and is used to "self boot" the chip. The SBE image is persistent and is reloaded only if a system firmware update contains an SBE change. If there is an SBE change and the system firmware update is concurrent, the SBE update is delayed until the next IPL of the CEC, which adds 3-5 minutes per processor chip to that IPL. If there is an SBE change and the system firmware update is disruptive, the SBE update adds 3-5 minutes per processor chip to the IPL. During the SBE update process, the HMC or op-panel will display service processor code C1C3C213 for each SBE being updated. This is a normal progress code, and the system boot should not be terminated by the user. The additional time can be estimated at 12-20 minutes per drawer, or up to 48-80 minutes for a maximum configuration.

The SBE image is updated with this service pack.


3.0 Firmware Information

Use the following examples as a reference to determine whether your installation will be concurrent or disruptive.

For systems that are not managed by an HMC, the installation of system firmware is always disruptive.

Note: The concurrent levels of system firmware may, on occasion, contain fixes that are known as Deferred and/or Partition-Deferred. Deferred fixes can be installed concurrently, but will not be activated until the next IPL. Partition-Deferred fixes can be installed concurrently, but will not be activated until a partition reactivate is performed. Deferred and/or Partition-Deferred fixes, if any, will be identified in the "Firmware Update Descriptions" table of this document. For these types of fixes (Deferred and/or Partition-Deferred) within a service pack, only the fixes in the service pack which cannot be concurrently activated are deferred.

Note: The file names and service pack levels used in the following examples are for clarification only, and are not necessarily levels that have been, or will be released.

System firmware file naming convention:

01MMxxx_yyy_zzz

NOTE: Values of the service pack and last disruptive service pack level (yyy and zzz) are unique only within a release level (xxx). For example, 01MM1010_040_040 and 01MM1020_040_040 are different service packs.

An installation is disruptive if:

    The release levels (xxx) are different.
    Example: Currently installed release is 01MM900_040_040, new release is 01MM910_050_050.

    The service pack level (yyy) and the last disruptive service pack level (zzz) are equal.
    Example: MM910_040_040 is disruptive, no matter what level of MM910 is currently installed on the system.

    The service pack level (yyy) currently installed on the system is lower than the last disruptive service pack level (zzz) of the service pack to be installed.
    Example: Currently installed service pack is MM910_040_040 and new service pack is MM910_050_045.

An installation is concurrent if:

The release level (xxx) is the same, and
The service pack level (yyy) currently installed on the system is the same or higher than the last disruptive service pack level (zzz) of the service pack to be installed.

Example: Currently installed service pack is MM910_040_040, new service pack is MM910_041_040.
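
The decision rules above can also be expressed programmatically. The following Python sketch is illustrative only (it is not an IBM-provided tool) and assumes level strings follow the 01MMxxx_yyy_zzz convention described above:

    import re

    # Illustrative sketch only. Parses "01MMxxx_yyy_zzz" style level names
    # and applies the concurrent/disruptive rules described above.
    def parse_level(name):
        m = re.match(r"^(?:01)?([A-Z]{2}\d{3,4})_(\d+)_(\d+)$", name)
        if not m:
            raise ValueError("unrecognized level format: " + name)
        return m.group(1), int(m.group(2)), int(m.group(3))  # (xxx, yyy, zzz)

    def install_type(installed, new):
        rel_i, sp_i, _ = parse_level(installed)
        rel_n, _, last_disruptive_n = parse_level(new)
        if rel_i != rel_n:
            return "disruptive"   # release levels (xxx) differ
        if sp_i >= last_disruptive_n:
            return "concurrent"   # installed yyy >= new zzz
        return "disruptive"       # installed yyy < new zzz

    print(install_type("01MM910_040_040", "01MM910_041_040"))  # concurrent
    print(install_type("01MM910_040_040", "01MM910_050_045"))  # disruptive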

3.1 Firmware Information and Description

 
Filename                Size        Checksum  md5sum
01MM1020_089_079.img    273054704   54377     17c88a7d5fdad20fc249e60c7b958c5a
01MM1020_089_079.tar    128849920   65044     180c94d679c480166adf331b51ece7f2

Note: The Checksum can be found by running the AIX sum command against the file (only the first 5 digits are listed).
e.g.: sum 01MM1020_089_079.img
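
To verify a downloaded file end to end, both values can be checked (illustrative commands; md5sum is available on AIX and Linux):

    sum 01MM1020_089_079.img
    md5sum 01MM1020_089_079.img

Compare the first result against the Checksum column and the second against the md5sum column in the table above.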

MM1020
For Impact, Severity, and other firmware definitions, please refer to the 'Glossary of firmware terms' URL below:
https://www.ibm.com/support/pages/node/6555136

The complete Firmware Fix History for this release level can be reviewed at the following URL:
https://public.dhe.ibm.com/software/server/firmware/MM-Firmware-Hist.html
MM1020_089_079 / FW1020.20

12/01/22
Impact: Availability    Severity:  SPE

New Features and Functions
  • Password quality rules were enhanced on the eBMC for local passwords such that new passwords must have characters from at least two classes: lower-case letters, upper-case letters, digits, and other characters. With this enhancement, you can get a new error message from the `passwd` command:
    "BAD PASSWORD: The password contains less than 2 character classes".
System firmware changes that affect all systems
  • DEFERRED: For a system with I/O Enlarged Capacity enabled and PCIe expansion drawers attached, a problem was fixed for the hypervisor using unnecessarily large amounts of storage that could result in system termination.  This happens because extra memory is allocated for the external I/O drawers which should have been excluded from "I/O Enlarged Capacity".  This problem can be avoided by not enabling "I/O Enlarged Capacity".  This fix requires an IPL to take effect because the Huge Dynamic DMA Window capability (HDDW) TCE tables for the I/O memory are allocated during the IPL. 
  • For a system with I/O Enlarged Capacity enabled, greater than 8 TB of memory, and having an adapter in SR-IOV shared mode, a problem was fixed for partition or system termination for a failed memory page relocation.  This can occur if the SR-IOV adapter is assigned to a VIOS and virtualized to a client partition and then does an I/O DMA on a section of memory greater than 2 GB in size.  This problem can be avoided by not enabling "I/O Enlarged Capacity".
  • Security problems were fixed for vTPM 1.2 by updating its OpenSSL library to version 0.9.8zh.  Security vulnerabilities CVE-2022-0778, CVE-2018-5407, CVE-2014-0076, and CVE-2009-3245 were addressed.  These problems only impact a partition if vTPM version 1.2 is enabled for the partition.
  • A security problem was fixed for vTPM 2.0 by updating its libtpms library.  Security vulnerability CVE-2021-3746 was addressed.  This problem only impacts a partition if vTPM version 2.0 is enabled for the partition.  The biggest threat from this vulnerability is system availability.
  • A security problem was fixed for the Virtualization Management Interface (VMI) for vulnerability CVE-2021-45486 that could allow a remote attacker to reveal sensitive information.  This can happen for session connections using IPv4.
  • A security problem was fixed for the eBMC for vulnerability CVE-2022-3435 that could allow a remote attacker to reveal sensitive information from the eBMC.  This can happen for session connections using IPv4.
  • A security problem was fixed for the eBMC HTTPS server where a specially crafted multi-part HTTPS header, on a specific URI only available to admin users, could cause a buffer overflow and lead to a denial of service for the eBMC.  This Common Vulnerabilities and Exposures issue number is CVE-2022-2809.
  • A problem was fixed for processor frequencies being lowered by the On-Chip Controller (OCC) with SRC BC8A2A62 logged.  This is a rare problem that could occur when excessive droop is detected in the voltage level of the processor chip.  This fix updates the SBE image.
  • A problem was fixed for too frequent callouts for repair action for recoverable errors for Predictive Error (PE) SRCs B7006A72, B7006A74, and B7006A75.  These SRCs for PCIe correctable error events called for a repair action but the threshold for the events was too low for a recoverable error that does not impact the system.  The threshold for triggering the PE SRCs has been increased for all PLX and non-PLX switch correctable errors.
  • A problem was fixed for a resource dump (rscdump) having incorrect release information in the dump header.  There is a four-character length pre-pended to the value and the last four characters of the release are truncated.  This problem was introduced in Power 10.
  • A problem was fixed for a post dump IPL failing and a system dump being lost following an abnormal system termination.  This can only happen on a system when the system is going through a post dump IPL and there are not sufficient operational cores on the boot processor to support an IPL.  This triggers resource recovery for the cores which can fail to restore the necessary cores if extra cores have been errantly deconfigured.
  • A problem was fixed for performance issues on a system due to dispatching delays when doing Live Partition Mobility (LPM) to migrate a partition in POWER9, POWER10, or default processor compatibility modes. For this to happen for a partition in default processor compatibility mode, it must have been booted on a Power10 system.  All the problem dispatching delays will stop after the partition migration completes.  This problem can be avoided by putting the LPM source partition into POWER9_base processor compatibility mode or older prior to the migration.
  • A problem was fixed for a rare partition hang that can happen any time Dynamic Platform Optimizer (DPO), memory guard recovery, or memory mirroring defragmentation occurs for a shared processor partition running in any compatibility mode if there is also a dedicated processor partition running in Power9 or Power10 processor compatibility mode.  This does not happen if the dedicated partition is in Power9_base or older processor compatibility modes. Also, if the dedicated partition has the "Processor Sharing" setting set to "Always Allow" or "Allow when partition is active", it may be more likely to cause a shared processor partition to hang than if the setting is set to "Never allow" or "Allow when partition is inactive".
    This problem can be avoided by using Power9_base processor compatibility mode for any dedicated processor partitions. This problem can also be avoided by changing all dedicated processor partitions to use shared processors.
  • A problem was fixed for an SR-IOV adapter in shared mode failing during run time with SRC B400FF04 or B400F104 logged.  This is an infrequent error and may result in a temporary loss of communication as the affected SR-IOV adapter is reset to recover from the error.
  • A problem was fixed for a failed NIM download/install of OS images that are greater than 32M.  This only happens when using the default TFTP block size of 512 bytes.  The latest versions of AIX are greater than 32M in size and can have this problem.  As a workaround, in the SMS menu, change "TFTP blocksize" from 512 to 1024. To do this, go to the SMS "Advanced Setup: BOOTP" menu option when setting up NIM install parameters.  This will allow a NIM download of an image up to 64M.
  • A change was made for DDIMM operation to comply with the DRAM controller requirement to disable periodic ZQ calibration during a concurrent row repair operation and restore it afterward. The change improves resiliency against possible memory errors during the row repair operation.
  • A problem was fixed for the Hostboot platform error log entry "FW Released Ver" field to have the published firmware release name given instead of an IBM internal PNOR driver name.  This affects all Hostboot unrecoverable, predictive, and informational logs.
  • A problem was fixed for errant DRAM memory row repairs. Row repair was going to the wrong address or not being cleared properly, and then repaired with either a spare DRAM or chip mark. The row repair failures put the system closer to a predictive callout of a DRAM.
  • A problem was fixed for a processor core failing to wake up, forcing the system into Safe Mode (reduced performance) with SRCs BC8A2920, BC8A2625, and BC8A2616 logged. This is an infrequent problem caused by a unique scenario in which a wake-up for a core target is missed.
  • A problem was fixed for a partition firmware data storage error with SRC BA210003 logged or for a failure to locate NVMe target namespaces when attempting to access NVMe devices over Fibre Channel (FC-NVME) SANs connected to third-party vendor storage systems.  This error condition, if it occurs, prevents firmware from accessing NVMe namespaces over FC as described in the following scenarios:
     1) Boot attempts from an NVMe namespace over FC using the current SMS bootlist could fail.
     2) From SMS menus via option 3 - I/O Device Information - no devices can be found when attempting to view NVMe over FC devices.
     3) From SMS menus via option 5 - Select Boot Options - no bootable devices can be found when attempting to view and select an NVMe over FC bootable device for the purpose of boot, viewing the current device order, or modifying the boot device order.
    The trigger for the problem is attempted access of NVMe namespaces over Fibre Channel SANs connected to storage systems via one of the scenarios listed above.  The frequency of this problem can be high for some of the vendor storage systems.
  • A problem was fixed on the eBMC for a missing guard record for a bad core after a core checkstop.  The guard record may fail to get created if the core checkstop is in the middle of a DMA operation with the hypervisor.  This is a rare problem that is very timing dependent.  A re-IPL of the system should get the bad core guarded when it fails again.
  • A problem was fixed on the eBMC for the Service login console menu being displayed to read-only users.  The read-only users are not authorized to use the Service login console, so the menu for it has been removed.
  • A problem was fixed where, in some cases, the system fans run unexpectedly at high speed, even when the system is powered off. This is an intermittent error caused by a race condition in the eBMC where the virtual-sensors service for the virtual ambient temperature may not be established until after the fan control service has started. This order of service initiation forces the system fans to maximum speed.
  • A problem was fixed for the eBMC ASMI "Security and access -> Policies" VirtualTPM to provide a help indicator to state that a Virtual TPM policy change requires a boot of the system to take effect.
  • A problem was fixed for an eBMC Redfish Service Validator failure that can occur if there is a Redfish Validator task present on the eBMC.  A retry of the Redfish Validator after the other validation task has been completed should be successful. 
  • A problem was fixed for a system quiesce after three failed boot attempts from a corrupted SBE image.  This should be a rare error.    If the primary processor has a corrupted primary SBE image, the system will not boot until the processor is replaced.  With the fix, the eBMC does a side-switch to the backup SBE image after three failed boots on the primary SBE image to allow the system to IPL.
  • A problem was fixed for an eBMC hang that could occur on an IPL with an SRC BD8D3404 logged.  This is a rare error caused by dump storage on the eBMC being full when a core dump is present while starting the eBMC dump manager.  The system can be recovered by clearing the dump storage files in the eBMC  /var/lib/phosphor-debug-collector/dumps directory.
  • A problem was fixed for an incorrect firmware image being allowed to be used for firmware updates via USB, the eBMC ASMI, and the Redfish API. This causes a system power-on failure after the update. This error cannot happen for firmware updates done through the HMC or by the OS, as these methods block the incorrect image from being used. If this error occurs, the system can be recovered by doing another firmware update to install the correct firmware image.
  • A problem was fixed for an indefinite hang that can occur during a power-off shutdown of the system.  This problem should not be frequent as it is triggered only if a hypervisor error occurs during the system shutdown.  If a hang occurs, the system can be re-IPLed to resume normal operations.
  • A problem was fixed for eBMC hangs that can occur for some concurrent maintenance repairs and re-IPLs of the system.  If this occurs, it can be recovered by a reset of the BMC.
  • A problem was fixed for the eBMC ASMI Health status rollup indicator not being updated to good (green check mark) after a faulty FRU repair or replacement.  This happens for hot-pluggable or concurrently maintainable FRUs that are associated with the chassis when they are repaired or replaced.
  • For an eBMC service login using the Hypervisor console, a problem was fixed for the console connection status showing as "Disconnected" when it is "Connected". This happens for the following sequence for the "open in new tab" console view:
    1) Log in to the eBMC ASMI using the service user.
    2) Click "Operations--->Service login consoles".
    3) Select the Hypervisor console, which shows a "Connected" status.
    4) Click "open in new tab", which shows a "Disconnected" status.
    The wrong status being shown does not prevent the use of the Hypervisor console.
  • A problem was fixed for the eBMC ASMI "Hardware status->Sensors" page to improve its usability.  This page can take a few minutes to load all the sensor data, so it was changed to output each row of data as a sensor becomes available instead of waiting for all the sensor data to be ready before displaying the page.  This makes sensor data available sooner and allows the user to monitor the progress of the page being built.
  • A problem was fixed for the eBMC dumping and going into a quiesced state with SRC BD8D3404 logged if a PCIe cable is plugged into the wrong PCIe slot.   The eBMC dump is an hwmontempsensor core dump triggered by a temperature sensor failure for the incorrect slot.  This can happen, for example, if a PCIe cable card with feature codes #EJ24 or #EJ2A is plugged into the C6 slot which is only for CAPI cards.  This fix prevents the eBMC dump and quiesce but the cable card must still be moved to a supported PCIe slot for it to function correctly.
  • A problem was fixed for the eBMC ASMI "Operations->Server power operation" page power setting descriptions to provide a message that some options are enabled only when the system is not HMC-managed.  The power setting options that this applies to are as follows:
    1) Default partition environment
    2) AIX/LINUX partition boot mode
    3) IBM i partition boot mode
  • A problem was fixed for a firmware upgrade to FW1030.00 and later that could fail because of the larger firmware image for the FW1030 releases.  An upgrade to FW1030 will require that the system be at least at the FW1020.20 firmware level.
  • A problem was fixed for the PCIe expansion drawer Chassis Management Card and fan-out modules (fabric adapters) not having the Location Identify LED and Health Status and State Status properties displayed in the eBMC Redfish query for Fabric Adapters. This happens every time for a "/redfish/v1/Systems/system/FabricAdapters" Redfish query (see the illustrative query after this list).
  • A problem was fixed for the eBMC ASMI "Hardware status -> PCIe hardware topology" page showing stale topology data after a cable fault followed by a PCIe link reset.  The cable status was stuck at inactive on the eBMC while the HMC showed a status of running. This error can occur whenever there is a cable attribute change followed by a link reset.  As a workaround, the HMC can be used to view the correct status of the link in the PCIe topology view.
  • A problem was fixed for an ambient temperature sensor error on an IPL with SRC BD561007 logged.  This error is random and intermittent. As a circumvention, the eBMC can be reset to fix the errant sensor.
  • A problem was fixed for the eBMC ASMI not showing hardware deconfiguration records for guarded resources after a reset of the eBMC.  As a workaround, the hw-isolation service on the eBMC can be restarted.
  • A problem was fixed for the eBMC ASMI login not supporting passwords greater than 20 characters.  Even with the fix for longer password support, there is still a password limitation for IPMI users since IPMI does not allow passwords greater than 20 characters.
  • A problem was fixed for the eBMC ASMI "Hardware status -> Inventory and LEDs" page not showing PCIe cable cards.  A section called "Fabric Adapters" has been added to the page to provide the cable card data.
  • A problem was fixed for the eBMC ASMI "Settings->Power restore policy" for "Always on" which did not restore power to the system unless the chassis power was on prior to losing power.  With the fix, if "Always on" is set, then the system will always power on (irrespective of the chassis power state before the eBMC reboot).
System firmware changes that affect certain systems
  • For a system that is not managed by an HMC, a problem was fixed for OS off-loads of dumps from the eBMC not always occurring. This error happens if the system is changed from an HMC-managed system to a non-HMC-managed system without a reset of the eBMC. With the fix, a reset of the eBMC is not required for the dump off-loads to the OS to occur.
  • For a system that is not managed by an HMC, a problem was fixed for dump off-loads to the OS hung in waiting to be processed after a system checkstop.  This can occur if a dump off-load was in progress at the time of the system checkstop.  To recover from this problem, the eBMC can be reset after the re-IPL to the host running state is completed.

4.0 How to Determine The Currently Installed Firmware Level

You can view the server's current firmware level on the Advanced System Management Interface (ASMI) Overview page under the System Information section in the Firmware Information panel. Example: (MM1020_079)
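
If the server is HMC-managed, the installed, activated, and accepted levels can also be listed from the HMC command line (illustrative; <managed_system> is a placeholder for the managed system name):

    lslic -m <managed_system> -t sys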


5.0 Downloading the Firmware Package

Follow the instructions on Fix Central. You must read and agree to the license agreement to obtain the firmware packages.

Note: If your HMC is not internet-connected, you will need to download the new firmware level to a USB flash memory device or an FTP server.


6.0 Installing the Firmware

The method used to install new firmware will depend on the release level of firmware which is currently installed on your server. The release level can be determined by the prefix of the new firmware's filename.

Example: MMxxx_yyy_zzz

Where xxx = release level

Instructions for installing firmware updates and upgrades can be found at https://www.ibm.com/docs/en/power10/9043-MRX?topic=9043-MRX/p10eh6/p10eh6_updates_sys.htm

IBM i Systems:

For information concerning IBM i Systems, go to the following URL to access Fix Central: 
https://www.ibm.com/support/fixcentral/

Choose "Select product", under Product Group specify "System i", under Product specify "IBM i", then Continue and specify the desired firmware PTF accordingly.

HMC and NovaLink Co-Managed Systems (Disruptive firmware updates only):

A co-managed system is managed by an HMC and NovaLink, with one of the interfaces in the co-management master mode.
Instructions for installing firmware updates and upgrades on systems co-managed by an HMC and NovaLink are the same as above for HMC-managed systems, since the firmware update must be done by the HMC in the co-management master mode. Before the firmware update is attempted, ensure that the HMC is set to master mode using the steps at the following IBM documentation link for NovaLink co-managed systems:
https://www.ibm.com/docs/en/power10/9043-MRX?topic=environment-powervm-novalink
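
For reference, the current co-management configuration can be displayed, and the HMC can request the master role, from the HMC command line (an illustrative sketch; <managed_system> is a placeholder, and command options may vary by HMC level):

    lscomgmt -m <managed_system>
    chcomgmt -m <managed_system> -o setmaster -t norm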

Then the firmware updates can proceed with the same steps as for HMC-managed systems, except that the system must be powered off, because only a disruptive update is allowed. If a concurrent update is attempted, the following error will occur: "HSCF0180E Operation failed for <system name> (<system mtms>). The operation failed. E302F861 is the error code."
https://www.ibm.com/docs/en/power10/9043-MRX?topic=9043-MRX/p10eh6/p10eh6_updates_sys.htm

7.0 Firmware History

The complete Firmware Fix History (including HIPER descriptions) for this release level can be reviewed at the following URL:
https://public.dhe.ibm.com/software/server/firmware/MM-Firmware-Hist.html