Power9 System Firmware
Applies to: 9040-MR9
This document provides information about the installation of Licensed Machine or Licensed Internal Code, which is sometimes referred to generically as microcode or firmware.
1.0 Systems Affected
This package provides firmware for Power Systems E950 (9040-MR9) servers only.
The firmware level in this package is: VM950_111_045 / FW950.60
1.1 Minimum HMC Code Level
This section describes the "Minimum HMC Code Level" required by the System Firmware to complete the firmware installation process. When installing the System Firmware, the HMC level must be equal to or higher than the "Minimum HMC Code Level" before starting the system firmware update. If the HMC managing the server targeted for the System Firmware update is running a code level lower than the "Minimum HMC Code Level", the firmware update will not proceed.
The Minimum HMC Code levels for this firmware for HMC x86, ppc64, or ppc64le are listed below.
x86 - This term refers to the legacy HMC that runs on x86/Intel/AMD hardware, and to the Virtual HMC that can run on Intel hypervisors (KVM, XEN, VMWare ESXi).
- The Minimum HMC Code level for this firmware is: HMC V9R2M950 (PTF MH01869).
Note: The 7042-CR9 is the ONLY Machine Type HMC appliance for x86 supported at the Minimum HMC level.
- Although the Minimum HMC Code level for this firmware is listed above, HMC V9R2M951.2 (PTF MH01892) or higher is recommended to avoid an issue that can cause the HMC to briefly lose connections to all servers, with service events E2FF1409 and E23D040A reported. This will cause all running server tasks, such as a server firmware upgrade, to fail.
ppc64 or ppc64le - describes the Linux code that is compiled to run on Power-based servers or LPARs (Logical Partitions).
- The Minimum HMC Code level for this firmware is: HMC V9R2M950 (PTF MH01870).
- Although the Minimum HMC Code level for this firmware is listed above, HMC V9R2M951.2 (PTF MH01893) or higher is recommended to avoid an issue that can cause the HMC to briefly lose connections to all servers, with service events E2FF1409 and E23D040A reported. This will cause all running server tasks, such as a server firmware upgrade, to fail.
The Minimum HMC level supports the following HMC models:
x86 - KVM, XEN, VMWare ESXi (6.0/6.5)
ppc64le - 7063-CR1, vHMC on PowerVM (POWER8 and POWER9 systems)
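The "equal to or higher than" rule above amounts to an ordered comparison of HMC levels. The following is a minimal Python sketch, assuming levels are written in the VxRyMz[.n] form used in this section (for example V9R2M950 or V9R2M951.2); the helper names are hypothetical and this is not an IBM-provided tool. The firmware update process itself enforces the actual requirement.

```python
import re

# Assumed format: V<version>R<release>M<modification>[.<fix>], e.g. "V9R2M951.2"
_HMC_LEVEL = re.compile(r"V(\d+)R(\d+)M(\d+)(?:\.(\d+))?")


def parse_hmc_level(level: str) -> tuple[int, int, int, int]:
    """Turn an HMC level string into a tuple that compares numerically."""
    m = _HMC_LEVEL.fullmatch(level.strip())
    if m is None:
        raise ValueError(f"unrecognized HMC level: {level!r}")
    version, release, mod, fix = m.groups()
    # A missing ".n" suffix is treated as 0.
    return int(version), int(release), int(mod), int(fix or 0)


def meets_minimum(installed: str, minimum: str = "V9R2M950") -> bool:
    """True if the installed HMC level is equal to or higher than the minimum."""
    return parse_hmc_level(installed) >= parse_hmc_level(minimum)


def is_recommended(installed: str, recommended: str = "V9R2M951.2") -> bool:
    """True if the installed HMC level is at or above the recommended level."""
    return parse_hmc_level(installed) >= parse_hmc_level(recommended)


if __name__ == "__main__":
    for level in ("V9R1M942", "V9R2M950", "V9R2M951.2"):
        print(level, meets_minimum(level), is_recommended(level))
```

Because a missing suffix counts as 0, V9R2M951 sorts below V9R2M951.2, which matches how the recommended level is written above.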
For information concerning HMC releases and the latest PTFs, go to the following URL to access Fix Central:
http://www-933.ibm.com/support/fixcentral/
For specific fix level information on key components of IBM Power Systems running the AIX, IBM i and Linux operating systems, we suggest using the Fix Level Recommendation Tool (FLRT):
http://www14.software.ibm.com/webapp/set2/flrt/home
NOTES:
- You must be logged in as hscroot in order for the firmware installation to complete correctly.
- Systems Director Management Console (SDMC) does not support this System Firmware level.
2.0 Important Information
Concurrent firmware update of certain
SR-IOV adapters needs AIX/VIOS fix
If the adapter firmware level in this service pack is concurrently applied, AIX and VIOS VFs may fail. To prevent the VF failure, the VIOS and AIX partitions must have the fix for IJ44288 (or a sibling APAR) applied prior to concurrently updating SR-IOV adapter firmware. The Spring 2023 AIX/VIOS service packs will ship this fix. Until then, interim fixes (ifixes) are available from https://aix.software.ibm.com/aix/efixes/ij44288/ or by calling IBM support if an ifix is required for a different level. A re-IPL of the system, instead of concurrently updating the SR-IOV adapter firmware, also prevents a VF failure.
The following SR-IOV adapter Feature Codes and CCINs are affected: #EC2R/EC2S with CCIN 58FA; #EC2T/EC2U with CCIN 58FB; and #EC66/EC67 with CCIN 2CF3.
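The guidance above reduces to a small decision for each adapter. The following is a minimal Python sketch, assuming you already know each adapter's CCIN and whether every AIX and VIOS partition using its VFs has IJ44288 (or a sibling APAR) applied. The function and its return strings are hypothetical and only restate the rules in this section.

```python
# Affected CCINs and their Feature Codes, as listed in this section.
AFFECTED = {
    "58FA": ("#EC2R", "#EC2S"),
    "58FB": ("#EC2T", "#EC2U"),
    "2CF3": ("#EC66", "#EC67"),
}


def sriov_plan(ccin: str, aix_vios_have_ij44288: bool, concurrent: bool) -> str:
    """Summarize the SR-IOV adapter firmware guidance for one adapter."""
    if ccin.strip().upper() not in AFFECTED:
        return "not affected"
    if not concurrent:
        # Re-IPL instead of a concurrent adapter firmware update.
        return "ok: re-IPL instead of concurrent adapter update"
    if aix_vios_have_ij44288:
        return "ok: IJ44288 or sibling APAR applied on AIX and VIOS partitions"
    return "risk: VFs may fail; apply the IJ44288 ifix first or re-IPL"


if __name__ == "__main__":
    print(sriov_plan("58FB", aix_vios_have_ij44288=False, concurrent=True))
```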
Boot adapter microcode requirement
Update all adapters which are boot adapters, or which may be
used as boot adapters in the future, to the latest microcode from IBM
Fix Central. The latest microcode will ensure the adapters
support the new Firmware Secure Boot feature of Power Systems. This
requirement applies when updating system firmware from a level prior to
FW940 to levels FW940 and later.
The latest adapter microcode levels include signed boot driver code. If
a boot-capable PCI adapter is not installed with the latest level of
adapter microcode, the partition which owns the adapter will boot, but
error logs with SRCs BA5400A5 or BA5400A6 will be posted. Once
the adapter(s) are updated, the error logs will no longer be posted.
Downgrading firmware from any given release level to an earlier release level is not recommended.
Firmware downgrade warnings:
1) Adapter feature codes #EC2S/#EC2U, #EC3M, and #EC66, when configured in SR-IOV shared mode in FW930 or later (even if originally configured in shared mode in a pre-FW930 release), may not function properly if the system is downgraded to a pre-FW930 release. Configure the adapter in dedicated mode first (that is, take the adapter out of SR-IOV shared mode) before downgrading to a pre-FW930 release.
2) If partitions have been run in POWER9 compatibility mode in FW940, a
downgrade to an earlier release (pre-FW940) may cause a problem with
the partitions starting. To prevent this problem, the "server
firmware" settings must be reset by rebooting partitions in
"Power9_base" before doing the downgrade.
If you feel that it is necessary to downgrade the firmware on your system to an earlier release level, please contact your next level of support.
2.1 IPv6 Support and Limitations
IPv6 (Internet Protocol version 6) is supported in the System Management Services (SMS) in this level of system firmware. There are several limitations that should be considered.
When configuring a network interface card (NIC) for remote IPL, only the most recently configured protocol (IPv4 or IPv6) is retained. For example, if the network interface card was previously configured with IPv4 information and is now being configured with IPv6 information, the IPv4 configuration information is discarded.
A single network interface card may only be chosen once for the boot device list. In other words, the interface cannot be configured for the IPv6 protocol and for the IPv4 protocol at the same time.
2.2 Concurrent Firmware Updates
Concurrent system firmware update is supported on HMC Managed Systems only.
Ensure that there are no RMC connection issues for any system partitions prior to applying the firmware update. If there is an RMC connection failure to a partition during the firmware update, the RMC connection will need to be restored, and additional recovery actions for that partition will be required to complete partition firmware updates.
2.3 Memory Considerations for Firmware Upgrades
Firmware Release Level upgrades and Service Pack updates may consume additional system memory.
Server firmware requires memory to support the logical partitions on the server. The amount of memory required by the server firmware varies according to several factors.
Factors influencing server firmware memory requirements include the following:
- Number of logical partitions
- Partition environments of the logical partitions
- Number of physical and virtual I/O devices used by the logical partitions
- Maximum memory values given to the logical partitions
Generally, you can estimate the amount of memory required by server firmware to be approximately 8% of the system installed memory. The actual amount required will generally be less than 8%. However, some server models require an absolute minimum amount of memory for server firmware, regardless of the previously mentioned considerations.
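As a worked example of the 8% rule of thumb, a server with 2048 GB of installed memory (a hypothetical configuration) would need roughly 0.08 × 2048 ≈ 164 GB for server firmware, and generally less. The following minimal Python sketch encodes only that estimate. It does not model the per-model absolute minimums mentioned above or the individual factors listed earlier, and the function name is hypothetical.

```python
def estimate_firmware_memory_gb(installed_memory_gb: float, fraction: float = 0.08) -> float:
    """Rough upper-bound estimate of memory used by server firmware.

    Uses the ~8% rule of thumb; the actual amount is generally lower,
    and some models enforce an absolute minimum not modeled here.
    """
    if installed_memory_gb < 0:
        raise ValueError("installed memory must be non-negative")
    return installed_memory_gb * fraction


if __name__ == "__main__":
    # Hypothetical server with 2048 GB installed.
    print(f"{estimate_firmware_memory_gb(2048):.2f} GB")
```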
Additional information can be found at:
https://www.ibm.com/support/knowledgecenter/9040-MR9/p9hat/p9hat_lparmemory.htm
2.4 SBE Updates
Power9 servers contain SBEs (Self Boot Engines), which are used to boot the system. An SBE is internal to each of the Power9 chips and is used to "self boot" the chip. The SBE image is persistent and is only reloaded if there is a system firmware update that contains an SBE change. If there is an SBE change and the system firmware update is concurrent, then the SBE update is delayed to the next IPL of the CEC, which adds 3-5 minutes per processor chip in the system to that IPL. If there is an SBE change and the system firmware update is disruptive, then the SBE update adds 3-5 minutes per processor chip in the system to the IPL. During the SBE update process, the HMC or op-panel will display service processor code C1C3C213 for each of the SBEs being updated. This is a normal progress code, and the system boot should not be terminated by the user. The additional time can be between 12 and 20 minutes.
The SBE image is only updated with
this service pack if the starting firmware level is less than FW950.10.
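The extra IPL time scales linearly with the number of processor chips: 3-5 minutes per chip, so a four-chip system would add 12-20 minutes, consistent with the estimate above. The following is a minimal Python sketch, assuming firmware levels are written as FWnnn.nn (for example FW950.10). The helper names are hypothetical.

```python
import re


def sbe_extra_ipl_minutes(processor_chips: int, per_chip: tuple[int, int] = (3, 5)) -> tuple[int, int]:
    """Return the (low, high) extra IPL minutes added by an SBE update."""
    if processor_chips < 1:
        raise ValueError("processor_chips must be at least 1")
    low, high = per_chip
    return processor_chips * low, processor_chips * high


def _fw_key(level: str) -> tuple[int, int]:
    # "FW950.10" -> (950, 10); assumes the FWnnn.nn form used in this document
    m = re.fullmatch(r"FW(\d+)\.(\d+)", level.strip())
    if m is None:
        raise ValueError(f"unrecognized firmware level: {level!r}")
    return int(m.group(1)), int(m.group(2))


def sbe_update_needed(starting_level: str, threshold: str = "FW950.10") -> bool:
    """This service pack updates the SBE only when starting below FW950.10."""
    return _fw_key(starting_level) < _fw_key(threshold)


if __name__ == "__main__":
    print(sbe_extra_ipl_minutes(4))
    print(sbe_update_needed("FW940.50"), sbe_update_needed("FW950.40"))
```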
3.0 Firmware Information
Use the following examples as a reference to determine whether your installation will be concurrent or disruptive.
For systems that are not managed by an HMC, the installation of system firmware is always disruptive.
Note: The concurrent levels of system firmware may, on occasion, contain fixes that are known as Deferred and/or Partition-Deferred. Deferred fixes can be installed concurrently, but will not be activated until the next IPL. Partition-Deferred fixes can be installed concurrently, but will not be activated until a partition reactivate is performed. Deferred and/or Partition-Deferred fixes, if any, will be identified in the "Firmware Update Descriptions" table of this document. For these types of fixes (Deferred and/or Partition-Deferred) within a service pack, only the fixes in the service pack which cannot be concurrently activated are deferred.
Note: The file names and service pack levels used in the following examples are for clarification only, and are not necessarily levels that have been, or will be, released.
System firmware file naming convention:
01VMxxx_yyy_zzz
- xxx is the release level
- yyy is the service pack level
- zzz is the last disruptive service pack level
NOTE: Values of service pack and last disruptive service pack level (yyy and zzz) are only unique within a release level (xxx). For example, 01VM900_040_040 and 01VM910_040_045 are different service packs.
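The naming convention can be parsed mechanically. The following is a minimal Python sketch, assuming names follow the 01VMxxx_yyy_zzz pattern above, optionally without the 01 prefix or with an .rpm suffix; FirmwareLevel and parse_firmware_name are hypothetical names. Because yyy and zzz are only unique within a release, compare them only when xxx matches.

```python
import re
from typing import NamedTuple


class FirmwareLevel(NamedTuple):
    release: int          # xxx: release level
    service_pack: int     # yyy: service pack level
    last_disruptive: int  # zzz: last disruptive service pack level


# Optional "01" prefix and optional ".rpm" suffix, e.g. "01VM950_111_045.rpm".
_NAME = re.compile(r"(?:01)?VM(\d{3})_(\d{3})_(\d{3})(?:\.rpm)?")


def parse_firmware_name(name: str) -> FirmwareLevel:
    """Split a system firmware file name into its xxx, yyy and zzz fields."""
    m = _NAME.fullmatch(name.strip())
    if m is None:
        raise ValueError(f"not a VMxxx_yyy_zzz firmware name: {name!r}")
    return FirmwareLevel(*(int(g) for g in m.groups()))


if __name__ == "__main__":
    print(parse_firmware_name("01VM950_111_045.rpm"))
```

For this package, the output is FirmwareLevel(release=950, service_pack=111, last_disruptive=45).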
An installation is disruptive if:
- The release levels (xxx) are different.
Example: Currently installed release is 01VM900_040_040, new release is 01VM910_050_050.
- The service pack level (yyy) and the last disruptive service pack level (zzz) are the same.
Example: VM910_040_040 is disruptive, no matter what level of VM910 is currently installed on the system.
- The service pack level (yyy) currently installed on the system is lower than the last disruptive service pack level (zzz) of the service pack to be installed.
Example: Currently installed service pack is VM910_040_040 and new service pack is VM910_050_045.
An installation is concurrent if:
- The release level (xxx) is the same, and
- The service pack level (yyy) currently installed on the system is the same as or higher than the last disruptive service pack level (zzz) of the service pack to be installed.
Example: Currently installed service pack is VM910_040_040, new service pack is VM910_041_040.
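The disruptive and concurrent rules above can be combined into one check. The following is a minimal Python sketch, assuming both names follow the VMxxx_yyy_zzz convention (with or without the 01 prefix); classify_install is a hypothetical helper. It does not account for Deferred or Partition-Deferred fixes, which install concurrently but activate later.

```python
def _levels(name: str) -> tuple[int, int, int]:
    """Split "01VM910_040_040" or "VM910_040_040[.rpm]" into (xxx, yyy, zzz)."""
    body = name.strip().removesuffix(".rpm").split("VM", 1)[1]
    release, service_pack, last_disruptive = body.split("_")
    return int(release), int(service_pack), int(last_disruptive)


def classify_install(installed: str, new: str, hmc_managed: bool = True) -> str:
    """Return "disruptive" or "concurrent" using the rules in this section."""
    if not hmc_managed:
        return "disruptive"  # no HMC: always disruptive
    cur_rel, cur_sp, _ = _levels(installed)
    new_rel, new_sp, new_last = _levels(new)
    if cur_rel != new_rel:
        return "disruptive"  # release levels (xxx) differ
    if new_sp == new_last:
        return "disruptive"  # new yyy equals new zzz
    if cur_sp < new_last:
        return "disruptive"  # installed yyy is below new zzz
    return "concurrent"      # same xxx and installed yyy >= new zzz


if __name__ == "__main__":
    print(classify_install("01VM900_040_040", "01VM910_050_050"))
    print(classify_install("VM910_040_040", "VM910_050_045"))
    print(classify_install("VM910_040_040", "VM910_041_040"))
```

The three calls reproduce the examples in this section: disruptive, disruptive, then concurrent.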
3.1 Firmware Information and Description
Filename            | Size      | Checksum | md5sum
01VM950_111_045.rpm | 163078890 | 59884    | 5edf15870a6ba419499d791591cac9c9
Note: The Checksum can be found by running the AIX sum command against the rpm file (only the first 5 digits are listed).
For example: sum 01VM950_111_045.rpm
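The listed Checksum comes from the AIX sum command. On a system without AIX, the size and md5sum values listed for 01VM950_111_045.rpm can be checked instead. The following is a minimal Python sketch using the standard hashlib module, assuming the Size column is in bytes and the rpm file is in the current directory; the helper names are hypothetical.

```python
import hashlib
from pathlib import Path

# Values listed for 01VM950_111_045.rpm in this section.
EXPECTED_SIZE = 163078890
EXPECTED_MD5 = "5edf15870a6ba419499d791591cac9c9"


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so a large rpm is not read into memory at once."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, size: int = EXPECTED_SIZE, md5: str = EXPECTED_MD5) -> bool:
    """True only if both the byte size and the MD5 digest match."""
    return path.stat().st_size == size and md5_of(path) == md5.lower()


if __name__ == "__main__":
    rpm = Path("01VM950_111_045.rpm")  # assumed location
    if rpm.exists():
        print("OK" if verify(rpm) else "MISMATCH")
    else:
        print(f"{rpm} not found")
```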
VM950
For Impact, Severity and other firmware definitions, please refer to the 'Glossary of firmware terms' at the following URL:
https://www.ibm.com/support/pages/node/6555136
VM950_111_045 / FW950.60 (10/20/22)
Impact: Availability
Severity: SPE
System firmware changes that affect all systems
- A change was made for certain SR-IOV adapters to move up to the latest level of adapter firmware. No specific adapter problems were addressed at this new level. This change updates the adapter firmware to XX.32.1010 for the following Feature Codes and CCINs: #EC2R/EC2S with CCIN 58FA; #EC2T/EC2U with CCIN 58FB; and #EC66/EC67 with CCIN 2CF3. If this adapter firmware level is concurrently applied, AIX and VIOS VFs may fail. To prevent the VF failure, the VIOS and AIX partitions must have the fix for IJ44288 (or a sibling APAR) applied prior to concurrently updating SR-IOV adapter firmware. The Spring 2023 AIX/VIOS service packs will ship this fix. Until then, interim fixes (ifixes) are available from https://aix.software.ibm.com/aix/efixes/ij44288/ or by calling IBM support if an ifix is required for a different level. A re-IPL of the system, instead of concurrently updating the SR-IOV adapter firmware, also prevents a VF failure.
- Security problems were fixed for vTPM 1.2 by updating its
OpenSSL library to version 0.9.8zh. Security vulnerabilities
CVE-2022-0778, CVE-2018-5407, CVE-2014-0076, and CVE-2009-3245 were
addressed. These problems only impact a partition if vTPM version
1.2 is enabled for the partition.
- A problem was fixed for an intermittent service processor
core dump for MboxDeviceMsg with SRCs B1818601 and B6008601 logged
while the system is running. This is a timing failure related to
a double file close on an NVRAM file. The service processor will
automatically recover from this error with no impact on the system.
- A problem was fixed for an SR-IOV adapter in shared mode
failing on an IPL with SRC B2006002 logged. This is an infrequent
error caused by a different SR-IOV adapter than expected being
associated with the slot because of the same memory buffer being used
by two SR-IOV adapters. The failed SR-IOV adapter can be powered
on again and it should boot correctly.
- A problem was fixed for an SR-IOV adapter in shared mode
failing during run time with SRC B400FF04 or B400F104 logged.
This is an infrequent error and may result in a temporary loss of
communication as the affected SR-IOV adapter is reset to recover from
the error.
- A problem was fixed for a system crash with SRC B700F103 logged after a local core checkstop of a core with a running partition. This infrequent error also requires a configuration change on the system, such as changing the processor configuration of the affected partition or running Dynamic Platform Optimizer (DPO).
- A problem was fixed for a rare system hang that can happen
any time Dynamic Platform Optimizer (DPO), memory guard recovery, or
memory mirroring defragmentation occurs for a dedicated processor
partition running in Power9 or Power10 processor compatibility mode.
This does not affect partitions in Power9_base or older processor
compatibility modes. If the partition has the "Processor Sharing"
setting set to "Always Allow" or "Allow when partition is active", it
may be more likely to encounter this than if the setting is set to
"Never allow" or "Allow when partition is inactive".
This problem can be avoided by using Power9_base processor
compatibility mode for dedicated processor partitions. This can also be
avoided by changing all dedicated processor partitions to use shared
processors.
- A problem was fixed for a partition with VPMEM failing to
activate after a system IPL with SRC B2001230 logged for a
"HypervisorDisallowsIPL" condition. This problem is very rare and
is triggered by the partition's hardware page table (HPT) being too big
to fit into a contiguous space in memory. As a workaround, the
problem can be averted by reducing the memory needed for the HPT.
For example, if the system memory is mirrored, the HPT size is doubled,
so turning off mirroring is one option to save space. Or the size
of the VPMEM LUN could be reduced. The goal of these options
would be to free up enough contiguous blocks of memory to fit the
partition's HPT size.
- A problem was fixed for a rare partition hang that can
happen any time Dynamic Platform Optimizer (DPO), memory guard
recovery, or memory mirroring defragmentation occurs for a shared
processor partition running in any compatibility mode if there is also
a dedicated processor partition running in Power9 or Power10 processor
compatibility mode. This does not happen if the dedicated
partition is in Power9_base or older processor compatibility modes.
Also, if the dedicated partition has the "Processor Sharing" setting
set to "Always Allow" or "Allow when partition is active", it may be
more likely to cause a shared processor partition to hang than if the
setting is set to "Never allow" or "Allow when partition is inactive".
This problem can be avoided by using Power9_base processor
compatibility mode for any dedicated processor partitions. This problem
can also be avoided by changing all dedicated processor partitions to
use shared processors.
- A problem was fixed for booting an OS using iSCSI from SMS
menus that fails with a BA010013 information log. This failure is
intermittent and infrequent. If the contents of the BA010013 are
inspected, the following messages can be seen embedded within the log:
" iscsi_read: getISCSIpacket returned ERROR"
" updateSN: Old iSCSI Reply - target_tag, exp_tag"
- A problem was fixed for the SMS menu option "I/O Device
Information". When using a partition's SMS menu option "I/O
Device Information" to list devices under a physical or virtual Fibre
Channel adapter, the list may be missing or entries in the list may be
confusing. If the list does not display, the following message is
displayed:
"No SAN adapters present. Press any key to continue".
An example of a confusing entry in a list follows:
"Pathname: /vdevice/vfc-client@30000004
WorldWidePortName: 0123456789012345
1.
500173805d0c0110,0
Unrecognized device type: c"
- A problem was fixed for a memory leak in the service
processor (FSP) that can result in an out of memory (OOM) condition in
the FSP kernel with an FSP dump and reset of the FSP. This can
occur after the FSP has been active for more than 80 days of
uptime. If the problem occurs, the system automatically recovers
with a reset/reload of the FSP.
- A problem was fixed for too-frequent repair-action callouts for recoverable errors with SRCs B7006A72, B7006A74, and B7006A75. Before the fix, the threshold for switch correctable errors was 5 occurrences in 10 minutes, which is too low for a predictive event that requests a part replacement. With the fix, the threshold for calling out a part replacement is increased to match the one used for PCIe Host Bridge (PHB) correctable errors. Previously, every correctable-error threshold condition on the switch link triggered one of these too-frequent callouts.
- A problem was fixed for a service processor FSP kernel
panic dump and reset/reload that can occur if there is a network
configuration error when using ASMI to change the network. The
SRCs B1817201 and B1817212 are logged prior to the dump. This
problem only occurs when changing the network configuration to an
incorrect setting that causes a network timeout.
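To illustrate what a threshold such as "5 occurring in 10 minutes" means in the B7006A72/B7006A74/B7006A75 fix above, here is a minimal sliding-window counter in Python. It is a hypothetical model, not the service processor's implementation. The new threshold value is not stated in this document, so the limit and window are parameters, and the reset-after-callout behavior is an assumption.

```python
from collections import deque


class CorrectableErrorThreshold:
    """Hypothetical model: trip when `limit` errors occur within `window_s` seconds."""

    def __init__(self, limit: int = 5, window_s: float = 600.0) -> None:
        self.limit = limit
        self.window_s = window_s
        self._times = deque()  # timestamps of errors still inside the window

    def record(self, t: float) -> bool:
        """Record an error at time t (seconds, non-decreasing); True means a callout."""
        self._times.append(t)
        while t - self._times[0] >= self.window_s:
            self._times.popleft()  # forget errors older than the window
        if len(self._times) >= self.limit:
            self._times.clear()  # assumed: start counting again after a callout
            return True
        return False


if __name__ == "__main__":
    th = CorrectableErrorThreshold()
    print([th.record(t) for t in (0, 60, 120, 180, 240)])
```

Five errors one minute apart trip the pre-fix threshold on the fifth error, while the same five errors spread 200 seconds apart never do.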
System firmware changes that affect certain systems
- On a system with no HMC and a serially attached terminal, a problem was fixed for an intermittent service processor core dump for NetsVTTYServer with SRC B181D30B logged that can occur when using the terminal console for the OS. This error causes the console to be lost, but it can be recovered by doing a soft reset of the service processor.
4.0 How to Determine The Currently Installed Firmware Level
You can view the server's current firmware level on the Advanced System Management Interface (ASMI) Welcome pane. It appears in the top right corner.
Example: VM920_123.
5.0 Downloading the Firmware Package
Follow the instructions on Fix Central. You must read and agree to the license agreement to obtain the firmware packages.
Note: If your HMC is not internet-connected, you will need to download the new firmware level to a USB flash memory device or FTP server.
6.0 Installing the Firmware
The method used to install new firmware will depend on the release level of firmware which is currently installed on your server. The release level can be determined by the prefix of the new firmware's filename.
Example: VMxxx_yyy_zzz
Where xxx = release level
- If the release level will stay the same (Example: Level VM910_040_040 is currently installed and you are attempting to install level VM910_041_040), this is considered an update.
- If the release level will change (Example: Level VM900_040_040 is currently installed and you are attempting to install level VM910_050_050), this is considered an upgrade.
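The update-versus-upgrade distinction depends only on the release level (xxx). The following is a minimal Python sketch, assuming names in the VMxxx_yyy_zzz form (with or without the 01 prefix); release_level and install_kind are hypothetical helpers. The same parsing also accepts the short form shown on the ASMI Welcome pane, such as VM920_123.

```python
def release_level(name: str) -> str:
    """Return xxx from "01VMxxx_yyy_zzz", "VMxxx_yyy_zzz" or the ASMI form "VMxxx_yyy"."""
    return name.strip().split("VM", 1)[1][:3]


def install_kind(installed: str, new: str) -> str:
    """Return 'update' if the release level stays the same, otherwise 'upgrade'."""
    return "update" if release_level(installed) == release_level(new) else "upgrade"


if __name__ == "__main__":
    print(install_kind("VM910_040_040", "VM910_041_040"))
    print(install_kind("VM900_040_040", "VM910_050_050"))
```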
Instructions for installing firmware updates and upgrades can be found at https://www.ibm.com/support/knowledgecenter/9040-MR9/p9eh6/p9eh6_updates_sys.htm
IBM i Systems:
For information concerning IBM i Systems, go to the following URL to access Fix Central:
http://www-933.ibm.com/support/fixcentral/
Choose "Select product", under Product Group specify "System i", under Product specify "IBM i", then Continue and specify the desired firmware PTF accordingly.
HMC and NovaLink Co-Managed Systems (Disruptive firmware updates only):
A co-managed system is managed by an HMC and NovaLink, with one of the interfaces in the co-management master mode. Instructions for installing firmware updates and upgrades on systems co-managed by an HMC and NovaLink are the same as above for HMC-managed systems, since the firmware update must be done by the HMC in the co-management master mode. Before the firmware update is attempted, make sure that the HMC is set in master mode using the steps at the following IBM Knowledge Center link for NovaLink co-managed systems:
https://www.ibm.com/support/knowledgecenter/9009-22A/p9eig/p9eig_kickoff.htm
Then the firmware updates can proceed with the same steps as for HMC-managed systems, except that the system must be powered off because only a disruptive update is allowed. If a concurrent update is attempted, the following error will occur: "HSCF0180E Operation failed for <system name> (<system mtms>). The operation failed. E302F861 is the error code:"
https://www.ibm.com/support/knowledgecenter/9009-22A/p9eh6/p9eh6_updates_sys.htm
7.0 Firmware History
The complete Firmware Fix History (including HIPER descriptions) for this Release level can be reviewed at the following URL:
https://www.ibm.com/support/pages/node/6955589