MM1020_089_079 / FW1020.20
12/01/22
Impact: Availability
Severity: SPE
New Features and Functions
- Password quality rules were enhanced on the eBMC for local passwords
such that new
passwords must have characters from at least two classes: lower-case
letters, upper-case letters, digits, and other characters. With this
enhancement, you can get a new error message from the `passwd` command:
"BAD PASSWORD: The password contains less than 2 character classes".
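The two-class rule can be illustrated with a short sketch. This is not the eBMC implementation: the function names, the ASCII-only class definitions, and the `min_classes` parameter are assumptions made for illustration.

```python
import string

def count_character_classes(password: str) -> int:
    """Count how many of the four classes appear in the password.

    Classes (assumed ASCII-based for this sketch): lower-case letters,
    upper-case letters, digits, and everything else.
    """
    has_lower = any(c in string.ascii_lowercase for c in password)
    has_upper = any(c in string.ascii_uppercase for c in password)
    has_digit = any(c in string.digits for c in password)
    known = string.ascii_letters + string.digits
    has_other = any(c not in known for c in password)
    return sum([has_lower, has_upper, has_digit, has_other])

def meets_class_rule(password: str, min_classes: int = 2) -> bool:
    """Return True if the password uses at least min_classes classes."""
    return count_character_classes(password) >= min_classes
```

Under these assumptions, a password made only of lower-case letters uses one class and would be rejected, while adding a digit or an upper-case letter brings it to two classes.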
System firmware changes that affect all systems
- DEFERRED: For a system with I/O Enlarged Capacity enabled and PCIe
expansion drawers
attached, a problem was fixed for the hypervisor using unnecessarily
large amounts of storage that could result in system termination.
This happens because extra memory is allocated for the external I/O
drawers which should have been excluded from "I/O Enlarged
Capacity". This problem can be avoided by not enabling "I/O
Enlarged Capacity". This fix requires an IPL to take effect
because the Huge Dynamic DMA Window capability (HDDW) TCE tables for
the I/O memory are allocated during the IPL.
- For a system with I/O Enlarged Capacity enabled, greater
than 8 TB of memory, and having an adapter in SR-IOV shared mode, a
problem was fixed for partition or system termination for a failed
memory page relocation. This can occur if the SR-IOV adapter is
assigned to a VIOS and virtualized to a client partition and then does
an I/O DMA on a section of memory greater than 2 GB in size. This
problem can be avoided by not enabling "I/O Enlarged Capacity".
- Security problems were fixed for vTPM 1.2 by updating its
OpenSSL library to version 0.9.8zh. Security vulnerabilities
CVE-2022-0778, CVE-2018-5407, CVE-2014-0076, and CVE-2009-3245 were
addressed. These problems only impact a partition if vTPM version
1.2 is enabled for the partition.
- A security problem was fixed for vTPM 2.0 by updating its
libtpms library. Security vulnerability CVE-2021-3746 was
addressed. This problem only impacts a partition if vTPM version
2.0 is enabled for the partition. The biggest threat from this
vulnerability is system availability.
- A security problem was fixed for the Virtualization
Management Interface (VMI) for vulnerability CVE-2021-45486 that could
allow a remote attacker to reveal sensitive information. This can
happen for session connections using IPv4.
- A security problem was fixed for the eBMC for vulnerability
CVE-2022-3435 that could allow a remote attacker to reveal sensitive
information from the eBMC. This can happen for session
connections using IPv4.
- A security problem was fixed for the eBMC HTTPS server
where a specially crafted multi-part HTTPS header, on a specific URI
only available to admin users, could cause a buffer overflow and lead
to a denial of service for the eBMC. The Common Vulnerabilities
and Exposures (CVE) number for this issue is CVE-2022-2809.
- A problem was fixed for processor frequencies being lowered
by the On-Chip Controller (OCC) with SRC BC8A2A62 logged. This is
a rare problem that could occur when excessive droop is detected in the
voltage level of the processor chip. This fix updates the SBE
image.
- A problem was fixed for repair action callouts that were too
frequent for recoverable errors with Predictive Error (PE) SRCs B7006A72,
B7006A74, and B7006A75. These SRCs for PCIe correctable error
events called for a repair action, but the threshold for the events was
too low for a recoverable error that does not impact the system.
The threshold for triggering the PE SRCs has been increased for all PLX
and non-PLX switch correctable errors.
- A problem was fixed for a resource dump (rscdump) having
incorrect release information in the dump header. There is a
four-character length pre-pended to the value and the last four
characters of the release are truncated. This problem was
introduced in Power10.
- A problem was fixed for a post dump IPL failing and a
system dump being lost following an abnormal system termination.
This can only happen when the system is going through a
post dump IPL and there are not sufficient operational cores on the
boot processor to support an IPL. This triggers resource recovery
for the cores which can fail to restore the necessary cores if extra
cores have been errantly deconfigured.
- A problem was fixed for performance issues on a system due
to dispatching delays when doing Live Partition Mobility (LPM) to
migrate a partition in POWER9, POWER10, or default processor
compatibility modes. For this to happen for a partition in default
processor compatibility mode, it must have been booted on a Power10
system. The dispatching delays stop after the
partition migration completes. This problem can be avoided by
putting the LPM source partition into POWER9_base processor
compatibility mode or older prior to the migration.
- A problem was fixed for a rare partition hang that can
happen any time Dynamic Platform Optimizer (DPO), memory guard
recovery, or memory mirroring defragmentation occurs for a shared
processor partition running in any compatibility mode if there is also
a dedicated processor partition running in Power9 or Power10 processor
compatibility mode. This does not happen if the dedicated
partition is in Power9_base or older processor compatibility modes.
Also, if the dedicated partition has the "Processor Sharing" setting
set to "Always Allow" or "Allow when partition is active", it may be
more likely to cause a shared processor partition to hang than if the
setting is set to "Never allow" or "Allow when partition is inactive".
This problem can be avoided by using Power9_base processor
compatibility mode for any dedicated processor partitions. This problem
can also be avoided by changing all dedicated processor partitions to
use shared processors.
- A problem was fixed for an SR-IOV adapter in shared mode
failing during run time with SRC B400FF04 or B400F104 logged.
This is an infrequent error and may result in a temporary loss of
communication as the affected SR-IOV adapter is reset to recover from
the error.
- A problem was fixed for a failed NIM download/install of OS
images that are greater than 32M. This only happens when using
the default TFTP block size of 512 bytes. The latest versions of
AIX are greater than 32M in size and can have this problem. As a
workaround, in the SMS menu, change "TFTP blocksize" from 512 to 1024.
To do this, go to the SMS "Advanced Setup: BOOTP" menu option when
setting up NIM install parameters. This will allow a NIM download
of an image up to 64M.
- A change was made for DDIMM operation to comply with the DRAM
controller requirement to disable periodic ZQ calibration during a
concurrent row repair operation and restore it afterward. The
change improves resiliency against possible memory errors during the
row repair operation.
- A problem was fixed for the Hostboot platform error log
entry "FW Released Ver" field to have the published firmware release
name given instead of an IBM internal PNOR driver name. This
affects all Hostboot unrecoverable, predictive, and informational logs.
- A problem was fixed for errant DRAM memory row
repairs. Row repair was going to the wrong address or not being
cleared properly and then repaired with either a spare DRAM or chip
mark. The row repair failures put the system closer to a
predictive callout of a DRAM.
- A problem was fixed for a processor core failing to wake up,
forcing the system into Safe Mode (reduced performance) with SRCs
BC8A2920, BC8A2625, and BC8A2616 logged. This is an infrequent
problem caused by a unique scenario that causes a wake up for a core
target to be missed.
- A problem was fixed for a partition firmware data storage
error with SRC BA210003 logged or for a failure to locate NVMe target
namespaces when attempting to access NVMe devices over Fibre Channel
(FC-NVME) SANs connected to third-party vendor storage systems.
This error condition, if it occurs, prevents firmware from accessing
NVMe namespaces over FC as described in the following scenarios:
1) Boot attempts from an NVMe namespace over FC using the current
SMS bootlist could fail.
2) From SMS menus via option 3 - I/O Device Information - no
devices can be found when attempting to view NVMe over FC devices.
3) From SMS menus via option 5 - Select Boot Options - no
bootable devices can be found when attempting to view and select an
NVMe over FC bootable device for the purpose of boot, viewing the
current device order, or modifying the boot device order.
The trigger for the problem is attempted access of NVMe namespaces over
Fibre Channel SANs connected to storage systems via one of the
scenarios listed above. The frequency of this problem can be high
for some of the vendor storage systems.
- A problem was fixed on the eBMC for a missing guard record
for a bad core after a core checkstop. The guard record may fail
to get created if the core checkstop is in the middle of a DMA
operation with the hypervisor. This is a rare problem that is
very timing dependent. A re-IPL of the system should get the bad
core guarded when it fails again.
- A problem was fixed on the eBMC for the Service login
console menu being displayed to read-only users. The read-only
users are not authorized to use the Service login console, so the menu
for it has been removed.
- A problem was fixed where in some cases the system fans
run unexpectedly at high speed, even when the system is powered
off. This is an intermittent error caused by a race condition in
the eBMC where the virtual-sensors service for the virtual ambient
temperature may not get established until after the fan control service
has started. This order of service initiation forces the system
fans to the maximum speed.
- A problem was fixed for the eBMC ASMI "Security and access
-> Policies" VirtualTPM to provide a help indicator to state that a
Virtual TPM policy change requires a boot of the system to take effect.
- A problem was fixed for an eBMC Redfish Service Validator
failure that can occur if there is a Redfish Validator task present on
the eBMC. A retry of the Redfish Validator after the other
validation task has been completed should be successful.
- A problem was fixed for a system quiesce after three failed
boot attempts from a corrupted SBE image. This should be a rare
error. Without the fix, if the primary processor has a corrupted
primary SBE image, the system will not boot until the processor is
replaced. With the fix, the eBMC does a side-switch to the backup
SBE image after three failed boots on the primary SBE image to allow
the system to IPL.
- A problem was fixed for an eBMC hang that could occur on an
IPL with an SRC BD8D3404 logged. This is a rare error caused by
dump storage on the eBMC being full when a core dump is present while
starting the eBMC dump manager. The system can be recovered by
clearing the dump storage files in the eBMC
/var/lib/phosphor-debug-collector/dumps directory.
- A problem was fixed for an incorrect firmware image being
allowed to be used for firmware updates via the USB, the eBMC ASMI, and
Redfish API. This causes a system power-on failure after the
update. This error cannot happen for firmware updates done
through the HMC and by the OS as these methods block the incorrect
image from being used. If this error occurs, the system can be
recovered by doing another firmware update to install the correct
firmware image.
- A problem was fixed for an indefinite hang that can occur
during a power-off shutdown of the system. This problem should
not be frequent as it is triggered only if a hypervisor error occurs
during the system shutdown. If a hang occurs, the system can be
re-IPLed to resume normal operations.
- A problem was fixed for eBMC hangs that can occur for some
concurrent maintenance repairs and re-IPLs of the system. If this
occurs, it can be recovered by a reset of the eBMC.
- A problem was fixed for the eBMC ASMI Health status rollup
indicator not being updated to good (green check mark) after a faulty
FRU repair or replacement. This happens for hot-pluggable or
concurrently maintainable FRUs that are associated with the chassis
when they are repaired or replaced.
- For an eBMC service login using the Hypervisor console, a
problem was fixed for the console connection status showing as
"Disconnected" when it is "Connected". This happens for the
following sequence for the "open in new tab" console view:
1) Log in to the eBMC ASMI as a service user.
2) Click "Operations->Service login consoles".
3) Select the Hypervisor console, which shows a "Connected" status.
4) Click "open in new tab", which shows a "Disconnected" status.
The wrong status being shown does not prevent the use of the Hypervisor
console.
- A problem was fixed for the eBMC ASMI "Hardware
status->Sensors" page to improve its usability. This page can
take a few minutes to load all the sensor data, so it was changed to
output each row of data as a sensor becomes available instead of
waiting for all the sensor data to be ready before displaying the
page. This makes sensor data available sooner and allows the user
to monitor the progress of the page being built.
- A problem was fixed for the eBMC dumping and going into a
quiesced state with SRC BD8D3404 logged if a PCIe cable is plugged into
the wrong PCIe slot. The eBMC dump is an hwmontempsensor
core dump triggered by a temperature sensor failure for the incorrect
slot. This can happen, for example, if a PCIe cable card with
feature codes #EJ24 or #EJ2A is plugged into the C6 slot which is only
for CAPI cards. This fix prevents the eBMC dump and quiesce but
the cable card must still be moved to a supported PCIe slot for it to
function correctly.
- A problem was fixed for the eBMC ASMI
"Operations->Server power operation" page power setting descriptions
to provide a message that some options are enabled only when the system
is not HMC-managed. The power setting options that this applies
to are as follows:
1) Default partition environment
2) AIX/LINUX partition boot mode
3) IBM i partition boot mode
- A problem was fixed for a firmware upgrade to FW1030.00 and
later that could fail because of the larger firmware image for the
FW1030 releases. An upgrade to FW1030 will require that the
system be at least at the FW1020.20 firmware level.
- A problem was fixed for the PCIe expansion drawer Chassis
Management Card and Fan out modules (fabric adapters) not having the
Location Identify LED and Health Status and State Status properties
displayed in the eBMC Redfish query for Fabric Adapters. This happens
every time for a "/redfish/v1/Systems/system/FabricAdapters"
Redfish query.
- A problem was fixed for the eBMC ASMI "Hardware status
-> PCIe hardware topology" page showing stale topology data after a
cable fault followed by a PCIe link reset. The cable status was
stuck at inactive on the eBMC while the HMC showed a status of running.
This error can occur whenever there is a cable attribute change
followed by a link reset. As a workaround, the HMC can be used to
view the correct status of the link in the PCIe topology view.
- A problem was fixed for an ambient temperature sensor error
on an IPL with SRC BD561007 logged. This error is random and
intermittent. As a circumvention, the eBMC can be reset to fix the
errant sensor.
- A problem was fixed for the eBMC ASMI not showing hardware
deconfiguration records for guarded resources after a reset of the
eBMC. As a workaround, the hw-isolation service on the eBMC can
be restarted.
- A problem was fixed for the eBMC ASMI login not supporting
passwords greater than 20 characters. Even with the fix for
longer password support, there is still a password limitation for IPMI
users since IPMI does not allow passwords greater than 20 characters.
- A problem was fixed for the eBMC ASMI "Hardware status
-> Inventory and LEDs" page not showing PCIe cable cards. A
section called "Fabric Adapters" has been added to the page to provide
the cable card data.
- A problem was fixed for the eBMC ASMI "Settings->Power
restore policy" for "Always on" which did not restore power to the
system unless the chassis power was on prior to losing power.
With the fix, if "Always on" is set, then the system will always power
on (irrespective of the chassis power state before the eBMC reboot).
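The 32M and 64M limits given above for NIM downloads are consistent with the 16-bit block counter that TFTP (RFC 1350) uses to number data blocks. The sketch below works through that arithmetic. It assumes the limit comes from the block counter running out without wrapping, which the fix description does not state, and the function names are illustrative only.

```python
# TFTP numbers data blocks with a 16-bit counter starting at 1.
# Assumption for this sketch: the counter is never allowed to wrap.
MAX_BLOCKS = 2**16 - 1  # 65535

def max_image_bytes(block_size: int) -> int:
    """Bytes carried by MAX_BLOCKS full blocks of the given size."""
    return MAX_BLOCKS * block_size

def fits(image_bytes: int, block_size: int) -> bool:
    """True if the image can be sent before the counter is exhausted.

    A TFTP transfer ends with a block shorter than block_size, so an
    image must be strictly smaller than MAX_BLOCKS full blocks.
    """
    return image_bytes < max_image_bytes(block_size)

if __name__ == "__main__":
    for size in (512, 1024):
        print(size, max_image_bytes(size))
```

With a block size of 512 bytes the ceiling is just under 32 MiB, and doubling the block size to 1024 bytes doubles the ceiling to just under 64 MiB, matching the figures in the workaround.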
System firmware changes that affect certain systems
- For a system that is not managed by an HMC, a problem was fixed for
OS off-loads of dumps from the eBMC not always occurring.
This error will happen if the system is changed from an HMC-managed
system to a non-HMC-managed system without a reset of the eBMC.
With the fix, a reset of the eBMC is not required for the dump
off-loads to the OS to occur.
- For a system that is not managed by an HMC, a problem was
fixed for dump off-loads to the OS hanging while waiting to be processed
after a system checkstop. This can occur if a dump off-load was
in progress at the time of the system checkstop. To recover from
this problem, the eBMC can be reset after the re-IPL to the host
running state is completed.