SV860
For Impact, Severity, and other firmware definitions, refer to the
'Glossary of firmware terms' at the following URL:
http://www14.software.ibm.com/webapp/set2/sas/f/power5cm/home.html#termdefs
The following fix description table will only contain the N (current)
and N-1 (previous) levels.
The complete Firmware Fix History (including HIPER descriptions) for
this Release Level can be reviewed at the following URL:
http://download.boulder.ibm.com/ibmdl/pub/software/server/firmware/SV-Firmware-Hist.html
|
SV860_231_165 / FW860.A0
07/08/21 |
Impact: Availability
Severity: SPE
New features and functions
- Support was added to Redfish for a command that sets the
ASMI user passwords using a new AccountService schema.
Using this service, the ASMI admin, HMC, and general user passwords can
be changed.
System firmware changes that affect all systems
- A problem was fixed
for Time of Day (TOD) being lost for the real-time clock (RTC) with an
SRC B15A3303 logged when the service processor boots or resets.
This is a very rare problem that involves a timing problem in the
service processor kernel. If the server is running when the error
occurs, there will be an SRC B15A3303 logged, and the time of day on
the service processor will be incorrect for up to six hours until the
hypervisor synchronizes its (valid) time with the service
processor. If the server is not running when the error occurs,
there will be an SRC B15A3303 logged, and if the server is subsequently
IPLed without setting the date and time in ASMI to fix it, the IPL will
abort with an SRC B7881201 which indicates to the system operator that
the date and time are invalid.
- A problem was fixed in ASMI to allow setting static routes
with two default gateway IP addresses. Without the fix,
ASMI always fails with "Invalid entry. Gateway address" for this
configuration. As a workaround, the static routes could be
created using the ASMI command line and the "route add" command.
- On systems with PowerVM firmware, a problem was fixed for
intermittent failures for a reset of a Virtual Function (VF) for SR-IOV
adapters during Enhanced Error Handling (EEH) error recovery.
This is triggered by EEH events at a VF level only, not at the adapter
level. The error recovery fails if a data packet is received by
the VF while the EEH recovery is in progress. A VF that has
failed can be recovered by a partition reboot or a DLPAR remove and add
of the VF.
- On systems with PowerVM firmware, a problem was fixed where
the Floating Point Unit Computational Test, which should be set to
"staggered" by default, has been changed in some circumstances to be
disabled. If you wish to re-enable this option, this fix is
required. After applying this service pack, do the
following steps:
1) Sign into the Advanced System Management Interface (ASMI).
2) Select Floating Point Unit Computational Test under the System
Configuration heading and change it from disabled to what is needed:
staggered (run once per core each day) or periodic (a specified time).
3) Click "Save Settings".
- On systems with PowerVM firmware, the following problems
were fixed for certain SR-IOV adapters:
1) An error was fixed that occurs during a VNIC failover where the VNIC
backing device has a physical port down or read port errors with an SRC
B400FF02 logged.
2) A problem was fixed for adding a new logical port with a PVID
assigned, which causes traffic on that VLAN to be dropped by other
interfaces on the same physical port that use OS VLAN tagging for
that same VLAN ID. Each time a logical port with a non-zero PVID
that matches an existing VLAN is dynamically added to a partition,
or is activated as part of a partition activation, traffic flow
stops for other partitions with OS-configured VLAN devices with
the same VLAN ID. This problem can
be recovered by configuring an IP address on the logical port with the
non-zero PVID and initiating traffic flow on this logical port.
This problem can be avoided by not configuring logical ports with a
PVID if other logical ports on the same physical port are configured
with OS VLAN devices.
This fix updates the adapter firmware to 11.4.415.37 for the following
Feature Codes and CCINs: #EN15/#EN16 with CCIN 2CE3, #EN17/#EN18 with
CCIN 2CE4, #EN0H/#EN0J with CCIN 2B93, #EN0M/#EN0N with CCIN 2CC0,
#EN0K/#EN0L with CCIN 2CC1, #EL56/#EL38 with CCIN 2B93, and #EL57/#EL3C
with CCIN 2CC1.
Update instructions: https://www.ibm.com/support/knowledgecenter/HW4M4/p8efd/p8efd_updating_sriov_firmware.htm.
- On systems with PowerVM firmware, a problem was fixed for
some serviceable events specific to the reporting of EEH errors not
being displayed on the HMC. The sending of an associated call
home event, however, was not affected. This problem is
intermittent and infrequent.
- A problem was fixed for newer hardware record names
(hardware delivered after the original POWER8 GA) not being displayed
correctly in the ASMI deconfiguration records. For example, Capp
is displayed as "Unknown".
- A problem was fixed for Over Temperature (OT) errors being
reported for the processor with SRC B1112A10. In certain workload
environments, additional cooling is needed for the processors and this
can be provided by a user option to increase the floor speed for the
fans. This fix is activated using the ASMI command line to
install an alternate power management definition file to increase the
fan speeds. This change will persist until a factory reset of the
system. Please contact IBM Support for information on the command
to use to increase the fan speeds.
This problem only pertains to the S822 (8284-22A), S822L (8247-22L), and
S822L (5148-22L) models.
- On systems with PowerVM firmware, a problem was fixed for a
system termination with SRC B700F107 following a time facility
processor failure with SRC B700F10B. With the fix, the
transparent replacement of the failed processor will occur for the
B700F10B if there is a free core, with no impact to the system.
- On systems with PowerVM firmware, a problem was fixed for
possible partition errors following a concurrent firmware update from
FW810 or later. A precondition for this problem is that DLPAR
operations of either physical or virtual I/O devices must have occurred
prior to the firmware update. The error can take the form of a
partition crash at some point following the update. The frequency of
this problem is low. If the problem occurs, the OS will likely
report a DSI (Data Storage Interrupt) error. For example, AIX
produces a DSI_PROC log entry. If the partition does not crash,
it is also possible that some subsequent I/O DLPAR operations will fail.
- A problem was fixed for spurious out-of-range (greater than
127 C) temperatures being reported for the processor with SRC
B1112A10. With the fix, only valid temperature sensor readings
are used when reporting processors that have exceeded the Over
Temperature (OT) value.
- A problem was fixed in ASMI for setting a static route with
a network address for the IP such as "xxx.xxx.xxx.0". Without the
fix, ASMI always fails with "Invalid entry. IP address" for this
network address format. As a workaround, the static route could
be created with the individual IP endpoint entered instead of the
network address, or created using the ASMI command line and the "route
add" command.
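The network-address form mentioned in the static route fix above (an IP such as "xxx.xxx.xxx.0") can be recognized programmatically before a route is entered. The following is a minimal sketch, assuming a /24 prefix by default. The function name is_network_address and the sample addresses are hypothetical and are not part of ASMI.

```python
import ipaddress


def is_network_address(address: str, prefix_len: int = 24) -> bool:
    """Return True if `address` is the network address of its subnet.

    The /24 default is an assumption made for this illustration only.
    """
    # strict=False lets ip_network() accept an address with host bits set
    # and derive the enclosing network from it.
    network = ipaddress.ip_network(f"{address}/{prefix_len}", strict=False)
    return ipaddress.ip_address(address) == network.network_address


# Sample addresses from the documentation range (192.0.2.0/24).
print(is_network_address("192.0.2.0"))
print(is_network_address("192.0.2.17"))
```

On a system without the fix, an address for which this check returns True is the kind of entry that the workaround replaces with an individual endpoint address.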
System firmware changes that affect certain systems
- On systems with an IBM i partition, a problem was fixed for
physical I/O property data not being able to be collected for an
inactive partition booted in "IOR" mode with SRC B200A101
logged. This can happen when making a system plan (sysplan)
for an IBM i partition using the HMC and the IBM i partition is
inactive. The sysplan data collection for the active IBM i
partitions is successful.
- On systems with only Integrated Facility for Linux (IFL)
processors and AIX or IBM i partitions, a problem was fixed for
performance issues for IFL VMs (Linux and VIOS). This problem
occurs if AIX or IBM i partitions are active on a system with IFL-only
cores. As a workaround, AIX or IBM i partitions should not be
activated on an IFL-only system. With the fix, the activation of
AIX and IBM i partitions is blocked on an IFL-only system. If
this fix is installed concurrently with AIX or IBM i partitions
running, these partitions will be allowed to continue to run until they
are powered off. Once powered off, the AIX and IBM i partitions
will not be allowed to be activated again on the IFL-only system.
This problem pertains to only the E850 (8408-E8E) and E850C (8408-44E)
models.
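The activation rule described in the IFL fix above can be summarized as a small decision function. This is only a sketch of the stated policy, not the hypervisor's implementation. The names may_activate and BLOCKED_ON_IFL_ONLY and the OS type strings are hypothetical.

```python
# OS types whose activation is blocked on an IFL-only system, per the fix text.
# The string labels are assumptions chosen for this illustration.
BLOCKED_ON_IFL_ONLY = {"AIX", "IBM i"}


def may_activate(os_type: str, ifl_only_system: bool) -> bool:
    """Return True if a powered-off partition may be activated.

    Partitions that are already running when the fix is installed
    concurrently are not covered here: they keep running until they are
    powered off, and only then does this rule apply to them.
    """
    return not (ifl_only_system and os_type in BLOCKED_ON_IFL_ONLY)


for os_type in ("Linux", "VIOS", "AIX", "IBM i"):
    print(os_type, may_activate(os_type, ifl_only_system=True))
```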
|
SV860_226_165 / FW860.90
12/09/20 |
Impact: Data
Severity: HIPER
New features and functions
- On systems with PowerVM firmware, periodic logging was enabled
of internal component operational data for the PCIe3 expansion drawer
paths. The logging of this data does not impact the normal use of
the system.
System firmware changes that affect all systems
- HIPER/Pervasive:
On systems with PowerVM firmware, a problem was fixed for certain
SR-IOV adapters for a condition that may result from frequent resets of
adapter Virtual Functions (VFs), or transmission stalls and could lead
to potential undetected data corruption.
The following additional fixes are also included:
1) The VNIC backing device goes to a powered off state during a VNIC
failover or Live Partition Mobility (LPM) migration. This failure
is intermittent and very infrequent.
2) Adapter time-outs with SRC B400FF01 or B400FF02 logged.
3) Adapter time-outs related to adapter commands becoming blocked with
SRC B400FF01 or B400FF02 logged.
4) VF function resets occasionally not completing quickly enough
resulting in SRC B400FF02 logged.
This fix updates the adapter firmware to 11.4.415.33 for the following
Feature Codes and CCINs: #EN15/#EN16 with CCIN 2CE3, #EN17/#EN18 with
CCIN 2CE4, #EN0H/#EN0J with CCIN 2B93, #EN0M/#EN0N with CCIN 2CC0,
#EN0K/#EN0L with CCIN 2CC1, #EL56/#EL38 with CCIN 2B93, and #EL57/#EL3C
with CCIN 2CC1.
The SR-IOV adapter firmware level update for the shared-mode adapters
happens under user control to prevent unexpected temporary outages on
the adapters. A system reboot will update all SR-IOV shared-mode
adapters with the new firmware level. In addition, when an
adapter is first set to SR-IOV shared mode, the adapter firmware is
updated to the latest level available with the system firmware (and it
is also updated automatically during maintenance operations, such as
when the adapter is stopped or replaced). And lastly, selective
manual updates of the SR-IOV adapters can be performed using the
Hardware Management Console (HMC). To selectively update the
adapter firmware, follow the steps given at the IBM Knowledge Center
for using HMC to make the updates: https://www.ibm.com/support/knowledgecenter/HW4M4/p8efd/p8efd_updating_sriov_firmware.htm.
Note: Adapters that are capable of running in SR-IOV mode, but are
currently running in dedicated mode and assigned to a partition, can be
updated concurrently either by the OS that owns the adapter or the
managing HMC (if OS is AIX or VIOS and RMC is running).
- A problem was fixed for the service processor ASMI "Factory
Reset" option to disable the IPMI service as part of the factory
reset. Without the fix, the IPMI operation state will be
unchanged by the factory reset.
- A rare problem was fixed for a checkstop during an IPL that
fails to isolate and guard the problem core. An SRC is logged
with B1xxE5xx and an extended hex word 8 xxxxDD90. With the fix,
the suspected failing hardware is guarded.
- A problem was fixed for the REST/Redfish interface to
change the success return code for object creation from "200" to
"201". The "200" status code means that the request was received
and understood and is being processed. A "201" status code
indicates that a request was successful and, as a result, a resource
has been created. The Redfish Ruby Client, "redfish_client" may
fail a transaction if a "200" status code is returned when "201" is
expected.
- On systems with PowerVM firmware, a problem was fixed to
allow quicker recovery of PCIe links for the #EMX0 PCIe expansion
drawer for a run-time fault with B7006A22 logged. The time for
recovery attempts can exceed six minutes on rare occasions which may
cause I/O adapter failures and failed nodes. With the fix, the
PCIe links will recover or fail faster (in the order of seconds) so
that redundancy in a cluster configuration can be used with failure
detection and failover processing by other hosts, if available, in the
case where the PCIe links fail to recover.
- On systems with PowerVM firmware, a problem was fixed for a
concurrent maintenance "Repair and Verify" (R&V) operation for a
#EMX0 fanout module that fails with an "Unable to isolate the resource"
error message. This should occur only infrequently for cases
where a physical hardware failure has occurred which prevents access to
slot power controls. This problem can be worked around by
bringing up the "PCIe Hardware Topology" screen from either ASMI or the
HMC after the hardware failure but before the concurrent repair is
attempted. This will avoid the problem with the PCIe slot
isolation. These steps can also be used to recover from the
error to allow the R&V repair to be attempted again.
- On systems with PowerVM firmware, a problem was fixed for a
B7006A96 fanout module FPGA corruption error that can occur in
unsupported PCIe3 expansion drawer (#EMX0) configurations that mix an
enhanced PCIe3 fanout module (#EMXH) in the same drawer with legacy
PCIe3 fanout modules (#EMXF, #EMXG, #ELMF, or #ELMG). This causes
the FPGA on the enhanced #EMXH to be updated with the legacy firmware
and it becomes a non-working and unusable fanout module. With the
fix, the unsupported #EMX0 configurations are detected and handled
gracefully without harm to the FPGA on the enhanced fanout modules.
- On systems with PowerVM firmware, a problem was fixed for
possible dispatching delays for partitions running in POWER8 processor
compatibility mode.
- On systems with PowerVM firmware, a problem was fixed for
system memory not returned after create and delete of partitions,
resulting in slightly less memory available after configuration changes
in the systems. With the fix, an IPL of the system will recover
any of the memory that was orphaned by the issue.
- On systems with PowerVM firmware, a problem was fixed for
utilization statistics for commands such as HMC lslparutil and
third-party lpar2rrd that do not accurately represent CPU
utilization. The values are incorrect every time for a partition
that is migrated with Live Partition Mobility (LPM). Power Enterprise
Pools 2.0 is not affected by this problem. If this problem has
occurred, here are three possible recovery options:
1) Re-IPL the target system of the migration.
2) Or delete and recreate the partition on the target system.
3) Or perform an inactive migration of the partition. The cycle
values get zeroed in this case.
- On systems with PowerVM firmware, a problem was fixed for a
PCIe3 expansion drawer cable that has hidden error logs for a single
lane failure. This happens whenever a single lane error
occurs. Subsequent lane failures are not hidden and have visible
error logs. Without the fix, the hidden or informational logs
would need to be examined to gather more information for the failing
hardware.
- On systems with PowerVM firmware, a problem was fixed for a
DLPAR remove of memory from a partition that fails if the partition
contains 65535 or more LMBs. With 16MB LMBs, this error threshold
is 1 TB of memory. With 256 MB LMBs, it is 16 TB of memory.
A reboot of the partition after the DLPAR will remove the memory from
the partition.
- On systems with PowerVM firmware, a problem was fixed for
extraneous B400FF01 and B400FF02 SRCs logged when moving cables on
SR-IOV adapters. This is an infrequent error that can occur if
the HMC performance monitor is running at the same time the cables are
moved. These SRCs can be ignored when accompanied by cable
movement.
- On systems with PowerVM firmware, a problem was fixed for
B400FF02 errors for certain SR-IOV adapters during adapter
initialization or error recovery. This is a rare error that can
occur because of a race condition in the firmware.
This fix pertains to adapters with the following Feature Codes and
CCINs: #EN15/#EN16 with CCIN 2CE3, #EN17/#EN18 with CCIN 2CE4,
#EN0H/#EN0J with CCIN 2B93, #EN0M/#EN0N with CCIN 2CC0, #EN0K/#EN0L
with CCIN 2CC1, #EL56/#EL38 with CCIN 2B93, and #EL57/#EL3C with CCIN
2CC1.
- On systems with OPAL firmware, a problem was fixed for a
reset/reload of the service processor initiated by ipmitool inband
usage on the host (such as "mc reset cold") causing all subsequent
inband IPMI messages to be blocked.
- On systems with OPAL firmware, a problem was fixed for host
hangs that can occur when doing error recovery.
- On systems with OPAL firmware, a problem was fixed for I2C
transactions to the On-Chip Controller (OCC) causing a host hang.
- On systems with PowerVM firmware, a problem was fixed for
not logging SRCs for certain cable pulls from the #EMX0 PCIe expansion
drawer. With the fix, the previously undetected cable pulls are
now detected and logged with SRC B7006A8B and B7006A88 errors.
- On systems with PowerVM firmware, a problem was fixed for a
rare system hang that can occur when a page of memory is being
migrated. Page migration (memory relocation) can occur for a
variety of reasons, including predictive memory failure, DLPAR of
memory, and normal operations related to managing the page pool
resources.
- On systems with PowerVM firmware, a problem was fixed for
running PCM on a system with SR-IOV adapters in shared mode that
results in an "Incomplete" system state with certain hypervisor tasks
deadlocked. This problem is rare and is triggered when using
SR-IOV adapters in shared mode and gathering performance statistics
with PCM (Performance Collection and Monitoring) and also having a low
level error on an adapter. The only way to recover from this
condition is to re-IPL the system.
- On systems with PowerVM firmware, a problem was fixed so that
an SRC B7006A99 informational log is now posted as a Predictive log
with a call out of the CXP cable FRU. This fix improves FRU isolation
for cases where a CXP cable alert causes a B7006A99 that occurs prior
to a B7006A22 or B7006A8B. Without the fix, the SRC B7006A99 is
informational and the latter SRCs cause a larger hardware replacement
even though the earlier event identified a probable cause for the cable
FRU.
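The memory thresholds quoted in the DLPAR memory-remove fix above (65535 or more LMBs) follow from simple arithmetic. The sketch below reproduces them, assuming binary units (1 TB = 1024 * 1024 MB). The function name threshold_tb is hypothetical.

```python
MB_PER_TB = 1024 * 1024        # binary units are an assumption of this sketch
LMB_COUNT_THRESHOLD = 65535    # LMB count quoted in the fix description


def threshold_tb(lmb_size_mb: int) -> float:
    """Partition memory, in TB, at which the LMB count reaches the threshold."""
    return LMB_COUNT_THRESHOLD * lmb_size_mb / MB_PER_TB


# The two LMB sizes named in the fix description.
for lmb_size_mb in (16, 256):
    print(f"{lmb_size_mb} MB LMBs: about {threshold_tb(lmb_size_mb):.2f} TB")
```

Both results fall one LMB short of a round figure, which is why the fix description rounds them to 1 TB and 16 TB.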
|
SV860_215_165 / FW860.81
03/04/20 |
Impact: Security
Severity: HIPER
Power System S812L (8247-21L), Power System S822L (8247-22L), Power
System S824L (8247-42L), Power System S812 (8284-21A), Power System
S822 (8284-22A), Power System S814 (8286-41A), Power System S824
(8286-42A); Power System E850 (8408-E8E); Power System E850C
(8408-44E); Power System S812L (5148-21L) and Power System S822L
(5148-22L) servers only.
System firmware changes that affect all systems
- HIPER/Pervasive:
A problem was fixed for an HMC "Incomplete" state for a system after
the HMC user password is changed with ASMI on the service
processor. This problem can occur if the HMC password is changed
on the service processor but not also on the HMC, and a reset of the
service processor happens. With the fix, the HMC will get the
needed "failed authentication" error so that the user knows to update
the old password on the HMC.
|
SV860_212_165 / FW860.80
12/17/19 |
Impact: Security
Severity: SPE
Power System S812L (8247-21L), Power System S822L (8247-22L), Power
System S824L (8247-42L), Power System S812 (8284-21A), Power System
S822 (8284-22A), Power System S814 (8286-41A), Power System S824
(8286-42A); Power System E850 (8408-E8E); Power System E850C
(8408-44E); Power System S812L (5148-21L) and Power System S822L
(5148-22L) servers only.
New features and functions
- Support was added for improved security for the service processor
password policy. For the service processor, the "admin", "hmc", and
"general" passwords must be set
on first use for newly manufactured systems and after a factory reset
of the system. The IPMI interface has been changed to
be disabled by default in these scenarios. The REST/Redfish
interface will return an error saying the user account is
expired. This policy change helps ensure that the service
processor is not left in a state with a well-known password. The
user can change from an expired default password to a new password
using the Advanced System Management Interface (ASMI).
- Support was added for real-time data capture for PCIe3 expansion
drawer (#EMX0) cable card connection data via the resource dump
selector on the HMC or in ASMI on the service processor. The resource
selector string "xmfr -dumpccdata" non-disruptively generates an
RSCDUMP type of dump
file that has the current cable card data, including data from cables
and the retimers.
System firmware changes that affect all systems
- A problem was fixed for an intermittent IPMI core
dump on the service processor. This occurs only rarely when
multiple IPMI sessions are starting and cleaning up at the same
time. A new IPMI session can fail initialization when one of its
session objects is cleaned up. The circumvention is to retry the
IPMI command that failed.
- On systems using PowerVM firmware, a
problem was fixed for SR-IOV adapters to provide a consistent
Informational message level for cable plugging issues. For
transceivers not plugged on certain SR-IOV adapters, an unrecoverable
error (UE) SRC B400FF03 was changed to an Informational message
logged. This affects the SR-IOV adapters with the following
feature codes and CCINs: #EC2R/EC2S with CCIN 58FA; #EC2T/EC2U
with CCIN 58FB; and #EC3L/EC3M with CCIN 2CEC.
For copper cables unplugged on certain SR-IOV adapters, a missing
message was replaced with an Informational message logged. This
affects the SR-IOV adapters with the following feature codes and
CCINs: #EN17/EN18 with CCIN 2CE4; #EN0K/EN0L with CCIN 2CC1; and
#EL57/EL3C with CCIN 2CC1.
- On systems with PowerVM firmware, the
following problem related to SR-IOV was fixed: If the SR-IOV
logical port's VLAN ID (PVID) is modified while the logical port is
configured, the adapter will use an incorrect PVID for the Virtual
Function (VF). This problem is rare because most users do not
change the PVID once the logical port is configured, so they will not
have the problem.
This fix updates adapter firmware to 10.2.252.1940 for the
following Feature Codes and CCINs: #EN15/EN16 with CCIN 2CE3;
#EN17/EN18 with CCIN 2CE4; #EN0H/EN0J with CCIN 2B93; #EN0M/EN0N with
CCIN 2CC0; #EN0K/EN0L with CCIN 2CC1; #EL56/EL38 with CCIN 2B93; and
#EL57/EL3C with CCIN 2CC1.
The SR-IOV adapter firmware level update for the shared-mode adapters
happens under user control to prevent unexpected temporary outages on
the adapters. A system reboot will update all SR-IOV shared-mode
adapters with the new firmware level. In addition, when an
adapter is first set to SR-IOV shared mode, the adapter firmware is
updated to the latest level available with the system firmware (and it
is also updated automatically during maintenance operations, such as
when the adapter is stopped or replaced). And lastly, selective
manual updates of the SR-IOV adapters can be performed using the
Hardware Management Console (HMC). To selectively update the
adapter firmware, follow the steps given at the IBM Knowledge Center
for using HMC to make the updates: https://www.ibm.com/support/knowledgecenter/HW4M4/p8efd/p8efd_updating_sriov_firmware.htm.
Note: Adapters that are capable of running in SR-IOV mode, but are
currently running in dedicated mode and assigned to a partition, can be
updated concurrently either by the OS that owns the adapter or the
managing HMC (if OS is AIX or VIOS and RMC is running).
- A problem was fixed for unknowingly
running at lower (the default) frequencies when changing into Fixed Max
Frequency (FMF) mode. This problem is unlikely to happen
because it requires that the system already be in FMF mode when
the user requests a change into FMF mode. This request is not
handled correctly as the tunable parameters get reset to default which
allows the processor frequency to be reduced to the minimum
value. The recovery for this problem is to change the power mode
to "Nominal" and then change it to FMF.
- A problem was fixed for
Novalink failing to activate partitions that have names with character
lengths near the maximum allowed character length. This problem
can be circumvented by changing the partition name to have 32
characters or less.
- A problem was fixed where a Linux or AIX partition type was
incorrectly reported as unknown. Symptoms include: IBM Cloud
Management Console (CMC) not being able to determine the RPA partition
type (Linux/AIX) for partitions that are not active; and HMC attempts
to dynamically add CPU to Linux partitions may fail with a HSCL1528
error message stating that there are not enough Integrated Facility for
Linux (IFL) cores for the operation.
- A problem was fixed for
a possible system crash with SRC B7000103 if the HMC session is closed
while the performance monitor is active. As a circumvention for
this problem, make sure the performance monitor is turned off before
closing the HMC sessions.
- A problem was fixed for a Live
Partition Mobility (LPM) migration of a large memory partition to a
target system that causes the target system to crash and for the HMC to
go to the "Incomplete" state. For servers with the default LMB
size (256MB), if a partition is >=16TB and if desired memory is
different than the maximum memory, LPM may fail on the target
system. Servers with LMB sizes less than the default could hit
this problem with smaller memory partition sizes. A circumvention
to the problem is to set the desired and maximum memory to the same
value for the large memory partition that is to be migrated.
- A problem was fixed for
system hangs or incomplete states displayed by HMC(s) caused by a loop
in the handling of Segment Lookaside Buffer (SLB) cache memory parity
errors where SRC B7005442 may be logged. This problem has a low
frequency of occurrence as it requires severe errors in the SLB cache
that are not cleared by an error flush of the entries. A re-IPL
of the system can be used to recover from this error.
System firmware changes that affect certain systems
- On systems with an
IBM i partition, a problem was fixed
for a D-mode IPL failure when using a USB DVD drive in an IBM 7226
multimedia storage enclosure. Error logs with SRC BA16010E,
B2003110, and/or B200308C can occur. As a circumvention, an
external DVD drive can be used for the D-mode IPL.
- On systems with IBM i partitions, a
rare problem was fixed for an intermittent failure of a DLPAR remove of
an adapter. In most cases, a retry of the operation will be
successful.
|
SV860_205_165 / FW860.70
06/18/19 |
Impact: Availability
Severity: HIPER
Power System S812L (8247-21L), Power System S822L (8247-22L), Power
System S824L (8247-42L), Power System S812 (8284-21A), Power System
S822 (8284-22A), Power System S814 (8286-41A), Power System S824
(8286-42A); Power System E850 (8408-E8E); Power System E850C
(8408-44E); Power System S812L (5148-21L) and Power System S822L
(5148-22L) servers only.
System firmware changes that affect all systems
- HIPER/Pervasive: On systems with PowerVM firmware, the following
problems related to SR-IOV were fixed:
1) A problem was fixed for new or replacement SR-IOV adapters with
feature codes EN15 and EN17 being rendered non-functional when moved to
SR-IOV mode. This includes cards moved from dedicated device mode,
newly installed adapters, and FRU replacements. This problem occurs
when the adapter firmware is updated to the 10.2.252.x levels from 11.x
adapter firmware levels.
2) A problem was fixed for certain SR-IOV adapters where SRC B400FF01
errors are seen during vNIC failovers and Live Partition Mobility (LPM)
migration of vNIC clients. This may also result in errors seen in
partitions (for example, some partitions may show LNC2ENT_TX_ERR).
3) A problem was fixed where network multicast traffic is not received
by an SR-IOV logical port (VF) network interface for a Linux partition.
The failure can occur when the partition transitions the network
interface out of promiscuous or multicast promiscuous mode.
These fixes update adapter firmware to 10.2.252.1939 for the
following Feature Codes: EN15, EN17, EN0H, EN0J, EN0M,
EN0N, EN0K, EN0L, EL38, EL3C, EL56, and EL57.
The SR-IOV adapter firmware level update for the shared-mode adapters
happens under user control to prevent unexpected temporary outages on
the adapters. A system reboot will update all SR-IOV shared-mode
adapters with the new firmware level. In addition, when an
adapter is first set to SR-IOV shared mode, the adapter firmware is
updated to the latest level available with the system firmware (and it
is also updated automatically during maintenance operations, such as
when the adapter is stopped or replaced). And lastly, selective
manual updates of the SR-IOV adapters can be performed using the
Hardware Management Console (HMC). To selectively update the
adapter firmware, follow the steps given at the IBM Knowledge Center
for using HMC to make the updates: https://www.ibm.com/support/knowledgecenter/HW4M4/p8efd/p8efd_updating_sriov_firmware.htm.
Note: Adapters that are capable of running in SR-IOV mode, but are
currently running in dedicated mode and assigned to a partition, can be
updated concurrently either by the OS that owns the adapter or the
managing HMC (if OS is AIX or VIOS and RMC is running).
- DEFERRED:
PARTITION_DEFERRED: On systems with PowerVM firmware, a
problem was fixed for repeated CPU
DLPAR remove operations by Linux (Ubuntu, SUSE, or RHEL) OSes possibly
resulting in a partition crash. No specific SRCs or error
logs are reported. The problem can occur on any DLPAR CPU
remove operation if running on Linux. The occurrence is
intermittent and rare. The partition crash may result in one or
more of the following console messages (in no particular order):
1) Bad kernel stack pointer addr1 at addr2
2) Oops: Bad kernel stack pointer
3) ******* RTAS CALL BUFFER CORRUPTION *******
4) ERROR: Token not supported
This fix does not activate until there is a reboot of the partition.
- A problem was fixed for a PCIe Hub checkstop with SRC
B138E504 logged that fails to guard the errant processor chip.
With the fix, the problem hardware FRU is guarded so there is not a
recurrence of the error on the next IPL.
- A problem was fixed for an incorrect SRC of B1810000 being
logged when a firmware update fails because of Entitlement Key
expiration. The error displayed on the HMC and in the OS is
correct and meaningful. With the fix, for this firmware update
failure the correct SRC of B181309D is now logged.
- A problem was fixed for informational logs flooding the error log
if a "Get Sensor Reading" is not working.
- A problem was fixed for a Redfish (REST) Patch
request for PowerSaveMode with an unsupported mode value returning an
error code "500" instead of the correct error code of "400".
- On systems with PowerVM firmware, a problem was
fixed for a rare Live Partition Mobility migration hang with the
partition left in VPM (Virtual Page Mode) which causes performance
concerns. This error is triggered by a migration failover
operation occurring during the migration state of "Suspended" when
there are insufficient VASI buffers available to clear all partition
state data waiting to be sent to the migration target. Migration
failovers are rare and the migration state of "Suspended" is a
migration state lasting only a few seconds for most partitions, so this
problem should not be frequent. On the HMC, there will be an
inability to complete either a migration stop or a recovery
operation. The HMC will show the partition as migrating and any
attempt to change that will fail. The system must be re-IPLed to
recover from the problem.
- A problem was fixed for an IPMI core dump and SRC B1818601
logged intermittently when an IPMI session is closed. A flood of
B1818A03 SRCs may be logged after the error occurs. The IPMI
server is not impacted and a call home is reported for the
problem. There is no service outage for the IPMI users because of
this.
- A problem was fixed for IPMI sessions in the service
processor causing a flood of B181A803 informational error logs on
registry read fails for IPv6 and IPv4 keywords. These error logs
do not represent a real problem and may be ignored.
- On systems with the PowerVM firmware, a problem was
fixed for shared processor partitions going unresponsive after changing
the processor sharing mode of a dedicated processor partition
from "allow when partition is active" to either "allow when partition
is inactive" or "never". This problem can be circumvented by
avoiding disabling processor sharing when active on a dedicated
processor partition. To recover if the issue has been
encountered, enable "processor sharing when active" on the dedicated
partition.
- On systems with PowerVM firmware, a problem was fixed for
an error in deleting a partition with the virtualized Trusted Platform
Module (vTPM) enabled and SRC B7000602 logged. When this error
occurs, the encryption process in the hypervisor may become
unusable. The problem can be recovered from with a re-IPL of the
system.
- On systems with PowerVM firmware, a problem was fixed in
Live Partition Mobility (LPM) of a partition to a shared processor
pool, which results in the partition being unable to consume uncapped
cycles on the target system. To prevent the issue from occurring,
partitions can be migrated to the default shared processor pool and
then dynamically moved to the desired shared processor pool. To
recover from the issue, do one of the following:
1) Use DLPAR to add or remove a virtual processor to/from the
affected partition.
2) Dynamically move the partition between shared processor pools.
3) Reboot the partition.
4) Re-IPL the system.
- On systems with PowerVM firmware, a problem was fixed
for a boot failure using an N_Port ID Virtualization (NPIV) LUN for an
operating system that is installed on a disk of 2 TB or greater, and
having a device driver for the disk that adheres to a non-zero
allocation length requirement for the "READ CAPACITY 16" command. The IBM
partition firmware had always used an invalid zero allocation length
for the return of data and that had been accepted by previous device
drivers. Now some of the newer device drivers are adhering to the
specification and needing an allocation length of non-zero to allow the
boot to proceed.
- On systems with PowerVM firmware, a problem was fixed for
failing to boot from an AIX mksysb backup on a USB RDX drive with SRCs
logged of BA210012, AA06000D, and BA090010. The problem trigger
is a boot attempt from the RDX device. The boot error does not occur if
a serial console is used to navigate the SMS menus.
- On systems with PowerVM firmware, a problem was fixed
for a system IPLing with an invalid time set on the service processor
that causes partitions to be reset to the Epoch date of
01/01/1970. With the fix, on the IPL, the hypervisor logs a
B700120x when the service processor real time clock is found to be
invalid and halts the IPL to allow the time and date to be corrected by
the user. The Advanced System Management Interface (ASMI) can be
used to correct the time and date on the service processor. On
the next IPL, if the time and date have not been corrected, the
hypervisor will log an SRC B7001224 (indicating the user was warned on
the last IPL) and allow the partitions to start, with the time and date
set to the Epoch value.
- A security problem was fixed in the service processor
Network Security Services (NSS) services which, with a
man-in-the-middle attack, could provide false completion or errant
network transactions or exposure of sensitive data from intercepted SSL
connections to ASMI, Redfish, or the service processor message
server. The Common Vulnerabilities and Exposures issue number is
CVE-2018-12384.
- On systems with PowerVM firmware, a problem was fixed for
a hypervisor task getting deadlocked if partitions are powered on at the
same time that SR-IOV is being configured for an adapter. With
this problem, workloads will continue to run but it will not be
possible to change the virtualization configuration or power partitions
on and off. This error can be recovered by doing a re-IPL of the
system.
- On systems with PowerVM firmware, a problem was fixed
for hypervisor tasks getting deadlocked that cause the hypervisor to be
unresponsive to the HMC (this shows as an incomplete state on the HMC)
with SRC B200F011 logged. This is a rare timing error. With
this problem, OS workloads will continue to run but it will not
be possible for the HMC to interact with the partitions. This
error can be recovered by doing a re-IPL of the system with a scheduled
outage.
- A problem was fixed for false indication of a real time
clock (RTC) battery failure with SRC B15A3305 logged. This error
happens infrequently. If the error occurs, and another battery
failure SRC is not logged within 24 hours, ignore the error as it was
caused by a timing issue in the battery test.
- A problem was fixed for an IPMI core dump and SRC B181720D
logged, causing the service processor to reset due to a low memory
condition. The memory loss is triggered by frequently using the
ipmitool to read the network configuration. The service processor
recovers from this error, but if three of these errors occur within a
15-minute time span, the service processor will go to a failed hung state
with SRC B1817212 logged. Should a service processor hang occur,
OS workloads will continue to run but it will not be possible for the
HMC to interact with the partitions. This service processor hung
state can be recovered by doing a re-IPL of the system with a scheduled
outage.
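For the NPIV boot fix listed above, the field at issue is the allocation
length in the READ CAPACITY (16) command descriptor block (CDB). The
following Python sketch is illustrative only and is not IBM partition
firmware code. It builds the CDB with a non-zero allocation length. The
function name and the decision to reject a zero length are assumptions of
this sketch. The 16-byte layout (operation code 9Eh, service action 10h,
allocation length in bytes 10-13) follows the SCSI Block Commands
definition of the command.

```python
import struct

# SERVICE ACTION IN (16) operation code and the READ CAPACITY (16) service action.
READ_CAPACITY_16_OPCODE = 0x9E
READ_CAPACITY_16_SERVICE_ACTION = 0x10
# The READ CAPACITY (16) parameter data is 32 bytes long.
READ_CAPACITY_16_DATA_LEN = 32


def build_read_capacity_16_cdb(allocation_length=READ_CAPACITY_16_DATA_LEN):
    """Return a 16-byte READ CAPACITY (16) CDB (hypothetical helper).

    Zero is rejected here to mirror the non-zero allocation length
    requirement described in the fix text.
    """
    if allocation_length <= 0:
        raise ValueError("allocation length must be non-zero")
    return struct.pack(
        ">BBQIBB",
        READ_CAPACITY_16_OPCODE,          # byte 0: operation code
        READ_CAPACITY_16_SERVICE_ACTION,  # byte 1: service action
        0,                                # bytes 2-9: logical block address
        allocation_length,                # bytes 10-13: allocation length
        0,                                # byte 14: reserved / PMI
        0,                                # byte 15: control
    )


cdb = build_read_capacity_16_cdb()
print(cdb.hex())
```

A caller would pass the resulting bytes to whatever SCSI pass-through
interface the platform provides; that step is outside this sketch.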
System firmware changes that affect certain systems
- DEFERRED: On
systems with a PCIe3 I/O expansion drawer (#EMX0), a problem was fixed
for the PCIe3 I/O expansion drawer links to improve
stability. Intermittent training failures on the links
occurred during the IPL with SRC B7006A8B logged. With the fix,
the link settings were changed to lower the peak link signal
amplification to bring the signal level into the middle of the
operating range, thus improving the high margin to reduce link training
failures. The system must be re-IPLed for the fix to activate.
- On a system with an IBM i partition, a problem was fixed
for a DLPAR force-remove of a physical IO adapter from an IBM i
partition and a simultaneous power off of the partition causing the
partition to hang during the power off. To recover the partition
from the error, the system must be re-IPLed. This problem is rare
because there is only a 2-second timing window for the DLPAR and power
off to interfere with each other.
- On a system with an active IBM i partition, a problem was
fixed for an SPCN firmware download to the PCIe3 I/O expansion drawer
(feature #EMX0) Chassis Management Card (CMC) that could possibly get
stuck in a pending state. This failure is very unlikely as it
would require a concurrent replacement of the CMC card that is loaded
with an SPCN level that is older than 2015 (01MEX151012a). The
failure with the SPCN download can be corrected by a re-IPL of the
system.
- On a system with an AMS (Active Memory Sharing) partition,
a problem was fixed for a Live Partition Mobility (LPM) migration
failure when migrating from P9 to a pre-FW860 P8 or P7 system.
This failure can occur if the P9 partition is in dedicated memory mode,
and the Physical Page Table (PPT) ratio is explicitly set on the HMC
(rather than keeping the default value) and the partition is then
transitioned to AMS mode prior to the migration to the older
system. This problem can be avoided by using dedicated memory in
the partition being migrated back to the older system.
- On systems with PowerVM firmware and a vNIC configuration
with multiple backing Virtual Functions (VFs), a problem was fixed for
a backing VF failure after a sequence of repeated failovers where one
of the VF backing devices goes to a powered off state. This
problem is infrequent and only occurs after many vNIC failovers.
A reboot of the partition with the affected VF will recover it.
- On systems with PCIe3 expansion drawers (feature code
#EMX0), a problem was fixed for a UE B700BA01 logged after a FRU
was replaced in the PCIe Expansion drawer. The log should have
been informational instead of unrecoverable because it is normal to
have this log for a replaced part in the expansion drawer that has a
different serial number from the old part. If a part in the
expansion drawer has been replaced, the UE error log can be ignored.
- On systems with IBM i partitions, a problem was fixed
for Live Partition Mobility (LPM) migrations that could have incorrect
hardware resource information (related to VPD) in the target partition
if a failover had occurred for the source partition during the
migration. This failover would have to occur during the Suspended
state of the migration, which only lasts about a second, so this should
be rare. With the fix, at a minimum the migration error will be
detected to abort the migration so it can be restarted. And at a
later IBM i OS level, the fix will allow the migration to complete even
though the failover has occurred during the Suspended state of the
migration.
- On systems with PCIe3 expansion drawers (feature #EMX0), a
problem was fixed for PCI link recovery failure during a PCI Host
Bridge (PHB) reset with SRCs of B7006A80, B7006A22, B7006A8B, and
B7006970 logged. This causes the cable card to fail, losing all
slots in the expansion drawer. This is a rare problem. If
this error occurs, a concurrent maintenance operation could reboot the
expansion drawer or a re-IPL of the system could be done to recover the
drawer.
- On systems with an IBM i partition with greater than 9999
GB installed, a problem was fixed for On/Off COD memory-related
amounts not being displayed correctly. This only happens when
retrieving the On/Off COD numbers via a particular IBM i MATMATR MI
command option value.
- On systems with PCIe3 expansion drawers (feature code
#EMX0), a problem was fixed for a concurrent exchange of a PCIe
expansion drawer cable card that, although successful, leaves the
fault LED turned on.
- On systems using PowerVM firmware, a problem was fixed for
shared processor pools where
uncapped shared processor partitions placed in a pool may not be able
to consume all available processor cycles. The problem may occur
when the sum of the allocated processing units for the pool member
partitions equals the maximum processing units of the pool.
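The trigger condition in the last item above (the sum of the allocated
processing units of the pool members equals the pool maximum) can be
checked with simple arithmetic. The Python sketch below is a hypothetical
helper, not an HMC or PowerVM interface; the function name and the input
values are assumptions made for illustration. Decimal arithmetic is used
so that fractional processing units compare exactly.

```python
from decimal import Decimal


def pool_fully_allocated(allocated_units, pool_max_units):
    """Return True when the members' allocated processing units sum
    exactly to the pool's maximum processing units (hypothetical check)."""
    total = sum(Decimal(str(units)) for units in allocated_units)
    return total == Decimal(str(pool_max_units))


# Example values are made up: three partitions in a pool with a maximum of 4.0.
print(pool_fully_allocated([0.5, 1.5, 2.0], 4.0))
print(pool_fully_allocated([1.0, 1.0], 4.0))
```

A pool for which this check returns True matches the configuration in
which the problem may occur on firmware levels without the fix.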
|
SV860_180_165 / FW860.60
10/31/18 |
Impact: Availability
Severity: SPE
Power System S812L (8247-21L), Power System S822L (8247-22L),
Power System S824L (8247-42L), Power System S812 (8284-21A),
Power System S822 (8284-22A), Power System S814 (8286-41A),
Power System S824 (8286-42A); Power System E850 (8408-E8E);
Power System E850C (8408-44E); Power System S812L (5148-21L) and
Power System S822L (5148-22L) servers only.
System firmware changes that affect all systems
- A security problem was fixed in the Dynamic Host Configuration
Protocol (DHCP) client on the service processor for an out-of-bound
memory access flaw that could be used by a malicious DHCP server to
crash the DHCP client process. The Common Vulnerabilities and
Exposures issue number is CVE-2018-5732.
- A security problem was fixed to detect and prevent Self
Boot Engine (SBE) SEEPROM corruption. The Common
Vulnerabilities and Exposures issue number is CVE-2018-8931.
|
SV860_165_165 / FW860.51
05/22/18 |
Impact: Security
Severity: SPE
Power System S812L (8247-21L), Power System S822L (8247-22L),
Power System S824L (8247-42L), Power System S812 (8284-21A),
Power System S822 (8284-22A), Power System S814 (8286-41A),
Power System S824 (8286-42A); Power System E850 (8408-E8E);
Power System E850C (8408-44E); Power System S812L (5148-21L) and
Power System S822L (5148-22L) servers only.
Response for Recent Security Vulnerabilities
- DISRUPTIVE:
On systems with PowerVM firmware, in response to recently
reported security vulnerabilities, this firmware update is being
released to address Common Vulnerabilities and Exposures issue number
CVE-2018-3639. In addition, Operating System updates are required
in conjunction with this FW level for CVE-2018-3639.
|
SV860_160_056 / FW860.50
05/03/18 |
Impact: Availability
Severity: SPE
Power System S812L (8247-21L), Power System S822L (8247-22L),
Power System S824L (8247-42L), Power System S812 (8284-21A),
Power System S822 (8284-22A), Power System S814 (8286-41A),
Power System S824 (8286-42A); Power System E850 (8408-E8E);
Power System E850C (8408-44E); Power System S812L (5148-21L) and
Power System S822L (5148-22L) servers only.
System firmware changes that affect certain systems
- DEFERRED: On
systems with PowerVM firmware, a problem was fixed for a PCIe3 I/O
expansion drawer (with feature code #EMX0) where control path stability
issues may cause certain SRCs to be logged. Systems using copper
cables may log SRC B7006A87 or similar SRCs, and the fanout module may
fail to become active. Systems using optical cables may log SRC
of B7006A22 or similar SRCs. For this problem, the errant I/O
drawer may be recovered by a re-IPL of the system.
|
SV860_138_056 / FW860.42
01/09/18 |
Impact: Security
Severity: SPE
Power System S812L (8247-21L), Power System S822L (8247-22L),
Power System S824L (8247-42L), Power System S812 (8284-21A),
Power System S822 (8284-22A), Power System S814 (8286-41A),
Power System S824 (8286-42A); Power System E850 (8408-E8E) and
Power System E850C (8408-44E) servers only.
New features and functions
- In response to recently reported security vulnerabilities,
this firmware update is being released to address Common
Vulnerabilities and Exposures issue numbers CVE-2017-5715,
CVE-2017-5753 and CVE-2017-5754. Operating System updates are
required in conjunction with this FW level for CVE-2017-5753 and
CVE-2017-5754.
|
SV860_127_056 / FW860.41
12/08/17 |
Impact: Availability
Severity: SPE
Power System S812L (8247-21L), Power System S822L (8247-22L),
Power System S824L (8247-42L), Power System S812 (8284-21A),
Power System S822 (8284-22A), Power System S814 (8286-41A),
Power System S824 (8286-42A); Power System E850 (8408-E8E) and
Power System E850C (8408-44E) servers only.
|
SV860_118_056 / FW860.40
11/08/17 |
Impact: Availability
Severity: SPE
Power System S812L (8247-21L), Power System S822L (8247-22L),
Power System S824L (8247-42L), Power System S812 (8284-21A),
Power System S822 (8284-22A), Power System S814 (8286-41A),
Power System S824 (8286-42A); Power System E850 (8408-E8E) and
Power System E850C (8408-44E) servers only.
System firmware changes that affect certain systems
- DEFERRED: On
systems using PowerVM firmware, a problem was fixed for
DPO (Dynamic Platform Optimizer) operations taking a very long time and
impacting the server system with a performance degradation. The
problem is triggered by a DPO operation being done on a system with
unlicensed processor cores and a very high I/O load. The fix involves
using a different lock type for the memory relocation activities (to
prevent lock contention between memory relocation threads and partition
threads) that is created at IPL time, so an IPL is needed to activate
the fix. More information on the DPO function can be found at the IBM
Knowledge Center: https://www.ibm.com/support/knowledgecenter/en/8247-42L/p8hat/p8hat_dpoovw.htm
|
SV860_109_056 / FW860.31
08/30/17 |
Impact: Availability
Severity: ATT
Power System S812L (8247-21L), Power System S822L (8247-22L),
Power System S824L (8247-42L), Power System S812 (8284-21A),
Power System S822 (8284-22A), Power System S814 (8286-41A),
Power System S824 (8286-42A); Power System E850 (8408-E8E) and
Power System E850C (8408-44E) servers only.
|
SV860_103_056 / FW860.30
06/30/17 |
Impact: Availability
Severity: SPE
Power System S812L (8247-21L), Power System S822L (8247-22L),
Power System S824L (8247-42L), Power System S812 (8284-21A),
Power System S822 (8284-22A), Power System S814 (8286-41A),
Power System S824 (8286-42A); Power System E850 (8408-E8E) and
Power System E850C (8408-44E) servers only.
System firmware changes that affect certain systems
- DEFERRED: On
systems using PowerVM firmware, a problem was fixed to improve PCIe3
I/O expansion drawer (#EMX0) link stability. The settings
for the continuous time linear equalizers (CTLE) were updated for all
the PCIe adapters for the PCIe links to the expansion drawer. The
system must be re-IPLed for the fix to activate.
|
SV860_096_056 / FW860.21
06/07/17 |
Impact: Availability
Severity: ATT
Power System S812L (8247-21L), Power System S822L (8247-22L) and
Power System S824L (8247-42L) servers only.
|
SV860_082_056 / FW860.20
03/17/17 |
Impact: Availability
Severity: SPE
Power System S812L (8247-21L), Power System S822L (8247-22L),
Power System S824L (8247-42L), Power System S812 (8284-21A),
Power System S822 (8284-22A), Power System S814 (8286-41A),
Power System S824 (8286-42A); Power System E850 (8408-E8E) and
Power System E850C (8408-44E) servers only.
|
SV860_070_056 / FW860.12
01/13/17 |
Impact: Availability
Severity: SPE
Power System S812L (8247-21L), Power System S822L (8247-22L),
Power System S824L (8247-42L), Power System S822 (8284-22A),
Power System S814 (8286-41A), Power System S824 (8286-42A) and
Power System E850C (8408-44E) servers only.
|
SV860_063_056 / FW860.11
12/05/16 |
Impact: Availability
Severity: SPE
Power System S812L (8247-21L), Power System S822L (8247-22L),
Power System S824L (8247-42L), Power System S822 (8284-22A),
Power System S814 (8286-41A), Power System S824 (8286-42A) and
Power System E850C (8408-44E) servers only.
System firmware changes that affect certain systems
- DEFERRED: A problem
was fixed for a Field Core Override (FCO) error
that causes a processor chip without functional cores to be guarded
with an SRC B111BA24 error logged and by guard association causes all
the memory and I/O resources behind the processor chip to be lost for
the current IPL. This problem is triggered by a system
being manufactured with one or more feature codes of #2319
(Factory Deconfiguration of 1-core) to assist with optimization of
software licensing. For more information on Field Core Override,
refer to IBM Knowledge Center: http://www.ibm.com/support/knowledgecenter/POWER8/p8hby/fieldcore.htm.
The error only occurs in systems where the total number of active cores
is less than the number of processor chips. When the fix is
applied on a system that has lost memory or I/O resources due to the
errant processor guard, the system must be re-IPLed with the guard
removed from the processor to recover the resources.
Without the fix, the problem may be circumvented by the following four
steps:
1) Power off the system.
2) Use the Field Core Override function to increase the number of
active processor cores in the system. The Advanced System
Management Interface (ASMI) "System Configuration -> Hardware
Deconfiguration -> Field Core Override" panel shows the number of
cores that are active in the system and it can be used to increase the
number of active processor cores in the system.
3) Unguard the failed processor. Use the ASMI "System
Configuration -> Hardware Deconfiguration -> Clear All
Deconfiguration Errors" panel to restore the guarded processor.
4) IPL with the increased number of active processor cores and the
unguarded processor.
This problem does not pertain to the IBM Power System E850 (8408-44E)
model.
|
SV860_056_056 / FW860.10
11/18/16 |
Impact: New
Severity: New
Power System S812L (8247-21L), Power System S822L (8247-22L),
Power System S824L (8247-42L), Power System S822 (8284-22A),
Power System S814 (8286-41A), Power System S824 (8286-42A) and
Power System E850C (8408-44E) servers only.
System firmware changes that affect certain systems
- DISRUPTIVE: On systems
using the PowerVM firmware, a problem was fixed for an "Incomplete"
state caused by initiating a resource dump with selector macros from
NovaLink (vio -dump -lp 1 -fr). The failure causes the stack frame
size of a communication process, HVHMCCMDRTRTASK, to be exceeded with a
hypervisor page fault that disrupts the NovaLink and/or HMC
communications. The recovery action is to re-IPL the CEC but that will
need to be done without the assistance of the management console. For
each partition that has an OS running on the system, shut down the
partition from the OS. Then from the Advanced System Management
Interface (ASMI), power off the managed system. Alternatively, the
system power button may also be used to do the power off. If the
management console Incomplete state persists after the power off, the
managed system should be rebuilt from the management console. For more
information on management console recovery steps, refer to this IBM
Knowledge Center link: https://www.ibm.com/support/knowledgecenter/en/POWER7/p7eav/aremanagedsystemstate_incomplete.htm.
The fix is disruptive because the size of the PowerVM hypervisor must
be increased to accommodate the over-sized stack frame of the failing
task.
- DEFERRED: On systems using
the PowerVM firmware, a problem was fixed for a CAPI function
unavailable condition on a system with the maximum number of CAPI
adapters and partitions. Not enough bytes were allocated for CAPI for
the maximum configuration case. The problem may be circumvented by
reducing the number of active partitions or CAPI adapters. The fix is
deferred because the size of the hypervisor must be increased to
provide the additional CAPI space.
- DEFERRED: On systems using PowerVM
firmware, a problem was fixed for cable card capable PCI slots that
fail during the IPL. Hypervisor I/O Bus Interface UE B7006A84 is
reported for each cable card capable PCI slot that doesn't contain a
PCIe3 Optical Cable Adapter for the PCIe Expansion Drawer (feature code
#EJ05). PCI slots containing a cable card will not report an error but
will not be functional. The problem can be resolved by performing an
AC cycle of the system. The trigger for the failure is that the I2C
devices used to detect the cable cards are not coming out of the power
on reset process in the correct state due to a race condition.
|
SV860_039_039 / FW860.00
11/02/16 |
Impact: New
Severity: New
Power System E850C (8408-44E) servers only.
|