SV860
For Impact, Severity and other firmware definitions, please
refer to the 'Glossary of firmware terms' URL below:
http://www14.software.ibm.com/webapp/set2/sas/f/power5cm/home.html#termdefs
The following Fix description table will
only contain the N (current) and N-1 (previous) levels.
The complete Firmware Fix History (including HIPER descriptions) for
this Release Level can be reviewed at the following URL:
http://download.boulder.ibm.com/ibmdl/pub/software/server/firmware/SV-Firmware-Hist.html
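The level names used in the table below (for example SV860_118_056) can be split into their numeric fields. The following is a minimal sketch only, assuming the xxx_yyy_zzz convention described in IBM firmware readmes (xxx = release level, yyy = service pack level, zzz = last disruptive service pack level) and the disruptive-install rule stated there. The names FirmwareLevel, parse_level and update_is_disruptive are hypothetical and are not part of any IBM tool.

```python
import re
from typing import NamedTuple


class FirmwareLevel(NamedTuple):
    prefix: str           # platform prefix, e.g. "SV"
    release: int          # xxx: release level, e.g. 860
    service_pack: int     # yyy: service pack level, e.g. 118
    last_disruptive: int  # zzz: last disruptive service pack level, e.g. 56


# Optional leading "01" is accepted because file names often carry it (assumption).
_LEVEL_RE = re.compile(r"^(?:01)?([A-Z]{2})(\d{3})_(\d{3})_(\d{3})$")


def parse_level(text: str) -> FirmwareLevel:
    """Split a level name such as 'SV860_118_056' into its fields."""
    match = _LEVEL_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a recognized firmware level: {text!r}")
    prefix, release, service_pack, last_disruptive = match.groups()
    return FirmwareLevel(prefix, int(release), int(service_pack), int(last_disruptive))


def update_is_disruptive(installed: FirmwareLevel, target: FirmwareLevel) -> bool:
    """Apply the assumed readme rule for when an install is disruptive."""
    return (
        installed.release != target.release
        or target.service_pack == target.last_disruptive
        or installed.service_pack < target.last_disruptive
    )


if __name__ == "__main__":
    current = parse_level("SV860_109_056")
    new = parse_level("SV860_118_056")
    print(new, update_is_disruptive(current, new))
```

Under these assumptions, moving from SV860_109_056 to SV860_118_056 would be concurrent, while any move to SV860_056_056 would be disruptive because its service pack level equals its last disruptive level.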
|
SV860_118_056 / FW860.40
11/08/17 |
Impact: Availability
Severity: SPE
Power
System S812L (8247-21L), Power
System S822L (8247-22L), Power System S824L (8247-42L), Power
System S812
(8284-21A), Power
System S822
(8284-22A), Power System S814 (8286-41A), Power System S824
(8286-42A); Power
System E850 (8408-E8E) and Power
System E850C (8408-44E) servers
only.
System firmware changes that affect all systems
- A problem was fixed
for the "Minimum code level supported" not being shown by the Advanced
System Management Interface (ASMI) when selecting the "System
Configuration/Firmware Update Policy" menu. The message shown is
"Minimum code level supported value has not been set". The
workaround to find this value is to use the ASMI command line interface
with the "registry -l cupd/MinMifLevel" command.
- A problem was fixed for system termination and outage
caused by a corrupted system reset type. For cases where the
system reset type cannot be identified, the service processor will now
do a reset/reload to keep the system running. This is a rare
problem that occurs during an error/recovery situation that
involves a reset of the service processor. This fix replaces
a previous fix attempt (same fix description) for this problem that
failed to prevent the system from terminating.
- A problem was fixed for a power supply error log with SRC
B155A4E0 not identifying the FRU location of the failed power
supply. This will happen anytime a power supply fails or is
removed at system runtime. A circumvention for this problem is to
look for other power Predictive Errors in the error log and these will
help identify the location of the failing power supply.
- A problem was fixed for "sh: errl: not found" error
messages to the service processor console whenever the Advanced System
Management Interface (ASMI) was used to display error logs. These
messages did not cause any problems except to clutter the console
output as seen in the service processor traces.
- A problem was fixed for the LineInputVoltage and
LastPowerOutputWatts being displayed in millivolts and milliwatts,
respectively, instead of volts and watts for the output from the
Redfish API for power properties for the chassis. The URL
affected is the following: "https://<fsp
ip>/redfish/v1/Chassis/<id>/Power"
- A problem was fixed for a Power Supply Unit (PSU) failure
of SRC 110015xF logged with a power supply fan call out
when doing a hot re-plug of a PSU. The power supply may be
made operational again by doing a dummy replace of the PSU that was
called out (keeping the same PSU for the replace operation). A
re-IPL of the system will also recover the PSU.
- A problem was fixed for the service processor low-level
boot code always running off the same side of the flash image,
regardless of which side has been selected for boot (P-side or
T-side). Because this low-level boot code rarely changes, this
should not cause a problem unless corruption occurs in the flash image
of the boot code. This problem does not affect firmware
side-switches as the service processor initialization code
(higher-level code than the boot code) is running correctly from the
selected side. Without the fix, there is no recovery for boot
corruption for systems with a single service processor as the service
processor must be replaced.
- A problem was fixed for a missing serviceable event from a
periodic call home reminder. This occurred if there was an FRU
deconfigured for the serviceable event.
- A problem was fixed for help text in the Advanced System
Management Interface (ASMI) not informing the user that system fan
speeds would increase if the system Power Mode was changed to "Fixed
Maximum Frequency" mode. If ASMI panel function "System
Configuration->Power Management->Power Mode Setup" "Enable Fixed
Maximum Frequency mode" help is selected, the updated text states
"...This setting will result in the fans running at the maximum speed
for proper cooling."
- A problem was fixed for a degraded PCI link causing a
Predictive SRC for a non-cacheable unit (NCU) store time-out that
occurred with SRC B113E540 or B181E450 and PRD signature
"(NCUFIR[9]) STORE_TIMEOUT: Store timed out on PB". With the fix,
the error is changed to Informational because the problem is not with
the processor core and the processor should not be replaced. The
solution for degraded PCI links is different from the fix for this
problem, but a re-IPL of the CEC or a reset of the PCI adapters could
help to recover the PCI links from their degraded mode.
- A problem was fixed for the IPMI serial over LAN (SOL)
console buffer becoming full without an active ipmitool client causing
a service processor hang to host, resulting in a host initiated
reset/reload of the service processor. The problem causes a
serviceable event and a service processor dump, but otherwise it should
not impact the jobs on the running host.
- A problem was fixed for the IPMI serial over LAN (SOL)
console intermittently dropping a character of data. This
occurred anytime the console data to write size matched the free space
size in the SOL console 4K buffer.
- A problem was fixed for a Redfish Patch on the
"Chassis" "HugeDynamicDMAWindowSlotCount" for the validation of
incorrect values. Without the fix, the user will not get proper
error messages when providing bad values to the patch.
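For the Redfish power-property item in the list above, a client that must read systems both with and without the fix could normalize the two affected values itself. This is a minimal sketch only. It assumes the properties appear in power supply entries as in the standard Redfish Power schema, and the sample values and the firmware_has_fix flag are hypothetical, not taken from a real system.

```python
from typing import Any, Dict

# The two properties named in the fix description.
AFFECTED_KEYS = ("LineInputVoltage", "LastPowerOutputWatts")


def normalize_power_supply(ps: Dict[str, Any], firmware_has_fix: bool) -> Dict[str, Any]:
    """Return a copy of one power supply entry with volts and watts as units.

    Without the fix, the values arrive in millivolts and milliwatts,
    so they are divided by 1000. With the fix, they pass through unchanged.
    """
    normalized = dict(ps)
    if not firmware_has_fix:
        for key in AFFECTED_KEYS:
            value = normalized.get(key)
            if value is not None:
                normalized[key] = value / 1000.0
    return normalized


if __name__ == "__main__":
    # Hypothetical entry as it might look on a level without the fix.
    sample = {"Name": "PSU 1", "LineInputVoltage": 230000, "LastPowerOutputWatts": 412000}
    print(normalize_power_supply(sample, firmware_has_fix=False))
```

The caller has to know whether the fix is installed (for example from the firmware level), because the payload itself does not state which unit is in use.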
System firmware changes that affect certain systems
- DEFERRED: On
systems using PowerVM firmware, a problem was fixed for
DPO (Dynamic Platform Optimizer) operations taking a very long time and
degrading the performance of the server system. The
problem is triggered by a DPO operation being done on a system with
unlicensed processor cores and a very high I/O load. The fix
involves
using a different lock type for the memory relocation activities (to
prevent lock contention between memory relocation threads and partition
threads) that is created at IPL time, so an IPL is needed to activate
the fix. More information on the DPO function can be found at the
IBM
Knowledge Center: https://www.ibm.com/support/knowledgecenter/en/8247-42L/p8hat/p8hat_dpoovw.htm
- On systems using PowerVM firmware, a problem was
fixed for an intermittent service processor core dump and a callout for
netsCommonMSGServer with SRC B181EF88. The HMC connection
to the service processor automatically recovers with a new session.
- On systems using PowerVM firmware, a problem was fixed for
a concurrent firmware update failure with HMC error message
"E302F865-PHYPTooBusyToQuiesce". This error can occur when the
error log is full on the hypervisor and it cannot accept more error
logs from the service processor. But the service processor keeps
retrying the send of an error log, resulting in a "denial of service"
scenario where the hypervisor is kept busy rejecting the error logging
attempts. Without the fix, the problem may be circumvented by
starting a logical partition (if none are running) or by purging
the error logs on the service processor.
- On systems using PowerVM firmware with mirrored memory
running IBM i partitions, a problem was fixed for memory failures in the
partition that also caused the system to crash. The system
failure will occur any time that IBM i partition memory towards the
beginning of the partition's assigned memory fails. With the fix,
the memory failure is isolated to the impacted partition, leaving the
rest of the system unaffected.
- On systems using PowerVM firmware, a problem was fixed for
failures deconfiguring SR-IOV Virtual Functions (VFs). This can
occur during Live Partition Mobility (LPM) migrations with HMC error
messages of HSCLAF16, HSCLAF15 and HSCLB602 shown. This results
in an LPM migration failure and a system reboot is required to recover
the VFs for the I/O adapters. This error may occur more
frequently in cases where the I/O adapter has pending I/O at the time
of the deconfigure request for the VF.
- On systems using PowerVM firmware, a problem was fixed for
a vNIC client that has backing devices being assigned an active server
that was not the one intended by an HMC user failover for the client
adapter. This only can happen if the vNIC client adapter had
never been activated. A circumvention is to activate the client
OS and initialize the vNIC device (ifconfig "xxx" up) and an active
backing device will then be selected.
- On systems using PowerVM firmware, a problem was fixed for
partitions with more than 32TB memory failing to IPL with memory space
errors. This can occur if the logical memory block (LMB) size is
small as there is a memory loss associated with each LMB. The
problem can be circumvented by reducing the amount of partition memory
or increasing the LMB size to reduce the total number of LMBs needed
for the memory allocation.
- On systems using PowerVM firmware, a problem was
fixed for the error handling of EEH events for the SR-IOV Virtual
Functions (VFs) that can result in IPL failure with B7006971, B400FF05,
and BA210000 SRCs logged. In these cases, the partition console
stops at an OFDBG prompt. Also, a DLPAR add of a VF may result in
a partition crash due to a 300 DSI exception because of a low-level EEH
event. A circumvention for the problem would be to debug the EEH
events which should be recovered errors and eliminate the cause of the
EEH events. With the fix, the EEH events still log Predictive
Errors but do not cause a partition failure.
- On systems using PowerVM firmware and running IBM i on
stand-alone systems (no HMC attached), a problem was fixed for an
inadvertent Operations Panel function 71 activation that put the system
into "Network Boot" mode and prevented the IBM i from IPLing. A
circumvention is to use Operations Panel function 72 to turn off
"Network Boot" mode. With the fix, the Operations Panel function
71 request will be ignored on IBM i stand-alone systems.
- A problem was fixed for intermittent high-temperature
induced link failures on the 100Gb EDR IB, NIC, and RoCE adapters
caused by system fans running at too low of a speed. These
adapters include the PCIe3 1-port and 2-port 100Gb EDR IB x16 adapters
and the PCIe3 2-port 100GbE (NIC and RoCE) QSFP28 x16 adapter with
feature codes EC3E, EC3F, EC3L, EC3M, EC3T, and EC3U. EDR IB
(Enhanced Data Rate Infiniband), NIC (Network Interface Controller),
and IBTA RoCE (Remote Direct Memory Access (RDMA) over Converged
Ethernet) are the specific network standards supported in the adapters.
This problem was fixed earlier in FW860.31 for the (8284-xxx) and
(8247-xxx) models. The fix has been extended to include the E850
(8408-E8E) and the E850C (8408-44E) models.
- On systems using PowerVM firmware, a problem was fixed for
an invalid date from the service processor causing the customer date
and time to go to the Epoch value (01/01/1970) without a warning or
chance for a correction. With the fix, the first IPL
attempted on an invalid date will be rejected with a message alerting
the user to set the time correctly in the service processor. If
the warning is ignored and the date/time is not corrected, the next IPL
attempt will complete to the OS with the time reverted to the Epoch
time and date. This problem is very rare but it has been known to
occur on service processor replacements when the repair step to set the
date and time on the new service processor was inadvertently skipped by
the service representative.
- On systems using PowerVM firmware with PowerVM NovaLink, a
problem was fixed for a loss of a communications channel between the
hypervisor and the PowerVM NovaLink during a reset of the service
processor. Various NovaLink tasks, including deploy, could fail
with a "No valid host was found" error. With the fix, PowerVM
NovaLink prevents normal operations from being impacted by a reset of
the service processor.
- On systems using PowerVM firmware, a problem was fixed for
a rare system hang caused by a process dispatcher deadlock timing
window. If this problem occurs, the HMC will also go to an
"Incomplete" state for the managed system.
- On systems using PowerVM firmware, a problem
was fixed for communication failures on adapters in SR-IOV shared
mode. This communication failure only occurs when a logical
port's VLAN ID (PVID) is dynamically changed from non-zero to
zero. An SR-IOV logical port is an I/O device created for a
partition or a partition profile using the management console (HMC)
when a user intends for the partition to access an SR-IOV adapter
Virtual Function. The error can be recovered from by a reboot of
the partition.
This fix updates adapter firmware to 10.2.252.1929, for the following
Feature Codes: EN15, EN16, EN17, EN18, EN0H, EN0J, EN0M, EN0N, EN0K,
EN0L, EL38, EL3C, EL56, and EL57.
The SR-IOV adapter firmware level update for the shared-mode adapters
happens under user control to prevent unexpected temporary outages on
the adapters. A system reboot will update all SR-IOV shared-mode
adapters with the new firmware level. In addition, when an
adapter is first set to SR-IOV shared mode, the adapter firmware is
updated to the latest level available with the system firmware (and it
is also updated automatically during maintenance operations, such as
when the adapter is stopped or replaced). And lastly, selective
manual updates of the SR-IOV adapters can be performed using the
Hardware Management Console (HMC). To selectively update the
adapter firmware, follow the steps given at the IBM Knowledge Center
for using HMC to make the updates: https://www.ibm.com/support/knowledgecenter/HW4M4/p8efd/p8efd_updating_sriov_firmware.htm.
Note: Adapters that are capable of running in SR-IOV mode, but are
currently running in dedicated mode and assigned to a partition, can be
updated concurrently either by the OS that owns the adapter or the
managing HMC (if OS is AIX or VIOS and RMC is running).
- On systems using PowerVM firmware, a problem was fixed for
error logs not getting sent to the OS running in a
partition. This problem could occur if the error log buffer
was full in the hypervisor and then a re-IPL of the system
occurred. The error log full condition was persisting across the
re-IPL, preventing further logs from being sent to the OS.
- On systems using OPAL firmware, Skiboot was updated to
V5.4.8 from V5.4.6, providing the following fixes:
- A problem was fixed for an intermittent host freeze during a
reset/reload of the service processor. The host will resume
normal operations after the reset/reload has completed. To have
this error occur, a timing window has to be hit where a
synchronous message from the host is in progress to the service
processor at the same time a reset/reload is initiated.
- A problem was fixed for IPMI Serial Over Lan (SOL) console
disconnects to prevent Host process hangs related to the console
management for output buffers and error logging. If there is a
reset of the service processor and the console was active, the console
session is now closed to free all the console resources.
- A problem was fixed for "FSP: Unhandled message eb0500" error
message. This is a command sent by the FSP to OPAL to get vNVRAM
statistics. Since OPAL maintains no NVRAM statistics, it now
returns FSP_STATUS_INVALID_SUBCMD with its new handler. Sample of
OPAL log that will no longer occur with the fix:
[16944.384670488,3] FSP: Unhandled message eb0500
[16944.474110465,3] FSP: Unhandled message eb0500
- A problem was fixed for sending false messages for "Reassociating
HVSI console" when the console is not available. These messages
are no longer issued for unavailable consoles:
[ 5013.227994012,7] FSP: Reassociating HVSI console 1
[ 5013.227997540,7] FSP: Reassociating HVSI console 2
- A problem was fixed for a Delayed Power Off (DPO) failure that
occurred if the service processor reset right after the request.
With the fix, the DPO and normal shutdowns will complete on the host
without regard to service processor state changes that occur after the
request.
- On systems using OPAL firmware, Petitboot was updated to
V1.4.4 from V1.4.2, providing the following fixes:
- A problem was fixed for line truncation on the Petitboot screen
occurring for any line that had a multibyte character in it.
- A problem was fixed for the safe mode message not clearing even after
the "Rescan Devices" button was pressed in safe mode and re-initialization
completed successfully.
- A problem was fixed for Petitboot configuration for boot order and
network settings being cleared when the user just wanted to clear the
IPMI override. With the fix, the IPMI override is cleared and
safe mode is exited, if active, without modifying the rest of the
configuration.
- On systems using PowerVM firmware, a problem was fixed in
the text for the Firmware License agreement to correct a link that
pointed to a URL that was not specific to microcode licensing.
The message is displayed for a machine during its initial power
on. Once accepted, the message is not displayed again. The
fixed link in the licensing agreement is the following: http://www.ibm.com/support/docview.wss?uid=isg3T1025362.
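For the item above on partitions with more than 32TB of memory, the circumvention depends on how many logical memory blocks (LMBs) a partition needs at a given LMB size. The following is a small worked sketch of that arithmetic only. The LMB sizes tried here (16 MB to 256 MB) are an assumption about typical settings, and the per-LMB memory loss is not modeled because the fix description does not state its size.

```python
MB_PER_TB = 1024 * 1024  # binary units: 1 TB = 1024 * 1024 MB


def lmb_count(partition_mem_tb: int, lmb_size_mb: int) -> int:
    """Number of LMBs needed to cover the partition memory (rounded up)."""
    if partition_mem_tb <= 0 or lmb_size_mb <= 0:
        raise ValueError("memory and LMB size must be positive")
    total_mb = partition_mem_tb * MB_PER_TB
    # Integer ceiling division avoids floating-point rounding.
    return -(-total_mb // lmb_size_mb)


if __name__ == "__main__":
    for size_mb in (16, 32, 64, 128, 256):  # assumed candidate LMB sizes
        print(f"32 TB at {size_mb:>3} MB per LMB -> {lmb_count(32, size_mb):>9} LMBs")
```

A 32TB partition needs 2097152 LMBs at 16 MB but only 131072 at 256 MB, which is why a larger LMB size reduces the total per-LMB overhead.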
|
SV860_109_056 / FW860.31
08/30/17 |
Impact: Availability
Severity: ATT
Power
System S812L (8247-21L), Power
System S822L (8247-22L), Power System S824L (8247-42L), Power
System S812
(8284-21A), Power
System S822
(8284-22A), Power System S814 (8286-41A), Power System S824
(8286-42A); Power
System E850 (8408-E8E) and Power
System E850C (8408-44E) servers
only.
System firmware changes that
affect certain systems
- A problem was fixed for intermittent high-temperature
induced link failures on the 100Gb EDR IB, NIC, and RoCE adapters
caused by system fans running at too low of a speed. These
adapters include the PCIe3 1-port and 2-port 100Gb EDR IB x16 adapters
and the PCIe3 2-port 100GbE (NIC and RoCE) QSFP28 x16 adapter with
feature codes EC3E, EC3F, EC3L, EC3M, EC3T, and EC3U. EDR IB
(Enhanced Data Rate Infiniband), NIC (Network Interface Controller),
and IBTA RoCE (Remote Direct Memory Access (RDMA) over Converged
Ethernet) are the specific network standards supported in the adapters.
This problem does not apply
to the E850 (8408-E8E) or the E850C
(8408-44E) models.
|
SV860_103_056 / FW860.30
06/30/17 |
Impact: Availability
Severity: SPE
Power
System S812L (8247-21L), Power
System S822L (8247-22L), Power System S824L (8247-42L), Power
System S812
(8284-21A), Power
System S822
(8284-22A), Power System S814 (8286-41A), Power System S824
(8286-42A); Power
System E850 (8408-E8E) and Power
System E850C (8408-44E) servers
only.
System firmware changes that affect certain systems
- DEFERRED: On
systems using PowerVM firmware, link stability was improved for the
PCIe3 I/O expansion drawer (#EMX0). The settings
for the continuous time linear equalizers (CTLE) were updated for all
the PCIe adapters for the PCIe links to the expansion drawer. The
system must be re-IPLed for the fix to activate.
|
SV860_096_056 / FW860.21
06/07/17 |
Impact: Availability
Severity: ATT
Power
System S812L (8247-21L), Power
System S822L (8247-22L) and Power System S824L (8247-42L)
servers only. |
SV860_082_056 / FW860.20
03/17/17 |
Impact: Availability
Severity: SPE
Power
System S812L (8247-21L), Power
System S822L (8247-22L), Power System S824L (8247-42L), Power
System S812
(8284-21A), Power
System S822
(8284-22A), Power System S814 (8286-41A), Power System S824
(8286-42A); Power
System E850 (8408-E8E) and Power
System E850C (8408-44E) servers
only.
|
SV860_070_056 / FW860.12
01/13/17 |
Impact: Availability
Severity: SPE
Power System S812L (8247-21L), Power
System S822L (8247-22L), Power System S824L (8247-42L), Power
System S822
(8284-22A), Power System S814 (8286-41A), Power System S824
(8286-42A) and Power System
E850C (8408-44E) servers only.
|
SV860_063_056 / FW860.11
12/05/16 |
Impact: Availability
Severity: SPE
Power System S812L (8247-21L), Power
System S822L (8247-22L), Power System S824L (8247-42L), Power
System S822
(8284-22A), Power System S814 (8286-41A), Power System S824
(8286-42A) and Power System
E850C (8408-44E) servers only.
System firmware changes that
affect certain systems
- DEFERRED: A problem
was fixed for a Field Core Override (FCO) error
that causes a processor chip without functional cores to be guarded
with an SRC B111BA24 error logged and, by guard association, causes all
the memory and I/O resources behind the processor chip to be lost for
the current IPL. This problem is triggered by a system
being manufactured with one or more feature codes of #2319
(Factory Deconfiguration of 1-core) to assist with optimization of
software licensing. For more information on Field Core Override,
refer to IBM Knowledge Center: http://www.ibm.com/support/knowledgecenter/POWER8/p8hby/fieldcore.htm.
The error only occurs in systems where the total number of active cores
is less than the number of processor chips. When the fix is
applied on a system that has lost memory or I/O resources due to the
errant processor guard, the system must be re-IPLed with the guard
removed from the processor to recover the resources.
Without the fix, the problem may be circumvented by the following four
steps:
1) Power off the system.
2) Use the Field Core Override function to increase the number of
active processor cores in the system. The Advanced System
Management Interface (ASMI) "System Configuration -> Hardware
Deconfiguration -> Field Core Override" panel shows the number of
cores that are active in the system and it can be used to increase the
number of active processor cores in the system.
3) Unguard the failed processor. Use the ASMI "System
Configuration -> Hardware Deconfiguration -> Clear All
Deconfiguration Errors" panel to restore the guarded processor.
4) IPL with the increased number of active processor cores and the
unguarded processor.
This problem does not pertain to the IBM Power System E850C (8408-44E)
model.
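The trigger condition stated in the item above (total number of active cores less than the number of processor chips) can be expressed as a simple check. This is an illustrative sketch only. The function name and its inputs are hypothetical, and the counts would have to be read from the ASMI Field Core Override panel or an equivalent source.

```python
def fco_guard_exposure(active_cores: int, processor_chips: int) -> bool:
    """Return True when the configuration matches the stated trigger.

    With fewer active cores than processor chips, at least one chip
    has no functional core, which is the case the fix addresses.
    """
    if active_cores < 0 or processor_chips <= 0:
        raise ValueError("core count must be >= 0 and chip count must be > 0")
    return active_cores < processor_chips


if __name__ == "__main__":
    # Hypothetical configurations: (active cores, processor chips).
    for cores, chips in ((1, 2), (2, 2), (8, 2)):
        print(cores, chips, fco_guard_exposure(cores, chips))
```

Step 2 of the circumvention above (increasing the number of active cores) amounts to making this check return False before the next IPL.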
|
SV860_056_056 / FW860.10
11/18/16 |
Impact: New
Severity: New
Power System S812L (8247-21L), Power
System S822L (8247-22L), Power System S824L (8247-42L), Power
System S822
(8284-22A), Power System S814 (8286-41A), Power System S824
(8286-42A) and Power System
E850C (8408-44E) servers only.
System firmware changes that affect certain systems
- DISRUPTIVE: On systems using the PowerVM firmware, a problem was
fixed for an "Incomplete" state caused by initiating a resource dump
with selector macros from NovaLink (vio -dump -lp 1 -fr). The failure
causes the stack frame size of a communication process,
HVHMCCMDRTRTASK, to be exceeded, with a hypervisor page fault that
disrupts the NovaLink and/or HMC communications. The recovery action
is to re-IPL the CEC, but that must be done without the assistance of
the management console. For each partition that has an OS running on
the system, shut down the partition from the OS. Then from the
Advanced System Management Interface (ASMI), power off the managed
system. Alternatively, the system power button may also be used to do
the power off. If the management console Incomplete state persists
after the power off, the managed system should be rebuilt from the
management console. For more information on management console
recovery steps, refer to this IBM
Knowledge Center link: https://www.ibm.com/support/knowledgecenter/en/POWER7/p7eav/aremanagedsystemstate_incomplete.htm.
The fix is disruptive because the size of the PowerVM hypervisor must
be increased to accommodate the over-sized stack frame of the failing
task.
- DEFERRED: On systems using the PowerVM firmware, a problem was fixed
for a CAPI function unavailable condition on a system with the maximum
number of CAPI adapters and partitions. Not enough bytes were
allocated for CAPI for the maximum configuration case. The problem
may be circumvented by reducing the number of active partitions or
CAPI adapters. The fix is deferred because the size of the hypervisor
must be increased to provide the additional CAPI space.
- DEFERRED: On systems using PowerVM firmware, a problem was fixed for
cable card capable PCI slots that fail during the IPL. Hypervisor I/O
Bus Interface UE B7006A84 is reported for each cable card capable PCI
slot that doesn't contain a PCIe3 Optical Cable Adapter for the PCIe
Expansion Drawer (feature code #EJ05). PCI slots containing a cable
card will not report an error but will not be functional. The problem
can be resolved by performing an AC cycle of the system. The trigger
for the failure is that the I2C devices used to detect the cable cards
do not come out of the power on reset process in the correct state
due to a race condition.
|
SV860_039_039 / FW860.00
11/02/16 |
Impact: New
Severity: New
Power System E850C (8408-44E) servers only.
|