SV840
For Impact, Severity, and other firmware definitions, please
refer to the 'Glossary of firmware terms' at the following URL:
http://www14.software.ibm.com/webapp/set2/sas/f/power5cm/home.html#termdefs
The following fix description table contains only the N (current)
and N-2 (previous) levels. The complete Firmware Fix History
(including HIPER descriptions) for this release level can be
reviewed at the following URL:
http://download.boulder.ibm.com/ibmdl/pub/software/server/firmware/SV-Firmware-Hist.html
SV860_109_056 / FW860.31
08/30/17
Impact: Availability
Severity: ATT
System firmware changes that affect certain systems
- A problem was fixed for intermittent high-temperature-induced
link failures on the 100Gb EDR IB, NIC, and RoCE adapters caused by
system fans running at too low a speed. These adapters include the
PCIe3 1-port and 2-port 100Gb EDR IB x16 adapters and the PCIe3 2-port
100GbE (NIC and RoCE) QSFP28 x16 adapter with feature codes EC3E,
EC3F, EC3L, EC3M, EC3T, and EC3U. EDR IB (Enhanced Data Rate
InfiniBand), NIC (Network Interface Controller), and IBTA RoCE (Remote
Direct Memory Access (RDMA) over Converged Ethernet) are the specific
network standards supported by the adapters. This problem does not
apply to the E850 (8408-E8E) or E850C (8408-44E) models.
SV860_103_056 / FW860.30
06/30/17
Impact: Availability
Severity: SPE
Power System S812L (8247-21L), Power System S822L (8247-22L), Power
System S824L (8247-42L), Power System S812 (8284-21A), Power System
S822 (8284-22A), Power System S814 (8286-41A), Power System S824
(8286-42A), Power System E850 (8408-E8E), and Power System E850C
(8408-44E) servers only.
New features and functions
- Support was added for the Redfish API to allow the ISO 8601
extended format for the date and time so that the date/time can be
represented with an offset from UTC (Coordinated Universal Time).
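For illustration only, the following Python snippet shows what an ISO 8601 extended-format timestamp with a UTC offset looks like and how a client might parse it. The date and the UTC-05:00 offset are arbitrary; the exact string the service processor returns is not described here.

```python
from datetime import datetime, timedelta, timezone

# Arbitrary example offset (UTC-05:00), chosen only for illustration.
offset = timezone(timedelta(hours=-5))

# Extended format: hyphens in the date, colons in the time and in the offset.
local_time = datetime(2017, 6, 30, 14, 5, 0, tzinfo=offset)
stamp = local_time.isoformat()
print(stamp)  # 2017-06-30T14:05:00-05:00

# Parse the string back and normalize it to UTC.
parsed = datetime.fromisoformat(stamp)
print(parsed.astimezone(timezone.utc).isoformat())  # 2017-06-30T19:05:00+00:00
```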
- Support was added for the Redfish API for power and thermal
properties of the chassis. The new URIs are as follows:
https://<fsp ip>/redfish/v1/Chassis/<id>/Power : Provides power supply data
https://<fsp ip>/redfish/v1/Chassis/<id>/Thermal : Provides fan data
Only the Redfish GET operation is supported for these resources.
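A minimal Python sketch for reading these two resources follows. It assumes session-token authentication through the standard X-Auth-Token header and a self-signed service processor certificate. The field names follow the DMTF Thermal and Power schemas: newer versions use Name/Reading for fans, older ones FanName/ReadingRPM, so both are accepted. The actual IBM payload may differ. The chassis ID, address, and sample payload are hypothetical.

```python
import json
import ssl
import urllib.request


def chassis_uri(fsp_ip: str, chassis_id: str, resource: str) -> str:
    """Build one of the two chassis URIs listed above."""
    if resource not in ("Power", "Thermal"):
        raise ValueError("resource must be 'Power' or 'Thermal'")
    return f"https://{fsp_ip}/redfish/v1/Chassis/{chassis_id}/{resource}"


def redfish_get(uri: str, token: str) -> dict:
    """Issue a GET (the only operation supported for these resources).

    Assumptions: X-Auth-Token session authentication and a self-signed
    certificate, so verification is disabled here. Enable verification
    if the service processor has a trusted certificate.
    """
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be cleared before CERT_NONE is set
    ctx.verify_mode = ssl.CERT_NONE
    req = urllib.request.Request(uri, headers={"X-Auth-Token": token})
    with urllib.request.urlopen(req, context=ctx) as resp:
        return json.load(resp)


def fan_readings(thermal: dict) -> dict:
    """Map fan names to readings from a Thermal payload."""
    result = {}
    for fan in thermal.get("Fans", []):
        name = fan.get("Name") or fan.get("FanName") or "?"
        result[name] = fan["Reading"] if "Reading" in fan else fan.get("ReadingRPM")
    return result


def supply_output_watts(power: dict) -> dict:
    """Map power supply names to LastPowerOutputWatts from a Power payload."""
    return {
        ps.get("Name", "?"): ps.get("LastPowerOutputWatts")
        for ps in power.get("PowerSupplies", [])
    }


# Offline check with a hand-written payload (not real service processor output).
sample_thermal = {"Fans": [{"Name": "Fan 1", "Reading": 5200, "ReadingUnits": "RPM"}]}
print(fan_readings(sample_thermal))  # {'Fan 1': 5200}
```

A live call would look like `redfish_get(chassis_uri(fsp_ip, chassis_id, "Thermal"), token)`, where the token comes from a Redfish session.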
System firmware changes that affect all systems
- A problem was fixed
for service actions with SRC B150F138 missing an Advanced System
Management Interface (ASMI) Deconfiguration Record. The
deconfiguration records make it easier to organize the repairs that are
needed for the system and they need to be consistent with the periodic
maintenance reminders that are logged for the failed FRUs.
- A problem was fixed for a false 1100026B1 (12V power good
failure) caused by an I2C bus write error for a LED state. This
error can be triggered by the fan LEDs changing state.
- A problem was fixed for a fan LED turning solid amber
when there is no fan fault, or when the fan fault is for a different
fan. This error can be triggered any time a fan LED needs to
change its state. The fan LEDs can be recovered to a normal state
concurrently by using the steps at the following link for a soft reset of the
service processor: https://www.ibm.com/support/knowledgecenter/POWER8/p8hby/p8hby_softreset.htm
- A problem was fixed for a system termination and outage
caused by a corrupted system reset type. For cases where the
system reset type cannot be identified, the service processor will now
do a reset/reload to keep the system running. This is a rare
problem that occurs during an error/recovery situation that
involves a reset of the service processor.
- A problem was fixed for sporadic blinking amber LEDs for
the system fans with no SRCs logged. There was no problem with
the fans. The LED corruption occurred when two service processor
tasks attempted to update the LED state at the same time. The fan
LEDs can be recovered to a normal state concurrently by using the
steps at the following link for a soft reset of the service processor: https://www.ibm.com/support/knowledgecenter/POWER8/p8hby/p8hby_softreset.htm
- A problem was fixed for a Redfish PATCH on the "Chassis" or
"IBMEnterpriseComputerSystem" resource with empty data that caused a "500
Internal Server Error". Validation for the empty data case has
been added to prevent the server error.
- A problem was fixed for the loss of Operations Panel
function 30 (displaying the HMC1 and HMC2 Ethernet port IP addresses)
after a concurrent repair of the Operations Panel.
Operations Panel function 30 can be restored concurrently by using
the steps at the following link for a soft reset of the service
processor: https://www.ibm.com/support/knowledgecenter/POWER8/p8hby/p8hby_softreset.htm
- A problem was fixed for a core dump of the rtiminit
(service processor time of day) process that logs an SRC B15A3303
and could invalidate the time on the service processor. If the
error occurs while the system is powered on, the hypervisor has the
master time and will refresh the service processor time, so no action
is needed for recovery. If the error occurs while the system is
powered off, the service processor time must be corrected on
systems that have only a single service processor. Use the following
steps from the IBM Knowledge Center to change the UTC time with the
Advanced System Management Interface: https://www.ibm.com/support/knowledgecenter/en/POWER8/p8hby/viewtime.htm.
- A problem was fixed for the service processor boot
watch-dog timer expiring too soon during DRAM initialization in the
reset/reload, causing the service processor to go unresponsive.
On systems with a single service processor, SRC B1817212 was
displayed on the control panel. For systems with redundant
service processors, the failing service processor was
deconfigured. To recover the failed service processor, the system
will need to be powered off with AC power removed during a regularly
scheduled system service action. This problem is intermittent and
very infrequent as most of the reset/reloads of the service processor
will work correctly to restore the service processor to a normal
operating state.
- A problem was fixed for host-initiated resets of the
service processor causing the system to terminate. A prior fix
for this problem did not work correctly because some of the
host-initiated resets were being translated to unknown reset types that
caused the system to terminate. With this new correction for
failed host-initiated resets, the service processor will still be
unresponsive but the system and partitions will continue to run.
On systems with a single service processor, SRC B1817212 will be
displayed on the control panel. For systems with redundant
service processors, the failing service processor will be
deconfigured. To recover the failed service processor, the system
will need to be powered off with AC power removed during a regularly
scheduled system service action. This problem is intermittent and
very infrequent as most of the host-initiated resets of the service
processor will work correctly to restore the service processor to a
normal operating state.
- A problem was fixed for a service processor reset triggered
by a spurious IIC interrupt request in the kernel. On
systems with a single service processor, the SRC B1817201 is displayed
on the Operator Panel. For systems with redundant service
processors, an error failover to the backup service processor
occurs. The problem is extremely infrequent and does not impact
processes on the running system.
- A problem was fixed for an incorrect Redfish error message
when trying to use the $metadata URI: "The resource at the
URI https://<systemip>/redfish/v1/%24metadata was not found.".
The "%24" (the percent-encoded form of "$") made the message hard
to read, so it has been replaced with a "$" in the error message.
The Redfish $metadata URI itself is not supported.
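The "%24" in the old message is the percent-encoding of "$" (ASCII 0x24), which is how a URI path containing "$metadata" looks after encoding. A short, purely illustrative Python example using the standard library (unrelated to the service processor's own code):

```python
from urllib.parse import quote, unquote

path = "/redfish/v1/$metadata"

# quote() leaves "/" alone by default but percent-encodes "$" as %24.
encoded = quote(path)
print(encoded)  # /redfish/v1/%24metadata

# Decoding restores the "$", which is what the corrected message now shows.
print(unquote(encoded))  # /redfish/v1/$metadata
```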
- A problem was fixed so that IPMI boot parameters are not
cleared after a service processor reset or loss of AC power to the
system.
- A problem was fixed by serializing concurrent requests for
the IPMI serial over LAN (SOL) console. The unserialized requests
were causing a service processor hang followed by a Host-Initiated
Reset/Reload of the service processor.
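The idea behind serializing concurrent SOL requests can be sketched in Python with a lock, so that simultaneous activation attempts are handled one at a time instead of racing each other. The class and method names are hypothetical, and the sketch assumes only one SOL session may be active at a time; it models the concept and is not the service processor's implementation.

```python
import threading


class SolConsole:
    """Hypothetical stand-in for an IPMI SOL console endpoint."""

    def __init__(self):
        self._lock = threading.Lock()
        self.active_session = None
        self.log = []

    def activate(self, session_id):
        # Concurrent requests wait here and are processed strictly in turn.
        with self._lock:
            if self.active_session is not None:
                self.log.append(f"{session_id}: rejected, {self.active_session} active")
                return False
            self.active_session = session_id
            self.log.append(f"{session_id}: activated")
            return True

    def deactivate(self, session_id):
        with self._lock:
            if self.active_session == session_id:
                self.active_session = None


console = SolConsole()
threads = [threading.Thread(target=console.activate, args=(f"s{i}",)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(console.log)
```

Whichever thread acquires the lock first wins; every other request sees a consistent state and is rejected cleanly rather than interleaving with the winner.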
System firmware changes that affect certain systems
- DEFERRED: On systems using PowerVM firmware, a problem was fixed to
improve the stability of the PCIe3 I/O expansion drawer (#EMX0) links.
The settings for the continuous time linear equalizers (CTLE) were
updated for all the PCIe adapters for the PCIe links to the expansion
drawer. The system must be re-IPLed for the fix to activate.
- On systems using the OPAL firmware, a problem was fixed for
an IPMI console hang to OPAL that caused the Linux host to be hung for
SSH sessions and for ipmitool commands to fail with "Error in open
session response message : insufficient resources for session" error
messages on the service processor. An error log with
SRC B1818601 is reported for the service processor IPMI failure
and multiple SRC BB822210 error logs are reported for OPAL
message time outs to the service processor. In most cases, this
error can be recovered from by doing a soft reset of the service
processor using the following steps from the IBM Knowledge
Center:
https://www.ibm.com/support/knowledgecenter/POWER8/p8hby/p8hby_softreset.htm.
- On systems using
OPAL firmware, a problem was fixed for intermittent long delays in the
NX co-processor for asynchronous requests such as NX 842
compressions. This problem was observed for PowerVM AIX DB2 when
it was doing hardware-accelerated compressions of data but could occur
on any asynchronous request to the NX co-processor. The PowerVM
version of the fix was delivered in FW860.00.
- On systems using PowerVM firmware with a Linux Little
Endian (LE) partition, a problem was fixed for system reset interrupts
returning the wrong values in the debug output for the NIP and MSR
registers. This problem reduces the ability to debug hung Linux
partitions using system reset interrupts. The error occurs every
time a system reset interrupt is used on a Linux LE partition.
- On systems using PowerVM firmware, a problem was fixed for
"Time Power On" enabled partitions not being capable of suspend and
resume operations. This means Live Partition Mobility (LPM) would
not be able to migrate this type of partition. As a workaround,
the partition could be transitioned to a "Non-time Power On" state and
then made capable of suspend and resume operations.
- On systems using PowerVM firmware, a problem was fixed for
manual vNIC failovers (from the HMC, manually "Make the Backing Device
Active") so that the selected server is chosen for the failover,
regardless of its priority. With the problem, the server chosen
for the vNIC failover was the one with the most favorable
priority.
There are two possible workarounds to the problem:
(1) Disable auto-priority-failover; change the priority so that the server
needed as the target of the failover is favored; force the vNIC failover;
change the priority back to its original setting.
(2) Or use auto-priority-failover and change the priority so that the server
needed as the target of the failover is favored.
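The difference between the corrected manual failover and priority-based selection can be modeled in a few lines of Python. The names are hypothetical, and the code assumes that a lower priority value is more favorable; check the HMC documentation for the exact priority semantics on your system.

```python
from dataclasses import dataclass


@dataclass
class BackingDevice:
    name: str
    priority: int  # assumption: a lower value is more favorable


def priority_target(devices):
    """Priority-based selection: the most favorable priority wins."""
    return min(devices, key=lambda d: d.priority)


def manual_target(devices, selected_name):
    """Corrected manual failover: honor the selection regardless of priority."""
    for d in devices:
        if d.name == selected_name:
            return d
    raise KeyError(selected_name)


devices = [BackingDevice("vios1-port", 50), BackingDevice("vios2-port", 10)]
print(priority_target(devices).name)               # vios2-port
print(manual_target(devices, "vios1-port").name)   # vios1-port
```

Before the fix, a manual "Make the Backing Device Active" behaved like `priority_target`, which is why both workarounds adjust priorities to favor the intended target.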
- On systems using PowerVM firmware, a problem was fixed for
extra error logs in the VIOS due to failovers taking place while the
client vNIC is inactive. Failovers for an inactive client vNIC are now
skipped unless the force flag is on. On systems without the fix,
Enhanced Error Handling (EEH) Freeze/Temporary Error/Recovery logs
posted in the VIOS error log during the client partition boot can be
ignored unless an actual problem is experienced.
- On systems using PowerVM firmware, a problem was fixed for
a Live Partition Mobility (LPM) migration abort and reboot on the
FW860 target CEC caused by a mismatched address space for the
source and target partition. The occurrence of this problem is
very rare and related to performance improvements made in the memory
management on the FW860 system that exposed a timing window in the
partition memory validation for the migration. The reboot of the
migrated partition recovers from the problem as the migration was
otherwise successful.
- On systems using PowerVM firmware, a problem was fixed for
reboot retries for IBM i partitions such that the first load source I/O
adapter (IOA) is retried instead of bypassed after the first failed
attempt. The reboot retries are done for an hour before the
reboot process gives up. This error can occur if there is more
than one known load source, and the IOA of the first load source is
different from the IOA of the last load source. The error can be
circumvented by retrying the boot of the partition after the load
source device has become available.
- On systems using PowerVM firmware, a problem was fixed for
adapters failing to transition to shared SR-IOV mode on the IPL after
changing the adapter from dedicated mode. This intermittent
problem could occur on systems using SR-IOV with very large memory
configurations.
- On systems using PowerVM firmware, a problem
was fixed for SR-IOV adapters in shared mode for a transmission stall
or time out with SRC B400FF01 logged. The time out happens during
Virtual Function (VF) shutdowns and during Function Level Resets (FLRs)
with network traffic running.
This fix updates adapter firmware to 10.2.252.1927, for the following
Feature Codes: EN15, EN16, EN17, EN18, EN0H, EN0J, EN0M, EN0N, EN0K,
EN0L, EL38, EL3C, EL56, and EL57.
The SR-IOV adapter firmware level update for the shared-mode adapters
happens under user control to prevent unexpected temporary outages on
the adapters. A system reboot will update all SR-IOV shared-mode
adapters with the new firmware level. In addition, when an
adapter is first set to SR-IOV shared mode, the adapter firmware is
updated to the latest level available with the system firmware (and it
is also updated automatically during maintenance operations, such as
when the adapter is stopped or replaced). And lastly, selective
manual updates of the SR-IOV adapters can be performed using the
Hardware Management Console (HMC). To selectively update the
adapter firmware, follow the steps given at the IBM Knowledge Center
for using HMC to make the updates: https://www.ibm.com/support/knowledgecenter/HW4M4/p8efd/p8efd_updating_sriov_firmware.htm.
Note: Adapters that are capable of running in SR-IOV mode, but are
currently running in dedicated mode and assigned to a partition, can be
updated concurrently either by the OS that owns the adapter or the
managing HMC (if the OS is AIX or VIOS and RMC is running).
- On systems with maximum memory configurations (where every
DIMM slot is populated - size of DIMM does not matter), a problem
has been fixed for systems losing performance and going into Safe mode
(a power mode with reduced processor frequencies intended to protect
the system from overheating and excessive power consumption) with
B1xx2AC3/B1xx2AC4 SRCs logged. This happened because of On-Chip
Controller (OCC) timeout errors when collecting Analog Power Subsystem
Sweep (APSS) data, used by the OCC to tune the processor
frequency. This problem occurs more frequently on systems that
are running heavy workloads. Recovery from Safe mode back to
normal performance can be done with a re-IPL of the system, or
concurrently by using the steps at the following link for a soft reset of the
service processor: https://www.ibm.com/support/knowledgecenter/POWER8/p8hby/p8hby_softreset.htm.
Checking that Safe mode is not active on the system requires a dynamic
celogin password from IBM Support to use the service processor command line:
1) Log into ASMI as celogin with the dynamic celogin password
generated by IBM Support
2) Select System Service Aids
3) Select Service Processor Command Line
4) Enter "tmgtclient --query_mode_and_function" from the command line
The first line of the output, "currSysPwrMode", should say "NOMINAL",
which means the system is in normal mode and Safe mode is not
active.
- A problem has been fixed for systems losing
performance and going into Safe mode (a power mode with reduced
processor frequencies intended to protect the system from overheating
and excessive power consumption) with B1xx2AC3/B1xx2AC4 SRCs
logged. This happened because of an On-Chip Controller (OCC)
internal queue overflow. The problem has only been observed for systems
running heavy workloads with maximum memory configurations (where every
DIMM slot is populated - size of DIMM does not matter), but these conditions may
not be required to encounter the problem. Recovery from Safe mode
back to normal performance can be done with a re-IPL of the system, or
concurrently by using the steps at the following link for a soft reset of the
service processor: https://www.ibm.com/support/knowledgecenter/POWER8/p8hby/p8hby_softreset.htm.
Checking that Safe mode is not active on the system requires a dynamic
celogin password from IBM Support to use the service processor command line:
1) Log into ASMI as celogin with the dynamic celogin password
generated by IBM Support
2) Select System Service Aids
3) Select Service Processor Command Line
4) Enter "tmgtclient --query_mode_and_function" from the command line
The first line of the output, "currSysPwrMode", should say "NOMINAL",
which means the system is in normal mode and Safe mode is not
active.
- On systems using PowerVM firmware, a problem
was fixed for an error log with SRC BA210003 during a partition boot
from a USB 3.0 device. The error is triggered by an Open Firmware
entry to the trace buffer during the partition boot. The error
log can be ignored, as the boot to the OS is successful.
- On systems using PowerVM firmware, a problem
was fixed for a partition boot fail or hang from a Fibre Channel device
having fabric faults. Some of the fabric errors returned by the
VIOS were not interpreted correctly by the Open Firmware VFC driver,
causing the hang instead of generating helpful error logs.
- On systems using PowerVM firmware, a problem
was fixed for a power off hanging at D200C1FF caused by a vNIC VF
failover error with SRC B200F011. The power off hang error is
infrequent because it requires that a VF failover error has occurred
first. The system can be recovered by using the power off
immediate option from the Hardware Management Console (HMC).
- On systems using PowerVM firmware, a problem was fixed for
the incorrect reporting of the Universally Unique Identifier (UUID) to
the OS, which prevented the tracking of a partition as it moved within
a data center. The UUID value as seen on HMC or the NovaLink did
not match the value as displayed in the OS.
- On systems using the OPAL firmware, Petitboot was updated
to v1.4.2 from v1.2.7, including the following update:
A problem was fixed for the User Interface server connect message to
make it clearer. The previous message mentioned a "server", which
could give the misleading impression that the user interface was waiting
for a remote network server. The delay is actually in waiting for the
pb-discover process to be ready.
More information for the Petitboot changes can be found at the
following link: http://git.ozlabs.org/?p=petitboot;a=tags
- On systems using OPAL firmware, Skiboot was updated to
v5.4.6 from v5.3.7, including the following updates:
- The firmware progress sensor is now set properly. OPAL was
incorrectly setting the firmware status on sensor ID "00", which doesn't
exist.
- The error log timeout now applies only to sending the error log to
the service processor. This significantly reduces false timeout
errors.
- A problem was fixed for excessive "Poller recursion detected" error
messages during skiboot that could require a power off to recover
from the error.
- A problem was fixed for an unnecessary error message when a reset
occurs on an empty PCIe Host Bridge (PHB) - no PCIe adapters
attached. The extra error message occurs anytime the PHBs in the
system go through error recovery.
- A problem was fixed to fence off an errant PCIe Host Bridge (PHB)
during a complete reset to allow the kernel to retry the
operation. This helps the system recovery process by guarding out
the bad hardware to prevent a fatal error loop.
- A problem was fixed for unknown command messages in the OPAL
log after a Host-Initiated Reset/Reload of the service processor.
- A problem was fixed for the I2C bus locking that sometimes caused an OPAL
crash with "double unlock()" detected.
- A problem was fixed for OPAL kernel lockups when the IPMI SOL console
became unresponsive. The console can now become full and drop
messages, but this prevents the lockup of the host kernel.
- A problem was fixed for service processor time-out messages being
interpreted as "success" by OPAL, preventing correct error reporting
and recovery actions.
- A problem was fixed for a kernel hang caused by queued messages
needing to be sent to the service processor during a reset/reload of
the service processor. The messages are now cached and sent when
the service processor is ready to receive after a reset/reload.
- A problem was fixed for a soft lockup of the kernel that occurred
because of RTC/TOD clock errors during a Host-initiated Reset/Reload of
the service processor. A frozen process would be seen on the host
system along with this message: "NMI watchdog: BUG: soft
lockup - CPU#57 stuck for 23s!" where the CPU number would vary.
More information on the Skiboot changes can be found at the following
link: https://github.com/open-power/skiboot/tree/master/doc/release-notes.
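The SOL console fix in the Skiboot list above trades lost console messages for a host kernel that keeps running. A minimal Python sketch of that trade-off: a bounded buffer whose writer never blocks and instead drops and counts messages when the buffer is full. The class is hypothetical, and this sketch drops the newest message; the policy OPAL actually uses is not described here.

```python
from collections import deque


class ConsoleBuffer:
    """Hypothetical bounded console buffer that never blocks the writer."""

    def __init__(self, capacity):
        self._buf = deque()
        self.capacity = capacity
        self.dropped = 0

    def write(self, msg):
        if len(self._buf) >= self.capacity:
            self.dropped += 1  # drop instead of waiting on an unresponsive reader
            return False
        self._buf.append(msg)
        return True

    def drain(self):
        """Hand all buffered messages to the reader and empty the buffer."""
        out = list(self._buf)
        self._buf.clear()
        return out


buf = ConsoleBuffer(capacity=3)
accepted = [buf.write(f"line {i}") for i in range(5)]
print(accepted)                  # [True, True, True, False, False]
print(buf.drain(), buf.dropped)  # ['line 0', 'line 1', 'line 2'] 2
```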
- For the IBM Power System E850C (8408-44E), a problem was
fixed for fans spinning too slowly in the power supply with feature
#EB3M and part number 001KU578, with SRC 110015xf logged,
where x is 1, 2, 3, or 4 depending on which power supply has the failing
fan.
- On systems using PowerVM firmware, a problem was fixed for
an error finding the partition load source that has a GPT format.
GUID Partition Table (GPT) is a standard for the layout of the
partition table on a physical storage device used in the server, such
as a hard disk drive or solid-state drive, using globally unique
identifiers (GUID). Other drives that are working may be using
the older master boot record (MBR) partition table format. This
problem occurs whenever a load source using the GPT format is in any
entry of the boot table other than the first. Without the fix, a
GPT disk drive must be the first entry in the boot table to be able to
use it to boot a partition.
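To make the GPT/MBR distinction concrete, the following Python sketch classifies a disk image by inspecting its first two sectors. Both formats end sector 0 with the 0x55AA boot signature (a GPT disk keeps a "protective MBR" there, with partition type 0xEE), and a GPT disk carries the "EFI PART" header signature at LBA 1. It assumes 512-byte logical sectors, and the function names are hypothetical. It illustrates the on-disk formats only, not how the firmware scans the boot table.

```python
SECTOR = 512  # assumption: 512-byte logical sectors


def mbr_partition_types(data: bytes) -> list:
    """Return the type byte of each of the four MBR partition entries."""
    # Entries are 16 bytes each, starting at offset 446; the type is at +4.
    return [data[446 + 16 * i + 4] for i in range(4)]


def partition_table_format(data: bytes) -> str:
    """Classify a disk from its first two sectors: "GPT", "MBR", or "none"."""
    if len(data) < 2 * SECTOR:
        raise ValueError("need at least the first two sectors")
    if data[510:512] != b"\x55\xaa":
        return "none"  # no boot signature, so no recognizable partition table
    if data[SECTOR:SECTOR + 8] == b"EFI PART":
        return "GPT"  # GPT header found at LBA 1
    return "MBR"


# Synthetic images built in memory for demonstration.
mbr_disk = bytearray(2 * SECTOR)
mbr_disk[510:512] = b"\x55\xaa"
mbr_disk[446 + 4] = 0x83  # a conventional Linux partition entry

gpt_disk = bytearray(2 * SECTOR)
gpt_disk[510:512] = b"\x55\xaa"
gpt_disk[446 + 4] = 0xEE  # protective MBR entry
gpt_disk[SECTOR:SECTOR + 8] = b"EFI PART"

print(partition_table_format(bytes(mbr_disk)))        # MBR
print(partition_table_format(bytes(gpt_disk)))        # GPT
print(hex(mbr_partition_types(bytes(gpt_disk))[0]))   # 0xee
```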
- On systems using PowerVM firmware, a problem was fixed for
an SRC BA090006 serviceable event log occurring whenever an attempt was
made to boot from an ALUA (Asymmetric Logical Unit Access)
drive. These drives are always busy by design and cannot be used
for a partition boot, but no service action is required if a user
inadvertently tries to do that. Therefore, the SRC was changed to
be an informational log.
SV860_096_056 / FW860.21
06/07/17
Impact: Availability
Severity: ATT
Power System S812L (8247-21L), Power System S822L (8247-22L), and
Power System S824L (8247-42L) servers only.
System firmware changes that affect certain systems
- On systems using
the OPAL firmware, a problem was fixed for an IPMI console hang to OPAL
that caused the Linux host to be hung for SSH sessions and for ipmitool
commands to fail with "Error in open session response message :
insufficient resources for session" error messages on the service
processor. An error log with SRC B1818601 is reported
for the service processor IPMI failure and multiple SRC BB822210
error logs are reported for OPAL message time outs to the service
processor. In most cases, this error can be recovered from by
doing a soft reset of the service processor using the following steps
from the IBM Knowledge Center: https://www.ibm.com/support/knowledgecenter/POWER8/p8hby/p8hby_softreset.htm.
SV860_082_056 / FW860.20
03/17/17
Impact: Availability
Severity: SPE
Power System S812L (8247-21L), Power System S822L (8247-22L), Power
System S824L (8247-42L), Power System S812 (8284-21A), Power System
S822 (8284-22A), Power System S814 (8286-41A), Power System S824
(8286-42A), Power System E850 (8408-E8E), and Power System E850C
(8408-44E) servers only.
New features and functions
- Support was added for the Redfish API for provisioning of Power
Management tunable (EnergyScale) parameters. The Redfish Scalable
Platforms Management API ("Redfish") is a DMTF specification that uses
RESTful interface semantics to perform out-of-band systems
management (http://www.dmtf.org/standards/redfish).
The Redfish service enables platform management tasks to be controlled by
client scripts developed using secure and modern programming paradigms.
For systems with redundant service processors, the Redfish service is
accessible only on the primary service processor. Usage
information for the Redfish service is available at the following
IBM Knowledge Center link: https://www.ibm.com/support/knowledgecenter/en/POWER8/p8hdx/p8_workingwithconsoles.htm.
The IBM Power server supports DMTF Redfish API (DSP0266, version 1.0.3
published 2016-06-17) for systems management.
A copy of the Redfish schema files in JSON format published by the
DMTF (http://redfish.dmtf.org/schemas/v1/)
is packaged in the firmware image.
The schema files are distributed on-chip to enable proper functioning
in deployments with no WAN connectivity.
IBM extensions to the Redfish schema are published at http://public.dhe.ibm.com/systems/power/redfish/schemas/v1.
Copyright notices for the DMTF Redfish API and schemas are at: (a) http://www.dmtf.org/about/policies/copyright,
and (b) http://redfish.dmtf.org/schemas/README8010.html.
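As a starting point for such client scripts, here is a minimal Python sketch of opening a Redfish session as defined in the DMTF specification. The client POSTs a user name and password to the Sessions collection, and on success the response carries an X-Auth-Token header and the new session's Location. The sketch assumes the service processor exposes the standard /redfish/v1/SessionService/Sessions collection and uses a self-signed certificate; confirm both against the IBM Knowledge Center usage information. The address, user name, and password are placeholders.

```python
import json
import ssl
import urllib.request


def build_session_request(fsp_ip: str, user: str, password: str) -> urllib.request.Request:
    """Build the POST that creates a Redfish session (DMTF session login)."""
    body = json.dumps({"UserName": user, "Password": password}).encode("utf-8")
    return urllib.request.Request(
        f"https://{fsp_ip}/redfish/v1/SessionService/Sessions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def open_session(fsp_ip: str, user: str, password: str):
    """Return (token, session_uri) for use in later requests.

    Assumption: self-signed certificate, so verification is disabled;
    enable it if the service processor has a trusted certificate.
    """
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    req = build_session_request(fsp_ip, user, password)
    with urllib.request.urlopen(req, context=ctx) as resp:
        return resp.headers["X-Auth-Token"], resp.headers.get("Location")


# Inspect the request offline; nothing is sent.
req = build_session_request("192.0.2.10", "admin", "example-password")
print(req.get_method(), req.full_url)
```

Later requests pass the token in an X-Auth-Token header, and an HTTP DELETE on the returned session URI ends the session.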
- Support for the IBM Power System S812 (8284-21A) with a
single-partition system running either AIX (FC #EPXQ, 4-core
3.026GHz 130W module, CCIN 54E9) or IBM i (FC #EPXP, 1-core 3.026GHz
130W module, CCIN 54E9) for the operating system.
- Support was added to reduce memory usage for shared SR-IOV
adapters.
- Support for the Advanced System Management Interface (ASMI)
was changed to allow the characters "I", "O", and "Q" to be
entered for the serial number of the I/O Enclosure under the Configure
I/O Enclosure option. These characters appear in IBM serial numbers
only rarely, so typing them will normally be an incorrect action.
However, ASMI no longer blocks their entry, so the exception
case is supported. Without the enhancement, typing one of these
characters causes the message "Invalid serial number" to be displayed.
- On systems using PowerVM firmware, support was added to
allow the IBM i OS on the Power System S822 (8284-22A) without the need
for a VET code.
System firmware changes that affect all systems
- A problem was fixed for the setting that disables the periodic
notification for a call home error log SRC B150F138 for Memory Buffer
resources (membuf) from the Advanced System Management Interface (ASMI).
- A problem was fixed for the call home data for the B1xx2A01
SRC to include the min/max/average readings for more values. The
values for processor utilization, memory utilization, and node power
usage were added.
- A problem was fixed for incorrect callouts of the Power
Management Controller (PMC) hardware with SRC B1112AC4 and SRC
B1112AB2 logged. These extra callouts occur when the On-Chip
Controller (OCC) has placed the system in the safe state for a prior
failure that is the real problem that needs to be resolved.
- A problem was fixed for System Vital Product Data (SVPD)
FRUs being guarded but not having a corresponding error log
entry. This is a failure to commit the error log entry that has
occurred only rarely.
- A problem was fixed for the failover to the backup PNOR on
a Hostboot Self Boot Engine (SBE) failure. Without the fix, the
failed SBE causes loss of processors and memory with B15050AD
logged. With the fix, the SBE is able to access the backup PNOR
and IPL successfully by deconfiguring the failing PNOR and calling it
out as a failed FRU.
- A problem was fixed for the OS not being able to detect the
USB connected Uninterruptible Power Supply (UPS) that has feature code
#ECCF. An informational SRC B1814616 is logged from the service
processor and the IBM i OS logs a CPI0961 (Uninterruptible power supply
no longer attached). The error occurs infrequently because it
depends on system timing and system configuration. If a system is
having the error, it might have it on every IPL. The
circumvention is to reseat the USB cable connector for the USB
connected UPS.
- A problem was fixed for the Advanced System Management
Interface (ASMI) "System Service Aids => Error/Event Logs" panel not
showing the "Clear" and "Show" log options and also having a truncated
error log when there are a large number of error logs on the system.
- A problem was fixed for IPMI process core dumps for DCMI
commands used to gather power and thermal data. These dumps occur
intermittently if the DCMI commands are used in a repetitive loop.
- A problem was fixed to allow changing the IPMI channel
authentication capabilities from the OS. The command
"ipmitool channel authcap 1 4" was causing an IPMI core dump every time
it was run.
- A problem was fixed for a system going into safe mode with SRC
B1502616 logged as informational without a call home
notification. Notification is needed because the system is
running with reduced performance. If there are unrecoverable
error logs and any are marked with reduced performance and the system
has not been rebooted, then the system is probably running in safe mode
with reduced performance. With the fix, SRC B1502616 is logged as an
Unrecoverable Error (UE).
- A problem was fixed for valid IPv4 static IP addresses not
being allowed to communicate on the network and not being allowed to be
configured.
The Advanced System Management Interface (ASMI) static IPv4
address configuration was not allowing "255" in the IP address
subfields. The corrected range checking is as follows:
Allowed values: x.255.x.x, x.x.255.x, x.255.255.x
Disallowed values: x.x.x.255
The failure for the communication on the network is seen if the
problematic IP addresses are in use prior to a firmware update to
860.00, 860.10, 860.11, or 860.12. After the firmware update, the
service processor is unable to communicate on the network. The
problem can be circumvented by changing the service processor to use
DHCP addressing, or by moving the IP address to a different static IP
range, prior to doing the firmware update.
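The corrected range check can be expressed as a small Python validator. It encodes only the two rules listed above (255 allowed in the second and third octets, not allowed in the last octet); any other checks the ASMI performs, including any rule for the first octet, are not described here and are not modeled. The function name is hypothetical.

```python
import ipaddress


def asmi_static_ipv4_allowed(addr: str) -> bool:
    """Hypothetical check mirroring the corrected ASMI range rules above."""
    try:
        octets = ipaddress.IPv4Address(addr).packed  # 4 bytes, one per octet
    except ValueError:  # AddressValueError is a ValueError subclass
        return False  # not a well-formed dotted-quad address
    # x.255.x.x, x.x.255.x, and x.255.255.x are allowed; x.x.x.255 is not.
    return octets[3] != 255


for candidate in ("9.255.10.20", "9.10.255.20", "9.255.255.20", "9.10.20.255"):
    print(candidate, asmi_static_ipv4_allowed(candidate))
```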
- A problem was fixed for intermittent failures of DCMI commands
when used from the HMC to continuously gather power and thermal
data. The maximum number of IPMI sessions was being exceeded by
the HMC. The number of IPMI sessions has been increased to allow
two HMCs to collect data simultaneously.
- A problem was fixed for an unneeded service action request
for an informational VRM redundant phase fail error logged with SRC
11002701. If reminders for service action with SRC B150F138
are occurring for this problem, then firmware containing the fix needs
to be installed and ASMI error logs need to be cleared in order to stop
the periodic reminder.
System firmware changes that affect certain systems
- On systems using PowerVM firmware with PowerVM NovaLink, a problem was
fixed for returning to HMC-only management from co-management when a
NovaLink partition holding the master mode is deleted. A
circumvention is to release master mode before deleting the NovaLink
partition and then reconnect the disconnected management console.
Please refer to IBM Knowledge Center link "http://ibm.biz/novalink-kc" for
more information on the PowerVM NovaLink feature and changing the
master authority when doing co-management.
- On systems using PowerVM firmware, a problem was
fixed for a blank SRC in the LPA dump for user-initiated non-disruptive
adjunct dumps. The A2D03004 SRC is needed for problem
determination and dump analysis.
- A problem was fixed for the system VPD showing 4 extra PCIe
slots that are not actually available to the system. When running
an IBM i partition, the IBM i Hardware Service Manager shows twelve
PCIe adapter slots instead of the actual eight that can be used (P1-C2,
P1-C3, P1-C4, and P1-C5 are the extra slots displayed). This
problem only pertains to the IBM Power System S814 (8286-41A).
- On a system using PowerVM firmware with an IBM i partition
and VIOS, a problem was fixed for a Live Partition Mobility
migration of an IBM i partition that fails if there is a VIOS failover
during the migration suspended window.
- On a system using PowerVM firmware and VIOS, a
problem was fixed for an HMC "Incomplete State" after a Live Partition
Mobility migration followed by a VIOS failover. The error is
triggered by a delete operation on a migration adapter on the VIOS that
did the failover. The HMC "Incomplete State" can be recovered
from by doing a re-IPL of the system. This error can also prevent
a VIOS from activating.
- On systems using PowerVM firmware, a problem was fixed with
SR-IOV adapter error recovery where the adapter is left in a failed
state in nested error cases for some adapter errors. The
probability of this occurring is very low since the problem trigger is
multiple low-level adapter failures. With the fix, the adapter is
recovered and returned to an operational state.
- On systems using PowerVM firmware with PCIe adapters
in Single Root I/O Virtualization (SR-IOV) shared mode, a problem was
fixed for the hypervisor SR-IOV adjunct partition failing during the
IPL with SRCs B200F011 and B2009014 logged. The SR-IOV adjunct
partition successfully recovers after it reboots and the system is
operational.
- On systems using PowerVM firmware with PCIe adapters in
Single Root I/O Virtualization (SR-IOV) shared-mode in a PCIe slot with
Enlarged IO Capacity and 2TB or more of system memory, a problem was
fixed for the hypervisor SR-IOV adjunct partition failing during
the IPL with SRCs B200F011 and B2009014 logged. In this
configuration, it is possible the SR-IOV adapter will not become
functional following a system reboot or when an adapter is first
configured into shared-mode. Larger system memory configurations
(2TB or more) are more likely to encounter the problem.
The problem can be avoided by reducing the number of PCIe slots with
Enlarged IO Capacity enabled so it does not include adapters in SR-IOV
shared-mode. Another circumvention option is to move the adapter
to an SR-IOV capable PCIe slot where Enlarged IO Capacity is not
enabled.
- On a system using PowerVM firmware and VIOS, a
problem was fixed for a Live Partition Mobility (LPM) migration for an
Active Memory Sharing (AMS) partition that hangs if there is a VIOS
failover during the migration.
- On systems using PowerVM firmware, a problem was fixed for
the PCIe3 Optical Cable Adapter for the PCIe3 Expansion Drawer failing
with SRC B7006A84 error logged during the IPL. The failed cable
adapter can be recovered by using a concurrent repair operation to
power it off and on. Alternatively, the system can be re-IPLed to
recover the cable adapter. The affected optical cable adapters
have feature codes #EJ05, #EJ06, and #EJ08 with CCINs 2B1C, 6B52, and
2CE2, respectively.
- On systems using PowerVM firmware, the hypervisor "vsp"
macro was enhanced to show the type of the adjunct partition. The
"vsp -longname" macro option was also updated to list the location
codes for the SR-IOV adjunct partitions. The hypervisor macros
are used by IBM support to help debug Power system problems.
- On systems using PowerVM firmware, a problem was fixed for
PCIe Host Bridge (PHB) outages and PCIe adapter failures in the PCIe
I/O expansion drawer caused by error thresholds being exceeded for the
LEM bit [21] errors in the FIR accumulator. These are typically
minor and expected errors in the PHB that occur during adapter updates
and do not warrant a PHB reset or PCIe adapter failures. Therefore,
the LEM[21] error threshold has been increased and the LEM fatal error
has been changed to a Predictive Error to avoid the outages for this
condition.
- On systems using PowerVM firmware, a problem was fixed for
PCIe3 I/O expansion drawer (#EMX0) link stability. The
settings for the continuous time linear equalizers (CTLE) were updated
for all the PCIe adapters for the PCIe links to the expansion
drawer. The CEC must be re-IPLed for the fix to activate.
- On systems using PowerVM firmware with IBM i partitions, a
problem was fixed for frequent logging of informational B7005120 errors
due to communications path closed conditions during messaging from HMCs
to IBM i partitions. In the majority of cases, these errors are due
to normal operating conditions and not due to errors that require
service or attention. The logging of informational errors due to
this specific communications path closed condition that are the result
of normal operating conditions has been removed.
- On a system using PowerVM firmware with an IBM i
partition, a problem was fixed for a D-mode boot failure for IBM
i from a USB RDX cartridge. There is a hang at the LPAR
progress code C2004130 for a period of time and then a failure with SRC
B2004158 logged. There is a USB External Dock (FC #EU04) and
Removable Disk Cartridge (RDX) 63B8-005 attached. The error is
intermittent so the RDX can be powered off and back on to retry the
D-mode boot to recover.
- On systems using the OPAL firmware, Petitboot was updated
to v1.2.7. It is now less verbose during boot; only
error-level messages are printed during Petitboot bootloader
initialization. This means that there will be fewer messages
printed as the system boots. Additionally, the Petitboot user interface
is started earlier in the boot process. This means that the user will
be presented with the user interface sooner, but it may still take
time, potentially up to 30 seconds, for the user interface to be
populated with boot options as storage and network hardware is being
initialized. During this time, Petitboot will show the status
message "Info: Waiting for device discovery". When Petitboot
device discovery is completed, the following status message will be
shown "Info: Connected to pb-discover!".
- On systems using PowerVM firmware, the following
problems were fixed for SR-IOV adapters:
1) Insufficient resources reported for an SR-IOV logical port configured
with promiscuous mode enabled and a Port VLAN ID (PVID) when creating a
new interface on the SR-IOV adapters.
2) Spontaneous dumps and reboot of the adjunct partition for SR-IOV
adapters.
3) Adapter enters a firmware loop when a single-bit ECC error is
detected. System firmware detects this condition as an adapter
command timeout. System firmware will reset and restart the
adapter to recover the adapter functionality. This condition will
be reported as a temporary adapter hardware failure.
4) vNIC interfaces not being deleted correctly, causing SRC
B400FF01 to be logged and Data Storage Interrupt (DSI) errors with
a failure on boot of the LPAR.
This set of fixes updates adapter firmware to 10.2.252.1926 for the
following Feature Codes: EN15, EN16, EN17, EN18, EN0H, EN0J, EN0M,
EN0N, EN0K, EN0L, EL38, EL3C, EL56, and EL57.
The SR-IOV adapter firmware level update for the shared-mode adapters
happens under user control to prevent unexpected temporary outages on
the adapters. A system reboot will update all SR-IOV shared-mode
adapters with the new firmware level. In addition, when an
adapter is first set to SR-IOV shared mode, the adapter firmware is
updated to the latest level available with the system firmware (and it
is also updated automatically during maintenance operations, such as
when the adapter is stopped or replaced). And lastly, selective
manual updates of the SR-IOV adapters can be performed using the
Hardware Management Console (HMC). To selectively update the
adapter firmware, follow the steps given at the IBM Knowledge Center
for using HMC to make the updates: https://www.ibm.com/support/knowledgecenter/HW4M4/p8efd/p8efd_updating_sriov_firmware.htm.
Note: Adapters that are capable of running in SR-IOV mode, but are
currently running in dedicated mode and assigned to a partition, can be
updated concurrently either by the OS that owns the adapter or the
managing HMC (if OS is AIX or VIOS and RMC is running).
- On systems using PowerVM firmware with an IBM i partition,
a problem was fixed for incorrect maximum performance reports based on
the wrong number of "maximum" processors for the system.
Certain performance reports that can be generated on IBM i systems
contain not only the existing machine information, but also "what-if"
information, such as "how would this system perform if it had all the
processors possible installed in this system". This "what-if"
report was in error because the maximum number of processors possible
was too high for the system.
- On systems using PowerVM firmware, a problem was fixed for
degraded PCIe3 links for the PCIe3 expansion drawer with SRC B7006A8F
not being visible on the HMC. This occurred because the SRC was
informational. The problem occurs when the link attaching a
drawer to the system trains to x8 instead of x16. With the fix,
the SRC has been changed to a B7006A8B permanent error for the
degraded link.
- On systems using PowerVM firmware, a problem was fixed for
a concurrent exchange of a CAPI adapter that left the new adapter in a
deactivated state. The system can be powered off and IPLed
again to recover the new adapter. The CAPI adapters have the
following feature codes: #EC3E, #EC3F, #EC3L, #EC3M, #EC3T,
#EC3U, #EJ16, #EJ17, #EJ18, #EJ1A, and #EJ1B.
- On a system using PowerVM firmware with SR-IOV
adapters, a problem was fixed for a DLPAR remove on a Virtual
Function (VF) of a ConnectX-4 (CX4) adapter that failed with AIX error
"0931-013 Unable to isolate the resource". The HMC reported error
is "HSCL12B5 The operation to remove SR-IOV logical port xx
failed because of the following error: HSCL131D The SR-IOV logical port
is still in use by the partition". The failing PCIe3 adapters are
sourced from Mellanox Corporation based on ConnectX-4 technology and
have the following feature codes and CCINs: #EC3E, #EC3F with
CCIN 2CEA; #EC3L and #EC3M with CCIN 2CEC; and #EC3T and #EC3U with
CCIN 2CEB. The issue occurs each time a DLPAR remove operation is
attempted on the VF. Restarting the partition after a failed
DLPAR remove recovers from the error.
- A problem was fixed for the serial port being disabled on
the service processor for the IBM Power System E850C
(8408-44E). There was no response when a device was plugged into the serial port.
- On systems using PowerVM firmware, a problem was fixed for
NVRAM corruption that can occur when deleting a partition that owns a
CAPI adapter, if that CAPI adapter is not assigned to another partition
before the system is powered off. On a subsequent IPL, the system
will come up in recovery mode if there is NVRAM corruption. To
recover, the partitions must be restored from the HMC. The
frequency of this error is expected to be rare. The CAPI adapters
have the following feature codes: #EC3E, #EC3F, #EC3L, #EC3M,
#EC3T, #EC3U, #EJ16, #EJ17, #EJ18, #EJ1A, and #EJ1B.
- On systems using PowerVM firmware, a problem was fixed for
NVRAM corruption and an HMC recovery state when using Simplified Remote
Restart partitions. The failing systems will have at least one
Remote Restart partition, and on the failed IPL there will be a
B7005301 SRC with word 7 being 0x00000002.
- On systems using PowerVM firmware, a problem was fixed for
a group of shared processor partitions being able to exceed the
designated capacity placed on a shared processor pool. This error
can be triggered by using the DLPAR move function for the shared
processor partitions, if the pool has already reached its maximum
specified capacity. To prevent this problem from occurring when
making DLPAR changes when the pool is at the maximum capacity, do not
use the DLPAR move operation but instead break it into two steps:
DLPAR remove followed by DLPAR add. This gives enough time for
the DLPAR remove to be fully completed prior to starting the DLPAR add
request.
- On systems using PowerVM firmware, a problem was fixed for
partition boot failures and run time DLPAR failures when adding I/O
that log BA210000, BA210003, and/or BA210005 errors. The fix also
applies to run time failures configuring an I/O adapter following an
EEH recovery that log BA188001 events. The problem can impact
IBM i partitions running in any processor mode or AIX/Linux partitions
running in P7 (or older) processor compatibility modes. The
problem is most likely to occur when the system is configured in the
Manufacturing Default Configuration (MDC) mode. The trigger for
the problem is a race-condition between the hypervisor and the physical
operations panel with a very rare frequency of occurrence.
|
SV860_070_056 / FW860.12
01/13/17 |
Impact: Availability
Severity: SPE
Power System S812L (8247-21L), Power
System S822L (8247-22L), Power System S824L (8247-42L), Power
System S822
(8284-22A), Power System S814 (8286-41A), Power System S824
(8286-42A) and Power System
E850C (8408-44E) servers only.
|
SV860_063_056 / FW860.11
12/05/16 |
Impact: Availability
Severity: SPE
Power System S812L (8247-21L), Power
System S822L (8247-22L), Power System S824L (8247-42L), Power
System S822
(8284-22A), Power System S814 (8286-41A), Power System S824
(8286-42A) and Power System
E850C (8408-44E) servers only.
System firmware changes that
affect certain systems
- DEFERRED: A problem
was fixed for a Field Core Override (FCO) error
that causes a processor chip without functional cores to be guarded
with an SRC B111BA24 error logged and, by guard association, causes all
the memory and I/O resources behind the processor chip to be lost for
the current IPL. This problem is triggered by a system
being manufactured with one or more feature codes of #2319
(Factory Deconfiguration of 1-core) to assist with optimization of
software licensing. For more information on Field Core Override,
refer to IBM Knowledge Center: http://www.ibm.com/support/knowledgecenter/POWER8/p8hby/fieldcore.htm.
The error only occurs in systems where the total number of active cores
is less than the number of processor chips. When the fix is
applied on a system that has lost memory or I/O resources due to the
errant processor guard, the system must be re-IPLed with the guard
removed from the processor to recover the resources.
Without the fix, the problem may be circumvented by the following four
steps:
1) Power off the system.
2) Use the Field Core Override function to increase the number of
active processor cores in the system. The Advanced System
Management Interface (ASMI) "System Configuration -> Hardware
Deconfiguration -> Field Core Override" panel shows the number of
cores that are active in the system and it can be used to increase the
number of active processor cores in the system.
3) Unguard the failed processor. Use the ASMI "System
Configuration -> Hardware Deconfiguration -> Clear All
Deconfiguration Errors" panel to restore the guarded processor.
4) IPL with the increased number of active processor cores and the
unguarded processor.
This problem does not pertain to the IBM Power System E850C (8408-44E)
model.
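The exposure condition stated above (fewer active cores than processor chips) is a pigeonhole argument: with fewer active cores than chips, at least one chip is left without functional cores, and that is the kind of chip the FCO error guards. A minimal sketch, assuming the active-core and chip counts are already known (for example, from the ASMI Field Core Override panel); all function names are hypothetical. The core-increase helper returns only a lower bound, since how the Field Core Override spreads active cores across chips is not described here.

```python
from typing import List


def has_fewer_active_cores_than_chips(total_active_cores: int, processor_chips: int) -> bool:
    # Pigeonhole: fewer active cores than chips means at least one chip
    # must have zero active cores, the precondition for this FCO error.
    return total_active_cores < processor_chips


def chips_without_active_cores(active_cores_per_chip: List[int]) -> List[int]:
    # Indices of chips with no active cores (candidates for the errant guard).
    return [chip for chip, cores in enumerate(active_cores_per_chip) if cores == 0]


def min_additional_active_cores(total_active_cores: int, processor_chips: int) -> int:
    # Lower bound on the extra cores to activate in circumvention step 2 so
    # the total is no longer below the chip count. Reaching this total does
    # not by itself guarantee every chip gets a core (assumption not modeled).
    return max(0, processor_chips - total_active_cores)
```

For a hypothetical system with 4 processor chips and 3 active cores, `has_fewer_active_cores_than_chips(3, 4)` is `True` and `min_additional_active_cores(3, 4)` is `1`.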
|
SV860_056_056 / FW860.10
11/18/16 |
Impact: New
Severity: New
Power System S812L (8247-21L), Power
System S822L (8247-22L), Power System S824L (8247-42L), Power
System S822
(8284-22A), Power System S814 (8286-41A), Power System S824
(8286-42A) and Power System
E850C (8408-44E) servers only.
System firmware changes that affect certain systems
- DISRUPTIVE: On systems
using the PowerVM firmware, a problem was fixed for an "Incomplete"
state caused by initiating a resource dump with selector macros from
NovaLink (vio -dump -lp 1 -fr). The failure causes the stack frame
size of the communication process HVHMCCMDRTRTASK to be exceeded,
resulting in a hypervisor page fault that disrupts the NovaLink and/or
HMC communications. The recovery action is to re-IPL the CEC, but that
must be done without the assistance of the management console.
Shut down each partition that has an OS running on the system from
that OS. Then, from the Advanced System Management
Interface (ASMI), power off the managed system.
Alternatively, the system power button may be used to do the power
off. If the management console Incomplete state persists after the
power off, the managed system should be rebuilt from the management
console. For more information on management console recovery steps,
refer to this IBM Knowledge Center link: https://www.ibm.com/support/knowledgecenter/en/POWER7/p7eav/aremanagedsystemstate_incomplete.htm.
The fix is disruptive because the size of the PowerVM hypervisor must
be increased to accommodate the over-sized stack frame of the failing
task.
- DEFERRED: On systems using
the PowerVM firmware, a problem was fixed for a CAPI function
unavailable condition on a system with the maximum number of CAPI
adapters and partitions. Not enough bytes were allocated for CAPI
for the maximum configuration case. The problem may be circumvented
by reducing the number of active partitions or CAPI adapters.
The fix is deferred because the size of the hypervisor must be
increased to provide the additional CAPI space.
- DEFERRED: On systems using PowerVM
firmware, a problem was fixed for cable-card-capable PCI slots that
fail during the IPL. Hypervisor I/O Bus Interface UE B7006A84 is
reported for each cable-card-capable PCI slot that doesn't contain a
PCIe3 Optical Cable Adapter for the PCIe Expansion Drawer (feature code
#EJ05). PCI slots containing a cable card will not report an error but
will not be functional. The problem can be resolved by performing an
AC cycle of the system. The failure is triggered by a race condition
that leaves the I2C devices used to detect the cable cards in an
incorrect state when they come out of the power-on reset process.
|
SV860_039_039 / FW860.00
11/02/16 |
Impact: New
Severity: New
Power System E850C (8408-44E) servers only.
|