SC860
For Impact, Severity, and other firmware definitions, please
refer to the 'Glossary of firmware terms' URL below:
http://www14.software.ibm.com/webapp/set2/sas/f/power5cm/home.html#termdefs
The complete Firmware Fix History for this Release Level can be
reviewed at the following URL:
http://download.boulder.ibm.com/ibmdl/pub/software/server/firmware/SC-Firmware-Hist.html
|
SC860_082_056 / FW860.20
03/17/17 |
Impact: Availability
Severity: SPE
New features and functions
- Support for the Redfish API for provisioning of Power
Management tunable (EnergyScale) parameters. The Redfish Scalable
Platforms Management API ("Redfish") is a DMTF specification that uses
RESTful interface semantics to perform out-of-band systems
management. (http://www.dmtf.org/standards/redfish).
The Redfish service enables platform management tasks to be controlled by
client scripts developed using secure and modern programming paradigms.
For systems with redundant service processors, the Redfish service is
accessible only on the primary service processor. Usage
information for the Redfish service is available at the following
IBM Knowledge Center link: https://www.ibm.com/support/knowledgecenter/en/POWER8/p8hdx/p8_workingwithconsoles.htm.
The IBM Power server supports DMTF Redfish API (DSP0266, version 1.0.3
published 2016-06-17) for systems management.
A copy of the Redfish schema files in JSON format published by the
DMTF (http://redfish.dmtf.org/schemas/v1/)
are packaged in the firmware image.
The schema files are distributed on chip to enable proper functioning
in deployments with no WAN connectivity.
IBM extensions to the Redfish schema are published at http://public.dhe.ibm.com/systems/power/redfish/schemas/v1.
Copyright notices for the DMTF Redfish API and schemas are at: (a) http://www.dmtf.org/about/policies/copyright,
and (b) http://redfish.dmtf.org/schemas/README8010.html.
- Support added to reduce memory usage for shared SR-IOV
adapters.
- Support for the Advanced System Management Interface (ASMI)
was changed to allow the special characters of "I", "O", and "Q" to be
entered for the serial number of the I/O Enclosure under the Configure
I/O Enclosure option. These characters are rarely found in an
IBM serial number, so typing them is normally
an incorrect action. However, ASMI no longer blocks the special
characters, so that it can support the exception
case. Without the enhancement, typing one of the special
characters causes the message "Invalid serial number" to be displayed.
- Support was added to the Advanced System Management
Interface (ASMI) "System Service Aids => Cable Validation" to add a
timestamp showing when the cables were last validated.
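As a rough illustration of the Redfish support described above, the following Python sketch builds the service root URL defined by the DMTF Redfish specification (/redfish/v1/) and lists the top-level resource links found in a service root document. The host name and the sample JSON document are hypothetical placeholders and are not output from an IBM service processor. A real client would also need authentication and TLS settings suitable for the site, and on systems with redundant service processors it would have to target the primary service processor.

```python
import json

# Service root path defined by the DMTF Redfish specification.
REDFISH_ROOT_PATH = "/redfish/v1/"


def service_root_url(host: str) -> str:
    """Return the HTTPS URL of the Redfish service root for a host."""
    return f"https://{host}{REDFISH_ROOT_PATH}"


def top_level_links(service_root: dict) -> dict:
    """Map each top-level resource name to its @odata.id link."""
    links = {}
    for name, value in service_root.items():
        if isinstance(value, dict) and "@odata.id" in value:
            links[name] = value["@odata.id"]
    return links


# Hypothetical service root document, for illustration only.
SAMPLE_ROOT = json.loads("""
{
  "@odata.id": "/redfish/v1/",
  "RedfishVersion": "1.0.3",
  "Systems": {"@odata.id": "/redfish/v1/Systems"},
  "Chassis": {"@odata.id": "/redfish/v1/Chassis"}
}
""")

if __name__ == "__main__":
    # "primary-fsp.example.com" is a made-up host name.
    print(service_root_url("primary-fsp.example.com"))
    for name, link in top_level_links(SAMPLE_ROOT).items():
        print(name, "->", link)
```

The sketch deliberately avoids any network call so that it can be run anywhere; the resources actually exposed by a given firmware level should be taken from the IBM Knowledge Center usage information referenced above.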
System firmware changes that affect all systems
- A problem was fixed for setting the disable of a periodic notification for a call home
error log SRC B150F138 for Memory Buffer resources (membuf) from the
Advanced System Management Interface (ASMI).
- A problem was fixed for the call home data for the B1xx2A01
SRC to include the min/max/average readings for more values. The
values for processor utilization, memory utilization, and node power
usage were added.
- A problem was fixed for incorrect callouts of the Power
Management Controller (PMC) hardware with SRC B1112AC4 and SRC
B1112AB2 logged. These extra callouts occur when the On-Chip
Controller (OCC) has placed the system in the safe state for a prior
failure that is the real problem that needs to be resolved.
- A problem was fixed for System Vital Product Data (SVPD)
FRUs being guarded but not having a corresponding error log
entry. This is a failure to commit the error log entry that has
occurred only rarely.
- A problem was fixed for the failover to the backup PNOR on
a Hostboot Self Boot Engine (SBE) failure. Without the fix, the
failed SBE causes loss of processors and memory with B15050AD
logged. With the fix, the SBE is able to access the backup PNOR
and IPL successfully by deconfiguring the failing PNOR and calling it
out as a failed FRU.
- A problem was fixed for the Advanced System Management
Interface (ASMI) "System Service Aids => Error/Event Logs" panel not
showing the "Clear" and "Show" log options and also having a truncated
error log when there are a large number of error logs on the system.
- A problem was fixed for a system going into safe mode with SRC
B1502616 logged as informational without a call home
notification. Notification is needed because the system is
running with reduced performance. If there are unrecoverable
error logs and any are marked with reduced performance and the system
has not been rebooted, then the system is probably running in safe mode
with reduced performance. With the fix, the SRC B1502616 is logged as an
Unrecoverable Error (UE).
- A problem was fixed for valid IPv4 static IP addresses not
being allowed to communicate on the network and not being allowed to be
configured.
The Advanced System Management Interface (ASMI) static IPv4
address configuration was not allowing "255" in the IP address
subfields. The corrected range checking is as follows:
Allowed values: x.255.x.x, x.x.255.x, x.255.255.x
Disallowed values: x.x.x.255
The failure for the communication on the network is seen if the
problematic IP addresses are in use prior to a firmware update to
860.00, 860.10, 860.11, or 860.12. After the firmware update, the
service processor is unable to communicate on the network. The
problem can be circumvented by changing the service processor to use
DHCP addressing, or by moving the IP address to a different static IP
range, prior to doing the firmware update.
- A problem was fixed for corrupt service processor error log
entries caused by incorrect error log synchronization between primary
and backup service processor during firmware updates. At
the time of the corruption, a B1818601 is logged and a fipsdump is
generated. Then, during normal operations, a
periodic B1818A12 SRC may be logged as the corrupted error log
entries are encountered. No service action is needed for the
corrupted error logs as the old corrupted entries will be deleted as
new error logs are added as part of the error log housekeeping.
- A problem was fixed for an unneeded service action request
for an informational VRM redundant phase fail error logged with SRC
11002701. If reminders for service action with SRC B150F138
are occurring for this problem, then firmware containing the fix needs
to be installed and ASMI error logs need to be cleared in order to stop
the periodic reminder.
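The corrected ASMI range check for static IPv4 addresses described above can be sketched as follows. This is only an illustration of the stated rule (255 is accepted in the second and third subfields and rejected in the last subfield); it is not the ASMI implementation. The function name is hypothetical, and because the text does not say how the first subfield is treated, the sketch applies no extra restriction to it.

```python
def is_allowed_static_ipv4(address: str) -> bool:
    """Apply the range check described in the fix text to a dotted-quad string."""
    parts = address.split(".")
    if len(parts) != 4:
        return False
    octets = []
    for part in parts:
        # Each subfield must be a plain decimal number from 0 to 255.
        if not (part.isascii() and part.isdigit()):
            return False
        value = int(part)
        if value > 255:
            return False
        octets.append(value)
    # Stated rule: x.255.x.x, x.x.255.x, and x.255.255.x are allowed,
    # but x.x.x.255 is disallowed.
    return octets[3] != 255


if __name__ == "__main__":
    for candidate in ("10.255.1.20", "10.1.255.20", "10.255.255.20", "10.1.1.255"):
        print(candidate, is_allowed_static_ipv4(candidate))
```

A check like this could be run against the planned service processor addresses before updating from an affected level, as a complement to the DHCP or address-range circumventions listed above.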
System firmware changes that affect certain systems
- On systems using
PowerVM firmware, a problem was fixed for a blank SRC in the LPA
dump for user-initiated non-disruptive adjunct dumps. The
A2D03004 SRC is needed for problem determination and dump analysis.
- On a system using PowerVM firmware with an IBM i partition
and VIOS, a problem was fixed for a Live Partition Mobility
migration for an IBM i partition that fails if there is a VIOS failover
during the migration suspended window.
- On a system using PowerVM firmware and VIOS, a
problem was fixed for an HMC "Incomplete State" after a Live Partition
Mobility migration followed by a VIOS failover. The error is
triggered by a delete operation on a migration adapter on the VIOS that
did the failover. The HMC "Incomplete State" can be recovered
from by doing a re-IPL of the system. This error can also prevent
a VIOS from activating.
- On systems using PowerVM firmware, a problem was fixed with
SR-IOV adapter error recovery where the adapter is left in a failed
state in nested error cases for some adapter errors. The
probability of this occurring is very low since the problem trigger is
multiple low-level adapter failures. With the fix, the adapter is
recovered and returned to an operational state.
- On systems using PowerVM firmware with PCIe adapters
in Single Root I/O Virtualization (SR-IOV) shared mode, a problem was
fixed for the hypervisor SR-IOV adjunct partition failing during the
IPL with SRCs B200F011 and B2009014 logged. The SR-IOV adjunct
partition successfully recovers after it reboots and the system is
operational.
- On systems using PowerVM firmware with PCIe adapters in
Single Root I/O Virtualization (SR-IOV) shared-mode in a PCIe slot with
Enlarged IO Capacity and 2TB or more of system memory, a problem was
fixed for the hypervisor SR-IOV adjunct partition failing during
the IPL with SRCs B200F011 and B2009014 logged. In this
configuration, it is possible the SR-IOV adapter will not become
functional following a system reboot or when an adapter is first
configured into shared-mode. Larger system memory configurations
of 2TB or more than 1TB are more likely to encounter the problem.
The problem can be avoided by reducing the number of PCIe slots with
Enlarged IO Capacity enabled so it does not include adapters in SR-IOV
shared-mode. Another circumvention option is to move the adapter
to an SR-IOV capable PCIe slot where Enlarged IO Capacity is not
enabled.
- On a system using PowerVM firmware and VIOS, a
problem was fixed for a Live Partition Mobility (LPM) migration for an
Active Memory Sharing (AMS) partition that hangs if there is a VIOS
failover during the migration.
- On systems using PowerVM firmware, a problem was fixed for
the PCIe3 Optical Cable Adapter for the PCIe3 Expansion Drawer failing
with SRC B7006A84 error logged during the IPL. The failed cable
adapter can be recovered by using a concurrent repair operation to
power it off and on. Or the system can be re-IPLed to
recover the cable adapter. The affected optical cable adapters
have feature codes #EJ05, #EJ06, and #EJ08 with CCINs 2B1C, 6B52, and
2CE2, respectively.
- On systems using PowerVM firmware, the hypervisor "vsp"
macro was enhanced to show the type of the adjunct partition. The
"vsp -longname" macro option was also updated to list the location
codes for the SR-IOV adjunct partitions. The hypervisor macros
are used by IBM support to help debug Power system problems.
- On systems using PowerVM firmware, a problem was fixed for
PCIe Host Bridge (PHB) outages and PCIe adapter failures in the PCIe
I/O expansion drawer caused by error thresholds being exceeded for the
LEM bit [21] errors in the FIR accumulator. These are typically
minor and expected errors in the PHB that occur during adapter updates
and do not warrant a reset of the PHB and the PCIe adapter
failures. Therefore, the threshold LEM[21] error limit has been
increased and the LEM fatal error has been changed to a Predictive
Error to avoid the outages for this condition.
- On systems using PowerVM firmware, a problem was fixed to
improve PCIe3 I/O expansion drawer (#EMX0) link stability. The
settings for the continuous time linear equalizers (CTLE) were updated
for all the PCIe adapters for the PCIe links to the expansion drawer.
The CEC must be re-IPLed for the fix to activate.
- On systems using PowerVM firmware with IBM i partitions, a
problem was fixed for frequent logging of informational B7005120 errors
due to communications path closed conditions during messaging from HMCs
to IBM i partitions. In the majority of cases these errors are due
to normal operating conditions and not due to errors that require
service or attention. The logging of informational errors due to
this specific communications path closed condition that are the result
of normal operating conditions has been removed.
- On a system using PowerVM firmware with an IBM i
partition, a problem was fixed for a D-mode boot failure for IBM
i from a USB RDX cartridge. There is a hang at the LPAR
progress code C2004130 for a period of time and then a failure with SRC
B2004158 logged. There is a USB External Dock (FC #EU04) and
Removable Disk Cartridge (RDX) 63B8-005 attached. The error is
intermittent so the RDX can be powered off and back on to retry the
D-mode boot to recover.
- On systems using PowerVM firmware, the following
problems were fixed for SR-IOV adapters:
1) Insufficient resources reported for an SR-IOV logical port configured
with promiscuous mode enabled and a Port VLAN ID (PVID) when creating
a new interface on the SR-IOV adapters.
2) Spontaneous dumps and reboots of the adjunct partition for SR-IOV
adapters.
3) Adapter enters a firmware loop when a single bit ECC error is
detected. System firmware detects this condition as an adapter
command time out. System firmware will reset and restart the
adapter to recover the adapter functionality. This condition will
be reported as a temporary adapter hardware failure.
4) vNIC interfaces not being deleted correctly, causing SRC
B400FF01 to be logged and Data Storage Interrupt (DSI) errors with
failure on boot of the LPAR.
This set of fixes updates adapter firmware to 10.2.252.1926, for the
following Feature Codes: EN15, EN16, EN17, EN18, EN0H, EN0J, EN0M,
EN0N, EN0K, EN0L, EL38, EL3C, EL56, and EL57.
The SR-IOV adapter firmware level update for the shared-mode adapters
happens under user control to prevent unexpected temporary outages on
the adapters. A system reboot will update all SR-IOV shared-mode
adapters with the new firmware level. In addition, when an
adapter is first set to SR-IOV shared mode, the adapter firmware is
updated to the latest level available with the system firmware (and it
is also updated automatically during maintenance operations, such as
when the adapter is stopped or replaced). And lastly, selective
manual updates of the SR-IOV adapters can be performed using the
Hardware Management Console (HMC). To selectively update the
adapter firmware, follow the steps given at the IBM Knowledge Center
for using HMC to make the updates: https://www.ibm.com/support/knowledgecenter/HW4M4/p8efd/p8efd_updating_sriov_firmware.htm.
Note: Adapters that are capable of running in SR-IOV mode, but are
currently running in dedicated mode and assigned to a partition, can be
updated concurrently either by the OS that owns the adapter or the
managing HMC (if OS is AIX or VIOS and RMC is running).
- On systems using PowerVM firmware with an IBM i partition,
a problem was fixed for incorrect maximum performance reports based on
the wrong number of "maximum" processors for the system.
Certain performance reports that can be generated on IBM i systems
contain not only the existing machine information, but also "what-if"
information, such as "how would this system perform if it had all the
processors possible installed in this system". This "what-if"
report was in error because the maximum number of processors possible
was too high for the system.
- On systems using PowerVM firmware, a problem was fixed for
degraded PCIe3 links for the PCIe3 expansion drawer with SRC B7006A8F
not being visible on the HMC. This occurred because the SRC was
informational. The problem occurs when the link attaching a
drawer to the system trains to x8 instead of x16. With the fix,
the SRC has been changed to a B70006A8B permanent error for the
degraded link.
- On systems using PowerVM firmware, a problem was fixed for
a concurrent exchange of a CAPI adapter that left the new adapter in a
deactivated state. The system can be powered off and IPLed
again to recover the new adapter. The CAPI adapters have the
following feature codes: #EC3E, #EC3F, #EC3L, #EC3M, #EC3T,
#EC3U, #EJ16, #EJ17, #EJ18, #EJ1A, and #EJ1B.
- On a system using PowerVM firmware with SR-IOV
adapters, a problem was fixed for a DLPAR remove on a Virtual
Function (VF) of a ConnectX-4 (CX4) adapter that failed with AIX error
"0931-013 Unable to isolate the resource". The HMC reported error
is "HSCL12B5 The operation to remove SR-IOV logical port xx
failed because of the following error: HSCL131D The SR-IOV logical port
is still in use by the partition". The failing PCIe3 adapters are
sourced from Mellanox Corporation based on ConnectX-4 technology and
have the following feature codes and CCINs: #EC3E, #EC3F with
CCIN 2CEA; #EC3L and #EC3M with CCIN 2CEC; and #EC3T and #EC3U with
CCIN 2CEB. The issue occurs each time a DLPAR remove operation is
attempted on the VF. Restarting the partition after a failed
DLPAR remove recovers from the error.
- On systems using PowerVM firmware, a problem was fixed for
NVRAM corruption that can occur when deleting a partition that owns a
CAPI adapter, if that CAPI adapter is not assigned to another partition
before the system is powered off. On a subsequent IPL, the system
will come up in recovery mode if there is NVRAM corruption. To
recover, the partitions must be restored from the HMC. The
frequency of this error is expected to be rare. The CAPI adapters
have the following feature codes: #EC3E, #EC3F, #EC3L, #EC3M,
#EC3T, #EC3U, #EJ16, #EJ17, #EJ18, #EJ1A, and #EJ1B.
- On systems using PowerVM firmware, a problem was fixed for
NVRAM corruption and an HMC recovery state when using Simplified Remote
Restart partitions. The failing systems will have at least one
Remote Restart partition and on the failed IPL there will be a
B70005301 SRC with word 7 being 0X00000002.
- On systems using PowerVM firmware, a problem was fixed for
a group of shared processor partitions being able to exceed the
designated capacity placed on a shared processor pool. This error
can be triggered by using the DLPAR move function for the shared
processor partitions, if the pool has already reached its maximum
specified capacity. To prevent this problem from occurring when
making DLPAR changes when the pool is at the maximum capacity, do not
use the DLPAR move operation but instead break it into two steps:
DLPAR remove followed by DLPAR add. This gives enough time for
the DLPAR remove to be fully completed prior to starting the DLPAR add
request.
- On systems using PowerVM firmware, a problem was fixed for
partition boot failures and run time DLPAR failures when adding I/O
that log BA210000, BA210003, and/or BA210005 errors. The fix also
applies to run time failures configuring an I/O adapter following an
EEH recovery that log BA188001 events. The problem can impact
IBM i partitions running in any processor mode or AIX/Linux partitions
running in P7 (or older) processor compatibility modes. The
problem is most likely to occur when the system is configured in the
Manufacturing Default Configuration (MDC) mode. The trigger for
the problem is a race-condition between the hypervisor and the physical
operations panel with a very rare frequency of occurrence.
|
SC860_070_056 / FW860.12
01/13/17 |
Impact: Availability
Severity: SPE
System firmware changes that affect certain systems
- On a system using
PowerVM firmware, a problem was fixed for the System Management
Services (SMS) SAS utility showing very large (incorrect) disk capacity
values depending on the size of the disk or Volume Set/Array. The
problem occurs when the number of blocks on a disk is 2 G or more.
- On a system using PowerVM firmware running a Linux
OS, a problem was fixed for support for Coherent Accelerator
Processor Interface (CAPI) adapters. The CAPI related RTAS
h-calls for the CAPI devices could not be made by the Linux OS,
impacting the CAPI adapter functionality and usability. This
problem involves the following adapters: the PCIe3 LP CAPI
Accelerator Adapter with F/C #EJ16 that is used on the S812L (8247-21L)
and S822L (8247-22L) models; the PCIe3 CAPI FlashSystem
Accelerator Adapter with F/C #EJ17 that is used on the
S814 (8286-41A) and S824 (8286-42A) models; and the PCIe3 CAPI
FlashSystem Accelerator Adapter with F/C #EJ18 that is used on the
S822 (8284-22A), E870 (9119-MME), and E880 (9119-MHE) models. This
problem does not pertain to PowerVM AIX partitions using CAPI adapters.
- On a system using PowerVM firmware, a problem was fixed for
Live Partition Mobility (LPM) migrations to FW860.10 or FW860.11 from
any other level of firmware (that is, not FW860.10 or FW860.11) that
caused errors in the output of the AIX "lsattr -El mem0" command and
Dynamic LPAR (DLPAR) operations. The "lsattr" command will report
the partition only has one logical memory block (LMB) of memory
assigned to it, even though there is more memory assigned to the
partition. Also, as a result of this problem, DLPAR operations
will fail with an error indicating the request could not be
completed. This issue affects AIX 5.3, AIX 6.1, AIX 7.1, AIX 7.2
TL 0, and may result in AIX DLPAR error message "0931-032 Firmware
failure. Data may be out of sync and the system may require
a reboot." This issue also affects all levels of Linux. Not
affected by this issue are AIX 7.2 TL 1, VIOS and IBM i
partitions.
In addition, after performing LPM from FW860 to earlier versions of
firmware, the DLPAR of Virtual Adapters will fail with HMC error
message HSCL294C, which contains text similar to the following:
"0931-007 You have specified an invalid drc_name."
Without the fix, a reboot of the migrated partition will correct the
problem.
- On a system using PowerVM firmware, a problem was fixed for
I/O DLPARs that result in partition hangs. To trigger the
problem, the DLPAR operation must be performed on a partition which has
been migrated via a Live Partition Mobility (LPM) operation from a P6
or P7 system to a P8 system. Additionally, DLPAR of I/O will fail
when performed on a partition which has been migrated via an LPM
operation from a P8 system to a P6 or P7 system. The failure will
produce HMC error message HSCL2928, which contains text similar to the
following: "0931-011 Unable to allocate the resource to the
partition." DLPAR operations for memory or CPU are not affected.
This issue affects all Linux and AIX partitions. IBM i partitions
are not affected.
|
SC860_063_056 / FW860.11
12/05/16 |
Impact: N/A
Severity: N/A
- This Service Pack contained updates for MANUFACTURING ONLY.
|
SC860_056_056 / FW860.10
11/18/16 |
Impact: New
Severity: New
New features and functions
- Support enabled for Live Partition Mobility (LPM)
operations.
- Support enabled for partition Suspend and Resume from the
HMC.
- Support enabled for partition Remote Restart.
- Support enabled for PowerVM vNIC. PowerVM vNIC combines
many of the best features of SR-IOV and PowerVM SEA to provide a
network solution with options for advanced functions such as Live
Partition Mobility along with better performance and I/O efficiency
when compared to PowerVM SEA. In addition, PowerVM vNIC provides
users with bandwidth control (QoS) capability by leveraging SR-IOV
logical ports as the physical interface to the network.
- Support for dynamic setting of the Simplified Remote
Restart VM property, which enables this property to be turned on or off
dynamically with the partition running.
- Support for PowerVM and HMC to get and set the boot
list of a partition.
- Support for PowerVM partition restart in a Disaster
Recovery (DR) environment.
- Support on PowerVM for a partition with 32 TB
memory. AIX, IBM i, and Linux are supported, but IBM i must be IBM
i 7.3 TR1. IBM i 7.2 has a limit of 16 TB per partition and IBM i
7.1 has a limit of 8 TB per partition. AIX level must be 7.1S or
later. Linux distributions supported are RHEL 7.2 P8, SLES
12 SP1, Ubuntu 16.04 LTS, RHEL 7.3 P8, SLES 12 SP2, Ubuntu
16.04.1, and SLES 11 SP4 for SAP HANA.
- Support for PowerVM and PowerNV (non-virtualized or OPAL
bare-metal) booting from a PCIe Non-Volatile Memory express (NVMe)
flash adapter. The adapters include the 1.6 TB NVMe flash adapters
with feature codes #EC54 and #EC55 (CCIN 58CB) and the 3.2 TB NVMe
flash adapters with feature codes #EC56 and #EC57 (CCIN 58CC).
- Support for PowerVM NovaLink V1.0.0.4 which includes the
following features:
- IBM i network boot
- Live Partition Mobility (LPM) support for inactive source VIOS
- Support for SR-IOV configurations, vNIC, and vNIC failover
- Partition support for Red Hat Enterprise Linux
- Support for a decrease in the amount of PowerVM memory
needed to support Huge Dynamic DMA Window (HDDW) for a PCI slot by
using 64K pages instead of 4K pages. The hypervisor only
allocates enough storage for the Enlarged IO Capacity (Huge Dynamic DMA
Window) capable slots to map every page in main storage with 64K pages
rather than 4K pages as was done previously. This affects only
the Linux OS as AIX and IBM i do not use HDDW.
- Support added to reduce the number of error logs and
call homes for the non-critical FRUs for the power and thermal faults
of the system.
- Support for redundancy in the transfer of partition
state for Live Partition Mobility (LPM) migration operations.
Redundant VIOS Mover Service Partitions (MSPs) can be defined along with
redundant network paths at the VIOS/MSP level. When redundant MSP
pairs are used, the migrating memory pages of the logical partition are
transferred from the source system to the target system by using two
MSP pairs simultaneously. If one of the MSP pairs fails, the migration
operation continues by using the other MSP pair. In some scenarios,
where a common shared Ethernet adapter is not used, use redundant MSP
pairs to improve performance and reliability.
Note: For an LPM migration of a partition using Active Memory
Sharing (AMS) in a dual (redundant) MSP configuration, the LPM operation
may hang if the MSP connection fails during the LPM migration. To avoid
this issue, which applies only to AMS partitions, AMS
migrations should only be done from the HMC command line using the
migrlpar command and specifying --redundentmsp 0 to disable the
redundant MSPs.
Note: To use redundant MSP pairs, all VIOS MSPs must be at version
2.2.5.00 or later, the HMC at version 8.6.0 or later, and the firmware
level FW860 or later.
For more information on LPM and VIOS supported levels and restrictions,
refer to the following links on the IBM Knowledge Center:
http://www.ibm.com/support/knowledgecenter/PurePower/p8hc3/p8hc3_firmwaresupportmatrix.htm
https://www.ibm.com/support/knowledgecenter/HW4L4/p8eeo/p8eeo_ipeeo_main.htm
- Support for failover capability for vNIC client adapters in
the PowerVM hypervisor, rather than requiring the failover
configuration to be done in the client OS. To create a redundant
connection, the HMC adds another vNIC server with the same remote lpar
ID and remote DRC as the first, giving each server its own priority.
- Support for SAP HANA with Solution edition with feature
code #EPVR on 3.65 GHz processors and 12-core activations and 512 GB
memory activations on SUSE Linux. SAP HANA is an in-memory
platform for processing high volumes of data in real-time. HANA allows
data analysts to query large volumes of data in real-time. HANA's
in-memory database infrastructure frees analysts from having to load or
write-back data.
- Support for the Hardware Management Console (HMC) to
access the service processor IPMI credentials and to retrieve
Performance and Capacity Monitor (PCM) data for viewing in a tabular
format or for exporting as CSV values. The enhanced HMC interface can
now start and stop VIOS Shared Storage Pool (SSP) monitoring from the
HMC and start and stop SSP historical data aggregation.
- Support for the Advanced System Management Interface (ASMI)
was changed to not create VPD deconfiguration records and call home
alerts for hardware FRUs that have one VPD chip of a redundant pair
broken or inaccessible. The backup VPD chip for the FRU allows
continued use of the hardware resource. The notification of the
need for service for the FRU VPD is not provided until both of the
redundant VPD chips have failed for a FRU.
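The memory saving from mapping main storage with 64K pages instead of 4K pages for the Huge Dynamic DMA Window (HDDW), described in the list above, can be illustrated with simple arithmetic. The sketch below only counts the page mappings needed to cover a given amount of memory; the hypervisor storage consumed per mapping is not stated in this document and is not modeled. The 1 TiB figure and the function names are arbitrary choices for illustration.

```python
KIB = 1024
TIB = 1024 ** 4


def pages_needed(memory_bytes: int, page_size_bytes: int) -> int:
    """Number of pages needed to map the memory, rounded up."""
    return -(-memory_bytes // page_size_bytes)


def mapping_reduction(memory_bytes: int) -> float:
    """Ratio of 4K-page mappings to 64K-page mappings for the same memory."""
    small = pages_needed(memory_bytes, 4 * KIB)
    large = pages_needed(memory_bytes, 64 * KIB)
    return small / large


if __name__ == "__main__":
    memory = 1 * TIB  # example system memory size
    print("4K pages: ", pages_needed(memory, 4 * KIB))
    print("64K pages:", pages_needed(memory, 64 * KIB))
    print("reduction factor:", mapping_reduction(memory))
```

For memory sizes that are a multiple of 64K, the number of mappings drops by a factor of 16, which is why the per-slot storage requirement shrinks accordingly.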
System firmware changes that affect all systems
- A problem was fixed
for a failed IPL with SRC UE BC8A090F that does not have a hardware
callout or a guard of the failing hardware. The system may be
recovered by guarding out the processor associated with the error and
re-IPLing the system. With the fix, the bad processor core is
guarded and the system is able to IPL.
- A problem was fixed for an infrequent service processor
failover hang that results in a reset of the backup service processor
that is trying to become the new primary. This error occurs more
often on a failover to a backup service processor that has been in that
role for a long period of time (many months). This error can
cause a concurrent firmware update to fail. To reduce the chance
of a firmware update failure because of a bad failover, an
Administrative Failover (AFO) can be requested from the HMC prior to
the start of the firmware update. When the AFO has completed, the
firmware update can be started as normally done.
- A problem was fixed for an Operations Panel Function 04
(Lamp test) during an IPL causing the IPL to fail. With the fix,
the lamp test request is rejected during the IPL until the hypervisor
is available. The lamp test can be requested without problems
anytime after the system is powered on to hypervisor ready or an OS is
running in a partition.
- A problem was fixed for On-Chip Controller (OCC) errors
that had excessive callouts for processor FRUs. Many of the OCC
errors are recoverable and do not require that the processor be called
out and guarded. With the fix, the processors will only be called
out for OCC errors if there are three or more OCC failures during a
time period of a week.
- A problem was fixed for the loss of the setting for the
disable of a periodic notification for a call home error log after a
failover to the backup service processor on a redundant service
processor system. The call home for the presence of a failed
resource can get re-enabled (if manually disabled in ASMI on the
primary service processor) after a concurrent firmware update or any
scenario that causes the service processor to fail over and change
roles. With the fix, the periodic notification flag is
synchronized between the service processors when the flag value is
changed.
- A problem was fixed for the On-Chip Controller (OCC)
incorrectly calling out processors with SRC B1112A16 for L4 Cache DIMM
failures with SRC B124E504. This false error logging can occur if
the DIMM slot that is failing is adjacent to two unoccupied DIMM slots.
- A problem was fixed for CEC drawer deconfiguration during an
IPL due to SRCs BC8A0307 and BC8A1701 that did not have the correct
hardware callout for the failing SCM. With the fix, the failing
SCM is called out and guarded so the CEC drawer will IPL even though
there is a failed processor.
- A problem was fixed for device time outs during an IPL
logged with an SRC B18138B4. This error is intermittent and no
action is needed for the error log. The service processor
hardware server now allots more time for the device transactions to
allow the transactions to complete without a time-out error.
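The On-Chip Controller (OCC) callout rule described in the list above (call out the processor only after three or more OCC failures within a week) is an example of a sliding-window threshold. The sketch below shows one way such a threshold can be expressed; it is not the firmware implementation. The class name, the exact window boundary handling, and the assumption that failures are recorded in chronological order are all illustrative choices.

```python
from collections import deque
from datetime import datetime, timedelta


class FailureThreshold:
    """Report when `limit` failures fall within a sliding time window."""

    def __init__(self, limit: int = 3, window: timedelta = timedelta(weeks=1)):
        self.limit = limit
        self.window = window
        self._events = deque()  # failure timestamps, oldest first

    def record(self, when: datetime) -> bool:
        """Record a failure; return True if the threshold is now reached.

        Assumes failures are recorded in chronological order.
        """
        self._events.append(when)
        cutoff = when - self.window
        # Drop failures that are a full window or more older than this one.
        while self._events and self._events[0] <= cutoff:
            self._events.popleft()
        return len(self._events) >= self.limit


if __name__ == "__main__":
    tracker = FailureThreshold()
    start = datetime(2016, 11, 1)
    for offset_days in (0, 2, 5):
        callout = tracker.record(start + timedelta(days=offset_days))
        print(offset_days, "call out processor" if callout else "no callout")
```

With this shape, isolated recoverable errors spread over several weeks never reach the limit, while a burst of failures inside one week does.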
System firmware changes that affect certain systems
- DISRUPTIVE: On systems using the PowerVM
firmware, a problem was fixed for an "Incomplete" state caused by
initiating a resource dump with selector macros from NovaLink
(vio -dump -lp 1 -fr). The failure causes the stack frame size of a
communication process, HVHMCCMDRTRTASK, to be exceeded with a hypervisor
page fault that disrupts the NovaLink and/or HMC communications. The
recovery action is to re-IPL the CEC, but that will need to be done
without the assistance of the management console. For each partition
that has an OS running on the system, shut down the partition from the
OS. Then, from the Advanced System Management Interface (ASMI), power
off the managed system. Alternatively, the system power button may
also be used to do the power off. If the management console Incomplete
state persists after the power off, the managed system should be
rebuilt from the management console. For more information on
management console recovery steps, refer to this IBM Knowledge Center
link: https://www.ibm.com/support/knowledgecenter/en/POWER7/p7eav/aremanagedsystemstate_incomplete.htm.
The fix is disruptive because the size of the PowerVM hypervisor must
be increased to accommodate the over-sized stack frame of the failing
task.
- DEFERRED: On systems using the PowerVM
firmware, a problem was fixed for a CAPI function unavailable condition
on a system with the maximum number of CAPI adapters and partitions.
Not enough bytes were allocated for CAPI for the maximum configuration
case. The problem may be circumvented by reducing the number of active
partitions or CAPI adapters. The fix is deferred because the size of
the hypervisor must be increased to provide the additional CAPI space.
- DEFERRED: On systems using PowerVM firmware, a problem was fixed
for cable card capable PCI slots that fail during the IPL.  Hypervisor
I/O Bus Interface UE B7006A84 is reported for each cable card capable
PCI slot that doesn't contain a PCIe3 Optical Cable Adapter for the
PCIe Expansion Drawer (feature code #EJ05).  PCI slots containing a
cable card will not report an error but will not be functional.  The
problem can be resolved by performing an AC cycle of the system.  The
trigger for the failure is that the I2C devices used to detect the
cable cards do not come out of the power-on reset process in the
correct state due to a race condition.
- On systems using PowerVM firmware, a problem was fixed for
network issues, causing critical situations for customers, when an
SR-IOV logical port or vNIC is configured with a non-zero Port VLAN ID
(PVID). This fix updates adapter firmware to 10.2.252.1922, for
the following Feature Codes: EN15, EN16, EN17, EN18, EN0H, EN0J, EL38,
EN0M, EN0N, EN0K, EN0L, and EL3C.
The SR-IOV adapter firmware level update for the shared-mode adapters
happens under user control to prevent unexpected temporary outages on
the adapters. A system reboot will update all SR-IOV shared-mode
adapters with the new firmware level. In addition, when an
adapter is first set to SR-IOV shared mode, the adapter firmware is
updated to the latest level available with the system firmware (and it
is also updated automatically during maintenance operations, such as
when the adapter is stopped or replaced). And lastly, selective
manual updates of the SR-IOV adapters can be performed using the
Hardware Management Console (HMC). To selectively update the
adapter firmware, follow the steps given at the IBM Knowledge Center
for using HMC to make the updates: https://www.ibm.com/support/knowledgecenter/HW4M4/p8efd/p8efd_updating_sriov_firmware.htm.
Note: Adapters that are capable of running in SR-IOV mode, but are
currently running in dedicated mode and assigned to a partition, can be
updated concurrently either by the OS that owns the adapter or the
managing HMC (if OS is AIX or VIOS and RMC is running).
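As a rough illustration of checking whether an adapter is below the level delivered by this fix, the following Python sketch compares dotted firmware levels. It assumes that levels compare numerically field by field, which is not stated in these notes, and the inventory dictionary and its installed levels are hypothetical sample data rather than output from any HMC command.

```python
# Sketch only: assumes dotted adapter firmware levels compare numerically,
# field by field. The inventory below is hypothetical sample data.

FIXED_LEVEL = "10.2.252.1922"  # adapter firmware level named in this fix


def parse_level(level: str) -> tuple:
    """Split a dotted level such as '10.2.252.1922' into a tuple of ints."""
    return tuple(int(part) for part in level.split("."))


def needs_update(installed_level: str, fixed_level: str = FIXED_LEVEL) -> bool:
    """Return True when the installed level is older than the fixed level."""
    return parse_level(installed_level) < parse_level(fixed_level)


if __name__ == "__main__":
    # Hypothetical mapping of feature code to installed adapter level.
    inventory = {"EN15": "10.2.252.1913", "EN0H": "10.2.252.1922"}
    for feature_code, level in inventory.items():
        state = "update needed" if needs_update(level) else "at or above fix level"
        print(f"{feature_code}: {level} ({state})")
```

Comparing integer tuples rather than raw strings avoids ordering mistakes when a field changes its number of digits.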
- On systems using the PowerVM firmware, a problem was fixed
for a Live Partition Mobility migration that resulted in the source
managed system going to the management console Incomplete state after
the migration to the target system was completed. This problem is
very rare and has only been detected once.  The problem trigger is that
the source partition does not halt execution after the migration to the
target system. The management console went to the
Incomplete state for the source managed system when it failed to delete
the source partition because the partition would not stop
running. When this problem occurred, the customer network was
running very slowly and this may have contributed to the failure.
The recovery action is to re-IPL the source system but that will need
to be done without the assistance of the management console. For
each partition that has an OS running on the source system, shut down
the partition from the OS.  Then from the Advanced System
Management Interface (ASMI), power off the managed system.
Alternatively, the system power button may also be used to do the power
off. If the management console Incomplete state persists after
the power off, the managed system should be rebuilt from the management
console. For more information on management console recovery
steps, refer to this IBM Knowledge Center link: https://www.ibm.com/support/knowledgecenter/en/POWER7/p7eav/aremanagedsystemstate_incomplete.htm
- On systems using PowerVM firmware, a problem was
fixed for a shared processor pool partition showing an incorrect zero
"Available Pool Processor" (APP) value after a concurrent firmware
update. The zero APP value means that no idle cycles are present
in the shared processor pool but in this case it stays zero even when
idle cycles are available. This value can be displayed using the
AIX "lparstat" command. If this problem is encountered, the
partitions in the affected shared processor pool can be dynamically
moved to a different shared processor pool. Before the dynamic
move, the "uncapped" partitions should be changed to "capped" to
avoid a system hang. The old affected pool would continue to have the
APP error until the system is re-IPLed.
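To spot a partition that may be affected, the "app" column of the AIX "lparstat" interval output can be checked for a value that stays at zero. The Python sketch below parses captured output text. The sample text is illustrative only, and the column layout is an assumption about typical output for a shared partition. A zero value is also legitimate when the pool really has no idle cycles, so the check only flags candidates for a closer look.

```python
# Sketch only: parses captured "lparstat" interval output and reports whether
# every "app" sample is zero. SAMPLE is illustrative, not real system output.

SAMPLE = """\
System configuration: type=Shared mode=Uncapped smt=4 lcpu=8 mem=8192MB psize=16 ent=1.00

%user  %sys  %wait  %idle physc %entc  lbusy   app  vcsw phint
----- ----- ------ ------ ----- ----- ------ ----- ----- -----
  0.1   0.3    0.0   99.6  0.01   1.2    0.5  0.00   320     0
  0.2   0.4    0.0   99.4  0.01   1.3    0.6  0.00   315     0
"""


def app_samples(lparstat_text: str) -> list:
    """Return the numeric values found under the 'app' column header."""
    lines = [line for line in lparstat_text.splitlines() if line.strip()]
    for i, line in enumerate(lines):
        columns = line.split()
        if "app" not in columns:
            continue
        index = columns.index("app")
        samples = []
        for row in lines[i + 1:]:
            fields = row.split()
            if len(fields) != len(columns):
                continue  # not a data row
            try:
                samples.append(float(fields[index]))
            except ValueError:
                continue  # separator row of dashes
        return samples
    return []  # no 'app' column present in the output


def stuck_at_zero(samples: list) -> bool:
    """True when at least one sample exists and all samples are zero."""
    return bool(samples) and all(value == 0.0 for value in samples)


if __name__ == "__main__":
    print("APP stuck at zero:", stuck_at_zero(app_samples(SAMPLE)))
```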
- On systems using PowerVM firmware, a problem was fixed for
a latency time of about 2 seconds being added to a target Live
Partition Mobility (LPM) migration system when there is a latency time
check failure. With the fix, in the case of a latency time check
failure, a much smaller default latency is used instead of two
seconds. This error would not be noticed if the customer system
is using an NTP time server to maintain the time.
- On multi-node systems with an incorrect memory configuration
of DDR3 and DDR4 DIMMs, a problem was fixed for the IPL hanging for
four hours instead of terminating immediately.
- On systems using PowerVM firmware, a rare problem was
fixed for a system hang that can occur when dynamically moving
"uncapped" partitions to a different shared processor pool. To
prevent a system hang, the "uncapped" partitions should be changed to
"capped" before doing the move.
- On systems using the PowerVM firmware, support was added
for a new utility option for the System Management Services (SMS)
menus.  This is the SMS SAS I/O Information Utility.  It has
been introduced to allow a user to get additional information about
the attached SAS devices. The utility is accessed by selecting
option 3 (I/O Device Information) from the main SMS menu, and then
selecting the option for "SAS Device Information".
- On systems using the PowerVM hypervisor firmware and
NovaLink, a problem was fixed for a NovaLink installation error where
the hypervisor was unable to get the maximum logical memory buffer
(LMB) size from the service processor. The maximum supported LMB
size should be 0xFFFFFFFF but in some cases it was initialized to a
value that was less than the amount of configured memory, causing the
service processor read failure with error code 0X00000134.
- On systems using the PowerVM hypervisor firmware and CAPI
adapters, a problem was fixed for CAPI adapter error recovery.
When the CAPI adapter goes into the error recovery state, the Memory
Mapped I/O (MMIO) traffic to the adapter from the OS continues,
disrupting the recovery. With the fix, the MMIO and DMA traffic
to the adapter are now frozen until the CAPI adapter is fully
recovered. If the adapter becomes unusable because of this
error, it can be recovered using concurrent maintenance steps from the
HMC, keeping the adapter in place during the repair. The error
has a low frequency since it only occurs when the adapter has failed
for another reason and needs recovery.
- On systems using the PowerVM hypervisor firmware, a change was
made for affinity groups so that, if a group includes a VIOS, the group
is placed in the same drawer where the VIOS physical I/O is
located.  Prior to this change, if the VIOS was in an
affinity group with other partitions, the partition placement could
override the VIOS adapter placement rules and the VIOS could end up in
a different drawer from the I/O adapters.
- On systems using PowerVM firmware, a problem was
fixed to improve error recovery when attempting to boot an iSCSI target
backed by a drive formatted with a block size other than 512
bytes. Instead of stopping on this error, the boot attempt fails
and then continues with the next potential boot device.
Information regarding the reason for the boot failure is available in
an error log entry. The 512 byte block size for backing devices
for iSCSI targets is a partition firmware requirement.
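The fallback behavior described above can be modeled in a few lines of Python. This is a simplified sketch of the logic, not the partition firmware implementation. The BootDevice type, the device names, and the message text are hypothetical and exist only for illustration.

```python
# Sketch only: models "skip a boot device whose backing block size is not
# 512 bytes, record the reason, and try the next device". All names here
# are hypothetical and are not partition firmware interfaces.
from dataclasses import dataclass

REQUIRED_BLOCK_SIZE = 512  # bytes, the requirement stated in the notes above


@dataclass
class BootDevice:
    name: str
    block_size: int  # block size of the backing drive, in bytes


def select_boot_device(candidates):
    """Return (chosen_device_or_None, list_of_logged_reasons)."""
    logged = []
    for device in candidates:
        if device.block_size != REQUIRED_BLOCK_SIZE:
            # Record why this device was skipped, then keep going.
            logged.append(
                f"{device.name}: block size {device.block_size} "
                f"is not {REQUIRED_BLOCK_SIZE}"
            )
            continue
        return device, logged
    return None, logged


if __name__ == "__main__":
    boot_list = [BootDevice("iscsi-target-0", 4096), BootDevice("disk-1", 512)]
    chosen, reasons = select_boot_device(boot_list)
    print("chosen:", chosen.name if chosen else None)
    for reason in reasons:
        print("logged:", reason)
```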
- On systems using PowerVM firmware, a problem was fixed for
extra resources being assigned in a Power Enterprise Pool
(PEP). This only occurs if all of these things happen:
o Power server is in a PEP pool
o Power server has PEP resources assigned to it
o Power server is powered down
o User uses HMC to 'remove' resources from the powered-down
server
o Power server is then restarted. It should come up with no
PEP resources, but it starts up and shows it still is using PEP
resources it should not have.
To recover from this problem, the HMC 'remove' of the PEP resources
from the server can be performed again.
- On systems using PowerVM firmware, a problem was fixed for
a false thermal alarm in the active optical cables (AOC) for the PCIe3
expansion drawer with SRCs B7006AA6 and B7006AA7 being logged every 24
hours. The AOC cables have feature codes of #ECC6 through #ECC9,
depending on the length of the cable. The SRCs should be ignored
as they call for the replacement of the cable, cable card, or the
expansion drawer module. With the fix, the false AOC thermal
alarms are no longer reported.
- On systems using PowerVM firmware that have an attached
HMC, a problem was fixed for a Live Partition Mobility migration
that resulted in a system hang when an EEH error occurred
simultaneously with a request for a page migration operation.  The
HMC shows an Incomplete state for the managed system with
reference code A181D000.  The recovery action is to re-IPL the
source system but that will need to be done without the assistance of
the HMC. From the Advanced System Management Interface
(ASMI), power off the managed system. Alternatively, the
system power button may also be used to do the power off. If the
HMC Incomplete state persists after the power off, the managed system
should be rebuilt from the HMC. For more information on HMC
recovery steps, refer to this IBM Knowledge Center link: https://www.ibm.com/support/knowledgecenter/en/POWER7/p7eav/aremanagedsystemstate_incomplete.htm
|